
Local STT Solutions

Translation note: This English version was translated by Codex (GPT-5) on 2026-04-20 18:01:46 CST. The source text is the corresponding Chinese post in this repository.

Date: 2026-02-07

I had a batch of local video files to transcribe. I had previously used CapCut’s automatic subtitle recognition, but for heavier batch workloads, better privacy, and lower cost, I started looking for open-source STT (speech-to-text) options.

After comparing options with Gemini and Kimi, I settled on OpenAI’s whisper-large-v3-turbo model, running on a laptop with an RTX 4060 GPU and an i9 CPU.

Hugging Face link: https://huggingface.co/openai/whisper-large-v3-turbo
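The same model can also be driven from a script instead of a GUI. The following is a minimal sketch, not the workflow used in this post, of batch transcription through the Hugging Face transformers "automatic-speech-recognition" pipeline. It assumes `torch` and `transformers` are installed and that ffmpeg is on PATH so the pipeline can decode video containers directly; the helper names `collect_media`, `build_pipeline`, `transcribe_folder` and the extension list are hypothetical.

```python
from pathlib import Path

# Hypothetical extension list: adjust to whatever your batch contains.
MEDIA_EXTS = {".mp4", ".mkv", ".mov", ".mp3", ".wav", ".m4a"}


def collect_media(folder, exts=MEDIA_EXTS):
    """Return media files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in exts
    )


def build_pipeline(model_id="openai/whisper-large-v3-turbo"):
    """Load Whisper via the transformers ASR pipeline (assumes torch + transformers)."""
    # Imported lazily so the file helpers work without the heavy dependencies.
    import torch
    from transformers import pipeline

    use_cuda = torch.cuda.is_available()
    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        # Half precision on the GPU, full precision on the CPU fallback.
        torch_dtype=torch.float16 if use_cuda else torch.float32,
        device="cuda:0" if use_cuda else "cpu",
    )


def transcribe_folder(folder):
    """Transcribe every media file in `folder`; return {file name: text}."""
    asr = build_pipeline()
    results = {}
    for path in collect_media(folder):
        # return_timestamps=True lets Whisper process audio longer than 30 s.
        out = asr(str(path), return_timestamps=True)
        results[path.name] = out["text"]
    return results
```

Under these assumptions, `transcribe_folder("videos")` would return a dict mapping each file name to its transcript. The first call downloads the model weights, much like Buzz does on first launch.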

Whisper local transcription speed test

I then found Buzz, an open-source desktop GUI for Whisper.

Project link: https://github.com/chidiwilliams/buzz

After I downloaded the Windows executable and its accompanying .bin files, the tool was ready to use. The first launch is slow because the model has to be downloaded.

Buzz configuration panel

Both speed and transcription quality were good, and Buzz also supports Cantonese recognition and live recording.

Subtitle export result
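Buzz handles subtitle export itself. If you transcribe from a script instead, the transformers ASR pipeline called with `return_timestamps=True` returns a `"chunks"` list whose entries carry a `"timestamp"` tuple of (start, end) in seconds plus a `"text"` string. A minimal sketch for turning such chunks into an SRT file follows; the function names are hypothetical, and the `None` end-time fallback is an assumption for the case where the last chunk has no predicted end.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks):
    """Convert [{"timestamp": (start, end), "text": str}, ...] into SRT text."""
    blocks = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        # The final chunk may come back with end=None; fall back to its start.
        if end is None:
            end = start
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{chunk['text'].strip()}\n"
        )
    # SRT cues are separated by a blank line.
    return "\n".join(blocks)
```

Writing the result with `Path("video.srt").write_text(chunks_to_srt(out["chunks"]), encoding="utf-8")` would give a file most players can load.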

That was all it took to achieve “STT freedom.”