Question 1

Is my video uploaded to a server?

Accepted Answer

No. Both steps run entirely in your browser: ffmpeg, compiled to WebAssembly, extracts the audio, and Whisper transcribes it on your device. Your media never leaves your computer. The only network request is a one-time download of the open-source model weights from a public CDN.
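To illustrate the extraction step: Whisper models expect 16 kHz mono audio, so a typical ffmpeg invocation drops the video stream, downmixes to one channel, and resamples. The sketch below only builds such an argument list. The helper name, the WAV/PCM output format, and the file names are assumptions, not the tool's actual settings. A WebAssembly build such as ffmpeg.wasm takes its arguments as a list of strings, much like this one.

```typescript
// Hypothetical helper: builds ffmpeg arguments that strip the video stream
// and convert the audio to 16 kHz mono 16-bit PCM WAV, the sample rate
// Whisper models are trained on. Output format is an assumption.
function buildAudioExtractArgs(inputName: string, outputName = "audio.wav"): string[] {
  return [
    "-i", inputName,     // input file (e.g. in ffmpeg.wasm's virtual filesystem)
    "-vn",               // drop the video stream
    "-ac", "1",          // downmix to a single (mono) channel
    "-ar", "16000",      // resample to 16 kHz
    "-c:a", "pcm_s16le", // uncompressed 16-bit little-endian PCM
    outputName,
  ];
}

console.log(buildAudioExtractArgs("interview.mp4").join(" "));
// prints: -i interview.mp4 -vn -ac 1 -ar 16000 -c:a pcm_s16le audio.wav
```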

Question 2

Which languages and formats are supported?

Accepted Answer

Whisper is multilingual and handles more than 90 languages, including Korean, English, Japanese, Chinese, and Spanish, and it can detect the spoken language automatically. You can export SRT, WebVTT, or a plain-text transcript, and you can optionally translate non-English speech into English subtitles.
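The subtitle formats differ mainly in small details: SRT numbers each cue and puts a comma before the milliseconds, while WebVTT starts with a `WEBVTT` header line and uses a period. Below is a minimal sketch of turning transcribed segments into all three outputs, assuming a hypothetical `Segment` shape with start and end times in seconds. It is an illustration, not the tool's own exporter.

```typescript
// Hypothetical segment shape: start/end times in seconds plus the text.
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Left-pads a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Formats seconds as HH:MM:SS<sep>mmm (SRT uses ",", WebVTT uses ".").
function formatTimestamp(seconds: number, sep: "," | "."): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return pad(h, 2) + ":" + pad(m, 2) + ":" + pad(s, 2) + sep + pad(ms, 3);
}

// SRT: numbered cues, each followed by a blank line.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) =>
      `${i + 1}\n${formatTimestamp(seg.start, ",")} --> ${formatTimestamp(seg.end, ",")}\n${seg.text.trim()}\n`)
    .join("\n");
}

// WebVTT: "WEBVTT" header, a blank line, then cues (cue numbers are optional).
function toVtt(segments: Segment[]): string {
  const cues = segments.map(
    (seg) => `${formatTimestamp(seg.start, ".")} --> ${formatTimestamp(seg.end, ".")}\n${seg.text.trim()}\n`);
  return "WEBVTT\n\n" + cues.join("\n");
}

// Plain-text transcript: one segment per line, no timestamps.
function toText(segments: Segment[]): string {
  return segments.map((seg) => seg.text.trim()).join("\n");
}

const demo: Segment[] = [
  { start: 0, end: 2.5, text: "Hello." },
  { start: 2.5, end: 5, text: "Welcome to the interview." },
];
console.log(toSrt(demo));
console.log(toVtt(demo));
```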

Question 3

Which model should I choose?

Accepted Answer

Small is the recommended default and the practical minimum for good results in Korean and other CJK languages. Tiny is the fastest and lightest but less accurate. Turbo (large-v3-turbo) is the most accurate but downloads several hundred megabytes and runs best with WebGPU. Each model is downloaded once and then cached.
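The rule of thumb above can be written down as a small decision function. This is a sketch only: the `suggestModel` name, its inputs, the model identifiers, and the choice to suggest Turbo only when WebGPU is available are assumptions. In a browser, WebGPU support could be detected with `"gpu" in navigator`.

```typescript
// Hypothetical model identifiers; the tool's internal names may differ.
type ModelSize = "tiny" | "small" | "turbo";

interface ModelChoiceInput {
  language: string;        // ISO 639-1 code, e.g. "ko", "ja", "zh", "en"
  hasWebGPU: boolean;      // in a browser: "gpu" in navigator
  preferSpeed: boolean;    // user wants the fastest, lightest option
  preferAccuracy: boolean; // user accepts a larger download for accuracy
}

// Small is the practical minimum for these CJK languages.
const CJK_LANGUAGES = ["ko", "ja", "zh"];

function suggestModel(opts: ModelChoiceInput): ModelSize {
  // Turbo is the most accurate but runs best with WebGPU, so this sketch
  // only suggests it when WebGPU is available.
  if (opts.preferAccuracy && opts.hasWebGPU) return "turbo";
  // Tiny is fastest, but never go below Small for Korean/Japanese/Chinese.
  const isCJK = CJK_LANGUAGES.indexOf(opts.language) !== -1;
  if (opts.preferSpeed && !isCJK) return "tiny";
  // Otherwise use the recommended default.
  return "small";
}

console.log(suggestModel({ language: "ko", hasWebGPU: false, preferSpeed: true, preferAccuracy: false }));
// prints: small
```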

Question 4

Why is the first run slow?

Accepted Answer

The first time you use a model, its weights are downloaded (tens of MB for Tiny/Small, more for Turbo) and then cached, so later runs skip this step. Transcription itself is much faster in browsers that support WebGPU; without it, processing falls back to the CPU, and long videos can take a while.

Question 5

Are the captions accurate enough to publish?

Accepted Answer

Auto-generated captions are a strong first draft but not perfect: they can mishear names or add stray text during music or silence. That is why every line is editable here. Review and fix the transcript before you download it, especially if the captions are meant for accessibility.

Question 6

Is there a file size limit?

Accepted Answer

Everything runs in your browser's memory, so very large or very long files can be slow or run out of memory. Files over about 500 MB trigger a warning, and files over 2 GB are blocked. For long recordings, try a shorter clip or a smaller model.
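A pre-flight size check along the lines described might look like this. It is a sketch: the function and constant names are hypothetical, and reading "500 MB" and "2 GB" as binary units (MiB/GiB) is an assumption. In a browser, `File.size` gives the size in bytes to pass in.

```typescript
// Thresholds from the answer above, interpreted as binary units (assumption).
const WARN_BYTES = 500 * 1024 * 1024;       // over about 500 MB: show a warning
const BLOCK_BYTES = 2 * 1024 * 1024 * 1024; // over 2 GB: refuse the file

type SizeVerdict = "ok" | "warn" | "block";

// Classifies a file by its size in bytes (e.g. file.size in a browser).
function checkFileSize(bytes: number): SizeVerdict {
  if (bytes > BLOCK_BYTES) return "block";
  if (bytes > WARN_BYTES) return "warn";
  return "ok";
}

console.log(checkFileSize(600 * 1024 * 1024));
// prints: warn
```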

Video to Subtitles (Speech to Text)

What is Video to Subtitles (Speech to Text)?

How to use Video to Subtitles (Speech to Text)

Examples

Caption a Korean interview as an SRT file

Make WebVTT captions for a web video

Translate a Japanese lecture into English subtitles

