Record or drop in an audio file and get an editable transcript, generated entirely on your own device by in-browser Whisper AI. Add timestamps, label speakers, and export ready-to-use SRT/VTT subtitles. Nothing is uploaded, and there's no per-minute cost.
🔒 no upload · 💸 no API cost · 🎬 SRT / VTT export · 📴 offline after load
Press Record or upload a file to begin. The AI model downloads once on first use.
✨ EchoScribe Pro — turn transcripts into deliverables
The free tool transcribes and exports plain text. Pro adds the production features content creators and note-takers ask for:
Want to try it first? Use demo code AV-ECHOSCRIBE-PRO-DEMO to preview every Pro feature on this device.
🎬 Export subtitles & documents
Exports use the timestamps captured during transcription. Edit any segment below first if you like.
🕒 Timestamp editor & speaker labels
No segments yet — record or upload audio above.
🗂️ Batch transcribe
Add multiple audio files, then choose Transcribe all to process them one after another, all on your device. Open any result to edit or export it.
How private on-device transcription works
1. Capture audio
Record from your mic or upload a file. It's read straight into memory in your browser.
2. AI runs locally
OpenAI's Whisper model runs on your device's own hardware (the GPU through WebGPU, or the CPU through WebAssembly), with no server involved.
3. Edit & export
Get an editable transcript with timestamps, then copy it or export TXT. With Pro, you can also export SRT, VTT, Markdown or JSON.
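The choice between WebGPU and WebAssembly in step 2 can be pictured as a simple capability check. The sketch below is an illustration under assumptions, not EchoScribe's actual code: `Backend`, `RuntimeScope` and `pickBackend` are hypothetical names, and the function takes the global scope as an argument so the logic can be exercised outside a browser.

```typescript
// Hypothetical result type: which runtime the model could use.
type Backend = "webgpu" | "wasm" | "unsupported";

// Minimal view of the globals the check reads (assumed shape for illustration).
interface RuntimeScope {
  navigator?: { gpu?: unknown };
  WebAssembly?: unknown;
}

// Prefer WebGPU when the browser exposes navigator.gpu, otherwise fall back
// to WebAssembly, otherwise report that neither is available.
function pickBackend(scope: RuntimeScope): Backend {
  if (scope.navigator && scope.navigator.gpu) {
    return "webgpu";
  }
  if (typeof scope.WebAssembly === "object" && scope.WebAssembly !== null) {
    return "wasm";
  }
  return "unsupported";
}

// In a page this might be called as:
//   const backend = pickBackend(globalThis as unknown as RuntimeScope);
```

The presence of `navigator.gpu` only shows that the API exists. A fuller check would also await `navigator.gpu.requestAdapter()`, which can resolve to `null` when no suitable GPU is available, and fall back to WebAssembly in that case.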
Why a browser-based transcriber is different
Most transcription services upload your audio to the cloud and charge per minute, which means your private recordings sit on someone else's server and your bill grows with every file. EchoScribe takes the opposite approach: the Whisper speech-recognition model downloads once (about 40 MB for the tiny model), caches in your browser, and from then on every transcription runs on your own processor. That makes it free and private, and once the model has cached it works with no internet connection at all. It's a clear demonstration of how capable modern browsers have become: real AI, no cloud bill, no data leaving your device.
Make subtitles and captions without an editor
Because EchoScribe records a start and end time for every segment, it can write industry-standard caption files directly. Export SRT for most video editors and social platforms, or WebVTT for HTML5 video and the web. Need a script instead? Export a timestamped TXT or Markdown document, or pull structured JSON (with per-segment start, end, speaker and text) into your own tooling. The built-in timestamp editor lets you correct any wording and assign speaker labels before you export, so captions are clean on the first pass — no separate subtitle app required.
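To make the caption formats concrete, here is a minimal sketch of how per-segment start and end times could be turned into SRT and WebVTT text. It is an illustration, not EchoScribe's actual exporter: the `Segment` interface mirrors the JSON fields named above, and `formatTimestamp`, `cueText`, `toSrt` and `toVtt` are hypothetical helper names. Times are assumed to be in seconds.

```typescript
// Assumed segment shape, based on the JSON fields described above.
interface Segment {
  start: number;    // seconds from the start of the audio
  end: number;      // seconds from the start of the audio
  speaker?: string; // optional label assigned by the user
  text: string;
}

// Format seconds as HH:MM:SS + separator + mmm.
// SRT separates milliseconds with a comma, WebVTT with a period.
function formatTimestamp(seconds: number, msSeparator: "," | "."): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n: number, width: number): string => {
    let out = String(n);
    while (out.length < width) out = "0" + out;
    return out;
  };
  return pad(h, 2) + ":" + pad(m, 2) + ":" + pad(s, 2) + msSeparator + pad(ms, 3);
}

// Prefix the cue with the speaker label when one has been assigned.
function cueText(seg: Segment): string {
  const text = seg.text.trim();
  return seg.speaker ? seg.speaker + ": " + text : text;
}

// SRT: numbered cues, comma before milliseconds, blank line between cues.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) =>
      (i + 1) + "\n" +
      formatTimestamp(seg.start, ",") + " --> " + formatTimestamp(seg.end, ",") + "\n" +
      cueText(seg) + "\n")
    .join("\n");
}

// WebVTT: "WEBVTT" header, blank line, then cues with a period before milliseconds.
function toVtt(segments: Segment[]): string {
  const cues = segments.map(seg =>
    formatTimestamp(seg.start, ".") + " --> " + formatTimestamp(seg.end, ".") + "\n" +
    cueText(seg) + "\n");
  return ["WEBVTT\n"].concat(cues).join("\n");
}
```

The two formats differ mainly in the file header, the cue numbering, and the millisecond separator, so one timestamp helper can serve both. This sketch writes speaker labels as a plain text prefix; WebVTT also supports voice tags for speakers, which are not used here.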
Who EchoScribe is for
Podcasters and YouTubers turning episodes into show notes and captions; journalists and researchers transcribing interviews without sending source recordings to the cloud; students capturing lectures; support and sales teams writing up calls; and anyone who wants a fast, private way to get spoken words into editable text. Switch the model to Base for higher accuracy, or to the multilingual model to transcribe other languages and optionally translate non-English speech into English. With Pro's batch queue you can add a folder's worth of recordings at once and walk away while they process locally.
Is my audio uploaded anywhere?
No. The Whisper model runs entirely in your browser. Your audio never leaves your device and there's no account. The only network use is a one-time download of the model itself.
How is it free with no API cost?
The AI runs on your own device, not a cloud server, so there's no per-minute fee. The model downloads once (~40 MB), caches, then works offline.
Can I make subtitles (SRT or VTT) from audio?
Yes. EchoScribe captures a timestamp for every segment, so Pro can export ready-to-use SRT or WebVTT caption files — plus timestamped TXT, Markdown and JSON. Plain transcription and TXT download are always free.
Can it transcribe several files at once?
Yes — Pro adds a batch queue. Add multiple audio files and EchoScribe transcribes them one after another on your device, then lets you open any result to edit and export, or download all transcripts combined.
How accurate is it, and can it handle other languages?
It uses OpenAI's Whisper (tiny English by default). It handles clear speech well; accuracy depends on audio quality and accent. Switch to the Base model for more accuracy, or the multilingual model to transcribe other languages — and optionally translate non-English speech to English.
Does it work offline?
Yes — after the first load. Once the model has cached, transcription needs no internet at all.
Note: first use downloads the AI model (about 40–145 MB depending on the model you pick), which can take a moment on slower connections. Best performance is on a desktop/laptop with a modern browser; very old devices may be slow or unsupported. Speaker labels are assigned by you as you review (EchoScribe does not auto-identify speakers). Transcripts are AI-generated and may contain errors — review before relying on them.