🎙️ EchoScribe

Audio to text & subtitles, 100% in your browser

Record or drop in an audio file and get an editable transcript, generated entirely on your own device by in-browser Whisper AI. Add timestamps, label speakers, and export ready-to-use SRT/VTT subtitles. Nothing is uploaded, and there's no per-minute cost.

🔒 no upload · 💸 no API cost · 🎬 SRT / VTT export · 📴 offline after load

Supported upload formats: mp3 · wav · m4a · webm

✨ EchoScribe Pro — turn transcripts into deliverables

The free tool transcribes and exports plain text. Pro adds the production features content creators and note-takers ask for:

Subtitle & document export: SRT, VTT (WebVTT), timestamped TXT, Markdown and JSON.
Timestamp editor & speaker labels: review every segment, fix wording, and assign speakers — it all flows into your exports.
Batch processing: queue several recordings and transcribe them back-to-back, then export each or combine.

Want to try it first? Use demo code AV-ECHOSCRIBE-PRO-DEMO to preview every Pro feature on this device.

How private on-device transcription works

Capture audio

Record from your mic or upload a file. It's read straight into memory in your browser.

AI runs locally

OpenAI's Whisper model runs on your own hardware, on the GPU via WebGPU or on the CPU via WebAssembly, with no server involved.

Edit & export

Get an editable transcript with timestamps, then copy it or export TXT, SRT, VTT, Markdown or JSON.
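
As a rough illustration of the "AI runs locally" step above, the choice between WebGPU and WebAssembly could come down to a simple feature check. This is a minimal sketch under stated assumptions, not EchoScribe's actual code: pickBackend and NavigatorLike are hypothetical names, and a real page would pass in the browser's own navigator object.

```typescript
// Which compute backend an in-browser model could use.
type Backend = "webgpu" | "wasm";

// Minimal shape of the part of `navigator` this sketch looks at (hypothetical helper type).
interface NavigatorLike {
  gpu?: unknown;
}

// Prefer WebGPU when the browser exposes `navigator.gpu`; otherwise use WebAssembly.
function pickBackend(nav: NavigatorLike | undefined): Backend {
  return nav !== undefined && nav.gpu !== undefined ? "webgpu" : "wasm";
}

// In a browser this would be called as pickBackend(navigator).
console.log(pickBackend({ gpu: {} })); // "webgpu"
console.log(pickBackend({}));          // "wasm"
```

The presence of navigator.gpu does not guarantee a usable GPU. A fuller check would also await navigator.gpu.requestAdapter() and fall back to WebAssembly if it resolves to null.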

Why a browser-based transcriber is different

Most transcription services upload your audio to the cloud and charge per minute, which means your private recordings sit on someone else's server and your bill grows with every file. EchoScribe takes the opposite approach: the Whisper speech-recognition model downloads once (about 40 MB for the tiny model), caches in your browser, and from then on every transcription runs on your own hardware. That makes it free to run, keeps your audio on your device, and lets it work with no internet connection once the model is cached. It's a clear demonstration of how capable modern browsers have become: real AI, no cloud bill, no data leaving your device.
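
The "downloads once, then caches" behaviour described above is a cache-first load. The sketch below shows the general pattern only. ModelStore, MemoryStore and loadModel are hypothetical names, and the in-memory store stands in for whatever browser storage (for example the Cache API or IndexedDB) a real page would use.

```typescript
// Minimal async key-value store (hypothetical interface for this sketch).
interface ModelStore {
  get(key: string): Promise<Uint8Array | undefined>;
  set(key: string, value: Uint8Array): Promise<void>;
}

// In-memory stand-in so the sketch runs anywhere; a browser would persist to disk instead.
class MemoryStore implements ModelStore {
  private items = new Map<string, Uint8Array>();

  async get(key: string): Promise<Uint8Array | undefined> {
    return this.items.get(key);
  }

  async set(key: string, value: Uint8Array): Promise<void> {
    this.items.set(key, value);
  }
}

// Return the cached model bytes if present; otherwise download once and cache them.
async function loadModel(
  key: string,
  store: ModelStore,
  download: () => Promise<Uint8Array>,
): Promise<Uint8Array> {
  const cached = await store.get(key);
  if (cached !== undefined) {
    return cached; // no network needed
  }
  const bytes = await download(); // the only step that touches the network
  await store.set(key, bytes);
  return bytes;
}
```

Calling loadModel twice with the same key and store runs the download callback only on the first call, which is why later sessions can work offline.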

Make subtitles and captions without an editor

Because EchoScribe records a start and end time for every segment, it can write industry-standard caption files directly. Export SRT for most video editors and social platforms, or WebVTT for HTML5 video and the web. Need a script instead? Export a timestamped TXT or Markdown document, or pull structured JSON (with per-segment start, end, speaker and text) into your own tooling. The built-in timestamp editor lets you correct any wording and assign speaker labels before you export, so captions are clean on the first pass — no separate subtitle app required.
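
To make the caption formats above concrete, here is a minimal sketch of turning timed segments into SRT and WebVTT text. It is an illustration, not EchoScribe's exporter. The Segment fields mirror the start, end, speaker and text fields mentioned for the JSON export, while the function names and the "Speaker: text" prefix are assumptions made for this example.

```typescript
// One transcript segment; times are in seconds.
interface Segment {
  start: number;
  end: number;
  speaker?: string;
  text: string;
}

// Left-pad a number with zeros to a fixed width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (WebVTT).
function formatTime(seconds: number, msSeparator: "," | "."): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}${msSeparator}${pad(ms, 3)}`;
}

// Prefix the text with the speaker label when one was assigned (assumed convention).
function cueText(seg: Segment): string {
  return seg.speaker ? `${seg.speaker}: ${seg.text}` : seg.text;
}

// SRT: numbered cues, comma before milliseconds, blank line between cues.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) =>
      `${i + 1}\n${formatTime(seg.start, ",")} --> ${formatTime(seg.end, ",")}\n${cueText(seg)}\n`)
    .join("\n");
}

// WebVTT: "WEBVTT" header, dot before milliseconds, cue identifiers omitted.
function toVtt(segments: Segment[]): string {
  const cues = segments.map((seg) =>
    `${formatTime(seg.start, ".")} --> ${formatTime(seg.end, ".")}\n${cueText(seg)}\n`);
  return `WEBVTT\n\n${cues.join("\n")}`;
}

const demoSegments: Segment[] = [
  { start: 0, end: 2.5, speaker: "Host", text: "Welcome to the show." },
  { start: 2.5, end: 61.25, text: "Thanks for having me." },
];
console.log(toSrt(demoSegments));
console.log(toVtt(demoSegments));
```

The two formats differ mainly in the header line, the cue numbering, and the millisecond separator, so both exporters can share one time formatter.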

Who EchoScribe is for

EchoScribe suits podcasters and YouTubers turning episodes into show notes and captions; journalists and researchers transcribing interviews without sending their sources' recordings to the cloud; students capturing lectures; support and sales teams writing up calls; and anyone who wants a fast, private way to get spoken words into editable text. Switch the model to Base for higher accuracy, or to the multilingual model to transcribe other languages and optionally translate non-English speech into English. With Pro's batch queue you can drop a folder's worth of recordings in at once and walk away while they process locally.
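
A batch queue like the one mentioned above can be sketched as a loop that transcribes one file at a time and records failures without stopping. This is an assumed design, not EchoScribe's implementation: runBatch, combine, and the Transcribe signature are hypothetical, and fakeTranscribe is a stand-in that does no real speech recognition.

```typescript
// Result of one queued job: either a transcript or the error message for that file.
interface BatchResult {
  name: string;
  text?: string;
  error?: string;
}

// Hypothetical signature for whatever performs the on-device transcription.
type Transcribe = (name: string) => Promise<string>;

// Process files strictly one after another so only one job uses the device at a time.
// A failure is recorded against its file and does not stop the rest of the queue.
async function runBatch(names: string[], transcribe: Transcribe): Promise<BatchResult[]> {
  const results: BatchResult[] = [];
  for (const name of names) {
    try {
      results.push({ name, text: await transcribe(name) });
    } catch (err) {
      results.push({ name, error: err instanceof Error ? err.message : String(err) });
    }
  }
  return results;
}

// Join the successful transcripts into one document, each under its file name.
function combine(results: BatchResult[]): string {
  return results
    .filter((r) => r.text !== undefined)
    .map((r) => `${r.name}\n${r.text}`)
    .join("\n\n");
}

// Stand-in transcriber used only to exercise the queue.
const fakeTranscribe: Transcribe = async (name) => {
  if (name === "broken.wav") throw new Error("could not decode audio");
  return `transcript of ${name}`;
};

runBatch(["a.mp3", "broken.wav", "b.wav"], fakeTranscribe).then((results) => {
  console.log(combine(results));
});
```

Running jobs sequentially rather than in parallel is a deliberate choice in this sketch: on-device models are memory-hungry, so one transcription at a time keeps the browser tab responsive.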

Browse the AppVitamins store to own EchoScribe Pro and other apps as a one-time purchase, or get the All-Access pass.

Frequently asked questions

Is my audio uploaded anywhere?

No. The Whisper model runs entirely in your browser. Your audio never leaves your device and there's no account. The only network use is a one-time download of the model itself.

How is it free with no API cost?

The AI runs on your own device, not a cloud server, so there's no per-minute fee. The model downloads once (~40 MB), caches, then works offline.

Can I make subtitles (SRT or VTT) from audio?

Yes. EchoScribe captures a timestamp for every segment, so Pro can export ready-to-use SRT or WebVTT caption files — plus timestamped TXT, Markdown and JSON. Plain transcription and TXT download are always free.

Can it transcribe several files at once?

Yes — Pro adds a batch queue. Add multiple audio files and EchoScribe transcribes them one after another on your device, then lets you open any result to edit and export, or download all transcripts combined.

How accurate is it, and can it handle other languages?

It uses OpenAI's Whisper (tiny English by default). It handles clear speech well; accuracy depends on audio quality and accent. Switch to the Base model for more accuracy, or the multilingual model to transcribe other languages — and optionally translate non-English speech to English.

Does it work offline?

Yes — after the first load. Once the model has cached, transcription needs no internet at all.

Note: first use downloads the AI model (about 40–145 MB depending on the model you pick), which can take a moment on slower connections. Best performance is on a desktop/laptop with a modern browser; very old devices may be slow or unsupported. Speaker labels are assigned by you as you review (EchoScribe does not auto-identify speakers). Transcripts are AI-generated and may contain errors — review before relying on them.