Does my text get sent to a server with the browser speech engine?

Maybe. Chrome and Edge route certain system voices through Google's or Microsoft's cloud for synthesis. Firefox and Safari use purely local system voices and send nothing. The exact behaviour depends on your browser and operating system. If you need full privacy guarantees, pick the offline AI engine — that one is verifiably local after the one-time model download.

How large is the offline AI model?

About 92 MB. It downloads once into your browser cache and stays there. After that the offline engine works without any internet connection — including on flights, trains, or air-gapped machines.

Which languages does the browser speech engine cover?

All languages your operating system has installed. Windows, macOS, iOS and Android typically ship English, French, German, Spanish, Italian, Portuguese, Mandarin, Japanese and several more. The voice dropdown lists every voice your system exposes.

Why does the offline AI model only support English?

The current open-source AI voice models that fit in a browser (~100 MB) only cover English in American and British dialects. As soon as a comparable freely-licensed German, French, Spanish or other-language model becomes available, we'll add it here. For non-English text, the browser speech engine is the practical option.

How long can the text be?

Up to 50,000 characters per run. Long text is automatically split at sentence boundaries and synthesized chunk-by-chunk — you won't notice anything in the audio player. For genuinely book-length material, split into chapters and run them separately.
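The sentence-boundary splitting described above can be sketched roughly as follows. This is an illustration only, not the tool's actual code: the function name `splitIntoChunks`, the `maxChars` parameter and the simple sentence regex are all assumptions.

```typescript
// Hypothetical helper: groups sentences into chunks of at most maxChars.
// A "sentence" is simplified to a run of text ending in ., ! or ?
// A single sentence longer than maxChars is kept as one oversized chunk.
function splitIntoChunks(text: string, maxChars: number): string[] {
  const sentences: string[] = text.match(/[^.!?]+[.!?]*\s*/g) ?? [];
  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    // Close the current chunk if adding this sentence would exceed the limit.
    if (current.length > 0 && current.length + sentence.length > maxChars) {
      chunks.push(current.trim());
      current = "";
    }
    current += sentence;
  }
  if (current.trim().length > 0) {
    chunks.push(current.trim());
  }
  return chunks;
}
```

Each chunk would then be synthesized in order and the resulting audio joined, so no sentence is cut in the middle.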

Can I insert pauses into the audio?

Yes. Write `[pause 500ms]` anywhere in the text and the tool inserts a 500-millisecond silence at that point. Values between 50 and 5,000 milliseconds work. Useful for audiobook drafts, tutorial voice-overs, or measured presentation recordings where you want a deliberate breath.

Which audio formats can I download?

MP3 (default, ID3-watermarked per EU AI Act Article 50) or uncompressed WAV. MP3 covers most use cases and is much smaller. Download is available only for the offline AI engine, because browser APIs offer no reliable way to capture browser speech output to a file.

Is it free for commercial use?

Yes. The tool is open-source and free. Generated audio is yours to use — privately or commercially. With the offline AI engine, note that EU AI Act Article 50 §4 requires you to disclose AI-generated speech when publishing it; the ID3 watermark in the MP3 fulfils this machine-readable disclosure obligation.

Text to Speech — Read Aloud in Browser, Free

What is text-to-speech and why local in the browser?

Text-to-speech (speech synthesis) turns written sentences into spoken audio. Common uses run from audiobooks and read-aloud aids for vision-impaired readers to voice-overs for tutorial videos, audio newsletters, and proofreading long drafts by ear.

Most cloud services work by uploading your text, synthesizing it server-side, then returning an audio file. For long or confidential text that’s a problem: the content may be logged or commercially analysed, and involving a third party brings GDPR data-processing agreements into play.

This tool flips that: synthesis runs inside your browser. Two engines are available behind a single switch, and the switch is transparent about what data leaves your device.

Two engines — which fits when?

The hybrid design is the core decision. Both engines have trade-offs:

Engine	Availability	Privacy	Voice quality	Model download
Fast & online	Every language your OS has	Possible cloud round-trip	Depends on platform	0 MB
Private & offline (AI)	English only	Fully local after download	Very natural	~92 MB once

Browser speech is universally available — Windows, macOS, iOS, Android and Linux all ship system voices. Quality varies sharply by platform: Apple devices have very natural voices, Windows is solid, Android varies wildly by vendor.

The offline AI model delivers the most natural pronunciation and runs completely without internet after the initial download. The trade-off: it only covers American and British English. For other languages, browser speech remains the practical path.

When is browser speech the right choice?

Browser-based speech is best for:

Quick proofreading. Have your blog draft or essay read back — clunky sentence structure jumps out instantly when you hear it.
Vision accessibility. Ad-hoc reading of long texts without installing extra software.
Language learning. Hear the pronunciation of sentences in English (or any other installed language) across different system voices.
First-pass voice-over drafts. Get a feel for how a planned voice-over recording will sound before booking studio time.

Watch the privacy notice above the engine switch: with Chrome and Edge, certain system voices route text to Google or Microsoft for cloud synthesis. Firefox and Safari use only local system voices. If you need verifiable privacy, use the offline AI engine (English) or a browser that uses only local voices.
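One way to narrow the voice list to local voices is the `localService` flag that the Web Speech API exposes on each `SpeechSynthesisVoice`. The sketch below is an assumption about how such a filter could look, not the tool's own code; `VoiceInfo` and `localVoicesFor` are hypothetical names.

```typescript
// Minimal structural type covering the SpeechSynthesisVoice fields used here.
interface VoiceInfo {
  name: string;
  lang: string;          // BCP 47 tag such as "en-US"
  localService: boolean; // true if the browser reports a local synthesizer
}

// Hypothetical helper: keep only voices flagged as local whose language
// tag starts with the given prefix (case-insensitive).
function localVoicesFor(voices: readonly VoiceInfo[], langPrefix: string): VoiceInfo[] {
  const prefix = langPrefix.toLowerCase();
  return voices.filter(
    (v) => v.localService && v.lang.toLowerCase().indexOf(prefix) === 0
  );
}
```

In a browser this could be called as `localVoicesFor(speechSynthesis.getVoices(), "en")`. Note that `getVoices()` may return an empty list until the `voiceschanged` event has fired, and that the flag only reflects what the browser reports about each voice.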

How do the pause markers work?

When recording audiobooks, tutorial voice-overs or presentation tracks you usually need deliberate breaths. Embed them in the text directly:

Today we'll talk about privacy. [pause 800ms] It concerns everyone.

The marker [pause 500ms] inserts 500 milliseconds of silence at that point. Allowed range: 50 to 5,000 milliseconds. Out-of-range values are clamped to the nearest bound — that prevents accidental multi-second gaps from typos.
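The marker handling and clamping can be sketched as below. This is a minimal illustration under assumptions: the `Segment` type, the function name `parsePauseMarkers` and the exact regex are hypothetical and not taken from the tool's source.

```typescript
// A parsed script is a sequence of text segments and pauses.
type Segment =
  | { kind: "text"; text: string }
  | { kind: "pause"; ms: number };

const MIN_PAUSE_MS = 50;
const MAX_PAUSE_MS = 5000;

// Hypothetical parser: splits the input at [pause Nms] markers and
// clamps N to the allowed range of 50 to 5,000 milliseconds.
function parsePauseMarkers(input: string): Segment[] {
  const marker = /\[pause (\d+)ms\]/g;
  const segments: Segment[] = [];
  let last = 0;
  let m: RegExpExecArray | null;
  while ((m = marker.exec(input)) !== null) {
    const text = input.slice(last, m.index).trim();
    if (text) segments.push({ kind: "text", text });
    const ms = Math.min(MAX_PAUSE_MS, Math.max(MIN_PAUSE_MS, Number(m[1])));
    segments.push({ kind: "pause", ms });
    last = m.index + m[0].length;
  }
  const tail = input.slice(last).trim();
  if (tail) segments.push({ kind: "text", text: tail });
  return segments;
}
```

With this sketch, a typo such as `[pause 90000ms]` yields a 5,000 ms pause instead of a 90-second gap.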

On browser speech the pauses are realised as two separate utterances with a setTimeout between them. On the offline AI engine the pauses are inserted as actual silence samples in the audio stream.
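For the offline engine, inserting silence amounts to placing zero-valued samples between two audio buffers. The following sketch assumes mono `Float32Array` audio and takes the sample rate as a parameter; the function names and the 24,000 Hz figure in the usage note are illustrative assumptions, not details confirmed by the tool.

```typescript
// Number of zero samples needed for a pause of the given length.
function silenceSamples(ms: number, sampleRate: number): Float32Array {
  const count = Math.round((ms / 1000) * sampleRate);
  return new Float32Array(count); // typed arrays are zero-initialised
}

// Hypothetical helper: joins two mono buffers with a silent gap between them.
function concatWithPause(
  before: Float32Array,
  after: Float32Array,
  pauseMs: number,
  sampleRate: number
): Float32Array {
  const gap = silenceSamples(pauseMs, sampleRate);
  const out = new Float32Array(before.length + gap.length + after.length);
  out.set(before, 0);
  // The gap region is left untouched because it is already all zeros.
  out.set(after, before.length + gap.length);
  return out;
}
```

At an assumed sample rate of 24,000 Hz, a 500 ms pause corresponds to 12,000 silent samples.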

What is the EU AI Act Article 50 watermark?

Starting August 2026, Article 50 of the EU AI Act requires AI-generated audio, video and text to be disclosed as such. For speech synthesis that means: if you publish AI-generated voice (podcast, ad, audiobook), you must disclose it.

This tool fulfils the obligation in two layers:

Visible in the UI — above the audio player you’ll see an “AI-generated” badge.
Machine-readable in the file — when you download MP3, the tool embeds an ID3 tag in the subtitle frame recording the engine and voice. Platforms can read this automatically and surface their own disclosure UI.

You must not remove this marking when publishing the audio — Article 50 §4 mandates it for AI-generated speech content.
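To make the machine-readable layer concrete, here is a rough sketch of how an ID3 tag with a subtitle frame could be built. It assumes ID3v2.3, where the subtitle frame has the ID `TIT3`, and ISO-8859-1 text encoding. The function name and the example subtitle string are hypothetical, and the exact text the tool writes is not specified here.

```typescript
// Encodes a 28-bit number as four "synchsafe" bytes (7 bits each),
// as used for the tag size in the ID3v2 header.
function synchsafe(n: number): number[] {
  return [(n >> 21) & 0x7f, (n >> 14) & 0x7f, (n >> 7) & 0x7f, n & 0x7f];
}

// Hypothetical builder: returns an ID3v2.3 tag containing one TIT3 frame.
function buildId3SubtitleTag(subtitle: string): Uint8Array {
  // Text frame body: encoding byte 0x00 (ISO-8859-1) followed by the text.
  const body: number[] = [0x00];
  for (let i = 0; i < subtitle.length; i++) {
    const code = subtitle.charCodeAt(i);
    body.push(code <= 0xff ? code : 0x3f); // '?' for characters outside Latin-1
  }
  const size = body.length;
  const frame: number[] = [
    0x54, 0x49, 0x54, 0x33, // frame ID "TIT3"
    (size >>> 24) & 0xff, (size >>> 16) & 0xff, (size >>> 8) & 0xff, size & 0xff, // plain big-endian size in v2.3
    0x00, 0x00, // frame flags
  ].concat(body);
  const header: number[] = [
    0x49, 0x44, 0x33, // "ID3"
    0x03, 0x00,       // version 2.3.0
    0x00,             // tag flags
  ].concat(synchsafe(frame.length)); // tag size excludes the 10-byte header
  return new Uint8Array(header.concat(frame));
}
```

The resulting bytes would be placed in front of the MP3 audio data, for example `buildId3SubtitleTag("AI-generated speech")`, where the subtitle text is a placeholder.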

Which tips improve the result?

Punctuate clearly. Periods, commas, colons and dashes drive natural prosody. Sloppy punctuation produces monotone reading.
Expand abbreviations. “e.g.” may be read as “ee gee” rather than “for example”. Spell out the full form in the read-aloud text.
Watch numbers. “1,500” may be parsed as “one comma five hundred”. For long numbers spell them as words when the engine garbles them.
Quotation marks are tricky. Some voices read the character literally. For quotations, mark them textually — e.g. “quote … unquote”.
Tune the rate. Audiobook narration sounds more natural at 0.9×; explainer voice-overs benefit from 1.1×. Test three speeds.

When does the offline AI engine pay off?

If you need very natural English narration and might even publish the audio, the offline AI engine is meaningfully more convincing than browser speech. The output resembles a human recording; individual voices carry distinct character.

Practical use cases:

English voice-overs for tutorials. No studio, no voice talent — drop the script in, pick a voice, download the MP3, import into your video editor.

Audiobook drafts for self-publishers. Before committing to a real recording, validate flow and pronunciation with the AI voice. Saves expensive studio hours on script revisions.

Language learning for non-native English speakers. Hear English texts (vocabulary lists, exercise sentences, lecture transcripts) read aloud in consistently natural English voices.

Accessible publishing. Generate MP3 versions of English blog posts for blind readers — fully GDPR-compliant because neither the text nor the audio ever leaves your device.

Which related tools help next?

From the kittokit ecosystem for the full voice workflow:

Audio Transcription — The opposite direction: speech to text, multi-language (German, English, French, Spanish).
Fast English Transcription — When you need English audio transcribed quickly. Dedicated model, up to 6× faster than the multi-language one.
Speech Enhancer — For your own recordings: remove noise, echo and background hum locally.

How It Works

Paste your text

Pick engine, voice and rate

Listen or download

Privacy
