How do I remove silence from an audio file?

Drop your audio file into the upload zone above (or click to pick one), set the threshold in dBFS, check the waveform to see which regions count as silence, and export the trimmed stream as WAV. Everything runs locally in your browser; the file is never uploaded. The threshold sets the loudness below which a region counts as silence: around -40 dBFS is typical for spoken voice, and -50 dBFS or lower for music with ambience.

What is dBFS and how do I pick the right threshold?

dBFS stands for decibels relative to full scale — how loud a signal is relative to the digital maximum. 0 dBFS is the ceiling (clipping point); -60 dBFS is very quiet. For clean studio speech pick -38 to -42 dBFS. For speech with light ambience use -45 to -50 dBFS. For speech with strong background noise drop to -50 to -55 dBFS. If the tool cuts too much, lower the threshold (toward -50). If it misses silence, raise it (toward -30). See the [Wikipedia article on dBFS](https://en.wikipedia.org/wiki/DBFS) for the underlying signal-level convention.

What does the minimum silence length do?

It prevents short breath pauses or hesitation gaps from being cut. With the 500-millisecond default, only quiet stretches of at least half a second qualify; shorter pauses (breathing, brief mid-sentence stalls) survive. Drop to 200-300 ms for aggressive trimming of voice memos, or raise to 800-1500 ms for a softer cut that only removes long gaps between topics. Podcasts usually sound best at 500-700 ms; voice notes often work well at 200-400 ms.

What is the maximum file size?

200 megabytes. That comfortably covers typical podcast episodes (60-90 minutes at 192 kbps MP3). Beyond that, browser memory becomes the bottleneck because waveform rendering and RMS analysis both keep the mono PCM data in RAM. For multi-hour live recordings or 24-bit high-resolution material a native desktop tool (Audacity, Reaper) is the better fit. Within the 200 MB cap the entire silence trim stays in the browser and requires no account.
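A quick estimate shows why memory, not file size, is the limit. The sketch below assumes the decoded audio is held as 32-bit float samples (as a Web Audio `AudioBuffer` stores them) at a 48 kHz context rate; the `pcmBytes` helper is illustrative, not part of the tool.

```typescript
// Rough RAM estimate for decoded audio held as 32-bit float PCM.
// Assumptions: 48 kHz decode rate; Float32Array uses 4 bytes per sample.
function pcmBytes(durationSec: number, sampleRate: number, channels: number): number {
  return Math.round(durationSec * sampleRate) * channels * Float32Array.BYTES_PER_ELEMENT;
}

const seconds = 90 * 60;                          // a 90-minute episode
const mp3Bytes = (192_000 / 8) * seconds;         // 192 kbps MP3 on disk
const monoBytes = pcmBytes(seconds, 48_000, 1);   // mono analysis buffer
const stereoBytes = pcmBytes(seconds, 48_000, 2); // stereo decode buffer

console.log(`MP3 on disk:    ${(mp3Bytes / 1e6).toFixed(1)} MB`);
console.log(`mono Float32:   ${(monoBytes / 1e6).toFixed(1)} MB`);
console.log(`stereo Float32: ${(stereoBytes / 1e6).toFixed(1)} MB`);
```

Under these assumptions a 90-minute, 192 kbps MP3 of about 130 MB expands to roughly 1 GB of mono float PCM, and about twice that for the stereo decode, so the compressed file size understates the RAM a browser tab needs.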

Is audio quality preserved on export?

WAV export writes the audio as 16-bit PCM, so no additional lossy compression is added. For voice and podcast material the result is acoustically indistinguishable from the source after the roundtrip. If your source was already an MP3, the detail discarded by the original encode does not come back (its artefacts remain in the WAV), but the silence trim itself adds no further lossy step. For truly bit-identical edits inside a codec container, you need codec-specific software (e.g. mp3DirectCut).

Why is there a padding (lead/tail) slider?

Padding keeps a short audio buffer before each speech segment starts and after it ends, so the cut does not land hard on the first or last consonant. 80 ms is a good speech default — roughly the length of a soft breath. For very tight cuts, drop to 50 ms; for a more generous editorial feel, raise to 200-300 ms. Without padding, auto-cut transitions often sound mechanical, particularly around plosives (P, B, T, K, D) at the start or end of a word.

Can I tweak the cut after running it?

Yes. The cut is applied only when you click Export trim. While you drag the sliders (threshold, minimum silence, padding), the waveform shows a live preview: silent regions are tinted and the result card updates in real time. You can change the values as often as you like until the trim looks right. Only the export button creates the final WAV file and starts the download.

Is the tool free and private?

Yes — free, no sign-up, no tracking. Your audio is decoded locally in your browser through the native Web Audio API, the waveform is drawn on a Canvas, and the RMS analysis runs in plain JavaScript with no server round-trip. There is no upload, no account, no hidden data cap. The exported file is also assembled client-side — the download link points at a Blob URL, not at a server endpoint.

Audio Silence Remover — auto-cut for podcasts & voice memos

What does the audio silence remover do?

Three jobs in one tool: detect silent regions in an audio file, mark them on the waveform, and download the trimmed track as WAV. Drop your MP3, WAV, M4A, OGG or FLAC, and the tool decodes it locally in the browser through the native Web Audio API, downmixes to mono (mean of all channels), computes an RMS amplitude for every 20-millisecond window, and compares each window value against the threshold you set in dBFS. Silent windows merge into regions, and regions shorter than the minimum silence length are intentionally not cut — so breath pauses and brief mid-sentence stalls survive.
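The downmix step described above (mean of all channels) could look like the following sketch. `downmixToMono` is a hypothetical helper; in a browser the channel arrays would come from the Web Audio `AudioBuffer.getChannelData()` method.

```typescript
// Downmix any number of channels to mono by averaging them sample by sample.
// In a browser, channels[c] would be audioBuffer.getChannelData(c).
function downmixToMono(channels: Float32Array[]): Float32Array {
  if (channels.length === 0) return new Float32Array(0);
  const length = channels[0].length;
  const mono = new Float32Array(length);
  for (const data of channels) {
    if (data.length !== length) throw new Error("channel lengths differ");
    for (let i = 0; i < length; i++) mono[i] += data[i];
  }
  for (let i = 0; i < length; i++) mono[i] /= channels.length;
  return mono;
}
```

One consequence of averaging: content that is out of phase between left and right partially cancels in the mono mix, so such passages can read quieter to the detector than they sound in stereo.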

The output: a waveform with silent regions tinted, a result card with original length, new length, and time saved in seconds, and an Export button that assembles a ready WAV file locally. No account, no server, no hidden quota counter.
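The numbers on the result card follow from simple arithmetic over the regions that will be removed. A minimal sketch, assuming those regions are given in seconds, do not overlap, and already reflect the minimum-length filter and padding; the `Region` type and `trimStats` name are illustrative.

```typescript
interface Region {
  start: number; // seconds
  end: number;   // seconds, exclusive
}

// Original length, new length, and time saved for a set of removed regions.
function trimStats(durationSec: number, removedRegions: Region[]) {
  const savedSec = removedRegions.reduce((sum, r) => sum + (r.end - r.start), 0);
  return { originalSec: durationSec, newSec: durationSec - savedSec, savedSec };
}

console.log(trimStats(60, [{ start: 10, end: 12.5 }, { start: 30, end: 31 }]));
```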

Why dBFS rather than a 0-to-100 slider?

dBFS (decibels relative to full scale) is the standard professional scale in digital audio. 0 dBFS is the loudest a digital format can represent (anything louder is clipped and distorts); -6 dBFS is roughly half the full-scale amplitude; -20 dBFS is the typical voice level in professional recordings; -40 dBFS is far quieter than whispered speech; -60 dBFS sits around the noise floor of a good microphone.
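The figures above follow from amplitude = 10^(dBFS / 20), with full scale as 1.0. A sketch of both directions; the function names are illustrative.

```typescript
// dBFS <-> linear amplitude, where 1.0 is digital full scale (0 dBFS).
function dbfsToLinear(db: number): number {
  return Math.pow(10, db / 20);
}

function linearToDbfs(amplitude: number): number {
  // Digital silence (0) maps to -Infinity, which still compares correctly
  // against any finite threshold.
  return 20 * Math.log10(amplitude);
}

for (const db of [0, -6, -20, -40, -60]) {
  console.log(`${db} dBFS -> ${dbfsToLinear(db).toFixed(4)}`);
}
```

For -6 dBFS, `dbfsToLinear` gives about 0.501, which is why -6 dB is described as half amplitude; -40 dBFS becomes 0.01 and -60 dBFS becomes 0.001.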

A 0-100 scale would feel simpler at first, but it introduces three problems. First, the tool would still need a logarithmic mapping internally, because useful RMS levels span several orders of magnitude (0.01 linear at -40 dBFS, 0.001 at -60 dBFS). Second, the threshold would not match values exposed by other audio software: Audacity, Adobe Audition, and Reaper all work in dBFS. Third, to feel useful the slider would need coarse steps in the loud range and fine steps in the quiet range, which is hard to design.

Working ranges for most use cases:

Clean studio voice: -38 to -42 dBFS
Voice with light ambience: -45 to -50 dBFS
Voice with strong background noise: -50 to -55 dBFS
Music with pianissimo passages: -55 to -60 dBFS

If the tool cuts too much, lower the threshold (toward -50). If it misses silence, raise it (toward -30).

How does the RMS analysis work?

RMS stands for root mean square: the square root of the mean of the squared samples in a window. It correlates well with perceived loudness (unlike peak detection, which over-weights short clicks) and is widely used in audio forensics, speech-codec design, and loudness measurement; the LUFS measure of EBU R128 builds on a mean-square value, adding frequency weighting and gating on top.

The tool splits the mono track into non-overlapping 20-millisecond windows. At 48 kHz that is 960 samples per window; at 44.1 kHz exactly 882. Per window it computes:

$$\mathrm{RMS} = \sqrt{\frac{s_0^2 + s_1^2 + \dots + s_{N-1}^2}{N}}$$

where N is the number of samples in the window.
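A minimal sketch of the windowing and the formula above, assuming a mono `Float32Array` as input. How the tool handles a final window shorter than 20 ms is not stated; here it is averaged over its actual length, which is an assumption. `windowRms` is an illustrative name.

```typescript
// Non-overlapping RMS windows over a mono signal.
function windowRms(samples: Float32Array, sampleRate: number, windowMs = 20): Float32Array {
  const windowSize = Math.round((sampleRate * windowMs) / 1000); // 960 at 48 kHz, 882 at 44.1 kHz
  const count = Math.ceil(samples.length / windowSize);
  const out = new Float32Array(count);
  for (let w = 0; w < count; w++) {
    const start = w * windowSize;
    const end = Math.min(start + windowSize, samples.length); // last window may be partial
    let sumSquares = 0;
    for (let i = start; i < end; i++) sumSquares += samples[i] * samples[i];
    out[w] = Math.sqrt(sumSquares / (end - start));
  }
  return out;
}
```

Each value in the returned array corresponds to one 20 ms window and can be compared directly against a linear threshold.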

Each RMS value is then compared with the threshold directly in linear amplitude (the dBFS-to-linear conversion happens once at the start of the analysis). A window below the threshold is flagged silent; a window above it is flagged loud. Consecutive silent flags are grouped into runs, and silent runs shorter than the minimum silence length are discarded, which means they stay in the audio.
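The flag-and-group step could be sketched as follows: the threshold is converted from dBFS to linear once, each window is compared linearly, and only silent runs of at least `minSilenceMs` are returned. The `Run` type, the function name, and rounding the minimum length up to whole 20 ms windows are assumptions for illustration.

```typescript
interface Run {
  startWindow: number;
  endWindow: number; // exclusive
}

// Group consecutive silent windows into runs; keep only runs >= minSilenceMs.
function findSilentRuns(
  rms: Float32Array,
  thresholdDbfs: number,
  minSilenceMs: number,
  windowMs = 20,
): Run[] {
  const threshold = Math.pow(10, thresholdDbfs / 20); // converted once
  const minWindows = Math.ceil(minSilenceMs / windowMs);
  const runs: Run[] = [];
  let runStart = -1;
  // Iterate one past the end so a run touching the end of the file is closed.
  for (let w = 0; w <= rms.length; w++) {
    const silent = w < rms.length && rms[w] < threshold;
    if (silent && runStart < 0) runStart = w;
    if (!silent && runStart >= 0) {
      if (w - runStart >= minWindows) runs.push({ startWindow: runStart, endWindow: w });
      runStart = -1; // shorter runs are simply dropped, i.e. not cut
    }
  }
  return runs;
}
```

Multiplying `startWindow` and `endWindow` by the window length (20 ms) gives the region boundaries in milliseconds.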

This approach has two advantages over peak detection. First, it does not react to single transient peaks: a mouse click in the background does not flip the detection. Second, it roughly matches what the human ear perceives as loudness, which is why RMS thresholds are a common building block in silence-detection and podcast tools.
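The click example can be checked numerically. This sketch builds one 20 ms window of quiet room tone (0.001, i.e. -60 dBFS) containing a single click at 0.2 and compares both detectors against a -40 dBFS threshold; the signal values are made up for illustration.

```typescript
// Peak and RMS of one analysis frame, in linear amplitude (1.0 = 0 dBFS).
function peak(frame: Float32Array): number {
  let max = 0;
  for (let i = 0; i < frame.length; i++) max = Math.max(max, Math.abs(frame[i]));
  return max;
}

function rms(frame: Float32Array): number {
  let sumSquares = 0;
  for (let i = 0; i < frame.length; i++) sumSquares += frame[i] * frame[i];
  return Math.sqrt(sumSquares / frame.length);
}

const threshold = Math.pow(10, -40 / 20);        // -40 dBFS as linear amplitude (0.01)
const frame = new Float32Array(960).fill(0.001); // 20 ms of room tone at 48 kHz
frame[480] = 0.2;                                // one-sample click in the background

console.log("peak detector says silent:", peak(frame) < threshold);
console.log("RMS detector says silent:", rms(frame) < threshold);
```

The peak (0.2) exceeds the 0.01 threshold, so a peak detector would flag the window as loud, while the RMS of about 0.0065 stays below it and the window is still classified as silent.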

How is the minimum silence length tuned?

Defaults are speech-tuned. 500 ms is the length above which a silent stretch counts as a real pause. Shorter silences (breaths, consonant joins, mid-sentence pauses) survive; otherwise the result sounds robotic and chopped.

Tuning rules of thumb:

200-300 ms: aggressive cut; good for voice memos where every pause should go
400-600 ms: natural podcast feel; keeps breath pauses
800-1500 ms: gentle cut; only removes long stalls (e.g. a topic switch)

Voice-over for picture commonly uses 200 ms. Conversational podcasts with two hosts work well at 500-700 ms. Interviews with long thinking pauses before answers sit at 1000-1500 ms.

What does the padding (lead/tail) slider do?

Padding extends every speech segment by that many milliseconds on each side before the cut is taken. The effect: cuts do not land hard on the first or last consonant of a sentence, but on a quieter audio fragment — usually room tone or a soft breath.

Without padding, auto-cut transitions often feel mechanical, especially around plosives (P, B, T, K, D) at the start or end of a word. With 80 ms of padding the transition softens; with 200 ms a clear room-tone buffer sits between consecutive segments.

If two speech segments would overlap after padding, the tool merges them into a single run. Otherwise the overlapping padding would be exported twice, and padding would paradoxically make the gap between the two segments longer than it was in the original.
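Padding and the merge rule can be sketched together. This assumes speech segments are given in seconds with exclusive end times; `Segment` and `padAndMerge` are illustrative names, and clamping to the file boundaries is an assumption.

```typescript
interface Segment {
  start: number; // seconds
  end: number;   // seconds, exclusive
}

// Extend each speech segment by padSec on both sides (clamped to the file),
// then merge segments that touch or overlap so no audio is exported twice.
function padAndMerge(speech: Segment[], padSec: number, durationSec: number): Segment[] {
  const padded = speech
    .map((s) => ({ start: Math.max(0, s.start - padSec), end: Math.min(durationSec, s.end + padSec) }))
    .sort((a, b) => a.start - b.start);
  const merged: Segment[] = [];
  for (const seg of padded) {
    const last = merged[merged.length - 1];
    if (last && seg.start <= last.end) {
      last.end = Math.max(last.end, seg.end); // overlap: extend the previous run
    } else {
      merged.push({ ...seg });
    }
  }
  return merged;
}
```

Sorting first means a single pass is enough to merge all overlaps.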

When is auto-silence-cut useful — and when is it not?

Good fit:

Voice memos and dictations: strip out long thinking pauses, keep the content tight
Single-speaker podcasts: rough pass before manual fine-cut
Voice-over recordings: quickly drop re-takes and false starts
Speech-practice or language-learning recordings: trim the teacher-student silence
Voice notes destined for automatic transcription: saves time downstream

Poor fit:

Music: RMS-based silence detection will mistakenly cut quiet passages (pianissimo, reverb tails)
Drama and audio plays: dramatic pauses are part of the performance
Live recordings with audience presence: the “breathing pauses” between sentences carry the room
Multi-track sessions that must stay sample-aligned: silence trimming shifts your timing anchors
High-end studio work: here a manual cut in a DAW with monitor headphones makes the difference

For the good-fit cases this tool can save hours. For the poor-fit cases a desktop DAW such as Audacity (open source) or Reaper is the better choice.

How does this tool differ from other silence trimmers?

Most online silence trimmers send your file to a server. Even when the server “stores nothing,” the audio still passes through someone else’s memory for a moment. For voice memos, therapy notes, confidential interviews, or schoolwork recordings, that is a concrete privacy concern.

Three structural differences here:

Pure-client. The Web Audio API decodes locally, Canvas paints the waveform, RMS analysis runs on the main thread (fast enough for files under 200 MB), and the WAV export is assembled in the browser. No server round-trip.
No account wall. Larger auto-cut services typically hide a free tier behind sign-up with a minute cap. Here there is no cap other than the 200 MB file size and your browser RAM.
Mobile-first, minimal design. The waveform responds to touch, sliders use 44 × 44 px touch targets, the type stack is Inter with JetBrains Mono for numerics, and the tool works without any cookie banner because it sets no cookies.

These three points are not “nice to haves”: they are what structurally sets this tool apart from established services whose business model rests on email capture and subscription funnels.

How accurate is the silence detection?

Accuracy depends on two factors: the threshold in dBFS, and the quality of the recording. For clean studio voice (RMS speech level around -20 dBFS) with a threshold of -40 dBFS, cut placement is accurate to one 20-millisecond window, since the detection curve has 50 points per second. That resolution is normally fine enough for the cut boundary to go unnoticed, especially with padding.

For louder background noise (HVAC, traffic, computer fans) you need a threshold below the noise floor — typically -50 to -55 dBFS. In that range very quiet consonants (s, f, sh) are sometimes mis-classified as silence. The padding slider catches some of those mis-cuts, but for critical recordings it pays to run Audacity noise reduction first and then perform the silence cut at -40 dBFS.

For podcast editing the accuracy is almost always sufficient, provided the padding is set generously (at least 80 ms, ideally 150-200 ms). For forensic audio work or evidentiary transcripts this is the wrong tool — that domain needs dedicated speech-forensics software.

What happens after the export?

The exported WAV file lands in your browser’s default download folder. It is a 16-bit PCM mono track — compatible with every editing application (Audacity, Reaper, Adobe Audition, Logic Pro, Pro Tools), every media player, and most transcription tools.
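For reference, a 16-bit PCM mono WAV consists of a 44-byte RIFF header followed by little-endian 16-bit samples. The sketch below is one way to assemble it from float samples; it is not the tool's actual exporter, and `encodeWav16Mono` is an illustrative name.

```typescript
// Encode mono float samples (-1..1) as a 16-bit PCM WAV (canonical 44-byte header).
function encodeWav16Mono(samples: Float32Array, sampleRate: number): ArrayBuffer {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeAscii = (offset: number, text: string) => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true);                // RIFF chunk size
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);                          // fmt chunk size for PCM
  view.setUint16(20, 1, true);                           // audio format 1 = PCM
  view.setUint16(22, 1, true);                           // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true);              // block align
  view.setUint16(34, 16, true);                          // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clip out-of-range samples
    // Scale asymmetrically so -1 maps to -32768 and +1 to 32767.
    view.setInt16(44 + i * 2, Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), true);
  }
  return buffer;
}
```

In a browser, the resulting `ArrayBuffer` can be wrapped with `new Blob([buffer], { type: "audio/wav" })` and passed to `URL.createObjectURL()` to produce the download link.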

The Original export keeps the source format (MP3, M4A, OGG, FLAC), but it bypasses the cut entirely and downloads the source bytes unchanged. That is useful when you only inspected the file to confirm no silence was present.

For follow-up transcription, the sibling tool Audio Transcription (browser-side speech recognition) takes WAV directly. For further trimming and region selection, the sibling Audio Trimmer accepts the same files. Both pipelines share the same audio container stack and can chain directly off the exported WAV.

Is there a loudness-normalise mode?

Deliberately not in this version. Silence trimming and loudness normalisation are two separate audio decisions and benefit from being two separate tools. Proper loudness normalisation (EBU R128, -23 LUFS target) needs a frequency-weighted, gated loudness measurement in LUFS, not just plain RMS. That is being developed as its own sibling tool.

If you also need normalisation, run it in your DAW after the silence cut: Audacity’s “Loudness Normalization” effect can target the EBU R128 level, and ffmpeg’s loudnorm filter is the command-line option. Both are deliberate steps; combining them into a single magic button would obscure that the correct order (silence first, normalise second) is part of a good audio workflow.
