How do you use this tool?
- Drop a CSV file into the upload area or pick one — files up to 50 MB are processed directly
- Encoding and delimiter are detected automatically and shown — override either via dropdown if needed
- Toggle the cleanup options and review the before/after preview
- Pick an output format — CSV with BOM (Excel-friendly), CSV without BOM or Excel workbook (.xlsx) — and download
What does this tool actually do?
The tool accepts a CSV file and runs four steps that almost every German spreadsheet export needs:
- Detect the encoding. From the raw bytes the tool works out whether the file is UTF-8, Latin-1 or Windows-1252. A leading byte-order mark (BOM) is honoured; otherwise strict UTF-8 validation runs first, and a heuristic uses diagnostic bytes (€, smart quotes, en/em dashes) to tell Windows-1252 from Latin-1.
- Detect the delimiter. From the first ten rows the most consistent column separator is determined — comma, semicolon, tab or pipe. Commas inside quoted fields are excluded so embedded text does not skew the count.
- Clean the data. Blank rows are removed, duplicate column headers receive counter suffixes (_2, _3), and German number formats like 1.234,56 are converted to the canonical 1234.56. The number conversion is optional and uses a conservative heuristic that leaves version strings like 1.234 untouched.
- Build the output. Three formats are available: CSV with UTF-8 BOM (opens cleanly in Excel by double-click), CSV without BOM (for Pandas, SQL, R) and a true Excel workbook (.xlsx) with proper number typing.
All steps run fully in the browser tab. The code does not load remote scripts at runtime, sends no telemetry and stores nothing in browser storage.
How does encoding detection work technically?
A CSV file carries no metadata about its character set. Whoever opens it has to guess, and a wrong guess is exactly why „Müller” turns into „MÃ¼ller”: the two UTF-8 bytes of the „ü” are read as two separate Windows-1252 characters.
Detection runs in three stages:
1. BOM probe. A file beginning with the bytes EF BB BF is unambiguously UTF-8 with a byte-order mark — no further analysis needed.
2. Strict UTF-8 validation. The decoder tries to interpret the entire byte stream as UTF-8. If it fails (an invalid multi-byte sequence appears), the file cannot be UTF-8. If it succeeds, UTF-8 is assumed — pure ASCII files always succeed because ASCII is a subset of UTF-8.
3. CP1252-vs-Latin-1 heuristic. When UTF-8 validation fails, the tool inspects the byte range 0x80–0x9F. That is exactly where the two codepages differ: Latin-1 assigns only invisible C1 control characters there, which practically never occur in spreadsheet text, while Windows-1252 maps most of these bytes to printable characters such as the Euro sign, smart quotes, em and en dashes. If diagnostic bytes appear, CP1252 wins; otherwise Latin-1.
These three stages cover over 99 % of European spreadsheet exports without loading any external library.
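The three stages can be sketched in a few lines of Python. This is an illustration of the logic described above, not the tool's own browser code; the function name `detect_encoding` and the exact set of "diagnostic" bytes are assumptions made for this sketch.

```python
import codecs

# Bytes that are printable in Windows-1252 but C1 control codes in Latin-1.
# Assumed selection: Euro sign, low and curly quotes, en dash, em dash.
CP1252_DIAGNOSTIC = frozenset({0x80, 0x84, 0x91, 0x92, 0x93, 0x94, 0x96, 0x97})


def detect_encoding(data: bytes) -> str:
    """Return 'utf-8-sig', 'utf-8', 'cp1252' or 'latin-1' for raw file bytes."""
    # Stage 1: BOM probe (EF BB BF at the very start).
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # Stage 2: strict UTF-8 validation over the whole byte stream.
    try:
        data.decode("utf-8", errors="strict")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # Stage 3: any CP1252-only byte in 0x80-0x9F tips the decision.
    if any(byte in CP1252_DIAGNOSTIC for byte in data):
        return "cp1252"
    return "latin-1"
```

The returned name can be passed straight to `data.decode(...)`; `utf-8-sig` strips the BOM while decoding.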
Delimiter detection — when does Excel rely on semicolons?
European Excel versions export CSV with semicolons by default because the comma is reserved as the decimal separator in Germany, Austria, Switzerland and most Romance-language countries. US Excel exports with commas. Opening a US CSV in DE Excel (or vice versa) yields one mega-column because Excel expects the wrong delimiter.
The detection compares the frequency of four candidates in the first ten rows — comma, semicolon, tab, pipe. The score weighs:
- Per-row median. A delimiter showing up three times in every line is more likely than one appearing seven times in some rows and not at all in others.
- Consistency. How many of the sample rows contain the delimiter at all? A winner has to appear in most rows.
- Quote-awareness. Commas inside "…, …" do not count: they are content, not separators.
When candidates tie, the comma wins as the RFC-4180 default. Manual override is always available via the dropdown.
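A minimal Python sketch of such a scoring scheme follows. The source names the ingredients (per-row median, consistency, quote-awareness, comma on ties) but not the exact formula, so the way they are combined here, and the names `count_outside_quotes` and `detect_delimiter`, are assumptions. The sketch also splits on line breaks naively, so a quoted field containing a newline would count as two sample rows.

```python
from statistics import median

CANDIDATES = [",", ";", "\t", "|"]  # comma first, so it wins ties


def count_outside_quotes(line: str, delimiter: str) -> int:
    """Count delimiter occurrences that are not inside a "quoted" field."""
    count, in_quotes = 0, False
    for char in line:
        if char == '"':
            in_quotes = not in_quotes  # a doubled "" toggles twice: no net change
        elif char == delimiter and not in_quotes:
            count += 1
    return count


def detect_delimiter(text: str, sample_rows: int = 10) -> str:
    """Pick the most consistent delimiter from the first non-blank rows."""
    lines = [ln for ln in text.splitlines() if ln.strip()][:sample_rows]
    best, best_score = ",", (0.0, 0.0, 0.0)
    for delimiter in CANDIDATES:
        counts = [count_outside_quotes(ln, delimiter) for ln in lines]
        if not counts or max(counts) == 0:
            continue
        typical = median(counts)
        # Consistency: share of rows that contain the delimiter at all.
        coverage = sum(c > 0 for c in counts) / len(counts)
        # Uniformity: share of rows whose count equals the per-row median.
        uniformity = sum(c == typical for c in counts) / len(counts)
        score = (coverage, uniformity, typical)
        if score > best_score:  # strict ">" keeps the earlier candidate on ties
            best, best_score = delimiter, score
    return best
```

Comparing the scores as tuples means coverage is decisive, uniformity breaks coverage ties, and the median count only matters when both are equal.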
Which CSV problems hit European data most often?
These five problem classes hit data analysts and accountants almost daily — and the tool is built for exactly them:
Problem 1: umlauts turn into mojibake. Symptom: „Größe” becomes „GrÃ``¶``Ã``Ÿe”. Cause: the file is encoded in Latin-1 or CP1252, the reader interprets it as UTF-8. Fix: auto encoding detection switches to the correct decoder, and the tool emits the file as clean UTF-8.
Problem 2: all columns end up in one cell. Symptom: opening the file in Excel puts the whole row into column A. Cause: the CSV uses commas, the Excel locale expects semicolons (or vice versa). Fix: delimiter detection finds the actual delimiter regardless of locale, and the output delimiter can be switched to the target tool’s expectation.
Problem 3: Power BI / Pandas / SQL refuse to recognise numbers. Symptom: amounts like „1.234,56” import as text instead of numbers, aggregations break. Cause: tools outside the DACH region only understand a dot as decimal point. Fix: the number-normalise option rewrites cells into 1234.56 safely — version strings and IDs are left alone.
Problem 4: duplicate column names. Symptom: Pandas reads the file but silently renames the second „Date” column to „Date.1”; SQL importers can reject the file outright. Cause: most DataFrame libraries and databases need unique column names. Fix: duplicates get a _2/_3 suffix, empty headers become column_N. Column names are guaranteed unique.
Problem 5: trailing blank rows from Excel exports. Symptom: stats tools throw on empty rows, Pandas creates NaN rows. Cause: Excel often appends a blank row at the end or between sections. Fix: fully empty rows are removed without losing any cell content.
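The fixes for problems 4 and 5 are simple enough to sketch in Python. The function names are hypothetical, and two details are assumptions of this sketch rather than documented behaviour of the tool: that N in column_N is the 1-based column position, and that surrounding whitespace is trimmed from header names.

```python
def dedupe_headers(headers: list[str]) -> list[str]:
    """Make header names unique: empty -> column_N, repeats -> name_2, name_3."""
    seen: set[str] = set()
    result = []
    for position, raw in enumerate(headers, start=1):
        # Assumption: headers are trimmed and N is the 1-based column position.
        base = raw.strip() or f"column_{position}"
        name, counter = base, 1
        while name in seen:  # also skips suffixes that already exist in the file
            counter += 1
            name = f"{base}_{counter}"
        seen.add(name)
        result.append(name)
    return result


def drop_blank_rows(rows: list[list[str]]) -> list[list[str]]:
    """Remove rows in which every cell is empty or whitespace only."""
    return [row for row in rows if any(cell.strip() for cell in row)]
```

Checking candidates against the set of names already emitted is what makes the uniqueness guarantee hold even when the file already contains a header such as Date_2.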
Why does privacy matter for CSV cleanup?
Competing CSV-cleanup services — Convertio, OnlineConvertFree, CSVtoTable, Browserling and similar web converters — upload the file to a server for processing. Most of them disclose this in their terms of use; some retain the file „up to two hours for processing”, others longer.
For CSV data that is a bigger risk than for images: a spreadsheet often contains real names, addresses, accounting entries, bank details or employee IDs. GDPR-compliant server-side processing of such data requires a data-processing agreement — which most free-tier services do not offer.
This tool makes server upload structurally impossible: processing runs exclusively inside the browser tab, served from static hosting. There is no backend endpoint that could accept file content. Even the optional Excel output is assembled entirely in the browser — no server calls.
Which CSV formats are supported?
Accepted inputs:
- Standard CSV with comma, semicolon, tab or pipe as delimiter
- TSV (tab-separated values, .tsv/.tab)
- Plaintext tables (.txt) with a recognisable column delimiter
- UTF-8 (with or without BOM), Latin-1 (ISO-8859-1), Windows-1252
- Quoted fields per RFC-4180 with doubled quotes as escape
- Any line ending (\n, \r\n, \r)
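For comparison, Python's standard csv module handles the same input rules (RFC-4180 quoting with doubled quotes, all three line endings) when the text is passed through untranslated. This is a sketch of equivalent behaviour, not the tool's own parser; `parse_csv` is a hypothetical helper name.

```python
import csv
import io


def parse_csv(text: str, delimiter: str = ",") -> list[list[str]]:
    """Parse decoded CSV text into rows of string cells."""
    # newline="" leaves \n, \r\n and \r untranslated, so line breaks inside
    # quoted fields survive and all three still terminate a row outside quotes.
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
```

A doubled quote inside a quoted field comes back as a single quote character, and a delimiter inside quotes stays part of the cell.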
Available outputs:
- CSV with UTF-8 BOM — opens cleanly in Excel by double-click
- CSV without BOM — fits Pandas, R, SQL importers, Linux tooling
- Excel workbook (.xlsx) with numbers typed as numeric cells and bold headers
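The difference between the two CSV outputs is only the three leading bytes EF BB BF. A minimal Python sketch, assuming rows are already cleaned lists of strings (`to_csv_bytes` is a hypothetical name, and the .xlsx output is not covered here):

```python
import csv
import io


def to_csv_bytes(rows, delimiter=",", with_bom=True) -> bytes:
    """Serialise rows to CSV bytes, optionally prefixed with a UTF-8 BOM."""
    buffer = io.StringIO(newline="")
    # RFC-4180 style output: CRLF row endings, quotes doubled when needed.
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\r\n")
    writer.writerows(rows)
    # "utf-8-sig" prepends EF BB BF; plain "utf-8" does not.
    return buffer.getvalue().encode("utf-8-sig" if with_bom else "utf-8")
```

Excel uses the BOM as its cue to read the file as UTF-8, while Pandas, R and most command-line tools are better served by the variant without it.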
Deliberately out of scope:
- ZIP/GZIP-compressed CSVs — decompress first
- Fixed-width tables without a delimiter — special case, dedicated pipeline needed
- Multi-sheet workbooks — a CSV is one sheet by definition
What do users ask about CSV cleanup?
The most common questions about usage and privacy:
Why does my CSV file show mojibake instead of umlauts?
The file was saved with one character encoding while the program opening it expects another. German Excel and ERP exports often use Windows-1252 or Latin-1 instead of UTF-8 — a UTF-8 reader then interprets each umlaut byte as two characters. The tool detects the original encoding and converts it to UTF-8.
How does the tool decide whether my CSV uses comma or semicolon?
The tool counts how often each candidate delimiter appears outside of quoted fields in the first ten rows. The character with the most consistent per-row count wins. Comma, semicolon, tab and pipe are all candidates. The auto-detection can be overridden via the dropdown.
What does the „convert German numbers” option do?
German spreadsheets write thousands with a dot and decimals with a comma — „1.234,56”. Pandas, R and SQL expect a dot as the decimal point — „1234.56”. This option rewrites every cell that strictly matches the German pattern. Version strings like „1.234” stay unchanged.
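A strict match of this kind could look like the following Python sketch. The exact pattern the tool uses is not documented here, so the regular expression is an assumption: it requires a decimal comma, which is what keeps „1.234” untouched. One known ambiguity remains: a US-style integer such as „1,234” also matches and would become „1.234”, which is one reason to keep the option a toggle.

```python
import re

# Optional sign, then either dot-grouped thousands or plain digits,
# then a mandatory decimal comma with at least one digit after it.
GERMAN_NUMBER = re.compile(r"[+-]?(?:\d{1,3}(?:\.\d{3})+|\d+),\d+")


def normalise_german_number(cell: str) -> str:
    """Rewrite '1.234,56' as '1234.56'; return any other cell unchanged."""
    value = cell.strip()
    if GERMAN_NUMBER.fullmatch(value) is None:
        return cell
    # Drop the thousands dots first, then turn the decimal comma into a dot.
    return value.replace(".", "").replace(",", ".")
```

Using `fullmatch` rather than a search means free text that merely contains a number, such as an address or a name with a comma, is never rewritten.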
Is my CSV uploaded to a server?
No. The entire detection and conversion runs inside your browser tab. The file is never uploaded, stored or analysed.
Which related tools complete the workflow?
Other utilities in the data and document cluster:
- JSON to CSV — export JSON arrays as a CSV table with dot-notation flattening for nested fields.
- CSV to Markdown — convert CSV tables into Markdown pipe tables, ideal for GitHub READMEs and documentation.
- File Hash Checker — compute SHA-256/512/BLAKE3 hashes and verify against sidecar files, fully in the browser.