What file sizes does the tool support?

Files up to 50 MB are processed directly, which covers typical spreadsheet exports with hundreds of thousands of rows. Larger files usually need to be split anyway, because Excel itself caps a sheet at 1,048,576 rows.

How are duplicate column names handled?

Duplicate headers receive a counter suffix — “Date, Date, Date” becomes “Date, Date_2, Date_3”. Empty header cells are renamed to “column_N”. This guarantees the file loads cleanly into Pandas, R or SQL without any column being silently overwritten.
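The renaming rule can be sketched in a few lines of Python. This is an illustration only, not the tool's own browser code: the function name dedupe_headers is hypothetical, and taking N in “column_N” to be the 1-based column position is an assumption, since the text does not define it.

```python
def dedupe_headers(headers):
    """Return unique column names: empty cells become column_N, repeats get _2, _3, ..."""
    seen = {}     # base name -> number of times it has appeared so far
    used = set()  # every name already emitted, to catch collisions
    result = []
    for position, raw in enumerate(headers, start=1):
        # Assumption: N is the 1-based position of the empty header cell.
        name = raw.strip() or f"column_{position}"
        seen[name] = seen.get(name, 0) + 1
        candidate = name if seen[name] == 1 else f"{name}_{seen[name]}"
        # If a suffixed name collides with a header that really exists, keep counting.
        while candidate in used:
            seen[name] += 1
            candidate = f"{name}_{seen[name]}"
        used.add(candidate)
        result.append(candidate)
    return result


print(dedupe_headers(["Date", "Date", "Date"]))
```

The extra collision check covers the edge case where a file already contains a literal “Date_2” header next to repeated “Date” columns.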

CSV Cleaner — fix encoding, delimiters & decimal commas

What does this tool actually do?

The tool accepts a CSV file and runs four steps that almost every German spreadsheet export needs:

Detect the encoding. The tool inspects the file's bytes to decide whether it is UTF-8, Latin-1 or Windows-1252. A leading byte-order mark (BOM) is honoured; if strict UTF-8 decoding fails, a heuristic uses diagnostic bytes (€, smart quotes, en/em dashes) to disambiguate Windows-1252 from Latin-1.
Detect the delimiter. From the first ten rows the most consistent column separator is determined — comma, semicolon, tab or pipe. Commas inside quoted fields are excluded so embedded text does not skew the count.
Clean the data. Blank rows are removed and duplicate column headers receive counter suffixes (_2, _3). Optionally, German number formats like 1.234,56 are converted to the canonical 1234.56; a conservative heuristic leaves version strings like 1.234 untouched.
Build the output. Three formats are available: CSV with UTF-8 BOM (opens cleanly in Excel by double-click), CSV without BOM (for Pandas, SQL, R) and a true Excel workbook (.xlsx) with proper number typing.
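The difference between the two CSV outputs comes down to three leading bytes. The following Python sketch shows one way to produce both variants; the function name to_csv_bytes and its parameters are hypothetical, and the tool itself assembles its output in the browser rather than with this code. The .xlsx output is not covered here.

```python
import csv
import io


def to_csv_bytes(rows, delimiter=";", with_bom=True):
    """Serialise rows to CSV bytes, optionally prefixed with a UTF-8 BOM."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\r\n")
    writer.writerows(rows)
    # "utf-8-sig" prepends EF BB BF, which lets Excel recognise UTF-8 on double-click.
    encoding = "utf-8-sig" if with_bom else "utf-8"
    return buffer.getvalue().encode(encoding)


rows = [["Name", "Betrag"], ["Müller", "1234.56"]]
for_excel = to_csv_bytes(rows, with_bom=True)
for_pandas = to_csv_bytes(rows, delimiter=",", with_bom=False)
```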

All steps run fully in the browser tab. The code does not load remote scripts at runtime, sends no telemetry and stores nothing in browser storage.

How does encoding detection work technically?

A CSV file carries no metadata about its character set. Whoever opens it has to guess, and reading UTF-8 bytes as Windows-1252 is exactly why “Müller” turns into “MÃ¼ller”.
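The effect is easy to reproduce. The short Python sketch below encodes a name as UTF-8 and then decodes the same bytes with the wrong codec; it is a demonstration of the mismatch, not part of the tool.

```python
original = "Müller"
utf8_bytes = original.encode("utf-8")  # the ü becomes the two bytes C3 BC

# Wrong guess: the UTF-8 bytes are interpreted as Windows-1252,
# so each of the two bytes is shown as its own character.
garbled = utf8_bytes.decode("cp1252")
print(garbled)

# As long as no bytes were lost, the damage can be reversed.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired)
```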

Detection runs in three stages:

1. BOM probe. A file beginning with the bytes EF BB BF is unambiguously UTF-8 with a byte-order mark — no further analysis needed.

2. Strict UTF-8 validation. The decoder tries to interpret the entire byte stream as UTF-8. If it fails (an invalid multi-byte sequence appears), the file cannot be UTF-8. If it succeeds, UTF-8 is assumed — pure ASCII files always succeed because ASCII is a subset of UTF-8.

3. CP1252-vs-Latin-1 heuristic. When UTF-8 validation fails, the tool inspects the byte range 0x80–0x9F. That is exactly where the two codepages differ: Latin-1 assigns these bytes to invisible C1 control characters, which practically never occur in spreadsheet text, while Windows-1252 maps most of them to printable characters such as the Euro sign, smart quotes, and em and en dashes. If such diagnostic bytes appear, CP1252 wins; otherwise Latin-1.

These three stages cover over 99 % of European spreadsheet exports without loading any external library.
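The three stages can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the tool's browser implementation: the function name detect_encoding and the returned labels are hypothetical, and the set of excluded bytes reflects the five positions that Windows-1252 leaves undefined.

```python
import codecs

# Positions in 0x80-0x9F that Windows-1252 leaves undefined.
CP1252_UNDEFINED = {0x81, 0x8D, 0x8F, 0x90, 0x9D}


def detect_encoding(data: bytes) -> str:
    # Stage 1: BOM probe (EF BB BF).
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # Stage 2: strict UTF-8 validation over the whole byte stream.
    try:
        data.decode("utf-8", errors="strict")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # Stage 3: bytes in 0x80-0x9F are printable in CP1252 but control codes in Latin-1.
    if any(0x80 <= byte <= 0x9F and byte not in CP1252_UNDEFINED for byte in data):
        return "cp1252"
    return "latin-1"


print(detect_encoding("10 €".encode("cp1252")))
```

The returned labels are valid Python codec names, so the result can be passed straight to bytes.decode().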

Delimiter detection — when does Excel rely on semicolons?

European Excel versions export CSV with semicolons by default because the comma is reserved as the decimal separator in Germany, Austria, Switzerland and most Romance-language countries. US Excel exports with commas. Opening a US CSV in DE Excel (or vice versa) yields one mega-column because Excel expects the wrong delimiter.

The detection compares the frequency of four candidates in the first ten rows — comma, semicolon, tab, pipe. The score weighs:

Per-row median. A delimiter showing up three times in every line is more likely than one appearing seven times in some rows and not at all in others.
Consistency. How many of the sample rows contain the delimiter at all? A winner has to appear in most rows.
Quote-awareness. Commas inside "…, …" do not count — they are content, not separators.

When candidates tie, the comma wins as the RFC-4180 default. Manual override is always available via the dropdown.
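A simplified version of this scoring might look like the Python sketch below. The exact weighting used by the tool is not stated, so multiplying the per-row median by the share of rows that contain the delimiter is an assumption, as are the names count_outside_quotes and detect_delimiter. Splitting on line breaks also ignores quoted fields that span several lines.

```python
from statistics import median

# Order doubles as tie-break order: the comma comes first.
CANDIDATES = [",", ";", "\t", "|"]


def count_outside_quotes(line: str, delimiter: str) -> int:
    """Count delimiter occurrences that are not inside a quoted field."""
    count, in_quotes = 0, False
    for char in line:
        if char == '"':
            in_quotes = not in_quotes  # a doubled quote toggles twice, so the state survives
        elif char == delimiter and not in_quotes:
            count += 1
    return count


def detect_delimiter(text: str, sample_rows: int = 10) -> str:
    rows = [line for line in text.splitlines() if line.strip()][:sample_rows]
    if not rows:
        return ","
    best, best_score = ",", 0.0
    for delimiter in CANDIDATES:
        counts = [count_outside_quotes(row, delimiter) for row in rows]
        coverage = sum(1 for c in counts if c > 0) / len(rows)  # consistency
        score = median(counts) * coverage  # assumed weighting, see lead-in
        if score > best_score:  # strict ">" keeps the earlier candidate on a tie
            best, best_score = delimiter, score
    return best


print(repr(detect_delimiter('name;note\n"Doe, J.";"x, y, z"\n"Roe, R.";"a, b"')))
```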

Which CSV problems hit European data most often?

These five problem classes hit data analysts and accountants almost daily — and the tool is built for exactly them:

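Problem 1: umlauts turn into mojibake. Symptom: “Größe” becomes “GrÃ¶ÃŸe”, or the umlauts show up as replacement characters. Cause: the file and the reader disagree on the encoding. UTF-8 bytes read as Latin-1 or CP1252 produce the “Ã¶” pattern; Latin-1 or CP1252 bytes read as UTF-8 produce replacement characters or a decoding error. Fix: auto encoding detection switches to the correct decoder, and the tool emits the file as clean UTF-8.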

Problem 2: all columns end up in one cell. Symptom: opening the file in Excel puts the whole row into column A. Cause: the CSV uses commas, the Excel locale expects semicolons (or vice versa). Fix: delimiter detection finds the actual delimiter regardless of locale, and the output delimiter can be switched to the target tool’s expectation.

Problem 3: Power BI / Pandas / SQL refuse to recognise numbers. Symptom: amounts like “1.234,56” import as text instead of numbers, and aggregations break. Cause: unless told otherwise, these tools parse numbers with a dot as the decimal point. Fix: the number-normalise option rewrites matching cells into 1234.56; version strings and IDs are left alone.

Problem 4: duplicate column names. Symptom: the import silently renames the second “Date” column (Pandas calls it “Date.1”), overwrites it, or fails outright. Cause: many DataFrame libraries and SQL tables require unique column names. Fix: duplicates get a _2/_3 suffix, empty headers become column_N, so every column name is unique.

Problem 5: trailing blank rows from Excel exports. Symptom: stats tools throw on empty rows, Pandas creates NaN rows. Cause: Excel often appends a blank row at the end or between sections. Fix: fully empty rows are removed without losing any cell content.
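The blank-row fix from Problem 5 is the simplest of the five to illustrate. The Python sketch below treats a row as blank only if every cell is empty or whitespace; the function name drop_blank_rows and the semicolon default are assumptions for the example.

```python
import csv
import io


def drop_blank_rows(text: str, delimiter: str = ";") -> list[list[str]]:
    """Parse CSV text and keep only rows with at least one non-empty cell."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    # Rows such as "", ";;" or " " carry no content and are dropped.
    return [row for row in reader if any(cell.strip() for cell in row)]


print(drop_blank_rows("a;b\n\n1;2\n;;\n \n"))
```

Rows that contain even a single filled cell are kept unchanged, so no cell content is lost.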

Why does privacy matter for CSV cleanup?

Competing CSV-cleanup services (Convertio, OnlineConvertFree, CSVtoTable, Browserling and similar web converters) upload the file to a server for processing. Most of them disclose this in their terms of use; some retain the file “up to two hours for processing”, others longer.

For CSV data that is a bigger risk than for images: a spreadsheet often contains real names, addresses, accounting entries, bank details or employee IDs. GDPR-compliant server-side processing of such data requires a data-processing agreement — which most free-tier services do not offer.

This tool makes server upload structurally impossible: processing runs exclusively inside the browser tab, served from static hosting. There is no backend endpoint that could accept file content. Even the optional Excel output is assembled entirely in the browser — no server calls.

Which CSV formats are supported?

Accepted inputs:

Standard CSV with comma, semicolon, tab or pipe as delimiter
TSV (tab-separated values, .tsv/.tab)
Plaintext tables (.txt) with a recognisable column delimiter
UTF-8 (with or without BOM), Latin-1 (ISO-8859-1), Windows-1252
Quoted fields per RFC-4180 with doubled quotes as escape
Any line ending (\n, \r\n, \r)

Available outputs:

CSV with UTF-8 BOM — opens cleanly in Excel by double-click
CSV without BOM — fits Pandas, R, SQL importers, Linux tooling
Excel workbook (.xlsx) — numbers typed as numeric cells, headers bold

Deliberately out of scope:

ZIP/GZIP-compressed CSVs — decompress first
Fixed-width tables without a delimiter — special case, dedicated pipeline needed
Multi-sheet workbooks — a CSV is one sheet by definition

What do users ask about CSV cleanup?

The most common questions about usage and privacy:

Why does my CSV file show mojibake instead of umlauts?

The file was saved with one character encoding while the program opening it expects another. German Excel and ERP exports often use Windows-1252 or Latin-1 instead of UTF-8; a UTF-8 reader cannot decode their single-byte umlauts and shows replacement characters. In the opposite case, a UTF-8 file read as Windows-1252, each two-byte umlaut appears as two characters such as “Ã¼”. The tool detects the original encoding and converts it to UTF-8.

How does the tool decide whether my CSV uses comma or semicolon?

The tool counts how often each candidate delimiter appears outside of quoted fields in the first ten rows. The character with the most consistent per-row count wins. Comma, semicolon, tab and pipe are all candidates. The auto-detection can be overridden via the dropdown.

What does the “convert German numbers” option do?

German spreadsheets write thousands with a dot and decimals with a comma, as in “1.234,56”. Pandas, R and SQL expect a dot as the decimal point, as in “1234.56”. This option rewrites every cell that strictly matches the German pattern. Version strings like “1.234” stay unchanged.
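One way to express “strictly matches the German pattern” is a regular expression that requires a decimal comma. The Python sketch below is an assumption about what such a conservative rule could look like, not the tool's actual heuristic; GERMAN_NUMBER and normalise_german_number are hypothetical names.

```python
import re

# Optional sign, then either dot-grouped thousands (1.234.567) or plain digits,
# followed by a mandatory decimal comma and at least one digit.
GERMAN_NUMBER = re.compile(r"[+-]?(?:\d{1,3}(?:\.\d{3})+|\d+),\d+")


def normalise_german_number(cell: str) -> str:
    value = cell.strip()
    if not GERMAN_NUMBER.fullmatch(value):
        return cell  # no decimal comma or wrong grouping: leave the cell untouched
    # Drop the thousands dots, then turn the decimal comma into a dot.
    return value.replace(".", "").replace(",", ".")


for sample in ["1.234,56", "1.234", "12,5", "1,234.56"]:
    print(sample, "->", normalise_german_number(sample))
```

Because the comma is mandatory, a value like 1.234 never matches, which is what keeps version strings and dotted IDs intact.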

Is my CSV uploaded to a server?

No. The entire detection and conversion runs inside your browser tab. The file is never uploaded, stored or analysed.

Other utilities in the data and document cluster:

JSON to CSV — export JSON arrays as a CSV table with dot-notation flattening for nested fields.
CSV to Markdown — convert CSV tables into Markdown pipe tables, ideal for GitHub READMEs and documentation.
File Hash Checker — compute SHA-256/512/BLAKE3 hashes and verify against sidecar files, fully in the browser.

CSV import cleaner

How It Works

1. Load the file

2. Verify auto-detection

3. Clean & download

