Is my PDF uploaded to a server?

No. Parsing and Markdown generation run entirely in your browser tab. There is no server endpoint, no telemetry, no upload. You can verify this in the Network panel of your browser's developer tools.

What happens with tables?

Simple tables with clear row and column structure are emitted as GFM pipe tables. Complex tables with merged cells, nested rows or floating captions are flagged with a hint block (`⚠ table detected — manual review recommended`). We don't fabricate pipe structure when the source doesn't support it.
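
As a rough illustration of that rule, the sketch below emits a pipe table only when every row has the same number of cells and otherwise returns `null`, so the caller can emit the hint block instead. The function name `toPipeTable` and the "equal row width" test are assumptions made for this example, not the tool's actual detection logic.

```typescript
// Hypothetical helper: rows is a grid of cell strings, the first row is the header.
function toPipeTable(rows: string[][]): string | null {
  const [header, ...body] = rows;
  // No header or no data rows: nothing that can be called a table.
  if (header === undefined || header.length === 0 || body.length === 0) return null;
  // Ragged rows hint at merged cells or a broken grid, so refuse to guess.
  if (body.some((row) => row.length !== header.length)) return null;

  // Collapse whitespace and escape pipes so a cell cannot break the row.
  const cleanCell = (cell: string): string =>
    cell.replace(/\s+/g, " ").trim().replace(/\|/g, "\\|");
  const toLine = (cells: string[]): string => `| ${cells.map(cleanCell).join(" | ")} |`;

  const separator = toLine(header.map(() => "---"));
  return [toLine(header), separator, ...body.map(toLine)].join("\n");
}
```

Calling `toPipeTable([["Name", "Size"], ["a.pdf", "1 MB"]])` yields a three-line GFM table, while a grid with rows of different lengths yields `null`.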

Does this work with scanned PDFs?

Yes. If the tool finds no text layer, it switches automatically into OCR mode and reads the text from the page image via a WebAssembly OCR model. The first page may take a few seconds because the model has to be downloaded on first use; it is then cached, and after that everything runs offline.

How are math and formulas handled?

Math blocks are detected and flagged with a hint block (`⚠ formula region detected`). Reliable 1:1 LaTeX reconstruction isn't realistic in the browser today, so we flag the region honestly instead of inventing wrong LaTeX that you'd have to fix anyway.

What happens with images embedded in the PDF?

Embedded images are referenced (`![Image N](image-N.png)`) and dropped into the ZIP as separate files. If you only need the text, you can disable image extraction — then the output is pure Markdown paragraphs.
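
The naming scheme can be sketched as follows. The `PdfBlock` type, the `renderBlocks` function and the numbering from 1 are assumptions for illustration; only the `![Image N](image-N.png)` pattern comes from the description above.

```typescript
// Hypothetical block model: a converted page is a sequence of text and image blocks.
type PdfBlock = { kind: "text"; text: string } | { kind: "image" };

function renderBlocks(
  blocks: PdfBlock[],
  extractImages: boolean,
): { markdown: string; files: string[] } {
  const parts: string[] = [];
  const files: string[] = []; // file names that would go into the ZIP
  let imageCount = 0;

  for (const block of blocks) {
    if (block.kind === "text") {
      parts.push(block.text);
    } else if (extractImages) {
      imageCount += 1; // assumption: numbering starts at 1
      const fileName = `image-${imageCount}.png`;
      files.push(fileName);
      parts.push(`![Image ${imageCount}](${fileName})`);
    }
    // With extraction disabled, image blocks are skipped entirely.
  }
  return { markdown: parts.join("\n\n"), files };
}
```

With `extractImages` set to `false`, the result contains only the text paragraphs and an empty file list.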

What PDF sizes are realistic?

Up to 50 MB per file and 50 files per batch. Larger PDFs aren't actively blocked, but the browser's RAM is the limit — long scans with OCR can hit memory limits on older devices.
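
Because larger inputs are not blocked, a check along these lines would warn rather than reject. This is a minimal sketch: `batchWarnings`, `PdfFileInfo` and the reading of 1 MB as 1024 × 1024 bytes are assumptions, not the tool's actual code.

```typescript
// Limits from the text above; the byte interpretation of "MB" is an assumption.
const MAX_FILE_BYTES = 50 * 1024 * 1024;
const MAX_BATCH_FILES = 50;

// Browser File objects already have these two properties.
interface PdfFileInfo {
  name: string;
  size: number; // bytes
}

// Returns human-readable warnings; an empty array means the batch is within limits.
function batchWarnings(files: PdfFileInfo[]): string[] {
  const warnings: string[] = [];
  if (files.length > MAX_BATCH_FILES) {
    warnings.push(
      `Batch has ${files.length} files; the recommended maximum is ${MAX_BATCH_FILES}.`,
    );
  }
  for (const file of files) {
    if (file.size > MAX_FILE_BYTES) {
      warnings.push(`${file.name} is larger than 50 MB and may exhaust browser memory.`);
    }
  }
  return warnings;
}
```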

Are encrypted PDFs supported?

Encrypted PDFs are detected and rejected with a clear error — we don't try to bypass passwords. If you have the password, unlock the file with the [PDF Password Remover](/en/pdf-password-remover) first and convert afterwards.

Are annotations and form fields included?

Not in this version. Annotation layers and AcroForm fields are out of scope: they are stored separately from the page text that the extraction reads. If you need that content, let us know and we'll consider it for Phase 2.

PDF to Markdown — convert locally in your browser

Why convert PDF to Markdown?

Markdown has become the lingua franca for AI workflows, wikis and personal knowledge systems. Obsidian, Logseq, Hugo, Astro content collections, Claude Code files and almost every RAG index expect Markdown, not PDFs. Anyone who needs to feed a stack of contracts, papers or whitepapers into a knowledge base hits the same wall: PDFs are designed for humans, not machines.

This tool makes the reverse trip practical. From a PDF you get a clean .md file with detectable structure: headings as #-headers, lists as bullet points, paragraphs as paragraphs. Anything that can’t be reliably converted — complex tables, mathematical formulas, multi-column layouts with marginalia — is flagged honestly instead of being half-fabricated.

How does the conversion work technically?

If the PDF has an embedded text layer, an established open-source PDF library extracts the text along with position and font size. A layout heuristic groups text blocks into paragraphs, infers heading levels from font size and position, and recognises bullet markers (•, -, numeral + period) as lists. The result is a GitHub Flavored Markdown document that renders natively in Obsidian, VS Code and any standard Markdown pipeline.
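
The heuristic described above might look roughly like this for a single text block. The `LayoutBlock` shape, the function names and the font-size ratio thresholds (1.6, 1.3, 1.15) are illustrative assumptions; the actual tool may weigh position and other signals differently.

```typescript
// Hypothetical output of the extraction step: one block of text with its font size.
interface LayoutBlock {
  text: string;
  fontSize: number;
}

// Map font size relative to body text onto a heading level; 0 means "not a heading".
// The thresholds are assumed values for this sketch.
function headingLevel(fontSize: number, bodySize: number): number {
  const ratio = fontSize / bodySize;
  if (ratio >= 1.6) return 1;
  if (ratio >= 1.3) return 2;
  if (ratio >= 1.15) return 3;
  return 0;
}

function toMarkdownLine(block: LayoutBlock, bodySize: number): string {
  const text = block.text.trim();
  // Headings are checked first so that "1. Introduction" in large type stays a heading.
  const level = headingLevel(block.fontSize, bodySize);
  if (level > 0) return `${"#".repeat(level)} ${text}`;
  // Bullet markers (• or -) are normalised to a Markdown dash.
  if (/^[•-]\s+/.test(text)) return text.replace(/^[•-]\s+/, "- ");
  // "numeral + period" is already valid ordered-list syntax; plain text passes through.
  return text;
}
```

For a body size of 12, a 24-point block becomes a `#` heading, a 16-point block a `##` heading, and a body-sized line starting with `•` becomes a `- ` list item.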

Scanned PDFs have no text layer — the pages are images. Here the tool switches into OCR mode: a proven WebAssembly OCR model reads the text out of the image, with language packs for English, German and other European languages. The model is cached in the browser on first use (~12 MB); after that the tool keeps working without an internet connection.
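
The switch between the two paths can be sketched as a single decision. Here `chooseMode` and the rule "no non-whitespace text on any page means OCR" are assumptions for illustration; the input is taken to be the text extracted per page.

```typescript
type ConversionMode = "text" | "ocr";

// pageTexts holds the text extracted from each page's text layer (empty if none).
function chooseMode(pageTexts: string[]): ConversionMode {
  // Assumption: a single page with real text counts as "has a text layer".
  const hasTextLayer = pageTexts.some((text) => text.trim().length > 0);
  return hasTextLayer ? "text" : "ocr";
}
```

A real implementation might decide per page instead, so that a mixed document with a few scanned pages still gets OCR where it needs it.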

What is the tool actually used for?

Filling an Obsidian vault. A pile of academic papers becomes Markdown files where you can set links and backlinks.
Seeding Claude Code or a coding wiki. Architecture PDFs become Markdown that lives next to the code files.
RAG index preparation. Markdown chunks much more cleanly than PDF — splitters work along heading boundaries.
Logseq block import. Markdown headings become Logseq blocks.
Hugo / Astro content migration. Existing PDF documentation becomes static-site content.
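
To illustrate the RAG use case from the list above, here is a minimal sketch of a splitter that cuts the converted Markdown at heading boundaries. `splitByHeadings` and `MarkdownChunk` are hypothetical names, and the sketch deliberately ignores edge cases such as `#` lines inside fenced code blocks.

```typescript
interface MarkdownChunk {
  heading: string; // empty for text that precedes the first heading
  body: string;
}

function splitByHeadings(markdown: string): MarkdownChunk[] {
  const chunks: MarkdownChunk[] = [];
  let heading = "";
  let lines: string[] = [];

  // Close the current chunk, skipping chunks that are completely empty.
  const flush = (): void => {
    const body = lines.join("\n").trim();
    if (heading !== "" || body !== "") chunks.push({ heading, body });
  };

  for (const line of markdown.split("\n")) {
    if (/^#{1,6}\s+/.test(line)) {
      // A new ATX heading starts a new chunk.
      flush();
      heading = line.replace(/^#{1,6}\s+/, "").trim();
      lines = [];
    } else {
      lines.push(line);
    }
  }
  flush();
  return chunks;
}
```

Each chunk keeps its heading as metadata, which an indexing step could store alongside the embedded text.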

What survives the conversion — and what doesn’t?

Preserved: headings (with detectable hierarchy), paragraphs, lists (ordered and unordered), inline formatting like bold and italic, links with anchor text, simple tables, images as referenced files.

Flagged with a hint block instead of 1:1 conversion: complex tables with merged cells, mathematical formulas, multi-column layouts with cross-references, footnote linking. The hint block makes clear where the conversion sees its limit — you decide how to clean up.

Not in this version: annotations, form-field data, embedded files, OCG layers. These sit outside plain text extraction and need separate handling; they are planned for Phase 2 once the MVP runs stably.

How does the tool keep my PDF private?

Many free PDF-to-Markdown services upload the file to a server, convert it there, and send the result back. The business model often depends on that, because the server sees the content, even when the service claims to delete files after 24 hours. For confidential contracts, medical records or internal strategy PDFs, that's rarely acceptable.

None of that happens here. The PDF is parsed in your browser tab, the OCR model runs as a WebAssembly module inside the same tab, the Markdown is assembled in memory and offered as a download. Open the Network panel of your developer tools and watch: not a single byte of your PDF leaves your machine.

Which related converters exist?

This tool is part of the Markdown converter family, a set of browser-only converters that prepare office formats for AI and wiki workflows:

DOCX to Markdown — Word documents straight to Markdown with heading structure and lists preserved.
XLSX to Markdown — Excel and ODS sheets as GFM pipe tables, multi-sheet support included.
HTML to Markdown — web pages or HTML snippets, file or paste mode.
Remove Metadata — strip EXIF, GPS and XMP fields from images and PDFs locally in the browser.

PDF to Markdown

How It Works

Pick a PDF

Check the mode

Download Markdown

Privacy

How do you use this tool?

Why convert PDF to Markdown?

How does the conversion work technically?

What is the tool actually used for?

What survives the conversion — and what doesn’t?

How does the tool keep my PDF private?
