How do you use this tool?
- Drag an HTML file in — or paste HTML source directly into the paste box
- Optional: review sanitize options (strip scripts, keep external images?)
- Click 'Convert' and download the Markdown or grab it via the copy button
Why convert HTML to Markdown?
HTML is the source format of the web, and very often the form in which you first find knowledge: an email newsletter that is only searchable as HTML source, a note backup from Evernote or OneNote, a scraped article, a CMS post export. Markdown, by contrast, is the format in which you maintain knowledge long-term: diffable, plain-text, readable in any editor, and natively understood by Obsidian and Logseq.
This tool builds the bridge. Drop in an HTML file or paste HTML source code; the tool parses the DOM structure, strips scripts and inline event handlers, and writes GitHub Flavored Markdown with headings, lists, tables, and inline formatting preserved.
How does the conversion work technically?
HTML is parsed into a DOM tree via the browser’s native HTML parser —
the same mechanism every web page uses internally, but in sandboxed mode
without script execution. A sanitize pass strips <script> tags,
<style> blocks, inline event handlers, and <iframe> embeds before
further processing. That makes it safe to convert even scraped or
third-party HTML files.
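The sanitize pass described above can be sketched in a few lines. This is only an illustration, assuming Python's standard-library `html.parser` in place of the browser's native parser; the names `Sanitizer`, `DROPPED_TAGS`, and `sanitize` are hypothetical and not taken from the tool itself.

```python
import html
from html.parser import HTMLParser

# Elements removed together with their content (list taken from the text above).
DROPPED_TAGS = {"script", "style", "iframe"}


class Sanitizer(HTMLParser):
    """Re-emit HTML while dropping risky elements and inline event handlers."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # sanitized output pieces
        self.skip_depth = 0  # > 0 while inside a dropped element

    def _render(self, tag, attrs, closer=">"):
        parts = [tag]
        for name, value in attrs:
            if name.startswith("on"):  # onclick, onload, onerror, ...
                continue
            if value is None:          # boolean attribute such as "disabled"
                parts.append(name)
            else:
                parts.append(f'{name}="{html.escape(value, quote=True)}"')
        return "<" + " ".join(parts) + closer

    def handle_starttag(self, tag, attrs):
        if tag in DROPPED_TAGS:
            self.skip_depth += 1
        elif not self.skip_depth:
            self.out.append(self._render(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        if tag not in DROPPED_TAGS and not self.skip_depth:
            self.out.append(self._render(tag, attrs, closer=" />"))

    def handle_endtag(self, tag):
        if tag in DROPPED_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(html.escape(data, quote=False))


def sanitize(source: str) -> str:
    parser = Sanitizer()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)
```

The sketch never evaluates anything it reads: tags are only re-serialized or skipped. Comments and doctype declarations are dropped as a side effect, because the corresponding parser callbacks are not overridden.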
A proven open-source HTML-to-Markdown library translates the DOM tree
into Markdown: headings (<h1> → #, <h2> → ##), paragraphs (<p>
→ empty line between blocks), lists (<ul> → -, <ol> → 1.),
tables (<table> → GFM pipe tables), inline formats (<strong> → **,
<em> → *, <code> → `).
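To make the tag-to-Markdown mapping concrete, here is a minimal sketch. It is not the library the tool uses; it assumes Python's standard-library `html.parser`, handles only headings, paragraphs, flat lists, and the three inline formats named above, and the names `MarkdownSketch` and `to_markdown` are hypothetical.

```python
from html.parser import HTMLParser

INLINE_MARKS = {"strong": "**", "em": "*", "code": "`"}
HEADINGS = {f"h{n}": "#" * n + " " for n in range(1, 7)}
BLOCK_TAGS = set(HEADINGS) | {"p", "li"}


class MarkdownSketch(HTMLParser):
    def __init__(self):
        super().__init__()
        self.blocks = []       # finished blocks, later joined by blank lines
        self.items = []        # lines of the list currently being built
        self.text = []         # inline pieces of the open block
        self.prefix = ""       # Markdown marker for the open block
        self.list_kind = None  # "ul", "ol", or None outside a list

    def handle_starttag(self, tag, attrs):
        if tag in INLINE_MARKS:
            self.text.append(INLINE_MARKS[tag])
        elif tag in ("ul", "ol"):
            self.list_kind = tag
        elif tag in BLOCK_TAGS:
            self.text = []  # discard whitespace collected between blocks
            if tag in HEADINGS:
                self.prefix = HEADINGS[tag]
            elif tag == "li":
                number = len(self.items) + 1
                self.prefix = "- " if self.list_kind == "ul" else f"{number}. "

    def handle_endtag(self, tag):
        if tag in INLINE_MARKS:
            self.text.append(INLINE_MARKS[tag])
        elif tag in BLOCK_TAGS:
            # Collapse runs of whitespace, as HTML rendering would.
            body = " ".join("".join(self.text).split())
            if body:
                target = self.items if tag == "li" else self.blocks
                target.append(self.prefix + body)
            self.text, self.prefix = [], ""
        elif tag in ("ul", "ol"):
            if self.items:
                self.blocks.append("\n".join(self.items))
            self.items, self.list_kind = [], None

    def handle_data(self, data):
        self.text.append(data)


def to_markdown(source: str) -> str:
    parser = MarkdownSketch()
    parser.feed(source)
    parser.close()
    return "\n\n".join(parser.blocks)
```

Nested lists, links, images, and tables are left out on purpose; a real converter has to track much more state than this sketch does.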
What are typical use cases?
- Archive email newsletters. Newsletters in HTML format come into your Obsidian vault as Markdown.
- Prepare web content for AI prompts. A long article becomes Markdown that fits inside your Claude or GPT prompt.
- Note migration from Evernote / OneNote. HTML note exports land as clean .md files in your new note system.
- Blog migration to Hugo / Astro. Existing HTML posts become Markdown posts with frontmatter that static-site generators understand.
- Wiki content from Confluence exports. HTML exports from Confluence/SharePoint become Markdown pages for Notion alternatives.
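For the blog-migration case, frontmatter is a YAML header delimited by `---` lines at the top of the Markdown file, which both Hugo and Astro read. A minimal sketch of prepending such a header, assuming only `title` and `date` fields (which fields this tool actually writes is not stated here, and `with_frontmatter` is a hypothetical helper):

```python
import json


def with_frontmatter(markdown_body: str, title: str, date: str) -> str:
    # Field names are illustrative assumptions, not the tool's documented output.
    fields = {"title": title, "date": date}
    lines = ["---"]
    # JSON string literals are valid YAML double-quoted scalars,
    # so json.dumps takes care of quoting and escaping.
    lines += [f"{key}: {json.dumps(value)}" for key, value in fields.items()]
    lines += ["---", "", markdown_body]
    return "\n".join(lines)
```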
What stays — what doesn’t?
Preserved: heading hierarchies (<h1> through <h6>), paragraphs,
ordered and unordered lists with nesting, tables without merged cells as GFM
pipe tables, inline formats (bold, italic, inline code, strikethrough),
hyperlinks with anchor text, block quotes, code blocks with language
hint, images as Markdown references (original src is kept).
Deliberately stripped: scripts, <style> blocks, inline event
handlers, external iframes (sanitize pass). That’s non-negotiable —
especially with scraped or clipped HTML, it protects against hidden
script payloads.
Not 1:1 convertible: tables with colspan/rowspan (flagged with a hint
block, because Markdown pipe tables don't support cell merging), CSS
layout tricks, and JavaScript-generated content (scripts are stripped
and never executed, so content they would generate is not rendered).
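The table rule can be sketched as follows: convert a table to a GFM pipe table only when no cell carries a `colspan` or `rowspan` other than 1, and otherwise signal that a hint is needed. This is an illustration built on Python's standard-library `html.parser`; `TableReader` and `table_to_gfm` are hypothetical names, and returning `None` stands in for whatever hint the tool actually emits.

```python
from html.parser import HTMLParser


class TableReader(HTMLParser):
    """Collect cell text row by row and note whether any cell is merged."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self.cell = None         # text pieces of the open cell, or None
        self.has_merges = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self.cell = []
            spans = dict(attrs)
            for name in ("colspan", "rowspan"):
                if (spans.get(name) or "1").strip() != "1":
                    self.has_merges = True

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.cell is not None:
            self.rows[-1].append(" ".join("".join(self.cell).split()))
            self.cell = None

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)


def table_to_gfm(source: str):
    reader = TableReader()
    reader.feed(source)
    reader.close()
    if reader.has_merges or not any(reader.rows):
        return None  # not expressible as a pipe table; caller emits a hint

    width = max(len(row) for row in reader.rows)
    rows = [row + [""] * (width - len(row)) for row in reader.rows]

    def line(cells):
        # A literal pipe inside a cell must be escaped in GFM tables.
        return "| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |"

    header, *body = rows
    return "\n".join([line(header), line(["---"] * width), *map(line, body)])
```

The first row is treated as the header because GFM pipe tables require one; that is a simplification of this sketch, not a statement about how the tool picks header rows.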
How does the tool keep my HTML private?
When you’re parsing an HTML export from a private note app, the last thing you want is the content going to a third-party server. Newsletter archives can also carry confidential tracking IDs or personal greeting fields embedded in the HTML.
None of that happens here. The HTML is parsed and converted to Markdown
inside the browser tab via web standards (native HTML parser, WebAssembly).
Open the Network panel of your developer tools and watch: no request,
no upload, no server communication. The paste box runs entirely
client-side as well.
Which related converters exist?
This tool is part of the Markdown converter family:
- DOCX to Markdown — Word documents with heading structure and lists.
- PDF to Markdown — including scanned PDFs via OCR.
- CSV to Markdown — also TSV with delimiter auto-detection.
- XLSX to Markdown — Excel including XLS and ODS, multi-sheet support.