Defuddle Turns Any Web Page Into Clean Markdown From the Terminal
Source: Dev.to
Overview
Defuddle extracts the main content from a web page and returns it as clean HTML or Markdown, stripping away sidebars, comments, cookie notices, related posts, and other extraneous elements that a CMS might inject.
Usage
CLI
No installation is required; you can run Defuddle directly with npx:
npx defuddle parse https://example.com/article --markdown
The command prints clean Markdown to stdout, which you can redirect or pipe wherever you need:
npx defuddle parse https://example.com/article --markdown > output.md
If you also want the article’s metadata, add the --json flag:
npx defuddle parse https://example.com/article --json
The JSON output includes the title, author, description, domain, publication date, word count, and more, which makes it easy to generate front matter automatically.
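The front-matter step can be sketched in a few lines of JavaScript. This is a hypothetical helper, not part of Defuddle. It assumes the JSON metadata has already been parsed into a plain object that uses the field names listed in this article (title, author, description, domain, published, wordCount), and it only emits the fields that are actually present.

```javascript
// Hypothetical helper (not part of Defuddle): build a YAML front-matter block
// from a metadata object such as the one printed by `defuddle parse --json`.
// Field names follow this article; which fields are present is assumed to vary.
const FRONT_MATTER_FIELDS = ['title', 'author', 'description', 'domain', 'published', 'wordCount'];

// Render one value as a YAML scalar.
function toYamlScalar(value) {
  // Finite numbers (e.g. wordCount) are written bare.
  if (typeof value === 'number' && Number.isFinite(value)) return String(value);
  // Everything else becomes a double-quoted string. For ordinary text a JSON
  // string literal is also a valid YAML double-quoted scalar, so embedded
  // quotes and backslashes are escaped correctly.
  return JSON.stringify(String(value));
}

function toFrontMatter(meta) {
  const lines = ['---'];
  for (const key of FRONT_MATTER_FIELDS) {
    const value = meta[key];
    // Skip fields that are missing or empty.
    if (value === undefined || value === null || value === '') continue;
    lines.push(`${key}: ${toYamlScalar(value)}`);
  }
  lines.push('---');
  return lines.join('\n');
}

// Example with made-up metadata.
const sample = { title: 'Hello, "world"', author: 'Jane Doe', description: '', wordCount: 1200 };
console.log(toFrontMatter(sample));
// ---
// title: "Hello, \"world\""
// author: "Jane Doe"
// wordCount: 1200
// ---
```

To use it on real output, you could read the CLI’s JSON from stdin, JSON.parse it, and prepend the returned block to the Markdown body before saving the file.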
Node API
Defuddle can be used programmatically in a Node environment:
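// Run as an ES module (e.g. a .mjs file) so that top-level await is allowed.
import { JSDOM } from 'jsdom';
import { Defuddle } from 'defuddle/node';
// Fetch the page and build a DOM with jsdom.
const dom = await JSDOM.fromURL('https://example.com/article');
// Extract the main content; markdown: true returns Markdown instead of HTML.
const result = await Defuddle(dom, 'https://example.com/article', { markdown: true });
console.log(result.title);
console.log(result.content);
The result object exposes the same fields as the CLI’s JSON output: title, author, content, description, domain, published, wordCount, and so on.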
Metadata
When you request JSON output, Defuddle returns the following metadata fields:
- title
- author
- description
- domain
- published (publication date)
- wordCount
- Other fields useful for front‑matter generation (this list is not exhaustive)
Implementation Details
Defuddle is designed as a replacement for Mozilla’s Readability library, the engine behind Firefox’s Reader View. Compared with Readability, it uses a more forgiving extraction strategy that removes fewer elements when the page structure is uncertain, and it standardises the HTML it returns. The result is consistent handling of footnotes, code blocks, and math elements, which matters when the output is fed into a Markdown converter.
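To make the trade-off behind a more forgiving extraction strategy concrete, here is a toy sketch. It is not Defuddle’s algorithm. It uses link density (the share of a block’s text that sits inside links), one of the signals Readability-style extractors look at, and shows how loosening a single threshold keeps a borderline block that a stricter filter would discard. The block shape, the ids, and both thresholds are invented for illustration.

```javascript
// Toy illustration only: this is NOT Defuddle's or Readability's code.
// Each block is a hypothetical summary of one DOM node: how many characters of
// text it contains, and how many of those characters sit inside links.
function linkDensity(block) {
  // Treat empty blocks as pure boilerplate.
  return block.textLength === 0 ? 1 : block.linkTextLength / block.textLength;
}

// A strict filter drops anything moderately link-heavy; a forgiving one only
// drops blocks that are mostly links (navigation menus, tag clouds).
// The thresholds 0.25 and 0.5 are arbitrary values chosen for this example.
function keepBlocks(blocks, { forgiving = false } = {}) {
  const maxDensity = forgiving ? 0.5 : 0.25;
  return blocks.filter((block) => linkDensity(block) <= maxDensity);
}

const blocks = [
  { id: 'article-body', textLength: 2000, linkTextLength: 100 }, // density 0.05
  { id: 'cited-paragraph', textLength: 300, linkTextLength: 120 }, // density 0.4
  { id: 'nav-menu', textLength: 80, linkTextLength: 80 }, // density 1.0
];

console.log(keepBlocks(blocks).map((b) => b.id)); // [ 'article-body' ]
console.log(keepBlocks(blocks, { forgiving: true }).map((b) => b.id)); // [ 'article-body', 'cited-paragraph' ]
```

A real extractor weighs many signals together, so read this only as an illustration of why a forgiving setting accepts a little extra clutter in exchange for a lower risk of deleting genuine content.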
Status
Defuddle is under active development and still a work in progress. It was created by kepano (Steph Ango, CEO of Obsidian) as the extraction layer for the Obsidian Web Clipper, so if you use the Web Clipper, you are already using Defuddle.
Playground
Try the interactive playground before building anything on top of the library.