I tried Cloudflare’s “Markdown for Agents” idea in NGINX (Rust module) — early prototype
Source: Dev.to
Introduction
Cloudflare recently shipped Markdown for Agents: if a client sends Accept: text/markdown, Cloudflare can fetch your HTML and return a Markdown variant.
Inspired by this, I built a self‑hostable NGINX dynamic module that does something similar on your own infrastructure. It is an early prototype, meant mainly to let people try the workflow and share feedback.
Repository:
How it works
| Client | Request header | Result |
|---|---|---|
| Browser | Accept: text/html | Original HTML |
| Agent | Accept: text/markdown | NGINX converts upstream HTML to Markdown and returns text/markdown |
- No application changes are required; the conversion sits entirely at the reverse‑proxy layer.
- Best for: documentation, blogs, news, knowledge‑base pages.
- Not suitable for: APIs, streaming responses, or authenticated pages (unless you handle caching carefully).
Agents and LLM tools often fetch full HTML and waste tokens on navigation, footers, cookie banners, layout markup, scripts, and noisy attributes. A Markdown variant can make downstream parsing cheaper and more predictable.
Installation
# Install the module
curl -sSL https://raw.githubusercontent.com/cnkang/nginx-markdown-for-agents/main/tools/install.sh | sudo bash
# Test and reload NGINX
sudo nginx -t && sudo nginx -s reload
Note: Dynamic modules must match your exact NGINX patch version (nginx -v). If a matching build isn’t available, you may need to compile the module yourself.
Verifying the Markdown Variant
# Request Markdown
curl -sD - -o /dev/null -H "Accept: text/markdown" http://localhost:8080/ | grep -iE 'content-type|vary'
# Expected output:
# content-type: text/markdown; charset=utf-8
# vary: Accept
Verifying the HTML Variant
curl -sD - -o /dev/null -H "Accept: text/html" http://localhost:8080/ | grep -i 'content-type'
Sample Request
curl -s -H "Accept: text/markdown" http://localhost:8080/ | head -40
Configuration
Start small—enable the filter on a single route first.
load_module modules/ngx_http_markdown_filter_module.so;
http {
    markdown_filter off;

    server {
        listen 8080;

        location /docs/ {
            markdown_filter on;
            # Recommended: avoid upstream compression for clean conversion
            proxy_set_header Accept-Encoding "";
            proxy_pass http://backend;
        }
    }
}
Fail‑open (recommended for trials)
If conversion fails, the original HTML is returned:
markdown_on_error pass;
Limiting Work
markdown_max_size 10m;
markdown_timeout 5s;
Metrics Endpoint (localhost only)
location /markdown-metrics {
    markdown_metrics;
}
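The snippet above does not show any access rule, and I have not confirmed here whether the module enforces the localhost restriction on its own. If it does not, a minimal sketch using NGINX's standard allow/deny directives would make the restriction explicit:

```nginx
# Sketch: restrict the metrics endpoint to loopback clients.
# allow/deny come from the stock ngx_http_access_module; whether the
# markdown module already enforces this by itself is an assumption to verify.
location /markdown-metrics {
    allow 127.0.0.1;   # IPv4 loopback
    allow ::1;         # IPv6 loopback
    deny  all;         # everyone else gets 403
    markdown_metrics;
}
```

With this in place, a request from another host should receive a 403 instead of the metrics output.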
Cache Considerations
If you cache at NGINX or a CDN, ensure variants are split by the Accept header:
proxy_cache_key "$scheme$request_method$host$request_uri$http_accept";
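Keying on the raw Accept header works, but every distinct Accept string creates its own cache entry. One possible refinement is to collapse the header into a small variant token with a map. This is a sketch, not the module's documented setup: $markdown_variant is a name I made up, and it assumes the module converts whenever text/markdown appears anywhere in Accept. If the module weighs q-values, the map would need to mirror that logic.

```nginx
# Goes in the http context (map is only valid there).
# $markdown_variant is a hypothetical variable name used for illustration.
map $http_accept $markdown_variant {
    default            "html";
    "~*text/markdown"  "markdown";   # case-insensitive substring match
}

# Two cache entries per URL at most: one HTML, one Markdown.
proxy_cache_key "$scheme$request_method$host$request_uri$markdown_variant";
```

The trade-off is a much better hit rate in exchange for a coarser decision, so the map and the module's own negotiation must agree on which requests get Markdown.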
Caveats
- Edge cases will exist (weird HTML, giant pages, odd encodings).
- The module focuses on HTML → Markdown only (no PDFs or arbitrary binaries).
- Caching needs care (variant keys + auth‑aware behavior).
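For the auth‑aware part, one common approach is to skip the cache entirely whenever a request carries credentials. The sketch below uses only stock proxy-module directives. The zone name docs_cache and the cookie name session are placeholders I chose for illustration, so adapt both to your setup.

```nginx
location /docs/ {
    markdown_filter on;

    # Hypothetical zone name; it must be declared elsewhere with proxy_cache_path.
    proxy_cache docs_cache;

    # If either value is non-empty (and not "0"), the response is neither
    # served from the cache nor stored in it.
    # $cookie_session assumes a session cookie literally named "session".
    proxy_cache_bypass $http_authorization $cookie_session;
    proxy_no_cache     $http_authorization $cookie_session;

    proxy_set_header Accept-Encoding "";
    proxy_pass http://backend;
}
```

This keeps personalized pages out of the shared cache, while anonymous requests still benefit from it.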
If you encounter a broken page, a very slow page, or a caching issue, please open an issue with:
- A sample URL (or anonymized HTML)
- Output of nginx -v
- Whether the upstream is compressed
- Any cache/CDN in front