Markdown pages, are they a good solution?
Source: Dev.to
Introduction
I already wrote a post about an HTML-to-markdown converter solution. In that post I suggested that handling the conversion at the application level would be a better solution. In the last week, projects like Laravel markdown response and Symfony markdown response bundle have appeared, and I expect similar solutions for other web frameworks.
I consider those solutions to be partial fixes because they lack tools to trim or augment page content for an LLM to act on. If you want to provide LLM‑friendly content, a backend solution is preferable to a frontend one; fully rendered HTML belongs to the frontend.
The elephant in the room is that webpages are a human construct. An AI scraper doesn't need to follow human navigation; page links are useless if it can grep the content.
How did we get here?
With search‑engine bots, the goal was to discover all pages of a website, index them, and rank them. The purpose of AI bots, however, is to scrape content from websites to use as additional knowledge for an LLM.
While scraping was part of what search-engine bots did, the content itself was not the main objective. Search-engine bots also account for a minor portion of traffic and serve a marketing purpose by exposing the site to a larger audience.
AI bots are becoming a substantial part of traffic, yet they haven't demonstrated clear marketing value or other benefits, so it seems logical that the first reaction was to block AI traffic. Compare this with food products: when people discovered that food wrappers contained less food, or that food companies lowered product quality once sales plateaued, they were unhappy, but they kept buying the product. I think we are at a similar point with AI: site owners are unhappy with the scrapers, but they allow them anyway because they could turn out to be beneficial.
What is the solution?
If you want to provide data for an LLM, consider separating the LLM‑focused site from the human‑focused site.
- LLM website – can be nothing more than a collection of linked markdown files.
- Search layer – returns data that an LLM or agent can use, providing specific information not found on the LLM site.
RESTful or GraphQL endpoints are not ideal because their output isn’t LLM‑specific.
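To make the two parts more concrete, here is a minimal Python sketch, not a definitive implementation. It assumes the markdown pages are already loaded in memory. The names `Page`, `build_index`, and `search` are hypothetical and only serve the illustration. The search returns markdown text instead of JSON, so the output can go straight into an LLM prompt.

```python
from dataclasses import dataclass


@dataclass
class Page:
    path: str  # hypothetical relative URL of the markdown file
    body: str  # raw markdown content


def build_index(pages: list[Page]) -> str:
    """Return an index page that links every markdown file from one entry point."""
    lines = ["# Index", ""]
    for page in sorted(pages, key=lambda p: p.path):
        # Use the first heading as link text, falling back to the path.
        title = next(
            (
                line.lstrip("# ").strip()
                for line in page.body.splitlines()
                if line.startswith("#")
            ),
            page.path,
        )
        lines.append(f"- [{title}]({page.path})")
    return "\n".join(lines) + "\n"


def search(pages: list[Page], query: str, limit: int = 3) -> str:
    """Return matching paragraphs as markdown, ranked by matched query terms."""
    terms = query.lower().split()
    hits = []
    for page in pages:
        for paragraph in page.body.split("\n\n"):
            text = paragraph.lower()
            # Naive substring matching: "pro" also matches "product".
            score = sum(1 for term in terms if term in text)
            if score:
                hits.append((score, page.path, paragraph.strip()))
    # Highest score first; the path is a tie-breaker for stable output.
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    if not hits:
        return "No results.\n"
    return "\n\n".join(
        f"Source: {path}\n\n{paragraph}" for _, path, paragraph in hits[:limit]
    ) + "\n"
```

In a real setup the index would be written to a static file next to the pages, and the search would sit behind its own endpoint. The ranking here is deliberately simple; a proper search engine could replace it without changing the markdown output.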
Benefits
- Static markdown pages – can be hosted on edge servers, scaling efficiently with regional traffic spikes.
- Search with paywall – allows you to monetize AI scrapers, offering searchable or extra content for a fee.
- Human-centric traffic – HTML page traffic will become more human again once the operators of AI scrapers adopt these options.
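The paywall idea could be sketched as a credit check in front of the search. This is only an illustration under assumptions: the `credits` dictionary stands in for a real billing store, and `PaymentRequired` and `paid_search` are hypothetical names.

```python
from typing import Callable


class PaymentRequired(Exception):
    """Raised when an API key is unknown or has no credits left."""


def paid_search(
    credits: dict[str, int],
    api_key: str,
    query: str,
    run_search: Callable[[str], str],
) -> str:
    """Spend one credit per search; refuse when the key has none."""
    remaining = credits.get(api_key, 0)
    if remaining <= 0:
        raise PaymentRequired("no credits left for this key")
    # Assumption: one search costs one credit, charged before searching.
    credits[api_key] = remaining - 1
    return run_search(query)
```

A web framework could translate the exception into an HTTP 402 Payment Required response, while the static markdown pages stay free to fetch.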