I got tired of building the same link preview function, so I made it an API
Source: Dev.to

Every app I’ve built in the last few years has needed the same thing: paste a URL, show a preview card. Slack does it, Discord does it, every CMS does it. Each time I ended up writing the same Cheerio scraping code, handling the same edge cases with Open Graph tags, and debugging the same issue where Twitter Cards use name instead of property and half the internet gets it wrong.
A little while ago I finally extracted all of that into a standalone API and put it on RapidAPI. Figured other people are writing the same code too.
What it actually returns
You pass a URL. You get back structured metadata across six layers:
- Open Graph – title, description, images with dimensions, article metadata
- Twitter Cards – card type, site, creator; only tags that are actually present, no fallback guessing
- HTML meta – title tag, meta description, canonical, theme color
- Icons – auto‑selects the highest‑quality favicon with a priority chain
- Feeds – discovers RSS, Atom, and JSON Feed links
- JSON‑LD – parses all script blocks, prefers `Article`/`Product` over `BreadcrumbList`
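The icon and JSON‑LD bullets both come down to a ranked pick over several candidates. Here is a minimal sketch of that idea, not the API's actual code: `pickByRank`, `IconLink`, `JsonLdNode`, and the two rank functions are hypothetical names, and the ordering is an assumption (Apple touch icons, then standard icons; `Article`/`Product` ahead of `BreadcrumbList`).

```typescript
// Return the item with the lowest rank; ties keep document order.
function pickByRank<T>(items: T[], rank: (item: T) => number): T | undefined {
  let best: T | undefined;
  let bestRank = Infinity;
  for (const item of items) {
    const r = rank(item);
    if (r < bestRank) {
      best = item;
      bestRank = r;
    }
  }
  return best;
}

// Hypothetical shape for a parsed <link> icon tag.
interface IconLink {
  rel: string;
  href: string;
}

// Assumed order: Apple touch icons, then standard icons, then anything else.
function iconRank(icon: IconLink): number {
  const rel = icon.rel.toLowerCase();
  if (rel.indexOf("apple-touch-icon") !== -1) return 0;
  if (rel === "icon" || rel === "shortcut icon") return 1;
  return 2;
}

// Hypothetical shape for a parsed JSON-LD block.
interface JsonLdNode {
  "@type"?: string | string[];
}

// Assumed order: Article and Product first, BreadcrumbList last.
function jsonLdRank(node: JsonLdNode): number {
  const raw = node["@type"];
  const types = typeof raw === "string" ? [raw] : raw ?? [];
  if (types.indexOf("Article") !== -1 || types.indexOf("Product") !== -1) return 0;
  if (types.indexOf("BreadcrumbList") !== -1) return 2;
  return 1;
}
```

When `pickByRank` returns `undefined` for icons, a caller could fall back to `/favicon.ico` resolved against the page URL.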
The response gives you both a merged top‑level view (where the title comes from whichever source has it; OG first, then Twitter, then the title tag) and the raw parsed layers, so you can apply your own logic.
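The merge order for the title can be sketched as a first-non-empty lookup. The `Layers` shape and the `mergeTitle` name are assumptions for illustration, not the API's actual field names or internals.

```typescript
// Hypothetical shape of the raw parsed layers; field names are assumptions.
interface Layers {
  openGraph: { title?: string };
  twitter: { title?: string };
  html: { title?: string };
}

// Return the first non-empty title: Open Graph, then Twitter, then <title>.
function mergeTitle(layers: Layers): string | undefined {
  const candidates = [
    layers.openGraph.title,
    layers.twitter.title,
    layers.html.title,
  ];
  for (const candidate of candidates) {
    const trimmed = candidate?.trim();
    if (trimmed) return trimmed;
  }
  return undefined;
}
```

Because the raw layers are returned as well, a caller who prefers a different order can run their own version of this over the same data. The sample response that follows shows the merged fields next to the raw layers.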
{
  "title": "GitHub · Build and ship software on a single, collaborative platform",
  "description": "Join the world's most widely adopted...",
  "image": {
    "url": "https://github.githubassets.com/images/modules/site/social-cards/campaign-social.png",
    "width": 1200,
    "height": 630
  },
  "favicon": "https://github.githubassets.com/favicons/favicon.svg",
  "siteName": "GitHub",
  "type": "website",
  "themeColor": "#1e2327",
  "openGraph": { ... },
  "twitter": { ... },
  "feeds": [],
  "jsonLd": { "@type": "WebSite", ... },
  "responseTime": 234
}
The parts that were annoying to get right
- OG tags use `property`, Twitter uses `name`. That is what the two specs say, but many sites swap them, so the parser checks both attributes for both prefixes.
- Multiple `og:image` tags are valid. The OG spec supports arrays by repeating the tag. Structured properties like `og:image:width` apply to the most recently declared `og:image`. Most scrapers just grab the first one and ignore the rest.
- JSON‑LD blocks are a mess. A typical news article page can contain several JSON‑LD blocks (e.g., a `BreadcrumbList`, an `Organization`, and the actual `Article`). You need to parse all of them and pick the right one.
- Favicons have a priority order. Apple touch icons at 180×180 are usually the highest quality, then standard icons at 32×32, then the generic `/favicon.ico` fallback. Most implementations just grab the first icon link they find.
- Relative URLs everywhere. OG images and feed links are often relative paths. You need the effective URL (after redirects) as the base to resolve them correctly.
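Three of the points above (swapped attributes, repeated `og:image` tags, and relative URLs) meet in the same loop. This is a minimal sketch rather than the service's real parser: it assumes the `<meta>` tags have already been extracted in document order into a hypothetical `MetaTag` shape, and `collectOgImages` is an illustrative name.

```typescript
// Hypothetical shape: one <meta> tag reduced to the attributes that matter here.
interface MetaTag {
  property?: string;
  name?: string;
  content?: string;
}

interface OgImage {
  url: string;
  width?: number;
  height?: number;
}

// Read the key from either attribute, since sites often swap property and name.
function metaKey(tag: MetaTag): string {
  return (tag.property || tag.name || "").trim().toLowerCase();
}

// Walk the tags in document order. Each og:image starts a new entry, and
// og:image:width / og:image:height attach to the most recently declared one.
function collectOgImages(tags: MetaTag[], effectiveUrl: string): OgImage[] {
  const images: OgImage[] = [];
  let current: OgImage | undefined;
  for (const tag of tags) {
    const key = metaKey(tag);
    const content = (tag.content || "").trim();
    if (content === "") continue;

    if (key === "og:image") {
      try {
        // Resolve relative paths against the post-redirect URL.
        current = { url: new URL(content, effectiveUrl).href };
        images.push(current);
      } catch {
        current = undefined; // unparseable URL: skip it and its dimensions
      }
    } else if (current && (key === "og:image:width" || key === "og:image:height")) {
      const value = parseInt(content, 10);
      if (!isNaN(value)) {
        if (key === "og:image:width") current.width = value;
        else current.height = value;
      }
    }
  }
  return images;
}
```

Passing the effective URL in explicitly keeps the function pure, so the caller decides which URL counts as the base after redirects.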
The technical approach
The service runs on a Fastify server hosted on a VPS, using Cheerio for HTML parsing. No headless browser or Puppeteer—just fetch the HTML and parse it, keeping response times under 500 ms for cache misses and under 5 ms for cache hits.
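The post does not say how the cache is built, so the following is purely an assumption: a minimal in-memory cache with a per-entry time-to-live, keyed by URL. `TtlCache` and its injectable clock are hypothetical names chosen for the sketch.

```typescript
// Minimal in-memory cache with a per-entry time-to-live (TTL).
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  // The clock is injectable so expiry can be tested without waiting.
  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```

A hit here is a single `Map` lookup, which is consistent with the gap between hit and miss latency described above. A miss still pays for the full fetch and parse.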
SSRF protection was the most time‑consuming part. Because the API accepts arbitrary URLs, it first resolves the hostname, checks the resulting IP against a blocklist of private ranges, and then connects directly to the resolved IP to prevent DNS rebinding attacks.
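The resolve, check, and pin steps can be sketched roughly like this. It is an illustration under stated assumptions, not the service's code: the range list is my own guess at "private ranges", only IPv4 is handled (anything else fails closed), and `isBlockedIPv4` and `pinTarget` are hypothetical names. DNS resolution itself is left to the caller.

```typescript
// Parse a dotted-quad IPv4 address into an unsigned integer, or null.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

// Assumed blocklist of non-public IPv4 ranges as [base address, prefix length].
const BLOCKED_V4: Array<[string, number]> = [
  ["0.0.0.0", 8], // "this network"
  ["10.0.0.0", 8], // RFC 1918 private
  ["100.64.0.0", 10], // carrier-grade NAT
  ["127.0.0.0", 8], // loopback
  ["169.254.0.0", 16], // link-local, includes cloud metadata endpoints
  ["172.16.0.0", 12], // RFC 1918 private
  ["192.168.0.0", 16], // RFC 1918 private
];

function isBlockedIPv4(ip: string): boolean {
  const addr = ipv4ToInt(ip);
  if (addr === null) return true; // fail closed on anything unparseable
  for (const [base, prefix] of BLOCKED_V4) {
    const baseInt = ipv4ToInt(base) as number;
    const size = Math.pow(2, 32 - prefix); // number of addresses in the range
    if (Math.floor(addr / size) === Math.floor(baseInt / size)) return true;
  }
  return false;
}

interface PinnedTarget {
  connectIp: string; // the vetted address the socket should connect to
  hostHeader: string; // original hostname, still sent as Host and TLS SNI
}

// Reject the request if any resolved address is blocked; otherwise pin it
// to the first vetted address so no second DNS lookup can change the target.
function pinTarget(url: string, resolvedIps: string[]): PinnedTarget {
  const { hostname } = new URL(url);
  const [first] = resolvedIps;
  if (first === undefined) throw new Error(`no addresses for ${hostname}`);
  for (const ip of resolvedIps) {
    if (isBlockedIPv4(ip)) throw new Error(`blocked address ${ip} for ${hostname}`);
  }
  return { connectIp: first, hostHeader: hostname };
}
```

In Node, the address list could come from `dns.promises.lookup(hostname, { all: true })`. The point of pinning is that the address that was checked is the address that gets connected to, which closes the window a rebinding attack relies on.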
I also built a Text Analytics API on the same infrastructure: pass in text and get back readability scores (Flesch‑Kincaid, Coleman‑Liau, SMOG, etc.), keyword density, bigrams, trigrams, and estimated reading time. It is all pure math on strings with sub‑10 ms responses, which is useful for content‑optimization tools and writing assistants.
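To give a sense of what "pure math on strings" means, here is a rough sketch of one of those scores. The Flesch‑Kincaid grade formula is the standard one. Everything else is an assumption for illustration: the syllable counter is a crude heuristic, the 200 words-per-minute default is a guess, and `analyzeText` is a hypothetical name, not the API's.

```typescript
// Rough English syllable estimate: count vowel groups and discount a silent
// trailing "e". Real implementations usually add exception lists.
function countSyllables(word: string): number {
  const letters = word.toLowerCase().replace(/[^a-z]/g, "");
  if (letters.length === 0) return 0;
  const groups = letters.match(/[aeiouy]+/g);
  let count = groups ? groups.length : 0;
  if (count > 1 && /[^aeiouy]e$/.test(letters)) count -= 1;
  return Math.max(count, 1);
}

interface TextMetrics {
  words: number;
  sentences: number;
  fleschKincaidGrade: number;
  readingTimeMinutes: number;
}

// wordsPerMinute defaults to 200, an assumed average reading speed.
function analyzeText(text: string, wordsPerMinute: number = 200): TextMetrics {
  const words = text.split(/\s+/).filter((w) => /[A-Za-z0-9]/.test(w));
  const sentences = text.split(/[.!?]+/).filter((s) => s.trim() !== "");
  const wordCount = words.length;
  if (wordCount === 0) {
    return { words: 0, sentences: 0, fleschKincaidGrade: 0, readingTimeMinutes: 0 };
  }
  const sentenceCount = Math.max(sentences.length, 1);

  let syllables = 0;
  for (const word of words) syllables += countSyllables(word);

  // Flesch-Kincaid grade level.
  const grade =
    0.39 * (wordCount / sentenceCount) + 11.8 * (syllables / wordCount) - 15.59;

  return {
    words: wordCount,
    sentences: sentenceCount,
    fleschKincaidGrade: grade,
    readingTimeMinutes: wordCount / wordsPerMinute,
  };
}
```

Nothing here touches the network or a model, which is why this kind of endpoint can answer in a few milliseconds.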
Try them
Both APIs are on RapidAPI with a free tier (500 requests/month):
- LinkPreview – URL metadata extraction
- TextAnalytics – readability scores, keyword density, text metrics
The free tier is enough to test and prototype. If you encounter edge cases the parser doesn’t handle well, I’d genuinely like to know—parsing the wild HTML of the internet is a forever project.