I built a Screenshot & Metadata API that extracts 50+ fields from any URL
Source: Dev.to
What it does
- URL to Screenshot – capture any webpage as PNG, JPEG, or WebP
- URL to PDF – generate PDFs with custom format, margins, orientation
- Metadata Extraction – 50+ fields from any URL (see below)
- HTML to Image – render custom HTML/CSS to PNG/JPEG/WebP
The Metadata Endpoint
A single GET request extracts:
Basic SEO
- title, description, keywords, author, language, charset, viewport, robots, canonical URL, generator
Open Graph
- og:title, og:description, og:image (+ dimensions), og:url, og:type, og:site_name, og:locale
Twitter Card
Icons & Theme
- favicon, apple-touch-icon, manifest, theme-color, color-scheme
Content Analysis
- first h1 text, h2 count, internal links count, external links count, images count, images without alt text, forms count, scripts count, stylesheets count, word count
Structured Data
- JSON‑LD (Schema.org) parsed and returned
Feeds
- RSS/Atom feeds auto‑detected
Raw dump
- All meta tags as key‑value pairs
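As a rough Python sketch of reading this response: the JavaScript quick start further down reads `title`, `og_image`, and `word_count` from under a top-level `data` key, so the helpers below assume that envelope. `fetch_metadata` and `summarize` are hypothetical names, not part of the API, and the sample payload is made up so the snippet runs without a key.

```python
import json
import urllib.parse
import urllib.request

API_HOST = "screenshot-pdf-api.p.rapidapi.com"


def fetch_metadata(target_url: str, api_key: str) -> dict:
    """Call GET /v1/metadata and return the parsed JSON body."""
    query = urllib.parse.urlencode({"url": target_url})
    request = urllib.request.Request(
        f"https://{API_HOST}/v1/metadata?{query}",
        headers={"X-RapidAPI-Key": api_key, "X-RapidAPI-Host": API_HOST},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def summarize(payload: dict) -> dict:
    """Pick out a few fields, tolerating any that are missing."""
    fields = payload.get("data", {})  # assumed envelope: {"data": {...}}
    return {
        "title": fields.get("title"),
        "og_image": fields.get("og_image"),
        "word_count": fields.get("word_count"),
    }


# Made-up payload so the sketch runs without a network call or API key.
sample = {"data": {"title": "Example Domain", "word_count": 30}}
print(summarize(sample))
# {'title': 'Example Domain', 'og_image': None, 'word_count': 30}
```

With a real key, `summarize(fetch_metadata("https://github.com", "YOUR_KEY"))` would follow the same path. Using `.get()` keeps the code from raising when a page has no Open Graph image or other optional tag.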
Quick Start (Python)
import requests

headers = {
    "X-RapidAPI-Key": "YOUR_KEY",
    "X-RapidAPI-Host": "screenshot-pdf-api.p.rapidapi.com",
}

# Screenshot a website
response = requests.get(
    "https://screenshot-pdf-api.p.rapidapi.com/v1/screenshot",
    headers=headers,
    params={"url": "https://github.com", "width": 1280, "format": "png"},
    timeout=60,
)
response.raise_for_status()  # fail on an HTTP error instead of saving the error body as a PNG

# Write the raw image bytes to disk
with open("screenshot.png", "wb") as f:
    f.write(response.content)
print(f"Saved {len(response.content)} bytes")

Quick Start (JavaScript)
// Screenshot
const response = await fetch(
  "https://screenshot-pdf-api.p.rapidapi.com/v1/screenshot?url=https://github.com&format=png",
  {
    headers: {
      "X-RapidAPI-Key": "YOUR_KEY",
      "X-RapidAPI-Host": "screenshot-pdf-api.p.rapidapi.com"
    }
  }
);
const blob = await response.blob();

// Metadata
const meta = await fetch(
  "https://screenshot-pdf-api.p.rapidapi.com/v1/metadata?url=https://github.com",
  {
    headers: {
      "X-RapidAPI-Key": "YOUR_KEY",
      "X-RapidAPI-Host": "screenshot-pdf-api.p.rapidapi.com"
    }
  }
);
const data = await meta.json();
console.log(data.data.title); // "GitHub · Build and ship software..."
console.log(data.data.og_image); // "https://..."
console.log(data.data.word_count); // 834

cURL
# Screenshot
curl -o screenshot.png \
  -H "X-RapidAPI-Key: YOUR_KEY" \
  -H "X-RapidAPI-Host: screenshot-pdf-api.p.rapidapi.com" \
  "https://screenshot-pdf-api.p.rapidapi.com/v1/screenshot?url=https://github.com"

# Full page capture
curl -o fullpage.png \
  -H "X-RapidAPI-Key: YOUR_KEY" \
  -H "X-RapidAPI-Host: screenshot-pdf-api.p.rapidapi.com" \
  "https://screenshot-pdf-api.p.rapidapi.com/v1/screenshot?url=https://en.wikipedia.org&full_page=true"

Endpoints
| Endpoint | Description | Tier |
|---|---|---|
| GET /v1/screenshot | Screenshot URL to PNG/JPEG/WebP | Free |
| GET /v1/health | API status & queue depth | Free |
| GET /v1/pdf | Generate PDF from URL | Basic |
| GET /v1/metadata | Extract 50+ metadata fields | Basic |
| POST /v1/screenshot/html | Render HTML/CSS to image | Pro |
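The GET endpoints all take the target page as a `url` query parameter. The quick-start examples pass it unencoded, which is fine for a bare domain, but a target URL with its own query string needs percent-encoding or its `&page=2` would be read as a parameter of the API request. A minimal sketch using only the standard library follows; `build_url` is a hypothetical helper, not something the API ships.

```python
from urllib.parse import urlencode

BASE_URL = "https://screenshot-pdf-api.p.rapidapi.com"


def build_url(path: str, **params) -> str:
    """Return a full request URL with every parameter percent-encoded."""
    # Booleans are lowercased to match the full_page=true style of the cURL example.
    cleaned = {
        key: str(value).lower() if isinstance(value, bool) else value
        for key, value in params.items()
        if value is not None
    }
    return f"{BASE_URL}{path}?{urlencode(cleaned)}"


print(build_url(
    "/v1/screenshot",
    url="https://example.com/search?q=a&page=2",
    full_page=True,
))
# https://screenshot-pdf-api.p.rapidapi.com/v1/screenshot?url=https%3A%2F%2Fexample.com%2Fsearch%3Fq%3Da%26page%3D2&full_page=true
```

Libraries such as `requests` do the same encoding when the values are passed through `params=` rather than pasted into the URL string.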
Screenshot Parameters
| Param | Default | Description |
|---|---|---|
| url | required | URL to capture |
| width | 1280 | Viewport width |
| height | 800 | Viewport height |
| format | png | png, jpeg, webp |
| quality | 85 | JPEG/WebP quality (1‑100) |
| full_page | false | Capture entire scrollable page |
| delay | 0 | Wait N seconds before capture (0‑5) |
| selector | null | CSS selector to capture specific element |
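One way to use this table from Python is a small builder that fills in the documented defaults and rejects values outside the documented ranges before any request is sent. This is a client-side sketch only: `screenshot_params` is a hypothetical helper, and how the API itself responds to out-of-range values is not covered here.

```python
ALLOWED_FORMATS = {"png", "jpeg", "webp"}


def screenshot_params(url, width=1280, height=800, image_format="png",
                      quality=85, full_page=False, delay=0, selector=None):
    """Build a query-parameter dict for GET /v1/screenshot."""
    if image_format not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}")
    if not 1 <= quality <= 100:
        raise ValueError("quality must be between 1 and 100")
    if not 0 <= delay <= 5:
        raise ValueError("delay must be between 0 and 5 seconds")

    params = {
        "url": url,
        "width": width,
        "height": height,
        "format": image_format,
        "full_page": str(full_page).lower(),  # "true" / "false"
        "delay": delay,
    }
    # The table lists quality as a JPEG/WebP setting, so it is omitted for PNG.
    if image_format != "png":
        params["quality"] = quality
    # Only send a selector when one was actually requested.
    if selector is not None:
        params["selector"] = selector
    return params


print(screenshot_params("https://github.com", image_format="webp",
                        quality=70, full_page=True))
```

The returned dict can be passed as `params=` to `requests.get`, as in the Python quick start.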
Use Cases
- Social media previews – generate Open Graph images
- PDF reports – convert dashboards and pages to PDF
- Web scraping – screenshots and metadata from the same API
- Thumbnails – generate website thumbnails at scale
- SEO auditing – check OG tags, missing alt text, structured data
- Link previews – build rich preview cards
- Visual regression testing – automated screenshots for QA
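To make the SEO auditing case concrete, here is a hedged sketch that turns a metadata response into a list of warnings. Only `title`, `og_image`, and `word_count` appear as response keys in this post; the other key names used below (`description`, `og_title`, `images_without_alt`, `json_ld`) are assumptions modeled on the field list and may differ in the real response. `audit` is a hypothetical helper and the sample data is invented.

```python
def audit(fields: dict) -> list[str]:
    """Return human-readable warnings for common SEO gaps."""
    warnings = []

    # Tags that link previews and search snippets usually rely on.
    for key in ("title", "description", "og_title", "og_image"):
        if not fields.get(key):
            warnings.append(f"missing {key}")

    missing_alt = fields.get("images_without_alt", 0)  # assumed key name
    if missing_alt:
        warnings.append(f"{missing_alt} image(s) without alt text")

    if not fields.get("json_ld"):  # assumed key name
        warnings.append("no JSON-LD structured data")

    return warnings


# Invented sample standing in for the "data" object of a metadata response.
sample = {
    "title": "Example",
    "og_image": "https://example.com/og.png",
    "images_without_alt": 3,
}
for warning in audit(sample):
    print(warning)
# missing description
# missing og_title
# 3 image(s) without alt text
# no JSON-LD structured data
```

Running this over a list of URLs gives a quick per-page report without parsing any HTML locally.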
Pricing vs Competitors
| Feature | This API | ScreenshotOne | URLBox |
|---|---|---|---|
| Free tier | 20/day | 100 one‑time | None |
| Basic plan | $9/mo | $17/mo | $19/mo |
| Metadata extraction | 50+ fields | No | No |
| JSON‑LD parsing | Yes | No | No |
| Content analysis | Yes | No | No |
Try it
Built with FastAPI + Playwright (headless Chromium). Hosted on a Hetzner VPS.