How I built a Bluesky scraper using the AT Protocol API (and published it on Apify)
Source: Dev.to
Why Bluesky is easy to scrape (legitimately)
Most social‑media scrapers fight Cloudflare, rotating proxies, and terms‑of‑service grey areas. Bluesky is different. The AT Protocol was explicitly designed for third‑party clients and data access. The public API at public.api.bsky.app serves unauthenticated read requests. There’s no fingerprinting, no CAPTCHA, no DOM parsing.
The only wrinkle: the search endpoint (app.bsky.feed.searchPosts) now requires authentication via a free App Password. Everything else — author feeds, threads, profiles — works without a token.
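As a rough illustration of an unauthenticated read, here is a minimal sketch that pages through an author feed on the public host. The function names (`authorFeedUrl`, `fetchAuthorFeedPage`) are hypothetical and the page size of 50 is an assumption; only the host and the `app.bsky.feed.getAuthorFeed` method come from the API itself.

```typescript
// Sketch only: helper names and the default page size are assumptions.
const PUBLIC_API = "https://public.api.bsky.app";

// Build the XRPC URL for app.bsky.feed.getAuthorFeed (no auth header needed).
function authorFeedUrl(actor: string, limit = 50, cursor?: string): string {
  const url = new URL("/xrpc/app.bsky.feed.getAuthorFeed", PUBLIC_API);
  url.searchParams.set("actor", actor);
  url.searchParams.set("limit", String(limit));
  if (cursor) url.searchParams.set("cursor", cursor);
  return url.toString();
}

// Fetch one page. The response carries a `feed` array and, when more
// results exist, a `cursor` to pass into the next call.
async function fetchAuthorFeedPage(
  actor: string,
  cursor?: string,
): Promise<{ feed: unknown[]; cursor?: string }> {
  const res = await fetch(authorFeedUrl(actor, 50, cursor));
  if (!res.ok) throw new Error(`getAuthorFeed failed: HTTP ${res.status}`);
  return (await res.json()) as { feed: unknown[]; cursor?: string };
}
```

Calling `fetchAuthorFeedPage` in a loop until `cursor` comes back undefined would walk the whole feed.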
The three modes I built
I wanted one actor that covered the main B2B use cases:
- Search posts – keyword and hashtag search with date range, language filter, and sort order. Uses bsky.social/xrpc/app.bsky.feed.searchPosts with a Bearer token.
- Author feed – pull all posts from one or more handles. No auth needed. Useful for competitor monitoring or auditing a creator’s content history.
- Thread – fetch a full conversation tree from a post URL. The API returns a nested tree; I flatten it depth‑first so you get a clean ordered list of posts.
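The depth‑first flattening in thread mode can be sketched as follows. The `ThreadNode` shape is a simplified, hypothetical stand‑in for the nested object the thread endpoint returns (the real payload has more fields), and `flattenThread` is an illustrative name rather than the actor's actual function.

```typescript
// Hypothetical, simplified node shape: a post plus optional nested replies.
interface ThreadNode {
  post: { uri: string; text: string };
  replies?: ThreadNode[];
}

interface FlatPost {
  uri: string;
  text: string;
  depth: number; // 0 for the root post, 1 for direct replies, and so on
}

// Pre-order depth-first walk: each post is emitted before its replies,
// so a reply always appears after the post it answers.
function flattenThread(root: ThreadNode): FlatPost[] {
  const out: FlatPost[] = [];
  const visit = (node: ThreadNode, depth: number): void => {
    out.push({ uri: node.post.uri, text: node.post.text, depth });
    for (const reply of node.replies ?? []) visit(reply, depth + 1);
  };
  visit(root, 0);
  return out;
}
```

Keeping `depth` on each record is one way to preserve the conversation structure after the tree has been collapsed into a list.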
The one gotcha: API routing
I was sending authenticated requests (with a JWT) to public.api.bsky.app. That endpoint is Cloudflare‑fronted and returns 403 if you send auth tokens to it — it’s for unauthenticated traffic only.
Fix:
- Authenticated calls go to bsky.social.
- Unauthenticated reads go to public.api.bsky.app.
Authenticate against bsky.social, get a JWT, then use that JWT only on subsequent bsky.social calls.
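One way to make that routing rule hard to get wrong is to derive the host from whether a token is present. The sketch below assumes a session is created with `com.atproto.server.createSession` using a handle and App Password; `createSession` and `buildRequest` are hypothetical helper names, not the actor's real code.

```typescript
const AUTH_HOST = "https://bsky.social";
const PUBLIC_HOST = "https://public.api.bsky.app";

// Exchange a handle and App Password for an access JWT (always on bsky.social).
async function createSession(identifier: string, appPassword: string): Promise<string> {
  const res = await fetch(`${AUTH_HOST}/xrpc/com.atproto.server.createSession`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ identifier, password: appPassword }),
  });
  if (!res.ok) throw new Error(`createSession failed: HTTP ${res.status}`);
  const data = (await res.json()) as { accessJwt: string };
  return data.accessJwt;
}

// Choose host and headers together: a JWT is only ever attached to
// bsky.social requests, and the public host never sees an auth header.
function buildRequest(
  method: string,
  params: Record<string, string>,
  accessJwt?: string,
): { url: string; headers: Record<string, string> } {
  const url = new URL(`/xrpc/${method}`, accessJwt ? AUTH_HOST : PUBLIC_HOST);
  for (const [key, value] of Object.entries(params)) url.searchParams.set(key, value);
  const headers: Record<string, string> = accessJwt
    ? { Authorization: `Bearer ${accessJwt}` }
    : {};
  return { url: url.toString(), headers };
}
```

With this shape, search calls pass the JWT and land on bsky.social, while author feed and thread calls omit it and land on the public host.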
Monorepo deployment headache
I’m building a portfolio of Apify actors in a TypeScript monorepo with npm workspaces. The shared library (@apify-actors/shared) contains PPE (pay‑per‑event) charging helpers and error classes. Locally, workspace resolution handles it cleanly. On Apify’s build servers there’s no monorepo, just the uploaded actor folder, so the workspace package can’t be resolved.
Solution: copy the shared source into src/shared/ inside each actor and use relative imports. tsup bundles everything into a single dist/main.js. The shared code stays in one canonical place in the repo; each actor gets its own copy baked in at build time.
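A tsup configuration along these lines would produce that single bundle. This is a sketch under assumptions, not the actual config: the entry path `src/main.ts` and the ESM output format are guesses, and the output is only named dist/main.js for ESM when package.json sets "type": "module".

```typescript
// tsup.config.ts (illustrative; option values are assumptions)
import { defineConfig } from "tsup";

export default defineConfig({
  entry: ["src/main.ts"], // assumed entry point of the actor
  format: ["esm"],        // yields dist/main.js when package.json has "type": "module"
  outDir: "dist",
  platform: "node",
  bundle: true,           // inlines relative imports such as ./shared/...
  clean: true,            // wipe dist/ before each build
});
```

Because the copied code is imported by relative path from src/shared/, the bundler treats it as part of the actor's own source rather than as an external package.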
Output schema
Every post comes back as a flat JSON record:
{
"url": "https://bsky.app/profile/user.bsky.social/post/3lhxxxxxxxxx",
"text": "Post content here",
"authorHandle": "user.bsky.social",
"authorDisplayName": "User Name",
"likeCount": 142,
"repostCount": 28,
"replyCount": 19,
"images": [{ "thumb": "...", "fullsize": "...", "alt": "..." }],
"externalEmbed": { "uri": "...", "title": "...", "description": "..." },
"createdAt": "2025-11-15T10:30:00.000Z"
}
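The flattening from an API post view to this record might look like the sketch below. `ApiPostView` is a hypothetical subset of the fields the API returns, `toRecord` is an illustrative name, and the `images` and `externalEmbed` fields are left out for brevity. The one non‑obvious step is turning the `at://` URI into a bsky.app URL by taking its last path segment as the post key.

```typescript
// Hypothetical subset of the post object returned by the API.
interface ApiPostView {
  uri: string; // e.g. at://did:plc:.../app.bsky.feed.post/<rkey>
  author: { handle: string; displayName?: string };
  record: { text: string; createdAt: string };
  likeCount?: number;
  repostCount?: number;
  replyCount?: number;
}

// Flat output record (images and externalEmbed omitted in this sketch).
interface PostRecord {
  url: string;
  text: string;
  authorHandle: string;
  authorDisplayName: string;
  likeCount: number;
  repostCount: number;
  replyCount: number;
  createdAt: string;
}

function toRecord(post: ApiPostView): PostRecord {
  // The record key is the final segment of the at:// URI.
  const rkey = post.uri.split("/").pop() ?? "";
  return {
    url: `https://bsky.app/profile/${post.author.handle}/post/${rkey}`,
    text: post.record.text,
    authorHandle: post.author.handle,
    authorDisplayName: post.author.displayName ?? "",
    likeCount: post.likeCount ?? 0, // missing counts default to 0
    repostCount: post.repostCount ?? 0,
    replyCount: post.replyCount ?? 0,
    createdAt: post.record.createdAt,
  };
}
```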
Export as JSON, CSV, or Excel directly from Apify. Plug into Zapier or Make for no‑code workflows.
The actor is live
If you want to use it without building anything: Bluesky Posts Scraper on Apify Store
PPE pricing: $0.25 per run + $0.003 per post ($3/1,000). No subscription.
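For budgeting, the pricing above reduces to a one‑line formula. The helper below is a hypothetical convenience, not part of the actor; it works in thousandths of a dollar to sidestep floating‑point rounding.

```typescript
// Estimate the cost of one run in USD from the stated prices:
// $0.25 per run plus $0.003 per post.
function estimateRunCostUsd(postCount: number): number {
  const PER_RUN_MILLS = 250; // $0.25 expressed in thousandths of a dollar
  const PER_POST_MILLS = 3;  // $0.003 expressed in thousandths of a dollar
  return (PER_RUN_MILLS + PER_POST_MILLS * postCount) / 1000;
}

// estimateRunCostUsd(1000) → 3.25
```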
The AT Protocol makes Bluesky one of the cleanest data sources you can work with right now. If your use case involves social listening, brand monitoring, or lead‑gen signals from a fast‑growing tech‑forward audience, it’s worth adding to your stack.