I Built 23 Free Web Scrapers on Apify — Here is What I Learned

Published: February 17, 2026 at 02:28 PM EST
8 min read
Source: Dev.to


Building in public is one thing, but building scrapers in public is a whole different beast. Over the last few months I’ve developed and released 23 free web scrapers on the Apify platform. From Amazon and TikTok to Google Maps and LinkedIn, I’ve touched almost every corner of the web where data lives.

If you’re an indie dev, a data enthusiast, or someone looking to break into the world of web automation, this is my story of why I built them, the technical hurdles I faced, and what it’s actually like to maintain a fleet of scrapers in 2026.


Why I Started

Most developers start building scrapers for a specific project. I started because I saw a gap.

  • While there are plenty of enterprise‑grade scraping solutions, many indie developers, students, and small researchers just need a quick, reliable way to get data without a $200/month subscription.
  • I wanted to build a “Swiss Army Knife” of data‑extraction tools. By releasing them for free on the Apify Store, I wasn’t just building tools; I was building a portfolio and a reputation. In the world of Scraper‑as‑a‑Service, your best marketing is a tool that actually works.

Goals

  1. Master the art of scraping – you don’t really know how a site works until you try to automate it.
  2. Help the community – data shouldn’t be gated by technical complexity.
  3. Explore the ecosystem – Apify handles the infrastructure, so I could focus 100% on the logic.

Out of the 23 actors I’ve built, five have consistently dominated the charts in terms of usage. Below is a quick rundown of each, the problem they solve, and the technical challenges that made them interesting.

1. Amazon Product Scraper (the “OG”)

Use case: Price monitoring, competitor analysis, market research.

What it extracts:

  • ASIN
  • Title
  • Price
  • Ratings & reviews
  • BSR (Best‑Seller Rank)

The Challenge

Amazon is a master of A/B testing. On any given day you might see three different versions of a product page. Some versions put the price in an element with a specific class; others hide it inside a “Buy Box” iframe.

The Lesson

Instead of brittle CSS selectors, I learned to target the JSON blobs hidden in the page. Look for the script that registers the product state:

window.P.register('twister-js-init-dpx-data', {...})

Parsing that JSON is far more stable than hunting for the right selector.
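Here is a minimal sketch of that approach: pull the registered payload out of the raw HTML with a regex and parse it, instead of walking the DOM. The `window.P.register` call is what the article references; the payload shape below is illustrative, not Amazon's real schema.

```typescript
// Extract a JSON payload registered via window.P.register('<key>', {...}).
// Works when the payload is valid JSON with no nested braces; real pages
// may need a tolerant JS-object parser instead of JSON.parse.
function extractRegisteredState(html: string, key: string): any | null {
  const re = new RegExp(
    `window\\.P\\.register\\('${key}',\\s*(\\{[\\s\\S]*?\\})\\)`
  );
  const match = html.match(re);
  if (!match) return null;
  try {
    return JSON.parse(match[1]);
  } catch {
    return null; // payload wasn't strict JSON
  }
}

// Toy page with an illustrative payload:
const pageHtml = `
  <script>
    window.P.register('twister-js-init-dpx-data', {"asin":"B000TEST","price":19.99});
  </script>`;
console.log(extractRegisteredState(pageHtml, 'twister-js-init-dpx-data'));
```

Because the payload is server-rendered, this survives most front-end class renames that would break a CSS selector.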


2. Google Maps Business Scraper

Use case: Pull business names, addresses, phone numbers, ratings, and contact info.

The Innovation

Most users wanted more than just the Google Maps data—they wanted to contact the businesses. I added an “Include Website” option. When enabled, the scraper follows the business’s website link and attempts to find:

  • Email addresses
  • Social‑media profiles

Technical Hurdle

Scraping 1,000 different websites is harder than scraping one big site like Google. Every site has its own anti‑bot measures. I implemented a recursive crawler that:

  • Searches for “Contact Us” and “About” pages
  • Strictly limits depth to avoid “spider traps”
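The core of that crawler can be sketched as two pure functions: a link filter that only follows likely contact/about pages up to a hard depth cap, and a naive email extractor. The patterns and the depth limit of 2 are illustrative choices, not the exact values from my actor.

```typescript
// Only follow links that look like contact/about pages, and cap recursion
// depth so the crawler can't fall into a spider trap.
const CONTACT_PATTERNS = /contact|about|impressum/i;
const MAX_DEPTH = 2;

function shouldFollow(url: string, linkText: string, depth: number): boolean {
  if (depth >= MAX_DEPTH) return false; // hard depth cap
  return CONTACT_PATTERNS.test(url) || CONTACT_PATTERNS.test(linkText);
}

// Naive email extraction from a page's text content.
function extractEmails(text: string): string[] {
  const matches = text.match(/[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g) ?? [];
  return [...new Set(matches)]; // dedupe
}

console.log(shouldFollow('https://example.com/contact-us', 'Contact Us', 1)); // true
console.log(shouldFollow('https://example.com/blog/post-1', 'Read more', 1)); // false
console.log(extractEmails('Reach us at hello@example.com or sales@example.com.'));
```

In a Crawlee crawler, `shouldFollow` would run inside the request handler before enqueueing each discovered link, with the current depth carried in the request's `userData`.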

3. TikTok Profile Scraper

Use case: Collect profile data and the first 30 videos for analytics, trend spotting, or influencer outreach.

The Breakthrough

TikTok’s internal structure changes weekly, and a browser‑based scraper constantly hits “Verify you are human” sliders.

I discovered the __UNIVERSAL_DATA_FOR_REHYDRATION__ script tag. When a TikTok profile loads, the server sends a massive JSON object containing the profile data and the first 30 videos.

...

Result: Parsing this JSON instead of the DOM made the scraper 10× more stable and significantly faster—turning a “Browser” problem into a “JSON” problem.
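The extraction itself is straightforward: grab the contents of that script tag and `JSON.parse` it. The script `id` below is the one the article names; the payload shape in the demo is a simplified stand-in for TikTok's real structure.

```typescript
// Pull the server-rendered JSON out of the rehydration script tag
// instead of querying the DOM that gets built from it.
function extractRehydrationData(html: string): any | null {
  const re =
    /<script id="__UNIVERSAL_DATA_FOR_REHYDRATION__"[^>]*>([\s\S]*?)<\/script>/;
  const match = html.match(re);
  if (!match) return null;
  try {
    return JSON.parse(match[1]);
  } catch {
    return null;
  }
}

// Toy page with a simplified payload:
const profileHtml = `<script id="__UNIVERSAL_DATA_FOR_REHYDRATION__" type="application/json">
  {"user":{"uniqueId":"example","followerCount":1234},"videos":[{"id":"v1"}]}
</script>`;
console.log(extractRehydrationData(profileHtml)?.user.uniqueId); // "example"
```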


4. LinkedIn Jobs Scraper (the “final boss”)

Use case: Pull 100+ job postings in minutes without triggering a login wall.

The Strategy

While others struggled with complex browser automation, I focused on a human‑mimicry implementation using Playwright.

  • Fingerprinting: Used Crawlee’s built‑in fingerprint rotation (headers, screen resolutions, WebGL signatures).
  • Scrolling: Simulated variable‑speed scrolls, pausing occasionally as if a human is reading the job description.
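The scrolling idea can be sketched as a plan generator: randomized step sizes with mostly short pauses, plus an occasional longer "reading" pause. The specific numbers here are illustrative, not tuned values from my scraper.

```typescript
// Build a variable-speed scroll plan: random step sizes, mostly short
// pauses, and ~10% longer pauses as if a human stopped to read.
interface ScrollStep { deltaY: number; pauseMs: number }

function planScroll(totalPx: number, rng: () => number = Math.random): ScrollStep[] {
  const steps: ScrollStep[] = [];
  let scrolled = 0;
  while (scrolled < totalPx) {
    const deltaY = Math.min(200 + Math.floor(rng() * 400), totalPx - scrolled);
    const pauseMs = rng() < 0.1 ? 2000 + rng() * 3000 : 150 + rng() * 350;
    steps.push({ deltaY, pauseMs: Math.round(pauseMs) });
    scrolled += deltaY;
  }
  return steps;
}

// With Playwright, each step would become:
//   await page.mouse.wheel(0, step.deltaY);
//   await page.waitForTimeout(step.pauseMs);
const plan = planScroll(1000);
console.log(plan.reduce((sum, s) => sum + s.deltaY, 0)); // 1000
```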

Result: A reliable scraper that can harvest large job feeds without being blocked.


5. Shopify Store Scraper

Use case: Dropshippers and e‑commerce researchers who need the full product catalog and store theme details.

The Trick

Most Shopify stores expose a /products.json endpoint. It’s often hidden or paginated, but it provides perfectly structured data.

Workflow:

  1. Detect if the site runs on Shopify.
  2. Hit the /products.json endpoint directly.
  3. Skip heavy page rendering, saving minutes per store.
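Steps 2 and 3 can be sketched with the pagination loop below. The `limit`/`page` query parameters are standard on most Shopify storefronts; the fetcher is injected so the paging logic can be shown (and tested) without network I/O — in production it would be a real HTTP call via `got-scraping` or `fetch`.

```typescript
// Walk /products.json page by page until an empty page signals the end.
type Product = { id: number; title: string };
type FetchPage = (url: string) => Promise<{ products: Product[] }>;

async function fetchAllProducts(base: string, fetchPage: FetchPage): Promise<Product[]> {
  const all: Product[] = [];
  for (let page = 1; ; page++) {
    const url = `${base}/products.json?limit=250&page=${page}`;
    const { products } = await fetchPage(url);
    if (products.length === 0) break; // empty page => catalog exhausted
    all.push(...products);
  }
  return all;
}

// Demo with a fake two-page store:
const fakeStore: FetchPage = async (url) => {
  const page = Number(new URL(url).searchParams.get('page'));
  return { products: page <= 2 ? [{ id: page, title: `Product ${page}` }] : [] };
};
fetchAllProducts('https://example-store.com', fakeStore)
  .then((products) => console.log(products.length)); // 2
```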

My Stack After 23 Iterations

If you aren’t using Crawlee, you’re playing on hard mode. It’s the engine behind all my scrapers and handles the boring stuff—request retries, proxy rotation, and session management—so I can focus on the parsing logic.

| Component | When to Use | Why |
| --- | --- | --- |
| CheerioCrawler | Static or hydrated pages | Light, fast, uses ~1/10th of the RAM |
| PlaywrightCrawler | Dynamic pages that require JS execution | Handles complex interactions, heavy lifting |
| Crawlee (core) | All scrapers | Unified API for retries, proxies, sessions, and scaling |

The Eternal Debate: Cheerio vs. Playwright

  • Cheerio (Static/Hydrated) – Use whenever possible. Most modern sites “hydrate” their data into a JSON object inside a `<script>` tag. Find that tag and you don’t need a browser.
  • Playwright (Dynamic) – Use only when the page literally won’t show data until a button is clicked or a script runs. It’s your friend for truly dynamic content.

Anti‑Bot Countermeasures in 2026

Simple IP rotation isn’t enough anymore. Sites like Cloudflare and Akamai inspect the TLS fingerprint—the way your computer “shakes hands” with the server.

| Technique | Description | Typical Targets |
| --- | --- | --- |
| Residential Proxies | Appear as traffic from home Wi‑Fi networks | LinkedIn, Amazon |
| Header Order | Browsers send headers in a very specific order; mismatched order can raise suspicion | Almost any high‑security site |
| Browser Fingerprinting | Rotate screen resolution, WebGL signatures, user‑agent strings, etc. | All major platforms |
| Rate Limiting & Random Delays | Mimic human pacing to avoid detection | TikTok, Google Maps |
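The last row is the easiest to implement. A sketch of randomized inter-request delays: jitter around a base rate so the request timing never forms a machine-perfect pattern. The base delay and jitter factor are illustrative values.

```typescript
// Randomized delay around a base rate: uniform jitter in [-jitter, +jitter].
function humanDelayMs(
  baseMs: number,
  jitter = 0.5,
  rng: () => number = Math.random
): number {
  const factor = 1 + (rng() * 2 - 1) * jitter;
  return Math.round(baseMs * factor);
}

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

// Between requests: await sleep(humanDelayMs(1500));
console.log(humanDelayMs(1500, 0.5, () => 0.5)); // midpoint rng => exactly 1500
```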

Final Thoughts

After building and maintaining 23 free scrapers, my stack has become very opinionated, but it works:

  • Crawlee for the heavy lifting (retries, proxies, sessions)
  • Cheerio for speed on static/hydrated pages
  • Playwright for the occasional dynamic nightmare

If you’re starting out, focus on stable data sources (JSON blobs, hidden APIs) before resorting to full browser automation. And always respect the target site’s robots.txt and terms of service—scraping responsibly builds a healthier ecosystem for everyone.

Happy scraping! 🚀

Canvas Fingerprinting

Browsers render graphics differently based on your OS and GPU. Tools like Crawlee help spoof these so every request looks like it’s coming from a unique, “real” machine.

Building the Scraper

The scraper itself is the easy part. Maintenance is where the real work happens.

The “Tuesday” Problem

Big‑tech companies often push updates on Tuesdays. I’ve woken up many Wednesday mornings to find five scrapers broken because a single CSS class changed from price-value to p-val.

The Solution

  • Build for failure – wrap parsers in try‑catch blocks and use detailed logging.
  • Use a Sentinel pattern: the scraper regularly checks if it’s still finding the “core” fields (e.g., Price or Title). If the “missing field” rate exceeds 20%, trigger an alert.
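A sketch of that sentinel: track how often core fields come back missing and flag when the rate crosses the threshold. The 20% threshold matches the text; the alert hook itself (Slack, email, Apify monitoring) is left to the reader.

```typescript
// Track the "missing core field" rate across scraped items and flag
// when it exceeds a threshold -- a cheap early-warning for layout changes.
class FieldSentinel {
  private total = 0;
  private missing = 0;

  record(item: Record<string, unknown>, coreFields: string[]): void {
    this.total++;
    if (coreFields.some((field) => item[field] == null)) this.missing++;
  }

  missingRate(): number {
    return this.total === 0 ? 0 : this.missing / this.total;
  }

  shouldAlert(threshold = 0.2): boolean {
    return this.total > 0 && this.missingRate() > threshold;
  }
}

const sentinel = new FieldSentinel();
sentinel.record({ title: 'Widget', price: 19.99 }, ['title', 'price']);
sentinel.record({ title: 'Gadget', price: null }, ['title', 'price']);
console.log(sentinel.missingRate(), sentinel.shouldAlert()); // 0.5 true
```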

Why I Release These for Free

| Benefit | Explanation |
| --- | --- |
| Lead Magnet | A “free” scraper acts as a business card. Dozens of companies have reached out for custom integrations or private versions after seeing the code quality. |
| Apify Platform Credits | Users still pay for the compute and proxies they consume, which fuels the ecosystem and brings more paying users. |
| Portfolio Effect | When I apply for contracts I can say, “I maintain 23 scrapers with 10,000+ monthly runs.” That proof of scale is invaluable. |

Scraping in 2026: AI

I now use LLMs (e.g., GPT‑4o) to help with “fallback” parsing.
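A sketch of the fallback flow: try the fast structured parser first, and only hand (trimmed) HTML to the model when it fails, since the LLM call is slow and costs tokens. The `askLlm` function here is a placeholder for whatever model client you use, injected so the flow is testable without an API key.

```typescript
// Fast path: structured parser. Slow path: LLM, only on parser failure.
type Parser = (html: string) => Record<string, unknown> | null;
type AskLlm = (prompt: string) => Promise<Record<string, unknown>>;

async function parseWithFallback(html: string, fast: Parser, askLlm: AskLlm) {
  const structured = fast(html);
  if (structured) return { source: 'selectors', data: structured };
  // Trim the HTML to keep the prompt (and the token bill) small.
  const data = await askLlm(`Extract title and price as JSON:\n${html.slice(0, 4000)}`);
  return { source: 'llm', data };
}

// Demo: a regex-based "fast" parser and a stubbed model client.
const fastParser: Parser = (html) => {
  const m = html.match(/<h1>([^<]+)<\/h1>/);
  return m ? { title: m[1] } : null;
};
const stubLlm: AskLlm = async () => ({ title: 'Recovered Title' });

parseWithFallback('<div>no heading here</div>', fastParser, stubLlm)
  .then((result) => console.log(result.source)); // "llm"
```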

My Stack

  • Language: TypeScript (type safety is non‑negotiable for complex parsers)
  • Framework: Crawlee

Libraries

  • cheerio – lightning‑fast HTML parsing
  • playwright – heavy‑duty browser automation
  • got-scraping – HTTP requests that mimic real browsers

Platform

Apify – hosting, scheduling, and proxy rotation.

Reflections

Building 23 scrapers taught me more about web architecture than years of standard web development. It’s a constant cat‑and‑mouse game, but there’s something incredibly satisfying about turning the messy, unstructured web into a clean CSV file.

“The web is the world’s largest database, but it’s a database with a terrible API. Web scraping is the bridge that fixes that.”

See the Scrapers in Action

You can find the whole collection here:

👉 [My Apify Store Profile](https://apify.com/store)

Whether you need to monitor Amazon prices, find leads on Google Maps, or track TikTok trends, these tools are ready for you.

What’s Next?

Probably 23 more. The demand for data isn’t slowing down, and as long as there are websites, there will be a need for people who know how to (respectfully) scrape them.


I’m an indie developer focusing on web automation and data extraction. If you found this useful, follow me for more technical deep dives into the world of automation!

