How I scraped the CQC Care Register without hitting the API auth wall

Published: 1 week ago (May 28, 2026 at 05:21 PM EDT)

3 min read

Source: Dev.to

API authentication wall

CQC migrated their API to api.service.cqc.org.uk and added bearer‑token authentication. You must register at their developer portal, create an application, and include an Authorization: Bearer header on every request. This isn’t a deal‑breaker for enterprise use cases, but it adds friction for a data product because users have to register with CQC before they can run your actor.

The old API base (api.cqc.org.uk/public/v1) now returns HTTP 403 – fully blocked.

Open‑data file rescue

CQC publishes a monthly open‑data file called HSCA_Active_Locations.ods. It’s a 23 MB OpenDocument Spreadsheet containing every active regulated location in England (≈ 56 000). The file is free, requires no authentication, and is released under the Open Government Licence. The URL is date‑stamped and changes each month, but the transparency page always links to the current version.

Approach

Scrape the transparency page to find the current ODS URL.
Download the file.
Parse it, filter rows, and push results.

No API, no auth wall.

ODS parsing challenge

ODS files (.ods) are ZIP archives containing XML. The standard tool for parsing them in Node.js is SheetJS (xlsx package, v0.18.5 – the last Apache 2.0 release).

Sheet selection

The workbook contains three sheets: README, HSCA_Active_Locations, and Dual_Registration_Locations. SheetJS defaults to the first sheet (README), which only has 34 rows. The code below selects the sheet with the most rows:

let bestSheet = null;
let bestCount = 0;

for (const name of workbook.SheetNames) {
    const sheet = workbook.Sheets[name];
    const probe = XLSX.utils.sheet_to_json(sheet, { header: 1 });
    if (probe.length > bestCount) {
        bestCount = probe.length;
        bestSheet = sheet;
    }
}

Column name handling

Column names are Title Case with spaces (e.g., Location Local Authority, Location Latest Overall Rating), not the snake_case shown in third‑party docs. A helper that tries multiple variants keeps the parser resilient to future renames:

function col(row, ...candidates) {
    for (const name of candidates) {
        if (row[name] !== undefined) return row[name];
    }
    return undefined;
}

Date handling

When you read the workbook with { cellDates: true }, dates are returned as JavaScript Date objects, while coordinates come back as strings. Both are normalised in the processing step.

const workbook = XLSX.read(buffer, { type: 'buffer', cellDates: true });

Bonus: service type and user‑band extraction

The ODS has 122 columns; about 60 are boolean flags (Y/null) for specific service types and user bands (e.g., Service type - Care home service with nursing). Instead of mapping each column individually, the active ones are extracted into arrays:

function extractTagged(row, prefix) {
    return Object.entries(row)
        .filter(([key, val]) => key.startsWith(prefix) && (val === 'Y' || val === true))
        .map(([key]) => key.slice(prefix.length));
}

Resulting output

{
  "serviceTypes": ["Care home service without nursing"],
  "serviceUserBands": ["Dementia", "Older People"]
}

Result

The actor supports two modes:

searchLocations – filter by area, rating, service type.
lookupProviders – fetch all locations for specific provider IDs.

It loads and parses the 23 MB ODS on every run (≈ 10–15 s) and then filters in memory. No proxy, no browser automation, no anti‑bot handling.

Available on Apify Store: CQC Care Register Scraper

How I scraped the CQC Care Register without hitting the API auth wall

API authentication wall

Open‑data file rescue

ODS parsing challenge

Sheet selection

Column name handling

Date handling

Bonus: service type and user‑band extraction

Result

Related posts

How I built an FCA Register scraper on Apify (and why it's the B2B data gap nobody talks about)

How I built a Bluesky scraper using the AT Protocol API (and published it on Apify)

SiteRows example #1:

A Smarter Google AI Edge Gallery: MCP integration, notifications, and session continuity