Building a French Address Validation API with 26M Addresses

Published: March 7, 2026 at 07:20 AM EST
Source: Dev.to


The Problem: Fragmented French Geographic Data

If you’re building a FinTech product in France, you need to validate customer addresses for KYC compliance. Sounds simple, right?

Current landscape (2026)

  • API Adresse (BAN): free, no SLA, rate limit of 50 req/s, no company data.
  • La Poste RNVP: no public REST API.
  • Google Address Validation: $0.005 per request, no SIRENE integration.
  • INSEE API SIRENE: free, but no address validation, ~500 ms latency, and separate authentication.

To perform proper KYC you typically need at least two of these APIs, each with its own authentication, response format, and rate‑limit constraints.
We decided to build one API that does it all.

Architecture Overview

GEOREFER is built on a straightforward Java stack:

  • Java 11 + Spring Boot 2.7.5
  • PostgreSQL 16 (42 M+ rows across 12 tables)
  • Redis 7 (API‑key cache, TTL 5 min)
  • Elasticsearch 7.17 (city autocomplete, fuzzy search)

Layered design

REST Controllers (17 controllers, 39 endpoints)
   → Business Services (12 interfaces, 16 implementations)
   → Repositories (JPA + Elasticsearch)
   → PostgreSQL + Redis + Elasticsearch

Importing 26 M Addresses from the BAN

The BAN (Base Adresse Nationale, France's national address database) publishes its data as CSV files, updated monthly. The full dataset is ~3.5 GB compressed.

Import strategy

  1. Download the latest BAN CSV export.
  2. Parse with a streaming CSV reader (no full file in memory).
  3. Batch insert using JDBC batch operations (batch size = 5000).
  4. Index city data into Elasticsearch for autocomplete.
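
Steps 2 and 3 can be sketched in plain Java as follows. This is a minimal illustration rather than the GEOREFER importer: the `;` separator, the naive `split` (which ignores quoted fields), the `georefer.address` table, its column names, and the mapping of the first five CSV fields onto them are all assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class BanCsvImporter {
    static final int BATCH_SIZE = 5000;

    /** Reads the CSV line by line and hands rows to the sink in fixed-size batches. */
    static long streamInBatches(Reader source, int batchSize, Consumer<List<String[]>> sink) {
        long rows = 0;
        List<String[]> batch = new ArrayList<>(batchSize);
        try (BufferedReader reader = new BufferedReader(source)) {
            reader.readLine(); // skip the header row
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue;
                }
                // ASSUMPTION: ';' separator and no quoted fields containing ';'.
                batch.add(line.split(";", -1));
                rows++;
                if (batch.size() == batchSize) {
                    sink.accept(batch);
                    batch = new ArrayList<>(batchSize); // only one batch is held in memory
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (!batch.isEmpty()) {
            sink.accept(batch); // flush the final partial batch
        }
        return rows;
    }

    /** Inserts one batch via JDBC batching. Table, columns and field mapping are hypothetical. */
    static void insertBatch(Connection conn, List<String[]> rows) {
        String sql = "INSERT INTO georefer.address "
                + "(ban_id, street_number, street_name, postal_code, insee_code) "
                + "VALUES (?, ?, ?, ?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (String[] row : rows) {
                for (int i = 0; i < 5; i++) {
                    ps.setString(i + 1, row[i]); // adjust indexes to the real BAN header
                }
                ps.addBatch();
            }
            ps.executeBatch();
        } catch (SQLException e) {
            throw new IllegalStateException("Batch insert failed", e);
        }
    }
}
```

With an open `Connection conn` and a `Reader` over the decompressed export, the two pieces combine as `BanCsvImporter.streamInBatches(reader, BanCsvImporter.BATCH_SIZE, rows -> BanCsvImporter.insertBatch(conn, rows))`. A real import would likely use a proper CSV parser and disable auto-commit around each batch.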

French administrative hierarchy

Region (18) → Department (101) → Commune (35 000+) → Address (26 M)

Paris, Lyon, and Marseille have arrondissements that act as sub‑communes with their own INSEE codes.
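
As an illustration of why arrondissements need special handling, the sketch below maps an arrondissement INSEE code back to its parent commune. The class and method names are hypothetical and the article does not say how GEOREFER resolves this; the code ranges are the standard INSEE ones (Paris 75101 to 75120, Lyon 69381 to 69389, Marseille 13201 to 13216).

```java
final class Arrondissements {
    /** Returns the parent commune's INSEE code, or the input unchanged if it is not an arrondissement. */
    static String parentCommune(String inseeCode) {
        if (inseeCode == null || inseeCode.length() != 5) {
            return inseeCode;
        }
        // Corsican codes (2A..., 2B...) contain a letter and are never arrondissements.
        if (!inseeCode.chars().allMatch(Character::isDigit)) {
            return inseeCode;
        }
        int code = Integer.parseInt(inseeCode);
        if (code >= 75101 && code <= 75120) {
            return "75056"; // Paris
        }
        if (code >= 69381 && code <= 69389) {
            return "69123"; // Lyon
        }
        if (code >= 13201 && code <= 13216) {
            return "13055"; // Marseille
        }
        return inseeCode;
    }
}
```

For example, `Arrondissements.parentCommune("75102")` resolves the 2nd arrondissement of Paris to the commune code `75056`, while an ordinary commune code is returned as is.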

We store communes in a french_town_desc table with the full hierarchy:

SELECT f.name,
       f.insee_code,
       f.postal_code,
       d.name AS department,
       r.name AS region
FROM   georefer.french_town_desc f
JOIN   georefer.department d ON f.department_code = d.code
JOIN   georefer.region r ON d.region_code = r.code
WHERE  f.name ILIKE 'paris%';

Address Validation with GeoTrust Scoring

Endpoint: POST /addresses/validate
Input: a French address.
Output:

  • Confidence score (0‑100) – how sure we are the address exists.
  • GeoTrust score (0‑100) – composite reliability score for KYC.
  • Validated address – normalized, corrected, with GPS coordinates.
  • AFNOR format – postal‑standard NF Z 10‑011 formatting.

Scoring model

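| Component       | Weight | What it measures                                      |
|-----------------|--------|-------------------------------------------------------|
| Confidence      | 35 %   | Street‑level address matching                         |
| Geo Consistency | 25 %   | Cross‑validation: postal code ↔ commune ↔ department  |
| Postal Match    | 20 %   | Postal code precision (exact, partial, invalid)       |
| Country Risk    | 20 %   | FATF/GAFI country‑risk rating                         |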
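
The weights suggest a weighted average of the four components. The sketch below is one possible reading, not the published formula: it assumes every component is on a 0 to 100 scale and that country risk is inverted (0 means lowest risk, so it contributes `100 - risk`). The class and method names are hypothetical.

```java
final class GeoTrustSketch {
    /** Weighted average using the weights from the scoring table (35/25/20/20). */
    static double overall(double confidence, double geoConsistency,
                          double postalMatch, double countryRisk) {
        // ASSUMPTION: a low country risk should raise the score, so invert it.
        double countrySafety = 100.0 - countryRisk;
        return 0.35 * confidence
                + 0.25 * geoConsistency
                + 0.20 * postalMatch
                + 0.20 * countrySafety;
    }
}
```

Feeding in the component values from the example response (95, 100, 100, 0) gives about 98.25 with the inversion and 78.25 without it. Neither reproduces the overall score of 92 shown there, so the production model presumably applies further adjustments; treat this purely as an illustration of the weighting.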

Example request

curl -X POST 'https://georefer.io/geographical_repository/v1/addresses/validate' \
  -H 'Content-Type: application/json' \
  -H 'X-Georefer-API-Key: YOUR_API_KEY' \
  -d '{
        "street_line": "15 Rue de la Paix",
        "postal_code": "75002",
        "city": "Paris",
        "country_code": "FR"
      }'

Example response

{
  "success": true,
  "data": {
    "validated_address": {
      "street_line": "15 Rue de la Paix",
      "postal_code": "75002",
      "city": "PARIS",
      "country": "France"
    },
    "confidence_score": 95,
    "geotrust_score": {
      "overall": 92,
      "level": "LOW",
      "components": {
        "confidence": 95,
        "geo_consistency": 100,
        "postal_match": 100,
        "country_risk": 0
      }
    }
  }
}
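
Since the service itself runs on Java 11, the same call can be sketched with the JDK's built-in `java.net.http.HttpClient`. The URL and header names come from the curl example; the class and method names are hypothetical, and error handling and JSON parsing are left out.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class ValidateAddressClient {
    static final String ENDPOINT =
            "https://georefer.io/geographical_repository/v1/addresses/validate";

    /** Builds the POST request without sending it. */
    static HttpRequest buildRequest(String apiKey, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(ENDPOINT))
                .header("Content-Type", "application/json")
                .header("X-Georefer-API-Key", apiKey)
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    /** Sends the request and returns the raw JSON response body. */
    static String validate(String apiKey, String jsonBody)
            throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest(apiKey, jsonBody), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

Calling `ValidateAddressClient.validate("YOUR_API_KEY", body)` with the JSON payload from the request above should return a document shaped like the example response.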

Elasticsearch for City Autocomplete

City autocomplete must cover 35 000 communes.

  • Fuzzy search: GET /cities/search?q=Monplier (using fuzziness: AUTO) correctly returns Montpellier.
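
For reference, a fuzzy lookup of this kind is typically expressed as a `match` query with `fuzziness` set to `AUTO`. The sketch below only builds the JSON body that would be posted to the index's `_search` endpoint; the field name `name` and the helper itself are assumptions, not GEOREFER's actual mapping or code.

```java
final class CityFuzzyQuery {
    /** Builds an Elasticsearch match query body with fuzziness AUTO for the given input. */
    static String searchBody(String userInput) {
        // Minimal escaping for the two characters that would break the JSON string;
        // a real client should use a JSON library instead.
        String escaped = userInput.replace("\\", "\\\\").replace("\"", "\\\"");
        // ASSUMPTION: city names are indexed in a text field called "name".
        return "{\"query\":{\"match\":{\"name\":{\"query\":\"" + escaped
                + "\",\"fuzziness\":\"AUTO\"}}}}";
    }
}
```

In Elasticsearch, `AUTO` permits at most two edits per term (one edit for terms of three to five characters, none below that), so how far a misspelling can drift also depends on how the index is analyzed.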

Multi‑Tenant API Keys & Rate Limiting

GEOREFER is offered as a SaaS with five subscription plans:

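| Plan       | Daily limit (requests) | Rate limit (req/min) | Price      |
|------------|------------------------|----------------------|------------|
| DEMO       | 50                     | 10                   | Free       |
| FREE       | 100                    | 10                   | Free       |
| STARTER    | 5 000                  | 30                   | 49 EUR/mo  |
| PRO        | 50 000                 | 60                   | 199 EUR/mo |
| ENTERPRISE | Unlimited              | 200                  | Custom     |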

Each API key gets its own token bucket (Bucket4j) for rate limiting. Authentication flows through a Spring filter chain:

Request
   → API‑key validation (Redis cache)
   → Quota check
   → Rate‑limit enforcement
   → Feature‑gate evaluation
   → Controller
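
To make the rate-limit step concrete, here is a dependency-free token bucket keyed by API key. It is a stand-in that illustrates the idea and deliberately does not use Bucket4j's API; the class name, the injected clock, and the continuous refill policy are assumptions of this sketch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

final class PerKeyRateLimiter {
    private static final double NANOS_PER_MINUTE = 60_000_000_000.0;

    private static final class Bucket {
        double tokens;
        long lastRefillNanos;

        Bucket(double tokens, long now) {
            this.tokens = tokens;
            this.lastRefillNanos = now;
        }
    }

    private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();
    private final int ratePerMinute;
    private final LongSupplier nanoClock;

    PerKeyRateLimiter(int ratePerMinute, LongSupplier nanoClock) {
        this.ratePerMinute = ratePerMinute;
        this.nanoClock = nanoClock;
    }

    /** Returns true if the key still has a token; each key starts with a full bucket. */
    boolean tryConsume(String apiKey) {
        Bucket bucket = buckets.computeIfAbsent(
                apiKey, k -> new Bucket(ratePerMinute, nanoClock.getAsLong()));
        synchronized (bucket) {
            long now = nanoClock.getAsLong();
            // Refill proportionally to elapsed time, capped at the bucket capacity.
            double refill = (double) (now - bucket.lastRefillNanos) * ratePerMinute / NANOS_PER_MINUTE;
            bucket.tokens = Math.min(ratePerMinute, bucket.tokens + refill);
            bucket.lastRefillNanos = now;
            if (bucket.tokens >= 1.0) {
                bucket.tokens -= 1.0;
                return true;
            }
            return false;
        }
    }
}
```

In a running service the clock would be `System::nanoTime`, for example `new PerKeyRateLimiter(30, System::nanoTime)` for a plan limited to 30 requests per minute. A filter would call `tryConsume` after the quota check and reject the request (commonly with HTTP 429) when it returns false.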

The Feature Gate determines which endpoints and scoring components are available for a given plan, ensuring that higher‑tier customers receive the full GeoTrust suite while lower‑tier users get a trimmed‑down offering.

Endpoints per Plan

For example, company search (/companies) requires PRO or higher, while city search is available on all plans.
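
One simple way to model such a gate is an ordered plan enum plus a minimum plan per endpoint prefix. The sketch below encodes the limits from the plan table and the single rule stated above (`/companies` requires PRO); the class layout, the prefix matching, and the choice to leave unlisted paths open to every plan are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class FeatureGate {
    /** Plans in ascending order, so enum comparison reflects the tier. */
    enum Plan {
        DEMO(50, 10),
        FREE(100, 10),
        STARTER(5_000, 30),
        PRO(50_000, 60),
        ENTERPRISE(Long.MAX_VALUE, 200); // "Unlimited" modeled as Long.MAX_VALUE

        final long dailyLimit;
        final int ratePerMinute;

        Plan(long dailyLimit, int ratePerMinute) {
            this.dailyLimit = dailyLimit;
            this.ratePerMinute = ratePerMinute;
        }
    }

    private static final Map<String, Plan> MINIMUM_PLAN = new LinkedHashMap<>();

    static {
        MINIMUM_PLAN.put("/companies", Plan.PRO);
    }

    /** Returns true if the plan may call the given path. */
    static boolean isAllowed(Plan plan, String path) {
        for (Map.Entry<String, Plan> rule : MINIMUM_PLAN.entrySet()) {
            if (path.startsWith(rule.getKey())) {
                return plan.compareTo(rule.getValue()) >= 0;
            }
        }
        return true; // ASSUMPTION: paths without a rule are available on all plans
    }
}
```

Here `FeatureGate.isAllowed(FeatureGate.Plan.STARTER, "/companies")` is false, while the same call with `PRO` or `ENTERPRISE` is true, and `/cities/search` is allowed for every plan.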

What’s Next

So far we have imported 16.8 million SIRENE establishments and indexed 35 000+ communes. The API exposes 39 endpoints covering geographic data, address validation, company search, and admin/billing.

If you’re building anything that touches French addresses or company data, give it a try:

  • Free tier: 100 requests/day, no credit card required
  • Docs: (link pending)
  • Sign up: (link pending)
  • Examples: (link pending)

In the next article, we’ll deep‑dive into how we query 16.8 M SIRENE establishments in 66 ms using PostgreSQL trigram indexes.

AZMORIS Engineering — “Software that Endures”
