Building a French Address Validation API with 26M Addresses
Source: Dev.to
The Problem: Fragmented French Geographic Data
If you’re building a FinTech product in France, you need to validate customer addresses for KYC compliance. Sounds simple, right?
Current landscape (2026)
| API / Service | Cost | SLA | Rate limit | Company data / limitations |
|---|---|---|---|---|
| API Adresse (BAN) | Free | None | 50 req/s | No |
| La Poste RNVP | – | – | – | No public REST API |
| Google Address Validation | $0.005 /request | – | – | No SIRENE integration |
| INSEE API SIRENE | Free | – | – | No address validation, ~500 ms latency, separate auth |
To perform proper KYC you typically need at least two of these APIs, each with its own authentication, response format, and rate‑limit constraints.
We decided to build one API that does it all.
Architecture Overview
GEOREFER is built on a straightforward Java stack:
- Java 11 + Spring Boot 2.7.5
- PostgreSQL 16 (42 M+ rows across 12 tables)
- Redis 7 (API‑key cache, TTL 5 min)
- Elasticsearch 7.17 (city autocomplete, fuzzy search)
Layered design
REST Controllers (17 controllers, 39 endpoints)
│
Business Services (12 interfaces, 16 implementations)
│
Repositories (JPA + Elasticsearch)
│
PostgreSQL + Redis + Elasticsearch
Importing 26 M Addresses from the BAN
The BAN (Base Adresse Nationale) publishes its data as CSV files, updated monthly. The full dataset is ~3.5 GB compressed.
Import strategy
- Download the latest BAN CSV export.
- Parse with a streaming CSV reader (no full file in memory).
- Batch insert using JDBC batch operations (batch size = 5000).
- Index city data into Elasticsearch for autocomplete.
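The streaming and batching steps above can be sketched roughly as follows. This is a minimal illustration, not the actual GEOREFER importer: the class name `BatchingCsvImporter`, the `;` delimiter, and the naive `split` (which ignores quoted fields) are all assumptions, and the database write is abstracted behind a sink so the batching logic can be read on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.regex.Pattern;

/** Hypothetical helper: streams a delimited file and hands rows to a sink in fixed-size batches. */
final class BatchingCsvImporter {

    private final int batchSize;
    private final String delimiterRegex;

    BatchingCsvImporter(int batchSize, String delimiter) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        // Assumption: a single-character delimiter such as ';' and no quoted fields.
        this.delimiterRegex = Pattern.quote(delimiter);
    }

    /** Skips the header line, flushes every batchSize rows, and returns the number of data rows read. */
    long importRows(Reader source, Consumer<List<String[]>> batchSink) {
        long total = 0;
        List<String[]> batch = new ArrayList<>(batchSize);
        try (BufferedReader reader = new BufferedReader(source)) {
            String line = reader.readLine(); // header line, discarded
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // ignore blank lines
                }
                batch.add(line.split(delimiterRegex, -1));
                total++;
                if (batch.size() == batchSize) {
                    batchSink.accept(batch);
                    batch = new ArrayList<>(batchSize); // only one batch is held in memory at a time
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (!batch.isEmpty()) {
            batchSink.accept(batch); // final partial batch
        }
        return total;
    }
}
```

In a real import, the sink would typically bind each row to a JDBC `PreparedStatement`, call `addBatch()` per row, and call `executeBatch()` once per batch, ideally inside a single transaction per batch.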
French administrative hierarchy
Region (18) → Department (101) → Commune (35 000+) → Address (26 M)
Paris, Lyon, and Marseille have arrondissements that act as sub‑communes with their own INSEE codes.
We store communes in a french_town_desc table with the full hierarchy:
SELECT f.name,
f.insee_code,
f.postal_code,
d.name AS department,
r.name AS region
FROM georefer.french_town_desc f
JOIN georefer.department d ON f.department_code = d.code
JOIN georefer.region r ON d.region_code = r.code
WHERE f.name ILIKE 'paris%';
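Because the arrondissements of Paris, Lyon, and Marseille carry their own INSEE codes, code that joins on `insee_code` sometimes needs to map an arrondissement back to its parent commune. A minimal sketch of that mapping is shown here; the class and method names are hypothetical, and the article does not say how GEOREFER models this relationship.

```java
import java.util.Optional;

/** Hypothetical helper mapping municipal arrondissement INSEE codes to their parent commune. */
final class Arrondissements {

    private Arrondissements() {
    }

    /** Returns the parent commune's INSEE code, or empty if the code is not an arrondissement. */
    static Optional<String> parentCommune(String inseeCode) {
        // Corsican codes (2Axxx, 2Bxxx) are not purely numeric and are never arrondissements.
        if (inseeCode == null || !inseeCode.matches("\\d{5}")) {
            return Optional.empty();
        }
        int code = Integer.parseInt(inseeCode);
        if (code >= 75101 && code <= 75120) {
            return Optional.of("75056"); // Paris, 20 arrondissements
        }
        if (code >= 69381 && code <= 69389) {
            return Optional.of("69123"); // Lyon, 9 arrondissements
        }
        if (code >= 13201 && code <= 13216) {
            return Optional.of("13055"); // Marseille, 16 arrondissements
        }
        return Optional.empty();
    }
}
```

For example, `Arrondissements.parentCommune("75102")` yields the Paris commune code `75056`, while an ordinary commune code yields an empty result.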
Address Validation with GeoTrust Scoring
Endpoint: POST /addresses/validate
Input: a French address.
Output:
- Confidence score (0‑100) – how sure we are the address exists.
- GeoTrust score (0‑100) – composite reliability score for KYC.
- Validated address – normalized, corrected, with GPS coordinates.
- AFNOR format – postal‑standard NF Z 10‑011 formatting.
Scoring model
| Component | Weight | What it measures |
|---|---|---|
| Confidence | 35 % | Street‑level address matching |
| Geo Consistency | 25 % | Cross‑validation: postal code ↔ commune ↔ department |
| Postal Match | 20 % | Postal code precision (exact, partial, invalid) |
| Country Risk | 20 % | FATF/GAFI country‑risk rating |
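The weights in the table suggest a weighted combination of the four components. The sketch below assumes a plain weighted average in which every component is passed as a 0 to 100 sub-score where higher is better. The article does not state how the raw `country_risk` value is converted into such a sub-score, nor whether the production formula is a simple average, so this is an illustration of the weighting only and is not claimed to reproduce the figures in the sample response.

```java
/** Hypothetical weighted average using the component weights from the scoring table. */
final class GeoTrustScore {

    // Weights in percent, taken from the table; they sum to 100.
    static final int W_CONFIDENCE = 35;
    static final int W_GEO_CONSISTENCY = 25;
    static final int W_POSTAL_MATCH = 20;
    static final int W_COUNTRY = 20;

    private GeoTrustScore() {
    }

    /** Each argument is assumed to be a 0-100 sub-score where higher means more trustworthy. */
    static double overall(int confidence, int geoConsistency, int postalMatch, int countryScore) {
        requirePercent(confidence);
        requirePercent(geoConsistency);
        requirePercent(postalMatch);
        requirePercent(countryScore);
        int weightedSum = W_CONFIDENCE * confidence
                + W_GEO_CONSISTENCY * geoConsistency
                + W_POSTAL_MATCH * postalMatch
                + W_COUNTRY * countryScore;
        return weightedSum / 100.0;
    }

    private static void requirePercent(int value) {
        if (value < 0 || value > 100) {
            throw new IllegalArgumentException("sub-score out of range: " + value);
        }
    }
}
```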
Example request
curl -X POST 'https://georefer.io/geographical_repository/v1/addresses/validate' \
-H 'Content-Type: application/json' \
-H 'X-Georefer-API-Key: YOUR_API_KEY' \
-d '{
"street_line": "15 Rue de la Paix",
"postal_code": "75002",
"city": "Paris",
"country_code": "FR"
}'
Example response
{
"success": true,
"data": {
"validated_address": {
"street_line": "15 Rue de la Paix",
"postal_code": "75002",
"city": "PARIS",
"country": "France"
},
"confidence_score": 95,
"geotrust_score": {
"overall": 92,
"level": "LOW",
"components": {
"confidence": 95,
"geo_consistency": 100,
"postal_match": 100,
"country_risk": 0
}
}
}
}
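For Java callers, the same request can be built with the `java.net.http` client that ships with Java 11. This is a sketch rather than an official client: the class name `ValidateAddressClient` and the 10-second timeout are assumptions, while the URL, headers, and body mirror the curl example.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical thin client for the address validation endpoint. */
final class ValidateAddressClient {

    static final String ENDPOINT =
            "https://georefer.io/geographical_repository/v1/addresses/validate";

    // Same payload as the curl example.
    static final String SAMPLE_BODY = "{"
            + "\"street_line\":\"15 Rue de la Paix\","
            + "\"postal_code\":\"75002\","
            + "\"city\":\"Paris\","
            + "\"country_code\":\"FR\""
            + "}";

    private ValidateAddressClient() {
    }

    /** Builds the POST request without sending it. */
    static HttpRequest buildRequest(String apiKey, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(ENDPOINT))
                .timeout(Duration.ofSeconds(10)) // assumption: an arbitrary client-side timeout
                .header("Content-Type", "application/json")
                .header("X-Georefer-API-Key", apiKey)
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    /** Sends the request and returns the raw JSON response body. */
    static String send(HttpRequest request) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

A caller would run `ValidateAddressClient.send(ValidateAddressClient.buildRequest("YOUR_API_KEY", ValidateAddressClient.SAMPLE_BODY))` and hand the returned string to a JSON library of choice.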
Elasticsearch for City Autocomplete
City autocomplete must cover 35 000 communes.
- Fuzzy search: GET /cities/search?q=Monplier (using fuzziness: AUTO) correctly returns Montpellier.
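To make the fuzziness setting concrete, the sketch below reimplements, in plain Java, the edit budget that Elasticsearch's `AUTO` mode grants by term length (0 edits up to 2 characters, 1 edit for 3 to 5, 2 edits beyond that) alongside a classic Levenshtein distance. It is illustrative only: Elasticsearch itself uses Damerau-Levenshtein, which also counts a transposition as one edit, and the class name is hypothetical.

```java
/** Hypothetical illustration of fuzziness AUTO thresholds and edit distance. */
final class FuzzyMatch {

    private FuzzyMatch() {
    }

    /** Maximum number of edits fuzziness AUTO allows for a term of the given length. */
    static int autoMaxEdits(int termLength) {
        if (termLength <= 2) {
            return 0;
        }
        if (termLength <= 5) {
            return 1;
        }
        return 2;
    }

    /** Classic Levenshtein distance (insert, delete, substitute) using two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

By this plain measure, "montpelier" is one edit from "montpellier", well within the `AUTO` budget. "monplier" is three insertions away, which is above the two-edit cap, so a match on that input presumably also depends on how the city field is analyzed (for instance with n-grams); the article does not describe the index mapping.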
Multi‑Tenant API Keys & Rate Limiting
GEOREFER is a SaaS offering 5 subscription plans:
| Plan | Daily limit | Rate /min | Price |
|---|---|---|---|
| DEMO | 50 | 10 | Free |
| FREE | 100 | 10 | Free |
| STARTER | 5 000 | 30 | 49 EUR/mo |
| PRO | 50 000 | 60 | 199 EUR/mo |
| ENTERPRISE | Unlimited | 200 | Custom |
Each API key gets its own token bucket (Bucket4j) for rate limiting. Authentication flows through a Spring filter chain:
Request
→ API‑key validation (Redis cache)
→ Quota check
→ Rate‑limit enforcement
→ Feature‑gate evaluation
→ Controller
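The per-key token bucket idea can be illustrated without any dependency. The class below is a hand-rolled stand-in, not Bucket4j and not GEOREFER's code: the names are hypothetical, the bucket capacity is assumed to equal the per-minute rate, and the clock is injected so the behaviour can be exercised deterministically.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical per-API-key token bucket with continuous refill. */
final class PerKeyRateLimiter {

    private static final class Bucket {
        double tokens;
        long lastRefillNanos;

        Bucket(double tokens, long now) {
            this.tokens = tokens;
            this.lastRefillNanos = now;
        }
    }

    private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();
    private final int capacity;
    private final double tokensPerNano;
    private final LongSupplier nanoClock;

    PerKeyRateLimiter(int requestsPerMinute, LongSupplier nanoClock) {
        this.capacity = requestsPerMinute; // assumption: burst size equals the per-minute rate
        this.tokensPerNano = requestsPerMinute / 60e9;
        this.nanoClock = nanoClock;
    }

    /** Returns true and consumes one token if the key still has budget, false otherwise. */
    boolean tryConsume(String apiKey) {
        long now = nanoClock.getAsLong();
        Bucket bucket = buckets.computeIfAbsent(apiKey, k -> new Bucket(capacity, now));
        synchronized (bucket) {
            long elapsed = Math.max(0L, now - bucket.lastRefillNanos);
            // Refill in proportion to elapsed time, never above capacity.
            bucket.tokens = Math.min(capacity, bucket.tokens + elapsed * tokensPerNano);
            bucket.lastRefillNanos = now;
            if (bucket.tokens >= 1.0) {
                bucket.tokens -= 1.0;
                return true;
            }
            return false;
        }
    }
}
```

In production the clock would be `System::nanoTime`, and a filter placed after API-key validation would typically answer HTTP 429 when `tryConsume` returns false.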
The Feature Gate determines which endpoints and scoring components are available for a given plan, ensuring that higher‑tier customers receive the full GeoTrust suite while lower‑tier users get a trimmed‑down offering.
Endpoints per Plan
For example, company search (/companies) requires PRO or higher, while city search is available on all plans.
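A feature gate of this kind can be reduced to an ordered plan enum plus a lookup of the minimum plan per endpoint. The sketch below encodes the limits from the plan table and the one rule stated above (company search requires PRO or higher). Everything else is an assumption: the type names, the tier ordering following the table, `Long.MAX_VALUE` standing in for "Unlimited", and every other path defaulting to the lowest plan.

```java
/** Subscription plans in ascending order, with the limits from the pricing table. */
enum Plan {
    DEMO(50, 10),
    FREE(100, 10),
    STARTER(5_000, 30),
    PRO(50_000, 60),
    ENTERPRISE(Long.MAX_VALUE, 200); // assumption: "Unlimited" modelled as Long.MAX_VALUE

    final long dailyLimit;
    final int ratePerMinute;

    Plan(long dailyLimit, int ratePerMinute) {
        this.dailyLimit = dailyLimit;
        this.ratePerMinute = ratePerMinute;
    }

    /** True if this plan is the required plan or a higher tier. */
    boolean atLeast(Plan required) {
        return ordinal() >= required.ordinal();
    }
}

/** Hypothetical feature gate deciding whether a plan may call a given path. */
final class FeatureGate {

    private FeatureGate() {
    }

    /** Minimum plan for a path; only the /companies rule comes from the article. */
    static Plan requiredPlan(String path) {
        if (path.startsWith("/companies")) {
            return Plan.PRO;
        }
        return Plan.DEMO; // assumption: all other endpoints are open to every plan
    }

    static boolean isAllowed(Plan plan, String path) {
        return plan.atLeast(requiredPlan(path));
    }
}
```

With this sketch, `FeatureGate.isAllowed(Plan.STARTER, "/companies")` is false, while `FeatureGate.isAllowed(Plan.DEMO, "/cities/search")` is true.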
What’s Next
So far we have imported 16.8 million SIRENE establishments and indexed 35 000+ communes. The API exposes 39 endpoints covering geographic data, address validation, company search, and admin/billing.
If you’re building anything that touches French addresses or company data, give it a try:
- Free tier: 100 requests/day, no credit card required
- Docs: (link pending)
- Sign up: (link pending)
- Examples: (link pending)
In the next article, we’ll deep‑dive into how we query 16.8 M SIRENE establishments in 66 ms using PostgreSQL trigram indexes.
AZMORIS Engineering — “Software that Endures”