greatCircleDistance in ClickHouse: Avoiding Full Table Scans
Source: Dev.to

The Problem
When working with location data, a common question is:
How do I calculate the distance between two coordinates stored in my database?
If you’re using ClickHouse, you don’t need to handle this outside the database—there’s a built‑in function.
The Right Tool: greatCircleDistance
greatCircleDistance(lat1, lon1, lat2, lon2)
It returns the shortest distance between two points on Earth, expressed in meters.
Example
SELECT greatCircleDistance(13.0827, 80.2707, 12.9716, 77.5946) AS distance_meters;
The query above returns the distance between Chennai and Bangalore.
Looks Simple… But There’s a Catch
A naïve query such as:
SELECT city
FROM locations
WHERE greatCircleDistance(lat, lon, 13.0827, 80.2707) < 5000;
may appear fine, but on large tables it can trigger a full table scan.
Why This Happens
ClickHouse indexes are sparse and designed for range pruning. They work well with conditions like:
WHERE lat BETWEEN x AND y
but not with:
WHERE greatCircleDistance(lat, lon, x, y) < 5000
Because the function is applied to the column values, ClickHouse cannot use the index to skip data efficiently.
The Better Approach (What You Should Actually Do)
First reduce the dataset with a bounding‑box filter, then apply the precise distance calculation.
Bounding Box Filter
SELECT city
FROM locations
WHERE lat BETWEEN (13.0827 - 0.05) AND (13.0827 + 0.05)
AND lon BETWEEN (80.2707 - 0.05) AND (80.2707 + 0.05)
AND greatCircleDistance(lat, lon, 13.0827, 80.2707) < 5000;
The bounding box is an approximation that narrows the search space before the exact greatCircleDistance check.
Why This Works
lat BETWEEN …→ uses the indexlon BETWEEN …→ further reduces rowsgreatCircleDistance→ applied only to the filtered subset
Thus, the query avoids scanning the entire table.
Real‑World Use Cases
- Delivery radius filtering
- Finding nearby users
- Geo‑based analytics
- Ride‑sharing systems
One Important Gotcha
- Coordinates must be in degrees (not radians).
- The order is always
(lat, lon). Swapping them yields incorrect results.
Final Thoughts
greatCircleDistance is powerful, but using it blindly can hurt performance. In ClickHouse, query design often matters more than the functions you call. Knowing when and how to apply greatCircleDistance—typically after an index‑friendly pre‑filter—ensures efficient, scalable geo‑queries.