greatCircleDistance in ClickHouse: Avoiding Full Table Scans

Published: (April 20, 2026 at 12:18 PM EDT)
2 min read
Source: Dev.to

Source: Dev.to

Cover image for greatCircleDistance in ClickHouse: Avoiding Full Table Scans

The Problem

When working with location data, a common question is:

How do I calculate the distance between two coordinates stored in my database?

If you’re using ClickHouse, you don’t need to handle this outside the database—there’s a built‑in function.

The Right Tool: greatCircleDistance

greatCircleDistance(lat1, lon1, lat2, lon2)

It returns the shortest distance between two points on Earth, expressed in meters.

Example

SELECT greatCircleDistance(13.0827, 80.2707, 12.9716, 77.5946) AS distance_meters;

The query above returns the distance between Chennai and Bangalore.

Looks Simple… But There’s a Catch

A naïve query such as:

SELECT city
FROM locations
WHERE greatCircleDistance(lat, lon, 13.0827, 80.2707) < 5000;

may appear fine, but on large tables it can trigger a full table scan.

Why This Happens

ClickHouse indexes are sparse and designed for range pruning. They work well with conditions like:

WHERE lat BETWEEN x AND y

but not with:

WHERE greatCircleDistance(lat, lon, x, y) < 5000

Because the function is applied to the column values, ClickHouse cannot use the index to skip data efficiently.

The Better Approach (What You Should Actually Do)

First reduce the dataset with a bounding‑box filter, then apply the precise distance calculation.

Bounding Box Filter

SELECT city
FROM locations
WHERE lat BETWEEN (13.0827 - 0.05) AND (13.0827 + 0.05)
  AND lon BETWEEN (80.2707 - 0.05) AND (80.2707 + 0.05)
  AND greatCircleDistance(lat, lon, 13.0827, 80.2707) < 5000;

The bounding box is an approximation that narrows the search space before the exact greatCircleDistance check.

Why This Works

  • lat BETWEEN … → uses the index
  • lon BETWEEN … → further reduces rows
  • greatCircleDistance → applied only to the filtered subset

Thus, the query avoids scanning the entire table.

Real‑World Use Cases

  • Delivery radius filtering
  • Finding nearby users
  • Geo‑based analytics
  • Ride‑sharing systems

One Important Gotcha

  • Coordinates must be in degrees (not radians).
  • The order is always (lat, lon). Swapping them yields incorrect results.

Final Thoughts

greatCircleDistance is powerful, but using it blindly can hurt performance. In ClickHouse, query design often matters more than the functions you call. Knowing when and how to apply greatCircleDistance—typically after an index‑friendly pre‑filter—ensures efficient, scalable geo‑queries.

0 views
Back to Blog

Related posts

Read more »

Subqueries and CTEs in SQL

When working with SQL, you eventually run into situations where a single query just isn’t enough. You need to break a problem into parts, compute an intermediat...