A common operation over geospatial datasets is searching for locations within some distance of a reference point. In Georacle, this is called a proximity search. This post provides some intuition for how to efficiently search by proximity. No more than a basic level of high school trig is needed to follow along. We’ll assume a spherical model of the earth which has a bounded error of \(0.5\%\).1

Great Circles

On the Cartesian plane, it’s well known that the shortest path between two points is the straight line between them. On the curved surface of a sphere however, there are no straight lines. The shortest path between two points on a sphere is a segment along a great circle. This is useful for example in navigation, where if you’ve ever taken a long distance flight you’ve most likely travelled along one of Earth’s great circles.

Meridians Parallels

One way to think about great circles is by imagining the largest circle that you can you draw around a sphere. On Earth, every meridian is a great circle since lines of longitude cut the sphere exactly in half. By contrast, only one parallel is a great circle (the Equator), since lines of latitude tend to get smaller as you approach the poles.

Bounding the Search Space

To see how this works, let’s try searching for points within \(1000\) km of the reference point \(p = (-40.863475, 34.765298)\). Let \(B\) denote the bounding box that inscribes the circle containing every point within \(1000\) km of \(p\) on the surface of the sphere. Moving \(1000\) km along a great circle of radius \(R_{earth} \approx 6371\) km means traveling at an angle2 \(\theta = \frac{1000km}{6371km} = 0.15696\). Since every meridian is a great circle, we can fix our points of latitude and correct for longitude to compute the bounds of \(B\) as follows

\[\begin{equation} \theta_x = \frac{x}{R_{earth}} \end{equation}\] \[\begin{equation} \delta_p(x) = \arcsin \frac{\sin \theta_x}{\cos \textit{lat}_p} \end{equation}\] \[\begin{equation} B_p(x) = \begin{cases} \min_{\textit{lat}} & \textit{lat}_p - \theta_x\\ \min_{\textit{lon}} & \textit{lon}_p - \delta_p\\ \max_{\textit{lat}} & \textit{lat}_p + \theta_x\\ \max_{\textit{lon}} & \textit{lon}_p + \delta_p\\ \end{cases} \end{equation}\]

Using our parameters, \(B_{p}(1000) = \{-49.856691, 22.836545, -31.870258, 46.694050\}\)

bound

These correspond to the south, west, north, and east corners of the bounding box3 projected over the Atlantic Ocean. Here’s a Sage implementation of this logic.

#!/usr/bin/env sage
R_EARTH = 6371 # km

def B(lat, lon, dist):
    theta = dist / R_EARTH

    # radians
    lat *= pi / 180.0
    lon *= pi / 180.0

    min_lat = lat - theta
    max_lat = lat + theta

    delta = arcsin(sin(theta) / cos(lat))
    min_lon = lon - delta
    max_lon = lon + delta

    # degrees
    sw = (float(min_lat * 180/pi), float(min_lon * 180/pi))
    ne = (float(max_lat * 180/pi), float(max_lon * 180/pi))

    return (sw, ne)

p = (-40.863475, 34.765298)
range = 1000 # km

bbox = B(*p, range)

Testing Candidate Points

By inscribing our circle within a bounding box, we’ve significantly narrowed down the search space. Points within \(B\) are good candidates for testing, since those that lie within the box and within our circle definitely satisfy our proximity query. Points that lie within the box but outside of the circle however, need to be rejected using some distance metric.

Given two points \(u\) and \(v\) on the sphere, the great circle distance4 between them is

\[\begin{equation} \Delta_{lon} = (u_{lon} - v_{lon}) \end{equation}\] \[\begin{equation} D(u, v) = R_{earth} \cdot \arccos( \sin u_{lat} \sin v_{lat} + \cos u_{lat} \cos \Delta_{lon})\\ \end{equation}\]

We can filter candidate points by ensuring \(D(p, x) \leq 1000\) km \(\forall x \in B\) as follows

#!/usr/bin/env sage
R_EARTH = 6371 # km

def D(u, v):
    # radians
    u = [i*pi/180 for i in u]
    v = [i*pi/180 for i in v]

    u_lat, u_lon = u[0], u[1]
    v_lat, v_lon = v[0], v[1]
    
    delta_lon = u_lon - v_lon
    x = sin(u_lat) * sin(v_lat)
    y = cos(u_lat) * cos(v_lat) * cos(delta_lon)

    # km
    return float(R_EARTH * arccos(x + y))

p = (-40.863475, 34.765298)
range = 1000 # km

# not in the box
assert(D(p, (0, 0)) > range)

# in the box, but outside of the circle
assert(D(p, (-49.856691, 22.836545)) > range)

# in the circle
assert(D(p, (-42.102995, 35.953711)) <= range)

Many geospatial databases are optimized for fast retrieval of points by bounding box. By inscribing the search space and filtering false positives, proximity queries can run efficiently on top of existing datasets.

References