Geospatial Indexing Explained: A Comparison of Geohash, S2, & H3
Back to Blog
GeospatialData EngineeringInfrastructure

Geospatial Indexing Explained: A Comparison of Geohash, S2, & H3

BF
Ben Feifke
··7 min read

Introduction

Geospatial Indexing is the process of indexing latitude-longitude pairs to small subdivisions of geographical space, and it is a technique that we data scientists often find ourselves using when faced with geospatial data.

While geohash was invented in 2008, governments have subdivided land for centuries using states, provinces, counties, and postal codes for census and electoral purposes.

Data scientists employ computational techniques for spatial subdivision to support:

  • Analytics
  • Feature-engineering
  • Granular AB testing by geographic subdivision
  • Indexing geospatial databases

This post examines three tools:

  1. Geohash
  2. Google S2
  3. Uber H3

Note: This article focuses on geospatial indexing functionality specifically, though these tools offer additional capabilities like polygon intersection and neighbor retrieval.

Geohash

Invented in 2008 by Gustavo Niemeyer, Geohash maps latitude-longitude pairs to squares of user-defined resolution, identified by signature strings like "drt".

Level-3 geohash grid visualization showing geographical subdivisions

The Geohash Algorithm

The process involves six steps:

  1. Choose a geohash-level/resolution
  2. Create an empty binary array S of length (geohash-level x 5)
  3. Check if point is in left/right half of map; append 0 or 1 accordingly
  4. Check if point is in bottom/top half; append 0 or 1 accordingly
  5. Repeat steps 3-4 (geohash-level x 5 / 2) times
  6. Convert every 5 bits to a 32-bit alphanumeric character

The algorithm can be repeated iteratively to create increasingly fine-grained divisions, down to cells less than a meter on each side.

Computing a level-1 geohash

Iterative geohash subdivision process

Z-Order Curve

The Z-order curve is a type of space-filling curve, which is designed just for this purpose of mapping multidimensional values (such as latitude-longitude pairs) to one dimensional representations (such as a string).

Alphabetically ordered geohashes trace a Z-order curve pattern.

Z-order curve pattern traced by alphabetically ordered geohashes

Geohash's Shortcomings

Problem 1: Weak Spatial Preservation

The Z-order curve weakly preserves latitude-longitude proximity. Due to edge effects, locations physically close may have distant geohash strings, and vice versa — locations with similar hash strings may be physically distant.

Problem 2: Projection Issues

The equirectangular projection creates:

  • High variability in geohash square sizes
  • Discontinuities at the North and South Poles

Geohashes get smaller as you approach the poles, making the system unreliable at polar regions. Antarctica at (-90°, 0°) cannot be represented.

Geohash cell size variation near the poles using equirectangular projection

Getting Started with Geohash

Geohash implementations are widely available:

  • PyPi libraries: geohashr, geohash-tools, pygeohash-fast
  • Rust crate: Rust-Geohash
  • NodeJS library: node-geohash
  • Database tools: PostGIS, AWS Redshift, GCP BigQuery

Google S2

Released December 5, 2017, created primarily by Eric Veach at Google. S2 addresses Geohash's limitations through two innovations:

  1. Hilbert curve (instead of Z-order curve)
  2. Unfolded cube projection (instead of equirectangular projection)

S2 geometry visualization on a globe

The Hilbert Curve

The Hilbert curve is another type of space-filling curve that, rather than using a Z-shaped pattern like the Z-order curve, uses a gentler U-shaped pattern.

The Hilbert curve requires more distance to index the same space compared to Z-order curves at all levels. This provides better locality-sensitive hashing properties.

Hilbert curve compared to Z-order curve, showing the gentler U-shaped traversal pattern

Key advantage: latitude-longitude pairs that are close in their S2 Cell ID string distance are much more likely to be close in physical distance.

Note: Edge effects still exist, but latitude-longitude pairs close in physical distance aren't necessarily close in string distance.

The S2 Map Projection

S2 uses an unfolded-cube projection rather than equirectangular projection. This significantly reduces cell size variation because as you move away from the equator, the distance between two longitude lines increases sinusoidally.

S2 unfolded cube projection compared to equirectangular projection

Getting Started with S2

  • Primary: C++ implementation on Google's Github
  • Ports: Kotlin, Java, Golang
  • Alternative: Open-source Python implementation by third parties (aaliddell/s2cell on Github)

Note: Python interface requires non-trivial setup.

Uber H3

Released 2018, H3 introduces two further innovations:

  1. Hexagons (instead of squares)
  2. Icosahedron projection (instead of cube or equirectangular)

H3's icosahedron projection perspective on Earth

Why Hexagons?

The hexagon is the regular polygon with the most sides that still tessellates with itself.

Key property: In a hexagon tessellation, all neighbors of any hexagon are equidistant from its center.

This differs from:

  • Triangles: Three distinct neighbor distances
  • Squares: Two distinct neighbor distances

Having this property that all neighbors are equidistant greatly simplifies any calculus or gradient related operations that Uber or H3's other users might want to perform.

Hexagons appear in nature: bee hives, water crystallization (snowflakes), and Saturn's North Pole storm.

Hexagon tessellation showing equidistant neighbor property compared to squares and triangles

The Icosahedron Projection

H3 uses a 20-faced polyhedron (icosahedron) projected onto Earth. More faces on a polyhedron approximate a sphere more closely, reducing spatial distortion.

H3's icosahedron structure projected onto the globe

Each triangular face of the icosahedron is covered with hexagons, which subdivide into smaller hexagons from there.

One icosahedron face subdivided into hexagons

Result: H3 hexagons have more consistent sizes than S2 squares, which have more consistent sizes than Geohash squares.

H3's Sacrifices

Trade-off 1: Hierarchical Structure

Hexagons don't subdivide as cleanly into smaller hexagons. One hexagon divides into seven others, positioned at a slight angle to the parent.

Consequence: the strict spatial hierarchy discussed above regarding Geohash — that if a latitude-longitude point is contained in a cell then it is guaranteed to be contained in that cell's parent — is not maintained in H3.

H3's hexagon subdivision pattern with angular offset

Example: In Geohash, "h356" is a child of "h35", but in H3, "862830807ffffff" is a child of "85283083fffffff" — string prefixes don't indicate hierarchy.

Trade-off 2: Space-Filling Curve Limitations

While H3 follows a space-filling curve within each icosahedron face, it does not globally. Hexagon string identifiers use a bitmap that doesn't retain string-prefix behavior.

Trade-off 3: Pentagon Necessity

Hexagons cannot perfectly tessellate on a sphere. H3 requires twelve pentagons positioned at icosahedron vertices (like a soccer ball pattern). These are strategically placed over oceans to minimize impact.

Icosahedron projection on sphere with pentagon placement comparison to soccer ball

Getting Started with H3

  • Primary: C implementation on Uber's Github
  • Bindings: Python (h3-py), R (h3r), and many other languages
  • Availability: Better cross-platform support than S2, though less universal than Geohash

Comparison

Comparative table summarizing Geohash, S2, and H3 characteristics

| Criterion | Geohash | S2 | H3 | |-----------|---------|----|----| | Cell Shape | Squares | Squares | Hexagons | | Space-Filling Curve | Z-order | Hilbert | None (icosahedron-based) | | Projection | Equirectangular | Cube | Icosahedron | | Cell Size Consistency | Poor | Good | Excellent | | String Proximity = Physical Proximity | Weak | Better | N/A | | Spatial Hierarchy | Strict | Strict | Approximate | | Availability | Ubiquitous | Limited | Moderate | | Complexity | Simple | Moderate | High |

When to Use Which Tool?

Feature Engineering with Latitude and Longitude

Simple neighborhood ID creation: Any tool works.

Aggregation features (e.g., average lifespan by neighborhood): Any tool functions.

Population density estimation: H3 preferred due to uniform cell sizes.

Small geographic areas (e.g., Manhattan): Geohash acceptable to leverage simplicity without distortion effects.

Multiple cell sizes simultaneously: Avoid H3 (creates multiple cell overlap). Choose Geohash or S2.

Ocean-related data: Avoid H3 to prevent pentagon complications.

Conclusion

As our need for richer features from our systems increase, design complexity increases, and with it, so increase the sacrifices we must make regarding the system's properties.

Each tool represents design trade-offs. The analogy is made to digital vs. analog calendars — there's no universally "correct" answer, only context-appropriate choices.

References