As a mapping company, we curate and visualize all sorts of data for our users. Part of delivering this experience is making sure that it’s easy to interact with data layers (e.g. emphasizing fast tile delivery, letting you filter legend items, or click geometric attributes for additional detail). “Making sure it’s easy” is not an easy thing to do, however.
The world, and the data we use to represent it, is complicated and varied. Great maps are often the result of a series of subjective decisions about what to do with that data; they are purpose-built to communicate ideas and information more effectively. At Felt, we are constantly exploring how to deliver the “right” experience at scale. But what if there are different “right” ways, and no one-size-fits-all solution? For the engineers at Felt, this means plunging into the messy world of reality and embracing heuristics over universal formulas.
Zoom to fit & PostGIS
Being able to see the extent of a single data layer may sound like a pretty straightforward, one-size-fits-all problem, right? Wrong. We found that determining the right extent for a data layer requires a heuristic approach. First, some context about the Zoom to fit feature.
When we decided to implement the feature, one of the first decisions was which part of the stack should be responsible for the calculation: the frontend or the backend. If we were working on small datasets with relatively few, clustered points, it would be tempting to push the responsibility to the frontend. However, our data layers are large, often consisting of thousands of geometric features that can span the whole globe. Additionally, the frontend receives tiles piecemeal, so it won’t always immediately have the full picture of the data. To avoid burdening the frontend with a slow and complex calculation, we decided to take advantage of PostGIS functionality to calculate bounding boxes during upload processing and deliver these bounds to the frontend as GeoJSON polygons along with other layer metadata.

During the processing stage, we import our layers into PostGIS, which creates tables consisting of rows of various geometric elements. PostGIS gives users a few different query options to calculate a bounding box for a geometry or a set of geometries; in PostGIS terms, this is often referred to as the extent. Let’s take a look at a few of the extension functions that let us calculate the extent of our data.
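As a rough sketch of what that backend query can look like (the `layer_features` table and `geom` column are hypothetical names for illustration, not our actual schema), the extent calculation and GeoJSON serialization might be as simple as:

```sql
-- A sketch, assuming a hypothetical "layer_features" table with a "geom"
-- column. ST_Extent returns a box2d, so we cast it to a geometry before
-- serializing it as a GeoJSON polygon for the frontend.
SELECT ST_AsGeoJSON(ST_Extent(geom)::geometry) AS bounds
FROM layer_features;
```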
PostGIS Extent Calculations
PostGIS gives us a few options for calculating bounding boxes for our data:
- `ST_Extent`: “An aggregate function that returns a box2d bounding box that bounds a set of geometries.”
- `ST_EstimatedExtent`: “Returns the estimated extent of a spatial table as a box2d. The current schema is used if not specified. The estimated extent is taken from the geometry column's statistics.”
- `ST_3DExtent`: “An aggregate function that returns a box3d (includes Z ordinate) bounding box that bounds a set of geometries.”
- `ST_Envelope`: “Returns the double-precision (float8) minimum bounding box for the supplied geometry, as a geometry.” Crucially, this is only used for a single geometry, not for a whole table.
Given that we’re operating on the 2D plane and need to evaluate multiple geometries, we can disregard `ST_3DExtent` and `ST_Envelope`. The question becomes whether to use `ST_Extent` or `ST_EstimatedExtent`. Both can be useful in different situations. `ST_Extent` does the heavy lifting of calculating an accurate bounding box, but for large tables, this can come at a cost. `ST_EstimatedExtent`, on the other hand, is essentially a “free” calculation. Rather than actually calculating the bounding box for every row, `ST_EstimatedExtent` takes advantage of Postgres autovacuuming, which regularly samples the table and stores statistics on it. When you make a request, `ST_EstimatedExtent` returns a pre-calculated extent that covers roughly 95% of the actual `ST_Extent` calculation.
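To make the trade-off concrete, here is a minimal sketch of both queries, again using the hypothetical `layer_features` table:

```sql
-- Exact extent: aggregates over every row, so cost grows with table size.
SELECT ST_Extent(geom) FROM layer_features;

-- Estimated extent: reads the column statistics maintained by
-- ANALYZE/autovacuum, so it returns almost instantly. Note that it
-- returns NULL if the table has never been analyzed.
SELECT ST_EstimatedExtent('public', 'layer_features', 'geom');
```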
But just how much faster is it? I wanted to get a better idea, so I ran a test on one of our larger data layers: a table with 242,747 rows of multipolygon geometries. A timing run is sketched below; in the table that follows, you can see the average execution time for 50 runs, as well as the resulting bounding box.
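One simple way to take such measurements (a sketch, using psql’s built-in `\timing` command and the same placeholder table name):

```sql
-- \timing makes psql print the elapsed time after each statement,
-- which is one easy way to compare the two calculations.
\timing on

SELECT ST_Extent(geom) FROM layer_features;
SELECT ST_EstimatedExtent('public', 'layer_features', 'geom');
```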