
How to minify GeoJSON files?

You can't do web mapping these days without knowing your GeoJSON. It's the vector format of choice among popular mapping libraries like Leaflet, D3.js and Polymaps. Size matters on the web, especially if you want to distribute complex geometries, like the world's countries. The challenge is even bigger if you want to target mobile users - or support web browsers with poor vector handling (IE < 9). This blog post will show you how to minify your GeoJSON files before sending them over the wire.

The first thing you should do is to generalize your vectors so they don't contain more detail than you need. In a previous blog post, I was able to remove 90% of the coordinates without losing too much detail for the map scale I wanted to use. This will of course have a great effect on the file size.

Today, I'm going to use country borders from the Natural Earth dataset. These datasets are already generalized for different scales (1:10m, 1:50m and 1:110m), so I'll use them as they are. The 1:110m (small scale) and 1:50m (medium scale) shapefiles will cover the needs for the thematic world maps I plan to make:

The 110m and 50m country polygons shown in QGIS.

Let's open the datasets in QGIS. If you look at the attribute table you'll see that each dataset contains 63 attributes, which makes them very versatile. For your web maps, you probably need just a few of the attributes, and you should remove the ones you don't need. I'm keeping the country name and the ISO 3166-1 country codes (alpha-2, alpha-3, and numeric), which can be used to link country geometries to statistical data. 

Only keep the attributes you need.
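
The attribute pruning can also be scripted on a parsed GeoJSON object instead of in the QGIS attribute table. The following is only a minimal Python sketch: the property names (name, iso_a2, iso_a3, iso_n3) and the sample feature are assumptions for illustration, so check them against the attribute table of your own file.

```python
# Assumed property names; adjust to match your own attribute table.
KEEP = ("name", "iso_a2", "iso_a3", "iso_n3")


def strip_properties(collection, keep=KEEP):
    """Return a copy of a FeatureCollection with only the wanted properties."""
    features = []
    for feature in collection["features"]:
        props = feature.get("properties") or {}
        slim_props = {key: props[key] for key in keep if key in props}
        # Copy the feature and swap in the reduced property dict.
        features.append({**feature, "properties": slim_props})
    return {**collection, "features": features}


# Hand-made sample feature (hypothetical values, not taken from Natural Earth).
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {
                "name": "Norway",
                "iso_a2": "NO",
                "iso_a3": "NOR",
                "iso_n3": "578",
                "pop_est": 0,
                "continent": "Europe",
            },
            "geometry": {"type": "Point", "coordinates": [10.0, 61.5]},
        }
    ],
}

slim = strip_properties(sample)
print(slim["features"][0]["properties"])
```

The function builds new dictionaries rather than editing in place, so the original object is left untouched.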

Next, we can convert the shapefiles to GeoJSON with ogr2ogr:

ogr2ogr -f "GeoJSON" -lco COORDINATE_PRECISION=1 ne_110m_admin_0_countries.json ne_110m_admin_0_countries.shp

ogr2ogr -f "GeoJSON" -lco COORDINATE_PRECISION=2 ne_50m_admin_0_countries.json ne_50m_admin_0_countries.shp

The important thing is that I'm only keeping one decimal (coordinate precision) for the 110m dataset, and two decimals for the 50m dataset, which is sufficient for my map scales. This will reduce the size of the GeoJSON files by more than half. The size of the 110m GeoJSON is now 207 kB and the 50m version is 1,897 kB. But we can do better.
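
To see why the precision matters, note that one degree of latitude is roughly 111 km, so one decimal resolves about 11 km and two decimals about 1.1 km. The sketch below mimics the effect of COORDINATE_PRECISION in plain Python on a made-up ring; it illustrates the idea and is not how ogr2ogr is implemented.

```python
import json


def round_coords(coords, ndigits):
    """Recursively round a nested GeoJSON coordinate array."""
    if isinstance(coords, (int, float)):
        return round(coords, ndigits)
    return [round_coords(item, ndigits) for item in coords]


# Made-up ring with more decimals than a small-scale map needs.
ring = [[10.123456, 59.987654], [10.5, 60.04999], [10.123456, 59.987654]]
rounded = round_coords(ring, 1)

print(rounded)
print(len(json.dumps(ring)), "->", len(json.dumps(rounded)), "characters")
```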

The files contain a lot of whitespace, which is a waste of space. I planned to use Sublime Text to remove the whitespace, but it was not able to handle the 50m GeoJSON file, so I switched to Notepad++. I used these regular expressions:

Find: "([^a-z.]) "
Replace: "$1"
This removes every space that does not follow a lowercase letter or a dot, the characters that precede the spaces inside country names.

Find: "\n,"
Replace: ","
Remove the line break before each comma (keeping the other line breaks for readability).

Find: "\.0([,\]])"
Replace: "$1"
Remove the trailing ".0" from whole-number coordinates.
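
The same three find-and-replace steps can be run from a script when the file is too large for an editor. This is a sketch that ports the expressions above to Python's re module (Notepad++'s $1 becomes \1); the function name and the two-feature sample are made up for illustration.

```python
import re


def minify_geojson_text(text):
    """Apply the three find/replace steps to GeoJSON text."""
    # 1. Drop a space that follows anything but a lowercase a-z letter or a dot.
    text = re.sub(r"([^a-z.]) ", r"\1", text)
    # 2. Pull a comma that starts a line up onto the previous line.
    text = re.sub(r"\n,", ",", text)
    # 3. Drop a ".0" that sits right before a comma or a closing bracket.
    text = re.sub(r"\.0([,\]])", r"\1", text)
    return text


# Hand-made sample: two features separated by a comma on its own line.
sample = (
    '{ "type": "Feature", "properties": { "name": "Dem. Rep. Korea" }, '
    '"geometry": { "type": "Point", "coordinates": [ 127.0, 40.3 ] } }\n'
    ',\n'
    '{ "type": "Feature", "properties": { "name": "Norway" }, '
    '"geometry": { "type": "Point", "coordinates": [ 10.0, 61.5 ] } }'
)

minified = minify_geojson_text(sample)
print(minified)
```

Be aware that the first expression only protects the ASCII letters a to z. A name such as "São Tomé and Principe" would lose the space after the "é", so check your labels after running it.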

This will reduce the file size of the 110m GeoJSON from 207 to 156 kB, without losing any data quality. More than 400,000 whitespace characters were removed from the 50m GeoJSON file, reducing the file size from 1,897 to 1,481 kB.
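
A JSON parser can strip the whitespace without touching the text inside labels. The sketch below is one possible way to do that with Python's json module while still writing each feature on its own line; the function name and sample are hypothetical. Unlike the third regular expression above, json.dumps keeps a trailing ".0" on whole-number floats.

```python
import json


def minify_feature_collection(collection):
    """Serialize a FeatureCollection compactly, one feature per line."""
    compact = {"separators": (",", ":"), "ensure_ascii": False}
    # Serialize every member except "features", then splice the features in.
    head = {key: value for key, value in collection.items() if key != "features"}
    prefix = json.dumps(head, **compact)[:-1]  # drop the closing "}"
    lines = [json.dumps(feature, **compact) for feature in collection["features"]]
    separator = "," if head else ""
    return prefix + separator + '"features":[\n' + ",\n".join(lines) + "\n]}"


# Hand-made sample with two point features.
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "Norway"},
         "geometry": {"type": "Point", "coordinates": [10.0, 61.5]}},
        {"type": "Feature", "properties": {"name": "Sweden"},
         "geometry": {"type": "Point", "coordinates": [15.0, 62.0]}},
    ],
}

minified = minify_feature_collection(sample)
print(minified)
```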

If your web server supports on-the-fly gzip compression, the 110m GeoJSON will end up being 45 kB and the 50m version will be 430 kB. Not bad!
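
Those figures work out to a reduction of roughly 71% for both files. You can estimate the gzipped size of your own file before deploying it with Python's gzip module. This is a sketch on a synthetic, highly repetitive payload, so it compresses far better than real country borders will; the default compression level of 6 is also an assumption and your web server may use another.

```python
import gzip


def gzip_size(data, level=6):
    """Return the size in bytes of data after gzip compression."""
    return len(gzip.compress(data, compresslevel=level))


# Synthetic payload: the same small feature repeated 500 times.
feature = '{"type":"Feature","geometry":{"type":"Point","coordinates":[10.1,59.9]}},\n'
payload = (feature * 500).encode("utf-8")

raw = len(payload)
packed = gzip_size(payload)
print(f"{raw} bytes raw, {packed} bytes gzipped")
```

For a real file, read it with open(path, "rb").read() and pass the bytes to gzip_size.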

And if this is too much work, you can always download the final GeoJSON files from thematicmapping.org.

NB! Mike Bostock’s TopoJSON would allow us to compress the GeoJSON even more, while preserving topology (shared borders between countries) - but we would need to use a map client supporting the format. Looks promising!

Comments

Unknown said…
Using regular expressions like that can easily break your labels or attributes. I would suggest using a JSON parser that supports minification to remove whitespace.

You can further minify the GeoJSON by
- removing invisible geometries, if the simplification process did not already.
- reducing the output precision of float coordinates according to the desired zoom level.
- use shorter ids for all attributes.
Bjørn Sandvik said…
Hi unknown,

Thanks for your comments.

I agree that I could also use a JSON parser for minification, but I wanted more control to keep each country on a separate line for readability.

> removing invisible geometries, if the simplification process did not already.

The geometries are already simplified for the targeted map scale (maybe the 50m version could be a bit more simplified).

> reducing the output precision of float coordinates according to the desired zoom level.

I'm already doing this by only keeping one decimal for the 110m dataset, and two for 50m.

> use shorter ids for all attributes.

I agree!
Unknown said…
Thanks for the helpful post!

I've been playing with ogr2ogr to convert shapefiles to GeoJSON and I've used the -simplify option to reduce file size. Looking at the ogr2ogr reference I see the -lco option you've used, but where does the COORDINATE_PRECISION come from? Is there another reference I can use?

Also, the link you posted to TopoJSON is not properly formatted...
Arnie Shore said…
Have you looked into http://en.wikipedia.org/wiki/Ramer%E2%80%93Douglas%E2%80%93Peucker_algorithm for possibly reducing the number of useful points?

AS
Devdatta Tengshe said…
One more thing that will really help is gzipping the JSON as you serve it from your server. I've seen gzipping by itself cut bandwidth by up to 70%.
Bjørn Sandvik said…
Eli: COORDINATE_PRECISION is documented here. I've fixed the TopoJSON url. Thanks!

Arnie: Yes, here.

Devdatta: Yes, I mention gzipping in the post, and it also gives me a 70% reduction.
Jeff said…
Thanks for the link to the minified dataset! Much appreciated.
Anonymous said…
Hi, thanks for the useful article. I just wanted to say that I released a very simple JavaScript page for automatically removing attributes and whitespace from GeoJSON files.
It takes an input GeoJSON and removes every attribute except the country IDs and names.
You can find it here on GitHub.
https://github.com/Pimentoso/GeoJSON-Attribute-Cleaner
Anonymous said…
Have you looked at this javascript (node) module? geojson-mend It reduces unnecessary precision and closely clustered coordinates.
