OpenMapTiles is a great project by the team over at maptiler.com that defines and provides a set of tools for generating and serving maps. Their main repository comes with an easy-to-use set of scripts and Docker configurations that let you quickly download and process map data into a PostgreSQL database and generate vector map tiles. However, it is (purposefully) limited to small runs and not meant for large maps (e.g. continent- or planet-size).
At WOWA.ca, we needed to generate a set of vector map tiles for North America (our main customer base). I used an AWS EC2 m5ad.24xlarge instance with 96 vCPUs, 384 GiB of RAM, and four 900 GiB NVMe drives in software RAID0, running Amazon Linux 2. Everything went smoothly until I got to generating the map tiles.
16 days. That’s how long it was expected to take. I took a peek at docker stats to see how my PostgreSQL container and the tile-generation container were doing. It showed a lame 500% total CPU usage (5 vCPUs out of 96). No wonder it was taking so long!
By default, tilelive-copy (run by make generate-tiles) uses 10 threads. This parameter is controlled by the COPY_CONCURRENCY environment variable. To fully utilize your resources, start by setting it to half the number of vCPUs if you’re also hosting your database on the same instance: PostgreSQL needs CPU time too, and opening more connections than necessary will likely slow things down rather than speed them up. You can always increase it later if you haven’t reached full load.
You can add the parameter as an environment variable in docker-compose.yaml, under the generate-vector-tiles service configuration.
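As a sketch (the service name below is illustrative and may differ in your checkout of the OpenMapTiles repository; 48 is half of the 96 vCPUs on my instance):

```yaml
# docker-compose.yaml (excerpt) — only the environment entry is new
services:
  generate-vector-tiles:       # service name may differ in your compose file
    environment:
      COPY_CONCURRENCY: 48     # half of 96 vCPUs, leaving headroom for PostgreSQL
```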
Node.js uses a libuv thread pool to handle many of its I/O operations. The pool defaults to 4 threads, but it can be raised as high as 1024 via the UV_THREADPOOL_SIZE environment variable. At minimum, set it to the value you chose for COPY_CONCURRENCY; since it is an upper bound rather than a fixed allocation, you can set it all the way to the maximum of 1024 without any significant negative impact.
The setting goes in the same docker-compose.yaml, under the generate-vector-tiles service configuration.
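A sketch, with the same caveat that the service name is illustrative — UV_THREADPOOL_SIZE is the standard Node.js/libuv environment variable for the thread-pool size:

```yaml
# docker-compose.yaml (excerpt)
services:
  generate-vector-tiles:         # service name may differ in your compose file
    environment:
      UV_THREADPOOL_SIZE: 1024   # libuv thread-pool cap; must be >= COPY_CONCURRENCY
```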
Results: From 16 Days to 24 Hours
With no PostgreSQL optimizations at all, the new configuration dropped my expected run time from 16 days to 24 hours: 16 times faster than the default.
Upon closer inspection, PostgreSQL is now the bottleneck in the process. I have not made any optimizations to postgresql.conf to take advantage of the available memory (only 9 GB reserved out of 384 GiB!).
With some more work, I could likely speed up the process even further. With the amount of free memory still available, caching part of the database in RAM would likely reduce the PostgreSQL (and I/O) bottleneck. However, I only needed a single run, and I could live with waiting a day. A 16× speedup is good enough for me.
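For anyone who does want to chase that remaining bottleneck, a hedged sketch of the standard postgresql.conf memory settings I would try first. The values below are generic starting points for a machine with 384 GiB of RAM largely dedicated to the database; I did not test them on this workload:

```
# postgresql.conf (excerpt) — untested starting points, not measured values
shared_buffers = 96GB          # commonly sized at ~25% of RAM
effective_cache_size = 256GB   # planner hint about memory available for caching
work_mem = 256MB               # per-sort/per-hash-table memory, per connection
maintenance_work_mem = 4GB     # for VACUUM, index builds, etc.
```

With dozens of concurrent tile-generation connections, work_mem in particular should be raised cautiously, since each connection can allocate it multiple times.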
Open for Suggestions
If you’ve worked with OpenMapTiles before, have suggestions for the PostgreSQL config, or any other tips you think would be helpful, leave a comment!