Data Model Study
At the beginning of 2022, the Engineering Working Group of the OpenStreetMap Foundation commissioned a study on how to improve the existing data model. Jochen Topf has delivered the results of this study, including recommendations on how to make the OpenStreetMap data model more computationally efficient and more accessible.
Two key suggestions have been made:
- introducing an area datatype for representing polygons
- getting rid of untagged nodes
Community Consultation
To decide on the next steps in this process, we want further discussion with the developer community, because the proposed changes affect any OpenStreetMap software that depends, directly or indirectly, on the data model.
Potential benefits
Less Mess for Areas
Some mappers may be surprised to hear that OSM does not already have an Area data type. After all, the iD editor prominently features buttons for drawing points, lines and areas. Once mapped, these areas usually appear on the map as expected. The OSM wiki documents whether a tag is typically used on areas, and even Overpass Turbo lets you use areas in your query.
Behind the scenes, however, these areas are represented as ways or relations. Each tool working with OSM data uses its own set of rules to guess whether a particular way represents a line or an area. Making areas a proper part of the OSM data model would lead to a consistent interpretation across applications, enable the API to prevent broken areas from being uploaded, and may eventually lead to support for partial downloads of very large areas.
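To make the guessing concrete, here is a minimal sketch of the kind of rule a tool might apply. The key list and the function name `looks_like_area` are hypothetical and heavily simplified; real tools each maintain their own, much longer rule sets, which is exactly why their interpretations can differ.

```python
# Hypothetical, simplified rule set. Real tools use their own, larger lists.
AREA_KEYS = {"building", "landuse", "leisure"}


def looks_like_area(node_refs, tags):
    """Guess whether a way should be treated as an area rather than a line."""
    # An area needs a closed ring: first and last node reference are identical.
    is_closed = len(node_refs) >= 4 and node_refs[0] == node_refs[-1]
    if not is_closed:
        return False
    # An explicit area=yes / area=no tag overrides the key-based guess.
    if tags.get("area") == "yes":
        return True
    if tags.get("area") == "no":
        return False
    # Otherwise fall back to keys that usually describe areas.
    return any(key in tags for key in AREA_KEYS)


print(looks_like_area([1, 2, 3, 1], {"building": "yes"}))         # closed, area-like key
print(looks_like_area([1, 2, 3, 1], {"highway": "residential"}))  # closed, but a looped road
```

With a dedicated area datatype, this guess would not be needed: the type of the object itself would say whether it is an area.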
Keeping OSM Processing Accessible
Currently, ways are made up of references to nodes, and we rely on these references to determine how ways connect to each other. Resolving these node references to coordinates is a costly step in the OpenStreetMap toolchain: it takes hours to days, even on capable hardware.
In the future, we might model ways as a simple list of coordinates – depending on the exact implementation we end up with. This would offer large performance benefits, but getting rid of untagged nodes would be a significant change.
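The difference between the two representations can be sketched as follows. The IDs, coordinates and the helper `resolve` are made up for illustration, and the coordinate-list form is only one possible shape of a future model, not a decided design.

```python
# Today: a way stores node IDs, so its geometry needs a lookup into a node table.
# At planet scale this table is very large, which is what makes the lookup costly.
nodes = {
    101: (52.5200, 13.4050),
    102: (52.5205, 13.4061),
    103: (52.5211, 13.4070),
}
way_by_reference = [101, 102, 103]


def resolve(way_refs, node_table):
    """Turn a list of node references into a list of (lat, lon) coordinates."""
    return [node_table[ref] for ref in way_refs]


# Possible future: the way carries its coordinates directly, no lookup required.
way_by_coordinates = [(52.5200, 13.4050), (52.5205, 13.4061), (52.5211, 13.4070)]

print(resolve(way_by_reference, nodes) == way_by_coordinates)
```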
At first glance, performance improvements may not seem particularly exciting. But how easy it is to work with our data directly impacts how useful OpenStreetMap is to the world at large. As Jochen observes: “The goal is to keep OSM as that great resource that can be used not only by multi-billion-dollar companies but by the student who wants to create a map of the world on their notebook or the activist with their donated second-hand computer.”
Better OSM History
Many mappers are disappointed when they realise how few things the history tab of the website can actually show. There are many tools, like OSMCha and Achavi, that offer much more, but still require a certain degree of proficiency to use them.
You might ask why, and the answer is very technical: in many cases the location of a single version of a way is not defined, because its nodes can be moved without the way itself getting a new version. This is the reason that change tracking has remained an expert discipline with relatively newbie-unfriendly tools. Changing the data model would remove that barrier, and we can then expect substantially better tools, but not before we get proper coordinates and versions for ways.
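A small sketch shows the problem. The data layout, the integer timestamps and the function `geometry_at` are assumptions made for this example only; they do not mirror any particular history tool.

```python
# node_id -> list of (timestamp, (lat, lon)) versions, oldest first.
node_history = {
    1: [(100, (0.0, 0.0))],
    2: [(100, (0.0, 1.0)), (200, (0.5, 1.0))],  # node 2 was moved at t=200
}

# The way was created at t=100 and never edited again: it stays at version 1.
way_v1 = {"version": 1, "timestamp": 100, "refs": [1, 2]}


def geometry_at(way, history, t):
    """Reconstruct a way's coordinates as they were at time t."""
    coords = []
    for ref in way["refs"]:
        # Pick the latest version of each node that is not newer than t.
        known = [pos for ts, pos in history[ref] if ts <= t]
        coords.append(known[-1])
    return coords


print(geometry_at(way_v1, node_history, 150))
print(geometry_at(way_v1, node_history, 250))
```

Both calls use the same way version, yet they yield different geometries, because the move of node 2 is recorded only in the node's history. A tool that wants to show what changed has to join way and node histories by time, which is the kind of work that coordinates and versions stored on the way itself would make unnecessary.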
Minutely Vector Tiles Generation
While there are quite a number of mature vector tile generators nowadays, a couple of problems are still open.
- One is which features should go into the vector tiles for openstreetmap.org.
- The other is how to reconcile minutely updates with vector tiles at an acceptable level of performance.
That task gets an order of magnitude easier if you can not only truly parallelise the generation of tiles, but also skip the expensive first step of figuring out which tile a changed way belongs to.
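As a rough illustration, assuming ways carried their own coordinates, finding the affected tiles would reduce to arithmetic on those coordinates. The sketch below uses the standard Web Mercator tile numbering; the function names are hypothetical, and for brevity it only considers the tiles containing a way's vertices, not tiles that a long segment merely crosses.

```python
import math


def tile_for(lat, lon, zoom):
    """Return the (x, y) tile containing a coordinate at the given zoom level."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


def tiles_touched(coords, zoom):
    """Set of tiles containing at least one vertex of a way."""
    return {tile_for(lat, lon, zoom) for lat, lon in coords}


changed_way = [(52.5200, 13.4050), (52.5205, 13.4061)]
print(tiles_touched(changed_way, 14))
```

No node table is consulted here, so each changed way can be handled independently of all others, which is what makes the work easy to spread across many workers.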
We might be able to find someone who provides the raw computing power necessary to do this. But even so, this would be a highly undesirable degree of dependence on that partner.
So yes, vector tiles for openstreetmap.org are in principle possible without this data model change, but at such a high cost that only specialized hardware will be able to keep up with minutely changes.
Have Your Say about the Future
Some kind of change is inevitable. The growth of the OSM database is outpacing speed improvements in hardware, and the ID-based model means that the whole process cannot be parallelized with full speedup. Keeping up with changes was easily possible in the past, but needs more and more tricks now. There will come a point at which even specialized hardware will not suffice to keep up with minutely changes.
However, there are many possible approaches to meeting this challenge. Now is your opportunity, as the developer community, to share your opinion about the way forward.
The OpenStreetMap Foundation is a not-for-profit organisation, formed to support the OpenStreetMap Project. It is dedicated to encouraging the growth, development and distribution of free geospatial data for anyone to use and share. The OpenStreetMap Foundation owns and maintains the infrastructure of the OpenStreetMap project, is financially supported by membership fees and donations, and organises the annual, international State of the Map conference. Our volunteer Working Groups and small core staff work to support the OpenStreetMap project. Join the OpenStreetMap Foundation for just £15 a year or for free if you are an active OpenStreetMap contributor.