
A Year of Infrastructure Progress: Site Reliability Engineer 2023/2024 Update

As the OpenStreetMap Foundation’s Senior Site Reliability Engineer (SRE), my focus in the OpenStreetMap Operations Team over the last year has been on driving efficiency, improving resiliency, and scaling our infrastructure to support the continued growth of the OpenStreetMap project. From cloud migration to server upgrades, we’ve made several improvements since last year to better position OpenStreetMap’s infrastructure to meet these resiliency and growth challenges.

Improving User Facing Services

Upgraded Rendering Services

The tile rendering infrastructure saw notable upgrades, including hardware and software optimisations, faster tile cache expiry to address vandalism, and automation to block non-attributing users. We now re-render low-zoom tiles daily, which both improves performance and gives mappers a faster feedback loop. The tile service is widely used and keeping up with demand is an ongoing challenge.

New Aerial Imagery Service

We launched a new aerial imagery service that supports GeoTIFF COGs. The service now hosts aerial.openstreetmap.org.za, which is backed by 16TB of high-resolution imagery. The new service makes it easier to host additional imagery in the future.

Transition to Gmail Alternative & Spam Mitigation

After facing significant spam issues with the OSMF’s Google Workspace, I migrated OSMF email services to mailbox.org. This has reduced the spam volume and improved administrative efficiency. We’re also in the process of transitioning historical OSMF Google Docs data to a self-hosted service.

Dealing with DDoS Attacks and Vandalism

This year, we faced several Distributed Denial of Service (DDoS) attacks, including a major DDoS for ransom incident, which was reported to law enforcement. These attacks tested our infrastructure, but we’ve implemented measures to strengthen our resilience and better protect against future threats.

We also dealt with large-scale vandalism that affected OpenStreetMap services. Thanks to the swift response and adjustments made by the Operations team, we’ve reinforced our infrastructure to better handle abuse and ensure continuous service.

Planet Data Hosting on AWS S3

With the OpenStreetMap Operations Team I’ve moved our planet data hosting to AWS S3 with mirrors in both the EU and US, allowing us to fully reinstate the back catalog of historical data. Through AWS’s OpenData sponsorship, replication diffs and planet data are now more accessible.

Making Systems Easier to Manage

Full AWS Infrastructure Management Using OpenTofu

With the OpenStreetMap Operations Team, I’ve successfully migrated all manually managed AWS resources to Infrastructure-as-Code (IaC) using OpenTofu (an open-source fork of Terraform). This transition allowed us to improve cost efficiency, enhance security by adopting a least privilege IAM model, and gain better visibility into expenditures through detailed billing tags. Additionally, we’ve integrated S3 Storage Analytics to further optimise our costs, set up additional backups, and implemented enhanced lifecycle rules.

Improved Service Outage Alerting

We implemented SMS-based alerting for critical service outages, alongside a sponsored PagerDuty account. These improvements ensure quicker response times and better coordination during outages, with full integration with Prometheus/Alertmanager and Statuscake in the works.

Technical Debt Reduction

This year, we made progress in reducing technical debt by moving several legacy services to more maintainable solutions. For instance, we containerised old services, including legacy State of the Map websites that were previously running poorly maintained WordPress installations. This transition has improved the scalability, security, and long-term maintainability of these services.

Additionally, we replaced our custom source installation of OTRS with a Znuny package installation from Debian. This shift simplifies upgrades and reduces the maintenance burden, ensuring the system remains up to date and secure without custom modifications.

Ensuring Infrastructure Resilience Despite Hardware Failures

Over the past year, we’ve maintained a resilient infrastructure even in the face of hardware failures. We replaced numerous disks and RAM, ensuring minimal disruption to services. Our bespoke monitoring system allows us to detect early signs of hardware failure, enabling us to act quickly and replace faulty components before they cause significant issues. This proactive approach has been key to maintaining system uptime and reliability.

Upgrading Infrastructure

Cross-Site Replication of Backups

To ensure robust disaster recovery, I’ve established cross-account, cross-region replication for AWS S3 backups, enabling point-in-time recovery. This safeguards critical data and services, even in the face of major failures, providing long-term peace of mind.

High Availability Infrastructure

Key hardware upgrades in our Amsterdam, Dublin, and OSUOSL sites improved performance, storage capacity, and network reliability. New switches were installed in 2022, and we’ve now finished setting up a high availability (HA) configuration to ensure improved service. We have continued to improve the setup by moving to dual diverse uplinks to our ISP for better resilience.

Debian Migration

We are migrating from Ubuntu to Debian 12 (Bookworm) as our standard distribution. All new servers now run on Debian. Our Chef configuration management has been updated with test code to ensure ongoing compatibility. This transition marks a shift towards greater long-term stability and security. Mastodon post celebrating the transition.

Looking Ahead

The year ahead brings exciting new opportunities as we build on our progress. Key priorities for 2024 / 2025 include:

Engaging

Community Engagement & Outward Communication: Enhancing collaboration with the Communication Working Group (CWG) and improving our public-facing communication around service status and outages.

Improving Documentation and Onboarding: We’ll enhance onboarding documentation and conduct dedicated sessions to help new contributors get involved in operations more easily. This includes improving the reliability and coverage of our testing processes, ensuring smoother contributions and reducing the learning curve for new team members.

Planning and Optimizing

Capacity Planning for Infrastructure Growth: As OpenStreetMap and the demand on our services grow, we will ensure we can scale to meet demand. By anticipating future needs and balancing performance with cost-effective growth, we aim to maintain the service quality and availability our community expects.

Ongoing Cost Optimisation: We’ll continue to find ways to reduce costs by leveraging sponsorships like the AWS OpenData programme, ensuring sustainable operations.

Continuing to Reduce Technical Debt: We will continue simplifying our infrastructure by reducing the maintenance burden of legacy systems, for example by increasing the use of containers. This will help streamline management tasks and allow us to focus on other improvements, making the infrastructure more efficient and scalable over time.

Continuing Infrastructure Improvements

Implementation of High Availability Load Balancers: Rolling out the HA (VRRP + LVS + DSR) configuration for load balancers to improve system reliability and reduce potential downtime.

Finalising Prometheus Integration with PagerDuty: Completing the integration of Prometheus for monitoring and PagerDuty for streamlined alerting and incident response.

Completing the Transition to a Full Debian Environment: Migrating all remaining services from Ubuntu to Debian for increased stability and security.

Enhancing Disaster Recovery & Backup Strategies: Further refining our recovery documentation and introducing additional backup measures to ensure critical services are protected and recoverable in the event of failure.


OpenStreetMap tile CDN continues to grow

Since the last additions to our OpenStreetMap tile serving network in December, there has been a lot more server set-up going on.

The German tile cache server tabaluga is now retired and is no longer serving tiles. This may sound like bad news, but quite the opposite! Tabaluga has been replaced with a new server, katie, which has taken over its job.
The new tile cache server katie is still located in Falkenstein, Germany, and still hosted by Hetzner.

More good news: There are two tile cache servers in Germany now!
The second tile cache server, konqi, is located in Jena, Germany, hosted by EUserv.

The Russian tile cache server gorynych just had a memory and SSD upgrade, and with this it can deliver even more content.

There is another new server in Hungary. With this, Hungary becomes one of 12 countries hosting OSM CDN servers.

Tile cache server sarkany is located in Budapest, Hungary, hosted by szerverem.hu.

With all of these, the CDN (Content Delivery Network) server count comes to 16 active servers.

Thanks to Freerk Ohling, Tabaluga had been running at Hetzner since May 2013, and it served its last tiles in January. Freerk approached us back in April 2013 to suggest we implement EDNS client subnet support (implemented in December 2014) and to offer us a sponsored tile cache server. Now he has also kindly sponsored new tile cache servers in Germany.

Tabaluga primarily served traffic to visitors from Germany: approximately 56 million map tiles per day (averaging 652/sec and peaking at 1245/sec), or close to 1TB of data per day. It was the highest traffic OSM tile cache server.
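The traffic figures above can be cross-checked with a little arithmetic. The following is a minimal sketch, not an official calculation: it assumes that "close to 1TB" means roughly 10**12 bytes per day, and the function and constant names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def tiles_per_day(avg_tiles_per_second):
    """Convert an average request rate into a daily tile count."""
    return avg_tiles_per_second * SECONDS_PER_DAY


def mean_tile_bytes(bytes_per_day, daily_tiles):
    """Average bytes transferred per tile over one day."""
    return bytes_per_day / daily_tiles


AVG_RATE = 652    # tiles/sec, from the post
PEAK_RATE = 1245  # tiles/sec, from the post
ASSUMED_BYTES_PER_DAY = 10**12  # assumption: "close to 1TB" taken as 1 TB

daily_tiles = tiles_per_day(AVG_RATE)
peak_to_average = PEAK_RATE / AVG_RATE
tile_bytes = mean_tile_bytes(ASSUMED_BYTES_PER_DAY, daily_tiles)

print(f"{daily_tiles:,} tiles per day")
print(f"peak is {peak_to_average:.2f}x the average rate")
print(f"about {tile_bytes / 1000:.1f} kB per tile on average")
```

An average of 652 tiles/sec works out to 56,332,800 tiles per day, which matches the quoted 56 million. Under the 1 TB assumption the mean transfer is roughly 18 kB per tile, and the peak rate is a little under twice the average.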

OpenStreetMap tiles are free for everyone to use, but should be used with moderation. If you are a high traffic site you should look at switch2osm.org to find out how to use the data and keep the tiles available for everyone.

The OpenStreetMap Foundation seeks additional distributed tile servers. If your organisation would like to donate a tile server and hosting, please see the Tile CDN requirements page on the wiki. You can also support OpenStreetMap by donating to the OpenStreetMap Foundation.

The OpenStreetMap Foundation is a not-for-profit organisation, formed in the UK to support the OpenStreetMap Project. It is dedicated to encouraging the growth, development and distribution of free geospatial data for anyone to use and share. The OpenStreetMap Foundation owns and maintains the infrastructure of the OpenStreetMap project.

Four New Tile Servers

Have you noticed faster tiles lately? Browsing the map on openstreetmap.org should now be even more responsive. Three new servers started providing tiles over the last two weeks, joining a server which started earlier in the year.


Map tiles are delivered to users based on their GeoDNS location. The OpenStreetMap tile content delivery network (CDN) now supports EDNS-client-subnet, which helps it locate the closest regional tile cache for each user.
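To illustrate why EDNS-client-subnet helps, here is a minimal Python sketch of the idea, not the OSM DNS implementation. Without the option, a GeoDNS server only sees the address of the user's resolver; with it, the resolver forwards a truncated client prefix that can be used for the lookup instead. The network-to-cache table, the cache names, and the /24 prefix length are all assumptions made for the example, and only IPv4 is handled.

```python
import ipaddress

# Hypothetical mapping from client networks to tile cache regions.
# The networks are documentation ranges; the cache names are made up.
REGION_BY_NETWORK = {
    ipaddress.ip_network("192.0.2.0/24"): "cache-eu",
    ipaddress.ip_network("198.51.100.0/24"): "cache-us",
}
DEFAULT_CACHE = "cache-default"


def client_subnet(address, prefix_len=24):
    """Truncate an IPv4 address to the prefix a resolver would forward."""
    return ipaddress.ip_network(f"{address}/{prefix_len}", strict=False)


def pick_cache(resolver_ip, client_ip=None):
    """Pick a cache by client subnet if one was sent, else by resolver."""
    lookup = client_subnet(client_ip if client_ip else resolver_ip)
    for network, cache in REGION_BY_NETWORK.items():
        if lookup.subnet_of(network):
            return cache
    return DEFAULT_CACHE


# A user in the "eu" range who uses a resolver in the "us" range:
print(pick_cache("198.51.100.53"))                # resolver address only
print(pick_cache("198.51.100.53", "192.0.2.77"))  # client subnet supplied
```

In this sketch the first call selects the cache nearest the resolver, while the second selects the cache nearest the user, which is the improvement the client subnet option is meant to provide.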

OpenStreetMap tiles are free for everyone to use, but should be used with moderation. If you are a high traffic site you should look into switch2osm.org to find out how to use the data and keep the tiles available for everyone.

Thanks to generous donations and active local community members, the OpenStreetMap distributed tile delivery infrastructure continues to grow.

The OpenStreetMap Foundation seeks additional distributed tile servers. If you would like to donate a tile server and hosting, please see the Tile CDN requirements page on the wiki. You can also support OpenStreetMap by donating to the OpenStreetMap Foundation.


Two More New Tile Servers

Thanks to generous donations and active members of the OpenStreetMap community, OpenStreetMap infrastructure continues to grow.

A new tile server, Trogdor, has been added to the OSM tile cache network. Located in Amsterdam, The Netherlands, Trogdor is currently serving tiles to IP addresses from The Netherlands, Belgium and several other central European and central African countries.

A second new tile server, Ridgeback, has also been added to the OpenStreetMap tile cache network. Located in Oslo, Norway, Ridgeback is currently serving tiles to IP addresses from Finland, Iceland, the Faroe Islands and several others.

The list of countries served by any tile server will change over time due to expansion of the tile server network, loading, maintenance activities and other factors.

Map tiles are delivered to users based on their GeoDNS location. The OpenStreetMap Foundation seeks additional distributed tile servers. If you would like to donate a tile server and hosting, please see the Tile CDN requirements page on the wiki.

We would like to thank Blix Solutions AS for this generous donation to OpenStreetMap infrastructure.

The OpenStreetMap Foundation is a not-for-profit organization, formed in the UK to support the OpenStreetMap Project. It is dedicated to encouraging the growth, development and distribution of free geospatial data and to providing geospatial data for anyone to use and share. The OpenStreetMap Foundation owns and maintains the infrastructure of the OpenStreetMap project. You can support OpenStreetMap by donating to the OpenStreetMap Foundation.

Photo Credit. This photo of the Oslo tile cache server is kindly provided by Blix Solutions AS, licensed CC-By-SA and used by permission.

New Tile Server in Pau, France

Thanks to generous donations and active members of the OpenStreetMap community, OpenStreetMap infrastructure continues to grow.

A new tile server, Lurien, has been added to the OSM tile cache network. Located in Pau, Pyrénées-Atlantiques, France, Lurien is currently serving tiles to IP addresses from France, Spain, Portugal, Andorra, Gibraltar, Italy, Monaco, San Marino and Vatican.

Lurien, highlighted.

Map tiles are delivered to users based on their GeoDNS location. The OpenStreetMap Foundation seeks additional distributed tile servers. If you would like to donate a tile server and hosting, please see the Tile CDN requirements page on the wiki.

We would like to thank PauLLA with support of Université de Pau et des Pays de l’Adour (UPPA) for the server and connectivity and Communauté d’Agglomération de Pau Pyrénées (CDAPP) for the data centre hosting. We would also like to thank OpenStreetMap contributor Christophe Merlet for arranging the donation.

The OpenStreetMap Foundation is a not-for-profit organization, formed in the UK to support the OpenStreetMap Project. It is dedicated to encouraging the growth, development and distribution of free geospatial data and to providing geospatial data for anyone to use and share. The OpenStreetMap Foundation owns and maintains the infrastructure of the OpenStreetMap project. You can support OpenStreetMap by donating to the OpenStreetMap Foundation.

More new servers

Dragon sculpture on the Dragon Bridge in Ljubljana. Photo CC-By-SA, dani_7C3


The OpenStreetMap Foundation, and the Operations Working Group, would like to thank Nokia UK Limited for the donation of some of their redundant server hardware. This hardware has found new purpose in the form of “soup”[1] and “fiddlestick”[2], two new web front end servers. A third server “eustace”[3] will be used initially as a trial web statistics server.

The web front-end servers, soup and fiddlestick, replace puff and fuchur, which had performed that role since 2008. Web front-end servers in OpenStreetMap provide the data browser and data layer, as well as user diaries and other “social” functions.

Eustace will debut in a new role for OpenStreetMap by collecting web statistics. The OpenStreetMap Foundation wants to know more about how users experience the OSM web site in an effort to improve the way that OSM services are delivered.

[1] Character from The Clangers, a UK children’s TV programme.
[2] Strangewood (1999): Fiddlestick, a small musically emotive dragon.
[3] Turns into a dragon in The Voyage of the Dawn Treader (Chronicles of Narnia) after slipping on a gold bracelet.

Introducing Zark

Zark, during installation.

Zark is the newest OpenStreetMap server. Give Zark a warm welcome. Continuing in the tradition of naming OSM servers after dragons, the name “Zark” is taken from the Eidolon Chronicles/Shadow World books by Jane Johnson.

The first task for Zark will be to serve as a trial / evaluation server for the OWL – OpenStreetMap Watch List service. OWL’s popularity on the dev server has led to performance problems and long update delays. After more than a year of development and increasing popularity of OWL’s ability to follow local changes without distracting “Big” changesets, moving OWL to Zark will make this service even more effective for mappers.

Many thanks to bitfolk.com for donating this server.