Strategy Brief: GTFS Management at Scale

Ryan Mahoney

Executive Summary

For public transit leaders, this strategy is essential to ensure accurate service information, making public transit easier to use and building trust in transit service.

Managing GTFS data at scale requires a deliberate, strategic approach. This strategy brief outlines the challenges large transit agencies face and emphasizes the critical role GTFS plays in delivering accurate information to riders. It highlights the risks of misinformation, recommends a division of labor between project management and engineering duties, warns against the pitfalls of premature optimization, and points to newer data processing tools, like Polars, that can improve software performance.

Defining The Problem

The GTFS standard, or General Transit Feed Specification, plays a crucial role in simplifying public transportation. It enables agencies to publish their schedules and geographic information in a format that is easily utilized by external mobile applications and maps, as well as a transit agency’s own rider-facing digital tools like websites and digital screens at stations. For digital systems, GTFS is an essential source of truth, providing riders with accurate information that makes public transit easier to use.

Large and small transit agencies each face unique challenges in managing GTFS. This brief is aimed at large transit agencies and at state departments that could provide GTFS management services to smaller agencies.

The broadest problem with managing GTFS is the challenge of combining structured and unstructured information from various sources while maintaining its accuracy and integrity under non-negotiable deadlines. For example, bus schedules might exist in a digital format proprietary to vendor-provided scheduling and operations management software, while plans for a recurring weekend station closure exist in a Word document. If the station closure starts in a week and the bus schedule has just been revised to incorporate school trips, it may be challenging for a person or a team to apply both changes early enough for riders to adjust their travel plans.

Due to the multiplicity of transit agency data systems (which include a mix of vendor-provided and ad hoc tools) and agency-specific business operations where multiple departments may influence service and disruption planning, there is no widely established software tool for managing GTFS at scale. At each large transit agency, this is a manual process that utilizes varying degrees of automation for data processing, validation, and publishing.


Knowing the Risks

  • To The Public: When GTFS data fails to reflect the actual transit service, riders face unexpected changes or non-existent services, leading to confusion and frustration. For instance, a rider may have to walk farther due to a surprise bus shuttle replacing subway service, or another might find themselves stranded because of an unannounced elevator closure.
  • To The Agency: Inaccurate service information erodes rider trust, prompting those with alternative options to abandon public transit. This trust deficit fuels a downward spiral of declining ridership and reputational damage, threatening the future viability of the transit system.
  • To Internal Digital Teams: Agency leaders often underestimate the difficulty of maintaining accurate transit data. Delays or errors in updates can undermine digital teams, making it harder to secure necessary budget and leadership support for essential technology initiatives.
  • To Technology Staff: The demanding, high-stakes nature of GTFS management leads to burnout among tech and project management staff. The pressure to maintain accurate data with limited resources, in a role that demands deep expertise, can drive talented individuals to seek other opportunities, amplifying the risk of operational failures.
  • To Sustainable Operations: The technical and repetitive nature of GTFS management often means that budget-strapped agencies rely on a single individual for this critical task. This creates a vulnerability, as the absence of this key person can result in significant disruptions.

Strategy Overview

To meet the challenges of managing GTFS at scale, we recommend a multifaceted approach to address each dimension.

Continuous Learning to Promote Subject Matter Expertise
The public transit tech industry is small and it is rare to meet experts in GTFS management. As such, agencies must develop internal staff and provide adequate training to new hires. The following topics should be covered:

GTFS Fundamentals:

  • Structure and components of GTFS feeds (static and real-time)
  • Data formats and standards (CSV text files for static feeds, Protocol Buffers for GTFS Realtime)

Public Transit Operations:

  • Transit scheduling and planning
  • Service changes and disruptions management
  • Translation of conceptual service changes to actionable tasks

Data Management:

  • Understanding the limitations of consumers
    • For example, the delays in Google applying GTFS updates
  • Data collection and validation
  • Data integration and consistency
  • Quality assurance and control
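
As one illustration of the validation and consistency topics above, the following minimal Python sketch checks that every stop_id referenced in stop_times.txt also exists in stops.txt. The sample rows and the function name find_unknown_stop_ids are hypothetical and exist only for this example; an agency would more likely rely on an established GTFS validator for full coverage.

```python
import csv
import io

# Hypothetical excerpts of two GTFS tables; real feeds would be read from files.
STOPS_TXT = """stop_id,stop_name
S1,Main St
S2,Elm St
"""

STOP_TIMES_TXT = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
T1,08:00:00,08:00:00,S1,1
T1,08:05:00,08:05:00,S2,2
T1,08:10:00,08:10:00,S9,3
"""


def find_unknown_stop_ids(stops_csv: str, stop_times_csv: str) -> list:
    """Return (trip_id, stop_id) pairs whose stop_id is missing from stops.txt."""
    known = {row["stop_id"] for row in csv.DictReader(io.StringIO(stops_csv))}
    problems = []
    for row in csv.DictReader(io.StringIO(stop_times_csv)):
        if row["stop_id"] not in known:
            problems.append((row["trip_id"], row["stop_id"]))
    return problems


unknown = find_unknown_stop_ids(STOPS_TXT, STOP_TIMES_TXT)
print(unknown)
```

A check like this is cheap to run before every publish, which is why referential integrity is usually among the first validations a team automates.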

Technology and Software:

  • Transit scheduling and operations management software (e.g., HASTUS, Trapeze)
  • GTFS editing, management, and trip-planning tools (e.g., Transit Editor, OpenTripPlanner)
  • APIs and web services for data exchange

Geospatial Information Systems (GIS):

  • Mapping and spatial analysis
  • Coordinate systems and geocoding

Project Management:

  • Agile and traditional project management methodologies
  • Resource allocation and budgeting
  • Risk management and mitigation
  • Negotiating and scoping GTFS changes

Interdepartmental Collaboration:

  • Coordinating between different operational departments
  • Working with or within IT
  • Understanding the perspective of diverse stakeholders

Division of Labor
In the high-stakes case of GTFS management, division of labor is not just beneficial—it’s essential. Separating project management from engineering responsibilities ensures that each role can be performed with maximum efficiency and expertise. Project managers can focus on the broad strokes: scoping projects, planning work, and managing stakeholders, ensuring that deadlines are met and resources are allocated effectively. Engineers, on the other hand, can concentrate on the intricate details of GTFS changes, utilizing their specialized skills without the distraction of administrative tasks. This clear delineation of responsibilities not only enhances productivity but also reduces the risk of errors, ultimately leading to more sustainable operations and more accurate transit information.

Avoid Premature Optimization
This recommendation might not be popular among technologists. Many software engineers will want to build a systematic approach to managing GTFS changes, believing it can significantly reduce the effort involved. However, this could be a case of premature optimization, depending on the technology team’s maturity and capacity. The danger of optimizing too early is that building such a system consumes time that could be spent making GTFS changes. Additionally, if the system doesn’t support a specific case, the change could end up taking longer than if it were done manually due to the complexity of adjusting the system. A good indicator that a systematic approach might be viable is if GTFS operations are already running efficiently. This suggests a mature understanding of the problem domain, making the implementation of a system more likely to succeed.

Targeted Job Roles in Detail

While a team tasked with GTFS management work may have more than two members, two clearly defined roles are required for efficient operations: GTFS Project Manager and GTFS Engineer.

GTFS Project Manager

A GTFS project manager handles the typical project management responsibilities, such as scoping, planning, and stakeholder management. However, to be truly effective, they must also possess a deep understanding of transit operations and internal organizational structures.

Consider this: even the most skilled GTFS engineers will struggle if they are unaware of deadlines, rely on incomplete information, or commit to work beyond their capacity. Success hinges on a project manager’s ability to navigate these challenges.

One harsh reality of GTFS projects is that external factors can sometimes delay progress. A savvy project manager must anticipate this and collaborate with engineers and stakeholders to prioritize and sequence changes, ensuring the most critical components are delivered on time.

Managing GTFS projects involves more than just keeping track of deadlines. It requires timely notifications, complete information, and the ability to translate requests into intermediate formats to save engineering time.

Should a GTFS engineer double as a project manager? Ideally, no. But if staffing constraints demand it, the individual must excel in both functional skills and personal time management, seamlessly switching between contexts.

GTFS Engineer

As with the GTFS Project Manager, subject matter expertise is essential for this role. While this position is a form of software engineering, it is not interchangeable with other software engineering roles in public transit. Many software engineering positions in transit are “full stack,” involving website development and similar skills. In contrast, this role is focused specifically on data transformation. Many software engineers find GTFS work less enjoyable due to its intricate nature and significant data entry component. Although it requires genuine software engineering skills, it lacks the variety found in most software engineering roles.

A successful GTFS Engineer understands the unique demands of the role and appreciates the critical impact they have on supporting public transit riders in their region.

While automated testing and validation are important in all software engineering roles, in GTFS engineering, attention to detail, communication, and conscientiousness are paramount. Unlike other roles where Agile methodologies allow for flexible deadlines and emergent issues, the GTFS Engineer must communicate frequently about their progress and ability to meet deadlines. They must also be adept at finding creative solutions to problems or reducing scope in a high-stakes environment where timely delivery is non-negotiable.

New Data Processing Approaches

The programming language Python is widely used for data processing work, including GTFS changes, due to its ease of use, extensive libraries, versatility, and popularity. A key part of that ecosystem is Pandas, a widely used data analysis library. While Pandas is a viable choice for GTFS data processing, it has known limitations that become more prominent at scale. Specifically, Pandas is typically slower at certain types of data processing used in GTFS, such as string manipulation. This can create bottlenecks and slow down the development process, as processing jobs might take several minutes to complete—a significant problem for iterative development approaches.

Fortunately, newer libraries like Polars can significantly improve the performance of GTFS management operations. Polars is designed to offer high-performance data processing capabilities, making it particularly beneficial for handling the large and complex datasets typical in GTFS feeds. Unlike traditional libraries like Pandas, Polars is written in Rust and uses the Apache Arrow columnar memory format, which typically provides faster execution and lower memory usage.

Polars is also usable from other languages, such as Rust directly and Elixir through the Explorer library, making it a versatile option worth considering for any technology team managing significant GTFS processing workloads.