GTFSwizard logo

Lifecycle: experimental


GTFSwizard is a set of tools for exploring and manipulating General Transit Feed Specification (GTFS) files in R.

Its main purpose is to provide researchers and practitioners with a seamless and easy way to visually explore and simulate changes within a GTFS file, which represents public transportation schedules and geographic data. The package allows users to filter data by routes, trips, stops, and time, generate spatial visualizations, and perform detailed analyses of transit networks, including headways, dwell times, and route frequencies. Editing functions to delay and split trips, and to merge distinct GTFS feeds, are available. This is ongoing work, and new features are planned.

Installation

install.packages("GTFSwizard")
library(GTFSwizard)

# dev version:
install.packages('remotes') # if not already installed
# wait for the installation to complete

remotes::install_github('OPATP/GTFSwizard@main')
library(GTFSwizard)

Basics

GTFS feeds are read using the read_gtfs() function. read_gtfs() returns a wizardgtfs object, which is a slightly improved gtfs object. You can also convert a regular gtfs object to a wizardgtfs object using the as_wizardgtfs() function. You can take a glance at the feed using the base summary() function.

library(GTFSwizard)

gtfs <- GTFSwizard::read_gtfs('path-to-gtfs.zip') # or
gtfs <- GTFSwizard::as_wizardgtfs(gtfs_obj)

names(gtfs)
# [1] "agency"          "calendar"
# [3] "calendar_dates"  "fare_attributes"
# [5] "fare_rules"      "routes"
# [7] "shapes"          "stop_times"
# [9] "stops"           "trips"
# [11] "dates_services"

class(gtfs)
# [1] "wizardgtfs" "gtfs" "list"

summary(gtfs)
#A wizardgtfs object with:  

#10  GTFS tables 
#With the following names and respective numbers of entries in each: 
#         agency        calendar  calendar_dates fare_attributes 
#              1               3               6               2 
#     fare_rules          routes          shapes      stop_times 
#            345             345          125776         2659737 
#          stops           trips 
#           4676           85410 
#345  routes 
#4676  stops 
#85410  trips 
#823  valid days of service 
#271.4  meters is the average distance between sequencial stops in a given route 

GTFS feeds are mainly browsed using the explore_gtfs() function, which opens an informative, interactive dashboard.

GTFSwizard::explore_gtfs(for_bus_gtfs)

Service Patterns

The concept of a service_pattern in GTFSwizard helps to address a common limitation of GTFS: its lack of a standardized way to distinguish distinct service patterns within the same route. GTFS files can have multiple service_ids for trips within the same route on the same day, such as regular and extra services. However, GTFS does not inherently identify unique service patterns, i.e. unique sets of service_ids.

In wizardgtfs objects, the dates_services table is an extended feature that consolidates dates and their associated service_ids into a single, organized table. This table is not part of standard GTFS files; it is added specifically to wizardgtfs objects. The dates_services table is structured so that each date is associated with a list of service_ids representing the transit services operating on that specific day. Essentially, each unique list of service_ids observed across dates defines a distinct service pattern. It is common to observe at least 3 service patterns: weekdays, Saturdays, and Sundays.
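As a rough illustration of this idea, the base R sketch below groups dates by their set of service_ids and ranks the resulting patterns by how many dates share them. The toy dates_services data frame, the pattern_key helper, and the service_patterns output are hypothetical; this is a sketch of the concept, not necessarily how GTFSwizard derives patterns internally.

```r
# Toy stand-in for a dates_services table (hypothetical data):
# one row per date, with a list-column of the service_ids active that day.
dates_services <- data.frame(
  date = as.Date("2024-01-01") + 0:6  # Monday 2024-01-01 to Sunday 2024-01-07
)
dates_services$service_id <- list(
  "U", "U", "U", "U", "U",  # weekdays
  "S",                      # Saturday
  "D"                       # Sunday
)

# A pattern key is the sorted, collapsed set of service_ids of a date,
# so two dates with the same services always get the same key.
pattern_key <- vapply(
  dates_services$service_id,
  function(ids) paste(sort(unique(ids)), collapse = "+"),
  character(1)
)

# Count how many dates share each key, most frequent first.
pattern_count <- sort(table(pattern_key), decreasing = TRUE)

service_patterns <- data.frame(
  service_ids       = names(pattern_count),
  service_pattern   = paste0("servicepattern-", seq_along(pattern_count)),
  pattern_frequency = as.integer(pattern_count),
  stringsAsFactors  = FALSE
)

service_patterns
```

In this toy week, the weekday set is shared by five dates and therefore becomes the first (typical) pattern, while the Saturday and Sunday sets each occur once.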

You can check service_pattern using the get_servicepattern() function.

GTFSwizard::get_servicepattern(for_bus_gtfs)
## A tibble: 3 × 3
#  service_id service_pattern  pattern_frequency
#  <chr>      <fct>                        <int>
#1 U          servicepattern-1                65
#2 S          servicepattern-2                13
#3 D          servicepattern-3                13

Most functions account for service_patterns, e.g. get_frequency() and plot_routefrequency(). The former arranges service patterns from the most frequent (typical day) to the least frequent (rarer day), while the latter highlights the most frequent service pattern.

GTFSwizard::get_frequency(for_bus_gtfs)
## A tibble: 667 × 4
#   route_id daily.frequency service_pattern  pattern_frequency
#   <chr>              <int> <fct>                        <int>
# 1 004                   66 servicepattern-1                65
# 2 011                   53 servicepattern-1                65
# 3 011                   30 servicepattern-2                13
# 4 011                   17 servicepattern-3                13
# 5 012                   53 servicepattern-1                65
# 6 012                   28 servicepattern-2                13
# 7 012                   20 servicepattern-3                13
# 8 014                   30 servicepattern-1                65
# 9 014                   19 servicepattern-2                13
#10 014                   19 servicepattern-3                13
## ℹ 657 more rows
## ℹ Use `print(n = ...)` to see more rows

GTFSwizard::plot_routefrequency(for_bus_gtfs, route = for_bus_gtfs$routes$route_id[3])

You can use plot_calendar() to check the number of trips along the calendar and get a better sense of the service_pattern rationale.

GTFSwizard::plot_calendar(for_bus_gtfs)

Exploring

Route frequencies, headways, dwell times, durations, distances, speeds, and fleet are retrieved using the get_frequency(), get_headways(), get_dwelltimes(), get_durations(), get_distances(), get_speeds(), and get_fleet() functions. These functions support several methods, such as by.route, by.trip, or detailed. Refer to each function's documentation (e.g. ?get_headways) for more information.

GTFSwizard::get_headways(for_bus_gtfs, method = 'by.hour')
#This method assumes constant headways along stops.
## A tibble: 75 × 5
#   hour  trips average.headway service_pattern  pattern_frequency
#   <chr> <int>           <dbl> <fct>                        <int>
# 1 00       15           5248  servicepattern-1                65
# 2 00       15           5248  servicepattern-2                13
# 3 00       15           5248  servicepattern-3                13
# 4 01       15           5848  servicepattern-1                65
# 5 01       14           6107. servicepattern-2                13
# 6 01       14           6060  servicepattern-3                13
# 7 02        6          10610  servicepattern-1                65
# 8 02        6           4320  servicepattern-2                13
# 9 02        5           3132  servicepattern-3                13
#10 03        7           3720  servicepattern-1                65
## ℹ 65 more rows
## ℹ Use `print(n = ...)` to see more rows
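To make the notion of an hourly average headway concrete, here is a minimal base R sketch that computes the mean gap between consecutive departures at a single reference stop. The departure times and the to_seconds() helper are hypothetical, and the calculation is an assumption about what an average headway means, not the package's actual implementation.

```r
# Hypothetical departures of one route at one reference stop ("HH:MM:SS").
departures <- c("06:00:00", "06:20:00", "06:40:00",
                "07:00:00", "07:10:00", "07:20:00", "07:30:00")

# Convert "HH:MM:SS" strings to seconds since midnight.
to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

secs <- sort(to_seconds(departures))

# The headway of a departure is the time until the next departure;
# the last departure has no successor and is therefore dropped.
headway <- diff(secs)
hour    <- sprintf("%02d", secs[-length(secs)] %/% 3600)

# Average headway, in seconds, per hour of the day.
by_hour <- aggregate(headway, by = list(hour = hour), FUN = mean)
names(by_hour)[2] <- "average.headway"

by_hour
```

With these toy departures, the 06 hour averages a 1200-second (20-minute) headway and the 07 hour a 600-second (10-minute) headway.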

GTFSwizard::get_dwelltimes(for_bus_gtfs, max.dwelltime = 60, method = 'by.trip')
## A tibble: 52,304 × 5
#   route_id trip_id          average.dwelltime service_pattern  pattern_frequency
#   <chr>    <chr>                        <dbl> <fct>                        <int>
# 1 004      U004-T01V01B01-I                 0 servicepattern-1                65
# 2 004      U004-T01V01B01-V                 0 servicepattern-1                65
# 3 004      U004-T01V02B01-I                 0 servicepattern-1                65
# 4 004      U004-T01V02B01-V                 0 servicepattern-1                65
# 5 004      U004-T01V03B01-I                 0 servicepattern-1                65
# 6 004      U004-T01V03B01-V                 0 servicepattern-1                65
# 7 004      U004-T01V04B01-I                 0 servicepattern-1                65
# 8 004      U004-T01V04B01-V                 0 servicepattern-1                65
# 9 004      U004-T02V01B01-I                 0 servicepattern-1                65
#10 004      U004-T02V01B01-V                 0 servicepattern-1                65
## ℹ 52,294 more rows
## ℹ Use `print(n = ...)` to see more rows

GTFSwizard::get_durations(for_bus_gtfs, method = 'detailed')
## A tibble: 1,398,251 × 8
#   route_id trip_id          hour  from_stop_id to_stop_id duration service_pattern 
#   <chr>    <chr>            <chr> <chr>        <chr>         <dbl> <fct>           
# 1 011      D011-T01V01B01-I 05    7038         1015             60 servicepattern-3
# 2 011      D011-T01V01B01-I 05    1015         4251             60 servicepattern-3
# 3 011      D011-T01V01B01-I 05    4251         990              60 servicepattern-3
# 4 011      D011-T01V01B01-I 05    990          6182             60 servicepattern-3
# 5 011      D011-T01V01B01-I 05    6182         989             120 servicepattern-3
# 6 011      D011-T01V01B01-I 05    989          1600            120 servicepattern-3
# 7 011      D011-T01V01B01-I 05    1600         1608             60 servicepattern-3
# 8 011      D011-T01V01B01-I 05    1608         4767            120 servicepattern-3
# 9 011      D011-T01V01B01-I 05    4767         6450             60 servicepattern-3
#10 011      D011-T01V01B01-I 05    6450         2055            120 servicepattern-3
## ℹ 1,398,241 more rows
## ℹ 1 more variable: pattern_frequency <int>
## ℹ Use `print(n = ...)` to see more rows

GTFSwizard::get_distances(for_bus_gtfs, method = 'by.trip')
## A tibble: 52,304 × 5
#   route_id trip_id          distance service_pattern  pattern_frequency
#   <chr>    <chr>                 [m] <fct>                        <int>
# 1 011      D011-T01V01B01-I   17348. servicepattern-3                13
# 2 011      D011-T01V02B01-I   17348. servicepattern-3                13
# 3 011      D011-T01V03B01-I   17348. servicepattern-3                13
# 4 011      D011-T01V04B01-I   17348. servicepattern-3                13
# 5 011      D011-T01V05B01-I   17348. servicepattern-3                13
# 6 011      D011-T01V06B01-I   17348. servicepattern-3                13
# 7 011      D011-T01V07B01-I   17348. servicepattern-3                13
# 8 011      D011-T01V08B01-I   17348. servicepattern-3                13
# 9 011      D011-T01V09B01-I   17348. servicepattern-3                13
#10 011      D011-T01V10B01-I   17348. servicepattern-3                13
## ℹ 52,294 more rows
## ℹ Use `print(n = ...)` to see more rows

GTFSwizard::get_distances(for_bus_gtfs, method = 'by.route')
## A tibble: 667 × 5
#   route_id trips average.distance service_pattern  pattern_frequency
#   <chr>    <int>              [m] <fct>                        <int>
# 1 004         66           18664. servicepattern-1                65
# 2 011         53           17348. servicepattern-1                65
# 3 011         30           17348. servicepattern-2                13
# 4 011         17           17348. servicepattern-3                13
# 5 012         53           17013. servicepattern-1                65
# 6 012         28           17013. servicepattern-2                13
# 7 012         20           17013. servicepattern-3                13
# 8 014         30           11897. servicepattern-1                65
# 9 014         19           11897. servicepattern-2                13
#10 014         19           11897. servicepattern-3                13
## ℹ 657 more rows
## ℹ Use `print(n = ...)` to see more rows

GTFSwizard::get_speeds(for_bus_gtfs, method = 'by.route')
## A tibble: 667 × 5
#   route_id trips average.speed service_pattern  pattern_frequency
#   <chr>    <int>         <dbl> <fct>                        <int>
# 1 004          1          17.4 servicepattern-1                65
# 2 011          3          13.2 servicepattern-1                65
# 3 011          3          15.0 servicepattern-2                13
# 4 011          3          15.7 servicepattern-3                13
# 5 012          3          14.6 servicepattern-1                65
# 6 012          3          14.3 servicepattern-2                13
# 7 012          3          17.8 servicepattern-3                13
# 8 014          3          12.4 servicepattern-1                65
# 9 014          3          14.3 servicepattern-2                13
#10 014          3          14.3 servicepattern-3                13
## ℹ 657 more rows
## ℹ Use `print(n = ...)` to see more rows

GTFSwizard::get_fleet(for_bus_gtfs, method = 'peak')
#This method returns the number of simultaneous trips for the three busiest hours.
## A tibble: 9 × 4
## Groups:   service_pattern [3]
#   hour fleet service_pattern  pattern_frequency
#  <dbl> <int> <fct>                        <int>
#1     7  1296 servicepattern-1                65
#2     6  1295 servicepattern-1                65
#3     4  1294 servicepattern-1                65
#4     7   783 servicepattern-2                13
#5     6   782 servicepattern-2                13
#6     4   780 servicepattern-2                13
#7     7   381 servicepattern-3                13
#8     6   380 servicepattern-3                13
#9     4   374 servicepattern-3                13
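The idea of counting simultaneous trips can be sketched in base R as follows. The toy trips table (with start and end times in decimal hours) and the overlap rule are assumptions made for illustration only; get_fleet() may define simultaneity differently.

```r
# Hypothetical trips with start and end times in decimal hours.
trips <- data.frame(
  trip_id = c("a", "b", "c", "d"),
  start   = c(6.0, 6.5, 7.2, 7.8),
  end     = c(7.1, 7.4, 8.0, 9.3),
  stringsAsFactors = FALSE
)

hours <- 6:9

# A trip counts towards hour h if it is running at some point within
# the interval (h, h + 1), i.e. it starts before h + 1 and ends after h.
fleet <- vapply(
  hours,
  function(h) sum(trips$start < h + 1 & trips$end > h),
  integer(1)
)

fleet_by_hour <- data.frame(hour = hours, fleet = fleet)

# Busiest hours first, similar in spirit to the 'peak' method.
fleet_by_hour[order(-fleet_by_hour$fleet), ]
```

Here the 7 o'clock hour is the peak: all four toy trips are on the road at some point during it.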

Filtering

Filtering tools allow users to subset GTFS data by service patterns, specific dates, service IDs, route IDs, trip IDs, stop IDs, and time ranges. These filter_ functions retain only the relevant data, making analysis easier and more focused.

# Filter by service pattern
filtered_gtfs <- GTFSwizard::filter_servicepattern(for_bus_gtfs, "servicepattern-2")

# Filter by specific date
filtered_gtfs <- GTFSwizard::filter_date(for_bus_gtfs, "2023-01-01")

# Filter by route ID, retaining only specified routes
filtered_gtfs <- GTFSwizard::filter_route(for_bus_gtfs, for_bus_gtfs$routes$route_id[1:2])

# Filter by trip ID, excluding specified trips
filtered_gtfs <- GTFSwizard::filter_trip(for_bus_gtfs, for_bus_gtfs$trips$trip_id[1:2], FALSE)

# Filter by time range
filtered_gtfs <- GTFSwizard::filter_time(gtfs = for_bus_gtfs, "06:30:00", "10:00:00")

# Spatial filter using filter_stop
spatial.filter <- GTFSwizard::get_shapes_sf(for_bus_gtfs$shapes)

stops <- sf::st_filter(GTFSwizard::get_stops_sf(for_bus_gtfs$stops),
                       spatial.filter) |>
          dplyr::pull(stop_id)

filtered_gtfs <- GTFSwizard::filter_stop(for_bus_gtfs, stops)
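To illustrate what a time-range filter does to the stop_times table, the base R sketch below keeps only the stop events that fall inside a window. The toy stop_times data and the to_seconds() helper are hypothetical, and the keep rule is an assumption rather than the exact logic of filter_time(). Times are compared as seconds because GTFS allows values past midnight, such as 25:10:00.

```r
# Toy stop_times table (hypothetical data).
stop_times <- data.frame(
  trip_id        = c("t1", "t1", "t2", "t2", "t3"),
  stop_sequence  = c(1, 2, 1, 2, 1),
  arrival_time   = c("06:10:00", "06:40:00", "09:50:00", "10:15:00", "25:10:00"),
  departure_time = c("06:10:00", "06:41:00", "09:50:00", "10:15:00", "25:10:00"),
  stringsAsFactors = FALSE
)

# Convert "HH:MM:SS" strings to seconds since midnight.
to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

from <- to_seconds("06:30:00")
to   <- to_seconds("10:00:00")

# Keep stop events that arrive at or after `from` and depart at or before `to`.
keep <- to_seconds(stop_times$arrival_time) >= from &
  to_seconds(stop_times$departure_time) <= to

filtered <- stop_times[keep, ]
filtered
```

Note that trips t1 and t2 each lose one of their two stops in this sketch. Truncated stop sequences like these are the reason functions that rebuild geometry from stop sequences should run before a time filter, not after.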

Visualizing

GTFSwizard offers interactive plotting tools to analyze trip frequencies and headways, providing insight into peak hours, route-specific patterns, and system-wide trends.

Editing

GTFSwizard provides functions to edit GTFS data directly: delaying trips, splitting trips, and merging feeds. The delay_trip() function applies a delay to specific trips. The split_trip() function divides a trip at split equally spaced points, creating split + 1 separate trips. This can be useful for analyzing partial routes or for simulating route adjustments. The merge_gtfs() function combines two GTFS feeds, integrating distinct GTFS datasets into a single one.

# Delay trips by 5 minutes (300 seconds)
delayed_gtfs <- delay_trip(for_bus_gtfs, trip_id = for_bus_gtfs$trips$trip_id[1:2], delay = 300)

# Split a trip in 3 sections (2 splits)
split_gtfs <- split_trip(for_bus_gtfs, trip_id = for_bus_gtfs$trips$trip_id[1:2], split = 2)

# Merge two GTFS files into one
merged_gtfs <- merge_gtfs(for_bus_gtfs, for_rail_gtfs)

Please note that split_trip() uses stop sequences to recreate the shapes table of split trips; accordingly, it should not be used after filter_time(), as this function removes invalid stop_times.
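The two trip edits described above can be pictured with a small base R sketch operating on a toy stop_times table. Everything here (the data, the to_seconds() and to_hms() helpers, and the way new trip ids are named) is hypothetical and only illustrates the concept; delay_trip() and split_trip() may handle details such as shared split stops differently.

```r
# Toy stop_times for one hypothetical trip; times are "HH:MM:SS" strings.
stop_times <- data.frame(
  trip_id        = "trip-A",
  stop_sequence  = 1:6,
  departure_time = c("23:40:00", "23:45:00", "23:50:00",
                     "23:55:00", "23:58:00", "24:02:00"),
  stringsAsFactors = FALSE
)

# Convert "HH:MM:SS" strings to seconds since midnight, and back.
to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}
to_hms <- function(s) {
  sprintf("%02d:%02d:%02d", s %/% 3600, (s %% 3600) %/% 60, s %% 60)
}

# Delay: shift every time of the trip by 300 seconds. Hours may exceed 23,
# which is valid in GTFS for service running past midnight.
delayed <- stop_times
delayed$departure_time <- to_hms(to_seconds(stop_times$departure_time) + 300)

# Split: 2 split points give 3 contiguous, equally sized groups of stops,
# each of which becomes a new trip.
split   <- 2
segment <- cut(stop_times$stop_sequence, breaks = split + 1, labels = FALSE)

split_trips <- stop_times
split_trips$trip_id <- paste0(stop_times$trip_id, ".", segment)

delayed
split_trips
```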

Feeds are then exported using the write_gtfs() function, which saves a standard GTFS .zip file at the given path.

GTFSwizard::write_gtfs(for_bus_gtfs, 'path-to-file.zip')

Handling Geographic Data

GTFSwizard detects and reconstructs missing shapes tables using the get_shapes() function. Related functions create simple feature objects from the stops or shapes tables, using get_stops_sf() or get_shapes_sf(), or convert simple feature shapes back into a standard GTFS shapes data frame, using get_shapes_df(). Please note that get_shapes() uses stop sequences to recreate the shapes table; accordingly, it should not be used after filter_time(), as filter_time() removes invalid stop_times.

gtfs <- for_bus_gtfs

gtfs$shapes <- NULL

gtfs$shapes
#NULL

gtfs <- GTFSwizard::get_shapes(gtfs)

gtfs$shapes
## A tibble: 6,830 × 5
#   shape_id shape_pt_lon shape_pt_lat shape_pt_sequence shape_dist_traveled
#   <chr>           <dbl>        <dbl>             <int>               <dbl>
# 1 shape-1         -38.7        -3.96                 1                  0 
# 2 shape-1         -38.7        -3.95                 2                499.
# 3 shape-1         -38.7        -3.96                 3               1428 
# 4 shape-1         -38.7        -3.98                 4               3404.
# 5 shape-1         -38.7        -3.97                 5               5490.
# 6 shape-1         -38.7        -3.97                 6               6248.
# 7 shape-1         -38.7        -3.98                 7               7871.
# 8 shape-1         -38.8        -3.97                 8              11398.
# 9 shape-1         -38.7        -3.97                 9              13093.
#10 shape-1         -38.8        -3.97                10              13395 
## ℹ 6,820 more rows
## ℹ Use `print(n = ...)` to see more rows
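As a minimal sketch of rebuilding a shape from a stop sequence, the base R code below connects consecutive stops with straight segments and accumulates the great-circle distance along them. The toy stops table and the haversine() helper are hypothetical, and straight stop-to-stop segments are an assumption; the result only mimics the layout of the table shown above.

```r
# Toy ordered stops of one hypothetical trip (coordinates in degrees).
stops <- data.frame(
  stop_id  = c("s1", "s2", "s3"),
  stop_lon = c(-38.50, -38.50, -38.49),
  stop_lat = c(-3.75, -3.74, -3.74),
  stringsAsFactors = FALSE
)

# Great-circle distance in meters between two points (haversine formula),
# assuming a spherical Earth with a radius of 6,371 km.
haversine <- function(lon1, lat1, lon2, lat2, r = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(a))
}

n <- nrow(stops)

# Length of each straight segment between consecutive stops.
step <- haversine(stops$stop_lon[-n], stops$stop_lat[-n],
                  stops$stop_lon[-1], stops$stop_lat[-1])

# One shape point per stop, with cumulative distance along the shape.
shapes <- data.frame(
  shape_id            = "shape-1",
  shape_pt_lon        = stops$stop_lon,
  shape_pt_lat        = stops$stop_lat,
  shape_pt_sequence   = seq_len(n),
  shape_dist_traveled = c(0, cumsum(step))
)

shapes
```

Because every shape point comes from a stop, a trip that has lost stops (for instance after a time filter) would yield a shortened shape, which is why the ordering caveat above matters.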

GTFSwizard::get_shapes_sf(for_bus_gtfs$shapes)
#Simple feature collection with 509 features and 2 fields
#Geometry type: LINESTRING
#Dimension:     XY
#Bounding box:  xmin: -38.65624 ymin: -3.897338 xmax: -38.41016 ymax: -3.694365
#Geodetic CRS:  WGS 84
# A tibble: 509 × 2
#   shape_id                                                                        geometry
#   <chr>                                                                   <LINESTRING [°]>
# 1 shape004-I (-38.50181 -3.831178, -38.50185 -3.831165, -38.50242 -3.831016, -38.50177 -3…
# 2 shape004-V (-38.4845 -3.738462, -38.48455 -3.738554, -38.48457 -3.738583, -38.48468 -3.…
# 3 shape011-I (-38.53249 -3.722036, -38.53255 -3.722017, -38.53286 -3.721924, -38.53297 -3…
# 4 shape012-V (-38.53251 -3.722151, -38.53242 -3.722182, -38.53195 -3.722339, -38.53178 -3…
# 5 shape014-V (-38.53121 -3.757169, -38.53111 -3.757062, -38.53102 -3.756946, -38.53095 -3…
# 6 shape015-I (-38.60755 -3.772986, -38.60752 -3.772894, -38.60741 -3.772499, -38.60732 -3…
# 7 shape015-V (-38.58454 -3.737546, -38.58455 -3.73752, -38.58455 -3.737483, -38.58459 -3.…
# 8 shape016-I (-38.58995 -3.709432, -38.59 -3.709269, -38.59003 -3.70917, -38.59009 -3.708…
# 9 shape016-V (-38.48468 -3.738597, -38.48476 -3.738568, -38.48484 -3.738538, -38.48563 -3…
#10 shape017-I (-38.47904 -3.796202, -38.47898 -3.796908, -38.47975 -3.796987, -38.47983 -3…
## ℹ 499 more rows
## ℹ Use `print(n = ...)` to see more rows

GTFSwizard::get_stops_sf(for_bus_gtfs$stops)
#Simple feature collection with 4793 features and 3 fields
#Geometry type: POINT
#Dimension:     XY
#Bounding box:  xmin: -38.64515 ymin: -3.894503 xmax: -38.4108 ymax: -3.694365
#Geodetic CRS:  WGS 84
## A tibble: 4,793 × 4
#   stop_id stop_name                      wheelchair_boarding              geometry
#   <chr>   <chr>                                        <int>           <POINT [°]>
# 1 10      AVENIDA CEL CARVALHO, 1135                      NA  (-38.5906 -3.702286)
# 2 1000    RUA DOR JOAO MOREIRA, 489                       NA (-38.52899 -3.722157)
# 3 1001    RUA DOR JOAO MOREIRA, 163                       NA (-38.52601 -3.723169)
# 4 1009    AVENIDA TRISTÃO GONÇALVES, 500                  NA (-38.53285 -3.725926)
# 5 1010    AVENIDA IMPERADOR, 550                          NA (-38.53421 -3.726137)
# 6 1013    AVENIDA IMPERADOR, 262                          NA  (-38.53348 -3.72369)
# 7 1014    RUA SAO PAULO, 955                              NA (-38.53403 -3.723879)
# 8 1015    RUA GUILHERME ROCHA, 1048                       NA (-38.53547 -3.724543)
# 9 1016    RUA GAL SAMPAIO, 951                            NA  (-38.53078 -3.72682)
#10 102     RUA BETA, 202                                   NA  (-38.60078 -3.71436)
## ℹ 4,783 more rows
## ℹ Use `print(n = ...)` to see more rows

Objects

GTFSwizard ships with two example objects: a small for_rail_gtfs wizardgtfs object and a larger for_bus_gtfs wizardgtfs object. Both are real GTFS samples from the city of Fortaleza, Brazil, in the 2020s: the first covers the urban subway system, and the second the regular bus system.

gtfs <- GTFSwizard::for_bus_gtfs

plot(gtfs)


Cheat Sheet

Under development…

Contributing

Contributions are welcome! To report a bug, suggest a feature, or contribute code, please use the repository’s Issues.

GTFSwizard mainly relies on dplyr and sf for data wrangling, leaflet for map rendering, ggplot2 and plotly for data visualization, and shiny to assemble the explore_gtfs() application.

Citation

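To cite package ‘GTFSwizard’ in publications use:

Nelson de O. Quesado Filho and Caio G. C. Guimarães (2024). GTFSwizard: Exploring and Manipulating GTFS Files. R package version 1.0.0. doi:10.32614/CRAN.package.GTFSwizard, https://cran.r-project.org/package=GTFSwizard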

A BibTeX entry for LaTeX users is

  @Manual{quesado.guimaraes.2024,
    title = {GTFSwizard: Exploring and Manipulating GTFS Files},
    author = {Nelson de O. {Quesado Filho} and Caio G. C. Guimarães},
    year = {2024},
    note = {R package version 1.0.0},
    url = {https://cran.r-project.org/package=GTFSwizard},
    doi = {10.32614/CRAN.package.GTFSwizard},
  }

Acknowledgement OPA-TP

GTFSwizard is developed by Nelson Quesado and Caio Guimarães at OPA-TP research group, Universidade Federal do Ceará.