GTFS stop duplication across modes #385

Closed
carlhiggs opened this issue Mar 4, 2024 · 1 comment
Labels: bug (Something isn't working)


@carlhiggs
Member

Describe the bug
When identifying stops from a particular agency's GTFS feed, particularly for agencies with multiple modes of service (e.g. tram, rail, funicular, bus), the GTFS analysis currently analyses each mode separately, but includes every stop in the feed regardless of which mode actually serves it.

This doesn't impact the currently implemented analysis results for several reasons:

  • we only consider access to the closest stop, regardless of mode, rather than counts of stops
  • frequency analysis draws on mode-related information from routes and trips to determine stop frequency

The main impact is on the displayed total stop counts (which include all stops in the feed) and the duplication of stops across modes in the pt_stops_headway table, in both the SQL database and the GeoPackage output.
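To illustrate the intended behaviour, here is a minimal sketch (plain Python, no pandas) of deriving each mode's stops from the GTFS tables rather than from all of stops.txt: route_type in routes.txt gives each route's mode, trips.txt links trips to routes, and stop_times.txt links stops to trips. The function name `stops_by_mode`, the list-of-dicts input format and the mode labels are assumptions for illustration, not the project's actual code.

```python
from collections import defaultdict

# Standard GTFS route_type codes from routes.txt; feeds using other codes
# (e.g. extended route types) would need a custom mapping.
ROUTE_TYPE_NAMES = {
    0: "Tram", 1: "Metro", 2: "Rail", 3: "Bus", 4: "Ferry",
    5: "Cable tram", 6: "Aerial lift", 7: "Funicular",
    11: "Trolleybus", 12: "Monorail",
}


def stops_by_mode(routes, trips, stop_times):
    """Return {mode: set of stop_ids visited by at least one trip of that mode}.

    routes:     rows with 'route_id' and 'route_type' (as read from routes.txt)
    trips:      rows with 'trip_id' and 'route_id' (trips.txt)
    stop_times: rows with 'trip_id' and 'stop_id' (stop_times.txt)
    Stops never referenced in stop_times.txt are not assigned to any mode.
    """
    route_mode = {
        r["route_id"]: ROUTE_TYPE_NAMES.get(int(r["route_type"]), "Other")
        for r in routes
    }
    trip_mode = {t["trip_id"]: route_mode.get(t["route_id"]) for t in trips}
    result = defaultdict(set)
    for st in stop_times:
        mode = trip_mode.get(st["trip_id"])
        if mode is not None:  # ignore rows pointing at unknown trips or routes
            result[mode].add(st["stop_id"])
    return dict(result)


# Toy feed: S1 is served by both rail and metro, S2 only by funicular,
# and S3 (present in stops.txt) has no scheduled trips at all.
routes = [
    {"route_id": "R1", "route_type": "2"},
    {"route_id": "M1", "route_type": "1"},
    {"route_id": "F1", "route_type": "7"},
]
trips = [
    {"trip_id": "t1", "route_id": "R1"},
    {"trip_id": "t2", "route_id": "M1"},
    {"trip_id": "t3", "route_id": "F1"},
]
stop_times = [
    {"trip_id": "t1", "stop_id": "S1"},
    {"trip_id": "t2", "stop_id": "S1"},
    {"trip_id": "t3", "stop_id": "S2"},
]
by_mode = stops_by_mode(routes, trips, stop_times)
print(by_mode)
```

Each mode then lists only the stops its own trips serve; a stop shared by two modes (S1 here) appears under both, and a stop with no trips (S3) appears under none.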

This issue was reported by @marcdmallafre, using the enhancements branch of the code here

To Reproduce
Perform GTFS analysis using the above version of the code with a GTFS feed that includes multiple modes of transport.

Expected behavior
Stops analysed by mode should relate only to that mode, given the available information.

Screenshots
For example, in our urban Barcelona study region there should be only two funicular stops (according to the information we are using), operated by FGC; however, in the screenshot below 60 unique stops were recorded as funicular stops. This was an error: the set also included stops served by FGC's bus, metro and rail services. The two stops that are genuinely served by funicular services are displayed in pink:
[Screenshot: FGC stops in the Barcelona study region, with the two funicular stops shown in pink]

This meant that there were 240 stop records attributed to FGC (60 stops × 4 modes with scheduled services: metro, rail, bus and funicular), instead of 85 (across 60 unique stops, some of which are served by both metro and rail).
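The 240 and 85 figures can be reconstructed from the per-mode counts in the improved output further below (27 metro, 53 rail, 3 bus and 2 funicular stops), assuming duplicate records were written for each of the four modes that have scheduled services in this feed:

```math
\underbrace{60 \times 4}_{\text{every stop under every served mode}} = 240
\qquad\text{vs.}\qquad
\underbrace{27 + 53 + 3 + 2}_{\text{stops actually served by each mode}} = 85
```

Because 85 exceeds the 60 unique stops, at least some stops must be counted under more than one mode, which fits stations served by both metro and rail.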

As part of addressing this issue, the output information provided to users should be improved to provide more guidance, and omit unnecessary information.

For example, instead of output like this:

  - /home/ghsci/process/data/transit_feeds/spain_gtfs/20231009_130209_FGC_Catalunya)
  - 20230101 to 20231231)
  - ['07:00:00', '19:00:00']


     Tram                  0/60 (0.0%) stops identified with departure times (0.06 seconds)
     Metro                24/60 (40.0%) stops identified with departure times (0.47 seconds)
     Rail                 53/60 (88.3%) stops identified with departure times (1.08 seconds)
     Bus                   3/60 (5.0%) stops identified with departure times (0.09 seconds)
     Ferry                 0/60 (0.0%) stops identified with departure times (0.06 seconds)
     Cable tram            0/60 (0.0%) stops identified with departure times (0.06 seconds)
     Aerial lift           0/60 (0.0%) stops identified with departure times (0.06 seconds)
     Funicular             2/60 (3.3%) stops identified with departure times (0.17 seconds)
     Trolleybus            0/60 (0.0%) stops identified with departure times (0.06 seconds)
     Monorail              0/60 (0.0%) stops identified with departure times (0.06 seconds)

Output like this would be better:

Commencing GTFS analysis.  GTFS analysis can be complex.  For details on expected data structures, including default route type codes, consult the GTFS Schedule Reference (https://gtfs.org/schedule/reference/).  Correct identification of transport modes may require custom configuration if GTFS feeds do not match the standard route_type codes defined in routes.txt.  Analysis will only be undertaken for stops aligned with trips in stop_times.txt, allowing mode of transport to be identified.  Service frequency analysis will only be undertaken for stops with scheduled services.  Where stop times are not defined but a stop sequence is defined, interpolation of stop times may be configured to support service frequency analysis.

20231009_130209_FGC_Catalunya
  - analysis dates: 20230101 to 20231231
  - analysis times: ['07:00:00', '19:00:00']
  - 60 unique stops in stops.txt

  - configured Metro route type codes: [1]
  - Metro                24/27 (88.9%) metro stops aligned with departure times.
  - configured Rail route type codes: [2]
  - Rail                 53/53 (100.0%) rail stops aligned with departure times.
  - configured Bus route type codes: [3]
  - Bus                   3/3 (100.0%) bus stops aligned with departure times.
  - configured Funicular route type codes: [7]
  - Funicular             2/2 (100.0%) funicular stops aligned with departure times.
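The proposed per-mode summary could be generated along these lines. This is a sketch only: the mode-to-route-type configuration dict, the function name `summarise_modes` and its set-based inputs are hypothetical, and the column alignment only approximates the mock-up. Modes with no stops in the feed are skipped, which is what removes the rows of 0/60 results from the output.

```python
# Hypothetical configuration: mode label -> GTFS route_type codes for that mode.
MODE_ROUTE_TYPES = {
    "Tram": [0],
    "Metro": [1],
    "Rail": [2],
    "Bus": [3],
    "Funicular": [7],
}


def summarise_modes(mode_route_types, stops_per_mode, stops_with_departures):
    """Build per-mode summary lines in the style of the proposed output.

    stops_per_mode:        {mode: set of stop_ids on trips of that mode}
    stops_with_departures: {mode: set of stop_ids with departures in the
                            configured analysis dates and times}
    Modes with no stops in the feed are omitted entirely.
    """
    lines = []
    for mode, codes in mode_route_types.items():
        stops = stops_per_mode.get(mode, set())
        if not stops:
            continue  # e.g. no tram routes in this feed: print nothing
        aligned = stops_with_departures.get(mode, set()) & stops
        pct = 100 * len(aligned) / len(stops)
        lines.append(f"  - configured {mode} route type codes: {codes}")
        lines.append(
            f"  - {mode:<20}{len(aligned):>3}/{len(stops)} ({pct:.1f}%) "
            f"{mode.lower()} stops aligned with departure times."
        )
    return lines


# Toy inputs echoing the FGC example: 27 metro stops, 24 with departures
# in the analysis window, and 2 funicular stops, both with departures.
stops_per_mode = {
    "Metro": {f"M{i}" for i in range(27)},
    "Funicular": {"F1", "F2"},
}
stops_with_departures = {
    "Metro": {f"M{i}" for i in range(24)},
    "Funicular": {"F1", "F2"},
}
for line in summarise_modes(MODE_ROUTE_TYPES, stops_per_mode, stops_with_departures):
    print(line)
```

Computing percentages against each mode's own stop count (24/27 rather than 24/60) is what makes the reported figures meaningful per mode.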

This less verbose but more informative output would help users confirm that their GTFS feeds are being analysed as intended, or whether they require further configuration or cleaning.
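The proposed message above notes that, where stop times are missing but a stop sequence is defined, interpolation of stop times may be configured. As a minimal sketch, assuming linear interpolation by position in the stop sequence (some tools interpolate by distance along the shape instead) and hypothetical helper names, blank times for one trip could be filled between the surrounding timepoints as follows. The GTFS Schedule Reference requires times for the first and last stop of each trip, so the sketch rejects trips without them.

```python
def to_seconds(hms):
    """'HH:MM:SS' -> seconds; GTFS allows hours past 23 for after-midnight trips."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def to_hms(seconds):
    """Seconds -> 'HH:MM:SS', rounding to the nearest second."""
    h, rem = divmod(int(round(seconds)), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def interpolate_trip_times(stop_times):
    """Fill blank times for one trip by linear interpolation.

    stop_times: list of (stop_sequence, 'HH:MM:SS' or None) pairs for one trip.
    Missing times are spaced evenly by position between the nearest known
    times before and after them (distance along the route is ignored).
    """
    rows = sorted(stop_times, key=lambda row: row[0])
    known = [i for i, (_, t) in enumerate(rows) if t is not None]
    if not known or known[0] != 0 or known[-1] != len(rows) - 1:
        raise ValueError("the first and last stops of a trip need explicit times")
    secs = [to_seconds(t) if t is not None else None for _, t in rows]
    for a, b in zip(known, known[1:]):
        step = (secs[b] - secs[a]) / (b - a)
        for i in range(a + 1, b):
            secs[i] = secs[a] + step * (i - a)
    return [(seq, to_hms(s)) for (seq, _), s in zip(rows, secs)]


trip = [(1, "07:00:00"), (2, None), (3, None), (4, "07:09:00")]
print(interpolate_trip_times(trip))
```

Interpolating by position is the simplest option; where shape_dist_traveled is populated, weighting by distance would usually give more realistic times.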

carlhiggs added the bug (Something isn't working) label Mar 4, 2024
carlhiggs self-assigned this Mar 4, 2024
carlhiggs added a commit that referenced this issue Mar 4, 2024
…ovide more informative feed output as per #385; also, pandas future warnings re pd.concat were ('The behavior of DataFrame concatenation with empty or all-NA entries is deprecated') specifically ignored as this pattern is used heavily and a way of refactoring this is not currently obvious, and the warnings impede readable output for users.   Also re-factored gtfs_analysis() function to reduce complexity and reduce some repetition, pulling out some code as sub-functions (added r.bbox attribute; functions for get_frequent_stop_stats, get_average_headway, gtfs_to_db, load_gtfs_feed, get_frequencies_df).
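For reference, one way to ignore just that FutureWarning using Python's standard `warnings` module is sketched below; this is not necessarily how the commit implements it, and the helper name `ignore_concat_future_warning` is hypothetical. Matching on both the message prefix and the `FutureWarning` category keeps all other warnings visible.

```python
import re
import warnings

# Message prefix quoted in the commit message (raised by pandas.concat).
CONCAT_WARNING = (
    "The behavior of DataFrame concatenation with empty or all-NA entries "
    "is deprecated"
)


def ignore_concat_future_warning():
    """Ignore only this FutureWarning; other warnings still reach the user."""
    warnings.filterwarnings(
        "ignore",
        # `message` is a regex matched against the start of the warning text,
        # so escape the literal prefix.
        message=re.escape(CONCAT_WARNING),
        category=FutureWarning,
    )
```

An alternative that avoids the warning for empty inputs is to drop empty frames before concatenating, e.g. `pd.concat([df for df in frames if not df.empty])`, guarding against the case where every frame is empty (since `pd.concat` raises on an empty list); all-NA frames would still need separate handling.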
@marcdmallafre

Thank you @carlhiggs for reporting this issue.

Following up on this, I've investigated the impact of the changes made to the GTFS analysis, especially for agencies with multiple modes of service.

As you outlined in the description, the main problem stemmed from the analysis treating all stops independently of the mode, resulting in duplications across different modes and affecting the total stop counts. However, this didn't significantly impact the current analysis results.

The new GTFS analysis output in the logs is an improvement on the old version and is easier to read; a good upgrade! During my investigation, I ran several tests using the GTFS data from "Ferrocarrils de la Generalitat" in Barcelona, repeating the analysis performed by @carlhiggs. The results appear satisfactory.

Another test I conducted was on the Palma de Mallorca GTFS data using the old version, yielding the following results:

Palma analysis:
  - /home/ghsci/process/data/transit_feeds/palma/20231010_130143_TIB_Mallorca)
  - 20210101 to 20501231)
  - ['07:00:00', '19:00:00']

     Tram                  0/202 (0.0%) stops identified with departure times (0.91 seconds)
     Metro                 9/202 (4.5%) stops identified with departure times (1.04 seconds)
     Rail                 11/202 (5.4%) stops identified with departure times (1.34 seconds)
     Bus                 185/202 (91.6%) stops identified with departure times (2.09 seconds)
     Ferry                 0/202 (0.0%) stops identified with departure times (0.92 seconds)
     Cable tram            0/202 (0.0%) stops identified with departure times (0.87 seconds)
     Aerial lift           0/202 (0.0%) stops identified with departure times (0.86 seconds)
     Funicular             0/202 (0.0%) stops identified with departure times (0.86 seconds)
     Trolleybus            0/202 (0.0%) stops identified with departure times (0.87 seconds)
     Monorail              0/202 (0.0%) stops identified with departure times (0.87 seconds)

And with the new version:

20231010_130143_TIB_Mallorca
  - analysis dates: 20210101 to 20501231
  - analysis times: ['07:00:00', '19:00:00']
  - 202 unique stops in stops.txt

  - configured Metro route type codes: [1]
  - Metro                 9/9 (100.0%) metro stops aligned with departure times.
  - configured Rail route type codes: [2]
  - Rail                 11/11 (100.0%) rail stops aligned with departure times.
  - configured Bus route type codes: [3]
  - Bus                 185/185 (100.0%) bus stops aligned with departure times.

Upon analyzing the results in QGIS, I can confirm that this change resolves the problem.

[Screenshot: old version of the code]

As can be seen, the results change in the new version and the stops are no longer duplicated:

[Screenshot: new version]

Therefore, I would consider this change effective.
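As a complement to the visual check in QGIS, the same property can be tested programmatically: after the fix, every exported stop should be recorded only under modes whose trips actually serve it, and the per-mode row counts should match the logged totals. A minimal sketch, assuming the exported table can be read as rows with hypothetical `stop_id` and `mode` fields, and that the set of stops served by each mode has already been derived from routes.txt, trips.txt and stop_times.txt; `check_stop_modes` is a hypothetical helper.

```python
from collections import Counter


def check_stop_modes(rows, served_by_mode, stop_key="stop_id", mode_key="mode"):
    """Compare exported stop records against the stops each mode really serves.

    rows:           records from the exported stop table; the field names
                    given by stop_key and mode_key are assumptions.
    served_by_mode: {mode: set of stop_ids on trips of that mode}, derived
                    from routes.txt, trips.txt and stop_times.txt.
    Returns (row count per mode, records whose stop the recorded mode
    does not actually serve).
    """
    rows = list(rows)
    per_mode = Counter(row[mode_key] for row in rows)
    misattributed = [
        row for row in rows
        if row[stop_key] not in served_by_mode.get(row[mode_key], set())
    ]
    return per_mode, misattributed


served = {"Funicular": {"F1", "F2"}, "Rail": {"F1", "R1"}}
all_stops = ["F1", "F2", "R1"]

# Old behaviour: every stop in the feed written out under every mode.
old_rows = [{"stop_id": s, "mode": m} for m in served for s in all_stops]
# Fixed behaviour: only stops actually served by each mode.
new_rows = [{"stop_id": s, "mode": m} for m, stops in served.items() for s in sorted(stops)]

print(check_stop_modes(old_rows, served))
print(check_stop_modes(new_rows, served))
```

Note that a simple check for repeated (stop, mode) pairs would not catch this bug, since the old output contained each pair once; the duplication was of stops across modes they do not serve.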

carlhiggs added a commit that referenced this issue Mar 6, 2024
Updated GTFS analysis to address stop duplication across modes and provide more informative feed output as per #385
carlhiggs mentioned this issue May 16, 2024