GTFS stop duplication across modes #385
…ovide more informative feed output as per #385. Pandas future warnings about pd.concat ('The behavior of DataFrame concatenation with empty or all-NA entries is deprecated') are now specifically ignored: this pattern is used heavily, a way of refactoring it is not currently obvious, and the warnings impede readable output for users. Also refactored the gtfs_analysis() function to reduce complexity and repetition, pulling some code out into sub-functions (added r.bbox attribute; functions for get_frequent_stop_stats, get_average_headway, gtfs_to_db, load_gtfs_feed, get_frequencies_df).
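The warning suppression described in the commit message above could look roughly like the following. This is a minimal sketch using only the standard library `warnings` module; the helper name `silence_concat_warning` is hypothetical and is not taken from the project's code.

```python
import re
import warnings

# Start of the pandas FutureWarning text quoted in the commit message.
CONCAT_WARNING = (
    "The behavior of DataFrame concatenation with empty or all-NA entries "
    "is deprecated"
)


def silence_concat_warning():
    """Ignore only the quoted pd.concat FutureWarning (hypothetical helper).

    The `message` argument is a regular expression matched against the
    start of the warning text, so other FutureWarnings are still shown.
    """
    warnings.filterwarnings(
        "ignore",
        message=re.escape(CONCAT_WARNING),
        category=FutureWarning,
    )


silence_concat_warning()
```

Filtering on the message text rather than on `FutureWarning` as a whole keeps unrelated deprecation notices visible to users.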
Thank you @carlhiggs for reporting this issue. Following up on this, I've investigated the impact of the changes made to the GTFS analysis, especially for agencies with multiple modes of service. As you outlined in the description, the main problem stemmed from the analysis treating all stops independently of mode, which duplicated stops across modes and affected the total stop counts. However, this didn't significantly impact the current analysis results. The new GTFS analysis output in the logs is easier to read than the old version. Good upgrade!
During my investigation, I ran several tests using the GTFS data from "Ferrocarrils de la Generalitat" in Barcelona, replicating the analysis performed by @carlhiggs. The results seem satisfactory. I also ran a test on the Palma de Mallorca GTFS data using the old version, yielding the following results:
And with the new version:
Upon analyzing the results in QGIS, I can confirm that this change resolves the problem. In the new version the results change and the stops are no longer duplicated. Therefore, I would consider this change effective.
Updated GTFS analysis to address stop duplication across modes and provide more informative feed output as per #385
Describe the bug
When identifying stops from a particular agency's GTFS feed, particularly for agencies with multiple modes of service (e.g. tram, rail, funicular, bus), the GTFS analysis currently analyses each mode separately, but uses every stop in the feed regardless of mode.
This doesn't impact the currently implemented analysis results for several reasons:
The main impact is in the displayed total stop counts (these include all stops in feed), and the duplication of stops across modes in the pt_stops_headway SQL and geopackage table.
This issue was reported by @marcdmallafre, using the enhancements branch of the code here
To Reproduce
Perform GTFS analysis using the above version of the code with a GTFS feed that has multiple modes of transport.
Expected behavior
Stops analysed by mode should relate only to that mode, given the available information.
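One way to restrict stops to the mode that actually serves them is to follow the standard GTFS links from routes.txt (`route_type`) through trips.txt to stop_times.txt. The sketch below assumes a tiny made-up feed held in plain Python lists; the function name `stops_by_route_type` and the sample records are illustrative only and are not the project's implementation.

```python
from collections import defaultdict

# Hypothetical miniature feed. Real data would be read from routes.txt,
# trips.txt and stop_times.txt in the GTFS zip file.
routes = [
    {"route_id": "F1", "route_type": 7},  # 7 = funicular in the GTFS reference
    {"route_id": "B1", "route_type": 3},  # 3 = bus
]
trips = [
    {"trip_id": "t1", "route_id": "F1"},
    {"trip_id": "t2", "route_id": "B1"},
]
stop_times = [
    {"trip_id": "t1", "stop_id": "S1"},
    {"trip_id": "t1", "stop_id": "S2"},
    {"trip_id": "t2", "stop_id": "S2"},
    {"trip_id": "t2", "stop_id": "S3"},
]


def stops_by_route_type(routes, trips, stop_times):
    """Map each route_type to the set of stop_ids its trips actually visit."""
    type_of_route = {r["route_id"]: r["route_type"] for r in routes}
    type_of_trip = {t["trip_id"]: type_of_route[t["route_id"]] for t in trips}
    stops = defaultdict(set)
    for st in stop_times:
        stops[type_of_trip[st["trip_id"]]].add(st["stop_id"])
    return dict(stops)


stops_per_mode = stops_by_route_type(routes, trips, stop_times)
```

In this toy feed the funicular set contains only S1 and S2, while S2 also appears under bus because a bus trip serves it too.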
Screenshots
For example, in our urban Barcelona study region there should only be two funicular stops (according to the information we are using), operated by FGC; however, in the screenshot below there were 60 unique stops. This was an error: the 60 stops also included stops for bus, metro and rail services operated by FGC. The two stops that are correctly funicular services are displayed in pink:
This meant that there were 240 stops attributed to FGC (60 * 5 modes), instead of 85 (60 unique stops, with some supporting separate metro and rail services).
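The difference between the two ways of counting can be sketched with a toy example. The stops and modes below are invented for illustration and do not reproduce the FGC figures; the point is only that attributing every stop to every mode multiplies the count, while counting real (stop, mode) pairs exceeds the number of unique stops only where a stop serves more than one mode.

```python
# Hypothetical toy agency: stop_id -> set of modes that actually serve it.
modes_serving_stop = {
    "S1": {"funicular"},
    "S2": {"funicular", "bus"},
    "S3": {"bus"},
}
all_modes = {"funicular", "bus"}

# Behaviour described in the issue: every stop is attributed to every mode.
naive_rows = len(modes_serving_stop) * len(all_modes)

# Expected behaviour: one row per (stop, mode) pair that really occurs.
expected_rows = sum(len(modes) for modes in modes_serving_stop.values())

# Number of distinct stops, regardless of mode.
unique_stops = len(modes_serving_stop)
```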
As part of addressing this issue, the output information provided to users should be improved to provide more guidance, and omit unnecessary information.
For example, instead of output like this:
Better would be output like this:
This less verbose but more informative output would help users confirm that their GTFS feeds are being analysed as intended, or whether they require further configuration or cleaning.