[Data] Investigate routes with ratio of actual trips to scheduled trips greater than 1 #19

Open
dcjohnson24 opened this issue Sep 13, 2022 · 2 comments
Labels
mvp must be addressed before mvp launch

Comments

@dcjohnson24
Collaborator

dcjohnson24 commented Sep 13, 2022

Investigate routes with ratio > 1

There are some routes that have a ratio of actual trips to scheduled trips greater than one, and it would be good to know why.

Access the data

Jupyter Notebook

To access the data, run the notebook compare_scheduled_and_rt.ipynb. Add a cell at the bottom with %store summary and run it. The %store magic command lets you share variables between notebooks (see https://stackoverflow.com/questions/31621414/share-data-between-ipython-notebooks).

Next, run static_gtfs_analysis.ipynb. Add a cell at the bottom with %store -r summary and run it to read the summary DataFrame stored by the compare_scheduled_and_rt.ipynb notebook. Then merge the summary DataFrame with the final_gdf GeoDataFrame from static_gtfs_analysis.ipynb using summary_gdf = summary.merge(final_gdf, how="right", on="route_id")

Python

Run the following in an interpreter from the project root:

import pandas as pd

import data_analysis.compare_scheduled_and_rt as csrt
import data_analysis.static_gtfs_analysis as sga

summary_df = csrt.main()

gdf = sga.main()

summary_gdf = summary_df.merge(gdf, how="right", on="route_id")
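Because the merge uses how="right", every route in the GeoDataFrame is kept even when it has no row in the summary, and summary routes missing from the GeoDataFrame are dropped. It may be worth checking for unmatched routes before filtering. A minimal sketch with toy data, assuming plain DataFrames as stand-ins for the real objects (the route IDs, names, and ratios below are made up):

```python
import pandas as pd

# Toy stand-ins for summary_df and gdf; all values are illustrative only.
summary_df = pd.DataFrame(
    {"route_id": ["1", "2", "3"], "ratio": [0.9, 1.2, 0.7]}
)
gdf = pd.DataFrame(
    {"route_id": ["2", "3", "4"], "route_name": ["B", "C", "D"]}
)

# indicator=True adds a _merge column recording where each row came from.
merged = summary_df.merge(gdf, how="right", on="route_id", indicator=True)

# Routes present in gdf but absent from the summary end up with a NaN ratio.
unmatched = merged.loc[merged["_merge"] == "right_only", "route_id"].tolist()
print(unmatched)
```

Rows flagged right_only have no ratio at all, so they never show up in a ratio > 1 filter.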

Find routes with ratio > 1

To filter the rows with ratio > 1, use

ratio_over_one = summary_gdf.loc[summary_gdf.ratio > 1]
ratio_over_one.head()
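To see the most extreme routes first, the filtered rows can be sorted by ratio. A small sketch on made-up data (the real summary_gdf has more columns; only route_id and ratio are assumed here):

```python
import pandas as pd

# Toy summary_gdf; route IDs and ratios are invented for illustration.
summary_gdf = pd.DataFrame(
    {"route_id": ["A", "B", "C", "D"], "ratio": [0.8, 1.5, 1.05, None]}
)

# Rows with a missing ratio compare as False and are excluded by the filter.
ratio_over_one = summary_gdf.loc[summary_gdf["ratio"] > 1]

# Largest ratios first, so the worst offenders are at the top.
ranked = ratio_over_one.sort_values("ratio", ascending=False)
top_routes = ranked["route_id"].tolist()
print(top_routes)
```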

A few things to look for:

@porouspaper

porouspaper commented Sep 16, 2022

On trips crossing the hour boundary: are we suspecting that this code double-counts a trip if it crosses an hour boundary, despite vid being aggregated as a set?

@dcjohnson24
Collaborator Author

I think that's the code. I guess maybe vid is unique only for a given hour, but it could appear in another hour for the same trip. It does seem strange though.
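The hypothesis can be illustrated on toy data. This is not the project's aggregation code, just a sketch of how a per-hour set of vid values could still count one trip twice. The trip_id and time columns and all values are assumptions made for the example:

```python
import pandas as pd

# Hypothetical vehicle pings: vehicle 101 runs one trip from 08:55 to 09:10.
pings = pd.DataFrame(
    {
        "vid": [101, 101, 101, 202],
        "trip_id": ["t1", "t1", "t1", "t2"],
        "time": pd.to_datetime(
            [
                "2022-09-13 08:55",
                "2022-09-13 09:00",
                "2022-09-13 09:10",
                "2022-09-13 09:20",
            ]
        ),
    }
)
pings["hour"] = pings["time"].dt.hour

# A set of vid per hour removes duplicates only within that hour.
per_hour = pings.groupby("hour")["vid"].agg(lambda s: len(set(s)))
summed_over_hours = int(per_hour.sum())

# Distinct (vehicle, trip) pairs over the whole period.
true_trips = len(pings[["vid", "trip_id"]].drop_duplicates())

print(summed_over_hours, true_trips)
```

Here vehicle 101 is counted once in hour 8 and again in hour 9, so summing the hourly counts gives one more trip than actually ran. If the real data behaves this way, it could push the ratio above 1.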
