[Data] Investigate the Fullerton bus more #41

Open
2 tasks done
lauriemerrell opened this issue Dec 13, 2022 · 2 comments
lauriemerrell commented Dec 13, 2022

In an early EDA session, we observed that the realtime API data for the Fullerton (74) bus had some trips with missing/non-distinct trip_id values that were a series of asterisks (like ******). At the time the issue did not seem too widespread, but the Fullerton bus is in our bottom 10 routes in terms of performance. It is probably worth taking a second look to see whether this data issue is causing the 74 to seem worse than it actually is.

Goals for this ticket:

  • Assess prevalence of placeholder trip ID values on the Fullerton bus -- what is the frequency, what days does it occur, etc.
  • Also assess whether this issue is observed on other routes and whether the frequency on the Fullerton bus is a true outlier
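One possible starting point for both goals, as a minimal sketch only: it assumes the realtime data can be loaded as rows with `route_id`, `date`, and `trip_id` fields (hypothetical names, not confirmed against the project's schema) and that a placeholder is any trip ID that is empty or made up solely of asterisks.

```python
import re
from collections import defaultdict

# Assumption: placeholder trip IDs are runs of asterisks, e.g. "******".
PLACEHOLDER = re.compile(r"\*+")


def is_placeholder(trip_id):
    """Return True when trip_id is missing or consists only of asterisks."""
    return not trip_id or PLACEHOLDER.fullmatch(trip_id) is not None


def placeholder_summary(rows, keys=("route_id",)):
    """Count placeholder and total rows per group, plus the placeholder share."""
    counts = defaultdict(lambda: [0, 0])  # group -> [placeholder, total]
    for row in rows:
        group = tuple(row[k] for k in keys)
        counts[group][0] += is_placeholder(row.get("trip_id"))
        counts[group][1] += 1
    return {
        group: {"placeholder": p, "total": t, "share": p / t}
        for group, (p, t) in counts.items()
    }


# Made-up sample rows, for illustration only.
sample = [
    {"route_id": "74", "date": "2022-10-01", "trip_id": "******"},
    {"route_id": "74", "date": "2022-10-01", "trip_id": "1001"},
    {"route_id": "74", "date": "2022-10-02", "trip_id": "1002"},
    {"route_id": "66", "date": "2022-10-01", "trip_id": "2001"},
]

by_route = placeholder_summary(sample)
by_route_day = placeholder_summary(sample, keys=("route_id", "date"))
```

Grouping by `("route_id",)` addresses the cross-route comparison, while `("route_id", "date")` shows which days the placeholders occur on.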
@lauriemerrell lauriemerrell added the good first issue Good for newcomers label Dec 13, 2022
@KyleDolezal

From October through December 2022, two bus routes have trips with the placeholder ****** trip_id value: the 66 and the 74.

The 66 bus had 857 missing trips and 327,759 non-missing trips. Missing trips made up about 0.26% of all scheduled trips.

The 74 bus had 19,451 missing trips and 176,375 non-missing trips. Missing trips made up 9.9% of all scheduled trips.
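As a quick arithmetic check, the shares can be recomputed from the counts above. This sketch assumes the denominator is missing plus non-missing trips, and that the second set of counts belongs to the 74 (Fullerton) bus.

```python
# Counts reported above as (missing, non_missing).
# Assumption: the second pair is for the 74 (Fullerton) bus.
counts = {"66": (857, 327_759), "74": (19_451, 176_375)}


def missing_share(missing, non_missing):
    """Share of trips whose trip_id is the placeholder value."""
    return missing / (missing + non_missing)


shares = {route: missing_share(*pair) for route, pair in counts.items()}

for route, share in shares.items():
    print(f"Route {route}: {share:.2%}")
```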

@csklare101

From Chi Hack Night, 7/25/23.
Comments at the session indicated that the issue still exists on the CTA side. One option considered was to look at CTA data directly to get more accurate realtime data. A next step would be to compare what that data contains and see whether its trip IDs are something other than a series of asterisks.
