Research Request - NTD by RTPA report tweaks #990

Open
tiffanychu90 opened this issue Jan 10, 2024 · 1 comment
Labels: ntd (National Transit Database); research request (issues that serve as a request for research: summary and handoff)

Comments

@tiffanychu90
Member

tiffanychu90 commented Jan 10, 2024

Complete the below when receiving a research request, and continue to add to this issue as you receive additional details and produce deliverables. Be sure to also add the appropriate project-level label to this issue (eg gtfs-rt, DLA).

Research Question

Single sentence description: After the first batch of NTD data went out to RTPAs, we received feedback to make minor changes to how we send out the output.

Follow-up issue to #978

Detailed description:

  1. Export as .xlsx (with multiple tabs) instead of .csv
    • Set up a title page, and add the same title page to every RTPA's workbook. MTC's requests for the title page are here (but we can add more info too).
    • Work a data dictionary into the title page. Mode_full and TOS_full are the unabbreviated versions of Mode and TOS, so let's spell that out very clearly; it has already been missed. Don't get rid of Mode and TOS, because those are original NTD columns.
From MTC:

Add a cover sheet as the first tab that explains what the document contains, and what can be found on subsequent sheets. (Be sure to name the sheets as they are named in the document.)

Best to spell out abbreviations on the first occurrence. (For example, TOS.)

Within each sheet, select all of the data, and “Format as Table.” This will enable better filtering and use for people who use assistive technology. It also helps blind users know where the data ends. When Formatting as Table, be sure to properly set the Header and Column Rows.

Column headers should not have unnecessary abbreviations. For example, JN1 currently reads “change_1yr_1/2023”. This might be difficult to understand without context. Better to write “Change in one year since 2023” (or whatever it actually means; I have no idea). Similarly, KH1 “mode_full” doesn’t make a lot of sense without context.

Add a Data Dictionary tab to explain what values for variables mean. For example, the variable Mode has values like CB and CC, which are not understandable to the public.

The index/cover sheet tab should indicate that the Data Dictionary sheet has more than one table on it.

File “Properties” should be added. Especially the Title.
  • Multiple tabs for tables --> we do not currently have a programmatic way to "Format as Table". We are zipping up an entire folder of Excel spreadsheets, so unless we want to manually open each one and format it as a table every time we overwrite the files, the RTPA should do this when they open the dataset.
  • We will add multiple tabs for additional aggregated tables, such as tables aggregated to TOS, Mode, or Agency.
  • Rename columns (with spaces). Should the change columns (e.g. change_1yr) be renamed to YOY (year over year), or month over month, whichever is the industry standard?
    • We programmatically generate our column names, and we want to keep those programmatic names through our export to parquet.
    • Apply a function that produces readable column names: parse away underscores and apply title case, spaces, parentheses, whatever is needed. Use the resulting df in the export.
  • File “Properties” should be added, especially the Title. --> Unless someone finds a programmatic way to attach the Excel properties, we're not going into every spreadsheet to do this.
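
A possible programmatic route for the cover sheet, "Format as Table", and file Title points, untested against this repo's scripts: openpyxl can define worksheet tables and set workbook properties at write time. The sketch below is only an illustration. The function name `write_report`, the "Cover" sheet name, and the table style are assumptions, not part of the existing pipeline, and it assumes every data sheet has string headers and at least one data row.

```python
from openpyxl import Workbook
from openpyxl.worksheet.table import Table, TableStyleInfo


def write_report(path, title, sheets):
    """Write a workbook with a cover sheet plus one Excel table per data sheet.

    `sheets` maps a sheet name to a list of rows; the first row of each
    list is the header. (Hypothetical helper, for illustration only.)
    """
    wb = Workbook()

    # The cover sheet is the first tab and lists the tabs that follow it.
    cover = wb.active
    cover.title = "Cover"
    cover.append([title])
    cover.append(["Sheets in this workbook:"])

    for i, (name, rows) in enumerate(sheets.items(), start=1):
        cover.append([name])
        ws = wb.create_sheet(title=name)
        for row in rows:
            ws.append(row)

        # ws.dimensions is the used cell range (e.g. "A1:B3"); registering
        # it as a Table is the file-level equivalent of "Format as Table".
        table = Table(displayName=f"Table{i}", ref=ws.dimensions)
        table.tableStyleInfo = TableStyleInfo(
            name="TableStyleMedium9", showRowStripes=True
        )
        ws.add_table(table)

    # Shows up as the Title under the file's Properties in Excel.
    wb.properties.title = title
    wb.save(path)
```

Calling `write_report("rtpa.xlsx", "Some title", {"monthly": rows})` once per RTPA inside the existing export loop would avoid opening each spreadsheet by hand, if this approach holds up for the real files.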
  1. Making changes to the script

  2. Populate the YAML for portfolio script

  3. Notebook that renders report
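
For the script changes, the "readable column names" bullet above could be sketched roughly as follows. This is a minimal illustration, not the function in the repo: `readable_name`, `readable_columns`, the `OVERRIDES` entries, and the `ACRONYMS` set are all hypothetical, and the spelled-out labels (for example "Type of Service" for TOS) should be checked against the NTD glossary before use.

```python
import re

# Hypothetical hand-written labels for columns that title-casing alone
# would not make understandable.
OVERRIDES = {
    "mode_full": "Mode (Full Name)",
    "tos_full": "Type of Service (Full Name)",
}

# Hypothetical set of tokens that should stay fully upper-case.
ACRONYMS = {"tos", "ntd", "rtpa"}


def readable_name(col):
    """Turn one programmatic column name into a display name."""
    if col in OVERRIDES:
        return OVERRIDES[col]
    # Split on underscores and whitespace, dropping empty pieces.
    words = [w for w in re.split(r"[_\s]+", col) if w]
    return " ".join(
        w.upper() if w.lower() in ACRONYMS else w.capitalize() for w in words
    )


def readable_columns(columns):
    """Build an {original: readable} mapping for a list of column names."""
    return {col: readable_name(col) for col in columns}
```

With pandas, `df.rename(columns=readable_columns(df.columns))` returns a renamed copy for the Excel export, so the original df with programmatic names can still be written to parquet unchanged.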

Notes

  • Work in the ntd folder in the GH repo and in GCS
@tiffanychu90 tiffanychu90 added the research request Issues that serve as a request for research (summary and handoff) label Jan 10, 2024
@github-actions github-actions bot added this to Research Requests in Analytics Work Jan 10, 2024
@tiffanychu90 tiffanychu90 added the ntd National Transit Database label Jan 17, 2024
@tiffanychu90 tiffanychu90 assigned csuyat-dot and unassigned shweta487 Feb 22, 2024
@csuyat-dot
Contributor

Related PRs:

Encountered errors during initial set up of monthly NTD ridership site scripts regarding:

  • netlify set up
  • changes in raw data column names from NTD
  • missing folders/files for scripts
  • notebook conflicts

Resolved these by setting up netlify in the CLI, adjusting the scripts to accept the changed column names, and running individual scripts until the notebooks were built and the website deployed. Unfortunately, I was not able to resolve the initial notebook conflicts, so I had to start a new branch, after which everything ran as expected (no errors).
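
One way to make the scripts tolerant of future column-name changes in the raw NTD file is to normalize the incoming headers through an alias map before anything else runs. This is a sketch of the general idea, not the change that was actually made: `COLUMN_ALIASES`, `canonical_columns`, and every column name shown are made up for illustration.

```python
# Hypothetical aliases: keys are normalized raw headers, values are the
# names the downstream scripts expect. All entries are illustrative.
COLUMN_ALIASES = {
    "ntd id": "ntd_id",
    "5 digit ntd id": "ntd_id",
    "agency name": "agency",
}


def canonical_columns(raw_columns, required):
    """Map raw headers to canonical names and fail early if any are missing."""
    mapping = {}
    for raw in raw_columns:
        # Lower-case and collapse whitespace so small upstream edits still match.
        key = " ".join(raw.strip().lower().split())
        mapping[raw] = COLUMN_ALIASES.get(key, key.replace(" ", "_"))

    missing = sorted(set(required) - set(mapping.values()))
    if missing:
        raise KeyError(f"Missing expected columns after renaming: {missing}")
    return mapping
```

Failing with a clear message when a required column disappears points straight at the upstream rename, instead of surfacing later as an unrelated error inside a notebook.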
