Faster R package vignettes #266

Draft
wants to merge 12 commits into
base: main

Commits on Nov 11, 2020

  1. Convert API calls to request CSV format for data, instead of JSON

    The CSV format is much more compact (does not repeat field names for
    every row), and more naturally fits with R anyway.
    
    Alter the relevant tests to serve CSVs. I've verified all vignettes
    build with these changes.
    capnrefsmmat committed Nov 11, 2020
    91137da
  2. bcf0191
  3. Correct error in metadata test

    It should not be possible to have two signals with the same source,
    signal, time_type, and geo_type. If that happened, a query for that
    signal would have two metadata rows attached to the covidcast_signal
    data frame, which would confuse everything.
    capnrefsmmat committed Nov 11, 2020
    60f6cc7
  4. Add an additional test of covidcast_signal

    Fetching multiple days is important.
    capnrefsmmat committed Nov 11, 2020
    b10569f
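
The size argument in commit 91137da (CSV instead of JSON) can be illustrated with a small base R sketch. The data frame and its values are made up, and the JSON is assembled by hand with `sprintf` purely for comparison; none of this is the package's actual request code.

```r
# Toy data frame standing in for an API response (values are invented).
d <- data.frame(
  geo_value  = c("pa", "tx", "ca"),
  time_value = c(20201101L, 20201101L, 20201101L),
  value      = c(1.5, 2.25, 3),
  stringsAsFactors = FALSE
)

# CSV: the field names appear once, in the header row.
csv_lines <- utils::capture.output(utils::write.csv(d, row.names = FALSE))
csv_text  <- paste(csv_lines, collapse = "\n")

# Row-oriented JSON, built by hand to avoid a package dependency:
# every record repeats all three field names.
json_rows <- sprintf(
  '{"geo_value":"%s","time_value":%d,"value":%s}',
  d$geo_value, d$time_value, as.character(d$value)
)
json_text <- paste0("[", paste(json_rows, collapse = ","), "]")

# Compare the payload sizes in characters.
sizes <- c(csv = nchar(csv_text), json = nchar(json_text))
print(sizes)

# The CSV text parses straight back into a data frame.
parsed <- utils::read.csv(text = csv_text, stringsAsFactors = FALSE)
```

Because the JSON form repeats every field name on every row, the gap widens as the number of rows grows.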

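The invariant described in commit 60f6cc7 (no two signals sharing the same source, signal, time_type, and geo_type) can be checked with a few lines of base R. This is a minimal sketch: the helper `has_unique_keys`, the column name `data_source`, and the toy rows are assumptions for illustration, not the package's test code.

```r
# Toy metadata table (invented rows); one row per signal is expected.
meta <- data.frame(
  data_source = c("src-a", "src-a", "src-b"),
  signal      = c("sig-1", "sig-1", "sig-1"),
  time_type   = c("day", "day", "day"),
  geo_type    = c("county", "state", "county"),
  stringsAsFactors = FALSE
)

# Columns that together should identify a signal uniquely.
key_cols <- c("data_source", "signal", "time_type", "geo_type")

# Hypothetical helper: TRUE when every key combination appears exactly once.
has_unique_keys <- function(meta, key_cols) {
  !any(duplicated(meta[, key_cols, drop = FALSE]))
}

ok <- has_unique_keys(meta, key_cols)

# Appending a copy of the first row creates the duplicate the commit warns about.
bad_meta <- rbind(meta, meta[1, ])
bad <- has_unique_keys(bad_meta, key_cols)
```

A check like this in a test fixture would flag the duplicate before a query ends up with two metadata rows attached.
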
Commits on Nov 12, 2020

  1. Address review comments

    capnrefsmmat committed Nov 12, 2020
    677f4cd

Commits on Nov 13, 2020

  1. 37e8a07
  2. Use dplyr::distinct in {latest,earliest}_issue

    Profiling revealed that latest_issue was responsible for a large portion
    of the time taken in building correlation-utils.Rmd (apart from
    downloading the data). Much of this time was spent in dplyr::filter.
    
    Rather than grouping by geography and time, we can use dplyr::distinct,
    knowing that each geo_value and time_value should appear only once per
    issue date. By taking the first or last (after sorting by issue date),
    we get the desired result.
    
    dplyr does not document its algorithmic details, so I can't easily give
    a big-O comparison here. Even so, the measured speedup is dramatic:
    
    > nrow(d)
    [1] 203360
    > system.time(latest_issue_old(d))
       user  system elapsed
      6.395   0.037   6.465
    > system.time(latest_issue(d))
       user  system elapsed
      0.025   0.003   0.027
    capnrefsmmat committed Nov 13, 2020
    0c93efa
  3. Do our correlation analysis at the state, not county, level

    Fetching the county data took a large portion of the time required to
    build the vignette, particularly after the fixes to latest_issue in
    b0f7e7b.
    capnrefsmmat committed Nov 13, 2020
    7f7fd89
  4. Fix test thinkos

    Always run your tests before pushing...
    capnrefsmmat committed Nov 13, 2020
    bf86746
  5. Fix source links in pkgdown documentation

    By providing the `repo` block with a link pointing to
    R-packages/covidcast/, pkgdown can build the correct URLs.
    capnrefsmmat committed Nov 13, 2020
    4e8b0ef
  6. Switch all vignettes to use httptest to record API requests

    Also modify several vignettes to download only the necessary amount of
    data, thus reducing the file size of these CSV files.
    capnrefsmmat committed Nov 13, 2020
    0ebe6ed
  7. 30a5c2f
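
The change in commit 0c93efa (Use dplyr::distinct in {latest,earliest}_issue) can be sketched as follows. The function names `latest_issue_grouped` and `latest_issue_distinct` and the toy data are hypothetical; the package's real implementation may differ in detail. The sketch assumes, as the commit message does, that each geo_value and time_value pair appears only once per issue date.

```r
library(dplyr)

# Toy data (invented): some observations were reissued with revised values.
d <- data.frame(
  geo_value  = c("pa", "pa", "pa", "tx", "tx"),
  time_value = as.Date(c("2020-11-01", "2020-11-01", "2020-11-02",
                         "2020-11-01", "2020-11-01")),
  issue      = as.Date(c("2020-11-02", "2020-11-05", "2020-11-03",
                         "2020-11-02", "2020-11-04")),
  value      = c(1.0, 1.5, 2.0, 3.0, 3.5),
  stringsAsFactors = FALSE
)

# Grouping approach: filter within each (geo_value, time_value) group.
latest_issue_grouped <- function(df) {
  df %>%
    group_by(geo_value, time_value) %>%
    filter(issue == max(issue)) %>%
    ungroup()
}

# distinct approach: sort newest issue first, then keep the first row
# seen for each (geo_value, time_value) pair.
latest_issue_distinct <- function(df) {
  df %>%
    arrange(desc(issue)) %>%
    distinct(geo_value, time_value, .keep_all = TRUE)
}

# Put both results in the same row order before comparing them.
old <- latest_issue_grouped(d) %>% arrange(geo_value, time_value)
new <- latest_issue_distinct(d) %>% arrange(geo_value, time_value)
same <- isTRUE(all.equal(old$value, new$value))
```

Sorting ascending instead of descending gives the earliest issue by the same mechanism. If a pair could appear twice with the same issue date, the grouped version would keep both rows while `distinct` keeps one, so the equivalence depends on the uniqueness assumption.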