Chipdelmal/CDPHScrape

CDPH Scrape

These scripts scrape the monthly California Department of Public Health (CDPH) arbovirus case updates into CSV files for easy analysis.

  • pdfDownload.py: Downloads PDFs defined in sources.py to a directory.
  • pdfScrape.py: Scrapes the PDFs for tables, dropping parenthetical data and notes (which are redundant with the counts).
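
The parenthetical clean-up described above can be illustrated with a minimal sketch. It is not the code in pdfScrape.py: the helper names `strip_parenthetical` and `clean_rows`, the regular expression, and the sample rows (made-up values) are all assumptions for illustration, and it uses only the standard library.

```python
import re

# Hypothetical pattern: one parenthesised group plus any whitespace before it.
_PAREN = re.compile(r"\s*\([^()]*\)")


def strip_parenthetical(cell):
    """Return the cell text with parenthesised annotations removed."""
    return _PAREN.sub("", str(cell)).strip()


def clean_rows(rows):
    """Apply strip_parenthetical to every cell of a table given as a list of rows."""
    return [[strip_parenthetical(cell) for cell in row] for row in rows]


# Made-up sample table, not real CDPH data.
rows = [["County", "Human cases"], ["Kern", "12 (3)"], ["Fresno", "7"]]
cleaned = clean_rows(rows)
print(cleaned)
```

Working cell by cell keeps the counts intact while dropping the bracketed annotations. Nested parentheses are not handled by this pattern.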

Use

Currently, paths must be changed in-file; I'll add a wrapper in the near future so the whole thing can be called from bash. To parse the tables from the PDFs, make sure PATH_O in pdfDownload.py matches PATH_I in pdfScrape.py, then run the scripts in this order:

python pdfDownload.py
python pdfScrape.py
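
Since the two paths are edited by hand, a quick check that they point to the same folder can catch a mismatch before scraping. The sketch below is one possible way to do that; `same_folder` is a hypothetical helper and the path values are placeholders, not the ones used in the repository.

```python
from pathlib import Path


def same_folder(path_o, path_i):
    """Return True when both settings resolve to the same directory."""
    resolved_o = Path(path_o).expanduser().resolve()
    resolved_i = Path(path_i).expanduser().resolve()
    return resolved_o == resolved_i


# Placeholder values standing in for PATH_O (pdfDownload.py) and PATH_I (pdfScrape.py).
PATH_O = "./data/pdf/"
PATH_I = "data/pdf"
print(same_folder(PATH_O, PATH_I))
```

Resolving both values first means that differences such as a trailing slash or a leading `./` do not count as a mismatch.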

Dependencies

To install the required dependencies, run:

pip install camelot-py pandas

Authors


Héctor M. Sánchez C., Tomás León
