Skip to content

Project to scrape CFTC.gov reports and present using Power Query & Power Pivot

License

Notifications You must be signed in to change notification settings

jason-norris/cot-reporter

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Commitment of Traders (COT) Reporter

COT Reporter is a Python application which scrapes weekly reports posted by the Commodity Futures Trading Commission (CFTC) and prepares them for presentation in Excel via Power Query and Power Pivot. To demonstrate how reports of interest can be automatically extracted, loaded, and transformed for analysis in pivot tables and charts, two reports were targeted in this pilot project - the Disaggregated Futures and Traders in Financial Futures. A data model with certain measures was then created to present the Blees Sentiment Rating for given markets.

Tech Choices

Python was selected for its convenient libraries to handle HTML or XML (Beautiful Soup) and various data structures (pandas), while Excel was chosen for its accessibility as a BI tool. Power Query offers the necessary ETL functionality to manage the underlying CSVs created by the app, while Power Pivot acts as a suitable endpoint for the expected consumers of this data.

Challenges and Future Plans

The primary challenge was in maintaining modularity for additional reports to be pulled and pushed to the model workbook. Attention was given to portability and dynamic paths overall.

Future updates will likely include:

  • Logging and error handling
  • Rewrites to leverage classes
  • Console with progress bars
  • Interactive installation

Installation

Download as archive or clone COT Reporter in your preferred directory. Once extracted, complete the following steps to obtain reports and populate workbook:

  1. Update config.ini with the installation path. Example:

config_ini

  1. Update the workbook sheet edit_paths with the same install path. Example:

wb_edit_paths

  1. Optional: Update pswhConfigFile.xml in text editor if you plan to run PowerShell

pwshConfigFile

Usage

Populating workbook

  • There are a couple of ways to run and create consolidated master files for the workbook:

    • Run cotreporter.exe by double-clicking or from cmd (creates two files in /data/master)
    • Use bundled PowerShell script in /pwsh to schedule the task
  • Open Excel and click Refresh All on Data tab to import and query master files

wb_refresh_1

The app will download all historical reports when first executed, but it will only append the report for the current year on each subsequent run. Note that these files are not datestamped, but a week number is appended to the file name when compressed and saved in /data/archive.

Excel (without a VBA or macro-based solution) may require multiple clicks to completely import data. However, there is another PowerShell script included that will refresh the workbook in one go.

Other Notes & Tips

  • Do not enable content for workbook until paths to master files set (avoids warnings)
  • Once template is configured, save as .xlsx (and update PowerShell config file if used)
  • Native Excel add-ons may require enablement if prompted

Disclaimer

If you choose to schedule this app, please scrape responsibly and run once per week when new reports are posted. Likewise, if you modify this code for any other related purpose, please read more on ethical web scraping beforehand.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

Apache 2.0