satyakisikdar/openalex

OpenAlex

Updated: Feb 23, 2022

Preprocessing

  1. Create the sci-sci conda environment from environment.yml.
  2. Download the OpenAlex snapshots from this link to a directory of your choosing (say, basedir).
  3. Open preprocessing/flatten_openalex_files.py and update the BASEDIR variable to the above directory.
  4. Uncomment and run the flatten_<entity> functions to generate the flattened, compressed CSV files.
  • The flatten_works() function generates CSV and Parquet files at the same time.
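
To illustrate what "flattening" means here, the following is a minimal sketch and not the code in preprocessing/flatten_openalex_files.py. It assumes each snapshot part is a gzip-compressed JSON Lines file (one JSON object per line) and keeps only a chosen set of top-level fields. The function name `flatten_records` and its parameters are hypothetical; the repository's own flatten_<entity> functions may work differently.

```python
import csv
import gzip
import json


def flatten_records(src_path, dst_path, fields):
    """Write one CSV row per JSON line, keeping only the top-level `fields`.

    Hypothetical helper for illustration; not part of the repository.
    Returns the number of data rows written.
    """
    n_rows = 0
    with gzip.open(src_path, "rt", encoding="utf-8") as src, \
         gzip.open(dst_path, "wt", encoding="utf-8", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fields)
        writer.writeheader()
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            # Missing fields become empty CSV cells.
            writer.writerow({f: record.get(f) for f in fields})
            n_rows += 1
    return n_rows
```

A call might look like `flatten_records("part_000.gz", "authors.csv.gz", ["id", "display_name"])`, where both paths and the column list are placeholders. Nested fields would need their own tables or a serialization step, which this sketch leaves out.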

Warnings:

  • Flattening authors and works takes anywhere between 15 and 30 hours. The code caches the files, so consider running it in batches by setting the files_to_process variable.
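
One way to picture batched runs with caching is sketched below. It assumes, purely for illustration, that files_to_process is a cap on how many input files one run handles and that a cached output is named after its input part; the actual semantics in the repository may differ. The helper `pending_batch` is hypothetical.

```python
from pathlib import Path


def pending_batch(input_paths, out_dir, files_to_process):
    """Return up to `files_to_process` inputs that have no cached output yet.

    Assumption: the output for `part_000.gz` is `<out_dir>/part_000.csv.gz`.
    """
    out_dir = Path(out_dir)
    pending = [
        path
        for path in map(Path, input_paths)
        if not (out_dir / f"{path.stem}.csv.gz").exists()
    ]
    return pending[:files_to_process]
```

Calling this repeatedly, and flattening whatever it returns each time, would work through a long job in chunks without redoing finished files.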

Coming Soon

  • Filtering CSVs based on concepts, publication years, and venues

About

Tools to process OpenAlex snapshots
