Skip to content

dvdblk/hack4good-oecd

Repository files navigation

Hack4Good - NLP for policy trend analysis (OECD)

Release pre-commit Code style

This project was created for the Hack4Good 2023 hackathon in collaboration with OECD.

image

  • GUI via Streamlit
  • Extracts semi-structured PDF into structured text using Adobe PDF Extract API
  • Uses PDFTriage1 style prompting (similar to ReAct) to answer questions about the documents.
  • Answers to questions provide the section where they were found in the document.

GUI Quickstart

Environment variables need to be set in order to run the interactive GUI. Create an .env file in the root of the repo (you can use cp .env.default .env) with the following variables:

Environment Variable Description
ADOBE_CLIENT_ID Create Adobe Developer account and select "Get credentials" here
ADOBE_CLIENT_SECRET Copy from "Get credentials" here as with ADOBE_CLIENT_ID
OPENAI_API_KEY Get the OpenAI API key

After setting the environment variables (make sure to to not enclose the env variables in quotes), you can run the code in one of two ways:

Conda Environment
  1. Create a conda environment with the required dependencies:

To create a conda environment after cloning the repo:

# from the root of the repo
conda env create -f environment.yml
# to activate the environment
conda activate hack4good
# to deactivate the environment (when you're done)
conda deactivate

(Optional) To update the conda environment after pulling latest changes:

conda activate hack4good
conda env update -f environment.yml --prune

(Optional) To remove the conda environment:

conda deactivate
conda env remove -n hack4good
  1. Run the streamlit app
python -m streamlit run app/main.py
  1. Access the streamlit app at http://localhost:8501
Docker
  1. Run (or build) the Docker image

To run the latest docker image:

docker run -p 8501:8501 --env-file .env --volume $PWD/data:/app/app/data ghcr.io/dvdblk/hack4good-oecd/app:latest

(Optional) To build the docker image locally (after cloning the repo) and run it:

docker build -t hack4good .
docker run -p 8501:8501 --env-file .env --volume $PWD/data:/app/app/data hack4good
  1. Access the streamlit app at http://localhost:8501

Contributing

  1. Install pre-commit.
  2. Run pre-commit install to apply the repo's pre-commit hooks to your local git repo.
  3. Add your changes, commit and create a pull request with main branch as the target.

Footnotes

  1. Saad-Falcon, J., Barrow, J., Siu, A., Nenkova, A., Yoon, D. S., Rossi, R. A., & Dernoncourt, F. (2023). PDFTriage: Question Answering over Long, Structured Documents. arXiv preprint arXiv:2309.08872