
Commit

Fixed type issues and upgraded deps
Taqi Jaffri committed Mar 13, 2024
1 parent de8dcf0 commit f4c49c9
Showing 15 changed files with 1,908 additions and 1,881 deletions.
37 changes: 26 additions & 11 deletions .github/workflows/main.yml
@@ -1,20 +1,35 @@
-name: PR Gate
+name: CI
 
-on:
-  pull_request:
-    branches:
-      - main
+on: [push]
 
 jobs:
   build:
     runs-on: ubuntu-latest
 
     steps:
-    - name: Checkout code
-      uses: actions/checkout@v2
+    - name: Check out the code
+      uses: actions/checkout@v3
 
-    - name: Run setup.sh
-      run: chmod +x setup.sh && ./setup.sh
+    - name: Install Poetry
+      run: |
+        curl -sSL https://install.python-poetry.org | python3 -
+      shell: bash
 
-    - name: Run static_analysis.sh
-      run: chmod +x static_analysis.sh && ./static_analysis.sh
+    - name: Install dependencies
+      run: poetry install --with dev
+
+    - name: Lint code
+      run: make lint
+
+    - name: Check spellings
+      run: make spell_check
+
+    - name: Test code
+      run: make test
+
+    - name: Check PR status
+      run: |
+        if [ -n "$(git diff --name-only ${{ github.base_ref }}..${{ github.head_ref }})" ]; then
+          echo "Changes detected. Please make sure to push all changes to the branch before merging.";
+          exit 1;
+        fi
12 changes: 8 additions & 4 deletions .vscode/settings.json
@@ -1,6 +1,10 @@
 {
-    "[python]": {
-        "editor.defaultFormatter": "ms-python.black-formatter"
-    },
-    "python.formatting.provider": "none"
+    "python.testing.pytestArgs": [
+        "tests",
+        "--doctest-modules",
+        "tests",
+        "docugami"
+    ],
+    "python.testing.unittestEnabled": false,
+    "python.testing.pytestEnabled": true
 }
54 changes: 54 additions & 0 deletions Makefile
@@ -0,0 +1,54 @@
+.PHONY: all format lint test tests
+
+# Default target executed when no arguments are given to make.
+all: help
+
+# Define a variable for the test file path.
+TEST_FILE ?= tests/ docugami_dfm_benchmarks/
+
+test:
+	poetry run pytest --doctest-modules $(TEST_FILE)
+
+tests:
+	poetry run pytest --doctest-modules $(TEST_FILE)
+
+######################
+# LINTING AND FORMATTING
+######################
+
+# Define a variable for Python and notebook files.
+PYTHON_FILES=.
+MYPY_CACHE=.mypy_cache
+lint format: PYTHON_FILES=.
+lint_package: PYTHON_FILES=docugami_dfm_benchmarks
+lint_tests: PYTHON_FILES=tests
+lint_tests: MYPY_CACHE=.mypy_cache_test
+
+lint lint_diff lint_package lint_tests:
+	poetry run ruff check .
+	poetry run ruff check $(PYTHON_FILES) --diff
+	poetry run ruff check --select I $(PYTHON_FILES)
+	mkdir -p $(MYPY_CACHE); poetry run mypy $(PYTHON_FILES) --cache-dir $(MYPY_CACHE)
+
+format format_diff:
+	poetry run ruff check --select I --fix $(PYTHON_FILES)
+
+spell_check:
+	poetry run codespell --skip "./poetry.lock,./data/*,./tests/testdata/*,./temp/*" --toml pyproject.toml
+
+spell_fix:
+	poetry run codespell --skip "./poetry.lock,./data/*,./tests/testdata/*,./temp/*" --toml pyproject.toml -w
+
+
+######################
+# HELP
+######################
+
+help:
+	@echo '----'
+	@echo 'format - run code formatters'
+	@echo 'lint - run linters'
+	@echo 'spell_check - run spell checker'
+	@echo 'test - run unit tests'
+	@echo 'tests - run unit tests'
+	@echo 'test TEST_FILE=<test_file> - run all tests in file'
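Both the new VS Code settings and the Makefile's test targets run pytest with --doctest-modules, which collects runnable examples embedded in docstrings alongside the regular test suite. A minimal sketch of what that enables (the function below is illustrative only, not part of this repository):

def normalize_label(text: str) -> str:
    """Lowercase a label and collapse internal whitespace.

    Hypothetical helper shown only to illustrate doctest collection;
    with --doctest-modules, pytest runs the example below as a test.

    >>> normalize_label("  Effective   Date ")
    'effective date'
    """
    return " ".join(text.lower().split())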
8 changes: 4 additions & 4 deletions README.md
@@ -3,7 +3,7 @@ This repo contains benchmark datasets and eval code for the Docugami Foundation
 
 # Getting Started
 
-Make sure you have [poetry](https://python-poetry.org/docs/) installed on your machine, then just run: `setup.sh`. This should install all dependencies required.
+Make sure you have [poetry](https://python-poetry.org/docs/) installed on your machine, then just run: `poetry install` or `poetry install --with dev`. This should install all dependencies required.
 
 # Running Eval
 
@@ -16,15 +16,15 @@ poetry run benchmark eval /path/to/data.csv
 This should output results for the data in the benchmark, in tabular format. See current results section below for some examples for different benchmarks.
 
 # Data
-The data for the benchmarks was sourced from various long-form business documents, a sampling of which is included under `data/documents` as PDF or DOCX. Text was extracted from the documents using Docugami's internal models and then then split appropropriately for each task.
+The data for the benchmarks was sourced from various long-form business documents, a sampling of which is included under `data/documents` as PDF or DOCX. Text was extracted from the documents using Docugami's internal models and then split appropriately for each task.
 
 All data is samples from Docugami released under the license of this repo, except for the medical data which was pulled from the openly accessible [UNC H&P Examples](https://www.med.unc.edu/medclerk/education/grading/history-and-physical-examination-h-p-examples/).
 
 # Benchmarks and Current Results
 
 As of 5/29/2023 we are measuring the following results for the Docugami Foundation Model (DFM), compared to other widely used models suitable for commercial use. We measured each model with the same prompt, with input text capped at 1024 chars, and max output tokens set to 10.
 
-We are reporting different metrics against the human-annotated ground truth as implemented in `docugami/dfm_benchmarks/scorer.py` specifically Exact Match, Vector Similarity (above different thresholds) and Average F1 for output tokens. These metrics give a more balanced view of the output of each model, since generative labels are meant to capture the semantic meaning of each node, and may not necessarily match the ground truth exactly.
+We are reporting different metrics against the human-annotated ground truth as implemented in `docugami_dfm_benchmarks/scorer.py`, specifically Exact Match, Vector Similarity (above different thresholds) and Average F1 for output tokens. These metrics give a more balanced view of the output of each model, since generative labels are meant to capture the semantic meaning of each node, and may not necessarily match the ground truth exactly.
 
 Specifically, DFM outperforms on the more stringent comparisons i.e., Exact match and Similarity@>=0.8 (which can be thought of as "almost exact match" in terms of semantic similarity). This means that Docugami’s output more closely matches human labels, either exactly or very closely.
 
@@ -53,4 +53,4 @@ This benchmark measures the model's ability to produce human readable semantic l
 
 # Contributing
 
-We welcome contributions and feedback. We would appreciate it if you run `static_analysis.sh` and fix any issues prior to submitting your PR, but we are happy to fix such issues ourselves as part of reviewing your PR.
+We welcome contributions and feedback. We would appreciate it if you run `make format`, `make lint` and `make spell_check` prior to submitting your PR, but we are happy to fix such issues ourselves as part of reviewing your PR.
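The scorer itself is not shown in this diff (it now lives at `docugami_dfm_benchmarks/utils/scorer.py`), but the Exact Match and Average F1 metrics the README describes are typically computed along the following lines. This is a hedged sketch only: the function names and normalization details are assumptions, not the repository's actual implementation, and the Vector Similarity metric would additionally require an embedding model.

from collections import Counter


def exact_match(prediction: str, truth: str) -> bool:
    # Case- and surrounding-whitespace-insensitive string equality.
    return prediction.strip().lower() == truth.strip().lower()


def token_f1(prediction: str, truth: str) -> float:
    # Harmonic mean of token-level precision and recall over the two strings.
    pred_tokens = prediction.lower().split()
    truth_tokens = truth.lower().split()
    if not pred_tokens or not truth_tokens:
        return float(pred_tokens == truth_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)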
169 changes: 0 additions & 169 deletions docugami/dfm_benchmarks/scorer.py

This file was deleted.

Empty file.
15 changes: 5 additions & 10 deletions docugami/dfm_benchmarks/cli.py → docugami_dfm_benchmarks/cli.py
@@ -1,16 +1,11 @@
"""
Copyright (c) Docugami Inc.
"""

import csv
from pathlib import Path
import sys

from pathlib import Path
from typing import Optional
import typer

from docugami.dfm_benchmarks.scorer import OutputFormat, score_data, tabulate_scores
import typer

from docugami_dfm_benchmarks.utils.scorer import OutputFormat, score_data, tabulate_scores

app = typer.Typer(
help="Benchmarks for Business Document Foundation Models",
@@ -22,7 +17,7 @@
 def eval(
     csv_file: Path,
     output_format: OutputFormat = OutputFormat.GITHUB_MARKDOWN,
-):
+) -> None:
     with open(csv_file) as file:
         reader = csv.DictReader(file)
         data = [row for row in reader]
@@ -55,7 +50,7 @@ def main(
         is_eager=True,
         help="Prints the version number.",
     )
-):
+) -> None:
     pass
 
 
