
Commit

Fixed type issues and upgraded deps
Taqi Jaffri committed Mar 13, 2024
1 parent de8dcf0 commit f4c49c9
Showing 15 changed files with 1,908 additions and 1,881 deletions.
37 changes: 26 additions & 11 deletions .github/workflows/main.yml
@@ -1,20 +1,35 @@
-name: PR Gate
+name: CI
 
-on:
-  pull_request:
-    branches:
-      - main
+on: [push]
 
 jobs:
   build:
     runs-on: ubuntu-latest
 
     steps:
-    - name: Checkout code
-      uses: actions/checkout@v2
+    - name: Check out the code
+      uses: actions/checkout@v3
 
-    - name: Run setup.sh
-      run: chmod +x setup.sh && ./setup.sh
+    - name: Install Poetry
+      run: |
+        curl -sSL https://install.python-poetry.org | python3 -
+      shell: bash
 
-    - name: Run static_analysis.sh
-      run: chmod +x static_analysis.sh && ./static_analysis.sh
+    - name: Install dependencies
+      run: poetry install --with dev
+
+    - name: Lint code
+      run: make lint
+
+    - name: Check spellings
+      run: make spell_check
+
+    - name: Test code
+      run: make test
+
+    - name: Check PR status
+      run: |
+        if [ -n "$(git diff --name-only ${{ github.base_ref }}..${{ github.head_ref }})" ]; then
+          echo "Changes detected. Please make sure to push all changes to the branch before merging.";
+          exit 1;
+        fi
12 changes: 8 additions & 4 deletions .vscode/settings.json
@@ -1,6 +1,10 @@
 {
-    "[python]": {
-        "editor.defaultFormatter": "ms-python.black-formatter"
-    },
-    "python.formatting.provider": "none"
+    "python.testing.pytestArgs": [
+        "tests",
+        "--doctest-modules",
+        "tests",
+        "docugami"
+    ],
+    "python.testing.unittestEnabled": false,
+    "python.testing.pytestEnabled": true
 }
54 changes: 54 additions & 0 deletions Makefile
@@ -0,0 +1,54 @@
+.PHONY: all format lint test tests
+
+# Default target executed when no arguments are given to make.
+all: help
+
+# Define a variable for the test file path.
+TEST_FILE ?= tests/ docugami_dfm_benchmarks/
+
+test:
+	poetry run pytest --doctest-modules $(TEST_FILE)
+
+tests:
+	poetry run pytest --doctest-modules $(TEST_FILE)
+
+######################
+# LINTING AND FORMATTING
+######################
+
+# Define a variable for Python and notebook files.
+PYTHON_FILES=.
+MYPY_CACHE=.mypy_cache
+lint format: PYTHON_FILES=.
+lint_package: PYTHON_FILES=docugami_dfm_benchmarks
+lint_tests: PYTHON_FILES=tests
+lint_tests: MYPY_CACHE=.mypy_cache_test
+
+lint lint_diff lint_package lint_tests:
+	poetry run ruff check .
+	poetry run ruff check $(PYTHON_FILES) --diff
+	poetry run ruff check --select I $(PYTHON_FILES)
+	mkdir -p $(MYPY_CACHE); poetry run mypy $(PYTHON_FILES) --cache-dir $(MYPY_CACHE)
+
+format format_diff:
+	poetry run ruff check --select I --fix $(PYTHON_FILES)
+
+spell_check:
+	poetry run codespell --skip "./poetry.lock,./data/*,./tests/testdata/*,./temp/*" --toml pyproject.toml
+
+spell_fix:
+	poetry run codespell --skip "./poetry.lock,./data/*,./tests/testdata/*,./temp/*" --toml pyproject.toml -w
+
+
+######################
+# HELP
+######################
+
+help:
+	@echo '----'
+	@echo 'format - run code formatters'
+	@echo 'lint - run linters'
+	@echo 'spell_check - run spell checker'
+	@echo 'test - run unit tests'
+	@echo 'tests - run unit tests'
+	@echo 'test TEST_FILE=<test_file> - run all tests in file'
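Both the new VS Code settings and the Makefile's test targets run pytest with --doctest-modules, which collects runnable examples embedded in docstrings alongside the regular test suite. A minimal sketch of what that enables (the function below is illustrative only, not part of this repository):

def normalize_label(text: str) -> str:
    """Lowercase a label and collapse internal whitespace.

    Hypothetical helper shown only to illustrate doctest collection;
    with --doctest-modules, pytest runs the example below as a test.

    >>> normalize_label("  Effective   Date ")
    'effective date'
    """
    return " ".join(text.lower().split())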
8 changes: 4 additions & 4 deletions README.md
@@ -3,7 +3,7 @@ This repo contains benchmark datasets and eval code for the Docugami Foundation
 
 # Getting Started
 
-Make sure you have [poetry](https://python-poetry.org/docs/) installed on your machine, then just run: `setup.sh`. This should install all dependencies required.
+Make sure you have [poetry](https://python-poetry.org/docs/) installed on your machine, then just run: `poetry install` or `poetry install --with dev`. This should install all dependencies required.
 
 # Running Eval
 
@@ -16,15 +16,15 @@ poetry run benchmark eval /path/to/data.csv
 This should output results for the data in the benchmark, in tabular format. See current results section below for some examples for different benchmarks.
 
 # Data
-The data for the benchmarks was sourced from various long-form business documents, a sampling of which is included under `data/documents` as PDF or DOCX. Text was extracted from the documents using Docugami's internal models and then then split appropropriately for each task.
+The data for the benchmarks was sourced from various long-form business documents, a sampling of which is included under `data/documents` as PDF or DOCX. Text was extracted from the documents using Docugami's internal models and then split appropriately for each task.
 
 All data is samples from Docugami released under the license of this repo, except for the medical data which was pulled from the openly accessible [UNC H&P Examples](https://www.med.unc.edu/medclerk/education/grading/history-and-physical-examination-h-p-examples/).
 
 # Benchmarks and Current Results
 
 As of 5/29/2023 we are measuring the following results for the Docugami Foundation Model (DFM), compared to other widely used models suitable for commercial use. We measured each model with the same prompt, with input text capped at 1024 chars, and max output tokens set to 10.
 
-We are reporting different metrics against the human-annotated ground truth as implemented in `docugami/dfm_benchmarks/scorer.py` specifically Exact Match, Vector Similarity (above different thresholds) and Average F1 for output tokens. These metrics give a more balanced view of the output of each model, since generative labels are meant to capture the semantic meaning of each node, and may not necessarily match the ground truth exactly.
+We are reporting different metrics against the human-annotated ground truth as implemented in `docugami_dfm_benchmarks/scorer.py`, specifically Exact Match, Vector Similarity (above different thresholds) and Average F1 for output tokens. These metrics give a more balanced view of the output of each model, since generative labels are meant to capture the semantic meaning of each node, and may not necessarily match the ground truth exactly.
 
 Specifically, DFM outperforms on the more stringent comparisons i.e., Exact match and Similarity@>=0.8 (which can be thought of as "almost exact match" in terms of semantic similarity). This means that Docugami’s output more closely matches human labels, either exactly or very closely.
 
@@ -53,4 +53,4 @@ This benchmark measures the model's ability to produce human readable semantic l
 
 # Contributing
 
-We welcome contributions and feedback. We would appreciate it if you run `static_analysis.sh` and fix any issues prior to submitting your PR, but we are happy to fix such issues ourselves as part of reviewing your PR.
+We welcome contributions and feedback. We would appreciate it if you run `make format`, `make lint` and `make spell_check` prior to submitting your PR, but we are happy to fix such issues ourselves as part of reviewing your PR.
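The scorer itself is not shown in this diff (it now lives at `docugami_dfm_benchmarks/utils/scorer.py`), but the Exact Match and Average F1 metrics the README describes are typically computed along the following lines. This is a hedged sketch only: the function names and normalization details are assumptions, not the repository's actual implementation, and the Vector Similarity metric would additionally require an embedding model.

from collections import Counter


def exact_match(prediction: str, truth: str) -> bool:
    # Case- and surrounding-whitespace-insensitive string equality.
    return prediction.strip().lower() == truth.strip().lower()


def token_f1(prediction: str, truth: str) -> float:
    # Harmonic mean of token-level precision and recall over the two strings.
    pred_tokens = prediction.lower().split()
    truth_tokens = truth.lower().split()
    if not pred_tokens or not truth_tokens:
        return float(pred_tokens == truth_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)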
169 changes: 0 additions & 169 deletions docugami/dfm_benchmarks/scorer.py

This file was deleted.

Empty file.
15 changes: 5 additions & 10 deletions docugami/dfm_benchmarks/cli.py → docugami_dfm_benchmarks/cli.py
@@ -1,16 +1,11 @@
"""
Copyright (c) Docugami Inc.
"""

import csv
from pathlib import Path
import sys

from pathlib import Path
from typing import Optional
import typer

from docugami.dfm_benchmarks.scorer import OutputFormat, score_data, tabulate_scores
import typer

from docugami_dfm_benchmarks.utils.scorer import OutputFormat, score_data, tabulate_scores

app = typer.Typer(
help="Benchmarks for Business Document Foundation Models",
@@ -22,7 +17,7 @@
 def eval(
     csv_file: Path,
     output_format: OutputFormat = OutputFormat.GITHUB_MARKDOWN,
-):
+) -> None:
     with open(csv_file) as file:
         reader = csv.DictReader(file)
         data = [row for row in reader]
@@ -55,7 +50,7 @@ def main(
         is_eager=True,
         help="Prints the version number.",
     )
-):
+) -> None:
     pass
 
 
