adtl-qc: Quality Control module with report generation #89

Draft: abhidg wants to merge 19 commits into main

abhidg (Contributor) commented Oct 17, 2023

  • qc: add adtl-qc subcommand, bump version
  • docs: add quality control module documentation
  • qc: new module for data quality control
  • qc: add support for rule collection
  • qc: process work units, move to submodule
  • qc: add tests
  • qc: save results in SQLite DB
  • qc: add templates
  • qc: add long desc, write rules to DB
  • qc: style fixups
  • qc: add report module

abhidg changed the title from "qc" to "adtl-qc: Quality Control module with report generation" on Oct 17, 2023

codecov bot commented Oct 17, 2023

Codecov Report

Merging #89 (6412927) into main (610e8d6) will decrease coverage by 9.47%.
Report is 1 commits behind head on main.
The diff coverage is 61.60%.

❗ Current head 6412927 differs from pull request most recent head f960679. Consider uploading reports for the commit f960679 to get more accurate results

@@             Coverage Diff             @@
##              main      #89      +/-   ##
===========================================
- Coverage   100.00%   90.53%   -9.47%     
===========================================
  Files            2        5       +3     
  Lines          677      909     +232     
===========================================
+ Hits           677      823     +146     
- Misses           0       86      +86     
Files                  Coverage Δ
adtl/qc/__init__.py    83.33% <83.33%> (ø)
adtl/qc/runner.py      68.81% <68.81%> (ø)
adtl/qc/report.py      29.23% <29.23%> (ø)

... and 1 file with indirect coverage changes

The `reason` attribute can hold metadata about the reason a rule
was triggered. This can be used to add row-dependent descriptive
information as well as being used by the schema functions to fill
in the reason why a particular row failed schema validation.

Schema rules validate dataframes against a JSON Schema

Do not save to DB as data is corrupted on insert

pd.read_csv() does not do automatic type conversions for columns of
mixed types. This adds a _to_json() helper function that converts a
row into JSON, auto-converting floating-point, integer and
boolean values and dropping keys with null values.
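
The conversion described in the last commit message can be illustrated with a minimal sketch. This is not the code from the pull request: the function name `to_json`, the `_convert` helper, and the choice to treat `None`, NaN and empty strings as null are all assumptions made for illustration, and the sketch works on a plain dict rather than a pandas row so that it stays self-contained.

```python
import json
import math


def _convert(value):
    """Coerce one cell to bool, int or float where possible; None means null.

    Hypothetical helper: which values count as null (None, NaN, empty
    string) is an assumption, not taken from the pull request.
    """
    if value is None:
        return None
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return value
    if isinstance(value, float):
        if math.isnan(value):
            return None
        # Infinity is not valid JSON, so keep it as a string
        return value if math.isfinite(value) else str(value)
    if isinstance(value, int):
        return value

    text = str(value).strip()
    if text == "":
        return None
    if text.lower() in ("true", "false"):
        return text.lower() == "true"
    try:
        return int(text)
    except ValueError:
        pass
    try:
        number = float(text)
    except ValueError:
        return text  # not numeric: leave as a string
    if math.isnan(number):
        return None
    return number if math.isfinite(number) else text


def to_json(row: dict) -> str:
    """Serialise a row to JSON, dropping keys whose value is null."""
    converted = {key: _convert(value) for key, value in row.items()}
    return json.dumps({k: v for k, v in converted.items() if v is not None})
```

For example, `to_json({"age": "42", "temp": "37.5", "vaccinated": "True", "notes": None})` yields a JSON object with an integer, a float and a boolean, and without the `notes` key. Dropping null keys rather than emitting `null` lets a JSON Schema treat those fields as absent instead of wrongly typed.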