[ADD] DataComPy #228

Open
wants to merge 1 commit into
base: main
Conversation

@fdosani fdosani commented Apr 8, 2024

Description

Why

DataComPy is a package for comparing two Spark DataFrames. It originally started as something of a replacement for SAS's PROC COMPARE for Pandas DataFrames, with more functionality than just Pandas.DataFrame.equals(Pandas.DataFrame).

It reports match statistics, lets users adjust match accuracy, and supports an absolute or relative tolerance when comparing numeric columns. Via Fugue, the package can also compare Spark DataFrames against other types, for example Spark vs Pandas, Spark vs Polars, or Spark vs DuckDB.
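To illustrate what absolute and relative tolerance mean for a numeric column, here is a minimal pure-Python sketch. It does not use DataComPy's API; the helper names `within_tolerance` and `column_match_stats` are hypothetical, and the rule `|a - b| <= abs_tol + rel_tol * |b|` is an assumption borrowed from the convention used by `numpy.isclose`.

```python
def within_tolerance(a, b, abs_tol=0.0, rel_tol=0.0):
    """Hypothetical helper: True if a and b agree within the tolerances.

    Assumed rule (numpy.isclose convention): |a - b| <= abs_tol + rel_tol * |b|
    """
    return abs(a - b) <= abs_tol + rel_tol * abs(b)


def column_match_stats(left, right, abs_tol=0.0, rel_tol=0.0):
    """Hypothetical helper: count matching rows between two numeric columns."""
    if len(left) != len(right):
        raise ValueError("columns must have the same number of rows")
    matches = sum(
        within_tolerance(a, b, abs_tol=abs_tol, rel_tol=rel_tol)
        for a, b in zip(left, right)
    )
    return {"rows": len(left), "matches": matches, "mismatches": len(left) - matches}


# Illustrative data only, not taken from the project.
left = [100.0, 200.0, 300.0]
right = [100.0, 200.5, 330.0]

exact = column_match_stats(left, right)                 # no tolerance
loose_abs = column_match_stats(left, right, abs_tol=1.0)  # allow a difference of 1.0
loose_rel = column_match_stats(left, right, rel_tol=0.1)  # allow a 10% difference

print(exact)
print(loose_abs)
print(loose_rel)
```

With no tolerance only the identical row matches. The absolute tolerance additionally accepts the 0.5 difference, and the relative tolerance accepts all three rows because each difference is within 10% of the right-hand value.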

Checklist

  • Item hasn't been proposed yet (there is no open PR, it hasn't been rejected or removed).
  • Item is added at the end of the relevant section.
  • Description ends with period and doesn't contain trailing whitespace.

Code of Conduct
