
MonsoonNLP/data-fails


Objective

Does your dataset have issues? How do you find out, and how do you fix those issues?

I originally pitched this project as:

One dataset with different bad samples (e.g. too much of one class, missing values, gender bias), each as its own "discover the data problem" exercise.
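As a rough illustration of what such an exercise might check, here is a minimal sketch using only the Python standard library. It is not code from this repository: the function names, the `MISSING` set, and the `gender` / `hired` columns are all hypothetical, and rows are assumed to be dictionaries of simple (hashable) values.

```python
from collections import Counter

# Assumption: these values count as "missing"; adjust for your data.
MISSING = {None, "", "NA", "N/A", "null"}


def class_balance(rows, label_key):
    """Return each label's share of the rows that have a label."""
    counts = Counter(
        r.get(label_key) for r in rows if r.get(label_key) not in MISSING
    )
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


def missing_rate(rows, columns):
    """Return the fraction of rows with a missing value, per column."""
    if not rows:
        return {c: 0.0 for c in columns}
    return {c: sum(r.get(c) in MISSING for r in rows) / len(rows) for c in columns}


def positive_rate_by_group(rows, group_key, label_key, positive):
    """Return the share of `positive` labels within each group."""
    totals, hits = Counter(), Counter()
    for r in rows:
        group, label = r.get(group_key), r.get(label_key)
        if group in MISSING or label in MISSING:
            continue  # skip rows we cannot assign to a group or label
        totals[group] += 1
        hits[group] += label == positive
    return {g: hits[g] / totals[g] for g in totals}


# Hypothetical sample data, made up for this sketch.
rows = [
    {"gender": "F", "hired": "no"},
    {"gender": "F", "hired": "no"},
    {"gender": "M", "hired": "yes"},
    {"gender": "M", "hired": "no"},
    {"gender": "", "hired": "no"},
]

print(class_balance(rows, "hired"))
print(missing_rate(rows, ["gender", "hired"]))
print(positive_rate_by_group(rows, "gender", "hired", "yes"))
```

A large gap between groups in the last result does not prove bias on its own, but it is a reasonable prompt to look more closely at how the data was sampled.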

I wanted to include others' previous work on parsing CSV files and other data sources, to offer as many examples as possible.
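One common CSV parsing problem is a row whose field count does not match the header. The sketch below shows one possible way to flag such rows with Python's built-in `csv` module. The function name `find_ragged_rows` and the sample text are hypothetical and do not come from this repository or the prior work mentioned above.

```python
import csv
import io


def find_ragged_rows(csv_text):
    """Return (line_number, field_count) for rows whose width differs from the header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return []  # empty input: nothing to compare against
    return [
        (reader.line_num, len(row))
        for row in reader
        if len(row) != len(header)
    ]


# Hypothetical sample: line 3 is missing a field, line 4 has an extra one.
sample = "name,label\nalice,yes\nbob\ncarol,no,extra\n"
print(find_ragged_rows(sample))
```

Note that blank lines are reported too, since the reader yields them as rows with zero fields.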

Work in progress

In the future, there would ideally be a data browser where you can programmatically review the dataset and determine its problems.

License

Open source, MIT License

About

Find quality / sampling errors in data before it goes into a machine learning model
