layout | title | description | thumbnail |
---|---|---|---|
page | Resources | Resources related to machine learning and data-centric AI that we recommend. | /static/assets/thumbnail.png |
General resources related to machine learning and data-centric AI that we recommend. For additional resources on the topics covered in lectures, see the references in individual lecture notes.
- cleanlab - automatically detect problems in a dataset to facilitate ML with messy, real-world data
- refinery - assess and maintain natural language data
- great expectations - validate, document, and profile data for quality testing
- ydata-profiling - generate summary reports of tabular datasets stored as pandas DataFrames
- cleanvision - automatically detect low-quality images in computer vision datasets
- albumentations - data augmentation for computer vision
- label-studio - interfaces to label and annotate data for many ML tasks
- llamaindex - a data framework for LLM applications such as Retrieval-Augmented Generation
- dspy - algorithmically optimize LLM prompts and bootstrap data
- Unbiggen AI
- Andrew Ng Launches A Campaign For Data-Centric AI
- Tips for a Data-Centric AI Approach
- Data-Centric Approach vs Model-Centric Approach in Machine Learning
- A Linter for ML Datasets
- Handling Mislabeled Data to Improve Your Model
- Catch bad data in LLM datasets
- A Data Quality-Driven View of MLOps
- Advances in Exploratory Data Analysis, Visualisation and Quality for Data Centric AI Systems
- A Survey on Data Selection for Language Models