---
layout: page
title: Resources
description: Resources related to machine learning and data-centric AI that we recommend.
thumbnail: /static/assets/thumbnail.png
---

General resources related to machine learning and data-centric AI that we recommend. For additional resources on the topics covered in lectures, see the references in individual lecture notes.

Open-Source Software Tools for Data-Centric AI

  • cleanlab - automatically detect problems in a dataset to facilitate ML with messy, real-world data
  • refinery - assess and maintain natural language data
  • great expectations - validate, document, and profile data for quality testing
  • ydata-profiling - generate summary reports of tabular datasets stored as pandas DataFrames
  • cleanvision - automatically detect low-quality images in computer vision datasets
  • albumentations - data augmentation for computer vision
  • label-studio - interfaces to label and annotate data for many ML tasks
  • llamaindex - a data framework for LLM applications such as Retrieval-Augmented Generation (RAG)
  • dspy - algorithmically optimize LLM prompts and bootstrap data
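
To give a rough sense of the kind of checks that the data validation and profiling tools above automate, here is a minimal standard-library sketch. It does not use the API of any listed library; the function name `profile_rows` and the sample records are hypothetical, chosen only for illustration.

```python
from collections import Counter


def profile_rows(rows):
    """Summarize a list of dict records (hypothetical helper, not a library API).

    Reports the row count, the number of missing values per column,
    and the number of rows that exactly repeat an earlier row.
    """
    columns = sorted({key for row in rows for key in row})

    # Treat None and the empty string as missing values.
    missing = {
        col: sum(1 for row in rows if row.get(col) in (None, ""))
        for col in columns
    }

    # Count exact duplicates: every repeat beyond the first occurrence.
    counts = Counter(
        tuple((col, row.get(col)) for col in columns) for row in rows
    )
    duplicate_rows = sum(n - 1 for n in counts.values())

    return {
        "n_rows": len(rows),
        "missing": missing,
        "duplicate_rows": duplicate_rows,
    }


# Made-up sample data for illustration only.
records = [
    {"text": "great product", "label": "positive"},
    {"text": "great product", "label": "positive"},
    {"text": "", "label": "negative"},
    {"text": "arrived broken", "label": None},
]

report = profile_rows(records)
print(report)
```

The listed tools go well beyond this, for example by estimating label errors, flagging low-quality images, or producing full HTML reports, so treat the sketch only as an orientation to the problem they address.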

Short Articles

Papers

Books

Links