zenml-io/awesome-open-data-annotation

🏷 Open Source Data Annotation & Labeling Tools

Maintained by ZenML

At ZenML we believe that annotation and labeling workflows are a core part of the machine learning lifecycle. As builders of an open-source tool ourselves, we wanted to highlight and recognize the variety of tools available to help your workflows become more data-centric. We used three core criteria to decide whether a tool makes it onto the list:

  • The tool has an open-source licence.
  • The tool is actively maintained.
  • The tool is functional and fit for purpose.

We welcome contributions to this list, so if you know of a tool that we've missed or if you've built one yourself, please do create a PR!

🔥 Do you use these tools or do you want to add one to your MLOps stack? At ZenML, we are looking for design partnerships and collaboration to develop the integrations and workflows around using annotation within the MLOps lifecycle. If you'd like to learn more, please join our Slack and leave us a message!

Contents

Multi-Modal / Multi-Domain

Name | Description | License
Acharya | A data-centric MLOps tool for your Named Entity Recognition projects | ?
Adala | An Autonomous Data (Labeling) Agent framework | Apache-2
Classifai | A comprehensive open-source data annotation platform | Apache-2
Computer Vision Annotation Tool (CVAT) | A free, online, interactive video and image annotation tool for computer vision | MIT
Data Annotator for Machine Learning (DAML) | An application that helps machine learning teams facilitate the creation and management of annotations | Apache-2
DataGym | Open-source annotation and labeling tool for image and video assets | MIT
Diffgram | Training data (data labeling, annotation, workflow) for all data types (image, video, 3D, text, geo, audio, and more) at scale | ELv2
Hover | Explore and label on a map of raw data; handles text, audio, and images | MIT
Label Studio | A multi-type data labeling and annotation tool with a standardized output format | Apache-2
Pigeon | A simple widget that lets you quickly annotate a dataset of unlabeled examples from the comfort of your Jupyter notebook | Apache-2
Tator | Video analytics web platform | AGPL-3
TornadoAi | A human-in-the-loop machine learning framework | AGPL-3
Universal Data Tool | A web/desktop app for editing and annotating images, text, audio, and documents, and for viewing and editing any data defined in the extensible .udt.json and .udt.csv standard | MIT
VGG Image Annotator (VIA) | A standalone image annotator application packaged as a single HTML file (< 400 KB) that runs on most modern web browsers | BSD-2
VIAME | Video and Image Analytics for Multiple Environments | Custom

Text

Name | Description | License
Annotation Lab | An NLP annotation tool included in Spark NLP | Apache-2
Argilla | A production-ready Python framework for exploring, annotating, and managing data in NLP projects | Apache-2
bulk | A quick developer tool to apply bulk labels | MIT
CoreNLP | A Java suite of core NLP tools | GPL-3
DataQA | Labeling platform for text using weak supervision | GPL-3
doccano | An open-source text annotation tool supporting text classification, sequence labeling, and sequence-to-sequence tasks | MIT
FLAT - FoLiA Linguistic Annotation Tool | A web-based linguistic annotation environment built around FoLiA, an XML-based format for linguistic annotation | GPL-3
INCEpTION | A semantic annotation platform offering intelligent annotation assistance and knowledge management | Apache-2
knodle | Knodle (Knowledge-supervised Deep Learning Framework) | Apache-2
NER Annotator for Spacy | Create training data for a custom spaCy NER model with custom tags | MIT
NPLM | Noisy Partial Label Model (NPLM) | N/A
Potato | An annotation framework with 20+ templates, an editable UI, quality control, data management, and an option to add a survey for crowdsourcing | PolyForm Shield
refinery | The data scientist's open-source choice to scale, assess, and maintain natural language data | Apache-2
SMART | A tool for building labeled training datasets for supervised machine learning tasks in NLP | MIT
SpaCy annotator | spaCy NER annotator using ipywidgets | N/A
Small-Text | Active learning for text classification | MIT
Snorkel | Programmatically build and manage training data | Apache-2
skweak | Weak supervision for NLP | MIT
TALEN | A tool for named entity recognition (NER) annotation | Custom
YEDDA | A lightweight collaborative text span annotation tool | Apache-2
WeaSEL | Weakly Supervised End-to-end Learning | Apache-2

Images

Name | Description | License
3D Slicer | Visualization, processing, segmentation, registration, and analysis of medical, biomedical, and other 3D images and meshes | BSD
Annotorious | A JavaScript library for image annotation | BSD-3
CATMAID | The Collaborative Annotation Toolkit for Massive Amounts of Image Data | GPL-3
COCO Annotator | A web-based image segmentation tool for object detection, localization, and keypoints | MIT
DeepLabel | A cross-platform desktop image annotation tool for machine learning | MIT
ilastik | Segment, classify, track, and count your cells or other experimental data | Custom
ImageTagger | An open-source online platform for collaborative image labeling | MIT
imglab | A web-based tool for labeling objects in images, for training dlib or other object detectors | MIT
KNOSSOS | A software tool for visualizing and annotating 3D image data, developed for the rapid reconstruction of neural morphology and connectivity | GPL-2
LabelFlow | An open platform for image labeling | Custom
labelme | Image polygonal annotation with Python (polygon, rectangle, circle, line, point, and image-level flag annotation) | Custom
LabelImg | A graphical image annotation tool for labeling object bounding boxes in images | MIT
LOST | A flexible web-based framework for semi-automatic image annotation | MIT
Make Sense | A free-to-use online tool for labeling photos | GPL-3
OHIF Medical Imaging Viewer | A zero-footprint DICOM viewer and oncology-specific Lesion Tracker | MIT
OpenLabeler | An open-source desktop application for annotating objects for AI applications | Apache-2
Pixano | A web-based smart-annotation tool for computer vision applications | CeCILL-C
Scalabel | A web-based visual data annotation tool supporting both 2D and 3D data labeling | Apache-2
webKnossos | A fully cloud- and browser-based 3D annotation tool for distributed large-scale data analysis in light- and electron-microscopy-based connectomics | AGPL-3

Video

Name | Description | License
DIVE | Media annotation and analysis tools for web and desktop | Apache-2

Audio

Name | Description | License
aubio | A library for audio and music analysis | GPL-3
audino | Open-source audio annotation tool | MIT
Praat | Annotation tool for phonetic analysis | GPL-3
Peaks.js | JavaScript UI component for interacting with audio waveforms | LGPL-3
Wavesurfer.js | Navigable waveform built on Web Audio and Canvas | BSD-3

Time Series

Name | Description | License
sktime | A framework for machine learning with time series | BSD-3

Other

Name | Description | License
Encord Active | Toolkit to test, validate, and evaluate your models and to surface, curate, and prioritize the most valuable data for labeling | Apache-2
OpenCRAVAT | A modular annotation tool for genomic variants | MIT
Personal Cancer Genome Reporter (PCGR) | A stand-alone software package for translating individual tumor genomes for precision cancer medicine | MIT
Quepid | Gather human judgements (explicit ratings) for search quality; also a safe space to experiment with your search algorithm | Apache-2

Acknowledgements

Thanks to the creators of these other repositories (and this one!) for getting us going down the path of creating our own. I used their work as the starting point for my survey of the space, then added, updated, and pruned entries according to the open-source and other criteria specified above.