AI4Bharat/DMU-DataDaan

Bhashini DataDaan

An open-source platform for submitting media files of any kind

License: MIT


Overview

Bhashini DataDaan is a portal that enables government entities and PSUs to submit media files of any kind (audio, video, text, PDF, etc.). These files can be transformed into rich datasets (parallel, ASR, OCR, etc.), which can be made available in ULCA and, at the same time, used to power ML models.

General Requirements

  • The media files themselves must be submitted as a compressed archive (zip or gz).
  • The platform supports archives of up to 5GB.
  • The metadata file can be a txt file. Although it is free text, we strongly encourage keeping it structured and precise.
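
The requirements above can be checked locally before uploading. The following is a minimal sketch, not part of DataDaan itself: the function name `check_submission` and the constants are hypothetical, "5GB" is assumed to mean 5 GiB, and a `.txt` extension for the metadata file is treated as required for the purpose of the example.

```python
import os
import tempfile
import zipfile

# Assumption: the "5GB" limit is interpreted as 5 GiB.
MAX_ARCHIVE_BYTES = 5 * 1024**3
# The README allows "zip or gz" archives.
ALLOWED_ARCHIVE_SUFFIXES = (".zip", ".gz")


def check_submission(archive_path, metadata_path):
    """Hypothetical pre-upload check; returns a list of problems (empty if none)."""
    problems = []

    # The media files must be packed as a zip or gz archive.
    if not archive_path.lower().endswith(ALLOWED_ARCHIVE_SUFFIXES):
        problems.append("archive must be a .zip or .gz file")

    # The archive must exist and stay within the size limit.
    if not os.path.isfile(archive_path):
        problems.append("archive file not found")
    elif os.path.getsize(archive_path) > MAX_ARCHIVE_BYTES:
        problems.append("archive exceeds the 5GB limit")

    # Assumption: the metadata file is expected to be a .txt file.
    if not metadata_path.lower().endswith(".txt"):
        problems.append("metadata file should be a .txt file")
    if not os.path.isfile(metadata_path):
        problems.append("metadata file not found")

    return problems


if __name__ == "__main__":
    # Build a tiny sample submission in a temporary directory and check it.
    with tempfile.TemporaryDirectory() as workdir:
        archive = os.path.join(workdir, "media.zip")
        with zipfile.ZipFile(archive, "w") as zf:
            zf.writestr("sample.txt", "example media content")

        metadata = os.path.join(workdir, "metadata.txt")
        with open(metadata, "w", encoding="utf-8") as fh:
            fh.write("source: example department\nlanguage: hi\n")

        print(check_submission(archive, metadata))
```

The check only inspects file names and sizes. It does not open the archive or validate the metadata contents, which remain free text.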

API Specs

The APIs used in DataDaan are specified in OpenAPI 3 format; see the SwaggerHub Specs.

DataDaan Architecture