HarikalarKutusu/cv-tbox-metadata-viewer

Common Voice Metadata Viewer - Common Voice ToolBox

Web app for examining Common Voice metadata across releases over time, using tables and graphs.

Description

Metadata for Mozilla Common Voice releases is published as separate JSON files after each release in a separate GitHub repository (see Common Voice Dataset). These files contain basic but important aggregated information that dataset engineers, community managers, and trainers reference frequently. On the other hand, finding what you are looking for across multiple large JSON files in a GitHub repo is very time-consuming, especially if you work with multiple languages and versions and want to compare them.

Common Voice Metadata Viewer tries to solve this problem. It takes a flattened dataframe file generated by the Common Voice Toolbox utility and uses it as an embedded "database" to show the information interactively. There is no backend/server. Whenever Mozilla Common Voice makes a new dataset release, this utility will follow with a new release.
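To illustrate what "flattened" means here, the following is a minimal sketch, not the actual Common Voice Toolbox code. It assumes each release's JSON has a `locales` object keyed by language code; the field names (`clips`, `users`, `validHrs`), the version labels, and the helper `flattenReleases` are hypothetical and chosen only for illustration.

```typescript
// Hypothetical shape of one release's metadata (the real files contain more fields).
interface ReleaseMetadata {
  locales: Record<string, { clips: number; users: number; validHrs: number }>;
}

// One flat row per (version, locale) pair, suitable for tables and graphs.
interface FlatRow {
  version: string;
  locale: string;
  clips: number;
  users: number;
  validHrs: number;
}

// Turn nested per-release JSON into a single flat list of rows.
function flattenReleases(releases: Record<string, ReleaseMetadata>): FlatRow[] {
  const rows: FlatRow[] = [];
  for (const version of Object.keys(releases)) {
    const locales = releases[version].locales;
    for (const locale of Object.keys(locales)) {
      rows.push({ version, locale, ...locales[locale] });
    }
  }
  return rows;
}

// Made-up sample data, for demonstration only.
const sampleReleases: Record<string, ReleaseMetadata> = {
  v1: {
    locales: {
      tr: { clips: 100, users: 10, validHrs: 1.5 },
      en: { clips: 900, users: 80, validHrs: 12 },
    },
  },
  v2: {
    locales: {
      tr: { clips: 180, users: 15, validHrs: 2.5 },
    },
  },
};

const flatRows = flattenReleases(sampleReleases);
console.log(flatRows.length); // prints 3
```

A flat list like this can be shipped as a static file and filtered entirely in the browser, which is consistent with the app having no backend.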

With this utility, you can:

  • View the data in table format or as graphs.
  • Filter the versions you are interested in.
  • Select some languages and make a comparative analysis.
  • See how a language is doing across versions and/or with respect to other languages.
  • See the overall totals of the Common Voice project.
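The filtering and comparison described in the list above can be sketched as plain array operations over flat rows. This is an illustrative sketch only: the row shape `MetaRow`, the field `validHrs`, and the helpers `selectRows` and `validHrsGrowth` are assumptions, not the app's real data model or API.

```typescript
// Hypothetical flat row: one entry per (version, locale) pair.
interface MetaRow {
  version: string;
  locale: string;
  validHrs: number;
}

// Keep only rows for the chosen versions and languages.
function selectRows(rows: MetaRow[], versions: string[], locales: string[]): MetaRow[] {
  return rows.filter(
    (r) => versions.indexOf(r.version) >= 0 && locales.indexOf(r.locale) >= 0
  );
}

// Change in validated hours per language between two versions.
// Languages missing from either version are skipped.
function validHrsGrowth(
  rows: MetaRow[],
  fromVersion: string,
  toVersion: string
): Record<string, number> {
  const before: Record<string, number> = {};
  const after: Record<string, number> = {};
  for (const r of rows) {
    if (r.version === fromVersion) before[r.locale] = r.validHrs;
    if (r.version === toVersion) after[r.locale] = r.validHrs;
  }
  const growth: Record<string, number> = {};
  for (const locale of Object.keys(after)) {
    if (locale in before) growth[locale] = after[locale] - before[locale];
  }
  return growth;
}

// Made-up sample data, for demonstration only.
const metaRows: MetaRow[] = [
  { version: "v1", locale: "tr", validHrs: 10 },
  { version: "v1", locale: "en", validHrs: 100 },
  { version: "v2", locale: "tr", validHrs: 25 },
  { version: "v2", locale: "en", validHrs: 130 },
  { version: "v2", locale: "de", validHrs: 50 },
];

const selected = selectRows(metaRows, ["v1", "v2"], ["tr", "en"]);
const growth = validHrsGrowth(selected, "v1", "v2");
console.log(growth); // tr grew by 15 hours, en by 30 hours
```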

If you need more detailed data on a particular dataset, you can use the sister app Common Voice Dataset Analyzer (Beta Mirror).

Working Version

A working (beta) version is here for your use: Common Voice Metadata Viewer (Beta Mirror).

More

Setting up a development environment

  • Clone the repo and cd into it
  • Change into the web directory
  • Run npm install to get the dependencies
  • Run npm start to run the app locally
git clone https://github.com/HarikalarKutusu/cv-tbox-metadata-viewer.git
cd cv-tbox-metadata-viewer
cd web
npm install
npm start

TODO

  • Annotate and export tables and graphs (annotation is missing)
  • A graph creator for your own graphs
  • Add higher-level query tools, e.g. "Which languages have between 100 and 200 hours of validated data?".
  • Add grouping for major language families.
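As a rough idea of what the higher-level query item could look like, here is a hedged sketch of a range query over flat rows. It is not part of the app: the row shape `HoursRow`, the field `validHrs`, and the function `localesInValidHoursRange` are hypothetical, and the bounds are treated as inclusive by assumption.

```typescript
// Hypothetical flat row: one entry per (version, locale) pair.
interface HoursRow {
  version: string;
  locale: string;
  validHrs: number;
}

// Languages whose validated hours in a given version fall within
// [minHrs, maxHrs], both bounds inclusive, sorted alphabetically.
function localesInValidHoursRange(
  rows: HoursRow[],
  version: string,
  minHrs: number,
  maxHrs: number
): string[] {
  return rows
    .filter((r) => r.version === version && r.validHrs >= minHrs && r.validHrs <= maxHrs)
    .map((r) => r.locale)
    .sort();
}

// Made-up sample data, for demonstration only.
const hoursRows: HoursRow[] = [
  { version: "v1", locale: "tr", validHrs: 50 },
  { version: "v2", locale: "tr", validHrs: 120 },
  { version: "v2", locale: "en", validHrs: 3000 },
  { version: "v2", locale: "de", validHrs: 200 },
  { version: "v2", locale: "ca", validHrs: 99.5 },
];

const matches = localesInValidHoursRange(hoursRows, "v2", 100, 200);
console.log(matches); // prints [ 'de', 'tr' ]
```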

The full list is tracked under the project on GitHub. Please open issues, submit feature requests, or make pull requests.