pattaprateek/Kathabhidhana

 
 

Kathabhidhana: Audio recording for Odia Wiktionary

What comes to your mind when you think of a dictionary? A huge, boring book that you never wanted to open? Or maybe a mobile app that you open when you are struggling to understand a few new words in a write-up? Now think of a dictionary that also pronounces the words rather than just showing them in the [International Phonetic Alphabet](https://en.wikipedia.org/wiki/International_Phonetic_Alphabet) (IPA).

Wikipedia has a sister project called Wiktionary, and it's multilingual. Though it's easy to search for any word on Google with the suffix "meaning" to hear its pronunciation, there are not many [open-licensed](https://en.wikipedia.org/wiki/Open_Content_License) audio recordings that you can hear, download, and even use for your own work. Kathabhidhana is a community project led by [Subhashish Panigrahi](http://meta.wikimedia.org/wiki/User:Psubhashish/) to create an open source solution for recording large batches of words and then uploading them under open licenses so that they can be useful for projects like Wiktionary. The project draws its inspiration largely from another open source [software](https://github.com/tshrinivasan/voice-recorder-for-tawictionary) created by [Shrinivasan T](https://github.com/tshrinivasan).

Currently, several Odia-language words are being [recorded](https://commons.wikimedia.org/wiki/Category:Odia_pronunciation), uploaded to [Wikimedia Commons](https://commons.wikimedia.org), and used in [Odia Wiktionary](https://or.wiktionary.org), the Odia-language version of Wiktionary. The purpose of creating this audio library is manifold: apart from using the recordings on Wiktionary, we also aim to use them in [Natural Language Processing](http://en.wikipedia.org/wiki/Natural_Language_Processing) (NLP) projects, and you are free to use any resource available on this page [with attribution](https://github.com/OdiaWikimedia/Kathabhidhana/blob/master/README.md#attribution).

An Odia version of the resources and tutorial is available here. We are currently working on more tutorials covering how to improve your home studio setup (assuming you don't have access to a fancy recording studio; if you do, please make use of it), tips and tricks for batch renaming files, cleaning up recordings with open source tools like [Audacity](http://www.audacityteam.org/download/), setting up files for batch upload to Wikimedia Commons, and more. So stay tuned.

Prerequisites

  • Linux or macOS, or
  • Linux running in a virtual machine (if you're on Windows)

or, for Kathabhidhana for iOS:

  • iOS
  • An app called Workflow

How to execute?

(You need to run the commands on Linux or macOS, or on Linux in a [virtual machine](https://en.wikipedia.org/wiki/Virtual_machine) if you're on Windows.) [Read in Odia](https://goo.gl/hqXeG3)

  1. Fill in the words you want to record in a text file named "file".
  2. Run the below command

python voice-record.py 2> err

This will record the sounds in ogg and wav formats. The "2> err" part redirects any error messages to a file named "err".
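Before recording, it can help to check what the word list contains. The following is only a minimal sketch and not part of voice-record.py: it assumes the text file holds one word per line, and the function names `read_words` and `planned_outputs`, as well as the `<word>.ogg` / `<word>.wav` naming, are hypothetical and chosen purely for illustration.

```python
from pathlib import Path


def read_words(path="file"):
    """Return the unique, non-empty words in the list, in their original order.

    Assumption: the word list has one word per line and is UTF-8 encoded.
    """
    seen = set()
    words = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        word = line.strip()
        if word and word not in seen:
            seen.add(word)
            words.append(word)
    return words


def planned_outputs(words):
    """Map each word to hypothetical (ogg, wav) output file names."""
    return {word: (f"{word}.ogg", f"{word}.wav") for word in words}


if __name__ == "__main__":
    # Print the word list and the assumed output names, if the list exists.
    if Path("file").exists():
        for word, (ogg, wav) in planned_outputs(read_words()).items():
            print(word, ogg, wav)
```

Running a check like this first makes blank lines and duplicate entries visible before a long recording session starts.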

To upload all the ogg files to Wikimedia Commons:

  1. Edit the file mediawiki-uploader.py and fill in the Commons API URL, username, and password.
  2. Run the below command

python mediawiki-uploader.py
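To give a rough idea of what a batch upload involves, here is a hedged sketch of how the form fields for a MediaWiki `action=upload` request could be prepared for each ogg file. It is not the content of mediawiki-uploader.py: the function names `find_ogg_files` and `build_upload_fields`, the default upload comment, and the placeholder token are assumptions made for illustration. Logging in, fetching a real CSRF token, and sending the audio as the multipart "file" part are deliberately left out.

```python
from pathlib import Path

# API endpoint of Wikimedia Commons.
COMMONS_API_URL = "https://commons.wikimedia.org/w/api.php"


def find_ogg_files(directory="."):
    """Return the .ogg files in a directory, sorted by name."""
    return sorted(Path(directory).glob("*.ogg"))


def build_upload_fields(ogg_path, csrf_token, description,
                        comment="Upload pronunciation recording"):
    """Build the form fields for one action=upload request.

    The audio itself would be sent separately as the multipart "file" part;
    this helper (a hypothetical name) only prepares the text fields.
    """
    return {
        "action": "upload",
        "format": "json",
        "filename": Path(ogg_path).name,  # file name to use on Commons
        "text": description,              # wikitext of the file description page
        "comment": comment,               # upload summary
        "token": csrf_token,              # CSRF token obtained after logging in
    }


if __name__ == "__main__":
    # Dry run: show what would be uploaded without contacting the server.
    for path in find_ogg_files():
        fields = build_upload_fields(
            path, "PLACEHOLDER-TOKEN", "[[Category:Odia pronunciation]]"
        )
        print(COMMONS_API_URL, fields["filename"])
```

A dry run like this lists the files that would be uploaded, which is a cheap way to catch naming mistakes before anything reaches Wikimedia Commons.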

Attribution

Other resources

Shoutouts
