
global-asp/pb-source


Source stories from the Pratham Books collection in Markdown format

This repository makes available the source texts of open-licensed stories from Pratham Books in Markdown format.

Each folder in the repository represents a language, identified by its ISO 639-1 or ISO 639-3 code. The source text of stories in each language is stored in the corresponding folder.

All of these source texts were extracted from the EPUB files available on the Storyweaver website. The Markdown files in this repo provide data for many other projects, such as the translations in the Global Pratham Books Project and the PB Image Bank Explorer, and they make it easy to create bilingual storybooks in any language combination.

Corresponding images for the stories in this repository can be found in the Pratham Books Image Bank.

Format

The extracted source text of all stories has been provided here in Markdown format. See here for specific details about the format used.

A sequence of two hashes ## on a separate line indicates a page break.
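The page-break convention above can be used to split a story into pages. The following is a minimal sketch, not part of this repository: the function name `split_pages` and the sample text are hypothetical. It treats any line consisting solely of `##` as a page break. It makes no attempt to separate the metadata section at the bottom of a file, since that section's format is not described here.

```python
def split_pages(markdown_text: str) -> list[str]:
    """Split story text into pages at lines that consist solely of '##'."""
    pages = []
    current = []
    for line in markdown_text.splitlines():
        if line.strip() == "##":
            # A bare '##' line closes the current page.
            pages.append("\n".join(current).strip())
            current = []
        else:
            current.append(line)
    # Whatever follows the last page break is the final page.
    pages.append("\n".join(current).strip())
    # Drop pages that contain no text at all.
    return [page for page in pages if page]


# Hypothetical sample text, for illustration only.
sample = "# A Story\n\nOnce upon a time.\n\n##\n\nThe end.\n"
pages = split_pages(sample)
```

With the sample above, `pages` holds two entries: the title together with the first paragraph, and then the closing line.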

Editing of the story content has been kept to a minimum, and for the most part the stories are presented as they are. Corrections, other than fixes for obvious spelling errors or traces of the conversion process, should be sent directly to Pratham Books through the Storyweaver website.

Languages

Pratham Books currently provides stories in 35 different languages. This repository attempts to provide the source text for all of these stories in machine- and human-readable Markdown format.

Below is a key to the languages covered by this repository and their ISO 639-1/3 codes.

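| ISO code | Language name |
|----------|---------------|
| as       | Assamese      |
| bn       | Bengali       |
| en       | English       |
| gu       | Gujarati      |
| hi       | Hindi         |
| kn       | Kannada       |
| kok      | Konkani       |
| kru      | Kurukh        |
| ml       | Malayalam     |
| mqu      | Mundari       |
| mr       | Marathi       |
| or       | Oriya         |
| pa       | Punjabi       |
| sa       | Sanskrit      |
| sck      | Sadri         |
| ta       | Tamil         |
| te       | Telugu        |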
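
For scripts that work with a local copy of the repository, the key above can be expressed as a lookup table. This is a sketch only: the names `LANGUAGES` and `language_folders` are hypothetical. It assumes that each language folder sits directly under the repository root and is named exactly by its code, as described earlier.

```python
from pathlib import Path

# Codes and names copied from the key above.
LANGUAGES = {
    "as": "Assamese", "bn": "Bengali", "en": "English", "gu": "Gujarati",
    "hi": "Hindi", "kn": "Kannada", "kok": "Konkani", "kru": "Kurukh",
    "ml": "Malayalam", "mqu": "Mundari", "mr": "Marathi", "or": "Oriya",
    "pa": "Punjabi", "sa": "Sanskrit", "sck": "Sadri", "ta": "Tamil",
    "te": "Telugu",
}


def language_folders(repo_root: str) -> dict[str, str]:
    """Map each language folder found under repo_root to its language name.

    Assumption: language folders are direct children of repo_root and are
    named by their code. Other entries (files, unknown folders) are skipped.
    """
    root = Path(repo_root)
    return {
        entry.name: LANGUAGES[entry.name]
        for entry in sorted(root.iterdir())
        if entry.is_dir() and entry.name in LANGUAGES
    }
```

Calling `language_folders` with the path of a local checkout returns only the folders whose names appear in the key, so unrelated directories and files are ignored.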

License

All stories in this repository are Creative Commons licensed (CC BY 4.0), with the exception of several stories that are in the public domain. The specific license for each story is indicated both in the metadata section at the bottom of each file and in the corresponding README.md file for that language. Direct links to the original stories on the Pratham Books website can also be found in the README.md files.
