Skip to content

iapt-platform/pali_analysis

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Pāli Analysis

What is this repository

This repository is some work related to pāliwiki, which is a Tipiṭaka reading, learning and translating platform.

Explanation of the folders

material

includes the materials for cook, they are basicly convert from others.

From pāliwiki:

  • pali_text.tgz is the full text of Tipiṭaka.

  • dict_parent.db is convert from sys_regular.db and sys_irregular.db.

    • The sys_regular.db is a regular inflection dictionary infered by all possible grammer rules. The parent of a word is not the stem nor the root. For example, "gam->gaccha(ti)->gacchanta->gacchato", each word's previous is called the parent of the word.
    • The sys_irregular.db is the dictionary of irregular word entered manually.
  • dict_compound.db is convert from pm.db and comp.db, it contains the split of each compound word.

    • pm.db is the pāli-myanmar dictionary.
    • comp.db is the results of splits using algorithm.

cook

includes preprocessing scripts:

  • parenting.py convert each word of pali_text.txt to its parent using dict_parent.db and dict_compound.db.

sentence

includes codes used for analysis the sentences in Tipiṭaka:

  • Get similar sentences using jaccard similarity.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 98.1%
  • Shell 1.9%