IPA transcription tokenizer
from ipa_tokenizer.tokenizer import tokenize
tokens = tokenize("ˈtoʊ.kən.aɪz", language="en")
print(tokens)
# ['t', 'oʊ', 'k', 'ə', 'n', 'aɪ', 'z']
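The output above keeps diphthongs such as 'oʊ' and 'aɪ' as single tokens and drops the stress and syllable marks. One way to get that behaviour is greedy longest-match against a phoneme inventory, sketched below. This is only an illustration and not necessarily how ipa_tokenizer works internally: `toy_tokenize`, `TOY_INVENTORY`, and `IGNORED` are hypothetical names, and the inventory is a hand-picked set rather than data from the library.

```python
# Hypothetical inventory for illustration only; not taken from the library's data files.
TOY_INVENTORY = {"t", "k", "n", "z", "ə", "oʊ", "aɪ"}

# Assumption: stress marks and syllable boundaries are discarded before matching.
IGNORED = {"ˈ", "ˌ", "."}


def toy_tokenize(transcription, inventory=TOY_INVENTORY):
    """Split a transcription into phonemes by greedy longest-match."""
    text = "".join(ch for ch in transcription if ch not in IGNORED)
    longest = max(len(phoneme) for phoneme in inventory)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so 'oʊ' wins over a bare 'o'.
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in inventory:
                tokens.append(candidate)
                i += size
                break
        else:
            # No candidate of any length matched at this position.
            raise ValueError(f"unknown symbol at position {i}: {text[i]!r}")
    return tokens


print(toy_tokenize("ˈtoʊ.kən.aɪz"))
# ['t', 'oʊ', 'k', 'ə', 'n', 'aɪ', 'z']
```

In this sketch, a symbol missing from the inventory raises a ValueError rather than being skipped silently, so gaps in the inventory are easy to spot.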
Copyright 2023 Levi Gruspe
This repository contains data files derived from works licensed under CC BY-SA 3.0. The copyright of the original works belongs to their authors. PHOIBLE 2.0 is by Steven Moran and Daniel McCloy. Wiktionary is by its editors and contributors.
Derivative works:
tools/data/phoible.csv
- based on PHOIBLE (Glottocode and ISO 639-3 code columns)
tools/data/wiktionary.txt
- based on the Wiktionary language list
ipa_tokenizer/inventories.csv
- based on PHOIBLE
ipa_tokenizer/languages.json
- based on PHOIBLE and the Wiktionary language list
These derivative works are made available under a CC BY-SA 3.0 license.