Can't pronounce abbreviations #987
-
I'm using the built-in models.
-
You need to preprocess the sentences for it; a basic way of doing this is described in the gTTS tokenizer docs: https://gtts.readthedocs.io/en/latest/tokenizer.html#pre-processing
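A minimal sketch of that pre-processing idea: substitute known abbreviations with a speakable form before handing the text to the synthesizer. The substitution table and the `pre_process` helper below are my own illustration, not part of the gTTS API.

```python
# Illustrative substitution table (an assumption, not gTTS's built-in list).
SUBSTITUTIONS = {
    "e.g.": "for example",
    "i.e.": "that is",
    "etc.": "et cetera",
}

def pre_process(text):
    """Replace each abbreviation with its spoken equivalent."""
    for abbr, spoken in SUBSTITUTIONS.items():
        text = text.replace(abbr, spoken)
    return text

print(pre_process("Use short sentences, e.g. like this one."))
# Use short sentences, for example like this one.
```

The output of `pre_process` would then be passed to the TTS engine instead of the raw text.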
-
The accepted answer does not seem to work for quite a lot of cases. For instance, using the `pronouncing` library to look up candidate spellings for each letter:

```python
>>> import pronouncing
>>> for i in "abcdefghijklmnopqrstuvwxyz":
...     print((i, pronouncing.search("^" + " ".join(pronouncing.phones_for_word(i)) + "$")))
...
('a', [])
('b', ['b', 'b.', 'be', 'bea', 'bee'])
('c', ['c', 'c.', 'cie', 'sci', 'sea', 'see', 'si', 'sie', 'sieh', 'tse'])
('d', ['d', 'd.', 'de', 'dea', 'dee', 'di'])
('e', ['e', 'e.', 'ee'])
('f', ['f', 'f.'])
('g', ['g', 'g.', 'gee', 'je', 'jee', 'ji', 'jie'])
('h', ['h', 'h.'])
('i', ['ai', 'ay', 'aye', 'eye', 'i', 'i.'])
('j', ['j', 'j.', 'jae', 'jay', 'jaye'])
('k', ["'kay", 'cay', 'k', 'k.', 'kay', 'kaye', 'khe', 'quai', 'quay', 'quaye'])
('l', ['ehle', 'el', 'ell', 'elle', 'l', 'l.'])
('m', ['em', 'emme', 'm', 'm.'])
('n', ['en', 'n', 'n.'])
('o', ['au', 'aux', 'eau', 'eaux', 'o', "o'", 'o.', 'oh', 'ohh', 'ow', 'owe'])
('p', ['p', 'p.', 'pea', 'peay', 'pee'])
('q', ['cue', 'kew', 'kyu', 'q', 'q.', 'que', 'queue'])
('r', ['ahr', 'ar', 'are', 'our', 'r', 'r.'])
('s', ["'s", 'es', 'ess', 'esse', 's', 's.'])
('t', ['t', 't.', 'te', 'tea', 'tee', 'ti'])
('u', ['ewe', 'hugh', 'u', 'u.', 'uwe', 'yew', 'yoo', 'you', 'yu', 'yue'])
('v', ['v', 'v.', 've', 'vee', 'vi'])
('w', ['w', 'w.'])
('x', ['aix', 'eckes', 'ex', 'x', 'x.'])
('y', ['wai', 'why', 'wye', 'y', 'y.'])
('z', ['xie', 'z', 'z.', 'ze', 'zea', 'zee', 'zi'])
```

However, there is an issue here: which spelling should we select for each letter? I could not find an automated approach for this, so I came up with my own mapping:

```python
abbreviations = {
    "a": "ay",
    "b": "bee",
    "c": "sieh",
    "d": "dea",
    "e": "ee",
    "f": "eff",
    "g": "jie",
    "h": "edge",
    "i": "eye",
    "j": "jay",
    "k": "kaye",
    "l": "elle",
    "m": "emme",
    "n": "en",
    "o": "owe",
    "p": "pea",
    "q": "queue",
    "r": "are",
    "s": "esse",
    "t": "tea",
    "u": "hugh",
    "v": "vee",
    "w": "doub you",
    "x": "ex",
    "y": "why",
    "z": "zee",
}
```

This seems to handle all the cases mentioned above.
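One way such a mapping could be applied is to spell out each all-caps acronym letter by letter before synthesis. The `expand_acronyms` helper and the regex below are my own sketch, not from the thread; only a few entries of the mapping are repeated here for brevity.

```python
import re

# Subset of the letter-to-word mapping above, for brevity.
abbreviations = {"a": "ay", "f": "eff", "n": "en", "q": "queue", "s": "esse"}

def expand_acronyms(text):
    """Replace runs of two or more capital letters with their spoken spellings."""
    def spell(match):
        return " ".join(abbreviations.get(ch.lower(), ch) for ch in match.group(0))
    return re.sub(r"\b[A-Z]{2,}\b", spell, text)

print(expand_acronyms("Ask NASA a question."))
# Ask en ay esse ay a question.
```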
-
I just use this line of code to pre-process text with abbreviations:

```python
re.sub(r'\b([A-Z]+)\b', lambda match: '-'.join(match.group(1)), text)
```

Example:
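For illustration, here is that one-liner run on a sample sentence of my own: it inserts hyphens between the letters of any all-caps run so the TTS engine reads each letter separately.

```python
import re

text = "Send the PDF to NASA"
result = re.sub(r'\b([A-Z]+)\b', lambda match: '-'.join(match.group(1)), text)
print(result)  # Send the P-D-F to N-A-S-A
```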