Can't pronounce abbreviations #987
-
I'm using the built-in models.
-
You need to preprocess the sentences for it; a basic way of doing this is described in the gTTS tokenizer docs: https://gtts.readthedocs.io/en/latest/tokenizer.html#pre-processing
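A minimal sketch of that pre-processing idea: substitute known abbreviations with a speakable form before handing the text to the synthesizer. The substitution table and the `pre_process` helper below are my own illustration, not part of the gTTS API.

```python
# Illustrative substitution table (an assumption, not gTTS's built-in list).
SUBSTITUTIONS = {
    "e.g.": "for example",
    "i.e.": "that is",
    "etc.": "et cetera",
}

def pre_process(text):
    """Replace each abbreviation with its spoken equivalent."""
    for abbr, spoken in SUBSTITUTIONS.items():
        text = text.replace(abbr, spoken)
    return text

print(pre_process("Use short sentences, e.g. like this one."))
# Use short sentences, for example like this one.
```

The output of `pre_process` would then be passed to the TTS engine instead of the raw text.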
-
The accepted answer does not seem to work for quite a lot of cases. For instance, using the `pronouncing` library to look up candidate spellings for each letter:

```python
>>> import pronouncing
>>> for i in "abcdefghijklmnopqrstuvwxyz":
...     print((i, pronouncing.search("^" + " ".join(pronouncing.phones_for_word(i)) + "$")))
...
('a', [])
('b', ['b', 'b.', 'be', 'bea', 'bee'])
('c', ['c', 'c.', 'cie', 'sci', 'sea', 'see', 'si', 'sie', 'sieh', 'tse'])
('d', ['d', 'd.', 'de', 'dea', 'dee', 'di'])
('e', ['e', 'e.', 'ee'])
('f', ['f', 'f.'])
('g', ['g', 'g.', 'gee', 'je', 'jee', 'ji', 'jie'])
('h', ['h', 'h.'])
('i', ['ai', 'ay', 'aye', 'eye', 'i', 'i.'])
('j', ['j', 'j.', 'jae', 'jay', 'jaye'])
('k', ["'kay", 'cay', 'k', 'k.', 'kay', 'kaye', 'khe', 'quai', 'quay', 'quaye'])
('l', ['ehle', 'el', 'ell', 'elle', 'l', 'l.'])
('m', ['em', 'emme', 'm', 'm.'])
('n', ['en', 'n', 'n.'])
('o', ['au', 'aux', 'eau', 'eaux', 'o', "o'", 'o.', 'oh', 'ohh', 'ow', 'owe'])
('p', ['p', 'p.', 'pea', 'peay', 'pee'])
('q', ['cue', 'kew', 'kyu', 'q', 'q.', 'que', 'queue'])
('r', ['ahr', 'ar', 'are', 'our', 'r', 'r.'])
('s', ["'s", 'es', 'ess', 'esse', 's', 's.'])
('t', ['t', 't.', 'te', 'tea', 'tee', 'ti'])
('u', ['ewe', 'hugh', 'u', 'u.', 'uwe', 'yew', 'yoo', 'you', 'yu', 'yue'])
('v', ['v', 'v.', 've', 'vee', 'vi'])
('w', ['w', 'w.'])
('x', ['aix', 'eckes', 'ex', 'x', 'x.'])
('y', ['wai', 'why', 'wye', 'y', 'y.'])
('z', ['xie', 'z', 'z.', 'ze', 'zea', 'zee', 'zi'])
```

However, there is an issue here: which spelling should we select for each letter? I could not find an automated approach for this, so I came up with my own mapping:

```python
abbreviations = {
    "a": "ay",
    "b": "bee",
    "c": "sieh",
    "d": "dea",
    "e": "ee",
    "f": "eff",
    "g": "jie",
    "h": "edge",
    "i": "eye",
    "j": "jay",
    "k": "kaye",
    "l": "elle",
    "m": "emme",
    "n": "en",
    "o": "owe",
    "p": "pea",
    "q": "queue",
    "r": "are",
    "s": "esse",
    "t": "tea",
    "u": "hugh",
    "v": "vee",
    "w": "doub you",
    "x": "ex",
    "y": "why",
    "z": "zee",
}
```

This seems to handle all the cases mentioned above.
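One way such a mapping could be applied is to spell out each all-caps acronym letter by letter before synthesis. The `expand_acronyms` helper and the regex below are my own sketch, not from the thread; only a few entries of the mapping are repeated here for brevity.

```python
import re

# Subset of the letter-to-word mapping above, for brevity.
abbreviations = {"a": "ay", "f": "eff", "n": "en", "q": "queue", "s": "esse"}

def expand_acronyms(text):
    """Replace runs of two or more capital letters with their spoken spellings."""
    def spell(match):
        return " ".join(abbreviations.get(ch.lower(), ch) for ch in match.group(0))
    return re.sub(r"\b[A-Z]{2,}\b", spell, text)

print(expand_acronyms("Ask NASA a question."))
# Ask en ay esse ay a question.
```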
-
I just use this line of code to pre-process text with abbreviations:

```python
re.sub(r'\b([A-Z]+)\b', lambda match: '-'.join(match.group(1)), text)
```

Example:
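For illustration, here is that one-liner run on a sample sentence of my own: it inserts hyphens between the letters of any all-caps run so the TTS engine reads each letter separately.

```python
import re

text = "Send the PDF to NASA"
result = re.sub(r'\b([A-Z]+)\b', lambda match: '-'.join(match.group(1)), text)
print(result)  # Send the P-D-F to N-A-S-A
```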