Reducing model size by converting to ORT format #416

Answered by robertknight
ken107 asked this question in Q&A
Issue 1 is the topic of this thread. I'm wondering whether Piper models are already optimized, or whether their size can be reduced further. Since we'd like to perform inference at the edge, we want the download size to be as small as possible.

The ORT model format is mostly not about reducing model file size; it is about loading more efficiently and allowing a smaller runtime binary. This is because it is based on FlatBuffers (a format Google designed for efficient loading of resources in mobile games) rather than Protocol Buffers.

The optimizations mentioned in the doc you linked to are mainly about improving execution performance, usually by combining ("fusing") multiple steps of the model into a single s…
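For reference, ONNX Runtime ships a tool for producing ORT-format models from ONNX models. A minimal sketch of invoking it (assuming `onnxruntime` is installed via pip and the `.onnx` models sit in the current directory; exact flags may vary by version):

```shell
# Convert all .onnx models found in the given directory to .ort format.
# Alongside the converted models, the tool writes a required-operators
# config file, which can be used to build a smaller, operator-reduced
# ONNX Runtime binary for edge deployment.
python -m onnxruntime.tools.convert_onnx_models_to_ort .
```

Note that this conversion mainly shrinks the runtime binary and speeds up model loading; the model file itself typically stays about the same size.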

Replies: 2 comments

Answer selected by ken107