
[FR] Increase server resources - the scaling problem #4

Open
HarikalarKutusu opened this issue Mar 30, 2022 · 1 comment
Labels: backend (Server related), Discussion (Discuss possible routes to solve the problem)

Comments

HarikalarKutusu (Owner) commented Mar 30, 2022

Full STT inference is currently done on the server. Once a client connects, the server becomes busy and nobody else can use it, i.e. it does not scale at all. What are our options for increasing this capacity?

We have mainly two variables: the client-selected language and the connection, which is an uninterrupted stream...

1. NodeJS Threads / Clusters - flexible language

We might use the cluster/worker-thread mechanisms built into NodeJS. One server with multiple cores can handle multiple audio streams: each connection spawns a worker, negotiates a language with the client, and serves it until the connection is closed (sketched below).

This would require a more powerful server and would be limited by the number of cores.
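A minimal sketch of this approach, assuming the `ws` WebSocket library and a hypothetical `./stt-worker.js` script that wraps the STT engine for one language:

```typescript
// main.ts - one worker thread per connection, capped at the core count.
import { Worker } from "node:worker_threads";
import { WebSocketServer } from "ws";
import os from "node:os";

const MAX_WORKERS = os.cpus().length; // hard ceiling: one busy worker per core
let activeWorkers = 0;

const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", (socket) => {
  if (activeWorkers >= MAX_WORKERS) {
    socket.close(1013, "server busy"); // 1013 = "try again later"
    return;
  }

  // First message from the client is the negotiated language code.
  socket.once("message", (lang) => {
    activeWorkers++;
    const worker = new Worker(new URL("./stt-worker.js", import.meta.url), {
      workerData: { language: lang.toString() },
    });

    // Forward audio chunks to the worker, and transcripts back to the client.
    socket.on("message", (chunk) => worker.postMessage(chunk));
    worker.on("message", (transcript) => socket.send(transcript));

    const cleanup = () => {
      activeWorkers--;
      worker.terminate();
    };
    socket.on("close", cleanup);
    worker.on("error", cleanup);
  });
});
```

The slot above is held for the whole connection, so the ceiling is still "one concurrent user per core".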

2. More servers - flexible language

We might use a pool of low-cost/free servers. Clients poll them until they find a free one; a new client could also query all servers in parallel at startup to get their status (sketched below).

This scales better, but is again limited by the number of mini-servers.
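A minimal client-side sketch, assuming each mini-server exposes a hypothetical `GET /status` endpoint returning `{ busy: boolean }` and a runtime with a global `fetch` (Node 18+ or a browser):

```typescript
// Example pool of mini-servers (placeholder URLs).
const SERVER_POOL = [
  "https://stt-0.example.org",
  "https://stt-1.example.org",
  "https://stt-2.example.org",
];

async function isFree(base: string): Promise<boolean> {
  try {
    const res = await fetch(`${base}/status`, { signal: AbortSignal.timeout(2000) });
    const { busy } = await res.json();
    return !busy;
  } catch {
    return false; // unreachable servers count as busy
  }
}

// Scan the whole pool in parallel and return the first free server, or null.
async function findFreeServer(): Promise<string | null> {
  const results = await Promise.all(SERVER_POOL.map(isFree));
  const idx = results.findIndex(Boolean);
  return idx >= 0 ? SERVER_POOL[idx] : null;
}

// Usage: keep polling with a short delay until a server frees up.
async function connectWhenFree(): Promise<string> {
  for (;;) {
    const server = await findFreeServer();
    if (server) return server;
    await new Promise((r) => setTimeout(r, 3000)); // wait 3 s and retry
  }
}
```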

3. Dedicated language servers - limited communication => shared server/process

We want only the relevant commands to be transferred to the server, not a continuous stream of every sound in the environment. This can be achieved with a push-to-talk (walkie-talkie style) client configuration, by running voice activity detection on the client so that only sufficient data is sent (removing silence, background noise, etc.), or by some other method, which should be a separate discussion topic. Either way, each connection would only be used for short bursts.

If we can do this, a language process (be it a mini-server or a worker process on a core) can share its STT capability among multiple users. We don't want to dedicate a server to a language permanently, though, as some would sit idle while others get congested. Instead:

  • New connection requesting language A => if a server/process for it exists, try to use it; if none, spawn one for that language.
  • Limit the connections per process (slots) to N; if full, spawn a new one.
  • Connection closed => free the slot. If no other connections are left, release the worker from its language dedication.

We might need to implement a buffering/queuing mechanism and fine-tune it to find the optimal number of slots.
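A minimal sketch of the slot bookkeeping only; the actual worker handle and a hypothetical `spawnSttWorker(lang)` that starts the per-language process are left out:

```typescript
const SLOTS_PER_WORKER = 4; // N: to be tuned experimentally

interface LangWorker {
  lang: string;
  slotsUsed: number;
  // a handle to the actual worker process/thread would live here
}

const workers: LangWorker[] = [];

// New connection requesting `lang`: reuse a worker with a free slot,
// otherwise spawn a fresh one dedicated to that language.
function acquireSlot(lang: string): LangWorker {
  let worker = workers.find((w) => w.lang === lang && w.slotsUsed < SLOTS_PER_WORKER);
  if (!worker) {
    worker = { lang, slotsUsed: 0 }; // spawnSttWorker(lang) would be called here
    workers.push(worker);
  }
  worker.slotsUsed++;
  return worker;
}

// Connection closed: free the slot; a fully idle worker loses its
// language dedication and can be re-assigned or terminated.
function releaseSlot(worker: LangWorker): void {
  worker.slotsUsed--;
  if (worker.slotsUsed === 0) {
    workers.splice(workers.indexOf(worker), 1);
  }
}
```

A queue would sit in front of `acquireSlot` when all workers for a language are full, which is where the buffering/queuing mechanism mentioned above comes in.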

More?

...

@HarikalarKutusu added the Discussion and backend labels on Mar 30, 2022
HarikalarKutusu (Owner, Author) commented Apr 4, 2022

We implemented the server pool method (no. 2 above) as a start in #12

@HarikalarKutusu changed the title from "Increase server resources - the scaling problem" to "[FR] Increase server resources - the scaling problem" on Apr 5, 2022