
[FR] Increase server resources - the scaling problem #4

Open
HarikalarKutusu opened this issue Mar 30, 2022 · 1 comment
Labels: backend (Server related), Discussion (Discuss possible routes to solve the problem)

Comments

HarikalarKutusu (Owner) commented Mar 30, 2022

Full STT inference is currently done on the server. Once a client connects, the server becomes busy and nobody else can use it, i.e. it does not scale at all. What are our options for increasing this capacity?

We have mainly two variables: the client-selected language and the connection, which is an uninterrupted stream...

1. NodeJS Threads / Clusters - flexible language

We might use the cluster/worker-thread mechanisms built into NodeJS. One server with multiple cores can handle multiple audio streams: each connection spawns a worker, negotiates a language with the client, and serves it until the connection is closed (sketched below).

This would require a more powerful server and would be limited by the number of cores.
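A minimal sketch of this approach, assuming the `ws` WebSocket library and a hypothetical `./stt-worker.js` script that wraps the STT engine for one language:

```typescript
// main.ts - one worker thread per connection, capped at the core count.
import { Worker } from "node:worker_threads";
import { WebSocketServer } from "ws";
import os from "node:os";

const MAX_WORKERS = os.cpus().length; // hard ceiling: one busy worker per core
let activeWorkers = 0;

const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", (socket) => {
  if (activeWorkers >= MAX_WORKERS) {
    socket.close(1013, "server busy"); // 1013 = "try again later"
    return;
  }

  // First message from the client is the negotiated language code.
  socket.once("message", (lang) => {
    activeWorkers++;
    const worker = new Worker(new URL("./stt-worker.js", import.meta.url), {
      workerData: { language: lang.toString() },
    });

    // Forward audio chunks to the worker, and transcripts back to the client.
    socket.on("message", (chunk) => worker.postMessage(chunk));
    worker.on("message", (transcript) => socket.send(transcript));

    const cleanup = () => {
      activeWorkers--;
      worker.terminate();
    };
    socket.on("close", cleanup);
    worker.on("error", cleanup);
  });
});
```

The slot above is held for the whole connection, so the ceiling is still "one concurrent user per core".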

2. More servers - flexible language

We might use a pool of low-cost/free servers. Clients poll them until they find a free one; a new client could also query all servers in parallel at startup to get their status (sketched below).

This scales better, but is again limited by the number of mini-servers.
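A minimal client-side sketch, assuming each mini-server exposes a hypothetical `GET /status` endpoint returning `{ busy: boolean }` and a runtime with a global `fetch` (Node 18+ or a browser):

```typescript
// Example pool of mini-servers (placeholder URLs).
const SERVER_POOL = [
  "https://stt-0.example.org",
  "https://stt-1.example.org",
  "https://stt-2.example.org",
];

async function isFree(base: string): Promise<boolean> {
  try {
    const res = await fetch(`${base}/status`, { signal: AbortSignal.timeout(2000) });
    const { busy } = await res.json();
    return !busy;
  } catch {
    return false; // unreachable servers count as busy
  }
}

// Scan the whole pool in parallel and return the first free server, or null.
async function findFreeServer(): Promise<string | null> {
  const results = await Promise.all(SERVER_POOL.map(isFree));
  const idx = results.findIndex(Boolean);
  return idx >= 0 ? SERVER_POOL[idx] : null;
}

// Usage: keep polling with a short delay until a server frees up.
async function connectWhenFree(): Promise<string> {
  for (;;) {
    const server = await findFreeServer();
    if (server) return server;
    await new Promise((r) => setTimeout(r, 3000)); // wait 3 s and retry
  }
}
```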

3. Dedicated language servers - limited communication => shared server/process

We want only the relevant commands to be transferred to the server, not a continuous stream of every sound in the environment. This can be achieved with a push-to-talk (walkie-talkie style) client configuration, by running voice activity detection on the client so that only sufficient data is sent (removing silence, background noise, etc.), or by some other method, which should be a separate discussion topic. Either way, each connection would only be used for short bursts.

If we can do this, a language process (be it a mini-server or a worker process on a core) can share its STT capability among multiple users. We don't want to dedicate a server to a language permanently, though, as some would sit idle while others get congested. Instead:

  • New connection requesting language A => if a server/process for it exists, try to use it; if none, spawn one for that language.
  • Limit the connections per process (slots) to N; if full, spawn a new one.
  • Connection closed => free the slot. If no other connections are left, release the worker from its language dedication.

We might need to implement a buffering/queuing mechanism and fine-tune it to find the optimal number of slots.
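A minimal sketch of the slot bookkeeping only; the actual worker handle and a hypothetical `spawnSttWorker(lang)` that starts the per-language process are left out:

```typescript
const SLOTS_PER_WORKER = 4; // N: to be tuned experimentally

interface LangWorker {
  lang: string;
  slotsUsed: number;
  // a handle to the actual worker process/thread would live here
}

const workers: LangWorker[] = [];

// New connection requesting `lang`: reuse a worker with a free slot,
// otherwise spawn a fresh one dedicated to that language.
function acquireSlot(lang: string): LangWorker {
  let worker = workers.find((w) => w.lang === lang && w.slotsUsed < SLOTS_PER_WORKER);
  if (!worker) {
    worker = { lang, slotsUsed: 0 }; // spawnSttWorker(lang) would be called here
    workers.push(worker);
  }
  worker.slotsUsed++;
  return worker;
}

// Connection closed: free the slot; a fully idle worker loses its
// language dedication and can be re-assigned or terminated.
function releaseSlot(worker: LangWorker): void {
  worker.slotsUsed--;
  if (worker.slotsUsed === 0) {
    workers.splice(workers.indexOf(worker), 1);
  }
}
```

A queue would sit in front of `acquireSlot` when all workers for a language are full, which is where the buffering/queuing mechanism mentioned above comes in.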

More?

...

@HarikalarKutusu added the Discussion and backend labels on Mar 30, 2022
HarikalarKutusu (Owner, Author) commented Apr 4, 2022

We implemented the server pool method (no. 2 above) as a start in #12

@HarikalarKutusu changed the title from "Increase server resources - the scaling problem" to "[FR] Increase server resources - the scaling problem" on Apr 5, 2022