Hardware requirement for inference server #55

RJ · 2023-05-18T12:23:16Z

RJ
May 18, 2023

I've tried out the local speech recognition stuff by enabling multinet and controlling my home assistant entities.
In my limited testing, local multinet is not nearly as accurate as shipping the audio off to the inference server and letting whisper have at it (which seems excellent for accuracy so far!).

So, although neat, I doubt i'd use the on-device multinet speech recognition. Seems unlikely that will improve faster than whisper et al, which is already pretty great.

What sort of hardware is needed to run the inference server with whisper locally once you release it? I have a GTX 1080 8GB in a windows machine, state of the art ~8 years ago.. any good?

Answered by kristiankielhofner

May 18, 2023

Hello again! First, I want to thank you for your interest in Willow, you've been very helpful!

We will be releasing our highly optimized Willow Inference Server next week. Here are some basic benchmarks:

Device	Model	Beam Size	Speech Duration (ms)	Inference Time (ms)	Realtime Multiple
RTX 4090	large-v2	5	3840	140	27x
H100	large-v2	5	3840	294	12x
H100	large-v2	5	10688	519	20x
H100	large-v2	5	29248	1223	23x
GTX 1060	large-v2	5	3840	1114	3x
Tesla P4	large-v2	5	3840	1099	3x
RTX 4090	medium	1	3840	84	45x
GTX 1060	medium	1	3840	588	6x
Tesla P4	medium	1	3840	586	6x
RTX 4090	medium	1	29248	377	77x
GTX 1060	medium	1	29248	1612	18x
Tesla P4	medium	1	29248	1730	16x
RTX…

View full answer

kristiankielhofner · 2023-05-18T12:37:39Z

kristiankielhofner
May 18, 2023
Maintainer

Hello again! First, I want to thank you for your interest in Willow, you've been very helpful!

We will be releasing our highly optimized Willow Inference Server next week. Here are some basic benchmarks:

Device	Model	Beam Size	Speech Duration (ms)	Inference Time (ms)	Realtime Multiple
RTX 4090	large-v2	5	3840	140	27x
H100	large-v2	5	3840	294	12x
H100	large-v2	5	10688	519	20x
H100	large-v2	5	29248	1223	23x
GTX 1060	large-v2	5	3840	1114	3x
Tesla P4	large-v2	5	3840	1099	3x
RTX 4090	medium	1	3840	84	45x
GTX 1060	medium	1	3840	588	6x
Tesla P4	medium	1	3840	586	6x
RTX 4090	medium	1	29248	377	77x
GTX 1060	medium	1	29248	1612	18x
Tesla P4	medium	1	29248	1730	16x
RTX 4090	base	1	180000	277	648x (not a typo)

We use "medium 1" by default (very high quality as you've experienced) and as you can see here a GTX 1060 can do 3.8s of speech in 588ms. The realtime multiple climbs significantly with longer segments. You will be more than fine with your GTX 1080, and another thing to know is we've optimized VRAM usage heavily (thanks to ctranslate2) so all three models we support - base, medium, and large loaded simultaneously occupy less than 3gb of VRAM so your GTX 1080 will still be usable for other tasks when not handling Willow requests. We load all three models because we support Willow selecting the model and other parameters via URI parameters in the request.

One caveat for you, though. You will need to use WSL as the Willow Inference Server is Linux only.

0 replies

RJ · 2023-05-18T13:06:51Z

RJ
May 18, 2023
Author

Fab, thanks. I will stop browsing expensive graphics cards then.. The windows machine is for occasional games, but now I have a reason to try out WSL.

Leaving open or it hides from the default Discussion view. feel free to close if you want.

0 replies

kristiankielhofner · 2023-05-18T13:24:17Z

kristiankielhofner
May 18, 2023
Maintainer

This is all great stuff and unless things get ugly we'll be leaving discussions open.

Thanks again!

0 replies

gitalpaca · 2023-05-26T15:07:47Z

gitalpaca
May 26, 2023

@kristiankielhofner Would it be possible to run the WIS on a Jetson Nano? I know it would be relatively slow compared to the listed GPUs, but would it be faster than CPU only? I have one sitting idle and it would be nice to have a small self contained device to run the WIS on if possible.

4 replies

kristiankielhofner May 26, 2023
Maintainer

I have a Jetson Nano myself and there are a few issues that prevent it from working with WIS:

Nvidia has essentially given up on them and no longer provides software updates. Because of this, the most recent release has a much older version of CUDA we don't (can't) support.
It doesn't have the memory anyway.
It would be shockingly slow.

The Jetson series is really useful when it works. However, at the higher end and newer hardware you would need for WIS the economics just don't work out - you could get an entire machine with a much faster Nvidia GPU for less.

gitalpaca May 26, 2023

Haha, I was afraid that would be the answer but I thought I'd ask anyway. Thanks for humoring me. I guess I'll just have to find room for a P4 somewhere.

rgregrowe Jun 3, 2023

Any thoughts as to whether a Tesla P4 would work OK using an eGPU (either M.2 or Thunderbolt) on an Intel NUC ? I've retired all my desktop and server hardware in favor of PI's and NUC's. Trying to save on power, space, and fan noise.

kristiankielhofner Jun 4, 2023
Maintainer

A Tesla P4 is just a GPU after all, I can't think of anything that would prevent use as an eGPU.

ccbadd · 2023-07-04T17:03:26Z

ccbadd
Jul 4, 2023

My understanding is that a Google Coral can be used for voice. Any chance it could be supported to give a low cost, low power, less space occupying option?

1 reply

kristiankielhofner Jul 4, 2023
Maintainer

The Coral can't be used for voice.

jcuccia · 2023-07-04T22:30:54Z

jcuccia
Jul 4, 2023

Will something like the Nvidia Tesla K80 work with the Willow Inference Server?

1 reply

kristiankielhofner Jul 4, 2023
Maintainer

The Tesla K80 is Kepler (very, very old).

The lowest supported family we recommend is Pascal so something like a Tesla P4, GTX 1070, etc.

justynbell · 2023-10-04T00:40:10Z

justynbell
Oct 4, 2023

Would the GTX 1050ti work? Pascal architecture, has 4GB DDR5, but it's biggest pro is that it doesn't need external power (if I'm not mistaken). Perfect for a small home server that I might not have enough power for.

2 replies

kristiankielhofner Oct 4, 2023
Maintainer

A GTX 1050 ti will technically work with one caveat: we have a yet-to-be merged PR for WIS that provides more fine-grained configuration of loaded models. 4GB VRAM isn't quite enough to load all WIS models but once this is merged you can load only the models you need and WIS will fit in 4GB VRAM. That said anyway you slice it 4GB of VRAM is very low and it doesn't leave you much headroom for future WIS functionality (with additional models).

Do you already have this card or will you be purchasing it for WIS? If you don't already have it another good option that according to eBay is almost exactly the same price (at least in the US) is the Tesla P4, which many Willow users use successfully (including two of our developers). It has twice as much VRAM as the GTX 1050 ti and is likely also a little faster, although I don't have a GTX 1050 ti so I can't confirm that. It doesn't require additional power beyond the PCIe slot and it's actually a bit smaller, uses a single slot, and is lower profile in general.

The only concern might be that it uses passive cooling so depending on the density, ambient temperature, etc it may run "hot". The good news there is with WIS the card runs at idle the vast majority of the time and only ramps up in power for a few hundred milliseconds at a time when you have a voice session. We haven't had any reports of users having cooling issues with the Tesla P4 but you may want to get an idea of the general temperature of the system you're installing it in if you decide to go that route.

justynbell Oct 4, 2023

Do you already have this card or will you be purchasing it for WIS?

I don't have the card, so thank you for the Testla P4 suggestion! I don't know much outside of the consumer "gaming" cards so telling me exactly what to get is what I needed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Hardware requirement for inference server #55

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 7 comments 8 replies

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Hardware requirement for inference server #55

Replies: 7 comments · 8 replies

kristiankielhofner May 18, 2023 Maintainer

RJ May 18, 2023 Author

kristiankielhofner May 18, 2023 Maintainer

kristiankielhofner May 26, 2023 Maintainer

kristiankielhofner Jun 4, 2023 Maintainer

kristiankielhofner Jul 4, 2023 Maintainer

kristiankielhofner Jul 4, 2023 Maintainer

kristiankielhofner Oct 4, 2023 Maintainer

Replies: 7 comments 8 replies

kristiankielhofner
May 18, 2023
Maintainer

RJ
May 18, 2023
Author

kristiankielhofner
May 18, 2023
Maintainer

kristiankielhofner May 26, 2023
Maintainer

kristiankielhofner Jun 4, 2023
Maintainer

kristiankielhofner Jul 4, 2023
Maintainer

kristiankielhofner Jul 4, 2023
Maintainer

kristiankielhofner Oct 4, 2023
Maintainer