I have a simple, stacked RNN which predicts text in a loop, character by character. Here is a simplified version of that code.
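(The snippet below is an illustrative sketch of that loop rather than my exact code; the layer sizes, vocabulary size, and window length are placeholder assumptions.)

```js
const tf = require('@tensorflow/tfjs-node-gpu');

// Placeholder sizes; the real vocabulary and window length differ.
const VOCAB_SIZE = 64;
const SEQ_LEN = 40;

// Stacked character-level RNN; `units` controls the per-layer sizes
// reported in the table below (e.g. [16, 16] or [256, 256, 256]).
function buildModel(units) {
  const model = tf.sequential();
  units.forEach((n, i) => {
    model.add(tf.layers.lstm({
      units: n,
      returnSequences: i < units.length - 1,
      inputShape: i === 0 ? [null, VOCAB_SIZE] : undefined,
    }));
  });
  model.add(tf.layers.dense({ units: VOCAB_SIZE, activation: 'softmax' }));
  return model;
}

// Generation loop: one .predict() call per character, which is where
// the fixed per-call latency accumulates.
function generate(model, seedIndices, numChars) {
  const indices = seedIndices.slice();
  for (let i = 0; i < numChars; i++) {
    const next = tf.tidy(() => {
      const window = indices.slice(-SEQ_LEN);
      const x = tf.oneHot(tf.tensor1d(window, 'int32'), VOCAB_SIZE).expandDims(0);
      const probs = model.predict(x);                 // ~150ms per layer, per call
      return probs.squeeze().argMax().dataSync()[0];  // greedy sampling, for simplicity
    });
    indices.push(next);
  }
  return indices.slice(seedIndices.length);
}
```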
No matter the size of my model, there is a constant ~150ms of latency per layer associated with each prediction. For reference:

| layers | latency |
| --- | --- |
| `[16]` | 150ms/token |
| `[256]` | 150ms/token |
| `[16, 16]` | 300ms/token |
| `[256, 256]` | 300ms/token |
| `[16, 16, 16]` | 450ms/token |
| `[256, 256, 256]` | 450ms/token |
Currently, I'm running this code in Node.js (on GPU), but I can confirm that the latency persists in WebGL as well.
Is there anything we can do to speed up predictions here? Text generation is unbearably slow, to the point where TFJS is barely even useful for my task. By contrast, training is fast, even with big batches and many layers! Clearly, the issue comes from repeated calls to .predict() and the overhead associated with each call. Is there any way to move this computation into the model and return an entire sequence from a single prediction, rather than generating it token by token in a loop?
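Roughly, the kind of API I have in mind is sketched below; `generateSequence` is purely hypothetical (nothing like it exists in TFJS today), but it illustrates returning a whole sequence from a single call:

```js
// Purely hypothetical API, for illustration only (not part of TFJS).
// The token-by-token loop would run inside the backend, so the fixed
// per-.predict() overhead is paid once instead of once per character.
const generated = await generateSequence(model, {
  seed: seedIndices,  // starting character indices
  length: 500,        // number of characters to generate
  sampling: 'argmax'  // or temperature-based sampling
});
```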
For reference, a Brain.js model of comparable size is able to predict an entire sequence of any length nearly instantaneously, on the CPU no less! Is there any way we could integrate such optimizations here?
Any advice would be greatly appreciated.