
bug: When there is a large amount of traces, calculating model costs and usage is very slow and ultimately yields no results #2034

Open
secsilm opened this issue May 11, 2024 · 7 comments
Labels
blocked-v3, bug, performance

Comments

@secsilm

secsilm commented May 11, 2024

Describe the bug

We have been using Langfuse in our production environment and have generated about 1.65 million traces over the past month. When I check usage for the past week, the response time is still acceptable.

However, when I select the last month, the "Model costs", "Model usage", and "User consumption" sections take a long time to load (maybe 5-10 minutes?), and then the loading icon disappears without displaying any results. The CPU usage of Postgres also surges.

[Screenshot: dashboard panels stuck on the loading indicator]

[Screenshot: CPU usage of the Postgres container]

To reproduce

Generate a large number of traces and then view the dashboard.

SDK and container versions

Self-hosted Langfuse: 2.38.0

Additional information

No response

Are you interested in contributing a fix for this bug?

No

@marcklingen
Member

We’re currently preparing v3 to address these performance issues with analytical queries: https://github.com/orgs/langfuse/discussions/1902

In the meantime, database IOPS is most likely the bottleneck; improving it is what you could try in order to make this faster.
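If you want to confirm where that IO is going, something like the following can help. This is a rough sketch only: it assumes the pg_stat_statements extension can be enabled on the Langfuse Postgres instance (it must be in shared_preload_libraries) and uses Postgres 13+ column names.

```sql
-- Rough sketch, assuming pg_stat_statements is preloaded and can be enabled.
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- List the statements that read the most blocks, to see which dashboard
-- aggregations are driving the IO load.
SELECT calls,
       round(total_exec_time::numeric, 1) AS total_ms,
       shared_blks_read,
       left(query, 120)                   AS query_preview
FROM pg_stat_statements
ORDER BY shared_blks_read DESC
LIMIT 10;
```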

@arthurGrigo

+1

I also have the feeling that there is quite some latency between when a prompt chain completes and when the full trace is available in the UI.

@marcklingen
Member

> +1
>
> I also have the feeling that there is quite some latency between when a prompt chain completes and when the full trace is available in the UI.

Interesting, this should not be the case, as the SDKs flush events every second by default. How long do you need to wait?
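To rule out client-side batching as the cause, you can also force the queue to drain explicitly. A minimal sketch against the v2 Python SDK; the flush_at / flush_interval values shown are roughly the documented defaults, but treat them as assumptions and check the docs for your version:

```python
from langfuse import Langfuse

# Credentials and host are read from LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY,
# and LANGFUSE_HOST. The two knobs below control client-side batching.
langfuse = Langfuse(
    flush_at=15,         # send a batch once this many events are queued
    flush_interval=0.5,  # ...or after this many seconds, whichever comes first
)

# ... run the prompt chain, create traces/generations ...

# Block until everything queued so far has been sent; the trace should be
# visible in the UI right after this returns.
langfuse.flush()
```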

@arthurGrigo

arthurGrigo commented May 20, 2024

I have not measured it, but sometimes 1 to 2 minutes for really complex prompt chains.

I should have mentioned that I run Langfuse locally using Docker Compose.

@marcklingen
Member

> I have not measured it, but sometimes 1 to 2 minutes for really complex prompt chains.
>
> I should have mentioned that I run Langfuse locally using Docker Compose.

Thanks for sharing. This will improve dramatically with Langfuse v3. In the meantime, you could tweak the behavior by increasing the number of threads (docs), as I assume the SDK here is backlogged with events to send to the API.
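For example, on the v2 Python SDK the consumer thread count is a constructor argument. A minimal sketch (the default of a single consumer thread is my assumption, see the SDK docs):

```python
from langfuse import Langfuse

# Default is a single background consumer thread; a second one lets the
# SDK drain its event queue faster when a complex chain emits many events.
langfuse = Langfuse(threads=2)
```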

@arthurGrigo

> Thanks for sharing. This will improve dramatically with Langfuse v3. In the meantime, you could tweak the behavior by increasing the number of threads (docs), as I assume the SDK here is backlogged with events to send to the API.

Thanks for the hint!
The docs say one should only use it if really necessary. Are there any known drawbacks or bugs when increasing the number of threads?

@marcklingen
Member

> Thanks for the hint! The docs say one should only use it if really necessary. Are there any known drawbacks or bugs when increasing the number of threads?

Performance: it creates additional background threads and might need more time when you flush/shutdown (joining the threads). If the current latency is bearable for you right now, you might be better off waiting for v3, or just try how things improve with threads=2.
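A small illustration of that tradeoff (a sketch, not official guidance): the extra threads mostly cost you at the end of the process, when they are joined.

```python
from langfuse import Langfuse

langfuse = Langfuse(threads=2)

# ... application work that creates traces ...

# shutdown() flushes any remaining events and joins the consumer threads;
# with more threads this join is where the extra time at exit can show up.
langfuse.shutdown()
```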
