Hi @pooja1423, thanks for sharing your feedback!
This should not happen, as increasing the number of threads is the best way to increase the parallelism/throughput of the SDK. Do you self-host Langfuse? If you run on Langfuse Cloud, can you share your project id?
Reducing the event size generally helps, but it should not be necessary unless you are logging really large events to Langfuse.
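Why more consumer threads speed up a flush can be illustrated with a stdlib-only sketch. This is not Langfuse's actual internals; the queue, batch size, and simulated network latency below are illustrative assumptions, but the shape matches what a batching SDK does: events accumulate in a shared in-memory queue, background threads drain it in batches, and flush() blocks until the queue is empty.

```python
import queue
import threading
import time

def simulate_flush(num_events: int, num_threads: int, batch_size: int = 10,
                   send_latency: float = 0.001) -> float:
    """Drain a queue of events with N consumer threads; return the drain time.

    Loosely mimics a batching SDK's flush: events sit in a shared queue,
    consumer threads pop batches and "send" them (here: a short sleep).
    """
    q: queue.Queue = queue.Queue()
    for i in range(num_events):
        q.put(i)

    def consumer() -> None:
        while True:
            batch = []
            try:
                for _ in range(batch_size):
                    batch.append(q.get_nowait())
            except queue.Empty:
                pass  # send whatever partial batch we collected
            if not batch:
                return  # queue is drained; this worker is done
            time.sleep(send_latency)  # stand-in for the HTTP request
            for _ in batch:
                q.task_done()

    start = time.perf_counter()
    workers = [threading.Thread(target=consumer) for _ in range(num_threads)]
    for w in workers:
        w.start()
    q.join()  # this is what flush() effectively waits for
    elapsed = time.perf_counter() - start
    for w in workers:
        w.join()
    return elapsed
```

With one thread, 1000 events are sent as ~100 sequential batches; with four threads, the same batches are sent roughly four at a time, so the drain completes in about a quarter of the time.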
Hello, we are in the process of deploying Langfuse for our internal LangChain LLM pipelines that retrieve information from documents, and we're running into long runtimes when Langfuse tracing is enabled. It might be the way we have it set up, and we would appreciate any feedback.
We're using the Python SDK to set up the Langfuse client. A trace is created for each document, and a callback handler is passed into the chain invocation using trace.get_langchain_handler(). Each document is processed in a separate thread.
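The setup described above can be sketched as follows. The document list, the worker count, and process_document are placeholders; the commented-out lines show where the Langfuse calls named in the post would go (they are stubbed here so the sketch runs without the SDK):

```python
from concurrent.futures import ThreadPoolExecutor

def process_document(doc: str) -> str:
    # In the real pipeline (hypothetical names, per the setup described):
    # trace = client.trace(name=doc)              # one trace per document
    # handler = trace.get_langchain_handler()     # per-trace callback handler
    # return chain.invoke({"input": doc}, config={"callbacks": [handler]})
    return f"processed:{doc}"  # stub so the sketch is runnable

docs = [f"doc-{i}" for i in range(100)]

# Each document is processed in its own worker thread.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(process_document, docs))

# client.flush()  # blocks until all queued events are sent
```

The chain invocations themselves parallelize well across threads; the slow step reported below is the final flush, which waits for the SDK's event queue to drain.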
This pipeline takes 3 seconds for 100 documents, but the final flush() adds more than a minute to the runtime. Here are things I have tried:
The flush time increases linearly with the number of documents, which is limiting us from scaling and iterating quickly. Do you have any suggestions or things we can try or set up differently? Thank you!