Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enhancement: use continue instead of break in nats fetch message loop #4943

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

atjhoendz
Copy link
Contributor

@atjhoendz atjhoendz commented Apr 16, 2024

Description

I found a memory leak issue in mage nats streaming pipeline, after I implemented it in production with more than 10 triggers of nats streaming pipeline.

I have fixed the memory issue by using "continue" instead of "break" in the NATS fetch message loop and updating the NATS close client method to wait for the result.

Before

  1. Start the dev server
    image

  2. Turn on the trigger, 10 nats push subscriber
    image

  3. Test publish 10.000 messages
    image

  4. The mage-ai-app-1 was killed due to insufficient resources
    image

  5. The memory continues to grow, leading to the termination of mage-ai-server-1 due to insufficient resources.
    image

After

  1. Start the dev server
    image

  2. Turn on the trigger, 10 nats push subscriber
    image

  3. Test publish 10.000 messages
    image

  4. Memory usage is not as high as before
    image
    image

Need Help

Based on the data I've collected, I've noticed a significant spike in CPU usage during message processing, and I'm not sure why this is happening and need your help to investigate the cause of the high CPU usage.

How Has This Been Tested?

  • Test publishing 10,000 messages. The pipeline code only prints NATS messages.

Checklist

  • The PR is tagged with proper labels (bug, enhancement, feature, documentation)
  • I have performed a self-review of my own code
  • I have added unit tests that prove my fix is effective or that my feature works
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation

cc:

… and update the nats close client method to wait for the result
@wangxiaoyou1993
Copy link
Member

@atjhoendz do you want to make this PR ready for review so that we can merge it?

@atjhoendz atjhoendz marked this pull request as ready for review April 17, 2024 02:12
@atjhoendz
Copy link
Contributor Author

atjhoendz commented Apr 17, 2024

@atjhoendz do you want to make this PR ready for review so that we can merge it?

Sorry I changed my mind, I think it would be better to investigate the high cpu usage first before merging.

@atjhoendz atjhoendz marked this pull request as draft April 17, 2024 02:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants