Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible memory leak #216

Open
boncea opened this issue Mar 14, 2019 · 11 comments
Open

Possible memory leak #216

boncea opened this issue Mar 14, 2019 · 11 comments

Comments

@boncea
Copy link

boncea commented Mar 14, 2019

Recently I had to investigate a memory leak in our system. Tracking it down, we realized that it's related to newrelic. After more investigation it seems like it's related to this bundle, but I didn't find yet what exactly is causing it. I have to mention that I'm using symfony 4.2.

Here is my configuration:

ekino_new_relic:
    enabled: true
    application_name: 'my_app'
    license_key: "my_license_key"
    twig: false
    exceptions: true
    deprecations: true
    http:
        enabled: true
        transaction_naming: controller
    monolog:
        enabled: true
        level: error
    commands:
        enabled: true

Here is a graph with memory consumption in Amazon ECS:
Screen Shot 2019-03-14 at 5 36 17 PM

The first part of the graph is with the configuration from above, the second small flat part is with both newrelic extension and ekino bundle disabled and the last flat part is with newrelic extension enabled and bundle disabled.

I'll continue investigating it, but please comment here if you have any idea.

@Nyholm
Copy link
Collaborator

Nyholm commented Mar 19, 2019

Thank you @boncea.

Try disabling some features of the bundle one by one. Always when taking about memory leaks, then Monolog is a usual suspect. Especially if you use "fingers_crossed" strategy with long running processes.

@boncea
Copy link
Author

boncea commented Mar 20, 2019

My first suspect was also monolog, but it still happened with it disabled. In the end, I had only http enabled and it still happened. After more digging into the bundle code, I ended to the conclusion that it needs to be related to setting the application name. More specifically, when the interpreter setApplicationName method is called, it forwards the request to newrelic_set_appname from newrelic agent. Here, they say something interesting:
Usually, when you change an application name, the agent simply discards the current transaction and does not send any of the accumulated metrics to the daemon.

I didn't find any more information from newrelic side about what "discarding" of the transaction would mean, but I'm pretty much sure that they don't actually remove it instantly. I enabled debug and monitored some logs for newrelic and I've seen logs like Info: removing "application-name" with run id "something" for lack of activity within 10m0s

Initially I tried to fix this in my fork, by ending the transaction before setting application name, like this (in https://github.com/ekino/EkinoNewRelicBundle/blob/master/Listener/RequestListener.php#L79):

if ($appName !== \ini_get('newrelic.appname')) {
    $this->interactor->endTransaction(!$this->config->getXmit());
    $this->interactor->setApplicationName($appName, $this->config->getLicenseKey());
}

Surprisingly for me, this didn't fix the issue, so now I think there are two possible reasons:

  1. newrelic_end_transaction doesn't actually remove instantly the transaction from memory (and I didn't find any other way in newrelic docs to "remove" it)
  2. we need to manually call newrelic_end_transaction if we set different application name (which I didn't try)

After I changed the config to not set the app name from there, but directly through ini file, the memory seems to stay quite flat again (I'll confirm after it runs for some more time).

@Nyholm
Copy link
Collaborator

Nyholm commented Mar 20, 2019

Hm. Maybe @astorm could help us a little here.

@astorm
Copy link

astorm commented Mar 20, 2019

@Nyholm @boncea Happy to take a look -- based on what I'm reading above I see three possible sources for the leak

  1. The Agent is leaking
  2. Something in ekino/EkinoNewRelicBundle is leaking
  3. Something about how @boncea is using ekino/EkinoNewRelicBundle or the agent is leaking

Nothing immediately pops out as a known issue -- so I have a few questions.

@boncea

After more digging into the bundle code, I ended to the conclusion that it needs to be related to setting the application name

Could you talk a bit more about what led you to this conclusion? Or would it be possible to share some base-level reproducible test case? (i.e. Symfony + EkinoNewRelicBundle + [small bit of code], or even better [some small bit of code with the agent].

@astorm
Copy link

astorm commented Mar 31, 2019

@boncea we did some testing this week and wern't able to recreate your issue with a stock Symfont 4.2 system using either the built in web server, or PHP-FPM. So if this is the agent's issue, it's some combination of non-stock PHP + the Agent that's causing the issue. Without more information we're at a bit of standstill. If you can provide a reproduction (Docker container?) where the problem occurs that would be super useful. Lacking that -- OS name/version, PHP version, and which PHP SAPI with which web server would be useful details to help us track this down.

Also, if you solved the problem yourself we'd love to know what you did.

Thanks!

@boncea
Copy link
Author

boncea commented Apr 1, 2019

Could you talk a bit more about what led you to this conclusion?

As I mentioned before, with the bundle disabled and with the agent enabled, the leak doesn't happen.
The leak still happened even when I disabled every listener except http, so with the following config:

ekino_new_relic:
    enabled: true
    application_name: 'application_name_different_than_in_ini_file'
    license_key: "my_license_key"
    twig: false
    exceptions: false
    deprecations: false
    http:
        enabled: true
        transaction_naming: controller
    monolog:
        enabled: false
        level: error
    commands:
        enabled: false

With the above config, we have only RequestListener and ResponseListener enabled. I was checking what those two are doing and based on my configuration, ResponseListener shouldn't do anything, while RequestListener should only do setApplicationName and setTransactionName. After that I found out that when setting a new application name, newrelic is discarding the current transaction and creates a new one (check the phpdoc for newrelic_set_appname). This is why I started focusing on this part. After I removed the custom application name from the config, the leak didn't happen anymore, so now I have the following config:

ekino_new_relic:
    enabled: true
    license_key: "my_license_key"
    twig: false
    exceptions: true
    deprecations: true
    http:
        enabled: true
        transaction_naming: controller
    monolog:
        enabled: true
        level: error
    commands:
        enabled: true

Unfortunately I cannot provide the dockerfile, but I use the following: PHP 7.2.16 with php-fpm and cgi-fcgi, Symfony 4.2.2, EkinoNewrelicBundle 2.0.2. We host the app on AWS ECS and the OS info is Amazon linux 1 4.14.88-72.73.amzn1.x86_64 #1 SMP Fri Dec 14 20:12:13 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

@astorm
Copy link

astorm commented Apr 11, 2019

@boncea I've carved out some time to work on tracking this down this week, so hopefully we can get to the bottom of it. You're not the first person to report fishy behavior with newrelic_set_app_name, so I think we can rule ekino out.

I've reproduced the environment you mentioned above in EC2 (using PHP 7.2.17 instead of php 7.2.16) with a simple Symfony Controller that renders a twig endpoint. I'm driving traffic at that endpoint, and none of the PHP-FPM processes seem to be using an unusual amount of memory.

If possible, can you share the php extensions you're using (a phpinfo page should do that).

Can you talk a little bit about what your application does? Like -- is it making database queries? Working with other datastores? Are you using other New Relic API methods during the same request ? Do you know if it's the php-fpm processes that are eating up memory?

Tracking down these sorts of "don't happen for everyone" memory leaks and segmentation faults usually comes down to needing to reproduce the issue, so if there's any additional information you can share.

I'm also on the Symfony Slack if you want to reach out there and avoid the long communication cycles.

@boncea
Copy link
Author

boncea commented Apr 11, 2019

I have a health-check endpoint which basically checks (by doing ping) the connections to databases: mysql, mongodb and redis. This endpoint is called every four seconds. For some reason, in the first period after deployment, the leak was not that obvious as after 2-3 hours of running. Regarding the databases, I don't create any connection manually, but I test them by injecting the following services:

@doctrine.dbal.default_connection
@doctrine_mongodb.odm.default_connection
@cache.default_redis_provider

Answering to you other questions, I don't use any other NewRelic API methods than the ones from Ekino which I described previously. The application runs inside docker and the docker image extends the official php:7.2-fpm. As for the PHP extensions, I have

igbinary
intl
redis.so
mongodb.so
opcache
pdo pdo_mysql

@astorm I'll contact you on slack so that you can ask me other details.

@Nyholm
Copy link
Collaborator

Nyholm commented Apr 11, 2019

Excellent. Thank you both @boncea and @astorm.

Please remember to write a short summary here after you find/resolve the issue.

@astorm
Copy link

astorm commented Apr 28, 2019

👋 The slow motion train continues on this one. The team's found some possible memory handling problems around the edges of the New Relic daemon -- specifically with daemon restarts while an application's under heavy load. These may be the culprit for the above leaks. It's my understanding the team's planning a fix for a future (hopefully the next) agent release. I'll keep you'all posted as progress continues. (Something something spinning plates). Thanks for bringing this to the team's attention, it probably wouldn't have been caught otherwise!

@astorm
Copy link

astorm commented May 21, 2019

@Nyholm @boncea Hello! It's been 23 days, but we're back. There's been a new release of the PHP Agent for New Relic that fixes up some problems with memory handling around transaction start/stop -- specifically to address problem with newrelic_set_appname. https://docs.newrelic.com/docs/release-notes/agent-release-notes/php-release-notes/php-agent-870242

I have a working theory that this may also provide a fix for the strange memory usage patterns described above.

@boncea -- if we haven't lost the thread, I'm wondering if you could run your tests with the newest version of the PHP Agent and see if they clear up the memory leak you described earlier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants