
JavaScript heap out of memory during import #259

Open
kierangirvan opened this issue Feb 2, 2024 · 24 comments
Labels: wontfix (This will not be worked on)

Comments

@kierangirvan

Describe the bug
Whilst attempting to upload a large .jtl file (1.1 GB), the upload itself seems to work, but when the file is being processed (yellow icon in the test report view) it never completes, and an exception is thrown in the backend (BE) suggesting we've run out of memory.

To Reproduce
Attempt to upload a 1.1 GB .jtl file.

Expected behavior
The test results should eventually become visible in the JtlReporter frontend (FE).

Screenshots
We are running this in AWS ECS on a Fargate task; you can see that from 17:12 onwards the KPI file is being processed:

February 01, 2024 at 17:12 (UTC) {"level":"info","message":"Starting KPI file streaming and saving to db, item_id: 76dc39fc-a417-48d9-8d78-8f8a47a1df3a"}

Almost 90 minutes later, the following exception is thrown by the BE container:

February 01, 2024 at 18:46 (UTC) FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
February 01, 2024 at 18:46 (UTC) <--- JS stacktrace --->
February 01, 2024 at 18:46 (UTC) [17:0x7fc901654300] 72456860 ms: Scavenge (reduce) 2045.4 (2080.5) -> 2044.5 (2080.5) MB, 1.9 / 0.0 ms (average mu = 0.086, current mu = 0.001) allocation failure;
February 01, 2024 at 18:46 (UTC) [17:0x7fc901654300] 72456856 ms: Scavenge (reduce) 2045.4 (2080.5) -> 2044.5 (2080.5) MB, 1.8 / 0.0 ms (average mu = 0.086, current mu = 0.001) allocation failure;
February 01, 2024 at 18:46 (UTC) [17:0x7fc901654300] 72456853 ms: Scavenge (reduce) 2045.4 (2080.5) -> 2044.5 (2080.5) MB, 2.0 / 0.0 ms (average mu = 0.086, current mu = 0.001) allocation failure;
February 01, 2024 at 18:46 (UTC) <--- Last few GCs --->

The container is then marked unhealthy and is replaced by a new container. From what I can see, we are not running hot on either CPU or memory on the task itself:
[screenshot: ECS task CPU and memory usage]

So I assume we need to give the Node.js process in question a bigger slice of the memory. Do you know what to set, or how to set it?

@kierangirvan
Author

Just doing some research, and it would seem there is a Node environment variable which can be set, i.e.

export NODE_OPTIONS=--max_old_space_size=4096

See here for details > https://www.npmjs.com/package/increase-memory-limit

Do you think this is worth a try?

@ludeknovy
Owner

Hi @kierangirvan!
It definitely looks like a memory issue.
Yes, it's worth giving export NODE_OPTIONS=--max_old_space_size=4096 a try, I guess.

Was there any other log message besides Starting KPI file streaming and saving to db?

There must be a memory leak somewhere, I suppose, although the overall design is to process the file in chunks. So I would like to know whether it failed during parsing/saving the data into the DB or during processing.
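
(For context, a minimal sketch of what chunk-based processing looks like with Node streams; this is not the actual jtl-reporter code, and the batch size and saveBatch writer are illustrative assumptions.)

```ts
// Minimal sketch, not the actual jtl-reporter implementation: stream a large
// .jtl/CSV file and save rows in bounded batches, so the whole file never has
// to sit in the V8 heap at once.
import * as fs from "node:fs";
import * as readline from "node:readline";

async function importJtl(
  filePath: string,
  saveBatch: (rows: string[]) => Promise<void>, // hypothetical DB writer
): Promise<void> {
  const rl = readline.createInterface({
    input: fs.createReadStream(filePath),
    crlfDelay: Infinity,
  });

  let batch: string[] = [];
  for await (const line of rl) {
    batch.push(line); // a real importer would parse the CSV columns here
    if (batch.length >= 5000) {
      await saveBatch(batch); // e.g. one multi-row INSERT per batch
      batch = []; // drop the reference so the previous chunk can be GC'd
    }
  }
  if (batch.length > 0) await saveBatch(batch);
}
```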

Also, did you consider streaming the data into the app while your test is running? That would reduce the amount of time spent on parsing the data significantly.
https://jtlreporter.site/docs/integrations/samples-streaming
https://jtlreporter.site/docs/integrations/jmeter#2-continuous-results-uploading

@kierangirvan
Author

Thanks for your quick response.

It does suggest it is attempting to save to the DB; this step usually takes ages, but we've come to live with that. The last log entry before it runs out of memory is:

February 01, 2024 at 17:12 (UTC) 
{"level":"info","message":"Starting KPI file streaming and saving to db, item_id: 76dc39fc-a417-48d9-8d78-8f8a47a1df3a"}

We are using Taurus entirely for our test design, i.e. we do not dip into JMX; it's entirely YAML based. I believe it is not possible to enable the backend listener with Taurus via YAML; you have to convert the whole scenario to JMX to achieve this, which we really don't want to do.

@ludeknovy
Owner

ludeknovy commented Feb 2, 2024

  1. If there's no other log, then yes, there must be an issue somewhere here https://github.com/ludeknovy/jtl-reporter-be/blob/master/src/server/controllers/item/create-item-controller.ts#L184.

If it is possible to anonymize your .jtl file and share it with me, I could have a look and check whether I can spot the issue.

  2. Oh, I see. I haven't checked Taurus lately, but since they have BlazeMeter support, which I believe sends the data during test execution, there must be a way to achieve the same for any other tool.

@ludeknovy
Owner

One more note on 2): it actually seems to be possible if you do a custom JMeter installation and copy the plugin into its plugins folder: https://gettaurus.org/docs/JMeter/#JMeter-Location-Auto-Installation
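
(For illustration, a rough sketch of the Taurus option the linked doc describes; the path and the assumption that the listener jar has already been copied into lib/ext are mine, not from this thread.)

```yaml
# Taurus config sketch: point Taurus at an existing JMeter installation that
# already contains the JTL Reporter backend listener jar in lib/ext.
modules:
  jmeter:
    path: /opt/apache-jmeter-5.6/bin/jmeter   # illustrative path
```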

@kierangirvan
Author

Thanks @ludeknovy

I think the issue boils down to whether we can call the backend listener from within the YAML itself. We have purposely built everything in YAML (and not JMX), and I do not believe there is a way to call the backend listener within the Taurus YAML configuration.

Regarding the out-of-memory issue itself: we have included the following Node heap configuration and have now successfully uploaded a 1.1 GB KPI file. We will run a few more uploads to be certain, but that seems to have done the trick.

NODE_OPTIONS=--max_old_space_size=4096

I will close this issue once we have successfully uploaded a few more tests in the next few days.
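
(Since this runs as an ECS Fargate task, one way to pass the variable is through the container definition's environment block; the fragment below is a sketch, and the container name is an assumption, not a value from this thread.)

```json
{
  "name": "jtl-reporter-be",
  "environment": [
    { "name": "NODE_OPTIONS", "value": "--max_old_space_size=4096" }
  ]
}
```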

@ludeknovy
Owner

  1. Thanks for checking it.
  2. If the executor used is JMeter, then it should work, I believe. But once I have a minute, I will test it myself.

@ludeknovy
Owner

  2. You were right. It seems that it's not possible when no JMX is used. Unfortunately, that's due to the Taurus design; I did not find an easy way to extend it, and forking it does not look like a good idea.

@milanpanik

milanpanik commented Feb 9, 2024

I had a similar problem. I increased the memory, but now I'm encountering problems with a slow DB, I guess :) Attaching both the config and the error log of the failed BE service. It seems that the DB is busy, therefore the BE fails and needs a manual restart to work again.

[screenshots: database configuration and BE error log]

@ludeknovy
Owner

Hi @milan-panik !
From the provided information it looks like a networking issue - the server could not connect to the database (EAI_AGAIN indicates a problem with DNS resolution). And that resulted in the backend failure. I will try to handle it so it does not crash the whole application.

@milanpanik

It happens only during peak hours, i.e. when a batch of tests ends at the same time and a lot of reports are being uploaded.

@ludeknovy
Owner

@milan-panik
OK, would it be possible to provide logs from the jtl-reporter-db service from the time when this problem occurred?
Maybe it will help to understand what is going on.

@milanpanik

The BE died at 00:45 on 2024-02-07. DB logs are:
[screenshot: database logs around the time of the failure]

@ludeknovy
Owner

ludeknovy commented Feb 9, 2024

@milan-panik Thanks! I see an increased value for max_wal_size in your config; does the issue occur with it as well, or was it set afterwards? It looks like the load on the database is too high - you mentioned you have many test reports processed at the same time.
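
(For reference, a hedged example of how such a change is typically applied in Postgres; the value shown is purely illustrative, not a recommendation from this thread.)

```sql
-- Illustrative only: raise the WAL size ceiling so checkpoints happen less
-- often under heavy bulk inserts; the 4GB value is an assumption.
ALTER SYSTEM SET max_wal_size = '4GB';
SELECT pg_reload_conf();  -- max_wal_size can be picked up without a restart
```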

@ludeknovy
Owner

Have you enabled the option to delete samples after a report is generated?

@ludeknovy
Owner

I've removed a vacuum query after the samples purge; it was way too heavy an operation. By default, it's handled by autovacuum anyway. So if you had Delete sample data after processing enabled, the changes in the latest docker image should help.
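
(Roughly, the removed step was an explicit cleanup after the bulk delete, something along these lines; the exact query and table name are illustrative assumptions, and autovacuum performs the equivalent reclamation in the background.)

```sql
-- Illustrative sketch of an explicit post-purge cleanup (table name assumed);
-- autovacuum does the same space reclamation in the background by default.
VACUUM (ANALYZE) samples;
```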

@milanpanik

Thank you, Ludek. I'm a bit lost though: has it already been released? Because I've checked the releases and the related changelog and cannot find it.

@ludeknovy
Owner

@milan-panik It has not been released yet, but it's available in the latest image: novyl/jtl-reporter-be:latest
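
(To pick up the unreleased change, pulling the tag explicitly and recreating the BE container should be enough; a sketch assuming a docker-compose setup, with the service name being an assumption.)

```sh
docker pull novyl/jtl-reporter-be:latest
docker compose up -d backend   # service name assumed; recreates the BE container
```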

@ludeknovy
Owner

@kierangirvan I've pushed a possible fix, but I would appreciate it if you could test it and let me know.

@kierangirvan
Author

Thanks @ludeknovy

I'll get the latest build pushed out in the coming week and confirm if this has helped.


stale bot commented Mar 30, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@ludeknovy
Owner

I've found a memory leak and prepared a fix for it that will release the memory. But I need to change the whole solution so that the high memory usage would not be there in the first place. However, that won't be possible without changing the DB docker image, as it needs to include the TimescaleDB toolkit; it will take some time to prepare the image, as the HA version does not support ARM.
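
(If you want to check whether a given DB image already ships the toolkit, one way is to query the available extensions; the extension name timescaledb_toolkit is my assumption of what "timescale toolkit" refers to here.)

```sql
-- Check whether the TimescaleDB toolkit extension is available in the image
-- (extension name assumed from context).
SELECT name, default_version
FROM pg_available_extensions
WHERE name = 'timescaledb_toolkit';
```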

@stale stale bot closed this as completed Apr 23, 2024
@ludeknovy ludeknovy reopened this Apr 23, 2024
@stale stale bot removed the wontfix This will not be worked on label Apr 23, 2024
@ludeknovy
Owner

I've prepared new docker images for the project: https://hub.docker.com/r/novyl/jtl-reporter-db
I think I have the proper fix ready; it will be released in v5, and it will require some manual steps to upgrade from v4 (backup and restore the DB).
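
(A hedged sketch of the backup-and-restore step mentioned above; the container names, user, and database name are assumptions, not values from this thread or the upgrade guide.)

```sh
# Dump the v4 database (container, user, and DB names are illustrative assumptions)
docker exec jtl-reporter-db pg_dump -U postgres -Fc jtlreporter > jtlreporter.dump

# Restore into the new v5 database container
docker exec -i jtl-reporter-db-v5 pg_restore -U postgres -d jtlreporter --clean < jtlreporter.dump
```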


stale bot commented May 26, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label May 26, 2024