
Memory leak on scan #1488

Open
y0d4a opened this issue Sep 12, 2023 · 5 comments

Labels
bug Something isn't working

Comments

y0d4a commented Sep 12, 2023

When OpenVAS starts scanning

https://developer.hashicorp.com/boundary/docs
https://developer.hashicorp.com/vault

we get a memory leak from this process:

gb_log4j_CVE-2021-44228_http_web_dirs_active.nasl

This seems to be some kind of bug.
Please try to replicate it if you can; all of the software involved is free.

y0d4a added the bug (Something isn't working) label Sep 12, 2023
ArnoStiefvater (Member) commented

Hey @y0d4a

Thanks for creating the issue.

Just to make sure: you are scanning the products "Boundary" and "Vault" from HashiCorp?

How do you know that the posted nasl script is the culprit? Do you have additional logs or reasons to believe so?

y0d4a (Author) commented Sep 12, 2023

Hi, yes, those two products.
I saw it while monitoring the process: when it reaches the IPs with those services it scans for a very long time, and in the end it uses up all the memory and interrupts the scan (sometimes it even crashes the openvas service, when run from Docker).
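
(For reference, a minimal way to watch this from the host is to poll the RSS of the scanner processes while the scan runs; this assumes the process is simply named openvas:)

# watch the resident memory of all running openvas scanner processes
$ watch -n 10 "ps -C openvas -o pid,rss,etime,args --no-headers"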

cfi-gb (Member) commented Nov 29, 2023

Ref to two relevant community forum postings (for tracking purposes)

and a more recent issue posted over at greenbone/ospd-openvas#974

wdoekes commented Mar 20, 2024

In greenbone/ospd-openvas#974 I've commented with my findings. Summarizing:

Example of "leak"

I dropped the db that seemed to be transient. Redis INFO before (-) and after (+):

-server_time_usec:1709405329080351
-uptime_in_seconds:588
+server_time_usec:1709406156005681
+uptime_in_seconds:1415
...
 # Memory
-used_memory:3344099096
-used_memory_human:3.11G
-used_memory_rss:3001507840
-used_memory_rss_human:2.80G
+used_memory:171441592
+used_memory_human:163.50M
+used_memory_rss:194514944
+used_memory_rss_human:185.50M
...

 # Keyspace
 db0:keys=1,expires=0,avg_ttl=0
 db1:keys=177456,expires=0,avg_ttl=0
-db6:keys=3468,expires=0,avg_ttl=0

Dropping those 3400 entries in db6 freed a whopping 2GB.

The keys I freed look like:

$ sudo docker exec -it greenbone-community-container_redis-server_1 redis-cli -s /run/redis/redis.sock -n 6
redis /run/redis/redis.sock[6]> keys *
   1) "Cache/node.example.com/8200/excluding_404_body/URL_/ui/vcav-bootstrap/rest/vcav-providers/config.neon"
   2) "Cache/node.example.com/8200/excluding_404_body/URL_/ui/vcav-bootstrap/rest/WEB-INF/local.properties"
   3) "Cache/node.example.com/8200/excluding_404_body/URL_/ui/vropspluginui/rest/services/.env.example"

And it turned out that the contents were those of a HashiCorp Vault instance: any URL under /ui/ would return a 200 and about 700kB of HTML:

$ curl --fail -k https://vault.example.com:8200/ui/whatever -o/dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  786k  100  786k    0     0  4990k      0 --:--:-- --:--:-- --:--:-- 5009k

With 28000+ URLs scanned, this would quickly add up (about 350kB stored in redis per URL: 10GB).
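
(The per-URL figure can be sanity-checked per key with MEMORY USAGE, available since Redis 4.0; the key is one of the entries listed above and the value shown is illustrative, roughly the ~350kB mentioned:)

redis /run/redis/redis.sock[6]> MEMORY USAGE "Cache/node.example.com/8200/excluding_404_body/URL_/ui/vropspluginui/rest/services/.env.example"
(integer) 358400

28000 URLs at roughly 350kB each is about 9.8GB, which is where the ~10GB estimate comes from.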

Workarounds to keep redis from getting killed

Changes for community docker-compose.yml:

  redis-server:
    image: greenbone/redis-server
    command:
      # https://forum.greenbone.net/t/redis-oom-killed-on-one-host-scan/15722/5
      - /bin/sh
      - -c
      - 'rm -f /run/redis/redis.sock && cat /etc/redis/redis.conf >/run/redis/redis.conf && printf "%s\n" "maxmemory 12884901888" "maxmemory-policy allkeys-lru" "maxclients 150" "tcp-keepalive 15" >>/run/redis/redis.conf && redis-server /run/redis/redis.conf'
    logging:
      driver: journald
    restart: on-failure
    volumes:
      - redis_socket_vol:/run/redis/

The allkeys-lru above is wrong: you'll end up losing the important stuff in Keyspaces 0 and 1. volatile-ttl would be better, but it effectively does nothing, as none of the stored items has a non-INF TTL. So for now, I went with noeviction.

The settings:

  • maxmemory 12884901888: 12GB, adjust as needed
  • maxmemory-policy noeviction
  • maxclients 150: a single run with 6 simultaneous hosts and 3 simultaneous scans per host already opens about 40 redis connections; tweak as appropriate
  • tcp-keepalive 15: not sure about this one, copied from the forum post
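
A quick way to confirm the settings actually took effect inside the container (same socket and container name as above):

$ sudo docker exec greenbone-community-container_redis-server_1 \
    redis-cli -s /run/redis/redis.sock CONFIG GET maxmemory
1) "maxmemory"
2) "12884901888"
$ sudo docker exec greenbone-community-container_redis-server_1 \
    redis-cli -s /run/redis/redis.sock CONFIG GET maxmemory-policy
1) "maxmemory-policy"
2) "noeviction"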

Workaround effects

Redis itself now won't die, but instead its clients report failures:

  • openvas dying with segfaults due to NULL pointer accesses:
    kernel: openvas[166925]: segfault at 0 ip 000055c611a13fe6 sp 00007ffe63419aa0 error 4 in openvas[55c611a13000+9000]
  • ospd.py dying because of redis refusing to do an LRANGE:
    redis.exceptions.OutOfMemoryError: Command # 1 (LRANGE internal/results 0 -1) of pipeline caused error: command not allowed when used memory > 'maxmemory'.

This also aborts the scan.

Workaround to reduce memory usage of redis

As reported elsewhere, the immediate culprit was "caching of web pages during CGI scanning".

An alternative fix that appears to work is this:

--- greenbone-community-container_vt_data_vol/_data/http_keepalive.inc.orig	2024-03-18 15:46:31.480951508 +0100
+++ greenbone-community-container_vt_data_vol/_data/http_keepalive.inc	2024-03-18 15:52:51.764904305 +0100
@@ -726,7 +726,8 @@ function http_get_cache( port, item, hos
     # Internal Server Errors (5xx)
     # Too Many Requests (429)
     # Request Timeout (408)
-    if( res !~ "^HTTP/1\.[01] (5(0[0-9]|1[01])|4(08|29))" )
+    # Size of response must be less than 1.5*64k
+    if( res !~ "^HTTP/1\.[01] (5(0[0-9]|1[01])|4(08|29))" && strlen( res ) < 98304 )
       replace_kb_item( name:"Cache/" + host + "/" + port + "/" + key + "/URL_" + item, value:res );
 
   }

This reduces the effectiveness of the caching, but these large web responses are no longer cached, and memory stays well below 2GB even when running multiple scans simultaneously.
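
One way to check during a scan that the patch is holding up is to watch the redis counters over the same socket:

$ sudo docker exec greenbone-community-container_redis-server_1 \
    redis-cli -s /run/redis/redis.sock INFO memory | grep used_memory_human
$ sudo docker exec greenbone-community-container_redis-server_1 \
    redis-cli -s /run/redis/redis.sock INFO keyspace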

Better workarounds

Limiting caching to pages shorter than 96kB is a rather crude approach. It would be better if we could make the limit more dynamic, for example by:

  • stopping caching of a run as soon as there is memory pressure;
  • flagging certain objects as less important (starting with a ttl for everything in Keyspaces above 1).

Right now I don't know of ways to get the current memory usage of a Keyspace from redis, but the library storing the values could record it itself in a separate redis key using INCRBY and maybe stop adding more to the cache once it hits a limit.
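
A minimal sketch of that bookkeeping, expressed as the redis commands the caching code would effectively issue (the counter key name and the 786432-byte increment, roughly one of the ~786k Vault responses above, are made up for illustration):

redis /run/redis/redis.sock[6]> INCRBY "Cache/node.example.com/8200/cached_bytes" 786432
(integer) 786432
redis /run/redis/redis.sock[6]> GET "Cache/node.example.com/8200/cached_bytes"
"786432"

Once such a counter crosses a per-host budget, http_get_cache() could simply stop calling replace_kb_item() for new pages, which degrades to the 96kB patch above but adapts to how much has actually been cached.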

Links into the source / places to look when considering a fix:

cfi-gb (Member) commented Mar 20, 2024

Usage of the following might also be an option (AFAICT this needs adjustments to the redis-server.conf):
