openvas stuck with 100% CPU in destroy_scan_globals #1271

Open
dhedberg opened this issue Dec 19, 2022 · 4 comments
Labels: bug (Something isn't working)

Comments

@dhedberg

We're on the latest greenbone/ospd-openvas:stable, dated 2022-12-12T14:39:05.303393489Z, and we're seeing an issue where openvas processes hang around after their tasks have been marked as done, each taking up 100% of a CPU core.

The container is running in Kubernetes with a deployment converted (with some small adaptations) from the community container docker-compose file.

Example:

root        1860 60.9  0.1 11574221292 29444 ?   tNL  Dec18 510:51 openvas --scan-start f59ff7de-7f35-4435-affe-bc7ed162b5bd

The task is already marked as finished. Attaching with gdb gives:

(gdb) bt
#0  0x00007f39f099ef1e in resize_set (reallocated_buckets_bitmap=0x7472682d5010, old_size=2660584736, hash_table=0x563b9e954760) at ../../../glib/ghash.c:834
#1  g_hash_table_resize (hash_table=0x563b9e954760) at ../../../glib/ghash.c:888
#2  0x00007f39f09a0d3e in g_hash_table_destroy (hash_table=0x563b9e954760) at ../../../glib/ghash.c:1488
#3  0x00007f39f0aa18c0 in destroy_scan_globals () from /usr/local/lib/libopenvas_misc.so.22
#4  0x0000563b9d621107 in openvas ()
#5  0x00007f39f0591d0a in __libc_start_main (main=0x563b9d61cd50 <main>, argc=3, argv=0x7ffe314163d8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe314163c8)
    at ../csu/libc-start.c:308
#6  0x0000563b9d61cd8a in _start ()

It seems to be stuck in:

=> 0x00007f39f099ef00 <+960>:	add    $0x1,%ecx
   0x00007f39f099ef03 <+963>:	add    %ecx,%edx
   0x00007f39f099ef05 <+965>:	and    %r12d,%edx
   0x00007f39f099ef08 <+968>:	mov    %edx,%eax
   0x00007f39f099ef0a <+970>:	mov    %edx,%r13d
   0x00007f39f099ef0d <+973>:	shr    $0x5,%eax
   0x00007f39f099ef10 <+976>:	and    $0x1f,%r13d
   0x00007f39f099ef14 <+980>:	lea    (%rdi,%rax,4),%r11
   0x00007f39f099ef18 <+984>:	mov    (%r11),%eax
   0x00007f39f099ef1b <+987>:	bt     %edx,%eax
   0x00007f39f099ef1e <+990>:	jb     0x7f39f099ef00 <g_hash_table_resize+960>

... which I think corresponds to

          while (get_status_bit (reallocated_buckets_bitmap, hash_val)) \
            {                                                           \
              step++;                                                   \
              hash_val += step;                                         \
              hash_val &= hash_table->mask;                             \
            }                                                           \

in glib/ghash.c

That's as far as I have dug; I'm guessing it might be some sort of memory corruption or concurrency issue?
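
A minimal sketch (in Python rather than glib's actual C, with simplified helpers) of how that loop can spin forever: if the reallocated-buckets bitmap is corrupted so that every status bit reads as set, the probe never finds a free bucket and g_hash_table_resize() never returns, pinning one core.

# Sketch only, not glib's code: mirrors the probe loop quoted above from
# glib/ghash.c, packing 32 status bits per word like get_status_bit().
# If every bit is set (simulated corruption), no free bucket is ever
# found and the real C loop would spin forever at 100% CPU.

def get_status_bit(bitmap, index):
    return (bitmap[index // 32] >> (index % 32)) & 1

def probe(bitmap, mask, hash_val, max_iterations):
    step = 0
    for _ in range(max_iterations):
        if not get_status_bit(bitmap, hash_val):
            return hash_val                  # free bucket found
        step += 1
        hash_val = (hash_val + step) & mask  # same update as the glib loop
    return None                              # the C loop has no such bail-out

size = 64                                    # hash table size (power of two)
bitmap = [0xFFFFFFFF] * (size // 32)         # simulated corruption: all bits set
print(probe(bitmap, size - 1, 5, 10 * size)) # -> None, no free bucket

With an intact bitmap the probe terminates as soon as it hits an unmarked bucket; if the bitmap or the table's size/mask fields have been corrupted, it can loop indefinitely, which would fit the memory-corruption guess above.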

dhedberg added the bug label on Dec 19, 2022
@ArnoStiefvater
Member

Hey @dhedberg ,

Thanks for the bug report and sorry for the late reply.

We tried to replicate the issue, but without success. We suspect this may be a bug in glib.

Does this issue still happen, or have you found a solution since then? Do you have a minimal reproducer so we can replicate the issue?

@dhedberg
Author

dhedberg commented May 9, 2023

Thanks for looking into this.

As far as I can tell from our logs and CPU usage monitoring, the issue still persists. We're working around it for now with a cron job that regularly kills any lingering openvas --scan-start processes whose tasks are marked as Done in the API. We also automatically update to the latest stable images at least once a week.
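
A rough sketch of that workaround, with a hypothetical get_done_scan_ids() helper standing in for however your setup asks the API which scans are Done:

#!/usr/bin/env python3
# Sketch of the cron-job workaround described above. get_done_scan_ids()
# is a placeholder (hypothetical); fill it in with your own API lookup
# for tasks marked as Done before using anything like this.
import os
import re
import signal
import subprocess

def get_done_scan_ids():
    # Placeholder: return the set of scan UUIDs the API reports as Done.
    return set()

def lingering_openvas_scans():
    # Yield (pid, scan_id) for every running `openvas --scan-start` process.
    out = subprocess.run(["ps", "-eo", "pid=,args="],
                         capture_output=True, text=True, check=True).stdout
    for line in out.splitlines():
        m = re.match(r"\s*(\d+)\s+openvas --scan-start (\S+)", line)
        if m:
            yield int(m.group(1)), m.group(2)

done = get_done_scan_ids()
for pid, scan_id in lingering_openvas_scans():
    if scan_id in done:
        print(f"killing pid {pid} (scan {scan_id} is already Done)")
        os.kill(pid, signal.SIGTERM)

Run from cron every few minutes; it only signals openvas processes whose scan the API already considers finished.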

I don't have a minimal reproducer at hand. We don't do that much customization; we create a custom port list from the IANA one with one port removed, a custom scan config from "Full and fast" with around 10 oids disabled, and have a few tasks which scan a bunch of IP ranges with ssh auth enabled. Usually there's only one task running at a time.

@vedrixNL

vedrixNL commented Sep 11, 2023

Hi all, I seem to have the same issue.

OSPD[7] 2023-09-11 11:04:31,470: INFO: (ospd.command.command) Scan 3b93c961-e15f-460e-859a-15fa12c26368 added to the queue in position 2.
OSPD[7] 2023-09-11 11:04:40,670: INFO: (ospd.ospd) Currently 1 queued scans.
OSPD[7] 2023-09-11 11:04:41,325: INFO: (ospd.ospd) Starting scan 3b93c961-e15f-460e-859a-15fa12c26368.
OSPD[7] 2023-09-11 11:47:09,521: INFO: (ospd.ospd) 3b93c961-e15f-460e-859a-15fa12c26368: Host scan finished.
OSPD[7] 2023-09-11 11:47:09,528: INFO: (ospd.ospd) 3b93c961-e15f-460e-859a-15fa12c26368: Scan finished.

root@ospd-openvas:/ospd-openvas# ps fax
PID TTY STAT TIME COMMAND
33270 pts/0 Ss 0:00 bash
33286 pts/0 R+ 0:00 _ ps fax
1 ? Ss 0:00 /sbin/docker-init -- /usr/bin/tini -- /usr/local/bin/entrypoint ospd-openvas -f --config /etc/gvm/ospd-openvas.conf --mqtt-broker
6 ? S 0:00 /usr/bin/tini -- /usr/local/bin/entrypoint ospd-openvas -f --config /etc/gvm/ospd-openvas.conf --mqtt-broker-address mqtt-broker
7 ? Sl 3:19 _ /usr/bin/python3 /usr/local/bin/ospd-openvas -f --config /etc/gvm/ospd-openvas.conf --mqtt-broker-address mqtt-broker --notus
20 ? Sl 0:09 _ /usr/bin/python3 /usr/local/bin/ospd-openvas -f --config /etc/gvm/ospd-openvas.conf --mqtt-broker-address mqtt-broker --n
1133 ? RNL 3:50 openvas --scan-start 3b93c961-e15f-460e-859a-15fa12c26368

The process seems to stay in memory and is never cleaned up. Restarting the Docker container helps, but somewhere ospd-openvas is not releasing it.

R = running or runnable (on run queue)
N = low-priority (nice to other users)
L = has pages locked into memory (for real-time and custom IO)

The L flag in particular stands out. Is this a bug in ospd-openvas?
I never experienced this issue when using Debian 11.5 with the same specs as below; it only happens on Debian 12.

Debian 12 (bookworm)
8 vCPU
16 GB RAM
Using the Docker solution from OpenVAS
Latest images pulled and installed

Extra information:

Tasks: 169 total, 4 running, 165 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.8 us, 0.2 sy, 70.7 ni, 23.3 id, 0.0 wa, 0.0 hi, 2.5 si, 2.6 st
MiB Mem : 15823.6 total, 249.8 free, 2185.4 used, 13855.3 buff/cache
MiB Swap: 8192.0 total, 8075.0 free, 117.0 used. 13638.3 avail Mem

PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND

1153249 docker 30 10 10.8t 16228 7556 R 100.0 0.1 9,57 openvas --scan-start 57278eb0-2a35-456b-8661-ec049570ea49
1153257 docker 30 10 10.8t 16240 7568 R 99.7 0.1 9,53 openvas --scan-start c0ce0ed3-6856-48d1-9321-86d0593afcd1
1153253 docker 30 10 10.8t 16220 7552 R 99.3 0.1 9,55 openvas --scan-start 3b420acc-947c-4fd3-a4f8-2e0b94465c9c

Scans keep running indefinitely. Running strace on the PID shows no output. The process is not a zombie; somewhere it is simply never killed.

@vedrixNL

vedrixNL commented Sep 12, 2023

I've tried the default docker-compose file on a Debian 12 server. The VM is running on Amazon, by the way. That makes no difference: I previously ran Debian 11.5 on Amazon (same architecture) with the same docker-compose file, and that worked just fine.
