Assert in HSH_Lookup with Varnish 7.2.1 #3879
Comments
No idea what's going on here, but the file storage seems to prove highly unreliable these days on aarch64 servers. Could you try replacing your file storage with a 100GB malloc storage instead? Why did you add another 12GB malloc storage? And how is
Unfortunately I "only" have 16GB of RAM on this machine. The idea was to split the data in two:
/tmp is not in tmpfs but just in the standard disk partition.
Well, I'm now running without the file storage, using -s malloc,12288m -s static=malloc,1024m. However it seems I still have random panic issues:
Extra backtrace info:
FYI I've just upgraded the jemalloc lib (from 3.6.0 to 5.3.0); I'm monitoring whether it has any impact on this issue (it should at least fix the "Error in munmap(): Invalid argument" error).
This removes the automatic check, made only on Linux systems, for jemalloc, now that the operating systems we care about ship a much more recent version than the 3.6 that is known to work really well with Varnish on x86_64 hardware. Of the platforms we provide packages for, only Ubuntu 18.04 (bionic) and RHEL 7 (via EPEL) ship jemalloc 3.6.0; the rest have already moved on to a 5.x series.

It appears that jemalloc 3.6 on aarch64 results in unstable Varnish workloads, while jemalloc 5 is known, based on feedback we have seen over the years, to generate a lot of waste with its default configuration under a highly threaded workload like Varnish's.

From now on, jemalloc is found via pkg-config, and only if it is explicitly requested at configure time. With explicit opt-in, the Linux-only check is gone and we rely solely on pkg-config to find the library. This can of course be overridden as usual at configure time:

./configure --with-jemalloc JEMALLOC_CFLAGS=... JEMALLOC_LIBS=...

Or, if you don't like long command lines, via environment variables.

Refs varnishcache#3867
Refs varnishcache#3879
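The opt-in discovery described in the commit message can be sketched as follows (assumptions: jemalloc ships a jemalloc.pc pkg-config file, and --without-jemalloc is the standard autoconf negation of the new --with-jemalloc flag; the configure command is echoed rather than executed so the sketch is safe to run anywhere):

```shell
# Sketch of the new opt-in jemalloc discovery via pkg-config.
if pkg-config --exists jemalloc 2>/dev/null; then
    # jemalloc.pc found: build the explicit opt-in configure invocation
    echo "./configure --with-jemalloc" \
         "JEMALLOC_CFLAGS=$(pkg-config --cflags jemalloc)" \
         "JEMALLOC_LIBS=$(pkg-config --libs jemalloc)"
else
    # no jemalloc.pc: fall back to the default (no jemalloc)
    echo "./configure --without-jemalloc"
fi
```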
I would also be interested to learn how it fares without jemalloc at all; see #3881.
Thx, I'll also try without jemalloc at all!
FYI Varnish has been running with the original -s malloc,12288m -s static=file,/tmp/varnish_storage.bin,100G and jemalloc 5.3.0 without any issue so far!
Could you please try another build with jemalloc 3.6?
And then run varnishd with this jemalloc configuration in your environment:
export MALLOC_CONF="abort:true,redzone:true,junk:true"
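For reference, a minimal sketch of that setup before launching the daemon (abort, redzone and junk are jemalloc 3.x option names, as requested above; the varnishd invocation is illustrative and left commented out):

```shell
# abort:true   -> abort() when jemalloc detects a heap error
# redzone:true -> add guard zones around allocations to catch overruns
# junk:true    -> poison allocated/freed memory so corruption surfaces early
export MALLOC_CONF="abort:true,redzone:true,junk:true"
echo "MALLOC_CONF=$MALLOC_CONF"
# varnishd -F -f /etc/varnish/default.vcl -s malloc,12288m   # illustrative
```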
Hi @dridi, just to be sure: release 7.2.1 should be fine for this test, no need to use develop?
Hi @dridi FYI it's not possible to start Varnish with jemalloc 3.6 and the debug config on ARM because of the munmap issue:

déc. 25 21:42:30 ip-172-29-97-79.eu-west-1.compute.internal varnishd[22282]: Version: varnish-7.2.1 revision NOGIT

With jemalloc 5.3.0 it starts properly (but there are other issues after a while with instability / high CPU usage).
Hi @dridi I'm now evaluating the 7.2.1 version compiled without jemalloc (+ varnish-modules) on an ARM AWS Graviton2 processor (compiled with gcc 10 and the -mcpu=neoverse-n1 flag).
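One quick way to double-check that a given varnishd binary really was built without jemalloc is to inspect its dynamic dependencies; a hypothetical helper (the /usr/sbin/varnishd path is an assumption, adjust to your install):

```shell
# Report which allocator a binary is dynamically linked against.
linked_malloc() {
    if ldd "$1" 2>/dev/null | grep -q 'libjemalloc'; then
        echo "jemalloc"
    else
        echo "system malloc (or static link)"
    fi
}
linked_malloc /usr/sbin/varnishd
```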
Hi @dridi FYI 7.2.1 without jemalloc, using the patch from #3881, seems to work pretty well on AWS ARM. I see a few spikes in CPU usage during aggressive Google crawls that I didn't have before (on x86_64 instances), but response times are good and stable, so it's not really an issue (here it's an x2gd.large with 2 cores and 32GB of RAM).
I confirm after more than 1 week of use everything is still really stable on ARM without jemalloc. |
Hi,
I'm using Varnish with a malloc storage and a file storage for static files.
Settings are the following:
ExecStart=/usr/sbin/varnishd -a :80 -p workspace_session=1024 -p workspace_thread=4096 -p workspace_client=131072 -p workspace_backend=131072 -p listen_depth=16383 -p http_resp_hdr_len=65536 -p http_resp_size=98304 -p workspace_backend=131072 -p thread_pool_min=200 -p thread_pool_max=5000 -p feature=+http2 -P %t/%N/varnishd.pid -T localhost:6082 -f /etc/varnish/default.vcl -s malloc,12288m -s static=file,/tmp/varnish_storage.bin,100G
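As a back-of-envelope sanity check of these settings (assuming the 16GB instance discussed elsewhere in this thread), the 12288 MB malloc storage alone leaves only about 4 GB of headroom for Transient storage, thread stacks and the OS, while the 100 GB file storage is mmap'ed and paged in on demand rather than held in RAM:

```shell
# Rough RAM budget for the -s malloc,12288m storage on a 16GB machine.
malloc_mb=12288
ram_mb=$((16 * 1024))
echo "headroom: $((ram_mb - malloc_mb)) MB"
# prints: headroom: 4096 MB
```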
Expected Behavior
No crash
Current Behavior
I noticed varnishd complaining in the systemd journal.
Active: active (running) since lun. 2022-12-05 20:39:57 CET; 1 day 18h ago
déc. 06 21:25:33 varnishd[2813]: Child (8130) Started
déc. 06 21:25:33 varnishd[2813]: Child (8130) said Child starts
déc. 06 21:25:33 varnishd[2813]: Child (8130) said : Error in munmap(): Invalid argument
déc. 06 21:25:33 varnishd[2813]: Child (8130) said SMF.static mmap'ed 107374182400 bytes of 107374182400
déc. 07 07:59:26 varnishd[2813]: Child (8130) died signal=6
déc. 07 07:59:26 varnishd[2813]: Child (8130) Panic at: Wed, 07 Dec 2022 06:59:26 GMT
Assert error in HSH_Lookup(), cache/cache_hash.c line 426:
Condition((oc)->magic == 0x4d301302) not true....
So the first warning appeared about a day after the server was started; the assert followed around ten hours later.
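For what it's worth, the journal timestamps above pin down the child's lifetime fairly precisely; a small sketch using GNU date (both timestamps assumed to be in the same timezone):

```shell
# Time between "Child (8130) Started" and "Child (8130) died signal=6",
# taken from the journal excerpt above.
start=$(date -d '2022-12-06 21:25:33' +%s)
died=$(date -d '2022-12-07 07:59:26' +%s)
echo "child lifetime: $(( (died - start) / 60 )) minutes"
# prints: child lifetime: 633 minutes
```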
This is happening on an ARM architecture (AWS t4g servers) running Amazon Linux 2, kernel 5.15.75-48.135.
I'm using 4 VMODs: directors, header, std & xkey.
I'm fetching the binaries from https://packagecloud.io/install/repositories/varnishcache/varnish72/script.rpm.sh and https://github.com/varnish/varnish-modules/releases/download/0.21.0/varnish-modules-0.21.0.tar.gz
My vcl_hash is not that complicated:
Any idea what could be wrong?
Best regards,
Jocelyn Fournier