
Master service crashes after navigating to CH or EX section in cgi #888

Open
RaodenTheRed opened this issue Aug 20, 2020 · 2 comments

@RaodenTheRed

The lfsmaster service crashes after navigating to the CH or EX sections of the CGI server deployed on the master server. These links result in the following error:

Traceback (most recent call last):
  File "/usr/share/mfscgi/mfs.cgi", line 981, in <module>
    availability, replication, deletion = cltoma_chunks_health(0)
  File "/usr/share/mfscgi/mfs.cgi", line 194, in cltoma_chunks_health
    goals = cltoma_list_goals()
  File "/usr/share/mfscgi/mfs.cgi", line 187, in cltoma_list_goals
    response = send_and_receive(masterhost, masterport, request, LIZ_MATOCL_LIST_GOALS, 0)
  File "/usr/share/mfscgi/mfs.cgi", line 305, in send_and_receive
    header = myrecv(s, 8)
  File "/usr/share/mfscgi/mfs.cgi", line 285, in myrecv
    raise RuntimeError, "socket connection broken"
RuntimeError: socket connection broken
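On the CGI side, this error means the master closed the connection before the 8-byte reply header arrived, which is consistent with the master process aborting (see the master logs). A minimal Python 3 sketch of a "read exactly n bytes" helper like the `myrecv` named in the traceback shows why. This is not the actual mfs.cgi code: `recv_exact` and `read_header` are hypothetical names, and the header layout (big-endian 32-bit packet type followed by a 32-bit length) is an assumption.

```python
import socket
import struct
from typing import Tuple


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes (hypothetical stand-in for mfs.cgi's myrecv).

    recv() returns b"" once the peer has closed its end, so if the master
    dies mid-request the loop can never collect n bytes and we raise.
    """
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:  # peer closed the connection (e.g. the master aborted)
            raise RuntimeError("socket connection broken")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_header(sock: socket.socket) -> Tuple[int, int]:
    """Read an 8-byte header, ASSUMED to be (type, length) as big-endian uint32."""
    header = recv_exact(sock, 8)
    packet_type, length = struct.unpack(">II", header)
    return packet_type, length
```

With a local `socket.socketpair()`, a complete header parses normally, while closing the sending end after only a partial header makes `recv_exact` raise the same `RuntimeError` seen above.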

The master and chunk servers are running LizardFS 3.12.0 on ARM-based systems (Odroid HC2s running Ubuntu 18.04 as the chunk servers, and an Odroid N2+ running Ubuntu 20.04 as the master and CGI server).

root@LFS-OdroidMaster:/etc/lizardfs# dpkg -l | grep lizard
ii  lizardfs-cgi       3.12.0+dfsg-4ubuntu1  all    LizardFS - CGI monitor
ii  lizardfs-cgiserv   3.12.0+dfsg-4ubuntu1  arm64  simple CGI-capable HTTP server to run LizardFS CGI monitor
ii  lizardfs-common    3.12.0+dfsg-4ubuntu1  all    LizardFS - common files
ii  lizardfs-master    3.12.0+dfsg-4ubuntu1  arm64  LizardFS - master server

The master logs show the following after the crash:

root@LFS-OdroidMaster:/var/lib/lizardfs# service lizardfs-master status
● lizardfs-master.service - LizardFS master server daemon
Loaded: loaded (/lib/systemd/system/lizardfs-master.service; enabled; vendor preset: enabled)
Active: failed (Result: signal) since Wed 2020-08-19 19:06:40 CDT; 4s ago
Docs: man:lfsmaster
Process: 17300 ExecStart=/usr/sbin/lfsmaster -d start (code=killed, signal=ABRT)
Main PID: 17300 (code=killed, signal=ABRT)
Status: "mfsmaster daemon initialized properly."

Aug 19 19:06:37 LFS-OdroidMaster lfsmaster[17300]: master <-> chunkservers module: listen on *:9420
Aug 19 19:06:37 LFS-OdroidMaster lfsmaster[17300]: master <-> tapeservers module: listen on (*:9424)
Aug 19 19:06:37 LFS-OdroidMaster lfsmaster[17300]: main master server module: listen on *:9421
Aug 19 19:06:37 LFS-OdroidMaster lfsmaster[17300]: open files limit: 32768
Aug 19 19:06:37 LFS-OdroidMaster lfsmaster[17300]: mfsmaster daemon initialized properly
Aug 19 19:06:37 LFS-OdroidMaster systemd[1]: Started LizardFS master server daemon.
Aug 19 19:06:39 LFS-OdroidMaster lfsmaster[17300]: mfsmaster[17300]: failed assertion 'std::distance(buffer.data(), destination) == (int32_t)>
Aug 19 19:06:39 LFS-OdroidMaster mfsmaster[17300]: failed assertion 'std::distance(buffer.data(), destination) == (int32_t)buffer.size()'
Aug 19 19:06:40 LFS-OdroidMaster systemd[1]: lizardfs-master.service: Main process exited, code=killed, status=6/ABRT
Aug 19 19:06:40 LFS-OdroidMaster systemd[1]: lizardfs-master.service: Failed with result 'signal'

journalctl -xe output:

Aug 19 19:36:46 LFS-OdroidMaster lfsmaster[17572]: terminate called after throwing an instance of 'std::bad_alloc'
Aug 19 19:36:46 LFS-OdroidMaster lfsmaster[17572]: what(): std::bad_alloc
Aug 19 19:36:46 LFS-OdroidMaster systemd[1]: lizardfs-master.service: Main process exited, code=killed, status=6/ABRT
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support

-- An ExecStart= process belonging to unit lizardfs-master.service has exited.

-- The process' exit code is 'killed' and its exit status is 6.
Aug 19 19:36:46 LFS-OdroidMaster systemd[1]: lizardfs-master.service: Failed with result 'signal'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support

-- The unit lizardfs-master.service has entered the 'failed' state with result 'signal'.

uname on the master node:
root@LFS-OdroidMaster:/etc/lizardfs# uname -a
Linux LFS-OdroidMaster 4.9.230-93 #1 SMP PREEMPT Thu Jul 23 18:32:40 UTC 2020 aarch64 aarch64 aarch64 GNU/Linux

RaodenTheRed changed the title from "Master service crashes after navigating to CH or section in cgi" to "Master service crashes after navigating to CH or EX section in cgi" on Aug 20, 2020
@Zorlin

Zorlin commented Aug 20, 2020

Off-topic: the HC2 is an awesome unit for MooseFS/LizardFS. I've got a cluster of four and love them.

@fogti

fogti commented Aug 22, 2020

The failed assertion seems to come from this line:

sassert(std::distance(buffer.data(), destination) == (int32_t)buffer.size());
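This assertion checks that the write cursor (`destination`) ends exactly at the end of the preallocated `buffer`, i.e. that the number of bytes written matches the size computed up front. A minimal Python sketch of that "compute size, allocate, then write" pattern illustrates the invariant (the original is C++). The goal record layout and the names `serialized_size` and `serialize` are hypothetical and do not reflect LizardFS's real wire format.

```python
import struct
from typing import List, Tuple

# HYPOTHETICAL layout for illustration only: a uint32 count, then per goal
# a uint8 id, a uint32 name length, and the name bytes (big-endian, no padding).


def serialized_size(goals: List[Tuple[int, bytes]]) -> int:
    # 4 bytes for the count, then 1 (id) + 4 (name length) + len(name) per goal
    return 4 + sum(1 + 4 + len(name) for _, name in goals)


def serialize(goals: List[Tuple[int, bytes]]) -> bytes:
    buffer = bytearray(serialized_size(goals))
    offset = 0  # plays the role of the C++ `destination` pointer
    struct.pack_into(">I", buffer, offset, len(goals))
    offset += 4
    for goal_id, name in goals:
        struct.pack_into(">BI", buffer, offset, goal_id, len(name))
        offset += 5
        buffer[offset:offset + len(name)] = name
        offset += len(name)
    # Same invariant as the sassert: if the size computation and the write
    # loop disagree, the cursor does not land exactly on the buffer's end.
    assert offset == len(buffer), "serialized size mismatch"
    return bytes(buffer)
```

In the C++ code a failed `sassert` terminates the process, which matches the `signal=ABRT` reported by systemd above; a disagreement between the size computation and the serialization code on this platform would be one way to trip it, though the issue does not confirm the root cause.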
