v3.0.116 installed from ArchLinux|ARM repo, running kernel 4.14.180 on ODROID-HC2 (Samsung Exynos5422 ARM Cortex-A15/Cortex-A7)
1 master and 1 metadatalogger as above, and 3-4 chunk nodes running the same mfs and OS on Espressobins (Marvell Armada 3700LP (88F37200) ARM/Cortex A53) each with 3-4 2TB disks
All fs objects: 96233
Total space: 18 TiB
Free space: 7.0 TiB
RAM used: 281 MiB
I know, it's a weird installation. It's my home test bed, an attempt at an ultra low power (and low cost) moosefs installation. It's been running largely without problems for almost 5 years now.
Describe the problem you observed:
After upgrading from 3.0.112 to 3.0.116, mfsmaster crashes with a core dump during normal operation. I can't find a specific trigger: it takes longer to crash if left idle, but only minutes if I read or write some files. The longest I've kept it running was about 6 hours, left idle with 0 client mounts last night, before it dumped core.
The cluster was previously running 3.0.112 without error, and downgrading back to that version has, so far, fixed my problem. Something introduced in the code between these versions appears not to play well with ARM's strict restrictions on unaligned data access.
Aug 20 01:18:29 imsal kernel: Alignment trap: not handling instruction edc40b0a at [<004f495c>]
Aug 20 01:18:29 imsal kernel: Unhandled fault: alignment exception (0x811) at 0xae16a81b
Aug 20 01:18:29 imsal kernel: pgd = e06d4000
Aug 20 01:18:29 imsal kernel: [ae16a81b] *pgd=b5570835
Aug 20 01:18:29 imsal kernel: audit: type=1701 audit(1660976309.486:67): auid=4294967295 uid=979 gid=979 ses=4294967295 pid=1165 comm="mfsmaster" exe="/usr/bin/mfsmaster" sig=7 res=1
Aug 20 01:18:29 imsal kernel: audit: type=1130 audit(1660976309.526:68): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@4-6109-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=succe>
Aug 20 01:18:29 imsal audit[1165]: ANOM_ABEND auid=4294967295 uid=979 gid=979 ses=4294967295 pid=1165 comm="mfsmaster" exe="/usr/bin/mfsmaster" sig=7 res=1
Aug 20 01:18:29 imsal audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@4-6109-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Aug 20 01:18:29 imsal systemd[1]: Started Process Core Dump (PID 6109/UID 0).
-- Subject: A start job for unit systemd-coredump@4-6109-0.service has finished successfully
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- A start job for unit systemd-coredump@4-6109-0.service has finished successfully.
--
-- The job identifier is 875.
Aug 20 01:19:04 imsal systemd-coredump[6113]: [LNK] Process 1165 (mfsmaster) of user 979 dumped core.
Module linux-vdso.so.1 with build-id 88dc578fb02d73083639658edaf67047678a0f1d
Module libpthread.so.0 with build-id 0b0422739722054f65f9f78c4ac441ebc21cd01e
Module libdl.so.2 with build-id f585c7c9f6babaf9ccb90a5feeb3f6902fd810c8
Module libffi.so.8 with build-id 218e198d7c786e474efd2d9b745615880c5120df
Module libp11-kit.so.0 with build-id 8166838a069281c28c7f9434827c3f73d45d5451
Module libcrypto.so.1.1 with build-id 5261d99d530924d3ff0ffdc8b67d1caece137c63
Module libcrypt.so.2 with build-id 421adaa3adb6e116dabffb7b515b0b38285d1ec8
Module libm.so.6 with build-id 03e814c990762eeb9da12de241a4f42322248e45
Module libnss_systemd.so.2 with build-id 551575f58085900ad16f5f6fc91c3e6e32358f02
Module ld-linux-armhf.so.3 with build-id 072bb4cd73afd5d62040c7f3f482dbe17719bfea
Module libc.so.6 with build-id ad84e29cae6a8880108cc3a95754d84ca22799e8
Module libgcc_s.so.1 with build-id 5dfba9be74e9275dc2b88197d5e4a7eb31caa30b
Module libz.so.1 with build-id f5e8b23636191e87948dc2c6f3c5fc2f243d9b08
Module mfsmaster with build-id e065a335744ea27b193ef18932c039c11c17aef9
Stack trace of thread 1165:
#0 0x00000000004f4960 n/a (mfsmaster + 0x26960)
ELF object binary architecture: ARM
-- Subject: Process 1165 (mfsmaster) dumped core
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- Documentation: man:core(5)
--
-- Process 1165 (mfsmaster) crashed and dumped core.
--
-- This usually indicates a programming error in the crashing program and
-- should be reported to its vendor as a bug.
Aug 20 01:19:05 imsal systemd[1]: systemd-coredump@4-6109-0.service: Deactivated successfully.
-- Subject: Unit succeeded
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- The unit systemd-coredump@4-6109-0.service has successfully entered the 'dead' state.
Aug 20 01:19:05 imsal kernel: audit: type=1131 audit(1660976345.738:69): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@4-6109-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=succe>
Aug 20 01:19:05 imsal kernel: dwmmc_exynos 12220000.mmc: Unexpected interrupt latency
Aug 20 01:19:05 imsal kernel: audit: type=1131 audit(1660976345.770:70): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=moosefs-master comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Aug 20 01:19:05 imsal audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@4-6109-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Aug 20 01:19:05 imsal audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=moosefs-master comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Aug 20 01:19:05 imsal mfsmaster[1166]: background data writer - HUP/ERR detected on data pipe: EACCES (Permission denied)
Aug 20 01:19:05 imsal systemd[1]: moosefs-master.service: Main process exited, code=dumped, status=7/BUS
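For illustration only: the fault address in the log above (0xae16a81b) is not a multiple of 4, and the kernel reports that it is "not handling" the faulting instruction, so the access is not fixed up and the process gets SIGBUS (sig=7). I have not identified the offending code in mfsmaster. The following is a minimal C sketch of the general pattern that can trap on this kind of hardware, plus the usual memcpy-based workaround. The helper names (`is_aligned`, `put_double`, `get_double`) are hypothetical and are not taken from the MooseFS source.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if p is aligned to `align` bytes (align must be a power of two). */
static int is_aligned(const void *p, size_t align) {
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Store a double at an arbitrary byte address.
 * memcpy makes no alignment assumption about dst, so the compiler must
 * emit an access that is safe for unaligned addresses. */
static void put_double(uint8_t *dst, double v) {
    assert(dst != NULL);
    memcpy(dst, &v, sizeof v);
}

/* Load a double from an arbitrary byte address, again via memcpy. */
static double get_double(const uint8_t *src) {
    double v;
    assert(src != NULL);
    memcpy(&v, src, sizeof v);
    return v;
}

/* The risky pattern, shown for contrast only:
 *
 *     *(double *)dst = v;
 *
 * If dst is not suitably aligned this is undefined behaviour in C, and on
 * 32-bit ARM it may compile to a single store that raises an alignment
 * exception instead of being split into smaller accesses. */
```

Calling `is_aligned((void *)(uintptr_t)0xae16a81b, 4)` returns 0, which matches the address reported in the alignment exception. Writing through `put_double` at an odd offset into a byte buffer and reading back with `get_double` round-trips the value without depending on alignment.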
We build those packages on a Raspberry Pi and have never had a problem. The first step would be to compile MooseFS from source (with symbols) on the exact machine it will run on and see whether the problem persists. If it does, we can investigate what exactly the issue is. If it does not, then those packages are simply not compatible with ODROIDs.
Are you able to compile MooseFS from source?
BTW, the difference between .112 and .116 might simply be that .112 was compiled on older OS and kernel versions.