Notes on kickstarting the RISC-V software layer #552

Open
bedroge opened this issue Apr 23, 2024 · 21 comments

@bedroge
Collaborator

bedroge commented Apr 23, 2024

With a compatibility layer (EESSI/compatibility-layer#204) and software build container (EESSI/filesystem-layer#132 and https://github.com/orgs/EESSI/packages/container/package/build-node) in place, we are ready to start working on a RISC-V software layer. In this issue we can keep notes on the work being done and track the issues that we encounter.

@bedroge
Collaborator Author

bedroge commented Apr 23, 2024

The repository that we use is /cvmfs/riscv.eessi.io, added in EESSI/filesystem-layer#181. The structure is the same as in /cvmfs/software.eessi.io.

For now we focus on generic builds first (support for these was added to EasyBuild in easybuilders/easybuild-framework#4489). Flags for optimized builds are still lacking, see https://github.com/easybuilders/easybuild-framework/blob/develop/easybuild/toolchains/compiler/gcc.py#L82.

@bedroge
Collaborator Author

bedroge commented Apr 23, 2024

In order to get EasyBuild installed, I've used the following:

singularity build --sandbox /nvme/build-container docker://ghcr.io/eessi/build-node:debian-sid
EESSI_CVMFS_REPO_OVERRIDE=/cvmfs/riscv.eessi.io ./eessi_container.sh -c /nvme/build-container --access rw
/cvmfs/riscv.eessi.io/versions/20240402/compat/linux/riscv64/startprefix
git clone https://github.com/EESSI/software-layer
cd software-layer
wget https://github.com/EESSI/software-layer/pull/537.diff
export EESSI_CVMFS_REPO_OVERRIDE=/cvmfs/riscv.eessi.io EESSI_VERSION_OVERRIDE=20240402 EESSI_SOFTWARE_SUBDIR_OVERRIDE=riscv64/generic
./EESSI-install-software.sh

We explicitly override some variables to reflect the new repo, version, and CPU target. The rest roughly mimics what the bot would do: take the diff file from #537 and run the install script. This worked perfectly fine. 🎉
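
For reference, the three overrides together determine which directories are used. Below is a minimal sketch of how the paths that show up in this issue are composed, assuming the layout <repo>/versions/<version>/{compat,software}/linux/<arch>. The helper name eessi_prefixes is hypothetical and not part of the EESSI scripts.

```python
from pathlib import PurePosixPath


def eessi_prefixes(repo, version, software_subdir):
    """Return (compat_prefix, software_prefix) for a repo/version/CPU target.

    Hypothetical helper; it only mirrors the directory layout seen in this issue.
    """
    arch = software_subdir.split("/")[0]  # e.g. 'riscv64' from 'riscv64/generic'
    base = PurePosixPath(repo) / "versions" / version
    compat = base / "compat" / "linux" / arch
    software = base / "software" / "linux" / software_subdir
    return str(compat), str(software)


compat, software = eessi_prefixes("/cvmfs/riscv.eessi.io", "20240402", "riscv64/generic")
print(compat)
print(software)
```

The first path is the directory that contains startprefix; the second is where the generic RISC-V installations end up.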

@bedroge
Collaborator Author

bedroge commented Apr 23, 2024

Now that EasyBuild is available in the repo, one can easily start building additional software interactively:

# Launch the container
EESSI_CVMFS_REPO_OVERRIDE=/cvmfs/riscv.eessi.io ./eessi_container.sh -c docker://ghcr.io/eessi/build-node:debian-sid --access rw

# Start a prefix shell in the container:
/cvmfs/riscv.eessi.io/versions/20240402/compat/linux/riscv64/startprefix

# EESSI init
export EESSI_CVMFS_REPO_OVERRIDE=/cvmfs/riscv.eessi.io EESSI_VERSION_OVERRIDE=20240402 EESSI_SOFTWARE_SUBDIR_OVERRIDE=riscv64/generic
source /cvmfs/riscv.eessi.io/versions/20240402/init/bash

# Set up EB and start a build
git clone https://github.com/EESSI/software-layer
cd software-layer
export WORKDIR=/tmp/eb
source configure_easybuild
module load EasyBuild
eb --optarch=GENERIC -r foss-2023b.eb

@bedroge
Collaborator Author

bedroge commented Apr 23, 2024

As a first attempt, I tried building GCC 13.2.0, but that failed due to the hook that sets up a wrapper for ld. The hook uses config.guess to determine the system type, which returns riscv64-unknown-linux-gnu. It then looks for riscv64-unknown-linux-gnu-ld* in $EPREFIX/usr/bin, but Gentoo was built with CHOST = riscv64-pc-linux-gnu, so the binaries use that in their filenames instead.

I've opened a PR at the Gentoo repo to change the CHOST: gentoo/gentoo#36353.

Meanwhile I worked around the issue by hardcoding it in the hook to:

cmd_prefix = 'riscv64-pc-linux-gnu-'

Furthermore, ld.gold has to be removed from the loop on the next line, for cmd in ('ld', 'ld.gold', 'ld.bfd'):, since we don't have ld.gold in our RISC-V compat layer.

With these small changes I could successfully build GCC 13.2.0 (not ingested yet).
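
The mismatch can be reproduced without EasyBuild. This is a minimal sketch, assuming the hook simply looks for files named <cmd_prefix><linker> in a bin directory. The function find_linkers and the temporary directory are hypothetical stand-ins for the real hook and $EPREFIX/usr/bin.

```python
import os
import tempfile


def find_linkers(bin_dir, cmd_prefix, linkers=("ld", "ld.gold", "ld.bfd")):
    """Return the linkers for which <bin_dir>/<cmd_prefix><linker> exists."""
    return [cmd for cmd in linkers if os.path.exists(os.path.join(bin_dir, cmd_prefix + cmd))]


with tempfile.TemporaryDirectory() as bin_dir:
    # Mimic a compat layer built with CHOST=riscv64-pc-linux-gnu and without ld.gold.
    for name in ("riscv64-pc-linux-gnu-ld", "riscv64-pc-linux-gnu-ld.bfd"):
        open(os.path.join(bin_dir, name), "w").close()

    guessed = find_linkers(bin_dir, "riscv64-unknown-linux-gnu-")  # prefix from config.guess
    actual = find_linkers(bin_dir, "riscv64-pc-linux-gnu-")        # prefix from Gentoo's CHOST

print(guessed)  # []
print(actual)   # ['ld', 'ld.bfd']
```

With the guessed prefix nothing is found, while the CHOST-based prefix finds ld and ld.bfd but not ld.gold, which matches the two changes described above.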

@bedroge
Collaborator Author

bedroge commented Apr 24, 2024

FFTW fails due to:

checking for sinq in -lquadmath... no
configure: error: quad precision requires libquadmath for quad-precision trigonometric routines

Looks like our GCC doesn't include libquadmath; I suppose it doesn't work on RISC-V (?). This Fedora page has a message "enable support for riscv64", so maybe we need GCC 14. For now we could try building FFTW without it.

edit: I was checking the FFTW easyblock, and I found that this is already disabled for Arm and PowerPC, so we should make a PR to do the same for RISC-V:
https://github.com/easybuilders/easybuild-easyblocks/blob/develop/easybuild/easyblocks/f/fftw.py#L143

edit2: PR created: easybuilders/easybuild-easyblocks#3314
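
The idea behind that change can be sketched as follows, assuming quad precision should be switched off for every CPU family without a usable libquadmath. The function name and the family labels are illustrative and are not the actual easyblock code or EasyBuild constants.

```python
# CPU families for which quad precision is skipped; the strings are illustrative
# labels, not necessarily the constants used by EasyBuild.
NO_QUAD_FAMILIES = {"Arm", "POWER", "RISC-V"}


def default_with_quad_prec(cpu_family, requested=True):
    """Enable quad precision only when requested and the CPU family supports it."""
    return requested and cpu_family not in NO_QUAD_FAMILIES


for family in ("x86", "Arm", "POWER", "RISC-V"):
    print(family, default_with_quad_prec(family))
```

Keeping the exceptions in a single set means a new architecture only needs one extra entry.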

@bedroge
Collaborator Author

bedroge commented Apr 25, 2024

When trying to build foss 2023b, I ran into the next issue with UCX, which has an outdated config.guess:

checking build system type... ./config.guess: unable to guess system type

This script, last modified 2013-06-10, has failed to recognize
the operating system you are using. It is advised that you
download the most up to date version of the config scripts from

  http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.guess;hb=HEAD
and
  http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.sub;hb=HEAD

If the version you run (./config.guess) is already up to date, please
send the following data and any information you think might be
pertinent to <config-patches@gnu.org> in order to provide the needed
information to handle your system.

config.guess timestamp = 2013-06-10

uname -m = riscv64
uname -r = 5.15.0-starfive
uname -s = Linux
uname -v = #1 SMP Fri Nov 24 07:22:28 UTC 2023

/usr/bin/uname -p = unknown
/bin/uname -X     = 

hostinfo               = 
/bin/universe          = 
/usr/bin/arch -k       = 
/bin/arch              = riscv64
/usr/bin/oslevel       = 
/usr/convex/getsysinfo = 

UNAME_MACHINE = riscv64
UNAME_RELEASE = 5.15.0-starfive
UNAME_SYSTEM  = Linux
UNAME_VERSION = #1 SMP Fri Nov 24 07:22:28 UTC 2023
configure: error: cannot guess build type; you must specify one

So we need to patch this by providing a newer version of config.guess before the configure step.
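
A quick way to spot such stale scripts is to read the timestamp they carry. This is a minimal sketch, assuming the script contains a line of the form timestamp='YYYY-MM-DD' (the value reported as "config.guess timestamp" in the output above). The cutoff date is a hypothetical placeholder, not the date at which riscv64 support was actually added.

```python
import re
from datetime import date


def config_guess_timestamp(text):
    """Extract the timestamp='YYYY-MM-DD' value from the contents of a config.guess script."""
    match = re.search(r"^timestamp='(\d{4})-(\d{2})-(\d{2})'", text, re.MULTILINE)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


def needs_update(text, minimum):
    """True if the script has no recognizable timestamp or is older than `minimum`."""
    stamp = config_guess_timestamp(text)
    return stamp is None or stamp < minimum


# Stand-in for the first lines of the script shipped with UCX.
sample = "#! /bin/sh\n# Attempt to guess a canonical system name.\n\ntimestamp='2013-06-10'\n"
# Hypothetical cutoff; pick the date of a config.guess known to recognize riscv64.
print(needs_update(sample, minimum=date(2020, 1, 1)))
```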

edit:
I worked around the issue by using a hook that copies EB's config.guess to the UCX build dir:

        config_guess_path = self.obtain_config_guess()
        copy_file(config_guess_path, self.start_dir)

This allows the configure step to complete, but the build fails almost immediately due to:

/tmp/eb/easybuild/build/UCX/1.15.0/GCCcore-13.2.0/ucx-1.15.0/src/ucm/bistro/bistro.h:24:4: error: #error "Unsupported architecture"
   24 | #  error "Unsupported architecture"
      |    ^~~~~

edit2: looks like RISC-V support was added in UCX 1.16.0 (which was released 10 days ago).

@bedroge
Collaborator Author

bedroge commented Apr 26, 2024

The config.guess issue would normally be solved by EB itself, but it's not happening for UCX, because that easyconfig is using a wrapper script around ./configure. This PR changes it, which should solve the issue: easybuilders/easybuild-easyconfigs#20428.

I also have a patch that backports RISC-V support into UCX 1.15.0: easybuilders/easybuild-easyconfigs#20429.

@bedroge
Collaborator Author

bedroge commented Apr 28, 2024

Next issue: the foss 2023b toolchain has UCC 1.2.0, but RISC-V support was only added in 1.3.0: openucx/ucc#829.
The diff is quite small, so it should be easy to backport this to 1.2.0.

Edit: solved in PR easybuilders/easybuild-easyconfigs#20432.

@bedroge
Collaborator Author

bedroge commented Apr 28, 2024

BLIS 0.9.0 fails in the configure step:

configure: automatic configuration requested.
/cvmfs/riscv.eessi.io/versions/20240402/compat/linux/riscv64/usr/bin/ld: /tmp/eb-oaju2ohj/cc7gwuui.o: in function `main':
config_detect.c:(.text+0x2aa): undefined reference to `bli_cpuid_query_id'
collect2: error: ld returned 1 exit status
./configure: line 1212: ./auto-detect.x: No such file or directory
configure: hardware detection driver returned ''.
configure: checking configuration against contents of 'config_registry'.
configure: 'auto-detected configuration '' is NOT registered!
configure: 
configure: *** Cannot continue with unregistered configuration ''. ***
configure: 

There are some BLIS PRs related to adding RISC-V functionality, so I'll have a look at those.

@bedroge
Collaborator Author

bedroge commented May 3, 2024

Backported RISC-V support to BLIS 0.9.0: easybuilders/easybuild-easyconfigs#20468.

OpenBLAS also built without any issues, so we're getting really close to having a full foss/2023b toolchain.

@bedroge
Collaborator Author

bedroge commented May 3, 2024

FlexiBLAS and ScaLAPACK also installed without issues, so we now have foss/2023b!

@bedroge
Collaborator Author

bedroge commented May 7, 2024

R 4.3.3 is now available as well. It required some (small) changes in the easyblocks/easyconfigs of Mesa, LLVM, and Java. I'll open PRs for those and list them here.

RISC-V support for Java:
easybuilders/easybuild-easyblocks#3323
easybuilders/easybuild-easyconfigs#20495

RISC-V support for Mesa:
easybuilders/easybuild-easyblocks#3324

RISC-V support for LLVM:
easybuilders/easybuild-easyblocks#3325

In order to replace the dependency on Java 11 by Java 21, I used the following hook:

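from easybuild.framework.easyconfig.constants import EASYCONFIG_CONSTANTS
from easybuild.tools.systemtools import RISCV, get_cpu_family

SYSTEM = EASYCONFIG_CONSTANTS['SYSTEM'][0]  # system toolchain, as available in easyconfigs

def parse_hook_use_newer_java(ec, *args, **kwargs):
    """Replace the Java 11 dependency of R 4.3.3 by Java 21 on RISC-V."""
    if ec.name == 'R' and ec.version in ['4.3.3'] and get_cpu_family() == RISCV:
        deps = ec['dependencies']
        for idx, dep in enumerate(deps):
            # each dependency starts with (name, version, ...)
            if dep[0] == 'Java' and dep[1] == '11':
                deps[idx] = ('Java', '21', '', SYSTEM)
                break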
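
The replacement logic of the hook above can be tried out on plain tuples, without EasyBuild. This is a minimal sketch: replace_dependency is a hypothetical helper, the SYSTEM dict is a stand-in for EasyBuild's constant, and the dependencies other than Java are made up for illustration.

```python
def replace_dependency(deps, name, old_version, new_dep):
    """Replace the first dependency matching (name, old_version) in place.

    Returns True if a replacement was made, False otherwise.
    """
    for idx, dep in enumerate(deps):
        if dep[0] == name and dep[1] == old_version:
            deps[idx] = new_dep
            return True
    return False


# Illustrative dependency list; the entries other than Java are made up.
SYSTEM = {"name": "system", "version": "system"}  # stand-in for EasyBuild's SYSTEM constant
deps = [("libpng", "1.6.40"), ("Java", "11", "", SYSTEM), ("cURL", "8.3.0")]
changed = replace_dependency(deps, "Java", "11", ("Java", "21", "", SYSTEM))
print(changed, deps[1][:2])
```

Returning a boolean makes it easy to log or fail when the expected dependency is no longer present, for example after an easyconfig update.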

@julianmorillo

dlb (https://pm.bsc.es/dlb) built without issues. Attached is the corresponding tar file.
eessi-20240402-software-linux-riscv64-generic-1715088854.tar.gz

@bedroge
Collaborator Author

bedroge commented May 14, 2024

While trying to install GROMACS, I ran into issues with its dependency SciPy-bundle, where some numpy tests fail:

FAILED core/tests/test_numeric.py::TestBoolCmp::test_float - AssertionError: 
FAILED core/tests/test_umath.py::TestFPClass::test_fpclass[-4] - AssertionError: 
FAILED core/tests/test_umath.py::TestFPClass::test_fpclass[-2] - AssertionError: 
FAILED core/tests/test_umath.py::TestFPClass::test_fpclass[-1] - AssertionError: 
FAILED core/tests/test_umath.py::TestFPClass::test_fpclass[1] - AssertionError: 
FAILED core/tests/test_umath.py::TestFPClass::test_fp_noncontiguous[f] - AssertionError: 
===== 6 failed, 33239 passed, 943 skipped, 1303 deselected, 31 xfailed, 3 xpassed, 58 warnings in 1640.83s (0:27:20) =====

I found numpy/numpy#25246 which disables most of these on RISC-V, so for now I've ignored the test failures. Now GROMACS itself is failing in the test step as well:

99% tests passed, 1 tests failed out of 91

Label Time Summary:
GTest              = 759.58 sec*proc (87 tests)
IntegrationTest    = 285.44 sec*proc (30 tests)
MpiTest            = 420.83 sec*proc (23 tests)
QuickGpuTest       =  83.55 sec*proc (20 tests)
SlowGpuTest        = 493.55 sec*proc (14 tests)
SlowTest           = 392.31 sec*proc (13 tests)
UnitTest           =  81.82 sec*proc (44 tests)

Total Test time (real) = 760.26 sec

The following tests FAILED:
          2 - GmxapiMpiTests (Failed)

Full output of the failing test:

starting mdrun 'Water and methane'
4 steps,      0.0 ps (continuing from step 2,      0.0 ps).
[starfive:369549:0:369549] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
[starfive:369548:0:369558] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace (tid: 369558) ====
 0  /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/UCX/1.15.0-GCCcore-13.2.0/lib64/libucs.so.0(ucs_handle_error+0x1fc) [0x3f9edc8044]
 1  /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/UCX/1.15.0-GCCcore-13.2.0/lib64/libucs.so.0(+0x2111e) [0x3f9edc811e]
 2  /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/UCX/1.15.0-GCCcore-13.2.0/lib64/libucs.so.0(+0x21280) [0x3f9edc8280]
 3  linux-vdso.so.1(__vdso_rt_sigreturn+0) [0x3fac463800]
 4  /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgromacs_mpi_d.so.9(_Z35nbnxn_kernel_ElecRF_VdwLJ_VgrpF_refPK16NbnxnPairlistCpuPK16nbnxn_atomdata_tPK19interaction_const_tPA3_KdP23nbnxn_atomdata_output_t+0x1ebc) [0x3fab7c1b74]
 5  /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgromacs_mpi_d.so.9(+0x2a98cc) [0x3fab7ba8cc]
 6  /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GCCcore/13.2.0/lib64/libgomp.so.1(+0x19d38) [0x3fab105d38]
 7  /cvmfs/riscv.eessi.io/versions/20240402/compat/linux/riscv64/lib64/lp64d/libc.so.6(+0x6b0f4) [0x3faafe20f4]
 8  /cvmfs/riscv.eessi.io/versions/20240402/compat/linux/riscv64/lib64/lp64d/libc.so.6(+0xb6da0) [0x3fab02dda0]
=================================
[starfive:369548] *** Process received signal ***
[starfive:369548] Signal: Segmentation fault (11)
[starfive:369548] Signal code:  (-6)
[starfive:369548] Failing at address: 0x3e80005a38c
[starfive:369548] [ 0] linux-vdso.so.1(__vdso_rt_sigreturn+0x0)[0x3fac463800]
[starfive:369548] [ 1] /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgromacs_mpi_d.so.9(_Z35nbnxn_kernel_ElecRF_VdwLJ_VgrpF_refPK16NbnxnPairlistCpuPK16nbnxn_atomdata_tPK19interaction_const_tPA3_KdP23nbnxn_atomdata_output_t+0x1ebc)[0x3fab7c1b74]
[starfive:369548] [ 2] /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgromacs_mpi_d.so.9(+0x2a98cc)[0x3fab7ba8cc]
[starfive:369548] [ 3] /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GCCcore/13.2.0/lib64/libgomp.so.1(+0x19d38)[0x3fab105d38]
[starfive:369548] [ 4] /cvmfs/riscv.eessi.io/versions/20240402/compat/linux/riscv64/lib64/lp64d/libc.so.6(+0x6b0f4)[0x3faafe20f4]
[starfive:369548] [ 5] /cvmfs/riscv.eessi.io/versions/20240402/compat/linux/riscv64/lib64/lp64d/libc.so.6(+0xb6da0)[0x3fab02dda0]
[starfive:369548] *** End of error message ***
==== backtrace (tid: 369549) ====
 0  /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/UCX/1.15.0-GCCcore-13.2.0/lib64/libucs.so.0(ucs_handle_error+0x1fc) [0x3f7cb9c044]
 1  /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/UCX/1.15.0-GCCcore-13.2.0/lib64/libucs.so.0(+0x2111e) [0x3f7cb9c11e]
 2  /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/UCX/1.15.0-GCCcore-13.2.0/lib64/libucs.so.0(+0x21280) [0x3f7cb9c280]
 3  linux-vdso.so.1(__vdso_rt_sigreturn+0) [0x3f86236800]
 4  /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgromacs_mpi_d.so.9(_Z35nbnxn_kernel_ElecRF_VdwLJ_VgrpF_refPK16NbnxnPairlistCpuPK16nbnxn_atomdata_tPK19interaction_const_tPA3_KdP23nbnxn_atomdata_output_t+0x1ebc) [0x3f85594b74]
 5  /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgromacs_mpi_d.so.9(+0x2a98cc) [0x3f8558d8cc]
 6  /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GCCcore/13.2.0/lib64/libgomp.so.1(GOMP_parallel+0x38) [0x3f84ed19c4]
 7  /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgromacs_mpi_d.so.9(_ZNK18nonbonded_verlet_t23dispatchNonbondedKernelEN3gmx19InteractionLocalityERK19interaction_const_tRKNS0_12StepWorkloadEiNS0_8ArrayRefIKNS0_11BasicVectorIdEEEENS8_IdEESD_P6t_nrnb+0xd4) [0x3f8558e146]
 8  /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgromacs_mpi_d.so.9(+0x7e3556) [0x3f85ac7556]
 9  /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgromacs_mpi_d.so.9(_Z8do_forceP8_IO_FILEPK9t_commrecPK14gmx_multisim_tRK10t_inputrecRKN3gmx18MDModulesNotifiersEPNSA_3AwhEP10gmx_enfrotPNSA_10ImdSessionEP6pull_tlP6t_nrnbP13gmx_wallcyclePK14gmx_localtop_tPA3_KdNSA_19ArrayRefWithPaddingINSA_11BasicVectorIdEEEENSA_8ArrayRefISY_EEPK9history_tPNSA_16ForceBuffersViewEPA3_dPK9t_mdatomsP14gmx_enerdata_tNS10_IST_EEP10t_forcerecRKNSA_21MdrunScheduleWorkloadEPNSA_19VirtualSitesHandlerEPddP9gmx_edsamP24CpuPpLongRangeNonbondedsRK22DDBalanceRegionHandler+0xdf0) [0x3f85ac9970]
10  /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgromacs_mpi_d.so.9(_ZN3gmx15LegacySimulator5do_mdEv+0x39da) [0x3f85bc1870]
11  /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgromacs_mpi_d.so.9(_ZN3gmx8Mdrunner8mdrunnerEv+0x6e60) [0x3f85bebcb6]
12  /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgmxapi_mpi_d.so.0(_ZN6gmxapi11SessionImpl3runEv+0x18) [0x3f8621870e]
13  /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgmxapi_mpi_d.so.0(_ZN6gmxapi7Session3runEv+0xe) [0x3f86218854]
14  /tmp/eb/easybuild/build/GROMACS/2024.1/foss-2023b/easybuild_obj/bin/gmxapi-mpi-test() [0x2ebd2]
15  /tmp/eb/easybuild/build/GROMACS/2024.1/foss-2023b/easybuild_obj/bin/../lib/libgtest.so.1.13.0(_ZN7testing8internal35HandleExceptionsInMethodIfSupportedINS_4TestEvEET0_PT_MS4_FS3_vEPKc+0x30) [0x3f852b17da]
16  /tmp/eb/easybuild/build/GROMACS/2024.1/foss-2023b/easybuild_obj/bin/../lib/libgtest.so.1.13.0(_ZN7testing4Test3RunEv+0xc2) [0x3f852a26fa]
17  /tmp/eb/easybuild/build/GROMACS/2024.1/foss-2023b/easybuild_obj/bin/../lib/libgtest.so.1.13.0(_ZN7testing8TestInfo3RunEv+0x11c) [0x3f852a2824]
18  /tmp/eb/easybuild/build/GROMACS/2024.1/foss-2023b/easybuild_obj/bin/../lib/libgtest.so.1.13.0(_ZN7testing9TestSuite3RunEv+0xbc) [0x3f852a28ea]
19  /tmp/eb/easybuild/build/GROMACS/2024.1/foss-2023b/easybuild_obj/bin/../lib/libgtest.so.1.13.0(_ZN7testing8internal12UnitTestImpl11RunAllTestsEv+0x1fa) [0x3f852ab23e]
20  /tmp/eb/easybuild/build/GROMACS/2024.1/foss-2023b/easybuild_obj/bin/../lib/libgtest.so.1.13.0(_ZN7testing8UnitTest3RunEv+0x52) [0x3f852a2a46]
21  /tmp/eb/easybuild/build/GROMACS/2024.1/foss-2023b/easybuild_obj/bin/gmxapi-mpi-test() [0x26dbe]
22  /cvmfs/riscv.eessi.io/versions/20240402/compat/linux/riscv64/lib64/lp64d/libc.so.6(+0x27688) [0x3f84d71688]
23  /cvmfs/riscv.eessi.io/versions/20240402/compat/linux/riscv64/lib64/lp64d/libc.so.6(__libc_start_main+0x74) [0x3f84d71730]
24  /tmp/eb/easybuild/build/GROMACS/2024.1/foss-2023b/easybuild_obj/bin/gmxapi-mpi-test() [0x26fd8]
=================================
[starfive:369549] *** Process received signal ***
[starfive:369549] Signal: Segmentation fault (11)
[starfive:369549] Signal code:  (-6)
[starfive:369549] Failing at address: 0x3e80005a38d
[starfive:369549] [ 0] linux-vdso.so.1(__vdso_rt_sigreturn+0x0)[0x3f86236800]
[starfive:369549] [ 1] /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgromacs_mpi_d.so.9(_Z35nbnxn_kernel_ElecRF_VdwLJ_VgrpF_refPK16NbnxnPairlistCpuPK16nbnxn_atomdata_tPK19interaction_const_tPA3_KdP23nbnxn_atomdata_output_t+0x1ebc)[0x3f85594b74]
[starfive:369549] [ 2] /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgromacs_mpi_d.so.9(+0x2a98cc)[0x3f8558d8cc]
[starfive:369549] [ 3] /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GCCcore/13.2.0/lib64/libgomp.so.1(GOMP_parallel+0x38)[0x3f84ed19c4]
[starfive:369549] [ 4] /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgromacs_mpi_d.so.9(_ZNK18nonbonded_verlet_t23dispatchNonbondedKernelEN3gmx19InteractionLocalityERK19interaction_const_tRKNS0_12StepWorkloadEiNS0_8ArrayRefIKNS0_11BasicVectorIdEEEENS8_IdEESD_P6t_nrnb+0xd4)[0x3f8558e146]
[starfive:369549] [ 5] /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgromacs_mpi_d.so.9(+0x7e3556)[0x3f85ac7556]
[starfive:369549] [ 6] /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgromacs_mpi_d.so.9(_Z8do_forceP8_IO_FILEPK9t_commrecPK14gmx_multisim_tRK10t_inputrecRKN3gmx18MDModulesNotifiersEPNSA_3AwhEP10gmx_enfrotPNSA_10ImdSessionEP6pull_tlP6t_nrnbP13gmx_wallcyclePK14gmx_localtop_tPA3_KdNSA_19ArrayRefWithPaddingINSA_11BasicVectorIdEEEENSA_8ArrayRefISY_EEPK9history_tPNSA_16ForceBuffersViewEPA3_dPK9t_mdatomsP14gmx_enerdata_tNS10_IST_EEP10t_forcerecRKNSA_21MdrunScheduleWorkloadEPNSA_19VirtualSitesHandlerEPddP9gmx_edsamP24CpuPpLongRangeNonbondedsRK22DDBalanceRegionHandler+0xdf0)[0x3f85ac9970]
[starfive:369549] [ 7] /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgromacs_mpi_d.so.9(_ZN3gmx15LegacySimulator5do_mdEv+0x39da)[0x3f85bc1870]
[starfive:369549] [ 8] /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgromacs_mpi_d.so.9(_ZN3gmx8Mdrunner8mdrunnerEv+0x6e60)[0x3f85bebcb6]
[starfive:369549] [ 9] /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgmxapi_mpi_d.so.0(_ZN6gmxapi11SessionImpl3runEv+0x18)[0x3f8621870e]
[starfive:369549] [10] /cvmfs/riscv.eessi.io/versions/20240402/software/linux/riscv64/generic/software/GROMACS/2024.1-foss-2023b/lib/libgmxapi_mpi_d.so.0(_ZN6gmxapi7Session3runEv+0xe)[0x3f86218854]
[starfive:369549] [11] /tmp/eb/easybuild/build/GROMACS/2024.1/foss-2023b/easybuild_obj/bin/gmxapi-mpi-test[0x2ebd2]
[starfive:369549] [12] /tmp/eb/easybuild/build/GROMACS/2024.1/foss-2023b/easybuild_obj/bin/../lib/libgtest.so.1.13.0(_ZN7testing8internal35HandleExceptionsInMethodIfSupportedINS_4TestEvEET0_PT_MS4_FS3_vEPKc+0x30)[0x3f852b17da]
[starfive:369549] [13] /tmp/eb/easybuild/build/GROMACS/2024.1/foss-2023b/easybuild_obj/bin/../lib/libgtest.so.1.13.0(_ZN7testing4Test3RunEv+0xc2)[0x3f852a26fa]
[starfive:369549] [14] /tmp/eb/easybuild/build/GROMACS/2024.1/foss-2023b/easybuild_obj/bin/../lib/libgtest.so.1.13.0(_ZN7testing8TestInfo3RunEv+0x11c)[0x3f852a2824]
[starfive:369549] [15] /tmp/eb/easybuild/build/GROMACS/2024.1/foss-2023b/easybuild_obj/bin/../lib/libgtest.so.1.13.0(_ZN7testing9TestSuite3RunEv+0xbc)[0x3f852a28ea]
[starfive:369549] [16] /tmp/eb/easybuild/build/GROMACS/2024.1/foss-2023b/easybuild_obj/bin/../lib/libgtest.so.1.13.0(_ZN7testing8internal12UnitTestImpl11RunAllTestsEv+0x1fa)[0x3f852ab23e]
[starfive:369549] [17] /tmp/eb/easybuild/build/GROMACS/2024.1/foss-2023b/easybuild_obj/bin/../lib/libgtest.so.1.13.0(_ZN7testing8UnitTest3RunEv+0x52)[0x3f852a2a46]
[starfive:369549] [18] /tmp/eb/easybuild/build/GROMACS/2024.1/foss-2023b/easybuild_obj/bin/gmxapi-mpi-test[0x26dbe]
[starfive:369549] [19] /cvmfs/riscv.eessi.io/versions/20240402/compat/linux/riscv64/lib64/lp64d/libc.so.6(+0x27688)[0x3f84d71688]
[starfive:369549] [20] /cvmfs/riscv.eessi.io/versions/20240402/compat/linux/riscv64/lib64/lp64d/libc.so.6(__libc_start_main+0x74)[0x3f84d71730]
[starfive:369549] [21] /tmp/eb/easybuild/build/GROMACS/2024.1/foss-2023b/easybuild_obj/bin/gmxapi-mpi-test[0x26fd8]
[starfive:369549] *** End of error message ***
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 0 on node starfive exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

@boegel
Contributor

boegel commented May 16, 2024

@bedroge Could that be simply due to insufficient memory on your SiFive Unmatched board?

@bedroge
Collaborator Author

bedroge commented May 19, 2024

> @bedroge Could that be simply due to insufficient memory on your ~~SiFive Unmatched~~ Starfive VisionFive 2 board?

I don't know, didn't see any Killed / OOM messages.

I tried again, this time using the slightly modified easyconfig from easybuilders/easybuild-easyconfigs#20522, and then it failed in the second iteration:

Reading file /tmp/eb/easybuild/build/GROMACS/2024.1/foss-2023b/easybuild_obj/api/gmxapi/cpp/tests/Testing/Temporary/GmxApiTest_RunnerChainedMD.tpr, VERSION 2024.1-EasyBuild_4.9.1 (single precision)

-------------------------------------------------------
Program:     gmxapi-mpi-test, version 2024.1-EasyBuild_4.9.1
Source file: src/gromacs/utility/keyvaluetreeserializer.cpp (line 302)
Function:    gmx::{anonymous}::ValueSerializer::deserialize(gmx::ISerializer*)::<lambda()>
MPI rank:    0 (out of 2)

Assertion failed:
Condition: iter != s_deserializers.end()
Unknown type tag for deserializization

I don't have a clue what that's about, so I just did another attempt, and then the installation completed successfully (all tests of all four iterations passed) 🎉 🤷‍♂️

@bedroge
Collaborator Author

bedroge commented May 21, 2024

GMP easyconfigs have precise: True in toolchainopts, but that doesn't work on RISC-V: the EB framework sets -mno-recip in this case (see https://github.com/easybuilders/easybuild-framework/blob/develop/easybuild/toolchains/compiler/gcc.py#L66C22-L66C31), and that flag is not supported on RISC-V. It isn't supported on Arm either, so there it's overridden with other flags:
https://github.com/easybuilders/easybuild-framework/blob/develop/easybuild/toolchains/compiler/gcc.py#L77
But those are not available on RISC-V either. It doesn't seem like there's a good alternative, but @julianmorillo is going to check with a compiler expert. Meanwhile I tried building without precise: True, and that worked fine. Also the test step completed without issues.
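
One possible shape for a fix is a per-architecture override that falls back to the default flag. This is a minimal sketch, not the framework's actual code: only -mno-recip comes from this thread, the empty RISC-V entry is a hypothetical workaround equivalent to building without precise: True, and the architecture strings are illustrative.

```python
# Flags used for `precise: True`. Only the default is taken from this thread;
# the RISC-V entry is a hypothetical workaround (no extra flags).
DEFAULT_PRECISE_FLAGS = ["-mno-recip"]
PRECISE_FLAGS_BY_ARCH = {
    "riscv64": [],  # -mno-recip is not supported here, per the comment above
}


def precise_flags(arch):
    """Return the compiler flags to add for `precise: True` on the given architecture."""
    return list(PRECISE_FLAGS_BY_ARCH.get(arch, DEFAULT_PRECISE_FLAGS))


print(precise_flags("x86_64"))   # ['-mno-recip']
print(precise_flags("riscv64"))  # []
```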

@bedroge
Collaborator Author

bedroge commented May 21, 2024

With x264 I'm running into an outdated config.guess issue once again. Here the problem is that its configure script is apparently handcrafted, and hence it doesn't contain the string that EasyBuild uses to determine whether the script was generated with Autoconf (see https://github.com/easybuilders/easybuild-easyblocks/blob/develop/easybuild/easyblocks/generic/configuremake.py#L57). If that string is not there, EB will not replace the config.guess with a newer one (see https://github.com/easybuilders/easybuild-easyblocks/blob/develop/easybuild/easyblocks/generic/configuremake.py#L303). So we probably have to do that manually in the easyconfig or with a hook.

edit: the same hook that I used before works fine and allows the installation to complete:

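from easybuild.tools.filetools import copy_file
from easybuild.tools.systemtools import RISCV64, get_cpu_architecture

def pre_configure_hook_x264(self, *args, **kwargs):
    if self.name == 'x264' and self.version in ['20231019'] and get_cpu_architecture() == RISCV64:
        # overwrite the outdated config.guess with the one provided by EasyBuild
        config_guess_path = self.obtain_config_guess()
        copy_file(config_guess_path, self.start_dir)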
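
The detection that goes wrong for x264 can be illustrated with a small standalone check. This is a minimal sketch: the marker string is an assumption based on the header that Autoconf writes into generated scripts, and the exact string that EasyBuild looks for is the one in configuremake.py linked above.

```python
import os
import tempfile

# Marker assumed for illustration; check configuremake.py for the string EasyBuild actually uses.
AUTOCONF_MARKER = b"Generated by GNU Autoconf"


def looks_autoconf_generated(configure_path):
    """Return True if the configure script exists and contains the Autoconf marker."""
    if not os.path.exists(configure_path):
        return False
    with open(configure_path, "rb") as handle:
        return AUTOCONF_MARKER in handle.read()


with tempfile.TemporaryDirectory() as tmpdir:
    generated = os.path.join(tmpdir, "configure.autoconf")
    handcrafted = os.path.join(tmpdir, "configure.manual")
    with open(generated, "wb") as handle:
        handle.write(b"#! /bin/sh\n# Generated by GNU Autoconf 2.71.\n")
    with open(handcrafted, "wb") as handle:
        handle.write(b"#!/bin/bash\n# hand-written configure script\n")
    results = (looks_autoconf_generated(generated), looks_autoconf_generated(handcrafted))

print(results)  # (True, False)
```

A handcrafted script fails this check even though it still calls config.guess, which is why the hook is needed.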

@bedroge
Collaborator Author

bedroge commented May 21, 2024

And almost the same happens with LAME: it looks like the configure_cmd_prefix (here: https://github.com/easybuilders/easybuild-easyconfigs/blob/develop/easybuild/easyconfigs/l/LAME/LAME-3.100-GCCcore-13.2.0.eb#L29) breaks the os.path.exists(configure_command) in the easyblock, which makes it fail to recognize that this actually is an Autoconf-generated configure script. Or is it because it's running autoreconf in preconfigopts? Either way, the config.guess still doesn't get updated, but the same hook works for this one as well.
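
The first hypothesis is easy to illustrate in isolation. This is a minimal sketch, assuming the existence check is done on the full command string including the prefix; whether the easyblock really does that is unverified, and the prefix used here is a hypothetical placeholder rather than the value from the LAME easyconfig.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as start_dir:
    script = os.path.join(start_dir, "configure")
    open(script, "w").close()

    # Hypothetical prefix; the actual value is in the LAME easyconfig linked above.
    configure_cmd_prefix = "FOO=bar "
    plain_found = os.path.exists(script)
    prefixed_found = os.path.exists(configure_cmd_prefix + script)

print(plain_found, prefixed_found)  # True False
```

If that is indeed what happens, stripping the prefix before the check would be enough for the script to be recognized.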

@julianmorillo

libdwarf-0.9.2 installed. This is the corresponding tar file to be ingested:
eessi-20240402-software-linux-riscv64-generic-1716472182.tar.gz

@boegel boegel added the riscv label May 24, 2024
@julianmorillo

julianmorillo commented May 27, 2024

PAPI-7.1.0 and libxml2-2.9.14 installed. This is the corresponding tar file to be ingested:
eessi-20240402-software-linux-riscv64-generic-1716822680.tar.gz
