
LLNL GitLab-CI #747

Open
jedbrown opened this issue Apr 17, 2021 · 12 comments

@jedbrown
Member

For those with LLNL CZ access, I created a mirror repository that will be able to run CI jobs. You can log in and request access if you don't already have it.

Relevant documentation:

The LLNL system is set up to allow "LGTM" comments by a trusted member to launch a pipeline with jobs on LLNL machines (including quartz, lassen, and corona). Some MFEM team members have experience with this setup. I don't think we need to use it for all PRs, but it'd be wonderful to set this up so it's easy to use for PRs that look like they would benefit. I'd envision using the batch executor on a single node, ideally with either MFEM or PETSc to test multi-GPU solvers. This could include short-running performance tests for longitudinal tracking.
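
As a rough sketch only (the job name, runner tags, and scheduler flags below are assumptions and would need to match however the LLNL runners are actually registered), a single-node batch-executor job on the LLNL instance might look like:

llnl-lassen-batch:
  stage: test
  tags:
    - lassen
    - batch
  variables:
    # Hypothetical scheduler options: one node, 30-minute walltime
    SCHEDULER_PARAMETERS: "-nnodes 1 -W 30"
  script:
    - make -j$(nproc)
    - make junit search="petsc fluids solids"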

@adrienbernede

@jedbrown For your information: https://radiuss-ci.readthedocs.io/en/latest/
This documents a method to implement CI for a mirrored repo on LC, using Uberenv to automate the installation of dependencies through Spack.
Whatever workflow you choose, I can definitely help with the setup.

@v-dobrev
Member

The libCEED repo already has a .gitlab-ci.yml with a GitLab CI setup for use outside the LLNL GitLab instance. Is there a way to distinguish what gets run on which GitLab instance?

@adrienbernede

adrienbernede commented Apr 19, 2021

Yes, there is. We can create "rules" that distinguish jobs based on the server name, for example via CI_SERVER_HOST.
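
In a job that should only run on the LLNL instance, for example, that could look like this minimal sketch:

  rules:
    - if: '$CI_SERVER_HOST == "gitlab.llnl.gov"'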

@jedbrown
Member Author

I think I would prefer the first of these two options because all content stays in the repository.
https://ecp-ci.gitlab.io/docs/guides/multi-gitlab-project.html

@adrienbernede

This ECP documentation is really good, and it will save me a considerable amount of time answering the same questions here from the lab-internal docs!

@jeremylt
Member

I like the first of those two options as well. For readability, though, I could see it being useful to split the scripts for each CI instance into separate files that get referenced in .gitlab-ci.yml.

@adrienbernede

adrienbernede commented Apr 19, 2021

@jeremylt, I am failing to see the nuance with the example given, where each job already calls a different script.
Or maybe you meant that the YAML files should be split by instance (using the include keyword)?
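
For instance (hypothetical file names), splitting by instance with include could look like:

include:
  - local: '.gitlab/noether-ci.yml'
  - local: '.gitlab/llnl-ci.yml'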

@jeremylt
Member

I'm just commenting that I would find it easier to see what's going on if all of this from our current YAML

# Compilers
    - export COVERAGE=1 CC=gcc CXX=g++ FC=gfortran HIPCC=hipcc
    - echo "-------------- nproc ---------------" && NPROC_CPU=$(nproc) && NPROC_GPU=$(($(nproc)<8?$(nproc):8))
    - echo "-------------- CC ------------------" && $CC --version
    - echo "-------------- CXX -----------------" && $CXX --version
    - echo "-------------- FC ------------------" && $FC --version
    - echo "-------------- HIPCC ---------------" && $HIPCC --version
    - echo "-------------- GCOV ----------------" && gcov --version
# Libraries for backends
# -- MAGMA from dev branch
    - echo "-------------- MAGMA ---------------"
    - export MAGMA_DIR=/projects/hipMAGMA && git -C $MAGMA_DIR describe
# -- LIBXSMM v1.16.1
    - cd .. && export XSMM_VERSION=libxsmm-1.16.1 && { [[ -d $XSMM_VERSION ]] || { git clone --depth 1 --branch 1.16.1 https://github.com/hfp/libxsmm.git $XSMM_VERSION && make -C $XSMM_VERSION -j$(nproc); }; } && export XSMM_DIR=$PWD/$XSMM_VERSION && cd libCEED
    - echo "-------------- LIBXSMM -------------" && git -C $XSMM_DIR describe --tags
# -- OCCA v1.1.0
    - cd .. && export OCCA_VERSION=occa-1.1.0 OCCA_OPENCL_ENABLED=0 && { [[ -d $OCCA_VERSION ]] || { git clone --depth 1 --branch v1.1.0 https://github.com/libocca/occa.git $OCCA_VERSION && make -C $OCCA_VERSION -j$(nproc); }; } && export OCCA_DIR=$PWD/$OCCA_VERSION && cd libCEED
    - echo "-------------- OCCA ----------------" && make -C $OCCA_DIR info
# libCEED
    - make configure HIP_DIR=/opt/rocm OPT='-O -march=native -ffp-contract=fast'
    - BACKENDS_CPU=$(make info-backends | grep -o '/cpu[^ ]*') && BACKENDS_GPU=$(make info-backends | grep -o '/gpu[^ ]*')
    - echo "-------------- libCEED -------------" && make info
    - make -j$NPROC_CPU
# -- libCEED only tests
    - echo "-------------- core tests ----------"
    - echo '[{"subject":"/","metrics":[{"name":"Transfer Size (KB)","value":"19.5","desiredSize":"smaller"},{"name":"Speed Index","value":0,"desiredSize":"smaller"},{"name":"Total Score","value":92,"desiredSize":"larger"},{"name":"Requests","value":4,"desiredSize":"smaller"}]}]' > performance.json
    - make -k -j$NPROC_CPU BACKENDS="$BACKENDS_CPU" junit realsearch=%
    - make -k -j$NPROC_GPU BACKENDS="$BACKENDS_GPU" junit realsearch=%
# Libraries for examples
# -- PETSc with HIP (minimal)
#    Note: These tests don't run in ' make junit realsearch=%' until PETSC_DIR set
#          PETSC_DIR is not set by default in GitLab runner env
    - export PETSC_DIR=/projects/petsc PETSC_ARCH=mpich-hip && git -C $PETSC_DIR describe
    - echo "-------------- PETSc ---------------" && make -C $PETSC_DIR info
    - make -k -j$NPROC_CPU BACKENDS="$BACKENDS_CPU" junit search="petsc fluids solids"
    - make -k -j$NPROC_GPU BACKENDS="$BACKENDS_GPU" junit search="petsc fluids solids"
# -- MFEM v4.2
    - cd .. && export MFEM_VERSION=mfem-4.2 && { [[ -d $MFEM_VERSION ]] || { git clone --depth 1 --branch v4.2 https://github.com/mfem/mfem.git $MFEM_VERSION && make -C $MFEM_VERSION -j$(nproc) serial CXXFLAGS="-O -std=c++11"; }; } && export MFEM_DIR=$PWD/$MFEM_VERSION && cd libCEED
    - echo "-------------- MFEM ----------------" && make -C $MFEM_DIR info
    - make -k -j$NPROC_CPU BACKENDS="$BACKENDS_CPU" junit search=mfem
    - make -k -j$NPROC_GPU BACKENDS="$BACKENDS_GPU" junit search=mfem
# -- Nek5000 v19.0
    - export COVERAGE=0
    - cd .. && export NEK5K_VERSION=Nek5000-19.0 && { [[ -d $NEK5K_VERSION ]] || { git clone --depth 1 --branch v19.0 https://github.com/Nek5000/Nek5000.git $NEK5K_VERSION && cd $NEK5K_VERSION/tools && ./maketools genbox genmap reatore2 && cd ../..; }; } && export NEK5K_DIR=$PWD/$NEK5K_VERSION && export PATH=$NEK5K_DIR/bin:$PATH MPI=0 && cd libCEED
    - echo "-------------- Nek5000 -------------" && git -C $NEK5K_DIR describe --tags
    - make -k -j$NPROC_CPU BACKENDS="$BACKENDS_CPU" junit search=nek
    - make -k -j$NPROC_GPU BACKENDS="$BACKENDS_GPU" junit search=nek
# Clang-tidy
    - echo "-------------- clang-tidy ----------" && clang-tidy --version
    - TIDY_OPTS="-fix-errors" make -j$NPROC_CPU tidy && git diff --exit-code
# Report status
    - echo "SUCCESS" > .job_status
  after_script:
    - |
      if [ $(cat .job_status) == "SUCCESS" ]; then
        lcov --directory . --capture --output-file coverage.info;
        bash <(curl -s https://codecov.io/bash) -f coverage.info -t ${CODECOV_ACCESS_TOKEN} -F interface;
        bash <(curl -s https://codecov.io/bash) -f coverage.info -t ${CODECOV_ACCESS_TOKEN} -F gallery;
        bash <(curl -s https://codecov.io/bash) -f coverage.info -t ${CODECOV_ACCESS_TOKEN} -F backends;
        bash <(curl -s https://codecov.io/bash) -f coverage.info -t ${CODECOV_ACCESS_TOKEN} -F tests;
        bash <(curl -s https://codecov.io/bash) -f coverage.info -t ${CODECOV_ACCESS_TOKEN} -F examples;
      fi

would be put in a separate bash file (and the script for the new LLNL job put into its own bash file) if we run both CI jobs off of the same yaml.

Then we'd have

.rules-noether:
  rules:
  - if: '$CI_SERVER_HOST == "noether.colorado.edu"'
    when: always

.rules-llnl:
  rules:
  - if: '$CI_SERVER_HOST == "gitlab.llnl.gov"'
    when: always

noether-rocm:
  stage: test
  tags:
    - rocm
  image: jedbrown/rocm:latest
  extends: .rules-noether
  script:
    - ./bash-file-with-noether-script

llnl-cuda:
  stage: test
  tags:
    - cuda
  extends: .rules-llnl
  script:
    - ./bash-file-with-llnl-script

or something along those lines

@adrienbernede

FYI, the first method has the drawback that rules can be hard to stack.
So I would not write them the way this documentation suggests:

  - if: '$CI_SERVER_HOST == "gitlab.anl.gov"'
    when: always

There are 2 inaccuracies here:

  • The default behavior is "on_success", not "always". Using always would make the job run all the time, ignoring any failure in previous stages. This is not a good default behavior, so it shouldn't be used in a generic example.
  • In order to make rules "stackable" it is better to say the following:
  - if: '$CI_SERVER_HOST != "gitlab.anl.gov"'
    when: never

That is because, per the documentation:

Rules are evaluated in order until a match is found.
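
So a stackable version of the noether rules from the sketch above might look like this (reusing the placeholder names from that sketch):

.rules-noether:
  rules:
    # Skip this job entirely on any other GitLab instance...
    - if: '$CI_SERVER_HOST != "noether.colorado.edu"'
      when: never
    # ...otherwise keep the default on_success behavior.
    - when: on_success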

@adrienbernede

I don't see a link to the ECP CI documentation repo, but I would like to share those thoughts with the authors.

@adrienbernede

adrienbernede commented Apr 19, 2021

@jeremylt I agree, putting scripts in a separate file is a good practice.

@jedbrown
Member Author
