Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supporting SLURM env vars for launching MAPDL configuration #2754

Draft
wants to merge 40 commits into
base: main
Choose a base branch
from

Conversation

germa89
Copy link
Collaborator

@germa89 germa89 commented Feb 8, 2024

As the title.

The idea is that on SLURM HPC clusters, PyMAPDL will read the SLURM job through the env vars that the SLURM manager creates, so it can launch MAPDL with the appropriate number of cores.

You can see in the code:

options = [
            # 4,  # Fall back option
            SLURM_CPUS_PER_TASK * SLURM_NTASKS,  # (CPUs)
            SLURM_NPROCS,  # (CPUs)
            # SLURM_NTASKS,  # (tasks) Not necessary the number of CPUs,
            # SLURM_NNODES * SLURM_TASKS_PER_NODE * SLURM_CPUS_PER_TASK,  # (CPUs)
            SLURM_CPUS_ON_NODE * SLURM_NNODES,  # (cpus)
        ]
nproc = max(options)

which is the way I use to decide how many cores should MAPDL instance use.

Same with memory:

        if SLURM_MEM_PER_NODE:
            # RAM argument is in MB, so we need to convert
            if SLURM_MEM_PER_NODE[-1] == "T":  # tera
                ram = int(SLURM_MEM_PER_NODE[:-1]) * 10**6
            elif SLURM_MEM_PER_NODE[-1] == "G":  # giga
                ram = int(SLURM_MEM_PER_NODE[:-1]) * 10**3
            elif SLURM_MEM_PER_NODE[-1] == "G":  # mega
                ram = int(SLURM_MEM_PER_NODE[:-1]) * 10**0
            elif SLURM_MEM_PER_NODE[-1].upper() == "k":  # mega
                ram = int(SLURM_MEM_PER_NODE[:-1]) * 10 ** (-3)
            else:
                ram = int(SLURM_MEM_PER_NODE)

I do not use the MAPDL -machines argument to specify the machines used. I don't think it is needed/compatible.

@germa89 germa89 added the New Feature Request or proposal for a new feature label Feb 8, 2024
@germa89 germa89 self-assigned this Feb 8, 2024
@ansys-reviewer-bot
Copy link
Contributor

Thanks for opening a Pull Request. If you want to perform a review write a comment saying:

@ansys-reviewer-bot review

@github-actions github-actions bot added Documentation Documentation related (improving, adding, etc) Enhancement Improve any current implemented feature labels Feb 8, 2024
germa89 and others added 2 commits February 9, 2024 15:55
Co-authored-by: Maxime Rey <87315832+MaxJPRey@users.noreply.github.com>
@koubaa
Copy link
Contributor

koubaa commented Mar 8, 2024

cc @greschd . This might be related to your interest in launcher configuration

@wiz-inc-572fc38784
Copy link
Contributor

wiz-inc-572fc38784 bot commented Mar 14, 2024

Wiz Scan Summary

IaC Misconfigurations 0C 0H 0M 0L 0I
Vulnerabilities 0C 0H 0M 0L 0I
Sensitive Data 0C 0H 0M 0L 0I
Total 0C 0H 0M 0L 0I
Secrets 0🔑

@codecov-commenter
Copy link

codecov-commenter commented Apr 2, 2024

Codecov Report

Attention: Patch coverage is 74.72527% with 23 lines in your changes are missing coverage. Please review.

Project coverage is 84.07%. Comparing base (598e2e8) to head (84d4bc7).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2754      +/-   ##
==========================================
- Coverage   86.64%   84.07%   -2.57%     
==========================================
  Files          52       52              
  Lines        9553     9632      +79     
==========================================
- Hits         8277     8098     -179     
- Misses       1276     1534     +258     

@germa89 germa89 mentioned this pull request Apr 4, 2024
@germa89 germa89 marked this pull request as ready for review May 13, 2024 16:24
@germa89 germa89 marked this pull request as draft May 13, 2024 16:25
@@ -1165,10 +1165,13 @@ These are described in the following table:
| | export PYMAPDL_MAPDL_VERSION=22.2 |
| | |
+---------------------------------------+---------------------------------------------------------------------+
| :envvar:`PYMAPDL_ON_SLURM` | With this environment variable set to ``FALSE``, you can avoid |
| | PyMAPDL to detect that it is running on a SLURM HPC cluster. |
Copy link
Member

@PipKat PipKat May 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| | PyMAPDL to detect that it is running on a SLURM HPC cluster. |
| | PyMAPDL from detecting that it is running on a SLURM HPC cluster. |

Sorry for messing up your table formatting with this needed correction!

Comment on lines 1173 to 1174
| | PRNSOL or NLIST, raise this. In bytes, |
| | defaults to 256 MB. |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| | PRNSOL or NLIST, raise this. In bytes, |
| | defaults to 256 MB. |
| | PRNSOL or NLIST, raise this. The default message |
| | length in bytes is 256 MB. |

@@ -1165,10 +1165,13 @@ These are described in the following table:
| | export PYMAPDL_MAPDL_VERSION=22.2 |
| | |
+---------------------------------------+---------------------------------------------------------------------+
| :envvar:`PYMAPDL_ON_SLURM` | With this environment variable set to ``FALSE``, you can avoid |
| | PyMAPDL to detect that it is running on a SLURM HPC cluster. |
+---------------------------------------+---------------------------------------------------------------------+
| :envvar:`PYMAPDL_MAX_MESSAGE_LENGTH` | Maximum gRPC message length. If your |
| | connection terminates when running |
| | PRNSOL or NLIST, raise this. In bytes, |
| | defaults to 256 MB. |
| | |
| | Only for developing purposes. |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a complete sentence.

# Unless the env var is false, it will be true.
ON_SLURM = not (ON_SLURM.lower() == "false")

# Let's require the following env vars to exists to go into slurm mode.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Let's require the following env vars to exists to go into slurm mode.
# Let's require the following env vars to exist to go into slurm mode.

ON_SLURM = True # Using this as main variable
else:
ON_SLURM = False

if remove_temp_files is not None:
warnings.warn(
"The option ``remove_temp_files`` is being deprecated and it will be removed by PyMAPDL version 0.66.0.\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"The option ``remove_temp_files`` is being deprecated and it will be removed by PyMAPDL version 0.66.0.\n"
"The ``remove_temp_files`` option is being deprecated. It is to be removed in PyMAPDL version 0.66.0.\n"

I'm not sure if you were trying to avoid being specific by saying "by PyMAPDL version 0.66.0" because it could be before then? Perhaps "It is to be removed sometime before PyMAPDL version 0.66.0."

raise NotEnoughResources
if machine_cores < int(nproc):
raise NotEnoughResources(
f"The machine has {machine_cores} cores and PyMAPDL is asking for {nproc} cores."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
f"The machine has {machine_cores} cores and PyMAPDL is asking for {nproc} cores."
f"The machine has {machine_cores} cores. PyMAPDL is asking for {nproc} cores."


except Exception as e:
LOG.info(
f"The machines list could not be obtained.\nFollowing error happened:\n{str(e)}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
f"The machines list could not be obtained.\nFollowing error happened:\n{str(e)}"
f"The machines list could not be obtained.\nThis error occurred:\n{str(e)}"

@@ -455,6 +455,31 @@ def run_before_and_after_tests_2(request, mapdl):
assert prev == mapdl.is_local


@pytest.fixture(scope="function")
def set_env_var(request, monkeypatch):
"""Set an environment variable from given requests, this fixture must be used with `parametrize`"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""Set an environment variable from given requests, this fixture must be used with `parametrize`"""
"""Set an environment variable from given requests. This fixture must be used with ``parametrize``."""


@pytest.fixture(scope="function")
def set_env_var_context(request, monkeypatch):
"""Set MY_VARIABLE environment variable, this fixture must be used with `parametrize`"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""Set MY_VARIABLE environment variable, this fixture must be used with `parametrize`"""
"""Set ``MY_VARIABLE`` environment variable. This fixture must be used with ``parametrize``."""

id="Testing NPROCS only",
),
pytest.param(
# This test does not do probably a good memory mapping between
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# This test does not do probably a good memory mapping between
# This test probably does not do a good memory mapping between

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation Documentation related (improving, adding, etc) Enhancement Improve any current implemented feature New Feature Request or proposal for a new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants