Kestrel #405

Merged: 21 commits, Nov 21, 2023
4 changes: 2 additions & 2 deletions .github/ISSUE_TEMPLATE/bug_report.md
@@ -29,8 +29,8 @@ log file here
```

**Platform (please complete the following information):**
-- Simulation platform: [e.g. Eagle, AWS, local docker; please label with this as well]
-- BuildStockBatch version, branch, or sha:
+- Simulation platform: [e.g. Kestrel, Eagle, AWS, local docker; please label with this as well]
+- BuildStockBatch version, branch, or sha:
- resstock or comstock repo version, branch, or sha:
- Local Desktop OS: [e.g. Windows, Mac, Linux, especially important if running locally]

2 changes: 1 addition & 1 deletion .github/PULL_REQUEST_TEMPLATE/pull_request_template.md
@@ -14,5 +14,5 @@ Not all may apply
- [ ] All other unit and integration tests passing
- [ ] Update validation for project config yaml file changes
- [ ] Update existing documentation
-- [ ] Run a small batch run on Eagle to make sure it all works if you made changes that will affect Eagle
+- [ ] Run a small batch run on Kestrel/Eagle to make sure it all works if you made changes that will affect Kestrel/Eagle
- [ ] Add to the changelog_dev.rst file and propose migration text in the pull request
3 changes: 2 additions & 1 deletion .gitignore
@@ -5,6 +5,7 @@ __pycache__
docs/_build
dask-worker-space
venv/*
+.venv
*.DS_Store
*.zip
.pytest_cache
@@ -15,4 +16,4 @@ coverage/
.coverage
build/
.env
-.history
+.history
2 changes: 1 addition & 1 deletion buildstockbatch/base.py
@@ -3,7 +3,7 @@
"""
buildstockbatch.base
~~~~~~~~~~~~~~~
-This is the base class mixed into the deployment specific classes (i.e. eagle, local)
+This is the base class mixed into the deployment specific classes (i.e. kestrel, local)

:author: Noel Merket
:copyright: (c) 2018 by The Alliance for Sustainable Energy
2 changes: 1 addition & 1 deletion buildstockbatch/eagle.sh
@@ -12,4 +12,4 @@ df -h
module load conda singularity-container
source activate "$MY_CONDA_ENV"

-time python -u -m buildstockbatch.eagle "$PROJECTFILE"
+time python -u -m buildstockbatch.hpc eagle "$PROJECTFILE"
2 changes: 1 addition & 1 deletion buildstockbatch/eagle_postprocessing.sh
@@ -29,4 +29,4 @@ pdsh -w $SLURM_JOB_NODELIST_PACK_GROUP_1 "df -i; df -h"
$MY_CONDA_ENV/bin/dask scheduler --scheduler-file $SCHEDULER_FILE &> $OUT_DIR/dask_scheduler.out &
pdsh -w $SLURM_JOB_NODELIST_PACK_GROUP_1 "$MY_CONDA_ENV/bin/dask worker --scheduler-file $SCHEDULER_FILE --local-directory /tmp/scratch/dask --nworkers ${NPROCS} --nthreads 1 --memory-limit ${MEMORY}MB" &> $OUT_DIR/dask_workers.out &

-time python -u -m buildstockbatch.eagle "$PROJECTFILE"
+time python -u -m buildstockbatch.hpc eagle "$PROJECTFILE"
324 changes: 205 additions & 119 deletions buildstockbatch/eagle.py → buildstockbatch/hpc.py
Member Author:
Renamed this file from eagle.py ➡️ hpc.py.

Large diffs are not rendered by default.

16 changes: 16 additions & 0 deletions buildstockbatch/kestrel.sh
@@ -0,0 +1,16 @@
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --tmp=1000000
Member Author:
This line tells slurm to give us a node with /tmp/scratch.


echo "Job ID: $SLURM_JOB_ID"
echo "Hostname: $HOSTNAME"
echo "QOS: $SLURM_JOB_QOS"

df -i
df -h

module load python apptainer
source "$MY_PYTHON_ENV/bin/activate"
Comment on lines +13 to +14

Member Author:
You'll notice I abandoned conda as our Python package and environment manager. There was too much trouble between it and pip when installing buildstockbatch. I opted to go with the system-installed Python (3.11) and use a venv.
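A minimal sketch of the resulting workflow, assuming a venv under /scratch (the paths and environment name are placeholders):

# Sketch only; this replaces the old `module load conda singularity-container`
# and `source activate "$MY_CONDA_ENV"` steps from the eagle scripts.
module load python apptainer                   # system Python 3.11 plus Apptainer
python -m venv /scratch/$USER/envs/mybsb       # one-time environment creation
source /scratch/$USER/envs/mybsb/bin/activate  # activate before running buildstockbatch
pip install .                                  # from a buildstockbatch checkout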

Contributor:
I'm trying to figure out whether we actually still used Ruby natively outside of the container, but it looks like we don't...


time python -u -m buildstockbatch.hpc kestrel "$PROJECTFILE"
34 changes: 34 additions & 0 deletions buildstockbatch/kestrel_postprocessing.sh
@@ -0,0 +1,34 @@
#!/bin/bash
#SBATCH --tmp=1000000
Member Author:
This line tells slurm to give us nodes with /tmp/scratch. Yes, we're using them in postprocessing: Dask likes to dump extra data to disk, and we tell it to go there.


echo "begin kestrel_postprocessing.sh"

echo "Job ID: $SLURM_JOB_ID"
echo "Hostname: $HOSTNAME"

df -i
df -h

module load python apptainer
source "$MY_PYTHON_ENV/bin/activate"

export POSTPROCESS=1

echo "UPLOADONLY: ${UPLOADONLY}"
echo "MEMORY: ${MEMORY}"
echo "NPROCS: ${NPROCS}"

SCHEDULER_FILE=$OUT_DIR/dask_scheduler.json

echo "head node"
echo $SLURM_JOB_NODELIST_PACK_GROUP_0
echo "workers"
echo $SLURM_JOB_NODELIST_PACK_GROUP_1

pdsh -w $SLURM_JOB_NODELIST_PACK_GROUP_1 "free -h"
pdsh -w $SLURM_JOB_NODELIST_PACK_GROUP_1 "df -i; df -h"

$MY_PYTHON_ENV/bin/dask scheduler --scheduler-file $SCHEDULER_FILE &> $OUT_DIR/dask_scheduler.out &
pdsh -w $SLURM_JOB_NODELIST_PACK_GROUP_1 "$MY_PYTHON_ENV/bin/dask worker --scheduler-file $SCHEDULER_FILE --local-directory /tmp/scratch/dask --nworkers ${NPROCS} --nthreads 1 --memory-limit ${MEMORY}MB" &> $OUT_DIR/dask_workers.out &
Member Author:
This wasn't working for me when the python environment was on /kfs2/shared-projects/envs. Some permissions there are messed up; it seemed like the groups weren't being passed down to the compute nodes. Supposedly they're working on it. I recommend creating your virtualenv on /scratch or /projects for testing.


time python -u -m buildstockbatch.hpc kestrel "$PROJECTFILE"
2 changes: 1 addition & 1 deletion buildstockbatch/sampler/base.py
@@ -40,7 +40,7 @@ def __init__(self, parent):
"""
Create the buildstock.csv file required for batch simulations using this class.

-Multiple sampling methods are available to support local & eagle analyses, as well as to support multiple\
+Multiple sampling methods are available to support local & hpc analyses, as well as to support multiple\
sampling strategies. Currently there are separate implementations for commercial & residential stock types\
due to unique requirements created by the commercial tsv set.

3 changes: 2 additions & 1 deletion buildstockbatch/schemas/v0.3.yaml
@@ -6,6 +6,7 @@ weather_files_url: str(required=False)
sampler: include('sampler-spec', required=True)
workflow_generator: include('workflow-generator-spec', required=True)
eagle: include('hpc-spec', required=False)
+kestrel: include('hpc-spec', required=False)
Member Author:
Just add a kestrel key, structured like your existing eagle key, to your project file. Adjust the number of jobs, file locations, and so on. It's all the same structure and format.
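A hypothetical sketch of what that block might look like (the key names are assumptions based on a typical eagle block and the hpc-spec schema in this file; all values are placeholders):

kestrel:
  account: my_allocation   # assumed key: HPC allocation to charge
  n_jobs: 50               # assumed key: number of job array tasks
  minutes_per_sim: 5       # assumed key: per-simulation walltime
  postprocessing:          # fields from hpc-postprocessing-spec in this schema
    time: 120
    n_workers: 2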

aws: include('aws-spec', required=False)
output_directory: regex('^(.*\/)?[a-z][a-z0-9_]*\/?$', required=True)
sys_image_dir: str(required=False)
@@ -48,7 +49,7 @@ hpc-spec:
hpc-postprocessing-spec:
time: int(required=True)
n_workers: int(min=1, max=32, required=False)
-node_memory_mb: enum(85248, 180224, 751616, required=False)
+node_memory_mb: int(min=85248, max=751616, required=False)
n_procs: int(min=1, max=36, required=False)
parquet_memory_mb: int(min=100, max=4096, required=False)

12 changes: 6 additions & 6 deletions buildstockbatch/test/test_eagle.py
@@ -8,14 +8,14 @@
from unittest.mock import patch
import gzip

-from buildstockbatch.eagle import user_cli, EagleBatch
+from buildstockbatch.hpc import user_cli, EagleBatch
from buildstockbatch.base import BuildStockBatchBase
from buildstockbatch.utils import get_project_configuration, read_csv

here = os.path.dirname(os.path.abspath(__file__))


@patch("buildstockbatch.eagle.subprocess")
@patch("buildstockbatch.hpc.subprocess")
def test_hpc_run_building(mock_subprocess, monkeypatch, basic_residential_project_file):
tar_filename = (
pathlib.Path(__file__).resolve().parent / "test_results" / "simulation_output" / "simulations_job0.tar.gz"
@@ -93,9 +93,9 @@ def test_hpc_run_building(mock_subprocess, monkeypatch, basic_residential_projec


@patch("buildstockbatch.base.BuildStockBatchBase.validate_options_lookup")
@patch("buildstockbatch.eagle.EagleBatch.validate_output_directory_eagle")
@patch("buildstockbatch.eagle.EagleBatch.validate_singularity_image_eagle")
@patch("buildstockbatch.eagle.subprocess")
@patch("buildstockbatch.hpc.EagleBatch.validate_output_directory_eagle")
@patch("buildstockbatch.hpc.EagleBatch.validate_singularity_image_eagle")
@patch("buildstockbatch.hpc.subprocess")
def test_user_cli(
mock_subprocess,
mock_validate_singularity_image_eagle,
@@ -167,7 +167,7 @@ def test_user_cli(
assert "0" == mock_subprocess.run.call_args[1]["env"]["MEASURESONLY"]


@patch("buildstockbatch.eagle.subprocess")
@patch("buildstockbatch.hpc.subprocess")
def test_qos_high_job_submit(mock_subprocess, basic_residential_project_file, monkeypatch):
mock_subprocess.run.return_value.stdout = "Submitted batch job 1\n"
mock_subprocess.PIPE = None
4 changes: 2 additions & 2 deletions buildstockbatch/test/test_validation.py
@@ -17,7 +17,7 @@
import tempfile
import json
import pathlib
-from buildstockbatch.eagle import EagleBatch
+from buildstockbatch.hpc import EagleBatch
from buildstockbatch.local import LocalBatch
from buildstockbatch.base import BuildStockBatchBase, ValidationError
from buildstockbatch.test.shared_testing_stuff import (
@@ -379,7 +379,7 @@ def test_validate_singularity_image_eagle(mocker, basic_residential_project_file
with open(temp_yml, "w") as f:
yaml.dump(cfg, f, Dumper=yaml.SafeDumper)
with pytest.raises(ValidationError, match=r"image does not exist"):
-EagleBatch.validate_singularity_image_eagle(str(temp_yml))
+EagleBatch.validate_singularity_image_hpc(str(temp_yml))


def test_validate_sampler_good_buildstock(basic_residential_project_file):
36 changes: 36 additions & 0 deletions create_kestrel_env.sh
Member Author:
To install, it defaults to /shared-projects, so you'll want to override that. Also, we're using python venv for environments now instead of conda, so activating is a little different. For example:

module load git # yes, really
git clone git@github.com:NREL/buildstockbatch.git
cd buildstockbatch
git checkout kestrel
mkdir -p /scratch/$USER/envs
./create_kestrel_env.sh -e /scratch/$USER/envs -d mybsb
source /scratch/$USER/envs/mybsb/bin/activate
buildstock_kestrel path/to/project_file.yml

@@ -0,0 +1,36 @@
#!/bin/bash

DEV=0
while getopts de: option  # -d: dev install (pip -e ".[dev]"); -e DIR: parent directory for the venv
do
case "${option}"
in
d) DEV=1;;
e) PYTHON_ENVS_DIR=${OPTARG};;
esac
done

if [ -z "$PYTHON_ENVS_DIR" ]
then
PYTHON_ENVS_DIR=/kfs2/shared-projects/buildstock/envs
fi

MY_PYTHON_ENV_NAME=${@:$OPTIND:1}  # first positional argument after the flags is the environment name
if [ -z "$MY_PYTHON_ENV_NAME" ]
then
echo "Environment name not provided"
exit 1
fi

MY_PYTHON_ENV="$PYTHON_ENVS_DIR/$MY_PYTHON_ENV_NAME"
echo "Creating $MY_PYTHON_ENV"
module load python
python -m venv --clear --upgrade-deps --prompt "$MY_PYTHON_ENV_NAME" "$MY_PYTHON_ENV"
source "$MY_PYTHON_ENV/bin/activate"
which pip
if [ $DEV -eq 1 ]
then
pip install --no-cache-dir -e ".[dev]"
else
pip install --no-cache-dir .
fi
3 changes: 2 additions & 1 deletion setup.py
@@ -65,7 +65,8 @@
entry_points={
"console_scripts": [
"buildstock_local=buildstockbatch.local:main",
"buildstock_eagle=buildstockbatch.eagle:user_cli",
"buildstock_eagle=buildstockbatch.hpc:eagle_cli",
"buildstock_kestrel=buildstockbatch.hpc:kestrel_cli",
Comment on lines +68 to +69

Member Author:
Adding a separate buildstock_kestrel CLI.

"buildstock_aws=buildstockbatch.aws.aws:main",
]
},