28 create simplest running azure batch kick off script #30

natemcintosh · 2024-03-18T21:41:36Z

Initial take on this, based on discussion here.

Note that this skeleton calls some functions that do not yet exist, so pre-commit will fail on the ruff check.

This tests that calls are made to `client.add_task()`. Still need to add tests that check more specifically what is called, but this is a good start

natemcintosh · 2024-03-19T17:31:05Z

Still to do:

- Determine a pool name, e.g. multisignal-epi-inference
- Provide actual implementations of
  - create_docker_cmd()
  - create_mdl_cfg_filename()
  - create_pp_cfg_filename()

The hard part is figuring out exactly how to name the model configuration files, and the post production configuration files. I think that part will depend on the geographies that the models are running on, and the dates. Once that part is better defined, I think this will be easy to do.

ghost · 2024-03-22T13:34:12Z

Hey @natemcintosh, what's the plan for this? Do you need anything from me?

natemcintosh · 2024-03-22T14:05:15Z

Hi George, I've mostly been working on setting up the gold tables for the NSSP pull.

The list of tasks above is pretty much what still needs to be done. If you have any input on the best way to name the configuration files (they each need to have a unique name for when we upload to Azure blob storage), that would be helpful.

The other major improvement that could be made is to the testing. Right now `test_local_e2e()` only tests that at least one call to `client.add_task()` is made. It would be better if we could verify that the correct number of calls is made, with the correct arguments, and that the other client methods are called correctly as well.
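A minimal sketch of that stricter check, using `unittest.mock` from the standard library. The `submit_tasks()` function and the `job_id` / `docker_cmd` keyword names are hypothetical stand-ins for illustration; the real signature of `client.add_task()` in cfa-azure may differ.

```python
from unittest.mock import MagicMock, call


def submit_tasks(client, job_id, docker_cmds):
    """Hypothetical stand-in for the submit script: one task per command."""
    for cmd in docker_cmds:
        # Keyword names are assumptions, not the actual cfa-azure API.
        client.add_task(job_id=job_id, docker_cmd=cmd)


# A MagicMock records every call made on it, so no Azure access is needed.
client = MagicMock()
cmds = ["run model a", "run model b"]
submit_tasks(client, "job-1", cmds)

# Check the exact number of calls, not just "at least one".
assert client.add_task.call_count == len(cmds)

# Check the exact arguments, in order.
expected_calls = [call(job_id="job-1", docker_cmd=c) for c in cmds]
assert client.add_task.call_args_list == expected_calls
```

The same `call_args_list` comparison could be applied to any other client method the script is expected to call.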

ghost · 2024-03-22T14:11:12Z

I'll think about names and start planning on an e2e example

ghost · 2024-03-22T17:21:09Z

Naming

For naming, I think we need to consider the following:

- We already have version control, so the name does not need a version component (only a date, maybe).
- I like the pattern where the sub-files follow the name of the main file and add a suffix, e.g., flu-and-covid-winter2025.json -> (flu-and-covid-winter2025-all-us.json, flu-and-covid-winter2025-state1-and-state2.json, etc.)
- I think we should have a way to sort files based on creation date, e.g., 00-flu-and-covid-winter2025, 01-flu-winter2025, 02-flu-and-covid-winter2025-pooled.

Idea: Name files using both the date and a descriptive handle, e.g., YYYY-mm-dd_[handle]+.json for the main file, and YYYY-mm-dd_[handle].[sub handle].json for the sub-files; [handle] and [sub handle] should be [:alnum:]+ so it's easier to parse files.
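To illustrate why alphanumeric handles make parsing easy, here is a rough sketch of building and parsing such names. The function names `make_name()` and `parse_name()` are made up for this example, and it assumes handles contain only ASCII letters and digits (so a hyphenated handle like flu-and-covid-winter2025 would be rejected).

```python
import re
from datetime import date

# Assumed pattern: YYYY-mm-dd_handle.json or YYYY-mm-dd_handle.subhandle.json,
# with handle and sub handle restricted to ASCII letters and digits.
NAME_RE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2})_(?P<handle>[A-Za-z0-9]+)"
    r"(?:\.(?P<sub>[A-Za-z0-9]+))?\.json"
)


def make_name(run_date, handle, sub_handle=None):
    """Build a file name; raise ValueError on a non-alphanumeric handle."""
    parts = [handle] if sub_handle is None else [handle, sub_handle]
    for part in parts:
        if not re.fullmatch(r"[A-Za-z0-9]+", part):
            raise ValueError(f"handle must be alphanumeric: {part!r}")
    return f"{run_date.isoformat()}_{'.'.join(parts)}.json"


def parse_name(name):
    """Split a file name into (date, handle, sub handle or None)."""
    match = NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a valid config file name: {name!r}")
    return date.fromisoformat(match["date"]), match["handle"], match["sub"]


main_name = make_name(date(2025, 1, 15), "flucovid")
sub_name = make_name(date(2025, 1, 15), "flucovid", "allus")
parsed = parse_name(sub_name)
```

Because the date, handle, and sub handle are separated by characters that cannot appear inside a handle, one regular expression is enough to recover all three.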

Another thought, since the number of files will be large, perhaps each project should be within a folder named YYYY-mm-dd_[handle], and under the hood, have:

YYYY-mm-dd_[handle 1]/
  - `00-config.json`
  - `01-log-[sub handle 1].log`
  - `01-log-[sub handle 2].log`
  - ...
  - `02-model-[sub handle 1].json`
  - `02-model-[sub handle 2].json`
  - ...
  - `03-post-[sub handle 1].json`
  - `03-post-[sub handle 2].json`
  - ...
YYYY-mm-dd_[handle 2]/
  - `00-config.json`
  - `01-log-[sub handle 1].log`
  - `01-log-[sub handle 2].log`
  - ...
  - `02-model-[sub handle 1].json`
  - `02-model-[sub handle 2].json`
  - ...
  - `03-post-[sub handle 1].json`
  - `03-post-[sub handle 2].json`
  - ...
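A small sketch of how the file list for one project folder could be generated from a date, a handle, and a list of sub handles. The `run_layout()` helper and the example handles are hypothetical; only the numeric prefixes and extensions come from the layout above.

```python
from pathlib import PurePosixPath


def run_layout(run_date, handle, sub_handles):
    """Return the relative paths expected under one YYYY-mm-dd_[handle] folder."""
    root = PurePosixPath(f"{run_date}_{handle}")
    paths = [root / "00-config.json"]
    # One log, one model config, and one post-production config per sub handle.
    for prefix, ext in (("01-log", "log"), ("02-model", "json"), ("03-post", "json")):
        paths += [root / f"{prefix}-{sub}.{ext}" for sub in sub_handles]
    return [str(p) for p in paths]


layout = run_layout("2025-01-15", "flucovid", ["allus", "ny"])
```

The numeric prefixes keep the files in a stable order when the folder is listed alphabetically.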

`test_local_e2e()`

It could make sure that the job, whatever it is, saves the right information (or a message) to a logger, which could be the 01-log-[sub handle].log.
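Roughly what I have in mind, as a sketch using only the standard library. `run_job()` is a hypothetical placeholder that does no real work and only logs; the log message text is made up.

```python
import logging
import tempfile
from pathlib import Path


def run_job(sub_handle, log_dir):
    """Hypothetical job: does no real work, only logs that it ran."""
    log_path = Path(log_dir) / f"01-log-{sub_handle}.log"
    logger = logging.getLogger(f"job.{sub_handle}")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_path)
    logger.addHandler(handler)
    try:
        logger.info("finished %s", sub_handle)
    finally:
        # Detach and close the handler so the message is flushed to disk.
        logger.removeHandler(handler)
        handler.close()
    return log_path


# The test runs the job in a temporary folder and reads the log back.
with tempfile.TemporaryDirectory() as tmp:
    log_path = run_job("allus", tmp)
    log_text = log_path.read_text()

assert "finished allus" in log_text
```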

natemcintosh · 2024-04-18T20:30:53Z

@gvegayon I like the idea of having a top-level folder with a date and "run" name, e.g. YYYY-mm-dd_flu_and_covid, with the various files associated with this run living under it. Perhaps the run name could be a third top-level field in the large input config file, along with model and post_production.
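As a rough sketch of what I mean, assuming the config has already been loaded into a dict. The field name `run_name` and the helper `run_folder()` are hypothetical; `model` and `post_production` are the existing top-level fields.

```python
from datetime import date

# "run_name" is an assumed field name, not something the config has today.
REQUIRED_FIELDS = ("run_name", "model", "post_production")


def run_folder(config, run_date):
    """Return the run folder name, or raise KeyError if a field is missing."""
    missing = [field for field in REQUIRED_FIELDS if field not in config]
    if missing:
        raise KeyError(f"config is missing top-level fields: {missing}")
    return f"{run_date.isoformat()}_{config['run_name']}"


config = {"run_name": "flu_and_covid", "model": {}, "post_production": {}}
folder = run_folder(config, date(2024, 4, 18))
```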

Something we have been doing at the top level of the Rt pipeline is using two storage containers: /input/ and /output/. If we continued this practice, there would be a YYYY-mm-dd_flu_and_covid in both the input and output storage containers. I believe cfa-azure currently supports this structure.

However, if we think it would be more advantageous to store both input and output under the same run folder (which I think is what you are suggesting above?), it would certainly make it easier to see both the inputs and outputs for a model run in one place. I'm not 100% sure that cfa-azure currently supports bind mounting to a single storage container, but I'm sure it won't be too hard to figure out. If we do it this way, perhaps we could have a structure like

YYYY-mm-dd_[handle 1]/
  - input/
    - `00-config.json`
    - `01-model-[sub handle 1].json`
    - `01-model-[sub handle 2].json`
    - `02-post-[sub handle 1].json`
    - `02-post-[sub handle 2].json`
    - ...
  - output/
    - `log-[sub handle 1].log`
    - `log-[sub handle 2].log`
    - ...
YYYY-mm-dd_[handle 2]/
  - input/
    - `00-config.json`
    - `01-model-[sub handle 1].json`
    - `01-model-[sub handle 2].json`
    - `02-post-[sub handle 1].json`
    - `02-post-[sub handle 2].json`
    - ...
  - output/
    - `log-[sub handle 1].log`
    - `log-[sub handle 2].log`
    - ...

What do you think?

codecov · 2024-05-20T16:33:09Z

Codecov Report

Attention: Patch coverage is 84.12698% with 20 lines in your changes are missing coverage. Please review.

Project coverage is 90.89%. Comparing base (45deb0d) to head (b09fd5c).

| Files | Patch % | Lines |
| --- | --- | --- |
| pipeline/pipeline/submit_main.py | 76.47% | 20 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #30      +/-   ##
==========================================
- Coverage   92.24%   90.89%   -1.35%     
==========================================
  Files          34       34              
  Lines         683      802     +119     
==========================================
+ Hits          630      729      +99     
- Misses         53       73      +20

Flag	Coverage Δ
unittests	`90.89% <84.12%> (-1.35%)`	⬇️

natemcintosh and others added 15 commits March 14, 2024 22:05

initial makefile

a33e43d

updated pyproject

7ca179d

Beginnings of main

9efb45e

give better name

75bb6a9

set up logging and arg parsing

8cbaba2

skeleton of submit script

3eaf2e9

Note that this skeleton calls some function which do not yet exist, so pre-commit will fail on ruff check

do a better job of handling edge cases

072ac45

add test make command

022182d

fix bug in catching missing fields

ac9966b

break up main() into smaller functions

e66ab62

first attempt at testing

49e5fb3

make pre-commit happy

6aa1fc4

add emtpy toml fixture

532be11

no longer need placeholders

37dfe84

Basic working local e2e test

a94a3c6

This tests that calls are made to `client.add_task()`. Still need to add tests that check more specifically what is called, but this is a good start

natemcintosh linked an issue Mar 18, 2024 that may be closed by this pull request

Create simplest, running, Azure Batch kick-off script #28

Open

9 tasks

Add proper pool name

11006de

natemcintosh added 3 commits May 17, 2024 10:36

update to latest cfa-azure

c6afe1c

Merge branch 'main' into 28-create-simplest-running-azure-batch-kick-…

abaaa69

…off-script

check that all expected calls come in

5027291

make precommit happy

b09fd5c
