New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved integration tests #575
Conversation
now config.yaml location can be defined with custom value (env var) integration tests use a new /tmp/ folder storage with every run After every command db is backed up to /tmp/%dir%/snapshots/ for easy debug Every command is printed before execution
MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅ |
cli/cli_chestxray_tutorial_test.sh
Outdated
echo "==========================================" | ||
echo "Creating test profiles for each user" | ||
echo "==========================================" | ||
medperf profile activate local | ||
ev medperf profile activate local |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ev
is a custom function defined in tests_setup: it echoes command before execution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we have a more descriptive name for this? Or an easy way to identify where this function is defined and what is it used for? I can see myself getting confused by this in the future. Maybe eval
or print_eval
. Might be longer but I'd prefer readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we have a more descriptive name for this? Maybe
eval
orprint_eval
. Might be longer but I'd prefer readability.
Sure:) IMO ev
is more readable though. I pushed the fix (renamed ev
to print_eval
). Plz look and let me know - if it is better for you, let's keep it longish.
cli/cli_chestxray_tutorial_test.sh
Outdated
echo "We have to return the content back manually, so new tmp folder is empty" | ||
echo "and all the existing data is kept on the old place." | ||
mkdir -p ~/.medperf | ||
ev mv -f $MEDPERF_STORAGE/.medperf/* ~/.medperf/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a crutch: without it
(1) new tmp storage is spoiled with existing data
(2) it might be deleted after test run, so deleting any files you might want to keep
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the intention is to keep test files in case of debugging or similar, I'd prefer to instead specify the location to store them as an argument instead of merging it with the default medperf storage. This is because there's a chance the test files would collide with other files someone might be working on (e.g. I have to do a demonstration of medperf and I'm using the local profile and server for that).
Alternatively, have a flag that disables deleting the tmp folder, and not letting tests run if that folder isn't empty.
I personally prefer tests artifacts to be isolated from the real medperf storage/server if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Disregard my message above, I misunderstood this section given the snippet of code commented and the accompanying message. This is only executed when moving the storage failed for some reason right? And it's just to restore the moved files to their original locations. Correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No:)
medperf storage move
makes two essential actions:
- updates all paths in config to point to a new folder
- actually moves the content of
~/.medperf/
to/tmp/medperf_tests_XXXYYY/storage/.medperf/
While (1) is what we want for tests (remember, we have a separate temporary config file while running tests), (2) is eww. Because really we want to work with a new clean empty storage folder while running tests (and, and, in particular, to know it's can be safely deleted at the end).
So, this crutch reverts (2) and moves storage content back to ~/.medperf/
.
medperf profile activate default | ||
medperf profile delete testbenchmark | ||
medperf profile delete testmodel | ||
medperf profile delete testdata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to fix list of profiles as whole test root would be deleted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Slava, this is looking great!
I just left a few suggestions and questions. Other than that I think it's looking good!
cli/cli_chestxray_tutorial_test.sh
Outdated
echo "==========================================" | ||
echo "Creating test profiles for each user" | ||
echo "==========================================" | ||
medperf profile activate local | ||
ev medperf profile activate local |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we have a more descriptive name for this? Or an easy way to identify where this function is defined and what is it used for? I can see myself getting confused by this in the future. Maybe eval
or print_eval
. Might be longer but I'd prefer readability.
@@ -44,7 +45,7 @@ | |||
# Storage config | |||
config_storage = Path.home().resolve() / ".medperf_config" | |||
logs_storage = Path.home().resolve() / ".medperf_logs" | |||
config_path = str(config_storage / "config.yaml") | |||
config_path = getenv("MEDPERF_CONFIG_PATH", str(config_storage / "config.yaml")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Defining this env var is now required for all medperf installations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, if env is not defined, default value is used (str(config_storage / "config.yaml")
). getenv
works similar to dict.get()
method
ev() { | ||
local timestamp=$(date +%m%d%H%M%S) | ||
local formatted_cmd=$(echo "$@" | sed 's/[^a-zA-Z0-9]\+/_/g' | cut -c 1-50) | ||
LAST_SNAPSHOT_PATH="$SNAPSHOTS_FOLDER/${timestamp}_${formatted_cmd}.sqlite3" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we end up clutteraing this folder indefinitely if we're not cleaning it consistently? Would there be a way of defining a max amount of snapshots that can exist at a given time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not exactly
tmp/
|-- medperf_tests_20240409140527/
|--|-- snapshots/ # here all the snapshots for that run are stored
|--|--|--....
|--|-- storage/
|--|--|-- .medperf/
|--|--|--|-- ....
|--|-- config.yaml
So, this snapshots folder belongs to a specific tests run, not mixing with snapshots from other runs. No need to manage especially snapshots. If you don't need this test results anymore (or you use -c
/CLEAN
flag), you delete a whole medperf_tests_XXXYYY
.
However by default -c
is set up to False
. So, we are cluttering /tmp/
folder with a bunch of medperf_tests_XXXYYY
.
What do you think guys if we ultimately remove test root folder if tests were passed successfully? And oldish crap after failed tests we can remove manually.
Or I can implement some kind of logic "on every run delete medperf_tests_XXXYYY
that are older than a month", though it looks like overengineering to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@VukW man, that's great! this makes lives a lot easier :)
one small suggestion is to move the Changing storage to tmp location
bash code into setup since it is common, but that's not necessary!
Thanks for this!
Hi @hasan7n thanks, you are right, moved config&storage init to Also updated "storage moving": instead of real moving files back and forth (that may be error-prone if broken on the mid) we explicitly point config pathes to a newly created empty tmp folder. That is safer IMO |
@hasan7n fixed, ready for re-review:) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great man!
NB: despite of it simplifies tests rerun, one still need to reset db storage manually before tests run:
./reset_db.sh && python seed.py