Fix/mlnet rebased on master #527

Draft · wants to merge 8 commits into master
Conversation

PGijsbers (Collaborator)

Rebase #522 on top of master. I did not want to force-push a rebase onto someone else's branch, so I set up a new one instead.

@PGijsbers (Collaborator, Author)

@LittleLittleCloud I rebased your work on the latest master, which resulted in clearer error messages.

The error I got was about an incorrect invocation of mlnet: it was not expecting the --test-dataset parameter. I don't understand why it complains about --test-dataset, though; according to the documentation it seems like a valid argument.

However, I also have a question about the usage of --test-dataset here. It reads to me like --test-dataset is actually used to perform model selection (e.g., train a number of models on --dataset, then select the best based on their results on --test-dataset). Is that correct? If so, that is not in line with the benchmark design: model selection should happen without any knowledge of the test dataset. I think the correct usage is to invoke MLNet with --dataset only, so that it performs model selection on an internally chosen validation set (e.g., k-fold cross-validation), and to provide the test dataset only to the mlnet predict command (see the sketch below). I applied this "fix" on this branch. Please let me know if I understand the usage correctly and whether this seems like the right (new) invocation.
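For concreteness, a minimal sketch of that two-step invocation, using only the mlnet flags that appear in the logs below; the paths are hypothetical placeholders, and this is illustrative rather than the actual code in frameworks/MLNet/exec.py:

```python
import subprocess

# Hypothetical paths for illustration; amlb supplies the real ones at runtime.
MLNET = "/repo/frameworks/MLNet/lib/mlnet"
TRAIN_CSV = "/data/dataset_train_0.csv"
TEST_CSV = "/data/dataset_test_0.csv"
OUTPUT_DIR = "/tmp/output"

# Step 1: train on the training split only. mlnet performs model selection
# internally (on its own validation data); the test split is never passed here.
subprocess.run(
    [MLNET, "classification",
     "--dataset", TRAIN_CSV,
     "--train-time", "600",
     "--label-col", "0",
     "--output", OUTPUT_DIR,
     "--name", "0"],
    check=True,
)

# Step 2: the test split is used only to generate predictions from the
# already-selected model.
with open(f"{OUTPUT_DIR}/0/prediction.txt", "w") as predictions:
    subprocess.run(
        [MLNET, "predict",
         "--task-type", "classification",
         "--model", f"{OUTPUT_DIR}/0/0.zip",
         "--dataset", TEST_CSV,
         "--label-col", "class"],
        check=True,
        stdout=predictions,
    )
```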

Anyway, even after the change there still seem to be failures. On GitHub I get error code 143 (likely memory); on AWS I get the following:

[INFO] [openml.datasets.dataset:15:59:47.186] pickle write APSFailure
[INFO] [frameworks.MLNet.exec:15:59:54.601] train dataset: /s3bucket/input/org/openml/www/datasets/41138/dataset_train_0.csv
[INFO] [frameworks.MLNet.exec:15:59:54.601] test dataset: /s3bucket/input/org/openml/www/datasets/41138/dataset_test_0.csv
[INFO] [amlb.utils.process:15:59:54.601] Running cmd `/repo/frameworks/MLNet/lib/mlnet classification --dataset /s3bucket/input/org/openml/www/datasets/41138/dataset_train_0.csv --train-time 600 --label-col 0 --output /tmp/tmp3_b7uplj --name 0 --verbosity diag --log-file-path /tmp/tmp3_b7uplj/0/log.txt`
[INFO] [amlb.print:15:59:54.972] Set log level to Trace
[INFO] [amlb.print:15:59:54.972] Set log file path to /tmp/tmp3_b7uplj/0/log.txt
[INFO] [amlb.print:15:59:59.934] Start Training
[INFO] [amlb.print:16:00:00.019] start multiclass classification
[INFO] [amlb.print:16:00:00.021] Evaluate Metric: MacroAccuracy
[INFO] [amlb.print:16:00:00.021] Available Trainers: LGBM,FASTFOREST,FASTTREE,LBFGS,SDCA
[INFO] [amlb.print:16:00:00.023] Training time in second: 600
[INFO] [amlb.print:16:00:12.420] |      Trainer                             MacroAccuracy Duration    |
[INFO] [amlb.print:16:00:12.420] |--------------------------------------------------------------------|
[INFO] [amlb.print:16:00:12.422] |0     FastTreeOva                         0.6877     12.1490        |
[INFO] [amlb.print:16:00:12.430] found best trial - trial id: 0
[INFO] [amlb.print:16:00:28.029] |1     FastForestOva                       0.8392     15.5940        |
[INFO] [amlb.print:16:00:28.032] found best trial - trial id: 1
[INFO] [amlb.print:16:00:58.398] |2     LbfgsMaximumEntropyMulti            0.8336     30.3630        |
... 
<manually removed remainder of training output> 
...
[INFO] [amlb.print:16:09:59.936] [Source=AutoMLExperiment, Kind=Info] cancel training because cancellation token is invoked...
[INFO] [amlb.print:16:09:59.941] |--------------------------------------------------------------------|
[INFO] [amlb.print:16:09:59.943] |                          Experiment Results                        |
[INFO] [amlb.print:16:09:59.943] |--------------------------------------------------------------------|
[INFO] [amlb.print:16:09:59.946] |                               Summary                              |
[INFO] [amlb.print:16:09:59.946] |--------------------------------------------------------------------|
[INFO] [amlb.print:16:09:59.949] |ML Task: multiclass classification                                  |
[INFO] [amlb.print:16:09:59.949] |Dataset: /s3bucket/input/org/openml/www/datasets/41138/dataset_train_0.csv|
[INFO] [amlb.print:16:09:59.951] |Label : class                                                       |
[INFO] [amlb.print:16:09:59.954] |Total experiment time :   599.0000 Secs                             |
[INFO] [amlb.print:16:09:59.954] |Total number of models explored: 19                                 |
[INFO] [amlb.print:16:09:59.958] |--------------------------------------------------------------------|
[INFO] [amlb.print:16:09:59.958] |                        Top 5 models explored                       |
[INFO] [amlb.print:16:09:59.961] |--------------------------------------------------------------------|
[INFO] [amlb.print:16:09:59.961] |      Trainer                             MacroAccuracy Duration    |
[INFO] [amlb.print:16:09:59.963] |--------------------------------------------------------------------|
[INFO] [amlb.print:16:09:59.965] |12    FastTreeOva                         0.8784     22.6530        |
[INFO] [amlb.print:16:09:59.966] |9     FastTreeOva                         0.8504     14.7040        |
[INFO] [amlb.print:16:09:59.968] |5     LbfgsLogisticRegressionOva          0.8413     58.8520        |
[INFO] [amlb.print:16:09:59.968] |17    FastForestOva                       0.8394     16.8820        |
[INFO] [amlb.print:16:09:59.971] |1     FastForestOva                       0.8392     15.5940        |
[INFO] [amlb.print:16:09:59.971] |--------------------------------------------------------------------|
[INFO] [amlb.print:16:10:00.080] save 0.mbconfig to /tmp/tmp3_b7uplj/0
[INFO] [amlb.print:16:10:01.453] Generating a console project for the best pipeline at location : /tmp/tmp3_b7uplj/0
[INFO] [amlb.print:16:10:01.474] 
[INFO] [amlb.print:16:10:01.474] 
[INFO] [amlb.print:16:10:01.475] 
[INFO] [amlb.utils.process:16:10:01.476] Running cmd `/repo/frameworks/MLNet/lib/mlnet predict --task-type classification --model /tmp/tmp3_b7uplj/0/0.zip --dataset /s3bucket/input/org/openml/www/datasets/41138/dataset_test_0.csv --label-col class > /tmp/tmp3_b7uplj/0/prediction.txt`
[ERROR] [amlb.utils.process:16:10:01.849] File does not exist: /tmp/tmp3_b7uplj/0/0.zip
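A small fail-fast guard between the train and predict steps would at least make this failure mode explicit. A minimal sketch, assuming the expected model path from the log above; the guard is hypothetical, not part of the current integration script:

```python
import os
import sys

# Hypothetical check, for illustration: run between `mlnet classification`
# and `mlnet predict` so a missing model fails with a clear message.
model_path = "/tmp/tmp3_b7uplj/0/0.zip"  # expected path from the log above

if not os.path.exists(model_path):
    sys.exit(f"mlnet training finished but no model was saved at {model_path}")
```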

P.S. Is there a way to influence the inference time of the final models? E.g., a constraint or preset that favors faster models, or having inference time explicitly as a secondary objective? If we do get the opportunity to benchmark MLNet, we might be interested in evaluating the system under different trade-offs.

@LittleLittleCloud (Contributor) commented May 31, 2023 via email

@PGijsbers (Collaborator, Author)

> The error looks like it is because the model file is not found? Is there a way for me to confirm the model file is generated and located under /tmp/tmp3_b7uplj/0/0.zip?

You might be able to test the integration script locally (I currently cannot) and poke around. From my understanding, it should also be possible to log in to the EC2 instance to verify the existence of the file, but unfortunately I'm too busy to do that myself for the next few days.
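For whoever can reproduce this locally: one way to confirm whether the model file is generated (and where) is to dump the whole mlnet output tree after training. A sketch, assuming the /tmp output path from the log above:

```python
from pathlib import Path

# Hypothetical debugging snippet: list everything mlnet wrote under its
# output directory, to see whether 0.zip exists or ended up elsewhere.
out_dir = Path("/tmp/tmp3_b7uplj")  # output path from the log above
for path in sorted(out_dir.rglob("*")):
    size = path.stat().st_size if path.is_file() else "<dir>"
    print(path.relative_to(out_dir), size)
```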

@PGijsbers added the `external` (for issues which will be fixed by an external party) and `framework` (for issues with frameworks in the current benchmark) labels on Jun 9, 2023.