Trainer refactoring #66

justusschock · 2019-02-25T10:19:14Z

This is a first draft to refactor the trainer (combining code where possible) and to introduce a predictor.

The metric logging was moved from the network's closure to the trainer.
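A minimal sketch of this split, assuming the closure only runs the model and computes the losses, while the trainer evaluates the metrics on the returned predictions. The names `closure`, `train_step` and `mean_abs_error` are hypothetical illustrations, not the delira API.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Metric = Callable[[Sequence[float], Sequence[float]], float]


def closure(model: Callable[[float], float], inputs: Sequence[float],
            targets: Sequence[float]) -> Tuple[Dict[str, float], List[float]]:
    # Hypothetical closure: runs the model and computes the loss only.
    # It does not log metrics anymore; it just returns the predictions.
    preds = [model(x) for x in inputs]
    mse = sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)
    return {"mse": mse}, preds


def mean_abs_error(preds: Sequence[float], targets: Sequence[float]) -> float:
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)


def train_step(model, inputs, targets, metrics: Dict[str, Metric]):
    # Hypothetical trainer step: metric evaluation lives here, so it is shared
    # by all backends instead of being duplicated in every closure.
    losses, preds = closure(model, inputs, targets)
    metric_vals = {name: fn(preds, targets) for name, fn in metrics.items()}
    return losses, metric_vals


losses, metric_vals = train_step(lambda x: 2 * x, [1.0, 2.0], [2.0, 3.0],
                                 {"mae": mean_abs_error})
```

With this layout a backend-specific closure only has to return losses and predictions; adding a metric does not touch any closure.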

@ORippler: Maybe we could also merge the experiments and rename `AbstractTrainer` to `BaseTrainer`, since it is not abstract anymore?

Also, some TF tests fail due to a shape mismatch. Do you have any idea why?

@mibaumgartner: Does this match your idea of a predictor? Do you have any improvements compared to your own prototype?

fix #39
fix #46 ?

EDIT: Docstrings are still missing

delira/training/predictor.py

delira/training/abstract_trainer.py

delira/training/metrics.py

ORippler

Will investigate the shape errors tomorrow. Furthermore, `experiment.stratified_kfold_predict` has to be adapted (this might also be related to the shape errors).

delira/training/predictor.py


justusschock · 2019-06-03T08:30:03Z

I just added two things to the Predictor:

Conversion of the batchdict from native tensors to NumPy (since the batchdict may be converted in place during `predict`).
An `as_generator` option, which yields the results of each batch instead of appending them to a list and returning the whole list afterwards. This should be useful if we only want to predict: for huge datasets (especially 3D ones), tasks like segmentation might cause OutOfMemoryErrors even if we only store the predictions in memory, whereas this way they can be processed/saved on a per-batch basis.
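A minimal sketch of both additions, assuming each batch is a dict with a `"data"` entry and that `np.asarray` stands in for the backend-specific tensor-to-NumPy conversion. The helpers `to_numpy_batch` and `predict_per_batch` are hypothetical and not the actual Predictor code.

```python
from typing import Callable, Dict, Iterable, Iterator

import numpy as np


def to_numpy_batch(batch: Dict[str, object]) -> Dict[str, np.ndarray]:
    # Build a new dict of NumPy arrays, so the caller's batch dict keeps its
    # original structure even if the converted dict is changed in place later.
    return {key: np.asarray(val) for key, val in batch.items()}


def predict_per_batch(model: Callable[[np.ndarray], np.ndarray],
                      batches: Iterable[Dict[str, object]]) -> Iterator[np.ndarray]:
    # Yield the predictions of one batch at a time instead of appending them
    # to a list, so each result can be processed/saved and then discarded.
    for batch in batches:
        batch_np = to_numpy_batch(batch)
        yield model(batch_np["data"])


batches = [{"data": [1.0, 2.0]}, {"data": [3.0]}]
for preds in predict_per_batch(lambda x: x * 2, batches):
    print(preds.shape)  # e.g. save `preds` to disk here instead of keeping it
```

Wrapping the generator in `list(...)` restores the collect-everything behaviour when the predictions are known to fit into memory.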

EDIT:
My first attempt at the generator part failed, but it is fixed now. The problem with generators is that every function that contains a `yield` in its body returns a generator object when called (even if the `yield` is never actually reached). This prevents us from using a single function that returns either a generator or the complete predictions and metrics. Since I think we should definitely support this kind of behavior, b510ffb makes `predictor.predict_data_mgr` always return a generator object, which yields exactly one item if the `lazy_gen` flag is False.
13d3adb updates the `experiment.test` function accordingly.
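The pitfall and the chosen workaround can be reproduced in plain Python. In this sketch, `naive_predict` and `predict` are hypothetical functions; only the `lazy_gen` flag name is taken from the comment above, and the final `next(...)` mirrors how `experiment.test` takes the first generator item.

```python
import types
from typing import Iterator, List


def naive_predict(data: List[int], lazy_gen: bool):
    # Pitfall: since `yield` appears in the body, calling this function ALWAYS
    # returns a generator object, even with lazy_gen=False where the yield is
    # never reached. The list below ends up hidden in StopIteration.value.
    if lazy_gen:
        for item in data:
            yield item * 2
    else:
        return [item * 2 for item in data]


def predict(data: List[int], lazy_gen: bool) -> Iterator[List[int]]:
    # Workaround: always return a generator.
    #   lazy_gen=True  -> one item per sample/batch
    #   lazy_gen=False -> exactly one item holding all predictions
    if lazy_gen:
        for item in data:
            yield [item * 2]
    else:
        yield [item * 2 for item in data]


print(isinstance(naive_predict([1, 2], lazy_gen=False), types.GeneratorType))  # True
print(list(naive_predict([1, 2], lazy_gen=False)))  # []
# Callers that want everything at once just take the first (and only) item.
all_preds = next(predict([1, 2, 3], lazy_gen=False))
```

Keeping the return type uniform means callers never have to check whether they received a list or a generator.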


justusschock · 2019-06-04T10:04:36Z

I think we're almost good to go. The only remaining issue is TensorFlow not finding some resources in our tests. According to this issue, it is most likely due to a thread that is spawned somewhere... Any ideas why this happens now and didn't happen before?


justusschock · 2019-06-05T18:39:08Z

The trainer is now ready to be merged. The only failing test is the Python 3.7 one, which fails due to trixi dependencies and is completely unrelated to this PR.

mibaumgartner

Implementation looks good 👍
We should include the Predictor in training.rst so it is displayed correctly in the documentation.
I think that should be a quick fix, so I approve the changes.

justusschock · 2019-06-05T19:06:48Z

The file already existed; I just forgot to include it in the root file. Done now.

justusschock added 2 commits February 22, 2019 17:26

Start refactoring trainer and experiment

0ce953a

make new torch trainer and experiment working

722cd55

justusschock added doing labels Feb 25, 2019

justusschock added this to the Release 0.4.0 milestone Feb 25, 2019

justusschock self-assigned this Feb 25, 2019

justusschock added this to ToDo in PyTorch via automation Feb 25, 2019

justusschock added this to ToDo in Tensorflow via automation Feb 25, 2019

justusschock requested review from mibaumgartner and ORippler February 25, 2019 10:19

justusschock marked this pull request as ready for review February 25, 2019 10:50

mibaumgartner reviewed Feb 25, 2019

delira/training/predictor.py (outdated)

mibaumgartner reviewed Feb 25, 2019

delira/training/predictor.py (outdated)

mibaumgartner reviewed Feb 25, 2019

delira/training/abstract_trainer.py (outdated)

mibaumgartner reviewed Feb 25, 2019

delira/training/metrics.py (outdated)

justusschock added 3 commits February 26, 2019 14:44

Merge branch 'master' into trainer_refactoring

0ee1c08

Update pytorch_trainer.py

1dfa7ce

Update pytorch_trainer.py

ec0c11a

PyTorch automation moved this from ToDo to Doing Mar 8, 2019

ORippler suggested changes Mar 8, 2019

delira/training/predictor.py (outdated)

delira/training/predictor.py (outdated)

Tensorflow automation moved this from ToDo to Doing Mar 8, 2019

ORippler reviewed Mar 8, 2019

delira/training/predictor.py (outdated)

mibaumgartner and others added 8 commits March 12, 2019 00:01

Merge branch 'trainer_refactoring' of https://github.com/justusschock…

a4b8743

…/delira into trainer_refactoring

First version of auc metric implementation

151a3df
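The AUC metric added in this commit can be illustrated with the rank-based definition of the ROC AUC. This is a minimal pure-Python sketch (the function name `roc_auc` is hypothetical), not the metric implementation from this PR; in practice `sklearn.metrics.roc_auc_score(y_true, y_score)` computes the same quantity efficiently.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    # ROC AUC as the probability that a randomly chosen positive sample gets a
    # higher score than a randomly chosen negative one (ties count as 1/2).
    # O(n_pos * n_neg): fine for illustration, not for large datasets.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined unless both classes are present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```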

Resolve DImension Error

da4f474

Move _is_better_val_score from predictor to abstract trainer

71387c7
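A sketch of what such a comparison helper could look like, assuming it compares a new validation score against the best score so far according to a mode. The class name, signature and mode strings are assumptions for illustration. Placing it on the trainer matches the assumption that the trainer, not the predictor, tracks the best validation score when deciding which checkpoint to keep.

```python
class BaseTrainerSketch:
    # Hypothetical stand-in for the trainer base class; only the helper is shown.

    @staticmethod
    def _is_better_val_score(best_val_score: float, new_val_score: float,
                             mode: str = "highest") -> bool:
        # "highest": larger is better (e.g. accuracy, AUC)
        # "lowest":  smaller is better (e.g. validation loss)
        if mode == "highest":
            return new_val_score > best_val_score
        if mode == "lowest":
            return new_val_score < best_val_score
        raise ValueError(f"Unknown mode: {mode!r}")
```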

Fix error in torchvision datasets due to latest torchvision release

17478ce

Fix error due to missing brackets for file extension checks

1a8cca0

Add docstrings

e467013

Update pytorch_trainer docstrings

f23b52a

justusschock mentioned this pull request Jun 3, 2019

WIP: Mxnet backend #119

Closed

4 tasks

justusschock and others added 4 commits June 3, 2019 15:26

fix generator behavior

b510ffb

make experiment.test return the first generator item

13d3adb

Merge branch 'master' into trainer_refactoring

22c7113

add kwargs to overwritten predict_data_mgr functions

d123c1e
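Why overridden `predict_data_mgr` methods need `**kwargs` can be shown with a small sketch: once a base-class caller forwards a new keyword (here the generator flag), an override with a fixed signature raises `TypeError`. All class names are hypothetical; only `predict_data_mgr` and `lazy_gen` come from this PR's discussion.

```python
class PredictorSketch:
    def predict_data_mgr(self, data, batchsize=1, **kwargs):
        # Hypothetical base implementation that understands an extra option.
        lazy_gen = kwargs.get("lazy_gen", False)
        return {"n": len(data), "batchsize": batchsize, "lazy_gen": lazy_gen}

    def test(self, data):
        # The base class forwards a keyword that subclasses may not declare.
        return self.predict_data_mgr(data, batchsize=2, lazy_gen=True)


class StrictTrainer(PredictorSketch):
    def predict_data_mgr(self, data, batchsize=1):
        # Fixed signature: raises TypeError once lazy_gen=... is passed in.
        return super().predict_data_mgr(data, batchsize)


class FlexibleTrainer(PredictorSketch):
    def predict_data_mgr(self, data, batchsize=1, **kwargs):
        # Accept unknown keyword arguments and forward them to the parent.
        return super().predict_data_mgr(data, batchsize, **kwargs)


try:
    StrictTrainer().test([1, 2, 3])
except TypeError as err:
    print("StrictTrainer:", err)

result = FlexibleTrainer().test([1, 2, 3])
```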

justusschock changed the base branch from master to parallel_master June 4, 2019 08:04

justusschock and others added 9 commits June 4, 2019 10:39

merge Parallel_master

7007cd7

PEP-8 Auto-Fix

bc32d5c

Merge GAN

b63587c

Merge PEP-8 Autofix

3325ba8

Add style fixes and common function to search for previous checkpoints

2e264f5
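A possible shape for a shared checkpoint-search helper, sketched under the assumption that checkpoints are named `checkpoint_epoch_<N>.<ext>`; both this naming scheme and the function name `find_latest_checkpoint` are hypothetical. Epochs are compared as integers, so epoch 10 correctly wins over epoch 9 (a plain string sort would not).

```python
import os
import re
import tempfile
from typing import Optional

# Hypothetical naming scheme: "checkpoint_epoch_<N>.<ext>"
_CKPT_PATTERN = re.compile(r"^checkpoint_epoch_(\d+)\.\w+$")


def find_latest_checkpoint(save_path: str) -> Optional[str]:
    # Return the checkpoint with the highest epoch number in save_path,
    # or None if the directory or a matching file does not exist (fresh run).
    if not os.path.isdir(save_path):
        return None
    best_epoch, best_file = -1, None
    for fname in os.listdir(save_path):
        match = _CKPT_PATTERN.match(fname)
        if match and int(match.group(1)) > best_epoch:
            best_epoch, best_file = int(match.group(1)), fname
    return os.path.join(save_path, best_file) if best_file is not None else None


with tempfile.TemporaryDirectory() as tmp:
    for name in ("checkpoint_epoch_9.pt", "checkpoint_epoch_10.pt", "notes.txt"):
        with open(os.path.join(tmp, name), "w"):
            pass  # create an empty placeholder file
    latest = find_latest_checkpoint(tmp)
```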

fix infinite recursion by hard type checking instead of instance chec…

4ef61c9

…king

fix pep8

27fd0e4

Remove TrixiExperiment as Experiment Baseclass

bcee731

correct indent in tf trainer

ac1c6e9

justusschock and others added 3 commits June 5, 2019 11:27

shallow copy

6b9ac0d

Make shallow copy of batchdict to retain keys, which might get popped in `prepare_batch`
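The effect of the shallow copy can be sketched with a hypothetical `prepare_batch` stand-in that pops a key (the real `prepare_batch` is backend-specific). `dict(batch)` creates a new dict with the same keys pointing to the same values, so popping from the copy keeps the original's keys intact without duplicating the (potentially large) arrays.

```python
from typing import Any, Dict


def prepare_batch(batch: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical preparation step that consumes ("pops") the label entry.
    label = batch.pop("label")
    return {"data": batch["data"], "label": label}


batch = {"data": [1, 2, 3], "label": [0]}

# Shallow copy: new dict, same value objects. Popping from the copy does not
# remove "label" from `batch`, and "data" is shared rather than duplicated.
prepared = prepare_batch(dict(batch))
```

Note that a shallow copy does not protect the values themselves: an in-place change to `prepared["data"]` would also be visible through `batch["data"]`.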

initialize uninitialized members in TfExeriment.Test

12ac946

remove param argument from test for TfExperiment.test

5077864

justusschock requested a review from mibaumgartner June 5, 2019 18:39

mibaumgartner approved these changes Jun 5, 2019

add predictor to docs

74313e1

justusschock merged commit 69080a8 into parallel_master Jun 5, 2019

PyTorch automation moved this from Doing to Done Jun 5, 2019

Tensorflow automation moved this from Doing to Done Jun 5, 2019

justusschock deleted the trainer_refactoring branch June 5, 2019 19:07

justusschock mentioned this pull request Jun 5, 2019

Add trainer refactoring into master #121

Merged
