CI: split tests-examples #990

Borda · 2020-03-01T10:27:02Z

What does this PR do?

This is resolving #942 and follow-up work for #986
Also fixes random crashes with CUDA core
moved model's test to one folder

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

Borda · 2020-03-01T23:17:45Z

integrating fix of actions/checkout#23

Borda · 2020-03-01T23:36:28Z

failing tests because of pytorch/pytorch#20030

awaelchli · 2020-03-14T21:15:29Z

Looks like the tests related to ddp and amp are failing. Could it be that it is because you changed some of the numbers in tests (max_epoch, min_acc, ...)?

Borda · 2020-03-14T21:47:34Z

Looks like the tests related to ddp and amp are failing. Could it be that it is because you changed some of the numbers in tests (max_epoch, min_acc, ...)?

I have increased the run to get better acc but I did not touch the functionalities...
@luiscape was looking at the GPU CI and his thought is dome driver error :{

awaelchli · 2020-03-15T10:30:41Z

did you already try to run the tests without changing the hyperparameters? maybe ddp/amp gives slightly different numbers that don't match the new threshold for acc.

Borda · 2020-03-15T11:35:01Z

just working on it... lol

Borda · 2020-03-16T22:32:01Z

@luiscape @PyTorchLightning/core-contributors any idea why GPU test crashes on:

>           ret = torch.addmm(bias, input, weight.t())
--
2284 | E           RuntimeError: CUDA error: no kernel image is available for execution on the device

nut when I run it on the same machine with the mase image it passes...

Borda · 2020-03-17T13:30:14Z

@jeremyjordan have you seen this, any suggestion?

pep8speaks · 2020-03-19T14:46:55Z

Hello @Borda! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-03-24 20:39:00 UTC

codecov · 2020-03-19T16:11:50Z

Codecov Report

Merging #990 into master will not change coverage by %.
The diff coverage is n/a.

@@          Coverage Diff          @@
##           master   #990   +/-   ##
=====================================
  Coverage      91%    91%           
=====================================
  Files          61     61           
  Lines        3153   3153           
=====================================
  Hits         2879   2879           
  Misses        274    274

Borda · 2020-03-19T18:03:36Z

@awaelchli @williamFalcon @luiscape ^^ pls

pl_examples/basic_examples/lightning_module_template.py

requirements-extra.txt

tests/models/utils.py

awaelchli · 2020-03-20T03:16:10Z

Awesome! Tests finally pass!

williamFalcon · 2020-03-24T13:17:14Z

pl_examples/basic_examples/lightning_module_template.py

+        ... )
+        >>> from argparse import Namespace
+        >>> hparams = Namespace(**params)
+        >>> model = LightningTemplateModel(hparams)


you can just pass in a dict now... no need to cast as Namespace

here it is used in init to get some values to do here has to be Namespace
so I will prepare follow-up pr with this change to have hparams setter/getter to accept

mergify · 2020-03-24T18:56:18Z

This pull request is now in conflict... :(

williamFalcon · 2020-03-24T18:56:24Z

@Borda this PR is huge lol. was there a merge or something? can we make it smaller?

Borda · 2020-03-24T19:50:30Z

@williamFalcon the splitting test and examples are quite complex, I could move a few smaller steps elsewhere but this stays huge anyway... and it would take quite some time to debug and duplicate some fixes...

Borda · 2020-03-24T20:57:41Z

tests/models/__init__.py

@@ -1,57 +0,0 @@
-"""Models for testing."""


this is just moving old tests.models to tests.base

tullie

This is hard to review because lots of small changes but lgtm

jeremyjordan · 2020-03-25T02:33:56Z

tests/base/utils.py

@@ -5,11 +5,11 @@
 import numpy as np
 import torch

-from pl_examples import LightningTemplateModel
+# from pl_examples import LightningTemplateModel


it'd be nice to not check in commented out code

jeremyjordan · 2020-03-25T02:34:27Z

tests/base/utils.py

@@ -21,7 +21,7 @@


 def run_model_test_no_loggers(trainer_options, model, min_acc=0.50):
-    save_dir = trainer_options['default_save_path']
+    # save_dir = trainer_options['default_save_path']


same comment here, it'd be nice to not check in commented out code

jeremyjordan · 2020-03-25T02:38:18Z

tests/models/test_cpu.py

@@ -114,7 +114,7 @@ def test_running_test_after_fitting(tmpdir):
    trainer.test()

    # test we have good test accuracy
-    tutils.assert_ok_model_acc(trainer)
+    tutils.assert_ok_model_acc(trainer, thr=0.35)


do we want to define a DESIRED_THRESHOLD var in the base file and import that? rather than referencing a float throughout the tests

* Example: Simple RL example using DQN/Lightning * DQN RL Agent using Lightning * Uses Iterable Dataset for Replay Buffer * Buffer is populated by agent as training is carried out, updating the dataset * Applied autopep8 fixes * * Updated line length from 120 to 110 * Update pl_examples/domain_templates/dqn.py simplify get_device method Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com> * Update pl_examples/domain_templates/dqn.py Re-ordered imports Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com> * CI: split tests-examples (#990) * CI: split tests-examples * tests without template * comment depends * CircleCI typo * add doctest * update test req. * CI tests * setup macOS * longer train * lover pred acc * fix model * rename default model * lower tests acc * typo * imports * fix test optimizer * update calls * fix Win * lower Drone image * fix call * pytorch image * fix test * add dev image * add dev image * update image * drone volume * lint * update test notes * rename tests/models >> tests/base * group models * conftest * optim imports * typos * fix import * fix tests * install AMP * tests * fix import * Clean up * added module docstring * renamed variables to be more descriptive * Added missing docstrings and type annotations * Added gym to example requirements * Added note to changelog * updated example image * update types * rename script * Update CHANGELOG.md Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com> * another rename * Disable validation when val_percent_check=0 (#1251) * fix disable validation * add test * update changelog * update docs for val_percent_check * make "fast training" docs consistent * calling self.forward() -> self() (#1211) * self.forward() -> self() * update changelog Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Fix requirements-extra.txt Trains package to release version (#1229) * Fix requirement-extra use released Trains package * Update README.md add Trains and links to the external Visualization section Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Remove unnecessary parameters to super() in documentation and source code (#1240) Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * update deprecation warning (#1258) * update docs for progress bat values (#1253) * lower timeouts for inactive issues (#1250) * update contrib list (#1241) Co-authored-by: William Falcon <waf2107@columbia.edu> * Fix outdated docs (#1227) * Fix typo (#1224) * drop unused Tox (#1242) * system info (#1234) * system info * update big info * test script * update config * rename script * import path * Changed smoothing in tqdm to decrease variability of time remaining between training / eval (#1194) * Example: Simple RL example using DQN/Lightning * DQN RL Agent using Lightning * Uses Iterable Dataset for Replay Buffer * Buffer is populated by agent as training is carried out, updating the dataset * Applied autopep8 fixes * * Updated line length from 120 to 110 * Update pl_examples/domain_templates/dqn.py simplify get_device method Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com> * Update pl_examples/domain_templates/dqn.py Re-ordered imports Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com> * Clean up * added module docstring * renamed variables to be more descriptive * Added missing docstrings and type annotations * Added gym to example requirements * Added note to changelog * update types * rename script * Update CHANGELOG.md Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com> * another rename Co-authored-by: Donal Byrne <Donal.Byrne@xperi.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: William Falcon <waf2107@columbia.edu> Co-authored-by: Adrian Wälchli <adrian.waelchli@students.unibe.ch> Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com> Co-authored-by: Martin.B <51887611+bmartinn@users.noreply.github.com> Co-authored-by: Tyler Yep <tyep@stanford.edu> Co-authored-by: Shunta Komatsu <59395084+skmatz@users.noreply.github.com> Co-authored-by: Jack Pertschuk <jackpertschuk@gmail.com>

* CI: split tests-examples * tests without template * comment depends * CircleCI typo * add doctest * update test req. * CI tests * setup macOS * longer train * lover pred acc * fix model * rename default model * lower tests acc * typo * imports * fix test optimizer * update calls * fix Win * lower Drone image * fix call * pytorch image * fix test * add dev image * add dev image * update image * drone volume * lint * update test notes * rename tests/models >> tests/base * group models * conftest * optim imports * typos * fix import * fix tests * install AMP * tests * fix import

* Example: Simple RL example using DQN/Lightning * DQN RL Agent using Lightning * Uses Iterable Dataset for Replay Buffer * Buffer is populated by agent as training is carried out, updating the dataset * Applied autopep8 fixes * * Updated line length from 120 to 110 * Update pl_examples/domain_templates/dqn.py simplify get_device method Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com> * Update pl_examples/domain_templates/dqn.py Re-ordered imports Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com> * CI: split tests-examples (Lightning-AI#990) * CI: split tests-examples * tests without template * comment depends * CircleCI typo * add doctest * update test req. * CI tests * setup macOS * longer train * lover pred acc * fix model * rename default model * lower tests acc * typo * imports * fix test optimizer * update calls * fix Win * lower Drone image * fix call * pytorch image * fix test * add dev image * add dev image * update image * drone volume * lint * update test notes * rename tests/models >> tests/base * group models * conftest * optim imports * typos * fix import * fix tests * install AMP * tests * fix import * Clean up * added module docstring * renamed variables to be more descriptive * Added missing docstrings and type annotations * Added gym to example requirements * Added note to changelog * updated example image * update types * rename script * Update CHANGELOG.md Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com> * another rename * Disable validation when val_percent_check=0 (Lightning-AI#1251) * fix disable validation * add test * update changelog * update docs for val_percent_check * make "fast training" docs consistent * calling self.forward() -> self() (Lightning-AI#1211) * self.forward() -> self() * update changelog Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Fix requirements-extra.txt Trains package to release version (Lightning-AI#1229) * Fix requirement-extra use released Trains package * Update README.md add Trains and links to the external Visualization section Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Remove unnecessary parameters to super() in documentation and source code (Lightning-AI#1240) Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * update deprecation warning (Lightning-AI#1258) * update docs for progress bat values (Lightning-AI#1253) * lower timeouts for inactive issues (Lightning-AI#1250) * update contrib list (Lightning-AI#1241) Co-authored-by: William Falcon <waf2107@columbia.edu> * Fix outdated docs (Lightning-AI#1227) * Fix typo (Lightning-AI#1224) * drop unused Tox (Lightning-AI#1242) * system info (Lightning-AI#1234) * system info * update big info * test script * update config * rename script * import path * Changed smoothing in tqdm to decrease variability of time remaining between training / eval (Lightning-AI#1194) * Example: Simple RL example using DQN/Lightning * DQN RL Agent using Lightning * Uses Iterable Dataset for Replay Buffer * Buffer is populated by agent as training is carried out, updating the dataset * Applied autopep8 fixes * * Updated line length from 120 to 110 * Update pl_examples/domain_templates/dqn.py simplify get_device method Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com> * Update pl_examples/domain_templates/dqn.py Re-ordered imports Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com> * Clean up * added module docstring * renamed variables to be more descriptive * Added missing docstrings and type annotations * Added gym to example requirements * Added note to changelog * update types * rename script * Update CHANGELOG.md Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com> * another rename Co-authored-by: Donal Byrne <Donal.Byrne@xperi.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: William Falcon <waf2107@columbia.edu> Co-authored-by: Adrian Wälchli <adrian.waelchli@students.unibe.ch> Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com> Co-authored-by: Martin.B <51887611+bmartinn@users.noreply.github.com> Co-authored-by: Tyler Yep <tyep@stanford.edu> Co-authored-by: Shunta Komatsu <59395084+skmatz@users.noreply.github.com> Co-authored-by: Jack Pertschuk <jackpertschuk@gmail.com>

Borda added feature Is an improvement or enhancement ci Continuous Integration labels Mar 1, 2020

Borda requested a review from a team March 1, 2020 18:29

Borda changed the title ~~CI: split tests-examples [wip]~~ CI: split tests-examples Mar 1, 2020

Borda marked this pull request as ready for review March 1, 2020 21:03

Borda mentioned this pull request Mar 1, 2020

Add MNIST dataset & drop torchvision dep. from tests #986

Merged

5 tasks

awaelchli mentioned this pull request Mar 4, 2020

fix MNIST download #1044

Merged

Borda force-pushed the master branch from a07849b to 21ac476 Compare March 6, 2020 20:38

Borda added this to the 0.7.2 milestone Mar 12, 2020

Borda requested review from neggert and williamFalcon March 19, 2020 18:03

awaelchli reviewed Mar 19, 2020

View reviewed changes

pl_examples/basic_examples/lightning_module_template.py Outdated Show resolved Hide resolved

requirements-extra.txt Outdated Show resolved Hide resolved

tests/models/utils.py Show resolved Hide resolved

Borda added the ready PRs ready to be merged label Mar 19, 2020

williamFalcon reviewed Mar 24, 2020

View reviewed changes

Borda added 2 commits March 24, 2020 20:40

CI: split tests-examples

77872f8

tests without template

482699c

Borda added 12 commits March 24, 2020 20:43

drone volume

d846049

lint

d940ce4

update test notes

181f591

rename tests/models >> tests/base

044191c

group models

bc6fa27

conftest

fe315f0

optim imports

3ff54d0

typos

4707e80

fix import

45e51c8

fix tests

21b6e31

install AMP

16d9b4b

tests

860aa4e

fix import

a2303d7

Borda commented Mar 24, 2020

View reviewed changes

tests/models/__init__.py

@@ -1,57 +0,0 @@

"""Models for testing."""

Copy link

Member Author

Borda Mar 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is just moving old tests.models to tests.base

Borda requested review from a team and jeremyjordan March 24, 2020 20:58

tullie approved these changes Mar 24, 2020

View reviewed changes

Borda requested review from williamFalcon and ethanwharris March 24, 2020 23:06

jeremyjordan approved these changes Mar 25, 2020

View reviewed changes

williamFalcon approved these changes Mar 25, 2020

View reviewed changes

williamFalcon merged commit 45d671a into Lightning-AI:master Mar 25, 2020

williamFalcon mentioned this pull request Mar 25, 2020

Example: Simple RL example using DQN/Lightning #1232

Merged

5 tasks

Borda deleted the split-test-examples branch March 25, 2020 13:47

Borda modified the milestones: v0.7., v0.7.x Apr 18, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CI: split tests-examples #990

CI: split tests-examples #990

Borda commented Mar 1, 2020 •

edited

Borda commented Mar 1, 2020

Borda commented Mar 1, 2020

awaelchli commented Mar 14, 2020

Borda commented Mar 14, 2020

awaelchli commented Mar 15, 2020

Borda commented Mar 15, 2020

Borda commented Mar 16, 2020

Borda commented Mar 17, 2020

pep8speaks commented Mar 19, 2020 •

edited

codecov bot commented Mar 19, 2020 •

edited

Borda commented Mar 19, 2020

awaelchli commented Mar 20, 2020

williamFalcon Mar 24, 2020

Borda Mar 24, 2020 •

edited

mergify bot commented Mar 24, 2020

williamFalcon commented Mar 24, 2020

Borda commented Mar 24, 2020

Borda Mar 24, 2020

tullie left a comment

jeremyjordan Mar 25, 2020

jeremyjordan Mar 25, 2020

jeremyjordan Mar 25, 2020

CI: split tests-examples #990

CI: split tests-examples #990

Conversation

Borda commented Mar 1, 2020 • edited

What does this PR do?

PR review

Did you have fun?

Borda commented Mar 1, 2020

Borda commented Mar 1, 2020

awaelchli commented Mar 14, 2020

Borda commented Mar 14, 2020

awaelchli commented Mar 15, 2020

Borda commented Mar 15, 2020

Borda commented Mar 16, 2020

Borda commented Mar 17, 2020

pep8speaks commented Mar 19, 2020 • edited

Comment last updated at 2020-03-24 20:39:00 UTC

codecov bot commented Mar 19, 2020 • edited

Codecov Report

Borda commented Mar 19, 2020

awaelchli commented Mar 20, 2020

williamFalcon Mar 24, 2020

Choose a reason for hiding this comment

Borda Mar 24, 2020 • edited

Choose a reason for hiding this comment

mergify bot commented Mar 24, 2020

williamFalcon commented Mar 24, 2020

Borda commented Mar 24, 2020

Borda Mar 24, 2020

Choose a reason for hiding this comment

tullie left a comment

Choose a reason for hiding this comment

jeremyjordan Mar 25, 2020

Choose a reason for hiding this comment

jeremyjordan Mar 25, 2020

Choose a reason for hiding this comment

jeremyjordan Mar 25, 2020

Choose a reason for hiding this comment

Borda commented Mar 1, 2020 •

edited

pep8speaks commented Mar 19, 2020 •

edited

codecov bot commented Mar 19, 2020 •

edited

Borda Mar 24, 2020 •

edited