Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify Binder build configuration and notebook execution #50

Open
wants to merge 19 commits into
base: master
Choose a base branch
from

Conversation

willingc
Copy link

@shaunagm This WIP PR updates the Binder build.

The todo.md file has some recommendations after I ran all the notebooks. If you want to walk through this later this week, I would be happy to do so.

@project-bot project-bot bot added this to Needs Triage in Issues & PRs Jul 23, 2019
@willingc
Copy link
Author

Looks like the missing file is due to a data directory/submodule not being copied to this repo.

JupyterLab's demo repo for Binder has a nice layout for binder with separate directories for data files and notebooks. Also it uses pyinvoke which simplifies a local install.

@shaunagm
Copy link
Contributor

The missing file issue (the usacturial.txt issue) was due to a problem with how our setup.py was configured. The .txt file should actually be part of our HARK package which these notebooks use. Unfortunately fixing this for one notebook involves changing the version of all our notebooks, which we'd strongly prefer not to do.

And yes, let's do a videocall later this week?

@willingc
Copy link
Author

Sure. I'm not clear on why the notebooks would need to all change version unless there are major breaking changes in EconARK's API.

I could set up another branch on my repo that pulls the latest commit from master for EconARK so at least you could test it out if you want. Should only take 5 minutes.

@shaunagm
Copy link
Contributor

shaunagm commented Jul 23, 2019

We don't want them to change version - we'd like to reach a point with most of these notebooks where they're "done" and we don't ever touch them again. Changing versions of Econ-ARK and then having to update the notebooks so they don't break when Econ-ARK has a breaking change is exactly what we're trying to avoid. But newer notebooks will need newer versions of Econ-ARK. If all notebooks in binder must have the same version of Econ-ARK we are trapped having to occasionally update 30+ notebooks to use new versions.

@willingc
Copy link
Author

willingc commented Jul 23, 2019

@shaunagm This branch https://github.com/willingc/DemARK/tree/try-master-latest-sha uses the latest master on Binder and it looks like stuff is working.

Look at the binder folder requirements.txt and see how the version for binder is configured to pull from master. The notebooks with the missing file are working now.

As an FYI, you can have binder serve a different version of Econ-ARK than the repo itself. Binder will default to taking the requirements in the binder folder first and if not found use the requirements at the repo's root.

Here's the binder link for a build using the master of econ-ARK: https://mybinder.org/v2/gh/willingc/DemARK/try-master-latest-sha

@shaunagm
Copy link
Contributor

shaunagm commented Jul 23, 2019

Oh, neat. Could we do something like git+git://github.com/econ-ark/HARK.git@latest_stable_release so that, unless overwritten in the notebook itself, the notebooks would automatically get whatever branch we'd created under the name latest_stable_release? (We already create release branches, so we could just duplicate stables one to latest_stable_release.) That's probably safer than just pinning to master, which we introduce bugs into all the time.

@willingc
Copy link
Author

Yep you can pin it to any SHA1, tag, or release.

@willingc willingc changed the title [WIP] Troubleshoot Binder build and notebook execution Simplify Binder build configuration and notebook execution Jul 24, 2019
@llorracc
Copy link
Collaborator

llorracc commented Jul 25, 2019 via email

@willingc
Copy link
Author

@llorracc Shauna and I chatted this morning. There are 3 general ways to test notebooks: nbval, nbconvert, and papermill. Typically using pytest as the test runner. A good resource for all things notebooks and reproducible research: https://github.com/jupyter-guide/jupyter-guide and its related https://github.com/jupyter-guide/ten-rules-jupyter.

@shaunagm
Copy link
Contributor

shaunagm commented Jul 25, 2019

@llorracc can you get us an initial list of notebooks you want to run, that should pass (or fail!) quickly?

Also: see this open active discussion on testing notebooks.

@llorracc
Copy link
Collaborator

@willingc Thanks very much for your extensive improvements on a bunch of fronts.

I was about to merge but then realized that this discussion is independently valuable and would be less prominent after a merge.

Having reread the discussion in the related issue I think my preferences have clarified.

Steps:

  1. Get to a point where we have verified that ALL DemARKs and REMARKs work with the current stable release.
    • We are pretty close to that point right now; the Alphacruncher guys have tested all DemARKs and REMARKs and I think we have addressed most or all of the problems
  2. Set things up so that, when there is a new dev release, that triggers a process to run all of the DemARK *.py files and all the REMARK do_min.py files
    • If there are notebooks that break, we can either:
      1. Do a "quick fix" which is to pin the notebooks to the previous dev release and put the unpinned version on a "to-fix" branch; or
      2. Actually fix the problem(s) and post the corrected versions

For every notebook that breaks, we therefore have a means to unbreak it.

I much prefer this workflow to pinning everything to the current (working) version, precisely because I think that trying to run existing notebooks on a new candidate release is an excellent tool for bug-finding in the new release.

@willingc
Copy link
Author

willingc commented Jul 26, 2019

I was about to merge but then realized that this discussion is independently valuable and would be less prominent after a merge.

No rush to merge. 😄

Having reread the discussion in the related issue I think my preferences have clarified.

Steps:

  1. Get to a point where we have verified that ALL DemARKs and REMARKs work with the current stable release.

Excellent plan!

  • We are pretty close to that point right now; the Alphacruncher guys have tested all DemARKs and REMARKs and I think we have addressed most or all of the problems

Nice!

  1. Set things up so that, when there is a new dev release, that triggers a process to run all of the DemARK *.py files and all the REMARK do_min.py files

I would recommend to test the notebooks on all pushes to master and run the matrix on all supported Python versions and nightly. This way there should be minimal breakage at release time. Azure Pipelines may be a good CI tool instead of or in addition to Travis or CircleCI since it supports Windows and is usually much faster.

  • If there are notebooks that break, we can either:

    1. Do a "quick fix" which is to pin the notebooks to the previous dev release and put the unpinned version on a "to-fix" branch; or
    2. Actually fix the problem(s) and post the corrected versions

For every notebook that breaks, we therefore have a means to unbreak it.

I much prefer this workflow to pinning everything to the current (working) version, precisely because I think that trying to run existing notebooks on a new candidate release is an excellent tool for bug-finding in the new release.

Yep, getting tests written would simplify quite a bit since you should be able to have binder use the latest stable version which would be pulled into the container if it is unpinned.

@llorracc
Copy link
Collaborator

Seems like the right way to do this is to have some way of tagging the notebooks that should be automatically run, maybe by adding something to the metadata, and then the script should just run those. Either that or have in the DemARK (and REMARK) repos a /test-me folder or something like that with a list of the names of the .py files that should be run when there is a pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Issues & PRs
  
Needs Triage
Development

Successfully merging this pull request may close these issues.

None yet

4 participants