Suggestion: Separate file for notebook executed cell outputs. #5677

Open
jbursey opened this issue Aug 12, 2020 · 11 comments

Comments

@jbursey

jbursey commented Aug 12, 2020

Unless this is already a feature, I think it would be nice to have a separate file (something like .ipynb.output) that links outputs to their cells in the .ipynb JSON file. This would make it significantly easier to exclude notebook outputs from source control systems like git.

If this is already possible somehow I would be interested to know.
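To make the idea concrete, here is a minimal sketch of what splitting a notebook into an output-free .ipynb and a separate output mapping could look like. It is only an illustration: the function names `split_outputs` and `merge_outputs` and the sidecar layout are hypothetical, not an existing Jupyter feature. It assumes the nbformat 4 JSON layout and keys outputs by the cell `id` field (present since nbformat 4.5), falling back to the cell position when no id exists.

```python
import copy


def _cell_key(cell, index):
    # Prefer the cell id (nbformat 4.5+); fall back to the cell position.
    return cell.get("id", f"index-{index}")


def split_outputs(notebook):
    """Return (stripped_notebook, sidecar) without modifying the input dict."""
    stripped = copy.deepcopy(notebook)
    sidecar = {}
    for index, cell in enumerate(stripped.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        # Move the outputs and execution count into the sidecar mapping.
        sidecar[_cell_key(cell, index)] = {
            "outputs": cell.get("outputs", []),
            "execution_count": cell.get("execution_count"),
        }
        cell["outputs"] = []
        cell["execution_count"] = None
    return stripped, sidecar


def merge_outputs(stripped, sidecar):
    """Rebuild a full notebook from the stripped notebook and its sidecar."""
    merged = copy.deepcopy(stripped)
    for index, cell in enumerate(merged.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        saved = sidecar.get(_cell_key(cell, index))
        if saved is not None:
            cell["outputs"] = copy.deepcopy(saved["outputs"])
            cell["execution_count"] = saved["execution_count"]
    return merged
```

With something like this, the stripped notebook would be committed as usual, and the sidecar (written out with `json.dump`, for example) could be ignored or tracked separately.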

@gitjeff05

It's not a bad idea. But if keeping cell output out of source control is your primary concern, the easiest solution is to just clear the outputs before committing. There are a few ways to do that:

  1. Use a commit hook as outlined in Jupyter docs.

  2. Use Jupyter's shortcut to "clear all cell output"

  3. Use nbconvert to clear the notebook outputs before committing.

  4. You could also just write your own shell script to clear outputs. I wrote one using jq to do that and it is fairly easy.
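As a rough illustration of the last option, the same thing can be done in a few lines of Python with only the standard library. This is a sketch, not the jq script mentioned above; the function names `clear_outputs` and `clear_file` are made up for this example, and it assumes the nbformat 4 JSON layout, where code cells carry `outputs` and `execution_count` fields.

```python
import json
from pathlib import Path


def clear_outputs(notebook):
    """Remove outputs and execution counts from every code cell, in place."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def clear_file(path):
    """Rewrite the notebook at `path` with all code cell outputs cleared."""
    path = Path(path)
    notebook = json.loads(path.read_text(encoding="utf-8"))
    clear_outputs(notebook)
    # indent=1 is a choice made for this sketch to keep the file readable.
    text = json.dumps(notebook, indent=1, ensure_ascii=False) + "\n"
    path.write_text(text, encoding="utf-8")
```

Calling `clear_file("analysis.ipynb")` (a placeholder filename) before `git add` would strip the outputs; the same function could be called from a commit hook.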

Some folks also choose to convert the notebook to a Python script using nbconvert and commit only that. If you search for "How to version control jupyter notebooks" you will see a bunch of posts on the topic.

@cipri-tom

I think that jupyterlab already has the capability of displaying the output in a different view from the notebook.

@IvoMerchiers

Alternatively, Jupytext could be helpful for your case. It allows you to save notebooks as code. Then you only need to commit the code to git, whilst you can ignore the notebooks for version control.

Their paired notebooks avoid the need for automatically saving and converting the notebooks.

@th0ger

th0ger commented Aug 26, 2023

Good idea. The alternatives discussed above are all about excluding executed cell outputs from source control.

But sometimes we need to include the executed cells in source control. (My current case is with Quarto.)
Including the cell output in the .ipynb file makes it extremely difficult to review or diff the notebook as plain text. This experience would be improved a lot if the input and output could be separated. A reviewer would then be able to decide whether the changes were caused by a code change, or purely by external changes and re-execution of the notebook.
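For illustration, the check a reviewer has to do by eye today can be sketched in Python. This is only a rough example under assumptions: `classify_change` is a hypothetical helper, not part of any Jupyter tool, and it assumes two versions of a notebook already loaded as nbformat 4 dicts.

```python
def _sources(notebook):
    """Return (cell_type, source) pairs, joining list-of-lines sources into one string."""
    result = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        result.append((cell.get("cell_type"), source))
    return result


def _outputs(notebook):
    """Return the outputs of every code cell, in order."""
    return [
        cell.get("outputs", [])
        for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code"
    ]


def classify_change(old, new):
    """Say whether two notebook versions differ in their inputs or only in their outputs."""
    if _sources(old) != _sources(new):
        return "source changed"
    if _outputs(old) != _outputs(new):
        return "outputs only"
    return "unchanged"
```

If inputs and outputs lived in separate files, this distinction would fall out of an ordinary `git diff` instead of needing a helper like this.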

@alexbjorling

This feature would be very helpful for cases where execution is time-consuming, or relies on the availability of input data or tricky code dependencies. With separate output, the .ipynb.output file could be managed with (e.g.) Git LFS, making the .ipynb diffs easy to review while still allowing retention and versioning of the output.

@th0ger

th0ger commented Nov 7, 2023

@alexbjorling LFS is a good point. Notebook output is very suitable for LFS, but input cells are not.

@Tyrrx

Tyrrx commented Nov 21, 2023

I think cleaning the notebook can only be seen as a workaround.

@zmbc

zmbc commented Feb 21, 2024

Yes, this would be a huge improvement. I believe this is why Quarto embeds Python in Markdown as a "plain text representation of notebooks."

If the .ipynb itself could be in a readable plain-text format, and the outputs stored in a separate file, that would:

  • Make diffing of notebook code trivial.
  • Make editing (the code of) a notebook easy using any text editor.
  • Allow the output to be versioned using a non-plaintext scheme, e.g. Git LFS as mentioned above, or being snapshotted only periodically as opposed to on every commit.

@carschandler

Hugely in support of this! Even if it isn't a default behavior, it would be amazing to have the option.

@zmbc

zmbc commented Apr 24, 2024

Surprised not to see anyone mention this yet: this Jupyter extension does almost exactly what this thread describes: https://jupytext.readthedocs.io/en/latest/paired-notebooks.html

10 participants