Using a notebook & git creates too many diff #9444
Comments
Maybe have a look at jupytext. This extension allows you to save notebook files as markdown files without output, then just add hope this helps, Alex
Hi @Alexboiboi, thanks for your proposal. There are actually many proposals around, more or less complex, but I do want to git my *.ipynb. I was just wondering why a fix couldn't be embedded in Jupyter. What I currently do before gitting is restart the kernel, which resets the index to zero & clears the output, then save, then commit. But it seems (quite) awkward to reset the kernel, and I can imagine it being terrible in some situations (hopefully not for me). I guess they want to keep the GUI simple, but I think, subjectively, that my proposal is simple & clear. Thanks again.
@sylvain-bougnoux what about the GUI diff in jupyterlab-git? See the demo towards the end of the GIF below:
I think that you can configure the underlying nbdiff to ignore outputs, see: https://nbdime.readthedocs.io/en/latest/config.html#configuring-ignores
+1 to looking at nbdiff to get help with this.
I'm +1 on an easy way to save just the inputs, like suggested in the OP. Several questions from when we've thought a lot about this before (when dealing with saving widget data, for example):
@jasongrout +1 for your answer, it is exactly the answer I was expecting. I guess the solution has pros & cons, which is why opening the question, as a little survey of your workflows, guys, is important.
PS: for
@sylvain-bougnoux I understand your concerns (for the PS part). Just a quick clarification: there is an abandoned project, nbdiff, which you seem to refer to (https://pypi.org/project/nbdiff/, indeed in alpha), but there is also nbdime (https://github.com/jupyter/nbdime), which I linked to. It is a mature official Jupyter project and has an nbdiff command (along with nbmerge, etc.), which is what I was referring to.
Another question: what about saving metadata? It seems like saving standardized cell metadata is a good thing, but it's not so clear when extensions are also saving metadata. For example, ipywidgets provides the user with the option of saving the widget state, which is potentially huge, and not very useful if not saving outputs. Typically we tell extensions when we are saving to give them a chance to generate and update metadata in the notebook. Perhaps in the save handler, we need to tell extensions if we are doing a full save or an inputs-only save, and let them decide if they want to save metadata in each case.
The issue with the *.ipynb format is that just deleting the output does not make it interact nicely with git. It is a big json-blob with metadata, such as
that should not really be tracked by git. So these have to be filtered out and still somewhat preserved later if the tracked file should be close to the original. Also, if you want to keep your output data, the workflow would then need to include saving "only inputs", then committing, then saving again with outputs, and then disregarding that git thinks the file has changed (no All of these issues are addressed by the jupytext extension that Alexboiboi mentioned. If this functionality should be part of the core JupyterLab I am all for it, but reinventing it with reduced capabilities feels like an unnecessary effort.
Aside from version control, I think it is useful to be able to save a very minimal lightweight version of a notebook that just contains code, not output.
Cross referencing related discussion in JupyterLab-git: jupyterlab/jupyterlab-git#392
@asteppke thanks for your interest. Regarding the metadata, @jasongrout mentioned it as well. But whether to save it or not depends on the context; e.g. it could be saved as a configuration for an example to work. So I think it is a good idea to add it. Now, as Jason said, git is just an example, but the need appears so generic that IMHO it is worth embedding these options in a pure notebook without the need to install another extension. I guess the simple workflow could be:
IMHO this workflow is interesting because the standard behavior is kept, and it is effective for git, as the options usually hardly change during the file's lifetime. For me it is just:
I guess '[x] save inputs' is not needed, but could be there for completeness.
Related question on Stack Overflow: How can I configure my tools to ignore or prevent updates to the execution_count field in a Jupyter Notebook from being tracked in git?
@sylvain-bougnoux An option to save an ipynb file without metadata and without outputs, so that the file does not change unless the actual code changes, sounds like a good idea. This would allow git, diff or backup tools to at least distinguish between trivial re-execution and actual changes to the code.
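To illustrate the "trivial re-execution vs. actual code change" distinction, here is a rough sketch that compares two notebooks by their code cell sources only. It works on the plain JSON of an .ipynb file and is not part of Jupyter or nbdime; the function names `code_sources` and `same_code` are made up for this example, and it deliberately ignores markdown cells, outputs, metadata and `execution_count`.

```python
def code_sources(nb: dict) -> list:
    """Return the source text of every code cell, in notebook order."""
    sources = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", "")
        # The notebook format stores source either as one string
        # or as a list of lines; normalise to a single string.
        if isinstance(src, list):
            src = "".join(src)
        sources.append(src)
    return sources


def same_code(old_nb: dict, new_nb: dict) -> bool:
    """True if both notebooks contain identical code cells.

    Outputs, execution counts and metadata are not looked at, so a
    notebook that was merely re-run compares equal to its old version.
    """
    return code_sources(old_nb) == code_sources(new_nb)
```

With notebooks loaded via `json.load`, `same_code(old, new)` returning `True` would mean the diff git shows is only re-execution noise.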
If you're using GitHub,

    pip install nbdime  # Install nbdime (including nbdiff)
    nbdime config-git --enable --global  # Configure nbdime to play nice with git

Then you can configure what to keep and what to ignore as pointed out by @krassowski 😃
Problem
I'm always frustrated when saving a notebook because it creates many differences in git (hence it is hard to follow the important diffs). It is a well-known issue, plenty of people have noticed it, but the many solutions proposed appear awkward or fairly complex for such a stupid issue.
Proposed Solution
It would be much simpler if we had an option to save only the input cells, not the output ones, and to reset the cell index (execution_count) to 0 without restarting the kernel.
For instance in the save_as message box:
And of course remember the selection.
Sorry if this has been asked hundreds of times.
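For reference, a minimal sketch of what such an "inputs only" save could do to the notebook JSON. This is only an illustration on the raw .ipynb structure, not how Jupyter implements saving; `strip_for_commit` and its `reset_count` parameter are names invented here. The notebook format uses null for a cell that has not been run, so `None` is the default; pass `reset_count=0` to get the "reset to 0" behaviour asked for above.

```python
import copy


def strip_for_commit(nb: dict, reset_count=None) -> dict:
    """Return a copy of a notebook dict with code outputs removed.

    Every code cell gets an empty output list and its execution_count
    set to reset_count. The input notebook is left untouched, and
    markdown/raw cells as well as metadata are copied as-is.
    """
    stripped = copy.deepcopy(nb)
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = reset_count
    return stripped
```

Usage would be along the lines of reading the file with `json.load`, calling `strip_for_commit`, and writing the result back with `json.dump` before committing. Metadata is not touched here, which is one of the open questions raised in the comments.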