WIP: output extractor: infer data encoding from output metadata #858

Open
wants to merge 1 commit into
base: main

Conversation

akhmerov
Member

As discussed with @minrk, as more media formats are added to the mimebundle, the output extraction logic becomes harder to extend. For example, if an application adds a new base64-encoded mimetype, ExtractOutputPreprocessor has to be explicitly patched to handle it.

This PR defines an alternative mechanism for extensions to communicate how their outputs should be handled: an output.metadata[mime_type]['encoding'] entry, which tells the frontend how the output must be interpreted. Currently only 3 encodings are supported:

  • json for json objects
  • utf8 for unicode strings
  • base64 for binary data
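
A minimal sketch of how a consumer might turn one mimebundle entry into bytes based on such an `encoding` entry. This is an illustration under stated assumptions, not the code in this PR: `to_bytes` is a hypothetical helper, `output` is assumed to be a plain dict, and only the three encodings listed above are handled.

```python
import base64
import json


def to_bytes(output, mime_type):
    """Return the bytes to write to disk for one mimebundle entry.

    Hypothetical helper: assumes `output` is a plain dict with "data" and
    "metadata" keys, and that the encoding is stored at
    output["metadata"][mime_type]["encoding"].
    """
    data = output["data"][mime_type]
    encoding = output.get("metadata", {}).get(mime_type, {}).get("encoding")
    if encoding == "base64":
        # Binary payload stored as base64 text in the notebook JSON.
        return base64.b64decode(data)
    if encoding == "json":
        # JSON-able object: serialize it back to text.
        return json.dumps(data).encode("utf8")
    if encoding == "utf8":
        # Plain unicode string.
        return data.encode("utf8")
    raise ValueError(f"unsupported or missing encoding: {encoding!r}")


output = {
    "data": {"audio/wav": base64.b64encode(b"RIFF").decode("ascii")},
    "metadata": {"audio/wav": {"encoding": "base64"}},
}
print(to_bytes(output, "audio/wav"))
```

With this shape, a new base64-encoded mimetype only needs to set its metadata entry; the extraction code itself does not have to know about the mimetype.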

This would, for example, allow IPython.display.Audio and IPython.display.Video objects to be extended to publish the source files directly. One concern, however, is that they would still need to provide _repr_html_, which would double the amount of data in the notebook.
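
To make the duplication concern concrete, here is a rough sketch of a display object publishing its source bytes alongside an HTML fallback. `WavClip` is a hypothetical class, not part of IPython, and the `encoding` metadata key is the convention proposed in this PR rather than an existing part of the spec. The only IPython behavior assumed is that `_repr_mimebundle_` may return a `(data, metadata)` tuple.

```python
import base64


class WavClip:
    """Hypothetical display object; not part of IPython."""

    def __init__(self, raw):
        self.raw = raw

    def _repr_mimebundle_(self, include=None, exclude=None):
        b64 = base64.b64encode(self.raw).decode("ascii")
        data = {
            "audio/wav": b64,
            # The HTML fallback embeds the same payload a second time.
            "text/html": f'<audio controls src="data:audio/wav;base64,{b64}"></audio>',
        }
        # "encoding" is the key proposed in this PR, not an existing spec field.
        metadata = {"audio/wav": {"encoding": "base64"}}
        return data, metadata


data, metadata = WavClip(b"\x00" * 300)._repr_mimebundle_()
# The base64 payload appears once per representation.
print(len(data["audio/wav"]), len(data["text/html"]))
```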

@akhmerov
Member Author

This is WIP because of the following:

  • It assumes a new part of the jupyter spec, which needs to be approved first
  • Tests are missing

@blink1073 added this to the 5.5 milestone Aug 29, 2018
@SylvainCorlay
Member

Could this also enable the binary buffers in the messaging protocol?

@akhmerov
Member Author

I'm not sure: the protocol requires that everything should be string-serializable. Otherwise we wouldn't be able to store the json representation of the notebook.
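
A small illustration of the constraint, using only the standard library: raw bytes cannot be written into JSON, whereas a base64 string round-trips. The PNG-like payload is just placeholder data.

```python
import base64
import json

raw = b"\x89PNG\r\n"

try:
    json.dumps({"image/png": raw})
    serializable = True
except TypeError:
    # The json module cannot serialize bytes objects.
    serializable = False

# Encoding to base64 text first makes the payload storable in the notebook JSON.
encoded = base64.b64encode(raw).decode("ascii")
roundtrip = json.loads(json.dumps({"image/png": encoded}))
restored = base64.b64decode(roundtrip["image/png"])
print(serializable, restored == raw)
```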

@akhmerov
Member Author

...OTOH we could imagine other ways of specifying how the output data is provided, for instance by allowing a URI to be specified.

@MSeal
Contributor

MSeal commented Apr 23, 2019

Removing 5.5 milestone as this PR isn't complete and it's unclear if it should be addressed as approached.

Overall I think having a stronger approach to encoding choices would be good. But it is difficult to do, as there's a lot of undocumented behavior / mimetypes that rely on the accidental application of particular encoding patterns here :/. Perhaps the question of setting a standard for encoding mimetypes should be raised somewhere higher up in the specs, so that other libraries can enforce it in future schema versions?

@MSeal removed this from the 5.5 milestone Apr 23, 2019
@akhmerov
Member Author

Indeed, this needs a discussion in http://github.com/jupyter/nbformat first.

@willingc added the status:work-in-process Do not merge label Jul 29, 2019

5 participants