WIP: output extractor: infer data encoding from output metadata #858

akhmerov · 2018-08-17T11:57:18Z

As discussed with @minrk, with more media formats being added to the mimebundle, the logic of output extraction becomes harder to extend. For example, if an application adds a new base64-encoded mimetype, ExtractOutputPreprocessor would need to be explicitly patched to handle it.

This PR defines an alternative mechanism for extensions to communicate how their outputs should be handled: an output.metadata[mime_type]['encoding'] entry, which tells the frontend how the output must be interpreted. Currently only 3 encodings are supported:

json for json objects
utf8 for unicode strings
base64 for binary data
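
To illustrate the idea, here is a minimal sketch of how an extractor could pick a decoding based on that metadata entry. The helper name decode_output is hypothetical and this is not the code in this PR. Raising on a missing or unknown encoding is also an assumption; a real implementation would probably fall back to the existing mimetype-based logic.

```python
import base64
import json


def decode_output(data, mime_type, metadata):
    """Return the bytes to write to disk for one mimebundle entry.

    Hypothetical helper: `data` and `metadata` are the `data` and
    `metadata` dicts of a single notebook output.
    """
    encoding = metadata.get(mime_type, {}).get("encoding")
    value = data[mime_type]
    if encoding == "base64":
        # Binary payload stored as a base64 string.
        return base64.b64decode(value)
    if encoding == "json":
        # JSON object stored natively in the notebook.
        return json.dumps(value).encode("utf-8")
    if encoding == "utf8":
        # Plain unicode string.
        return value.encode("utf-8")
    # Assumption: fail loudly instead of guessing from the mimetype.
    raise ValueError(f"unsupported encoding for {mime_type}: {encoding!r}")


output = {
    "data": {"audio/wav": base64.b64encode(b"RIFF").decode("ascii")},
    "metadata": {"audio/wav": {"encoding": "base64"}},
}
payload = decode_output(output["data"], "audio/wav", output["metadata"])
```

With this shape, a new base64-encoded mimetype only needs to set the metadata entry; the extractor itself does not need to know the mimetype.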

This would, for example, allow extending IPython.display.Audio and IPython.display.Video objects to publish the source files directly. One concern, however, is that they would still need to provide _repr_html_, which would double the amount of data stored in the notebook.

akhmerov · 2018-08-17T11:58:15Z

This is WIP because of the following:

It assumes a new part of the jupyter spec, which needs to be approved first
Tests are missing

SylvainCorlay · 2018-08-29T13:52:29Z

Could this also enable the binary buffers in the messaging protocol?

akhmerov · 2018-08-29T14:02:33Z

I'm not sure: the protocol requires that everything be string-serializable. Otherwise we wouldn't be able to store the JSON representation of the notebook.
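
A small sketch of that constraint, using only the standard library (the mimetype key and the byte values are just illustrative): raw bytes cannot go through json.dumps, while their base64 string form round-trips.

```python
import base64
import json

raw = bytes([0, 255, 16, 128])  # arbitrary binary payload

# Raw bytes are rejected by the JSON encoder.
try:
    json.dumps({"image/png": raw})
    serializable = True
except TypeError:
    serializable = False

# The base64 string form can be stored in the notebook JSON and restored.
encoded = base64.b64encode(raw).decode("ascii")
stored = json.dumps({"image/png": encoded})
restored = base64.b64decode(json.loads(stored)["image/png"])
```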

akhmerov · 2018-08-29T14:16:54Z

...OTOH we could imagine other ways of specifying how the output data is provided, for instance by allowing a URI to be specified.

MSeal · 2019-04-23T04:01:34Z

Removing 5.5 milestone as this PR isn't complete and it's unclear if it should be addressed as approached.

Overall I think having a stronger approach to encoding choices would be good. But it is difficult to do, as there are a lot of undocumented behaviors / mimetypes that rely on the accidental application of particular encoding patterns here :/. Perhaps the question of setting a standard for encoding mimetypes should be raised somewhere higher up in the specs, so that other libraries can enforce it in future schema versions?

akhmerov · 2019-04-23T11:35:46Z

Indeed, this needs a discussion in http://github.com/jupyter/nbformat first.

output extractor: infer data encoding from output metadata

c4fb372

blink1073 added this to the 5.5 milestone Aug 29, 2018

akhmerov mentioned this pull request Nov 21, 2018

do not pipe non-json data through json.dumps #910

Open

MSeal removed this from the 5.5 milestone Apr 23, 2019

willingc added the status:work-in-process Do not merge label Jul 29, 2019
