Feature Request: Support for Pandas DataFrames #453

Open
marwan116 opened this issue Apr 7, 2020 · 5 comments

@marwan116

marwan116 commented Apr 7, 2020

Thank you for open-sourcing the eliot logging library.

I have a question about the decision to use JSON to serialize the logs, specifically when it comes to scientific computing. Passing a pandas object as an argument fails with "Object of type DataFrame is not JSON serializable". Had YAML been chosen instead, this would not have been an issue.

Can you shed some light on the necessity of using JSON vs YAML for eliot's purposes - and what do you think about using YAML instead?

@marwan116
Author

marwan116 commented Apr 7, 2020

As a follow-up: looking at the implementation of to_file, I see to_file(output_file, encoder=EliotJSONEncoder). Would the change be as simple as creating and using an "EliotYAMLEncoder" here?

@itamarst changed the title from "Feature Request: Use YAML instead of JSON for structured serialization" to "Feature Request: Support for Pandas DataFrames" Apr 7, 2020
@itamarst added this to the 1.13.0 milestone Apr 7, 2020
@itamarst
Owner

itamarst commented Apr 7, 2020

YAML doesn't magically enable Pandas DataFrames. The default Python YAML library will (de)serialize arbitrary objects, but that's insecure, at least for deserialization (the safe_* variants won't do that for that reason). So I recommend against it.
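
The difference between the default and safe_* functions can be checked directly. This is a minimal sketch, assuming PyYAML is installed; `Point` and `yaml_round_trip` are hypothetical names used only for illustration, and nothing here is part of Eliot.

```python
class Point:
    """Hypothetical stand-in for an arbitrary user-defined object."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


def yaml_round_trip(obj):
    """Report how PyYAML's default and safe_* functions treat `obj`."""
    import yaml  # PyYAML (third-party); imported lazily so the module loads without it

    # The default dumper serializes arbitrary objects using a Python-specific tag.
    text = yaml.dump(obj)

    # The safe dumper refuses objects it has no representer for.
    try:
        yaml.safe_dump(obj)
        safe_dump_ok = True
    except yaml.YAMLError:
        safe_dump_ok = False

    # The safe loader refuses to construct objects from Python-specific tags.
    try:
        yaml.safe_load(text)
        safe_load_ok = True
    except yaml.YAMLError:
        safe_load_ok = False

    return text, safe_dump_ok, safe_load_ok
```

Calling `yaml_round_trip(Point(1, 2))` should show that only the unsafe path handles the object, so a DataFrame logged through the safe functions would still need an explicit conversion.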

Some options:

  1. Eliot does have pluggable serializers for the JSON destination (it's how it serializes NumPy to JSON). I've already considered adding support for Pandas, so I will try to do that sometime soon.
  2. You can also plug in your own serialization system by adding a custom destination, an arbitrary function that can do anything it wants with logged messages (see https://eliot.readthedocs.io/en/stable/outputting/output.html#configuring-logging-output). You could write one that opens a file and writes out YAML if you wish.
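
Option 2 could be sketched roughly as follows. This is an illustration under stated assumptions, not code from Eliot: `SerializingDestination` and `to_yaml_document` are hypothetical names, the YAML path assumes PyYAML is installed, and the demonstration uses a JSON serializer so that it runs with the standard library alone.

```python
import io
import json


class SerializingDestination:
    """Destination sketch: a callable that receives one message dict per call."""

    def __init__(self, stream, serialize):
        self._stream = stream
        self._serialize = serialize  # function: dict -> str

    def __call__(self, message):
        self._stream.write(self._serialize(message))
        self._stream.flush()


def to_yaml_document(message):
    """Serialize one message as a YAML document (assumes PyYAML is installed)."""
    import yaml

    # safe_dump only handles plain types, so a DataFrame field would
    # still have to be converted before it reaches this function.
    return yaml.safe_dump(message, explicit_start=True)


# Demonstration with a JSON serializer so the sketch runs without PyYAML.
buffer = io.StringIO()
destination = SerializingDestination(
    buffer, lambda m: json.dumps(m, sort_keys=True) + "\n"
)
destination({"message_type": "demo", "rows": 3})

# With Eliot and PyYAML installed, registration would look something like:
#   from eliot import add_destinations
#   add_destinations(SerializingDestination(open("eliot.yaml", "a"), to_yaml_document))
```

Keeping the serializer as a separate function means the same destination can be switched between JSON and YAML without touching the file handling.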

@marwan116
Author

marwan116 commented Apr 7, 2020

"YAML doesn't magically enable Pandas DataFrames. The default Python YAML library will (de)serialize arbitrary objects, but that's insecure, at least for deserialization (the safe_* variants won't do that for that reason). So I recommend against it."

Agreed. I usually use the yamlable library to wrap any object that is meant to be serialized to YAML. However, one can argue that when all YAML objects are created locally by the user, this security issue is less of a concern for deserialization.

(Re yamlable: https://smarie.github.io/python-yamlable/ - most of my use cases involving pandas are a class that either makes use of a pandas DataFrame or extends one.)

  1. "Eliot does have pluggable serializers for the JSON destination (it's how it serializes NumPy to JSON). I've already considered adding support for Pandas, so I will try to do that sometime soon."
    that's great to hear

  2. "You can also plug in your own serialization system by adding a custom destination, an arbitrary function that can do anything it wants with logged messages (see https://eliot.readthedocs.io/en/stable/outputting/output.html#configuring-logging-output). You could write one that opens a file and writes out YAML if you wish."
    Thank you so much for this suggestion - I will attempt to create a custom destination then

@marwan116
Author

Sorry, I recognize this is probably a question better raised on eliot-tree, but if one uses a custom destination that writes to a YAML file, would eliot-tree also accept a custom deserializer?

@itamarst
Owner

itamarst commented Apr 7, 2020

Not sure, it's a different maintainer. FWIW I suggest option #1 is better: it'll Just Work with eliot-tree, and it's not very hard to do. Here's what the NumPy code looks like: https://github.com/itamarst/eliot/blob/master/eliot/json.py#L15

You'd just need to add another if statement or two there that converts a DataFrame/Series to Python objects.
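
A rough sketch of what those extra checks might look like. It subclasses the standard library's `json.JSONEncoder` so that it runs on its own; `PandasJSONEncoder` is a hypothetical name, and the choice of `orient="split"` is an assumption rather than anything Eliot prescribes. In Eliot itself the equivalent checks would presumably go into `EliotJSONEncoder`, or into a subclass of it passed as `to_file(output_file, encoder=...)`.

```python
import json

try:
    import pandas as pd  # optional: the encoder falls back to default behavior without it
except ImportError:
    pd = None


class PandasJSONEncoder(json.JSONEncoder):
    """JSON encoder sketch that converts DataFrame/Series to plain Python objects."""

    def default(self, o):
        if pd is not None and isinstance(o, (pd.DataFrame, pd.Series)):
            # Let pandas produce JSON text, then parse it back into
            # built-in dicts and lists that the json module can encode.
            return json.loads(o.to_json(orient="split"))
        # Anything else keeps the standard behavior (raises TypeError).
        return super().default(o)


# Example use with the standard library:
#   json.dumps({"frame": some_dataframe}, cls=PandasJSONEncoder)
```

Round-tripping through `to_json` is not the fastest route, but it leaves the handling of NumPy scalars and timestamps to pandas instead of reimplementing it.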
