Why is pickled array so large? #3081

Answered by agoose77
nbecker asked this question in Q&A
Apr 12, 2024 · 2 comments · 2 replies

The nuance here is that the default integer type in Awkward Array is int64, so when an Awkward Array is pickled, each integer consumes 8 bytes. Python's pickler, by contrast, packs integers densely (I think this is the routine: https://github.com/python/cpython/blob/2d3d9b4461d0e2cb475014868af3c2f241cb6495/Modules/_pickle.c#L2066). With a binary protocol, an integer from 0 to 255 takes only 2 bytes: a 1-byte opcode plus a 1-byte payload. As a result, the difference between the two is stark, particularly for small values.
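To see the effect without Awkward itself, here is a small sketch that pickles the same 1000 small integers once as a Python list and once as NumPy arrays. It assumes NumPy int64 arrays behave like Awkward's default integer buffers when pickled; exact byte counts vary with the Python and NumPy versions, so only the relative sizes matter.

```python
import pickle

import numpy as np

# 1000 small non-negative integers, all in the range 0..255
values = [i % 256 for i in range(1000)]

# Python list: each int in 0..255 is written as a 1-byte opcode
# plus a 1-byte payload, so roughly 2 bytes per item.
list_size = len(pickle.dumps(values))

# int64 array (Awkward's default integer width): the raw buffer
# alone is 1000 * 8 = 8000 bytes, plus some header overhead.
int64_size = len(pickle.dumps(np.array(values, dtype=np.int64)))

# The same values in a 1-byte dtype, for comparison.
uint8_size = len(pickle.dumps(np.array(values, dtype=np.uint8)))

print(list_size, int64_size, uint8_size)
```

The list costs about 2 bytes per item and the int64 array about 8, while a 1-byte dtype brings the array below the list, which points at integer width rather than pickling itself as the cause.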

If you care about space, you can just compress the buffers, e.g.

```python
import io
import pickle
import numpy as np
import awkward as ak

def save_compressed(array, file_):
    # Split the array into a form, a length, and a dict of flat buffers
    form, length, container = ak.to_buffers(array)

    # Compress the arrays into a bytestream
    data = io.BytesIO()
```
