We should stash metadata in the Parquet schema rather than relying on extensionarray.
We still need ExtensionArray for to_arrow because pa.array doesn't otherwise have a mechanism for stashing metadata.
agoose77 added the bug label and removed the bug (unverified) label on Oct 25, 2023.
@jpivarski I take this as an indication that we should pursue a solution in Awkward rather than trying to get upstream support for partial reads. What do you think? (I haven't taken too much time to read all of the discourse).
Yeah, we need to do this ourselves. A good workaround would be:
ak.to_arrow keeps its existing ExtensionArray logic. We still need to round-trip Awkward Arrays through pyarrow.array, and those arrays are not a storage format that needs per-column selection. Also, a pyarrow.array has no Table metadata, so ExtensionArray remains the only way to carry the Awkward information there.
ak.to_arrow_table changes in two ways: (1) it fills the table with pyarrow.arrays made with extensionarray=False and (2) it puts the Form and other Awkward information into the Table metadata.
ak.from_arrow applied to pyarrow.array uses the existing ExtensionArray logic (it has no choice), and ak.from_arrow applied to Table uses the Table metadata to losslessly reconstruct the Awkward Array.
The implementation of ak.from_arrow on Tables might proceed by reading the non-ExtensionArray columns, constructing the ExtensionArray type, applying it to the columns, and then using the existing ExtensionArray infrastructure to ensure that the Awkward Array is properly built. Alternatively, it might be an entirely different code path. On the one hand, we'd like to reuse code and treat pyarrow.array and pyarrow.Table in similar ways, but on the other hand, introducing the ExtensionArray could be more complicated than a straight conversion.
If the implementation still goes through ExtensionArray, we may want to keep the argument name extensionarray: bool in both ak.to_arrow and ak.to_arrow_table. If not, we might want to deprecate it in both functions, or only in ak.to_arrow_table, in favor of a new argument, lossless: bool.
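If the argument is renamed, the old name could be kept as a deprecated alias. The helper `resolve_lossless` below is hypothetical and only shows the argument handling; the default of True is an assumption chosen to mirror a lossless default.

```python
import warnings

_UNSET = object()


def resolve_lossless(lossless=_UNSET, extensionarray=_UNSET):
    """Hypothetical helper: map the old `extensionarray` argument onto a new
    `lossless` argument, warning when the old name is used."""
    if extensionarray is not _UNSET:
        if lossless is not _UNSET:
            raise TypeError("pass either 'lossless' or 'extensionarray', not both")
        # stacklevel=3 assumes this is called from the public function,
        # so the warning points at the user's call site.
        warnings.warn(
            "'extensionarray' is deprecated; use 'lossless' instead",
            DeprecationWarning,
            stacklevel=3,
        )
        return bool(extensionarray)
    return True if lossless is _UNSET else bool(lossless)
```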
Version of Awkward Array
main