I agree that being able to read Parquet files would be nice. It's probably worth investigating whether using pyarrow directly offers any performance gain over pandas.read_parquet. In the meantime, if you're interested in a very minimal Parquet data loader, you can add the snippet below (which requires pyarrow) to your glue config file; it should let you load at least basic Parquet files:
from glue.config import data_factory
from glue.core.data_factories.helpers import has_extension
from glue.core.data_factories.pandas import panda_process
from pandas import read_parquet


# Register a loader that glue offers for files with a .parquet extension
@data_factory(label="Parquet file", identifier=has_extension("parquet"))
def pandas_read_parquet(path, engine="pyarrow", **kwargs):
    # Extra keyword arguments are accepted but not forwarded to pandas
    df = read_parquet(path, engine=engine)
    # Convert the DataFrame into a glue Data object
    return panda_process(df)
Is your feature request related to a problem? Please describe it:
Pandas' parquet files are not loaded.
Describe the solution you'd like:
Load parquet files.