
ENH: read parquet files in chunks using to_parquet and chunksize #55973

Open
1 of 3 tasks
match-gabeflores opened this issue Nov 15, 2023 · 8 comments
Assignees
Labels
Arrow pyarrow functionality Enhancement IO Parquet parquet, feather

Comments

@match-gabeflores
Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

Similar to how read_csv has a chunksize parameter, can the read_parquet function have one as well?

It seems possible using pyarrow via iter_batches:
https://stackoverflow.com/questions/59098785/is-it-possible-to-read-parquet-files-in-chunks

Is this something feasible within pandas?

Feature Description

Add a new chunksize parameter to read_parquet.

Alternative Solutions

Use pyarrow's ParquetFile.iter_batches directly.

Additional Context

No response

@match-gabeflores match-gabeflores added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Nov 15, 2023
@RahulDubey391

Hi @match-gabeflores , I would like to have a look into this issue. Can you please assign it to me?

@match-gabeflores
Author

match-gabeflores commented Nov 27, 2023

Thanks, go for it! Unfortunately, I don't have access to assign it.

@lithomas1 lithomas1 added IO Parquet parquet, feather Arrow pyarrow functionality and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 11, 2024
@Meadiocre

Hello @match-gabeflores,

My project team is looking for pandas enhancement features for our semester-long grad school project. We saw this task and would like to contribute if possible! We also noticed that @RahulDubey391 mentioned a few months ago that he wanted to work on this feature; however, if no one is currently working on it, we would like to pick it up.

@match-gabeflores
Author

Go for it, @Meadiocre !

I don't have access to assign, I think that's just a formality anyway. @lithomas1

@HkrFlores

HkrFlores commented Feb 4, 2024

take

@HkrFlores

Hello @match-gabeflores,
I am working with @Meadiocre; please assign it to me.
Thanks!

@HkrFlores

take

@HkrFlores

Hello @match-gabeflores
I just want to make sure I am not deviating from what has been asked.
I got read_parquet() with chunksize to "work" in the sense that values are returned according to the selected chunksize, but the data still comes back as a single DataFrame; looking at the csv implementation, that would only show the full table (or at least the last batch). Right now I am working on having read_parquet return the data through a TextFileReader-style iterator, the same way csv does.
Looking at the engine setup for TextFileReader, I noticed none were set up for Parquet, so I created a new engine type (in _typing), but I can't find information about those engines. I assume python and pyarrow are OK? Is there any other engine to use for this?

Thanks!
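For reference, the read_csv behaviour the comment above is trying to mirror: with chunksize set, read_csv returns a TextFileReader (an iterator of DataFrames, also usable as a context manager) rather than a single table. A minimal sketch, using a throwaway `example.csv`:

```python
import pandas as pd

# Write a small example file so the sketch is self-contained.
pd.DataFrame({"x": range(6)}).to_csv("example.csv", index=False)

# chunksize=2 yields three DataFrames of 2 rows each, lazily.
with pd.read_csv("example.csv", chunksize=2) as reader:
    for chunk in reader:
        print(type(chunk).__name__, len(chunk))  # DataFrame 2, three times
```

A read_parquet chunksize would presumably return an analogous iterator object rather than a concatenated DataFrame.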

Development

Successfully merging a pull request may close this issue.

5 participants