change the name of "mode" argument to awswrangler.s3.to_csv #2409

Open
goleash-4alight opened this issue Jul 25, 2023 · 5 comments
Labels
enhancement New feature or request investigating More investigation is required

Comments

@goleash-4alight

Describe the bug

awswrangler.s3.to_csv accepts both a "mode" argument and **pandas_kwargs. The "mode" argument is not passed through to Pandas; it is consumed by the awswrangler method itself, which also expects dataset=True whenever "mode" is used. In some cases, it would be useful to pass this argument through to Pandas.

If there is already a way to pass "mode" to Pandas, a documentation update would resolve this issue.

How to Reproduce

import awswrangler as wr

...
# load a DataFrame and name it df
...

# Pandas "mode"
wr.s3.to_csv(df, "some_test_file_name", mode="a", header=False)

# awswrangler expects mode="append", dataset=True

Expected behavior

No response

Your project

No response

Screenshots

No response

OS

AWS Lambda x86_64 Architecture

Python version

3.10

AWS SDK for pandas version

3.2.0

Additional context

No response

@goleash-4alight goleash-4alight added the bug Something isn't working label Jul 25, 2023
@malachi-constant
Contributor

To avoid a breaking change, we can consider introducing a one-off parameter (pandas_mode, for example) that is relabeled as mode and passed to the underlying pandas method. I can open a PR, and the team can discuss whether this is how we'd like to move forward.
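The relabeling idea above could look roughly like the sketch below. It is not the library's code: the helper name split_mode_kwargs is hypothetical, and pandas_mode is only the parameter name proposed in this comment. It shows how the wrapper could keep its own mode while forwarding a separate value to pandas under the key "mode".

```python
from typing import Any, Dict, Optional, Tuple


def split_mode_kwargs(
    mode: Optional[str] = None,
    pandas_mode: Optional[str] = None,  # name proposed in this thread, not an existing parameter
    **pandas_kwargs: Any,
) -> Tuple[Optional[str], Dict[str, Any]]:
    """Hypothetical helper: separate the wrapper's own mode from pandas' mode.

    Returns the wrapper-level mode unchanged, plus the keyword arguments
    that would be forwarded to the underlying pandas call.
    """
    forwarded = dict(pandas_kwargs)
    if pandas_mode is not None:
        # Relabel: pandas knows this option as "mode".
        forwarded["mode"] = pandas_mode
    return mode, forwarded


wrapper_mode, forwarded = split_mode_kwargs(
    mode="append", pandas_mode="a", header=False
)
```

Because mode is a named parameter of the wrapper, callers can never reach pandas' "mode" through **pandas_kwargs, which is why a second, differently named parameter is needed.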

@goleash-4alight
Author

That makes sense and sounds great -- thanks for the prompt response.

@malachi-constant malachi-constant moved this from To Do to In progress in AWS SDK for pandas roadmap Jul 26, 2023
@malachi-constant malachi-constant added enhancement New feature or request investigating More investigation is required and removed bug Something isn't working labels Jul 26, 2023
@malachi-constant
Contributor

Still investigating here, as this may require refactoring how we read existing S3 objects in order to support modes like append ("a").

@goleash-4alight
Author

This may not be necessary at all if there is a way to append to an existing file. For example, this seems to fail silently (it doesn't append the DataFrame "df" to the S3 file "target" and doesn't raise an exception):

wr.s3.to_csv(df, target, dataset=True, mode='append')

@ggoleash-4-ats
ggoleash-4-ats commented Oct 4, 2023

Is there a different way to append to an existing file? We use pandas "append" mode primarily with pandas "chunksize" for large text files. If there's a different way to do this using awswrangler, "chunksize" and "mode" for pandas are not necessary. We're looking for a way to append to an existing file rather than writing a new file in the target directory.
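One possible workaround, sketched under assumptions rather than taken from awswrangler: as far as I know, an object in a general-purpose S3 bucket can't be extended in place, so an "append" has to be emulated by downloading the object, adding rows, and uploading the result under the same key. The function name and the get_bytes/put_bytes callables below are hypothetical stand-ins for whatever S3 client is in use; a dict plays the role of the bucket so the sketch runs without AWS access.

```python
import csv
import io
from typing import Callable, Dict, Iterable, Sequence


def append_rows_to_csv_object(
    get_bytes: Callable[[str], bytes],        # hypothetical: downloads an object
    put_bytes: Callable[[str, bytes], None],  # hypothetical: uploads an object
    key: str,
    rows: Iterable[Sequence[object]],
) -> None:
    """Emulate an append: read the whole object, add rows, write it back."""
    existing = get_bytes(key).decode("utf-8")
    buffer = io.StringIO()
    buffer.write(existing)
    # Make sure new rows start on their own line.
    if existing and not existing.endswith("\n"):
        buffer.write("\n")
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    put_bytes(key, buffer.getvalue().encode("utf-8"))


# In-memory stand-in for a bucket, so the sketch runs without AWS access.
store: Dict[str, bytes] = {"data.csv": b"id,name\n1,a\n"}
append_rows_to_csv_object(
    store.__getitem__, store.__setitem__, "data.csv", [(2, "b"), (3, "c")]
)
```

This rewrites the whole object on every call, so it gets expensive for large files written in many chunks, and concurrent writers can overwrite each other's rows. For chunked output, writing each chunk as its own file under a common prefix may be the more practical fit.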
