df_io

Python helpers for doing IO with Pandas DataFrames

Available methods

read_df

This method supports:

  • bzip2/gzip/zstandard compression
  • passing parameters to Pandas' readers
  • reading from anything smart_open supports (local files, AWS S3, etc.)
  • most of the formats Pandas supports
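
For example, a minimal sketch (mirroring the write_df examples below; compression is assumed to be picked up from the file extension, as in the write examples):

import df_io

df = df_io.read_df('s3://bucket/dir/mydata.csv.gz')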

write_df

This method supports:

  • streaming writes
  • chunked writes
  • bzip2/gzip/zstandard compression
  • passing parameters to Pandas' writers
  • writing to anything smart_open supports (local files, AWS S3, etc.)
  • most of the formats Pandas supports

Documentation

API doc

Examples

Write a Pandas DataFrame (df) to an S3 path in CSV format (the default):

import df_io

df_io.write_df(df, 's3://bucket/dir/mydata.csv')

The same with gzip compression:

df_io.write_df(df, 's3://bucket/dir/mydata.csv.gz')

With zstandard compression using pickle:

df_io.write_df(df, 's3://bucket/dir/mydata.pickle.zstd', fmt='pickle')

Using JSON lines:

df_io.write_df(df, 's3://bucket/dir/mydata.json.gz', fmt='json')

Passing writer parameters:

df_io.write_df(df, 's3://bucket/dir/mydata.json.gz', fmt='json', writer_options={'lines': False})

Chunked write (splitting the df into equally sized parts and writing a separate output for each):

df_io.write_df(df, 's3://bucket/dir/mydata.json.gz', fmt='json', chunksize=10000)
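
Reading data back works the same way; a minimal sketch, assuming reader_options is the read-side counterpart of writer_options for passing parameters to Pandas' readers:

df = df_io.read_df('s3://bucket/dir/mydata.json.gz', fmt='json', reader_options={'lines': False})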