Skip to content

etiennemunnich/s3perf

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 

Repository files navigation

s3perf

Some performance ideas for S3 - make the most of request rates to S3:

The plan here is to create some proof of concept ideas for writing/reading to/from S3 to allow for higher request rates

Random Hash implementations:

  • Using a random hash prefix to the object name will allow for higher request rates. Here are some example s3://mybucket/[random hex hash]/objectname.txt s3://mybucket/a/objectname.txt s3://mybucket/b/objectname.txt .... s3://mybucket/z/objectname.txt

Example 1: s3hcp.py - This is an simple example of adding the MD5 hash of a file infront of object name and then writing an index file that contains the object name and the location (hashed) in a csv.

Use:

python3 s3hcp.py [localfile(s)] --bucket [bucketname]

Example use for copying files in the current folder (doesn't recursively copy folders just yet):

python3 s3hcp.py * --bucket mybucket or python3 s3hcp.py *.csv --bucket mybucket

Some further ideas:

  • async implementation or look at threading for higher through-puts
  • need to put more exception checking in:
    • check the bucket exists
  • allow for writing ACLs, encryption, tags and meta-data
  • allow for an input file list to process for upload to S3
  • allow for retry logic and debug log files