Loading Data from S3 #141

hilt86 · 2022-10-05T03:48:49Z

Thanks for making Zat!

I would like to be able to centrally store all my data in Amazon S3 and access it using Zat; however, Zat currently expects an actual filesystem. Would it be possible to get native-ish support for S3 in Zat, please?

brifordwylie · 2022-10-05T13:53:02Z

This is a super awesome suggestion.... putting this on the top of my open source queue 👍

hilt86 · 2022-10-06T22:28:19Z

Thanks! At the moment I've modified the /opt/zeek/share/zeekctl/scripts/archive-log script to call a small Python script that converts the file to a Parquet file on Amazon S3. For analysis I use Google Colab like so:

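import s3fs

# anon=False means the configured AWS credentials are used
fs = s3fs.S3FileSystem(anon=False)
filz = fs.find('s3://parquet.my.org/2022/10/')

# keep only the DNS files; find() returns keys without the s3:// scheme
fileList = []
for f in filz:
  if "dns" in f:
    fileList.append('s3://' + f)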
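The filtering loop above could also be wrapped in a small helper so the same listing can be reused for other log types. This is only a sketch: `select_log_urls` is a hypothetical name, the example keys are made up, and it assumes the keys come back from `fs.find()` without the `s3://` scheme, as in the snippet above. Unlike the loop above, it matches on the file name only rather than the whole path.

```python
def select_log_urls(keys, log_type):
    """Return s3:// URLs for the keys whose file name contains log_type."""
    urls = []
    for key in keys:
        # Look at the file name only, so a bucket or prefix that happens
        # to contain the log type does not match by accident.
        filename = key.rsplit("/", 1)[-1]
        if log_type in filename:
            urls.append("s3://" + key)
    return urls


# Hypothetical keys, shaped like the output of fs.find() above
example_keys = [
    "parquet.my.org/2022/10/05/dns.parquet",
    "parquet.my.org/2022/10/05/conn.parquet",
]
dns_urls = select_log_urls(example_keys, "dns")
# dns_urls == ["s3://parquet.my.org/2022/10/05/dns.parquet"]
```

With a real listing this would be called as `select_log_urls(filz, "dns")`.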

Then I use pd.concat() to combine all the small files into one big dataframe for analysis:

import pandas as pd

df = pd.concat((pd.read_parquet(f, engine='fastparquet') for f in fileList), ignore_index=False)
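For anyone wanting to try the archive side as well, here is a rough sketch of what a conversion script like the one mentioned above might look like. It is not the actual script: `parquet_url` and `convert_log` are hypothetical names, the `s3://<bucket>/<date prefix>/<name>.parquet` layout is assumed from the listing above, and it relies on Zat's `LogToDataFrame` plus pandas writing to S3 through s3fs.

```python
from pathlib import PurePosixPath


def parquet_url(log_path, bucket, date_prefix):
    """Build the destination URL for a Zeek log (assumed bucket layout)."""
    name = PurePosixPath(log_path).name
    # Drop a trailing ".log" so "dns.log" becomes "dns.parquet"
    stem = name[: -len(".log")] if name.endswith(".log") else name
    return f"s3://{bucket}/{date_prefix.strip('/')}/{stem}.parquet"


def convert_log(log_path, bucket, date_prefix):
    """Read a Zeek log with Zat and write it to S3 as Parquet."""
    # Imported here so parquet_url() can be used without Zat installed
    from zat.log_to_dataframe import LogToDataFrame

    df = LogToDataFrame().create_dataframe(log_path)
    url = parquet_url(log_path, bucket, date_prefix)
    # pandas hands s3:// URLs to s3fs, which must be installed and
    # have credentials available
    df.to_parquet(url)
    return url
```

A call such as `convert_log("dns.log", "parquet.my.org", "2022/10")` would then write to `s3://parquet.my.org/2022/10/dns.parquet`.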

hilt86 mentioned this issue Oct 5, 2022

AWS as data input/output examples #89

Open
