Loading Data from S3 #141

Open
hilt86 opened this issue Oct 5, 2022 · 2 comments

Comments

@hilt86
hilt86 commented Oct 5, 2022

Thanks for making Zat!

I would like to store all my data centrally in Amazon S3 and access it using Zat; however, Zat currently expects an actual filesystem. Is it possible to use the native-ish support for S3 in Zat, please?

@brifordwylie
Member

This is a super awesome suggestion.... putting this on the top of my open source queue 👍

@hilt86
Author

hilt86 commented Oct 6, 2022

Thanks! At the moment I've modified the /opt/zeek/share/zeekctl/scripts/archive-log script to call a small Python script that converts each log file to a Parquet file on Amazon S3. For analysis I use Google Colab, like so:

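import s3fs

# List every object under the month's prefix (credentials come from the environment)
fs = s3fs.S3FileSystem(anon=False)
filz = fs.find('s3://parquet.my.org/2022/10/')

# Keep only the DNS logs; fs.find() returns paths without the scheme, so re-add 's3://'
fileList = []
for f in filz:
  if "dns" in f:
    fileList.append('s3://' + f)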

Then I use pd.concat() to combine all the small files into one big dataframe for analysis:

import pandas as pd
df = pd.concat((pd.read_parquet(f, engine='fastparquet') for f in fileList), ignore_index=False)
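
One caveat with the "dns" substring test above: it checks the whole path, so it would also pick up any key whose bucket, folder, or file name merely contains "dns". Here is a rough sketch of a stricter filter that only looks at the file name. The helper name select_log_urls and the example keys are made up for illustration, and it assumes the Parquet files keep a "dns." prefix in their names, which may not match the actual naming scheme.

```python
import posixpath


def select_log_urls(keys, log_type, scheme="s3://"):
    """Return full URLs for keys whose file name starts with '<log_type>.'.

    `keys` are bucket-qualified paths without a scheme, the form that
    s3fs's find() returns. Matching on the file name prefix is an
    assumption about how the archived files are named.
    """
    urls = []
    for key in keys:
        name = posixpath.basename(key)
        if name.startswith(log_type + "."):
            urls.append(scheme + key)
    return sorted(urls)


# Hypothetical keys, standing in for the result of fs.find(...)
keys = [
    "parquet.my.org/2022/10/05/dns.10:00:00-11:00:00.parquet",
    "parquet.my.org/2022/10/05/conn.10:00:00-11:00:00.parquet",
    "parquet.my.org/2022/10/05/rdns_cache.parquet",
]
print(select_log_urls(keys, "dns"))
```

The returned list could be passed to pd.concat() in place of fileList. Sorting the URLs keeps the row order stable between runs, which helps if the index is kept.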
