PDB Reader can stream from AWS s3 buckets with minimal modification #4568

Open
ljwoods2 opened this issue Apr 19, 2024 · 3 comments

@ljwoods2
Contributor

As per @hmacdope's request, here is how you can tweak PDBReader with a few lines of code to get it to read from an AWS S3 bucket:

https://github.com/ljwoods2/mdanalysis/pull/2/files

This allows you to do something like this:

import MDAnalysis as mda
from MDAnalysisTests.datafiles import PSF
import s3fs

s3_fs = s3fs.S3FileSystem(
    # anon must be False so that credentials are used for authentication
    anon=False,
    # use a profile defined in ~/.aws/credentials to supply the secret keys
    profile='sample_profile',
    client_kwargs=dict(
        region_name='us-west-1',
    )
)

# PDB trajectory file is stored in an S3 bucket
# Trajectory used is PDB_small from MDAnalysisTests.datafiles
file = s3fs.S3File(s3_fs, "zarrtraj-test-data/pdb_small.pdb")

u = mda.Universe(PSF, file, format="PDB")
for ts in u.trajectory:
    print(u.atoms)

This works because the PDBReader accepts file-like objects, and S3File objects implement that interface. Other formats may work the same way; @orbeckst suggested that the GRO format may be able to do this as well.
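To illustrate what "file-like" means here, below is a minimal sketch of a reader that only iterates over lines and therefore does not care where the bytes come from. It is not MDAnalysis's PDBReader: `read_atom_coordinates` and `make_atom_line` are hypothetical helpers written for this example, and the only assumption about the format is the standard fixed-column PDB layout (x, y, z in columns 31-54).

```python
import io


def read_atom_coordinates(fileobj):
    """Collect (x, y, z) from ATOM/HETATM records of any iterable file-like object."""
    coords = []
    for line in fileobj:
        # Binary streams (such as a remote file opened in 'rb' mode) yield bytes
        if isinstance(line, bytes):
            line = line.decode("ascii")
        if line.startswith(("ATOM", "HETATM")):
            # Fixed-width PDB columns 31-38, 39-46, 47-54 (1-indexed)
            coords.append(
                (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            )
    return coords


def make_atom_line(serial, name, resname, chain, resid, x, y, z):
    """Build one fixed-width ATOM record (hypothetical helper for this demo)."""
    return (
        "ATOM  {:5d} {:<4s} {:3s} {:1s}{:4d}    {:8.3f}{:8.3f}{:8.3f}\n"
    ).format(serial, name, resname, chain, resid, x, y, z)


pdb_text = (
    make_atom_line(1, "N", "MET", "A", 1, 27.34, 24.43, 2.614)
    + make_atom_line(2, "CA", "MET", "A", 1, 26.266, 25.413, 2.842)
    + "END\n"
)

# The same function works on a text stream and on a binary stream
text_coords = read_atom_coordinates(io.StringIO(pdb_text))
byte_coords = read_atom_coordinates(io.BytesIO(pdb_text.encode("ascii")))
```

Any object that supports the same iteration, an S3File included, could be passed in place of the in-memory streams.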

For a large trajectory, this would be extremely slow, but could be sped up with caching.
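As a rough illustration of why caching helps, here is a sketch that uses only the standard library. `CountingRemote` is a hypothetical stand-in for a remote object in which every low-level read counts as one network request; wrapping it in `io.BufferedReader` with a larger buffer reduces the number of requests needed to iterate over the same lines. s3fs and fsspec expose their own block-size and cache settings for the same purpose; check their documentation for the exact options rather than relying on this sketch.

```python
import io


class CountingRemote(io.RawIOBase):
    """Hypothetical stand-in for a remote object; each readinto() is one 'request'."""

    def __init__(self, payload):
        super().__init__()
        self._payload = payload
        self._pos = 0
        self.requests = 0

    def readable(self):
        return True

    def readinto(self, buffer):
        # Count the request, then copy as many bytes as fit into the buffer
        self.requests += 1
        chunk = self._payload[self._pos:self._pos + len(buffer)]
        buffer[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)


def count_requests(payload, buffer_size):
    """Iterate over all lines and report (number of lines, number of raw reads)."""
    raw = CountingRemote(payload)
    reader = io.BufferedReader(raw, buffer_size=buffer_size)
    n_lines = sum(1 for _ in reader)
    return n_lines, raw.requests


# 1000 short fake records standing in for a trajectory file
payload = b"".join(b"ATOM record %5d\n" % i for i in range(1000))

lines_small, requests_small = count_requests(payload, buffer_size=128)
lines_large, requests_large = count_requests(payload, buffer_size=1 << 20)
```

Both runs see the same lines, but the run with the large buffer issues far fewer raw reads, which is the effect a read-ahead or block cache would have on a real remote file.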

@RMeli
Member

RMeli commented Apr 19, 2024

Related to #4139

@orbeckst
Member

orbeckst commented May 9, 2024

There's nothing in the code that needs changing, so this is more of a "let's document fun things one can do" item, perhaps for a "hacking around MDAnalysis" section.

@orbeckst
Member

orbeckst commented May 9, 2024

Also note that this functionality would stop working if we were to switch to accelerated text-based readers, which would be based on a C++/Cython implementation.
