PDB-Protein-Analysis

This repository consists of two modules:

A PDB (Protein Data Bank) dataset parser that returns cleaned ATOM information.
A Ramachandran analysis tool.

PDB Parser

See PDB_Parser.py.

The parser goes to RCSB PDB (Protein Data Bank) and downloads (streams) the list of non-redundant protein structures clustered at the 30% sequence identity level. The resulting text file, "clusters-by-entity-30.txt", contains over 300,000 lines, each of which corresponds to a cluster of single-chain sequences and structures. Each entry on a line is a four-character alphanumeric PDB ID followed by "_" and a polymer entity identifier (not a chain identifier).
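The entry format described above can be illustrated with a small parsing sketch. The function name `parse_cluster_line` and the sample line are hypothetical and are not taken from PDB_Parser.py. Skipping tokens whose prefix is not a four-character ID is an assumption about how to treat any other identifiers that may appear in the file.

```python
from typing import List, Tuple


def parse_cluster_line(line: str) -> List[Tuple[str, str]]:
    """Split one cluster line into (pdb_id, entity_id) pairs.

    Hypothetical helper: each whitespace-separated token is expected to look
    like "<4-character PDB ID>_<polymer entity identifier>".
    """
    members = []
    for token in line.split():
        # Split on the LAST underscore so the entity identifier is isolated.
        pdb_id, sep, entity_id = token.rpartition("_")
        # Assumption: keep only tokens with a four-character alphanumeric ID.
        if sep and len(pdb_id) == 4 and pdb_id.isalnum():
            members.append((pdb_id.upper(), entity_id))
    return members


# Made-up sample line in the same shape as the cluster file.
sample = "4HHB_1 1A3N_1 2DN2_1\n"
print(parse_cluster_line(sample))
# → [('4HHB', '1'), ('1A3N', '1'), ('2DN2', '1')]
```

Note that the second field is an entity identifier, so it cannot be used directly as a chain ID when reading coordinates.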

Next, it loops over the 100 largest clusters (the first 100 lines of the list) and selects one random structure from each cluster/line.
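The selection step can be sketched as follows. This is a minimal illustration, not the code in PDB_Parser.py: the function name `pick_one_per_cluster`, its parameters, and the fixed seed are assumptions added so that the random choice is reproducible.

```python
import random
from itertools import islice
from typing import Iterable, List


def pick_one_per_cluster(lines: Iterable[str],
                         n_clusters: int = 100,
                         seed: int = 0) -> List[str]:
    """Pick one random member from each of the first `n_clusters` lines."""
    rng = random.Random(seed)  # local generator, so results are repeatable
    picks = []
    # islice stops after n_clusters lines without reading the rest of the stream.
    for line in islice(lines, n_clusters):
        members = line.split()
        if members:  # an empty line still counts as one of the n_clusters lines
            picks.append(rng.choice(members))
    return picks


# Made-up cluster lines; only the first two are considered here.
demo_lines = ["1ABC_1 2DEF_1 3GHI_2", "4JKL_1", "5MNO_1 6PQR_1"]
print(pick_one_per_cluster(demo_lines, n_clusters=2))
```

Because `islice` consumes the input lazily, the same function works on a streamed response as well as on a list of lines.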

Finally, it extracts ATOM information from the PDB and FASTA data and returns a cleaned pandas DataFrame with the columns: atom_name, residue_name, x, y, z.
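As a rough sketch of the extraction step, the ATOM records of a legacy PDB file can be read by fixed column positions into the five-column frame described above. The function name `atoms_to_dataframe` and the sample records (illustrative values only) are assumptions, and the FASTA handling is not shown.

```python
import pandas as pd

COLUMNS = ["atom_name", "residue_name", "x", "y", "z"]


def atoms_to_dataframe(pdb_lines):
    """Collect ATOM records into a DataFrame using PDB fixed-width columns."""
    rows = []
    for line in pdb_lines:
        if not line.startswith("ATOM  "):
            continue  # skip HETATM, TER, headers, etc.
        rows.append({
            "atom_name": line[12:16].strip(),     # columns 13-16
            "residue_name": line[17:20].strip(),  # columns 18-20
            "x": float(line[30:38]),              # columns 31-38
            "y": float(line[38:46]),              # columns 39-46
            "z": float(line[46:54]),              # columns 47-54
        })
    return pd.DataFrame(rows, columns=COLUMNS)


# Illustrative records laid out in the legacy PDB column format.
sample_lines = [
    "ATOM      1  N   MET A   1      27.340  24.430   2.614",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842",
    "HETATM  100  O   HOH A 201      10.000  10.000  10.000",
]
print(atoms_to_dataframe(sample_lines))
```

Slicing by column position rather than splitting on whitespace matters because adjacent fields in PDB files can run together without a separating space.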

Ramachandran Analysis tool

See Ramachandran_Analysis.py; the experiment results are in Ramachandran_Report.pdf.

The tool produces Ramachandran plots (scatter plots) for: (a) all residues except glycines and prolines, (b) all glycines, and (c) all prolines.
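A Ramachandran plot places each residue at its backbone dihedral angles: phi is defined by C(i-1), N(i), CA(i), C(i) and psi by N(i), CA(i), C(i), N(i+1). The sketch below shows one common way to compute a signed dihedral with NumPy and to assign a residue to one of the three plot groups. The function names are hypothetical and this is not necessarily how Ramachandran_Analysis.py does it.

```python
import numpy as np


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) around the p1-p2 axis."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 /= np.linalg.norm(b1)  # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def phi_psi(prev_c, n, ca, c, next_n):
    """Return (phi, psi) for one residue from five backbone atom positions."""
    return dihedral(prev_c, n, ca, c), dihedral(n, ca, c, next_n)


def plot_group(residue_name):
    """Map a three-letter residue name to plot group (a), (b) or (c)."""
    if residue_name == "GLY":
        return "glycine"
    if residue_name == "PRO":
        return "proline"
    return "general"


# Simple geometric check: a quarter turn around the z axis.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
print(plot_group("ALA"), plot_group("GLY"), plot_group("PRO"))
```

Using `arctan2` on the projected vectors keeps the sign of the angle, which is needed to place points in the correct quadrant of the plot.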

Experiments

To get the Ramachandran plots, execute $ python __main__.py.

Note that Ramachandran_Analysis must be initialized with a pandas DataFrame that has the columns: atom_name, residue_name, x, y, z.
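If the frame comes from another source, it may help to check its shape before handing it over. The following is only a sketch: `check_atom_frame` is a hypothetical helper, and the constructor of Ramachandran_Analysis is deliberately not called because its signature is not documented here.

```python
import pandas as pd

REQUIRED_COLUMNS = ["atom_name", "residue_name", "x", "y", "z"]


def check_atom_frame(df):
    """Return the frame restricted to the required columns, or raise."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Drop any extra columns and enforce the documented column order.
    return df[REQUIRED_COLUMNS]


# Illustrative frame with one extra column that gets dropped.
frame = pd.DataFrame({
    "atom_name": ["N", "CA"],
    "residue_name": ["MET", "MET"],
    "x": [27.340, 26.266],
    "y": [24.430, 25.413],
    "z": [2.614, 2.842],
    "chain": ["A", "A"],
})
print(check_atom_frame(frame))
```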

Future Work/ Improvements

The current version skips the PDBx/mmCIF format, so the "first 100" structures actually yield fewer (94 in the report).
The current version simply accumulates all ATOM records instead of grouping them by chain, which may cause miscalculations (reduced to 128 occurrences in over 500,000 amino acids).
Multi-processing or multi-threading could be used to improve speed.
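One possible direction for the chain-grouping improvement is to key ATOM records by the chain identifier (column 22 of a legacy PDB record) and by residue number, so that dihedral angles are never computed across two different chains. The sketch below is an assumption about how this could look, not part of the current code; the function name and sample records are hypothetical.

```python
def group_atoms_by_chain(pdb_lines):
    """Map chain ID -> {(residue number, insertion code): {atom name: xyz}}.

    Plain dicts keep insertion order, so chains and residues stay in file order.
    """
    chains = {}
    for line in pdb_lines:
        if not line.startswith("ATOM  "):
            continue
        chain_id = line[21]                          # column 22
        residue_key = (int(line[22:26]), line[26])   # columns 23-26 and 27
        atom_name = line[12:16].strip()
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        residues = chains.setdefault(chain_id, {})
        residues.setdefault(residue_key, {})[atom_name] = xyz
    return chains


# Illustrative records: two atoms in chain A and one atom in chain B.
records = [
    "ATOM      1  N   MET A   1      27.340  24.430   2.614",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842",
    "ATOM      3  N   GLY B   1      10.000  11.000  12.000",
]
grouped = group_atoms_by_chain(records)
print(list(grouped))
# → ['A', 'B']
```

With this structure, phi and psi would only be computed for residues that have both neighbours within the same chain.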
