Homology modeled structure? #11

Open
davidroberson opened this issue Nov 16, 2016 · 8 comments
Comments

@davidroberson

davidroberson commented Nov 16, 2016

Is it possible to read in a homology modeled structure that is not in RCSB pdb using the

-pdb-file-dir

argument?

Thanks

@davidroberson davidroberson changed the title Horology modeled structure? Homology modeled structure? Nov 16, 2016
@AdamDS
Collaborator

AdamDS commented Nov 16, 2016

Dave,

I don't believe it will work with the current state of HotSpot3D. There may be some issues since HotSpot3D uses information from UniProt and other databases to help with structure mapping. If the model is not in UniProt for your gene/protein then there should be errors in the uppro/calpro step. HotSpot3D looks to the chain information contained in UniProt for DBREF/PDB structures.

However, if your structure file is in the same format as a .pdb file, then there may be a way that we can work with non-RCSB/non-UniProt listed structures.

-Adam

@davidroberson
Author

Hi @AdamDS

The model is in pdb format and homology modeled off of 2ZPA in Swiss-Model.
http://www.rcsb.org/pdb/explore.do?structureId=2ZPA

Thanks for your help!

@sabrodie

@AdamDS
Collaborator

AdamDS commented Nov 17, 2016

@sabrodie,

I think that there is a way to get this to work then. You'll need to be sure of a couple of details:

  1. Name your model file 2ZPA.pdb and store it in the local pdb-dir that HotSpot3D will use.

  2. Make sure that the protein chains are the same - that your homologous protein is labeled for the same chains as the original protein given in 2ZPA.

There may be some other necessary details, but I think that these two are the most critical.

-Adam
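The two checks above can be sketched in Python. This is a minimal sketch and not part of HotSpot3D: the helper names `chain_ids` and `install_model` and all file paths are hypothetical. It assumes the model is a standard fixed-column PDB file, where the chain ID sits in column 22 of each ATOM record.

```python
import shutil
from pathlib import Path


def chain_ids(pdb_path):
    """Return the set of chain IDs used by ATOM records in a PDB-format file."""
    chains = set()
    with open(pdb_path) as handle:
        for line in handle:
            # Fixed-column PDB format: the chain ID is column 22 (index 21).
            if line.startswith("ATOM") and len(line) > 21:
                chains.add(line[21])
    return chains


def install_model(model_path, reference_path, pdb_dir, pdb_id="2ZPA"):
    """Copy the model into pdb_dir as <pdb_id>.pdb if its chains match the reference.

    Hypothetical helper: HotSpot3D itself does not provide this function.
    """
    model_chains = chain_ids(model_path)
    reference_chains = chain_ids(reference_path)
    if model_chains != reference_chains:
        raise ValueError(
            f"chain mismatch: model has {sorted(model_chains)}, "
            f"reference has {sorted(reference_chains)}"
        )
    target = Path(pdb_dir) / f"{pdb_id}.pdb"
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(model_path, target)
    return target


# Example call (paths are placeholders):
# install_model("swiss_model.pdb", "2ZPA_original.pdb", "pdb_files")
```

Keep the original 2ZPA file outside the target directory, since the copy overwrites any existing `2ZPA.pdb` there. Homology-modeling tools sometimes write a blank or default chain label, so a mismatch here usually means the model needs relabeling rather than rebuilding.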

@AdamDS
Collaborator

AdamDS commented Nov 17, 2016

I just noticed that your protein is non-human. In the transcript annotation step there will be errors, because HotSpot3D expects transcripts from Ensembl. There is a line that will not know how to deal with EnsemblBacteria transcripts.
From what I can tell, the necessary files and lookups should largely be identical, so it could be possible to make some small tweaks to allow non-human proteins to be used. However, I am less familiar with processing them, so I cannot be sure how many other changes would be needed.

@davidroberson
Author

Thank you @AdamDS. I will talk to @sabrodie who is the functional scientist leading this project and get back to you. He did have one more question which I will paraphrase here:

...is it possible that the variants in our gene of interest are not in solved (crystallized) regions of the protein?
See http://www.uniprot.org/uniprot/O43683 (3D structure databases; sequence positions 1-1085):

| Entry | Method | Resolution (Å) | Chain | Positions |
|-------|--------|----------------|---------|-----------|
| 2LAH | NMR | - | A | 1-150 |
| 4A1G | X-ray | 2.60 | A/B/C/D | 1-150 |
| 4QPM | X-ray | 2.20 | A/B | 740-1085 |
| 4R8Q | X-ray | 2.31 | A | 724-1085 |
| 5DMZ | X-ray | 2.40 | A/B | 726-1085 |

It looks like the mutations fall around amino acid ~500. Does that mean it is not represented in the crystal structures in the RCSB database?

Finally, is there an ideal number of genes to have present in the MAF file? We have many whole exomes' worth of data, but are just focusing on a few genes. Should we change our approach?

@sabrodie

@dave , @AdamDS
That was in reference to another protein in the same project... a very different problem.

Seth Brodie PhD
Senior Scientist Functional Group
Cancer Genomics Research Laboratory (CGR)
Division of Cancer Epidemiology and Genetics, NCI
Leidos Biomedical Research, Inc.
8717 Grovemont Circle
ATC Room 225B(office) Room 109(lab)
Gaithersburg, MD 20877

@AdamDS
Collaborator

AdamDS commented Nov 18, 2016

Some variants do end up in non-solved regions of the models. HotSpot3D cannot do anything with these at this time.
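As a quick way to see whether a given residue falls inside any listed structure, here is a minimal Python sketch using the position ranges quoted earlier in the thread for UniProt entry O43683. The function name `covering_structures` is hypothetical. The check uses only the ranges UniProt lists, so a residue inside a listed range could still be unresolved in the actual coordinates.

```python
# Residue ranges per PDB entry, copied from the UniProt O43683 table quoted
# earlier in this thread (assumed to be current as of that comment).
STRUCTURE_RANGES = {
    "2LAH": (1, 150),
    "4A1G": (1, 150),
    "4QPM": (740, 1085),
    "4R8Q": (724, 1085),
    "5DMZ": (726, 1085),
}


def covering_structures(position, ranges=STRUCTURE_RANGES):
    """Return the PDB entries whose listed residue range includes `position`."""
    return sorted(
        pdb_id
        for pdb_id, (start, end) in ranges.items()
        if start <= position <= end
    )


print(covering_structures(500))  # [] -> no listed structure covers residue 500
print(covering_structures(730))  # ['4R8Q', '5DMZ']
```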
If you know that you will only need to look at a handful of genes, I strongly recommend using a subset of your original .maf that contains only mutations from your genes of interest. This will drastically reduce run time and storage usage.
For perspective, preprocessing the ~5k human protein PDB structures takes ~1 week on an LSF server, and the data take up ~2 TB of space. We are optimizing HotSpot3D preprocessing to improve both run time and storage usage, but these updates are not yet in place.
For the analysis steps, with ~1M mutations in several thousand genes, run times can reach ~1 day (without the sigclus step), so reducing the .maf to the genes of interest is useful there as well.
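One way to build such a subset is sketched below in Python. It assumes a standard tab-delimited MAF with optional `#` comment lines followed by a header row containing a `Hugo_Symbol` column. The function name `subset_maf`, the file names, and the gene symbols in the example call are placeholders, and this sketch has not been checked against HotSpot3D's own input validation.

```python
def subset_maf(in_path, out_path, genes, gene_column="Hugo_Symbol"):
    """Write only the MAF rows whose gene symbol is in `genes`.

    Comment lines and the header row are copied through unchanged.
    Returns the number of mutation rows kept.
    """
    genes = set(genes)
    header = None
    gene_index = None
    kept = 0
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        for line in src:
            if line.startswith("#"):
                dst.write(line)  # keep version/comment lines
                continue
            fields = line.rstrip("\r\n").split("\t")
            if header is None:
                # The first non-comment line is the column header.
                header = fields
                gene_index = header.index(gene_column)
                dst.write(line)
                continue
            if len(fields) > gene_index and fields[gene_index] in genes:
                dst.write(line)
                kept += 1
    return kept


# Example call (file names and gene symbols are placeholders):
# subset_maf("all_exomes.maf", "genes_of_interest.maf", {"GENE1", "GENE2"})
```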

@AdamDS
Collaborator

AdamDS commented Feb 22, 2017

@sabrodie
With the latest updates, we can now provide a way to support alternative Ensembl releases and reference genomes. I think that there are a couple of other things that could be done in the Trans.pm & Uppro.pm modules to support bacteria and other species. If you are still interested, perhaps we can work out a solution to help support other species data.

3 participants