Model not training #19

Thomas191 · 2019-07-17T09:54:04Z

Hello! I have been attempting to run this code for a couple of weeks but seem to have hit a dead end.

I am running the model on Ubuntu 18.04, with TensorFlow GPU installed (and verified with other code), CUDA 10.0, and cuDNN 7.6.1.

My end goal is to use CASP12 to predict the structure of around 1000 proteins.

At the moment I am using CASP10 (to save space) and trying to predict the structure of just one sequence to test the model.

Here is my folder structure:

WD/hmmer-3.2.1

WD/rgn/data_processing/
WD/rgn/model/

WD/proteinnet10

WD/RGN10/data/ProteinNet10Thinning90/testing
WD/RGN10/data/ProteinNet10Thinning90/training
WD/RGN10/data/ProteinNet10Thinning90/validation

WD/RGN10/runs/CASP10/ProteinNet10Thinning90/1
WD/RGN10/runs/CASP10/ProteinNet10Thinning90/2
...
WD/RGN10/runs/CASP10/ProteinNet10Thinning90/logs
WD/RGN10/runs/CASP10/ProteinNet10Thinning90/checkpoints
WD/RGN10/runs/CASP10/ProteinNet10Thinning90/configuration

WD/RGN10/logs

This is the last line of code:

rgn/model/protling.py RGN10/runs/CASP10/ProteinNet10Thinning90/configuration -d RGN10 -p -e weighted_testing

When the model runs, there don't appear to be any errors; however, the prediction is placed in folder number 1 and not in the highest-numbered folder, as would be expected.

Following the comments in another issue, I have tried deleting all the numbered folders and just training the model using the following code:

rgn/model/protling.py RGN10/runs/CASP10/ProteinNet10Thinning90/configuration -d RGN10

This only creates folder 1 and the logs and checkpoints folders.

Likewise for the following code:

rgn/model/protling.py RGN10/runs/CASP10/ProteinNet10Thinning90/configuration -d RGN10 -p -e weighted_testing

Once again, only folder 1 and the logs and checkpoints folders are created, and the prediction for our sequence is placed in folder 1.

We have looked at this prediction and converted it to a PDB file to view in PyMOL; however, the output is a helical structure (completely different from what a folded protein would look like).

We would appreciate any suggestions you have about how to fix this issue.


alquraishi · 2019-07-31T02:02:32Z

Hi @Thomas191,

I just tried recreating your directory structure on my system and it worked fine. The predictions should definitely be placed in the highest numbered folder. Otherwise you're making predictions from an untrained model which will be junk. Training a model from scratch is also quite time-intensive.

I noticed that you don't have a gpu assigned. Did you try an option like -g0?
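For readers who want to check which numbered run folder a prediction should land in, here is a minimal sketch. It only inspects the directory layout described in this thread and does not call any RGN code; the function name `highest_numbered_run_dir` is hypothetical.

```python
from pathlib import Path


def highest_numbered_run_dir(run_dir):
    """Return the subdirectory of run_dir with the largest all-digit name.

    Non-numeric folders (e.g. logs, checkpoints, configuration) are ignored.
    Returns None when no numbered folder exists.
    """
    numbered = [
        p for p in Path(run_dir).iterdir()
        if p.is_dir() and p.name.isascii() and p.name.isdigit()
    ]
    if not numbered:
        return None
    # Compare numerically so that "10" sorts after "2".
    return max(numbered, key=lambda p: int(p.name))
```

If this returns folder 1 right after a fresh training run, only one numbered folder has been created so far.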

Thomas191 · 2019-08-02T09:14:05Z

Hi!
Adding -g0 worked like a charm. However, I now have the issue that it doesn't seem to work for FASTA sequences with more than one chain. Should it? Or is the model not suited for multiple chains?
This is the error I get when I try to fold a protein with more than one chain:

Input file contains >1 alignments, but UCSC A2M formatted output file can only contain 1
WARNING: Logging before flag parsing goes to stderr.
W0801 20:46:52.664010 140230345348992 deprecation_wrapper.py:119] From rgn/data_processing/convert_to_tfrecord.py:120: The name tf.python_io.TFRecordWriter is deprecated. Please use tf.io.TFRecordWriter instead.

Traceback (most recent call last):
  File "rgn/data_processing/convert_to_tfrecord.py", line 123, in <module>
    dict_ = read_record(input_file, num_evo_entries)
  File "rgn/data_processing/convert_to_tfrecord.py", line 68, in read_record
    primary = letter_to_num(file_.readline()[:-1], _aa_dict)
  File "rgn/data_processing/convert_to_tfrecord.py", line 53, in letter_to_num
    num = [int(i) for i in num_string.split()]
ValueError: invalid literal for int() with base 10: '>2X7'

Again, any help is much appreciated.

alquraishi · 2019-08-02T19:31:00Z

Yes, unfortunately it doesn't support multiple chains at the moment. You'd have to input them separately.
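One way to input the chains separately is to split a multi-record FASTA file into one file per chain before preprocessing. The sketch below is a plain-Python illustration and is not part of RGN; the function names and the `_chainN.fasta` naming scheme are assumptions chosen for the example.

```python
from pathlib import Path


def split_fasta(text):
    """Split multi-record FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, seq_lines = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(seq_lines)))
            header, seq_lines = line[1:], []
        elif header is not None:
            seq_lines.append(line)
    if header is not None:
        records.append((header, "".join(seq_lines)))
    return records


def write_single_chain_files(fasta_path, out_dir):
    """Write each record to its own FASTA file and return the paths written.

    Output names follow a hypothetical <stem>_chain<N>.fasta pattern.
    """
    fasta_path = Path(fasta_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, (header, sequence) in enumerate(split_fasta(fasta_path.read_text()), start=1):
        path = out_dir / f"{fasta_path.stem}_chain{index}.fasta"
        path.write_text(f">{header}\n{sequence}\n")
        paths.append(path)
    return paths
```

Each resulting single-chain file can then be run through the same preprocessing and prediction steps as a single-sequence input.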

gszwabowski · 2019-08-08T17:28:57Z

@Thomas191 how did you convert the tertiary file to a PDB file? I have my output but have no idea how to interpret it.

OsamaGhandour · 2021-01-01T22:45:10Z

@Thomas191 can you share the exact command that solved this problem for you?
(The one where only folder 1 and the logs and checkpoints folders are created.)
