GitHub - DreAymi/SAXS_reconstruction

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
example		example
model		model
train_net		train_net
.gitattributes		.gitattributes
README		README
SASDA25.dat		SASDA25.dat
SASDA25.out		SASDA25.out
SASDA25_fit1_model1.pdb		SASDA25_fit1_model1.pdb
align.py		align.py
auto_encoder_t.py		auto_encoder_t.py
genegroup_init_parameter_2.txt		genegroup_init_parameter_2.txt
main.py		main.py
main_process.py		main_process.py
map2iq.py		map2iq.py
pdb2voxel.py		pdb2voxel.py
processSaxs.py		processSaxs.py
region_search.py		region_search.py
result2pdb.py		result2pdb.py
run_model.py		run_model.py
voxel2pdb.py		voxel2pdb.py
zalign.py		zalign.py

Repository files navigation

User Guide
Please attention:
When you download this project on github, there are three model files in the 'model' directory named with model.ckpt*. They are too large to download successfully. You need to click those files in github, and download them manually, then put them in the 'model' directory.

1. Environment requirement and installation:
You need to install python2.7, tensorflow and SASTBX kit.
1) Recommend you use Anaconda to manage your environment. Just download Anaconda (python2.7 version) on https://www.anaconda.com, then follow the installation and user guide which suit for your platform. (https://docs.anaconda.com/anaconda/install/linux/)

2) After you installed anaconda successfully, use conda to install tensorflow.
If you have GPU devices, you can install tensorfloe-gpu version:

conda install tensorflow-gpu

(you can also specify version like this: conda install tensorflow-gpu=1.9.0, which is the version I used.)
Or just install CPU version:

conda install tensorflow=1.9.0

3) Last one you need to install is SASTBX, you can download here:

http://liulab.csrc.ac.cn/dokuwiki/doku.php?id=sastbx

You need to choose command-line version, and install the kit follow its documentation.
Pay attention that, when you install this kit, use the python environment which you have installed tensorflow.

After installation, you can type:
sastbx.python
Then, type:
import tensorflow as tf
tf.__version__
If it works, you have deployed successfully.

2. Description of files:
Files in folder 'train_net' are used to generate dataset and train autoencoder network.
Files in folder 'model' are the well-trained autoencoder model, which is used in the reconstruction project.
main.py is the only file you need to use when you run the project.
SASDA25.* are samples which can use to run a demo.
Other py files are sub-files, need to put in the same folder with main.py.

3. How to run example
Here give an example in 'example' folder, you can run 'run.sh' in terminal.
It's easy to run the project if you have deployed the environment successfully. Just input command like this:

sastbx.python main.py --model_path $model_path --iq_path $iq_path --output_folder $output_path --rmax $known_rmax --target_pdb $targetpdb_path

--model_path: the path of well-trained tensorflow's model files.(here we give a well-trained model in folder 'model'.)
--iq_path: file of experimental saxs data, DAT* or GNOM.
--output_folder: output path of results.
--rmax: size(Å) of the reconstruction object if you know.
--target_pdb: the target pdb structure if you want to compare with results.
--rmax_start: minimum of rmax (default=10)
--rmax_end: maximum of rmax (default=300)

--model_path, --iq_path and --output_folder are necessary. if the --iq_path you give is a GNOM type(eg.example/SASDA25.out) file, --rmax is not needed. This project will extract the rmax from your input file. If the --iq_path you give is just a DAT* file (two or three columns of data), you can specify a certain value of --rmax. If not, this project will find a well-matched size during reconstruction. You also can give a range of rmax by specify --rmax_start and --rmax_end.
--target_pdb is another additional parameter, help you know the relationship (correlation coefficient) between the reconstruction results and the structure you give.

4. Result files
When the project has done, you will get the following files:
out.ccp4: ccp4 format reconstruction result.
out_za.pdb: pdb format reconstruction result.
Log.txt: Record the time spent in each part.
cc_mat.txt: if you give --target_pdb, this file record the correlation coefficient of each iteration's top 20 samples' results.
final_saxs.txt: Record the fitting saxs data of last generation.
score_mat.txt: Record top 20 samples with high score on every generation.
cc_mat.txt: Record top 20 samples' correlation coefficient with target structure in every generation.

5. How to generate dataset and train network
There are three directories in folder 'train_net'. 'dataset' is used to generate dataset. 'train_net1' and 'train_net2' are two methods to train antoencoder network.
In 'dataset', you can generate dataset before train: 1) Run 'python download.py' to download 'pisa' dataset. 2) Run 'sastbx.python make_tfrecords.py' to generate tensorflow format dataset.
After generating dataset, you can train network as follows:
In 'train_net1', you can use two steps to train network: 1) Run 'python autoencoder.py' in 'net1'. Then run 'python autoencoder.py' in 'net2'. The final model will saved in 'train_net1/net2/model'.
In 'train_net2', you can run 'python autoencoder.py' directly to train network. The final model will saved in 'train_net2/model'.