SNIPER is just for human?Any suggestions? If we want to use SNIPER in other species. #3

Open
wbszhu opened this issue Nov 29, 2019 · 4 comments

Comments

@wbszhu

wbszhu commented Nov 29, 2019

No description provided.

@wbszhu
Author

wbszhu commented Nov 29, 2019

Hi,

  1. We want to use SNIPER to predict the A1 and A2 subcompartments in mouse, but the paper only covers human predictions. Do you have any suggestions for using SNIPER in other species?
  2. I tried modifying the script to get it to run, but I still get an error, and I don't know how to prepare the files that sniper_train.py needs.
  3. I would also like to know: what is the crop?

Thank you,
Ruby

@wbszhu changed the title from "hi" to "SNIPER is just for human?Any suggestions? If we want to use SNIPER in other species." Nov 29, 2019
@kairukuma
Collaborator

Hi Ruby,

  1. Currently, we only provide models trained on GM12878 (human lymphoblastoid cells) and do not have models trained on mouse cell types. To make predictions in mouse cell types, we suggest training a separate model on high-coverage Hi-C data from a mouse cell type (e.g. mESC, mNPC, or mCN from Bonev et al.), using a downsampled dataset as your training input. As we don't have ground-truth annotations for mouse cells, you would need to run a Gaussian HMM (or another clustering method) on the high-coverage mouse data to obtain a ground truth.

  2. I'm not sure which errors you are getting; it would help if you could be more specific. If you are running SNIPER on mouse cell types, you need files tailored to the mouse genome. SNIPER's code is currently designed for human cell types because it assumes 22 autosomes, but it can be adapted to mouse (19 autosomes) by changing the upper bound of the for loops from 23 to 20. We will work on adding an option to specify genome assemblies for other species.

  3. The crop lists the matrix rows and columns that were removed from the GM12878 inter-chromosomal Hi-C matrix because they were too sparse (more than 30% of entries were zero or undefined). The crop matters when running SNIPER because it ensures that the same rows and columns are removed from the Hi-C matrices of other cell types. You can construct such a crop map while annotating your ground truth.
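
To illustrate point 3, here is a minimal sketch of how a crop map could be built and reused. It is not SNIPER's actual code: the function names (`sparse_indices`, `build_crop`, `apply_crop`), the dictionary layout, and the toy matrices are all assumptions made for illustration. It flags rows and columns in which more than 30% of entries are zero or NaN, then removes the same indices from a second matrix of the same shape.

```python
import numpy as np

def sparse_indices(M, axis, max_missing=0.30):
    """Indices along `axis` (0 = rows, 1 = columns) where the fraction of
    zero or NaN entries exceeds `max_missing`."""
    missing = np.isnan(M) | (M == 0)
    # For rows, average over columns (axis 1); for columns, average over rows (axis 0).
    frac_missing = missing.mean(axis=1 - axis)
    return np.flatnonzero(frac_missing > max_missing)

def build_crop(M, max_missing=0.30):
    """Hypothetical crop map: the row and column indices to remove."""
    return {"rows": sparse_indices(M, 0, max_missing),
            "cols": sparse_indices(M, 1, max_missing)}

def apply_crop(M, crop):
    """Remove the cropped rows and columns from a matrix of the same shape."""
    keep_rows = np.setdiff1d(np.arange(M.shape[0]), crop["rows"])
    keep_cols = np.setdiff1d(np.arange(M.shape[1]), crop["cols"])
    return M[np.ix_(keep_rows, keep_cols)]

# Toy "reference" matrix standing in for the high-coverage ground-truth data.
reference = np.array([
    [5.0, 3.0, 0.0, 4.0, 2.0],
    [0.0, 0.0, 0.0, np.nan, 1.0],   # 4 of 5 entries missing
    [2.0, 6.0, 0.0, 3.0, 7.0],
    [4.0, 1.0, 0.0, 5.0, 3.0],
])
crop = build_crop(reference)

# Toy matrix for another cell type; the same rows and columns are removed.
other = np.arange(20, dtype=float).reshape(4, 5)
trimmed = apply_crop(other, crop)

print(crop["rows"].tolist(), crop["cols"].tolist())  # [1] [2]
print(trimmed.shape)                                  # (3, 4)
```

Because the crop is computed once from the reference matrix and then applied unchanged, every cell type ends up with matrices whose rows and columns refer to the same genomic bins.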

@wbszhu
Author

wbszhu commented Jan 14, 2020

Hi elykcoldster,
Sorry for the late reply, and thanks for your answer.
Regarding the third point, I changed the upper range of the for loops, but it did not help and the error still occurs:
###########################################################################
[lzhang@cu06 SNIPER]$ python3.6 sniper_train.py 0h.allvalidPairs.hic ./target_hic annotations.zip -jt $PATH/juicer_tools_1.11.09_jcuda.0.8.jar -c crop_map -dd ./test -sm -ar -ow
Using TensorFlow backend.
Constructing input matrix
Trimming sparse regions...
Traceback (most recent call last):
  File "sniper_train.py", line 21, in <module>
    train_with_hic(params)
  File "$PATH/training.py", line 67, in train_with_hic
    inputM = trimMat(inputM,params['cropIndices'])
  File "$PATH/data_processing.py", line 77, in trimMat
    M = M[row_indices,:]
IndexError: index 12808 is out of bounds for axis 0 with size 12808
###########################################################################

Looking forward to your updates.
best
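
A note on the traceback above: the `IndexError` says that the crop map contains row index 12808 while the input matrix has only 12808 rows, so its valid indices run from 0 to 12807. One possible cause, not confirmed in this thread, is that the crop map was built for a matrix of a different size than the one constructed from the .hic file (for example a different genome assembly, chromosome set, or resolution), or that the indices are 1-based. The sketch below shows a check that could be run before trimming; `check_crop_indices` and the toy arrays are hypothetical and not part of SNIPER.

```python
import numpy as np

def check_crop_indices(n_rows, row_indices):
    """Return the crop indices that fall outside the range 0 .. n_rows - 1."""
    row_indices = np.asarray(row_indices)
    out_of_range = (row_indices < 0) | (row_indices >= n_rows)
    return row_indices[out_of_range]

# Toy stand-ins: a 6-row matrix and a crop map with one index that is too large.
M = np.zeros((6, 4))
row_indices = np.array([0, 2, 5, 6])

bad = check_crop_indices(M.shape[0], row_indices)
if bad.size:
    print(f"{bad.size} crop index value(s) out of range for "
          f"{M.shape[0]} rows: {bad.tolist()}")
```

If the check reports out-of-range indices, comparing the shape of the input matrix with the shape of the matrix the crop map was derived from would be a reasonable next step.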

@Simple53

Hi, I am also interested in training with my own data with Gaussian HMM. Could you give some suggestions?
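
As a starting point for the Gaussian HMM step mentioned earlier in the thread, here is a minimal, self-contained sketch in NumPy. It is not the SNIPER authors' pipeline. It assumes each row of the (cropped) contact matrix is one observation, ordered along the genome, and fits a diagonal-covariance Gaussian HMM with Baum-Welch before decoding labels with Viterbi. The function names, the initialisation scheme, and the toy data are illustrative assumptions; in practice an established library such as hmmlearn, which provides a `GaussianHMM` class, is likely a better choice.

```python
import numpy as np

def _logsumexp(a, axis):
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(a - m).sum(axis=axis))

def _log_emission(X, means, variances):
    """Log density of each row of X under each state's diagonal Gaussian, shape (T, K)."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.log(2 * np.pi * variances).sum(axis=1)
                   + (diff ** 2 / variances).sum(axis=2))

def fit_gaussian_hmm(X, n_states, n_iter=25, min_var=1e-3, tiny=1e-12):
    """Baum-Welch (EM) for a diagonal-covariance Gaussian HMM on one sequence."""
    T = X.shape[0]
    # Assumed initialisation: means from evenly spaced rows, one shared variance.
    means = X[np.linspace(0, T - 1, n_states).astype(int)].copy()
    variances = np.tile(X.var(axis=0) + min_var, (n_states, 1))
    log_start = np.full(n_states, -np.log(n_states))
    log_trans = np.full((n_states, n_states), -np.log(n_states))
    for _ in range(n_iter):
        log_b = _log_emission(X, means, variances)
        # E-step: forward and backward passes in log space.
        log_alpha = np.empty((T, n_states))
        log_alpha[0] = log_start + log_b[0]
        for t in range(1, T):
            log_alpha[t] = log_b[t] + _logsumexp(
                log_alpha[t - 1][:, None] + log_trans, axis=0)
        log_beta = np.zeros((T, n_states))
        for t in range(T - 2, -1, -1):
            log_beta[t] = _logsumexp(
                log_trans + log_b[t + 1] + log_beta[t + 1], axis=1)
        log_lik = _logsumexp(log_alpha[-1], axis=0)
        gamma = np.exp(log_alpha + log_beta - log_lik)        # state posteriors
        xi = np.exp(log_alpha[:-1, :, None] + log_trans
                    + (log_b[1:] + log_beta[1:])[:, None, :]
                    - log_lik).sum(axis=0)                    # expected transitions
        # M-step: re-estimate parameters from the posteriors.
        log_start = np.log(gamma[0] + tiny)
        log_trans = np.log((xi + tiny) / (xi + tiny).sum(axis=1, keepdims=True))
        weight = gamma.sum(axis=0)[:, None] + tiny
        means = gamma.T @ X / weight
        variances = np.maximum(gamma.T @ X ** 2 / weight - means ** 2, 0) + min_var
    return means, variances, log_start, log_trans

def viterbi(X, means, variances, log_start, log_trans):
    """Most likely state sequence under the fitted model."""
    log_b = _log_emission(X, means, variances)
    T, K = log_b.shape
    score = log_start + log_b[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans   # cand[i, j]: best path to i, then i -> j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_b[t]
    states = np.empty(T, dtype=int)
    states[-1] = score.argmax()
    for t in range(T - 1, 0, -1):
        states[t - 1] = back[t, states[t]]
    return states

# Toy data: two consecutive genomic segments with clearly different profiles.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(40, 3)),
               rng.normal(3.0, 0.3, size=(40, 3))])
params = fit_gaussian_hmm(X, n_states=2)
labels = viterbi(X, *params)
print(labels)
```

The decoded labels are arbitrary state numbers; mapping them to subcompartment names such as A1 or A2 would still require comparing each state against independent annotations.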
