model problem #343

Open
Guoshuai1314 opened this issue Mar 8, 2021 · 9 comments
Comments

@Guoshuai1314

Guoshuai1314 commented Mar 8, 2021

When I ran the test data, the modelling module took far too long: it was expected to take 6 minutes, but it ran for 2334 minutes. When I tested it with my own data, it ran for more than 20 days without finishing. Could my system be the cause? I am using CentOS 7.

image

@david-castillo
Contributor

Hi,

That's not normal. Can you run this test inside the 'test' folder of TADbit?

python test_all.py 13

That will run a simple modelling that should take seconds to compute.

Regards

David

@Guoshuai1314
Author

Yes, it works fine.
image

@david-castillo
Contributor

How did you generate your matrix with "tadbit bin"?
It's better to generate the matrix only for the region to be modelled; otherwise TADbit may try to rebuild the full Hi-C matrix just to extract a small region afterwards.

@Guoshuai1314
Author

I did not use "tadbit bin" to generate the matrix; I used JuicerTools dump directly. I then ran the modelling with the whole-chromosome matrix.

@david-castillo
Contributor

david-castillo commented Mar 11, 2021

Then I'm sure TADbit is trying to rebuild the full matrix, and that will take forever if your computer doesn't have a lot of memory and speed.
Can you convert your data to the following format?

# CRM chr20	64444167
# chr20:2-102 resolution:50000
# MASKED 
0	0		8
9	0		1
...

So:
First line: the chromosome and its total size.
Second line: the region contained in the file, in this case chromosome 20 at 50 kbp resolution from bin 2 to bin 102 (chr20:100000-5100000).
Third line: used to mask columns that have no data; you can leave it blank.
Every following line: i, j and the value (values need to be normalized), starting with 0,0.

David
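
The conversion described above can be sketched in Python. This is only an illustration under several assumptions that the thread does not confirm: the JuicerTools dump rows are taken to be three whitespace-separated columns (two bin start positions in bp and a value), the end bin is treated as exclusive (consistent with bins 2 to 102 covering chr20:100000-5100000), fields are separated by single tabs, and rows with NaN values are skipped. The function name `juicer_dump_to_abc` is hypothetical and is not part of TADbit.

```python
import math


def juicer_dump_to_abc(dump_lines, chrom, chrom_size, beg_bin, end_bin, reso):
    """Build the three header lines plus 'i j value' rows for one region.

    Assumes each input row is 'pos1 pos2 value' with positions in bp
    (an assumption about the dump layout, not something TADbit provides).
    """
    out = [
        "# CRM %s\t%d" % (chrom, chrom_size),                        # chromosome and total size
        "# %s:%d-%d resolution:%d" % (chrom, beg_bin, end_bin, reso),  # region in bins
        "# MASKED ",                                                 # left blank: no masked columns
    ]
    n_bins = end_bin - beg_bin  # end bin assumed exclusive
    for line in dump_lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # ignore blank or malformed rows
        value = float(fields[2])
        if math.isnan(value):
            continue  # skip bins without a normalized value
        # Convert bp positions to bin indices relative to the region start.
        i = int(fields[0]) // reso - beg_bin
        j = int(fields[1]) // reso - beg_bin
        if 0 <= i < n_bins and 0 <= j < n_bins:
            out.append("%d\t%d\t%s" % (i, j, fields[2]))
    return "\n".join(out) + "\n"


# Made-up rows for illustration; the last one lies outside the region.
rows = ["100000\t100000\t8.0", "100000\t550000\t1.0", "0\t100000\t3.0"]
text = juicer_dump_to_abc(rows, "chr20", 64444167, 2, 102, 50000)
print(text)
```

Whether TADbit also expects the symmetric (j, i) entries to be listed is not stated in this thread, so the sketch writes each input row once.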

@Guoshuai1314
Author

OK, I'll try it right away. Thank you for your help.

@Guoshuai1314
Author

I have converted my data to the following format and restricted the matrix to the 2 Mb target region.

# CRM Chr2 2000000
# Chr2:0-200 resolution:10000
# MASKED

0 0 1135.049
0 1 109.90044
1 1 837.5016
0 2 67.08255
1 2 149.96063
2 2 858.84674
0 3 14.311334
1 3 51.089424
2 3 161.17682
.......

Then I ran it with the following command, but it has been running for 3 days and has not finished yet.

tadbit model -w test/both --input_matrix Chr2_2M.abc --noX --optimize --beg 1 --end 2000000 --reso 10000 --maxdist 400:500:100 --upfreq=-0.2:0:0.1 --lowfreq=-0.4:-0.2:0.1 --nmodels 20 --nkeep 20 -j 60 --cpu 60

I noticed that the TADbit process was sleeping in the background even though the system had free memory and CPU. Why is that?
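
For a rough sense of the workload requested by the command above, the parameter grid can be counted with a short Python sketch. It assumes that each `start:end:step` range is inclusive at both ends and that every parameter combination builds `--nmodels` models; neither assumption is confirmed in this thread, and any other parameters TADbit may scan by default are ignored. The helper `expand_range` is hypothetical.

```python
from decimal import Decimal


def expand_range(spec):
    """Expand a 'start:end:step' string into a list of values (end assumed inclusive)."""
    start, end, step = (Decimal(x) for x in spec.split(":"))
    values = []
    current = start
    while current <= end:
        values.append(float(current))
        current += step  # Decimal arithmetic avoids float rounding drift
    return values


# Ranges copied from the command line above.
ranges = {
    "maxdist": "400:500:100",
    "upfreq": "-0.2:0:0.1",
    "lowfreq": "-0.4:-0.2:0.1",
}
grid = {name: expand_range(spec) for name, spec in ranges.items()}

combinations = 1
for values in grid.values():
    combinations *= len(values)

nmodels = 20  # from --nmodels 20
total_models = combinations * nmodels

print(grid)
print("combinations:", combinations, "total models:", total_models)
```

Under these assumptions the grid has 2 x 3 x 3 = 18 combinations, so 360 models of a 200-bin region would be built in total.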

@david-castillo
Contributor

Hi,
I don't see the problem. Can you try to take out this part here:

-j 60 --cpu 60

You don't need any job id.
I'll check on my computer to see whether it's some kind of bug caused by the upgrade of dependencies. Did you use conda to install it?

Regards

David

@Guoshuai1314
Author

No, I installed it from source, like this:

cd tadbit-master
sudo python setup.py install
