How to speed up / debug deduplicating VCF step? #101
This is unusual behavior, but I do have a suggestion. There was an issue in the dedupe step that another user pointed out in issue #92, which I fixed in version 1.1.3. Is there a contact person at the Broad Institute in charge of maintaining the svaba Docker image? You could reach out to them to update svaba to the current version on GitHub, 1.1.3. |
Thank you for the quick reply. I used the BioContainers Docker image for 1.1.0, available at https://biocontainers.pro/tools/svaba |
@jbalberge we are having the same issue, stuck for 100+ hours at this step for a couple of samples. Did you manage to fix it?
The SvABA version we have been using is from some time ago. We have successfully processed hundreds of samples with it, but now a couple of samples are just stuck. We could update the version and re-run only the problem samples, but we are not sure that would fix the issue, and then the cohort would no longer be "harmonised".
The cohorts we've analysed have germlines sequenced at ~30x and tumors from 60x to 120x. The two problem samples have germlines at ~30x and tumors at ~70x and ~120x. Both have been stuck at the dedupe step for 100+ hours. We have given the run 200 GB of memory. |
@ahwanpandey This is one of the memory/runtime weaknesses of svaba that I've known about but haven't had time to fix. The issue is that svaba compiles all of the variants into an intermediate file, and this file needs to be sorted and de-duped at the end to make the organized VCF. For most runs this is fine, but if the number of suspected variants is high (in your case it is very high), then memory usage can run very high as it tries to read in this entire file. The solution is really to do what samtools sort does and do a scatter-gather sort, but I haven't been able to implement that yet. Out of curiosity, how large is the *.bps.txt.gz file for this run? That's the file that it is reading into memory. |
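For reference, a scatter-gather (external merge) sort of that intermediate file, in the spirit of what samtools sort does, can be approximated outside of svaba with GNU sort, which already spills sorted runs to temporary files on disk once its memory buffer fills. This is only a hypothetical workaround sketch: the filename is a placeholder, and the assumption that bps.txt.gz records sort by their first two columns is mine, not taken from svaba's code.

```shell
# Hypothetical sketch: sort and dedupe a large intermediate text file
# with bounded memory. GNU sort spills runs to disk (-T) once the -S
# buffer fills, then merges them: a scatter-gather external sort.
# ASSUMPTION: records are tab-separated and sort by chromosome (field 1)
# and position (field 2); verify against svaba's actual bps.txt.gz layout.
zcat sample.bps.txt.gz \
  | sort -S 1G -T /tmp -k1,1 -k2,2n \
  | uniq \
  | gzip > sample.bps.dedup.txt.gz
```

The point of `-S` plus `-T` is that peak RAM stays near the buffer size regardless of how large the input file is.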
Hi @walaj, thanks so much for your response. For the two samples that are stuck, the *.bps.txt.gz files are 147M and 131M. We have a lot of High Grade Ovarian Cancer WGS data, and they indeed have a lot of structural variants. Is there any chance you would be able to fix this issue for us? We would be very grateful. I can even share the files if that would be useful. We have already run svaba on hundreds of samples over the years, and as you can understand it would be tricky not to be able to run the tool on a couple of samples, and probably more in the future. So again, we would be very grateful if you could have a look at fixing the issue when you get a chance. The other option we are trying is to run the latest version of the tool. Do you think we will have the same problem with it? I'm trying to install the latest version, but as you've noted I think I need to fix what CMake is doing. |
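For anyone checking the same thing on their own runs, the file size reported above can be inspected along with the record count, which is a rough proxy for how much the dedupe step must hold in memory (filename is a placeholder):

```shell
# Inspect the intermediate file: size on disk and number of records.
ls -lh sample.bps.txt.gz
zcat sample.bps.txt.gz | wc -l
```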
If I remember correctly this happened with short inserts; hard-trimming of adapters and polyG must have reduced the number of candidates in my case at that time.
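To illustrate what that hard-trimming does: polyG tails are a two-color-chemistry artifact where no-signal cycles read out as G, and stripping them (along with adapters) shrinks the pool of spurious candidate variants. In practice a dedicated trimmer such as fastp (with its --trim_poly_g option) would be run on the FASTQ files; the one-liner below is only a toy illustration of the effect, not the real preprocessing step.

```shell
# Toy illustration only: strip a trailing run of 8+ G's from a read
# sequence, mimicking polyG hard-trimming. Real pipelines should use a
# dedicated trimmer (e.g. fastp --trim_poly_g) on FASTQ files.
echo 'ACGTACGTGGGGGGGGGG' | sed -E 's/G{8,}$//'
# prints: ACGTACGT
```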
|
Hmm, OK. Given that the bps.txt.gz files are not so big, I'm concerned that there is a memory bug somewhere that's running up the memory. There was a bug that caused some random memory clashes on < 5% of samples at the dedupe stage, but I fixed it a while ago. I think our best approach here is to have you try the newly built version; you'll just have a few samples that were run with a newer version. Nothing too substantive has changed, just bug fixes and build systems, so you wouldn't have to re-run your other samples.
If you're still getting the same memory overrun issues on the latest version for these samples, I'll have to revisit the smart sorting. But with bps files that small, I doubt that this is the issue now.
|
Hi @walaj, I've now tried to re-run with the latest version and still got stuck at the dedupe step for two samples :/. Would it be possible for you to fix this issue for us? I can share any files you need. We would be very grateful for your time in fixing this bug. It is stuck at the following step for two out of hundreds of WGS samples.
[Screenshots: the output directory contents so far, for the latest SvABA version where the issue persists and for the old version where the issue was first observed.]
|
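One generic way to debug a seemingly hung step like this (not svaba-specific) is to watch whether the process's memory and I/O counters are still changing: if VmRSS keeps growing the step is likely still reading data, while counters that stay flat for a long time suggest a true hang rather than slow progress. The pgrep pattern below is an assumption about the process name.

```shell
# Poll a running process's resident memory and I/O counters (Linux).
pid=$(pgrep -o svaba)            # ASSUMPTION: the process is named "svaba"
for i in 1 2 3; do
  grep VmRSS "/proc/$pid/status"               # resident memory
  grep -E 'read_bytes|write_bytes' "/proc/$pid/io"  # cumulative disk I/O
  sleep 60
done
```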
@walaj is there any chance you could have a look at this issue for us? We would be very grateful for the help. Thanks so much. |
This is fixed in the latest commit. |
@walaj Thanks for all the help so far. I have now downloaded the latest commit and processed some old samples using the old version (as mentioned in this issue) as well as the latest commit (fcfa17e). The results are drastically different in the number of passing somatic SVs. See the plot summarized for each chromosome across two samples, with latest-commit results in orange bars (image: https://github.com/walaj/svaba/assets/8450532/23021ccb-b958-4286-8f64-3a0fad950bb2). I noticed that the new commit's log file has lots of messages saying "with limit hit of 0", whereas the old version's mostly does not. Not sure if this is related. I also ran the new version with 16 threads instead of the 8 used with the old version. I'll try to run with 8 threads and see if that changes anything. Do you have any ideas? Thanks again. The OLD VERSION and LATEST COMMIT log excerpts are reproduced in the quoted reply below.
|
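For anyone reproducing a comparison like that, the per-chromosome counts behind such a plot can be tallied directly from the VCFs. The filenames below are placeholders for the old-version and latest-commit somatic SV calls, not the actual files from this thread.

```shell
# Count PASS records per chromosome in each VCF
# (column 1 is CHROM, column 7 is FILTER). Filenames are placeholders.
for vcf in old.svaba.somatic.sv.vcf new.svaba.somatic.sv.vcf; do
  echo "== $vcf =="
  grep -v '^#' "$vcf" \
    | awk '$7 == "PASS" {n[$1]++} END {for (c in n) print c, n[c]}' \
    | sort
done
```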
Thank you for reporting; this is now fixed. Rolling BWA forward eight years as part of this latest round of updates introduced some nasty bugs on my part, and this one ended up being simple to fix once I found it. The latest svaba (and the latest SeqLib it points to) should address this.
…On Wed, May 8, 2024 at 1:02 AM ahwanpandey wrote:
OLD VERSION
$ cat AN_T_66639_2100027_14_N_66639_GL.log | grep "with limit hit of" | head -n 40
writing contigs etc on thread 140475294115584 with limit hit of 796
writing contigs etc on thread 140475302508288 with limit hit of 474
writing contigs etc on thread 140475260544768 with limit hit of 1353
writing contigs etc on thread 140475285722880 with limit hit of 2536
writing contigs etc on thread 140475277330176 with limit hit of 2743
writing contigs etc on thread 140469615314688 with limit hit of 3811
writing contigs etc on thread 140475294115584 with limit hit of 336
writing contigs etc on thread 140475268937472 with limit hit of 1780
writing contigs etc on thread 140475302508288 with limit hit of 307
writing contigs etc on thread 140475310900992 with limit hit of 1795
writing contigs etc on thread 140475285722880 with limit hit of 552
writing contigs etc on thread 140475277330176 with limit hit of 916
writing contigs etc on thread 140475302508288 with limit hit of 574
writing contigs etc on thread 140475310900992 with limit hit of 437
writing contigs etc on thread 140475260544768 with limit hit of 1059
writing contigs etc on thread 140475285722880 with limit hit of 1293
writing contigs etc on thread 140475268937472 with limit hit of 2951
writing contigs etc on thread 140475294115584 with limit hit of 4241
writing contigs etc on thread 140469615314688 with limit hit of 5049
writing contigs etc on thread 140475302508288 with limit hit of 8076
writing contigs etc on thread 140475310900992 with limit hit of 4492
writing contigs etc on thread 140475277330176 with limit hit of 5499
writing contigs etc on thread 140475294115584 with limit hit of 6412
writing contigs etc on thread 140475268937472 with limit hit of 5956
writing contigs etc on thread 140475285722880 with limit hit of 16232
writing contigs etc on thread 140475260544768 with limit hit of 15423
writing contigs etc on thread 140469615314688 with limit hit of 7244
writing contigs etc on thread 140475302508288 with limit hit of 6837
writing contigs etc on thread 140475310900992 with limit hit of 8440
writing contigs etc on thread 140475268937472 with limit hit of 8838
writing contigs etc on thread 140475260544768 with limit hit of 7990
writing contigs etc on thread 140475260544768 with limit hit of 7990
writing contigs etc on thread 140475260544768 with limit hit of 7990
writing contigs etc on thread 140475260544768 with limit hit of 7990
writing contigs etc on thread 140475294115584 with limit hit of 13428
writing contigs etc on thread 140475285722880 with limit hit of 8048
writing contigs etc on thread 140475277330176 with limit hit of 11336
writing contigs etc on thread 140469615314688 with limit hit of 7874
writing contigs etc on thread 140475310900992 with limit hit of 8119
writing contigs etc on thread 140475302508288 with limit hit of 8213
LATEST COMMIT
$ cat AN_T_66639_2100027_14_N_66639_GL.log | grep "with limit hit of" | head -n 40
writing contigs etc on thread 139868850484992 with limit hit of 0
writing contigs etc on thread 139868858877696 with limit hit of 0
writing contigs etc on thread 139868884055808 with limit hit of 0
writing contigs etc on thread 139868833699584 with limit hit of 0
writing contigs etc on thread 139868825306880 with limit hit of 0
writing contigs etc on thread 139874457597696 with limit hit of 0
writing contigs etc on thread 139874564609792 with limit hit of 0
writing contigs etc on thread 139874581395200 with limit hit of 0
writing contigs etc on thread 139874491168512 with limit hit of 0
writing contigs etc on thread 139874465990400 with limit hit of 0
writing contigs etc on thread 139874573002496 with limit hit of 0
writing contigs etc on thread 139868867270400 with limit hit of 0
writing contigs etc on thread 139868858877696 with limit hit of 0
writing contigs etc on thread 139868875663104 with limit hit of 0
writing contigs etc on thread 139874482775808 with limit hit of 0
writing contigs etc on thread 139868842092288 with limit hit of 0
writing contigs etc on thread 139868833699584 with limit hit of 0
writing contigs etc on thread 139868884055808 with limit hit of 0
writing contigs etc on thread 139874474383104 with limit hit of 0
writing contigs etc on thread 139874564609792 with limit hit of 0
writing contigs etc on thread 139868825306880 with limit hit of 0
writing contigs etc on thread 139874457597696 with limit hit of 0
writing contigs etc on thread 139874491168512 with limit hit of 0
writing contigs etc on thread 139874581395200 with limit hit of 0
writing contigs etc on thread 139868850484992 with limit hit of 0
writing contigs etc on thread 139874573002496 with limit hit of 0
writing contigs etc on thread 139874465990400 with limit hit of 0
writing contigs etc on thread 139868884055808 with limit hit of 0
writing contigs etc on thread 139868858877696 with limit hit of 0
writing contigs etc on thread 139868867270400 with limit hit of 0
writing contigs etc on thread 139868825306880 with limit hit of 0
writing contigs etc on thread 139868842092288 with limit hit of 0
writing contigs etc on thread 139874482775808 with limit hit of 0
writing contigs etc on thread 139874474383104 with limit hit of 0
writing contigs etc on thread 139868875663104 with limit hit of 0
writing contigs etc on thread 139868833699584 with limit hit of 0
writing contigs etc on thread 139874457597696 with limit hit of 0
writing contigs etc on thread 139874491168512 with limit hit of 0
writing contigs etc on thread 139868850484992 with limit hit of 0
writing contigs etc on thread 139874564609792 with limit hit of 0
|
Running svaba on Terra/FireCloud, we are having trouble at this step of the svaba run for >30X Tumor/Normal WGS. From the logs, the actual run time is a couple of hours for variant calling, and then the logs get stuck at the dedupe step for 100+ hours and counting. Have you seen this before? Is there anything we can do to debug this situation?
We tried with up to 128 GB RAM, 16 CPUs, and 1000 GB HDD VMs, using the svaba 1.1.0 quay Docker image.
Thanks for your help!