Captus Extract: missing extractions for some loci in some samples #1

Open
LPDagallier opened this issue Feb 22, 2023 · 3 comments

@LPDagallier

Hi,

Thanks a lot for this great tool! It is very efficient and easy to use, and the documentation and outputs are very clear!

I ran into an issue while running Captus extract: extraction was not carried out for some loci in some samples (see the screenshot of the .html report below). I looked at the Scipio .log for one of these samples and it contains a warning: "Warning: query length mismatch. This will produce unpredictable results!" (see the full warning at the end of this post and the attached NUC_scipio_initial_run1.log).

Interestingly, I re-ran the same Captus extract on the same dataset with exactly the same parameters, but on a different node of the cluster I'm working on (the node is assigned automatically by the cluster manager). The problem did not occur and the extraction seems to have worked perfectly the second time. The Captus extract .log files are the same between the two runs, but the Scipio initial logs for the aforementioned sample differ: only the one from run1 contains the warning message (see the attached Scipio logs).
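For anyone wanting to make the same comparison between two runs, here is a minimal sketch using Python's standard `difflib`. The helper name `diff_logs` is hypothetical, and the file names in the usage note are simply the two attachments from this post; nothing here is part of Captus or Scipio.

```python
import difflib


def diff_logs(text_a, text_b, name_a="run1", name_b="run2"):
    """Return the unified-diff lines between two log texts.

    n=0 drops context lines so that only the differing lines are shown.
    """
    return list(
        difflib.unified_diff(
            text_a.splitlines(),
            text_b.splitlines(),
            fromfile=name_a,
            tofile=name_b,
            lineterm="",
            n=0,
        )
    )
```

Usage could look like `diff_logs(open("NUC_scipio_initial_run1.log").read(), open("NUC_scipio_initial_run2.log").read())`; an empty list means the two logs are identical.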

This seems to be a Scipio issue more than a Captus issue, but do you have any idea what is going on? I also wonder whether there is a way to report warnings and/or errors from the sub-programs (like Scipio) in the Captus .log itself. Here, I noticed the problem because it was easy to spot in the .html report, but a more subtle sub-program issue could easily be missed if it is not reported directly in the Captus .log. Moreover, when analyses are run on hundreds of samples, it is cumbersome to check the sub-program logs of each sample.
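As a stopgap for checking many samples at once, a small script can scan the sub-program logs for known warning messages. The sketch below is only an illustration: the patterns are copied from the Scipio output quoted in this post and are probably incomplete, and the function names, the `*scipio*.log` glob, and the directory layout are assumptions rather than anything Captus provides.

```python
import re
from collections import Counter
from pathlib import Path

# Message fragments copied from the Scipio output quoted in this issue.
# This list is an assumption and is probably incomplete.
PATTERNS = {
    "query_length_mismatch": re.compile(r"Warning: query length mismatch"),
    "substr_outside_string": re.compile(r"substr outside of string"),
    "uninitialized_value": re.compile(r"Use of uninitialized value"),
    "diff_str_undef": re.compile(r"diff_str returned undef"),
    "no_query_sequence": re.compile(r"No query sequence '.+' of length \d+ found"),
}


def scan_log_text(text):
    """Count how many lines of a log match each known warning pattern."""
    counts = Counter()
    for line in text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def scan_logs(root, glob="*scipio*.log"):
    """Scan every matching log under root and keep only files with warnings.

    The glob and the directory layout are hypothetical; adjust them to
    wherever the sub-program logs actually live.
    """
    flagged = {}
    for path in sorted(Path(root).rglob(glob)):
        counts = scan_log_text(path.read_text(errors="replace"))
        if counts:
            flagged[str(path)] = dict(counts)
    return flagged
```

Calling `scan_logs("path/to/extractions")` (a placeholder path) returns a dictionary mapping each flagged log file to its warning counts, so samples with clean logs are simply absent from the result.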

Thanks a lot again for this wonderful tool!

Full warning output:
"
Processing BLAT hits:
...Warning: query length mismatch. This will produce unpredictable results!
substr outside of string at /apps/captus/0.9.88/lib/python3.11/site-packages/dependencies/scipio-1.4/scipio.1.4.1.pl line 999, line 566.
Use of uninitialized value $aa in string eq at /apps/captus/0.9.88/lib/python3.11/site-packages/dependencies/scipio-1.4/scipio.1.4.1.pl line 1001, line 566.
substr outside of string at /apps/captus/0.9.88/lib/python3.11/site-packages/dependencies/scipio-1.4/scipio.1.4.1.pl line 1052, line 566.
Use of uninitialized value $aa in string eq at /apps/captus/0.9.88/lib/python3.11/site-packages/dependencies/scipio-1.4/scipio.1.4.1.pl line 1054, line 566.
diff_str returns undef: input is
s1: EQQQGGAADEAEPFMGSGRF
s2: PRIIDTGFFSKIPPELYHHILKFLS
count: 20
from1 :0
from2: 211
query:HLJG-5123
NODE_1521_length_466_cov_5.0000_k_175_flag_1:gaacagcaacaaggcggtgcagcggatgaggccgaaccgttcatgggatccggtcgattt
diff_str returned undef!
Use of uninitialized value $diff_str in concatenation (.) or string at /apps/captus/0.9.88/lib/python3.11/site-packages/dependencies/scipio-1.4/scipio.1.4.1.pl line 2039, line 566.
query:HLJG-5123
target:NODE_1521_length_466_cov_5.0000_k_175_flag_1
Incorrect calculation of unmatched aa's in line 566!

No query sequence 'HLJG-5123_[253]' of length 253 found.
Skipped.
No query sequence 'HWUP-5123' of length 256 found.
[...] "

NUC_scipio_initial_run1.log
NUC_scipio_initial_run2.log

Captus extract report for run1:
(note the missing extraction for at least 7 samples)
captus-assembly_extract report_run1

@edgardomortiz
Owner

edgardomortiz commented Feb 23, 2023 via email

@LPDagallier
Author

Hi Edgardo,

Thanks for your answer. I'm using v0.9.88; I will ask for an update to the latest version on the shared cluster and see how it goes. The difficulty is that this issue is hard to reproduce, as everything worked perfectly on the second run.

The RAM limitation could be the explanation: it was running with 32 GB on 8 threads (--threads 8 --ram 32, with nothing specified via --concurrent).

I will keep you updated,
Léo-Paul

@edgardomortiz
Owner

My guess is that it is hard to reproduce because, when it failed, other samples were running at the same time and using up RAM too. That is why I changed the behavior of Scipio to be parallel and lightweight.

Good luck!

Edgardo
