
AlleleCall issue - Determining BLASTp self-score for each representative... #195

Open

daraneda96 opened this issue Mar 6, 2024 · 4 comments
Labels: Status: In Progress (Has been assigned and is being worked on.)

@daraneda96
Hi everyone,
I am having trouble executing the AlleleCall command. Specifically, I am running the command:

chewBBACA.py AlleleCall -i /home/daniel.araneda/analisis_vibrios/genomas_mlst -g /home/daniel.araneda/analisis_vibrios/mlst_ok/mlst_schema/schema_vibrio -o /home/daniel.araneda/analisis_vibrios/mlst_ok/allelecall --cpu 10

And I get the following output in the .out file:
"chewBBACA version: 3.3.3
Authors: Rafael Mamede, Pedro Cerqueira, Mickael Silva, João Carriço, Mário Ramirez
Github: https://github.com/B-UMMI/chewBBACA
Documentation: https://chewbbaca.readthedocs.io/en/latest/index.html
Contacts: imm-bioinfo@medicina.ulisboa.pt

==========================
chewBBACA - AlleleCall

Started at: 2024-03-06T01:17:41

Configuration values

Minimum sequence length: 0
Size threshold: 0.2
Translation table: 11
BLAST Score Ratio: 0.6
Word size: 5
Window size: 5
Clustering similarity: 0.2
Prodigal training file: /home/daniel.araneda/analisis_vibrios/mlst_ok/mlst_schema/schema_vibrio/vibrio_trainingfile.trn
CPU cores: 10
BLAST path: /home/daniel.araneda/miniconda3/envs/chewie/bin
CDS input: False
Prodigal mode: single
Mode: 4
Number of inputs: 104
Number of loci: 60797
Intermediate files will be stored in /home/daniel.araneda/analisis_vibrios/mlst_ok/allelecall/temp

Pre-computed data

Loci allele size mode values stored in /home/daniel.araneda/analisis_vibrios/mlst_ok/mlst_schema/schema_vibrio/loci_modes
Hash tables stored in /home/daniel.araneda/analisis_vibrios/mlst_ok/mlst_schema/schema_vibrio/pre_computed

CDS prediction

Predicting CDSs for 104 inputs...
[====================] 100%
Extracted a total of 460028 CDSs from 104 inputs.

CDS deduplication

Identifying distinct CDSs...
Identified 403038 distinct CDSs.

CDS exact matching

Searching for CDS exact matches...
Found 68364 exact matches (60797 distinct schema alleles).
Unclassified CDSs: 342241

CDS translation

Translating 342241 CDSs...
[====================] 100%
204 CDSs could not be translated.
Unclassified CDSs: 342037

Protein deduplication

Identifying distinct proteins...
Identified 302610 distinct proteins.

Protein exact matching

Searching for Protein exact matches...
Found 1301 exact matches (2264 distinct CDSs, 2592 total CDSs).
Unclassified proteins: 301309

Protein clustering

Translating schema representative alleles...
Determining BLASTp self-score for each representative..."

And I get the following output in the .err file (a list of ASCII byte values; decoded into text, with the "............" truncation kept as in the original, it reads):
"BLAST Database error: No alias or index file found for protein database [/home/daniel.araneda/analisis_vibrios/mlst_ok/allelecall/temp/3_translated_representatives/self_scores/BLASTp_db/loci_to_call_translated_representatives] ............lisis_vibrios/mlst_ok/allelecall/temp/3_translated_representatives/self_scores/BLASTp_db/loci_to_call_translated_representatives] in search path [/home/daniel.araneda/analisis_vibrios/mlst_ok::]
BLAST Database error: No alias or index file found for protein database [/home/daniel.araneda/analisis_vibrios/mlst_ok/allelecall/temp/3_translated_representatives/self_scores/BLASTp_db/loci_to_call_translated_representatives] in search path [/home/daniel.araneda/analisis_vibrios/mlst_ok::]"
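For reference, a byte list like the one in this .err file can be turned back into readable text with a short Python snippet (the list below is a tiny stand-in, not the full log):

```python
# Decode a log that was written as a list of ASCII byte values back into
# readable text. The values here are a short example, not the full log.
codes = [66, 76, 65, 83, 84, 32, 101, 114, 114, 111, 114]
message = "".join(chr(c) for c in codes)
print(message)  # -> "BLAST error"
```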

What could be happening?
Sorry if there's another issue explaining this. I looked to see if anyone else asked about it but couldn't find anything.

Greetings and thank you very much in advance.

Daniel

@rfm-targa self-assigned this Mar 6, 2024
@rfm-targa added the Status: In Progress label Mar 6, 2024
@rfm-targa
Contributor

Greetings @daraneda96,

Sorry for the delayed response. It looks like the error occurs after running BLASTp to determine the self-score for the schema loci representatives (the sequences inside the FASTA files in the short directory). I think chewBBACA exits when it detects that some of the BLASTp processes failed to run. This can happen when a sequence header exceeds 50 characters. Can you please verify whether any sequence headers in the FASTA files in the short directory are longer than 50 characters? Note that the header size detected by BLAST is based on everything up to the first blank space. Equivalently, you can check whether any locus in your schema has an identifier longer than 50 characters. The loci identifiers are derived from the unique identifiers assigned to the input genomes during schema creation (performed by the CreateSchema module), so if any input genome used to create the schema had a unique identifier (everything in the basename up to the first ".") longer than 50 characters, it might lead to errors with BLAST.
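As a quick way to run this check, here is a minimal Python sketch that flags headers whose first whitespace-delimited token exceeds 50 characters (the schema_vibrio/short path is only an example; point it at your own schema's short directory):

```python
import glob

MAX_LEN = 50  # header length limit discussed above

def long_headers(fasta_path, max_len=MAX_LEN):
    """Return header identifiers (text up to the first blank space)
    longer than max_len characters in a FASTA file."""
    too_long = []
    with open(fasta_path) as fasta:
        for line in fasta:
            if line.startswith(">"):
                identifier = line[1:].split()[0]  # header up to first space
                if len(identifier) > max_len:
                    too_long.append(identifier)
    return too_long

# Example path; adjust to the "short" directory of your schema.
for path in glob.glob("schema_vibrio/short/*.fasta"):
    for identifier in long_headers(path):
        print(f"{path}: {identifier} ({len(identifier)} chars)")
```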

Kind regards,

Rafael

@daraneda96
Author

Greetings Rafael,
I checked the sequence headers with the "grep -E '^>.{50,}' *.fasta" command and didn't find anything; the longest sequence header has 37 characters.
I was able to run AlleleCall once, but it hasn't worked again since, even though the file names are the same now as they were then.
Sorry for the vague information.

Kind regards,

Daniel

@rfm-targa
Contributor

Greetings @daraneda96,

Can you share some data to reproduce the issue? This could be the schema and a set of genomes, or just a minimal test case (part of the schema plus one genome) that produces the same error, so we can pinpoint the cause.

Kind regards,

Rafael

@rfm-targa
Contributor

Greetings @daraneda96,

We have updated chewBBACA to v3.3.4, which includes some bug fixes. Although the fixes do not target the issue you reported, it might be worth retrying with the new version.

Kind regards,

Rafael
