ERROR ~ Error executing process > 'NFCORE_RNASEQ:RNASEQ:QUANTIFY_PSEUDO_ALIGNMENT:TX2GENE (genome.filtered.gtf)'
Caused by:
Missing output file(s) `*.tsv` expected by process `NFCORE_RNASEQ:RNASEQ:QUANTIFY_PSEUDO_ALIGNMENT:TX2GENE (genome.filtered.gtf)`
Command executed:
tx2gene.py \
--quant_type salmon \
--gtf genome.filtered.gtf \
--quants quants \
--id gene_id \
--extra gene_name \
-o tx2gene.tsv
cat <<-END_VERSIONS > versions.yml
"NFCORE_RNASEQ:RNASEQ:QUANTIFY_PSEUDO_ALIGNMENT:TX2GENE":
python: $(python --version | sed 's/Python //g')
END_VERSIONS
Command exit status:
0
Command output:
(empty)
Command error:
__main__ - 2024-01-30 16:37:17,060 WARNING: No attribute in GTF matching transcripts
__main__ - 2024-01-30 16:37:17,060 ERROR: Failed to map transcripts to genes.
Work dir:
/data/Proyectos/NGS_pipeline/work/e2/6ad3cf2cec77f4aeb2434722052450
Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
-- Check '.nextflow.log' file for details
At first I thought that something might be wrong with my gtf or fasta files. However, when I run the "same" command using STAR + salmon I don't get any error:
BTW, just in case, the genome gtf and fasta files are downloaded according to the nf-core/rnaseq guidelines, and the STAR/salmon/kallisto indexes are built as in the code from the respective .nf files.
Therefore, when running kallisto or salmon, the abundance file in the quants folder contains the quantification of the transcripts named as in the fasta file. When tx2gene.py loads those transcripts, the following section of discover_transcript_attribute() fails:
with open(gtf_file) as inh:
    # Read GTF file, skipping header lines
    for line in filter(lambda x: not x.startswith("#"), inh):
        cols = line.split("\t")
        # Use regular expression to correctly split the attributes string
        attributes_str = cols[8]
        attributes = dict(re.findall(r'(\S+) "(.*?)(?<!\\)";', attributes_str))
        votes.update(key for key, value in attributes.items() if value in transcripts)
It fails because no attribute value in the gtf has the versioned structure "ENSTXXXXXXX.Y" that the quantification files use.
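The mismatch can be reproduced on a single record. This is a minimal sketch, assuming an Ensembl-style gtf line in which `transcript_id` and `transcript_version` are separate attributes while the quantification output uses the versioned form. The example record and the transcript set are made up for illustration; only the regular expression is taken from the snippet above.

```python
import re
from collections import Counter

# Hypothetical Ensembl-style GTF record (illustrative values, not from the real file)
gtf_line = (
    "1\thavana\ttranscript\t11869\t14409\t.\t+\t.\t"
    'gene_id "ENSG00000223972"; gene_version "5"; '
    'transcript_id "ENST00000456328"; transcript_version "2"; '
    'gene_name "DDX11L1";'
)

# Transcript names as they are assumed to appear in the quantification output
transcripts = {"ENST00000456328.2"}

# Same attribute parsing and voting as in the snippet above
attributes = dict(re.findall(r'(\S+) "(.*?)(?<!\\)";', gtf_line.split("\t")[8]))
votes = Counter()
votes.update(key for key, value in attributes.items() if value in transcripts)

print(attributes["transcript_id"])  # unversioned ID stored in the GTF
print(dict(votes))                  # stays empty: no attribute value carries the version
```

Because the comparison is an exact string match, the unversioned `transcript_id` never equals the versioned name, so no attribute receives a vote.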
So far, I've patched this problem by creating a new attribute in the gtf file that combines the transcript id and version.
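A workaround of this kind could look roughly like the sketch below. It assumes the gtf carries separate `transcript_id` and `transcript_version` attributes; the new attribute name `transcript_id_version`, the helper names, and the file paths in the usage note are hypothetical and not taken from the actual patch.

```python
import re

# Same attribute pattern that tx2gene.py uses to split the ninth GTF column
ATTR_RE = re.compile(r'(\S+) "(.*?)(?<!\\)";')


def add_versioned_id(line: str, new_key: str = "transcript_id_version") -> str:
    """Append a combined 'ID.version' attribute to one GTF line when possible.

    Header lines, malformed lines, and records without both transcript_id
    and transcript_version are returned unchanged.
    """
    if line.startswith("#"):
        return line
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 9:
        return line
    attrs = dict(ATTR_RE.findall(cols[8]))
    tid = attrs.get("transcript_id")
    version = attrs.get("transcript_version")
    if tid is None or version is None or new_key in attrs:
        return line
    cols[8] = f'{cols[8].rstrip()} {new_key} "{tid}.{version}";'
    return "\t".join(cols) + "\n"


def patch_gtf(src_path: str, dst_path: str) -> None:
    """Write a copy of the GTF at src_path with the extra attribute added."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(add_versioned_id(line))
```

For example, `patch_gtf("genome.filtered.gtf", "genome.patched.gtf")` would write a patched copy (the output name is made up), leaving gene-level records and comment lines untouched.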
If this problem can be replicated, I think it would be a good idea to make a more lenient discover_transcript_attribute() function that allows for transcript ids with or without version.
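One possible shape for such a lenient lookup is sketched below. It is not the pipeline's implementation: the function names, the exact-then-loose voting scheme, and the empty-string return value are all assumptions made for illustration. Exact matches are preferred, and a version-stripped comparison is used only when nothing matches exactly.

```python
import re
from collections import Counter
from typing import Iterable, Set

ATTR_RE = re.compile(r'(\S+) "(.*?)(?<!\\)";')


def strip_version(transcript_id: str) -> str:
    """Drop a trailing '.<digits>' suffix: 'ENST0001.2' becomes 'ENST0001'."""
    return re.sub(r"\.\d+$", "", transcript_id)


def discover_attribute_lenient(gtf_lines: Iterable[str], transcripts: Set[str]) -> str:
    """Hypothetical lenient variant: return the GTF attribute whose values best
    match the transcript names, with or without a version suffix.

    Returns an empty string if nothing matches (an assumption of this sketch).
    """
    unversioned = {strip_version(t) for t in transcripts}
    exact, loose = Counter(), Counter()
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        for key, value in dict(ATTR_RE.findall(cols[8])).items():
            if value in transcripts:
                exact[key] += 1
            elif strip_version(value) in unversioned:
                loose[key] += 1
    # Fall back to version-insensitive votes only if no exact match was seen
    votes = exact or loose
    return votes.most_common(1)[0][0] if votes else ""
```

Discovery alone would not be enough: the later step that maps transcripts to genes would presumably need the same version normalisation, which this sketch does not cover.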
Description of the bug
I'm running nf-core/rnaseq with the following command:
When I run it I get the following error:
At first I thought that something might be wrong with my gtf or fasta files. However, when I run the "same" command using STAR + salmon I don't get any error:
So I don't really know where it fails.
Command used and terminal output
No response
Relevant files
nexflow.log
System information
Nextflow version: 23.10.0
Hardware: Desktop
Executor: local
Container engine: docker
OS: Linux
Version of nf-core/rnaseq: 3.14.0