
funannotate update, kallisto error #1025

Open
schraderL opened this issue Apr 10, 2024 · 1 comment


schraderL commented Apr 10, 2024

Hi,
I am running into an error with funannotate update while trying to add UTRs to existing CDS annotations. It looks like Kallisto fails because of a duplicated FASTA header name in update_misc/getBestModel/transcripts.fa. Any idea how to fix this?

Are you using the latest release?

I am running the funannotate 1.8.17 Docker container in Singularity, set up with:

singularity pull funannotate.sif docker://nextgenusfs/funannotate
singularity shell --bind ~/projects/UTRs:/pasaUTRs,~/fq/A006200344_207613_S1/:/fq/ funannotate.sif

Describe the bug
When I run funannotate update to add UTRs to a GFF3 that was not produced by funannotate, I get the following error and the process dies:

-------------------------------------------------------
[Apr 10 09:11 AM]: OS: Debian GNU/Linux 10, 36 cores, ~ 98 GB RAM. Python: 3.8.12
[Apr 10 09:11 AM]: Running 1.8.17
[Apr 10 09:11 AM]: No NCBI SBT file given, will use default, for NCBI submissions pass one here '--sbt'
[Apr 10 09:11 AM]: Previous annotation consists of: 0 protein coding gene models and 0 non-coding gene models
[Apr 10 09:11 AM]: Existing annotation: locustag=COBS genenumber=20966
[Apr 10 09:11 AM]: Trimmomatic will be skipped
[Apr 10 09:11 AM]: Existing Trinity results found: CARD-0001/update_misc/trinity.fasta
[Apr 10 09:11 AM]: Existing BAM alignments found: CARD-0001/update_misc/trinity.alignments.bam, CARD-0001/update_misc/transcript.alignments.bam
[Apr 10 09:11 AM]: Using Kallisto TPM data to determine which PASA gene models to select at each locus
[Apr 10 09:11 AM]: Building Kallisto index
[Apr 10 09:11 AM]: CMD ERROR: kallisto index -i CARD-0001/update_misc/getBestModel/bestModel CARD-0001/update_misc/getBestModel/transcripts.fa
[Apr 10 09:11 AM]:
[build] loading fasta file CARD-0001/update_misc/getBestModel/transcripts.fa
[build] k-mer length: 31
Error: repeated name in FASTA file CARD-0001/update_misc/getBestModel/transcripts.fa
novel_model_3648_661583a6

Run with --make-unique to replace repeated names with unique names

CARD-0001/update_misc/getBestModel/transcripts.fa contains exactly one duplicated header. Running

grep "^>" CARD-0001/update_misc/getBestModel/transcripts.fa | cut -f 1 -d " " | sort | uniq -d

returns:

>novel_model_3648_661583a6

And

grep ">novel_model_3648_661583a6" CARD-0001/update_misc/getBestModel/transcripts.fa

returns

>novel_model_3648_661583a6 novel_gene_1684_661583a6 ** NO NAME ASSIGNED ** LG13:1345282-1352116(-)
>novel_model_3648_661583a6 novel_gene_1683_661583a6 ** NO NAME ASSIGNED ** LG6:5920601-5922436(-)
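The two greps above can be combined into one pass. Below is a minimal Python sketch that reports every repeated ID together with the rest of each conflicting header, so the two loci sharing novel_model_3648_661583a6 show up side by side. It assumes the ID is the first whitespace-delimited token after ">", which appears to be what kallisto compares (the error prints only that token). The helper name find_duplicate_ids is hypothetical and not part of funannotate or kallisto.

```python
import sys
from collections import defaultdict


def find_duplicate_ids(fasta_lines):
    """Hypothetical helper: map each repeated FASTA ID to its header descriptions.

    Assumes the ID is the first whitespace-delimited token after '>'.
    """
    seen = defaultdict(list)
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        parts = line[1:].rstrip("\n").split(maxsplit=1)
        if not parts:
            continue  # skip an empty ">" header
        seq_id = parts[0]
        description = parts[1] if len(parts) > 1 else ""
        seen[seq_id].append(description)
    # Keep only IDs that occurred more than once, in first-seen order.
    return {seq_id: descs for seq_id, descs in seen.items() if len(descs) > 1}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_duplicate_ids.py transcripts.fa
    with open(sys.argv[1]) as handle:
        for seq_id, descs in find_duplicate_ids(handle).items():
            print(seq_id)
            for desc in descs:
                print("    " + desc)
```

Unlike `sort | uniq -d`, this keeps the descriptions, which makes it easier to see that the same model ID was assigned to genes on different scaffolds (LG13 and LG6 here).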

What command did you issue?

funannotate update -f CARD-0001.fa -g CARD-0001.gff3 --species CARD-0001 --out CARD-0001 -l /fq/CARD-0001-val_1.fq.gz -r /fq/CARD-0001-val_2.fq.gz --no_trimmomatic  --cpus 36

Logfiles

OS/Install Information

  • output of funannotate check --show-versions
Checking dependencies for 1.8.17

You are running Python v 3.8.12. Now checking python packages...
biopython: 1.79
goatools: 1.3.11
matplotlib: 3.7.0
natsort: 8.4.0
numpy: 1.23.0
pandas: 2.0.3
psutil: 5.9.1
requests: 2.31.0
scikit-learn: 0.24.2
scipy: 1.5.3
seaborn: 0.13.0
All 11 python packages installed


You are running Perl v b'5.026002'. Now checking perl modules...
Carp: 1.38
Clone: 0.42
DBD::SQLite: 1.64
DBD::mysql: 4.046
DBI: 1.642
DB_File: 1.855
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.5
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.39
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.12
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.29
local::lib: 2.000024
threads: 2.15
threads::shared: 1.56
All 27 Perl modules installed


Checking Environmental Variables...
$FUNANNOTATE_DB=/opt/databases
$PASAHOME=/venv/opt/pasa-2.4.1
$TRINITYHOME=/venv/opt/trinity-2.8.5
$EVM_HOME=/venv/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/usr/share/augustus/config
        ERROR: GENEMARK_PATH not set. export GENEMARK_PATH=/path/to/dir
-------------------------------------------------------
Checking external dependencies...
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.3.2
bamtools: bamtools 2.5.2
bedtools: bedtools v2.31.1
blat: BLAT v35
diamond: 2.1.8
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: 36.3.8g
glimmerhmm: 3.0.4
gmap: 2017-11-15
hisat2: 2.2.1
hmmscan: HMMER 3.4 (Aug 2023)
hmmsearch: HMMER 3.4 (Aug 2023)
java: 11.0.8-internal
kallisto: 0.46.1
mafft: v7.520 (2023/Mar/22)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.26-r1175
pigz: 2.8
proteinortho: 6.0.16
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.12
snap: 2006-07-28
stringtie: 2.2.1
tRNAscan-SE: 2.0.9 (July 2021)
tantan: tantan 49
tbl2asn: 25.8
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
        ERROR: emapper.py not installed
        ERROR: gmes_petap.pl not installed
        ERROR: signalp not installed

@schraderL (author):

A quick addition: kallisto index and kallisto quant both run fine when I run them manually and add the --make-unique option to the indexing step:

kallisto index -i CARD-0001/update_misc/getBestModel/bestModel CARD-0001/update_misc/getBestModel/transcripts.fa --make-unique
kallisto quant -i CARD-0001/update_misc/getBestModel/bestModel -o CARD-0001/update_misc/kallisto --plaintext -t 36 /fq/A006200344_207613_S1_L000_R1_001_val_1.fq.gz /fq/CARD-0001-val_1.fq.gz -r /fq/CARD-0001-val_2.fq.gz
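kallisto's own message ("Run with --make-unique to replace repeated names with unique names") describes the idea behind that flag. The following is a minimal Python sketch of the same idea applied to the FASTA file itself: the first occurrence of an ID is kept and later occurrences get a suffix. The "_dupN" suffix scheme and the helper name make_ids_unique are hypothetical; kallisto's --make-unique may name things differently. Any downstream step that looks transcripts up by their original IDs would presumably not recognize the renamed entries.

```python
import re

# A header line: '>' followed by the ID (non-whitespace) and the rest of the line.
HEADER_RE = re.compile(r">(\S+)(.*)", re.DOTALL)


def make_ids_unique(fasta_lines):
    """Hypothetical helper: rename repeated FASTA IDs with a '_dupN' suffix.

    The suffix scheme is an assumption, not kallisto's --make-unique format.
    """
    lines = list(fasta_lines)
    # Collect every original ID first so a generated name never collides with one.
    taken = {m.group(1) for m in map(HEADER_RE.match, lines) if m}
    seen = set()
    out = []
    for line in lines:
        m = HEADER_RE.match(line)
        if not m:
            out.append(line)  # sequence line (or empty header): keep unchanged
            continue
        seq_id, rest = m.groups()
        if seq_id in seen:
            n = 1
            while f"{seq_id}_dup{n}" in taken:
                n += 1
            seq_id = f"{seq_id}_dup{n}"
            taken.add(seq_id)
        seen.add(seq_id)
        out.append(f">{seq_id}{rest}")  # rest keeps the description and newline
    return out
```

Renaming only the repeats, rather than every header, leaves all unique IDs untouched.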

However, I can't resume the funannotate update run after that, because it complains that the getBestModel folder already exists:

FileExistsError: [Errno 17] File exists: 'CARD-0001/update_misc/getBestModel'
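[Errno 17] is EEXIST, which Python raises as FileExistsError when os.mkdir (or os.makedirs without exist_ok=True) targets a directory that already exists. How funannotate actually creates getBestModel is not shown here, so the following is only a sketch of the general pattern: a hypothetical prepare_step_dir helper that fails on an existing folder for a fresh run but reuses it when resuming.

```python
import os
import tempfile


def prepare_step_dir(path, resume=False):
    """Hypothetical helper, not funannotate's code.

    resume=False: behave like os.mkdir and raise FileExistsError if path exists.
    resume=True: reuse an existing directory instead of failing.
    """
    if resume:
        os.makedirs(path, exist_ok=True)
    else:
        os.mkdir(path)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        step = os.path.join(tmp, "update_misc", "getBestModel")
        os.makedirs(os.path.dirname(step))
        prepare_step_dir(step)
        try:
            prepare_step_dir(step)  # second fresh run fails like the report
        except FileExistsError as err:
            print("fresh run fails:", err)
        prepare_step_dir(step, resume=True)  # resumed run reuses the folder
        print("resumed run reuses", step)
```

Reusing a directory is only safe if the step can trust or regenerate the files already in it, which is presumably why a strict mkdir is the default in many pipelines.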
