using MitoZ for annotating mitogenome generated by other assemblers #4

Closed
orient100 opened this issue Apr 1, 2019 · 6 comments
Labels
good first issue Good for newcomers

Comments

@orient100

orient100 commented Apr 1, 2019

Hi,
I have a mitochondrial assembly file produced by NOVOPlasty and would like to continue with MitoZ to annotate and visualize it. What should I replace the "*" in the input file with? I know that a '*' in the FASTA file indicates that the preceding nucleotide is a possible deletion/insertion.
Thanks

@orient100
Author

The character inside the quotation marks may not be displayed; it is an asterisk (*).

@linzhi2013
Owner

linzhi2013 commented Apr 1, 2019

Dear orient100 ,

Thank you very much for using MitoZ and pointing out the problem!

Indeed, we have found that mitogenome sequences assembled by NOVOPlasty sometimes contain non-IUPAC (https://www.bioinformatics.org/sms/iupac.html) characters, which cause MitoZ to stop during the annotate step with an error message like this:

Traceback (most recent call last):
  File "/app/release_MitoZ_v2.3/bin/annotate/revise_CDS_pos_v6.py", line 242, in <module>
    pro_seq_start = sub_start_seq.translate(table=table, cds=True, to_stop=False)
  File "/app/anaconda/lib/python3.6/site-packages/Bio/Seq.py", line 1038, in translate
    cds, gap=gap)
  File "/app/anaconda/lib/python3.6/site-packages/Bio/Seq.py", line 2080, in _translate_str
    "First codon '{0}' is not a start codon".format(sequence[:3]))
Bio.Data.CodonTable.TranslationError: First codon 'T*C' is not a start codon

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/anaconda/lib/python3.6/site-packages/Bio/Data/CodonTable.py", line 341, in __getitem__
    self.ambiguous_nucleotide)
  File "/app/anaconda/lib/python3.6/site-packages/Bio/Data/CodonTable.py", line 194, in list_possible_proteins
    x2 = ambiguous_nucleotide_values[c2]
KeyError: '*'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/anaconda/lib/python3.6/site-packages/Bio/Seq.py", line 2107, in _translate_str
    amino_acids.append(forward_table[codon])
  File "/app/anaconda/lib/python3.6/site-packages/Bio/Data/CodonTable.py", line 344, in __getitem__
    raise KeyError(codon)  # stop codon
KeyError: 'T*C'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/release_MitoZ_v2.3/bin/annotate/revise_CDS_pos_v6.py", line 244, in <module>
    pro_seq_start = sub_start_seq.translate(table=table, cds=False, to_stop=False)
  File "/app/anaconda/lib/python3.6/site-packages/Bio/Seq.py", line 1038, in translate
    cds, gap=gap)
  File "/app/anaconda/lib/python3.6/site-packages/Bio/Seq.py", line 2124, in _translate_str
    "Codon '{0}' is invalid".format(codon))
Bio.Data.CodonTable.TranslationError: Codon 'T*C' is invalid
Error occured when running command:
/app/anaconda/bin/python3 /app/release_MitoZ_v2.3/bin/annotate/revise_CDS_pos_v6.py test_mitoscaf.fa test.cds.position.sorted 2 test.cds.position.sorted.revised

The crash is caused by Bio.Data.CodonTable.TranslationError: Codon 'T*C' is invalid. The * is not an IUPAC nucleotide code, so it is rejected by BioPython, which MitoZ uses internally to translate the sequence.

Thus, before using MitoZ to annotate a mitogenome generated by another assembler, please make sure the sequence does not contain any non-IUPAC nucleotide codes. For example, replace them with Ns.
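As a quick pre-check, something like the following could list every non-IUPAC character in a FASTA file before it is passed to MitoZ. This is only a minimal sketch, not part of MitoZ: the function name find_non_iupac is made up for illustration, and it assumes that only the IUPAC letters (including U and N) are acceptable, so gap characters such as "-" or "." are reported as well.

```python
# Minimal sketch (not part of MitoZ): report non-IUPAC characters in FASTA text.
# Assumption: only these letters are accepted; gaps ("-", ".") are flagged too.
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")


def find_non_iupac(fasta_text):
    """Return a list of (seq_id, position, character) for each offending character.

    Positions are 1-based and counted on the raw sequence, i.e. the
    offending characters themselves are included in the count.
    """
    hits = []
    seq_id = None
    pos = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: the sequence id is the first word after ">".
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            pos = 0
            continue
        for ch in line:
            pos += 1
            if ch.upper() not in IUPAC_NUCLEOTIDES:
                hits.append((seq_id, pos, ch))
    return hits
```

To check a file, read it into a string (for example with open("assembly.fasta").read(), where the file name is hypothetical) and pass the text to find_non_iupac; an empty list means no offending characters were found.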

In your case, where the mitogenome was assembled by NOVOPlasty, the * has the following meaning (see https://github.com/ndierckx/NOVOPlasty for more details):

A '*' in the fasta output files indicates that the nucleotide before is a possible deletion/insertion. This can occur when the exact length of single nucleotide repeat can't be determined exactly due to systemic Illumina sequencing errors or within repetitive regions. Since this sign can interfere with post processing algorithms it is best resolve them manually or to delete them.

I would recommend that you resolve or delete the * manually before running MitoZ. For example, you can map the reads to the mitogenome with the BWA program, then use the samtools mpileup command to determine the exact base(s) the * represents. Or, if you do not want to do this, you can simply replace the * with N and then go to MitoZ directly.
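For the simpler of the two routes, a small script along these lines could replace or delete the * characters. It is a sketch only: the function name resolve_asterisks is hypothetical, header lines are left untouched, and it does not decide which option is biologically correct. Replacing with N adds an unknown base after the uncertain nucleotide, while passing an empty replacement just drops the marker and keeps the preceding base.

```python
# Minimal sketch (not part of MitoZ or NOVOPlasty): remove NOVOPlasty's "*"
# markers from FASTA text, either by substituting "N" or by deleting them.
def resolve_asterisks(fasta_text, replacement="N"):
    """Return FASTA text with every "*" in sequence lines replaced.

    Use replacement="" to delete the markers instead of substituting "N".
    Header lines (starting with ">") are copied unchanged.
    """
    cleaned = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            cleaned.append(line)
        else:
            cleaned.append(line.replace("*", replacement))
    return "\n".join(cleaned) + "\n"
```

The cleaned text can then be written to a new FASTA file and given to MitoZ through --fastafile. If the exact bases matter, checking the read pileup at those positions first, as described above, is the safer choice.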

Hope the explanation can help you!

Cheers,
Guanliang

@linzhi2013 linzhi2013 pinned this issue Apr 1, 2019
@linzhi2013 linzhi2013 changed the title from "about input file" to "using MitoZ for annotating mitogenome generated by other assemblers" Apr 1, 2019
@linzhi2013 linzhi2013 added the good first issue Good for newcomers label Apr 1, 2019
@linzhi2013
Owner

Please feel free to open a new issue when necessary.

@LorenaDerezanin

Hey there,
I got a similar issue while running the annotate command on a mitogenome generated using NOVOPlasty. I've made sure the sequence doesn't contain any non-IUPAC codes and that the seqid is short: >eira topology=circular. MitoZ creates the .tmp dir with the intermediate annotation files and properly identifies Martes martes as the most closely related species, but it doesn't create the .result dir with the GenBank file. I'm using release_MitoZ_v2.4-alpha. I ran it a couple of times, tried switching the topology to linear, and ran it on the published domestic ferret mitogenome (KT693383.1), but it keeps breaking at the same stage:

Traceback (most recent call last):
  File "/home/derezanin/my_software/release_MitoZ_v2.4-alpha/bin/annotate/gene_feature.py", line 48, in read_fastaLike2
    raise StopIteration
StopIteration

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/derezanin/my_software/release_MitoZ_v2.4-alpha/bin/annotate/find_contro_region.py", line 364, in <module>
    main()
  File "/home/derezanin/my_software/release_MitoZ_v2.4-alpha/bin/annotate/find_contro_region.py", line 254, in main
    all_seqid_geneCoor = read_featureTable(fea_files=args.fea_files)
  File "/home/derezanin/my_software/release_MitoZ_v2.4-alpha/bin/annotate/find_contro_region.py", line 109, in read_featureTable
    seqid_geneCoor = get_gene_coor(fea_f)
  File "/home/derezanin/my_software/release_MitoZ_v2.4-alpha/bin/annotate/gene_feature.py", line 62, in get_gene_coor
    for rec in records:
RuntimeError: generator raised StopIteration
Error occured when running command:
/home/derezanin/miniconda3/envs/mitozEnv/bin/python3 /home/derezanin/my_software/release_MitoZ_v2.4-alpha/bin/annotate/find_contro_region.py -fa_file /home/derezanin/NO_BACKUP/eira_barbara/mitogenome/mito_assembled/annotated_tayra_mito2/eira_barbara_mitogenome.fasta -PCG_cutoff_file /home/derezanin/my_software/release_MitoZ_v2.4-alpha/bin/profiles/CDS_HMM/Chordata_CDS_length_list -PCG_len_ratio 0.9  -s_rRNA_CM_file /home/derezanin/my_software/release_MitoZ_v2.4-alpha/bin/profiles/rRNA_CM/v1.1_12snew.cm -l_rRNA_CM_file /home/derezanin/my_software/release_MitoZ_v2.4-alpha/bin/profiles/rRNA_CM/v1.1_16snew.cm -rRNA_len_ratio 0.9  -tRNA_num_min 22  -fea_files mitoz_ann5_mitoscaf.fa.cds.ft /data/scratch/derezanin/eira_barbara/mitogenome/MitoZ_run/mitoz_ann5.tmp/mitoz_ann5.annotation/mitoz_ann5_mitoscaf.fa.trna.ft mitoz_ann5_mitoscaf.fa.s-rRNA.ft mitoz_ann5_mitoscaf.fa.l-rRNA.ft -CR_len_min 600  -outfile mitoz_ann5_mitoscaf.fa.control_region.ft

What might be the issue here?

Here are the mitoscaf.fa.tbl files from the tmp/annotation dirs for both species, attached as .txt files.
mtz_df_mitoscaf.fa.txt
mtz_eira_mitoscaf.fa.txt

Thank you in advance.

@linzhi2013
Owner

Dear LorenaDerezanin,

It seems to be a Python problem; see https://stackoverflow.com/questions/51700960/runtimeerror-generator-raised-stopiteration-every-time-i-try-to-run-app
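For context, the RuntimeError in the traceback above matches the behavior introduced by PEP 479: from Python 3.7 on, a StopIteration raised inside a generator body is converted into a RuntimeError, whereas Python 3.6 only emitted a deprecation warning. The sketch below illustrates that change with a hypothetical line reader. The function names are made up for illustration, and this is not the actual code of gene_feature.py, which is not shown in this thread.

```python
# Minimal sketch of the PEP 479 behavior change (hypothetical functions,
# not the real MitoZ code).
def read_records_old_style(lines):
    """Ends the generator by raising StopIteration explicitly.

    On Python >= 3.7 this surfaces as
    "RuntimeError: generator raised StopIteration".
    """
    it = iter(lines)
    while True:
        line = next(it, None)
        if line is None:
            raise StopIteration  # the problematic pattern
        yield line.strip()


def read_records_fixed(lines):
    """Same reader, but ends the generator with a plain return."""
    it = iter(lines)
    while True:
        line = next(it, None)
        if line is None:
            return  # the PEP 479-compatible way to finish a generator
        yield line.strip()
```

Running the affected release under an older interpreter such as Python 3.6 avoids the error without touching the code, while replacing raise StopIteration with return inside the generator is the usual source-level fix.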

I annotated KT693383.1 with the following commands:

$ docker pull guanliangmeng/mitoz:2.3
$ docker run -v $PWD:/project --rm guanliangmeng/mitoz:2.3 /app/release_MitoZ_v2.3/MitoZ.py annotate  --genetic_code 2 --clade Chordata --outprefix test --thread_number 2 --fastafile KT693383.1.fa

It ran well, and the result is attached:
test.result.zip

Cheers

@LorenaDerezanin

Thank you for the quick reply, much appreciated :)
So the problem described in the Stack Overflow question probably also applies to Python 3.8, which is what the conda env I created for MitoZ v2.4 uses.
After cleaning up conda, I created a new conda env for the MitoZ v2.3 release from a yaml file with the list of packages mentioned in issue #47, including Python 3.6. I then ran annotate on the domestic ferret (KT693383.1) genome to reproduce your results, and once again on my target species. Everything worked out well.

Thank you once again :)
