pairwise_distances fails for some sequence names #17

tyleraland · 2015-01-06T20:57:23Z

I run the command:
deenurp pairwise-distances -a cmalign test.fasta out.csv

If my test.fasta looks like:

>M03029:20:000000000-D0BH9:1:1101:15010:1394:3
ATTGAACGCTGGCGGCAGGT
>M03029:20:000000000-D0BH9:1:1101:16823:1639:3
ATTGAACGCTGGCGGCAGGC

I get the error:
Traceback (most recent call last):
  File "/home/molmicro/working/tland9/2014-12-10_lumpyplots/local-env/bin/deenurp", line 9, in <module>
    load_entry_point('deenurp==0.0.4', 'console_scripts', 'deenurp')()
  File "/home/molmicro/working/tland9/2014-12-10_lumpyplots/src/deenurp/deenurp/scripts/deenurp.py", line 25, in main
    return action(arguments)
  File "/home/molmicro/working/tland9/2014-12-10_lumpyplots/src/deenurp/deenurp/subcommands/pairwise_distances.py", line 36, in action
    taxa, distmat = filter_outliers.distmat_cmalign(args.seqs, pfx, cpu=args.threads)
  File "/home/molmicro/working/tland9/2014-12-10_lumpyplots/src/deenurp/deenurp/subcommands/filter_outliers.py", line 122, in distmat_cmalign
    taxa, distmat = outliers.fasttree_dists(a_fasta.name)
  File "/home/molmicro/working/tland9/2014-12-10_lumpyplots/src/deenurp/deenurp/outliers.py", line 50, in fasttree_dists
    taxa, distmat = read_dists(stdout)
  File "/home/molmicro/working/tland9/2014-12-10_lumpyplots/src/deenurp/deenurp/outliers.py", line 21, in read_dists
    N = int(fobj.readline())
ValueError: invalid literal for int() with base 10: ''
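
The last frame reads the first line of the distance output and passes it straight to int(), so an empty stream produces exactly this message. Below is a minimal sketch of a guard that would report the empty output more directly. The function name read_matrix_size is hypothetical and not part of deenurp; I'm also assuming the first line of the output is meant to hold the matrix size, as the N = int(fobj.readline()) line suggests.

```python
import io


def read_matrix_size(fobj):
    """Return the integer on the first line of fobj.

    Hypothetical helper (not part of deenurp): raises a descriptive
    error when the stream is empty instead of failing inside int().
    """
    first = fobj.readline()
    if not first.strip():
        # readline() returns '' at end of file, i.e. nothing was written
        raise ValueError("distance output is empty; the upstream tool "
                         "probably failed before writing a matrix")
    return int(first)


# A stream with a size line parses normally.
size = read_matrix_size(io.StringIO("2\n"))

# An empty stream now fails with a message that names the likely cause.
try:
    read_matrix_size(io.StringIO(""))
    empty_message = None
except ValueError as err:
    empty_message = str(err)
```

This only changes how the failure is reported; it does not make FastTree accept the names.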

From what I can tell, it's FastTree that refuses to work with these sequence names. FWIW, the names are produced by usearch -cluster_fast. For now I'll just rename the sequences to integers, produce the distance matrix, and then map the integers back to the original names.
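
A minimal sketch of that rename-and-map-back workaround, assuming plain FASTA input with ">" header lines. The helper names rename_to_integers and restore_names are hypothetical and not part of deenurp; the renamed file would be what gets passed to deenurp pairwise-distances.

```python
import io


def rename_to_integers(fasta_in, fasta_out):
    """Copy a FASTA stream, replacing each header with an integer ID.

    Returns a dict mapping the integer ID (as a string) to the
    original header text. Hypothetical helper, not part of deenurp.
    """
    names = {}
    for line in fasta_in:
        if line.startswith(">"):
            key = str(len(names))          # 0, 1, 2, ... in file order
            names[key] = line[1:].strip()  # original name, without '>'
            fasta_out.write(">" + key + "\n")
        else:
            fasta_out.write(line)          # sequence lines pass through
    return names


def restore_names(taxa, names):
    """Translate integer IDs from the distance matrix back to names."""
    return [names[t] for t in taxa]


# Example with the two sequences from this report.
src = io.StringIO(
    ">M03029:20:000000000-D0BH9:1:1101:15010:1394:3\n"
    "ATTGAACGCTGGCGGCAGGT\n"
    ">M03029:20:000000000-D0BH9:1:1101:16823:1639:3\n"
    "ATTGAACGCTGGCGGCAGGC\n"
)
dst = io.StringIO()
names = rename_to_integers(src, dst)
renamed = dst.getvalue()
original_order = restore_names(["0", "1"], names)
```

The mapping is kept in memory here; writing it to a small CSV alongside the renamed FASTA would make the two steps independent.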

tyleraland added the bug label Jan 6, 2015
