pairwise_distances fails for some sequence names #17

Open
tyleraland opened this issue Jan 6, 2015 · 0 comments

I run the command:
deenurp pairwise-distances -a cmalign test.fasta out.csv

If my test.fasta looks like:

>M03029:20:000000000-D0BH9:1:1101:15010:1394:3
ATTGAACGCTGGCGGCAGGT
>M03029:20:000000000-D0BH9:1:1101:16823:1639:3
ATTGAACGCTGGCGGCAGGC

I get the error:
Traceback (most recent call last):
  File "/home/molmicro/working/tland9/2014-12-10_lumpyplots/local-env/bin/deenurp", line 9, in <module>
    load_entry_point('deenurp==0.0.4', 'console_scripts', 'deenurp')()
  File "/home/molmicro/working/tland9/2014-12-10_lumpyplots/src/deenurp/deenurp/scripts/deenurp.py", line 25, in main
    return action(arguments)
  File "/home/molmicro/working/tland9/2014-12-10_lumpyplots/src/deenurp/deenurp/subcommands/pairwise_distances.py", line 36, in action
    taxa, distmat = filter_outliers.distmat_cmalign(args.seqs, pfx, cpu=args.threads)
  File "/home/molmicro/working/tland9/2014-12-10_lumpyplots/src/deenurp/deenurp/subcommands/filter_outliers.py", line 122, in distmat_cmalign
    taxa, distmat = outliers.fasttree_dists(a_fasta.name)
  File "/home/molmicro/working/tland9/2014-12-10_lumpyplots/src/deenurp/deenurp/outliers.py", line 50, in fasttree_dists
    taxa, distmat = read_dists(stdout)
  File "/home/molmicro/working/tland9/2014-12-10_lumpyplots/src/deenurp/deenurp/outliers.py", line 21, in read_dists
    N = int(fobj.readline())
ValueError: invalid literal for int() with base 10: ''
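
The last frame shows that the first line of the distance output is parsed as an integer, and that the line was empty. As a rough sketch only (read_matrix_size is a hypothetical helper, not part of deenurp), a guard like the following could turn the bare ValueError into a message that points at the upstream tool:

```python
import io


def read_matrix_size(fobj):
    """Read the leading integer count from a distance-matrix stream.

    Hypothetical helper: it assumes, as the traceback implies, that the
    first line of the stream holds the number of taxa.
    """
    first = fobj.readline().strip()
    if not first:
        # Empty output usually means the producing tool wrote nothing.
        raise ValueError(
            "distance matrix output is empty; "
            "the upstream tool may have rejected the input"
        )
    return int(first)


# Usage: a well-formed stream starts with the taxon count.
size = read_matrix_size(io.StringIO("2\nseqA 0.0 0.1\nseqB 0.1 0.0\n"))
```

This does not fix the underlying failure; it only makes an empty result easier to diagnose.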

From what I can tell, it's FastTree that refuses to work with these sequence names: the empty string in the ValueError suggests that read_dists received no output from it. FWIW, these names are produced by usearch -cluster_fast. For now I'll just rename the sequences to integers to produce a distance matrix, then map them back to their original names.
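
The renaming workaround could look roughly like this. It is a minimal sketch using only the standard library; rename_to_integers and restore_names are hypothetical names, not deenurp functions, and the sketch assumes the FASTA headers start with ">" and that the distance-matrix step returns the integer names as strings.

```python
def rename_to_integers(fasta_lines):
    """Replace each FASTA header with a running integer.

    Returns the renamed lines and a dict mapping the new integer name
    (as a string) to the full original header text.
    """
    renamed, mapping = [], {}
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            key = str(len(mapping))
            mapping[key] = line[1:]  # original header without '>'
            renamed.append(">" + key)
        elif line:
            renamed.append(line)  # sequence lines pass through unchanged
    return renamed, mapping


def restore_names(taxa, mapping):
    """Map integer names from the distance matrix back to the originals."""
    return [mapping[name] for name in taxa]


# Usage with the two sequences from the report.
lines = [
    ">M03029:20:000000000-D0BH9:1:1101:15010:1394:3",
    "ATTGAACGCTGGCGGCAGGT",
    ">M03029:20:000000000-D0BH9:1:1101:16823:1639:3",
    "ATTGAACGCTGGCGGCAGGC",
]
renamed, mapping = rename_to_integers(lines)
original_order = restore_names(["0", "1"], mapping)
```

The renamed lines would be written to a temporary FASTA file and passed to the pairwise-distances command; the taxa in the resulting matrix are then translated back with restore_names.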

tyleraland added the bug label Jan 6, 2015