Offending sequences #37

kjkjindal · 2020-10-21T02:57:56Z

Hi, I am trying to run starcode sphere clustering on a set of sequences. These sequences contain certain (non-DNA) prefixes that I need to retain. I notice that starcode aborts when it encounters non-DNA characters in a sequence. Is this constraint essential to its (or specifically the sphere clustering algorithm's) function?

Thanks!

gui11aume · 2020-10-21T15:26:04Z

Hi! The issue is not sphere clustering per se but sequence clustering itself. If two identical sequences have different non-DNA tags, how would you suggest grouping them in the same cluster?

I am not sure what your biological problem is, but I would recommend approaching it this way:

1. Extract the pure DNA suffixes into a separate file, keeping the same line order as the original file.
2. Run starcode on the DNA suffixes with the flag --seq-id.
3. Use the row numbers in the output to retrieve the clusters from the original file.
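
The steps above could be sketched roughly as follows in Python. This is only an illustration under several assumptions that are not confirmed in this thread: each line is a non-DNA prefix followed by a run of A/C/G/T/N characters, the prefix does not itself end in one of those letters, and the last tab-separated column of the starcode output holds comma-separated, 1-based line numbers of the input. The helper names `split_tag` and `clusters_from_output` are hypothetical; check the actual output of your starcode version before relying on this.

```python
import re

# Assumption: the DNA part is the trailing run of A/C/G/T/N characters.
DNA_SUFFIX = re.compile(r"[ACGTNacgtn]+$")


def split_tag(line):
    """Split one line into (non-DNA prefix, DNA suffix)."""
    line = line.rstrip("\n")
    match = DNA_SUFFIX.search(line)
    if match is None:
        # An empty suffix would shift the line numbering, so fail loudly.
        raise ValueError(f"no DNA suffix found in line: {line!r}")
    return line[:match.start()], match.group()


def clusters_from_output(starcode_rows, original_lines):
    """Map starcode rows back to the original (tagged) lines.

    Assumption: the last column of each row lists 1-based input line
    numbers separated by commas, and the first column is the centroid.
    """
    clusters = []
    for row in starcode_rows:
        fields = row.rstrip("\n").split("\t")
        ids = [int(i) for i in fields[-1].split(",")]
        members = [original_lines[i - 1] for i in ids]
        clusters.append((fields[0], members))
    return clusters


# Step 1: write one suffix per line, in the same order as the original file.
original = ["tagA_ACGTACGT", "tagB_ACGTACGA", "tagC_TTTTGGGG"]
suffixes = [split_tag(line)[1] for line in original]

# Step 2 happens outside Python: run starcode on the suffix file with
# --seq-id (see `starcode --help` for the input and output options).

# Step 3: rows below are made up for illustration, not real starcode output.
example_rows = ["ACGTACGT\t2\t1,2", "TTTTGGGG\t1\t3"]
clusters = clusters_from_output(example_rows, original)
```

Each entry of `clusters` pairs a centroid with the original tagged lines, so the prefixes are retained even though starcode only ever sees pure DNA.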
