## 1 mismatch by cluster #26

penglbio · 2018-10-08T23:04:46Z

Sorry to trouble you. In a paper, I saw that someone used your software (starcode) to cluster sequences within 1 nt mismatch. The paper title and the relevant description are:
Title: Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding
Description: We then used Starcode (45) to collapse UMIs of aligned reads that were within 1 nt mismatch of another UMI

I am confused because I couldn't find a parameter in your software to set this. Could you tell me whether there is a way to do it?

ezorita · 2018-10-08T23:51:06Z

The parameter -d specifies the clustering distance (the number of mismatched nucleotides you want to allow). So, in your example with distance 1, you'd run starcode as follows:

starcode -d1 input-file.fastq

Hope it helps.

penglbio · 2018-10-09T02:13:16Z

I will try. Thank you very much.

penglbio · 2018-10-09T02:17:46Z

What about FASTA input? I tested it as follows, but it doesn't work:
$ starcode -d 1 test_file.fasta
running starcode with 1 thread
reading input files
FASTA format detected
sorting
progress: 100.00%
message passing clustering
AGGGCTTACAAGTATAGGCC 2
CCTCATTATTTGTCGCAATG 1
TGCGCCAAGTACGATTTCCG 1
TGGGCTTACAAGTATAGGCC 1

The last sequence has just 1 mismatch with the first.
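The single mismatch can be confirmed by counting the positions at which the two sequences differ. The sketch below is only a check on the example output above; `hamming` is a hypothetical helper and not part of starcode. It counts substitutions only, which is sufficient here because both sequences have the same length (starcode's documentation describes -d as a Levenshtein distance, which also allows insertions and deletions).

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


# First and last sequences from the starcode output above.
first = "AGGGCTTACAAGTATAGGCC"
last = "TGGGCTTACAAGTATAGGCC"

print(hamming(first, last))  # 1
```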

ezorita · 2018-10-09T10:25:45Z

Note that you are using the message passing algorithm for clustering. Message passing has a parameter called --cluster-ratio, which is set to 5 by default. This parameter sets the minimum ratio of counts required to cluster one sequence with another. In other words, by default two sequences will only be clustered together if the count of one is at least 5 times bigger than the count of the other.

In your example, you are running starcode with just a few sequences and default parameters. Note that the last and the first sequence did not cluster together because their count ratio is only 2 (the first has 2 counts and the last has only 1), which is below the default of 5.
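The ratio rule described above can be sketched in a few lines. This is only an illustration of the rule as worded in this reply ("at least 5 times bigger"), not starcode's actual implementation; the function name `passes_cluster_ratio` is made up for this example.

```python
def passes_cluster_ratio(count_a: int, count_b: int, ratio: float = 5.0) -> bool:
    """Return True if the larger count is at least `ratio` times the smaller one.

    Assumption: this mirrors the rule as described in the thread, where two
    sequences are merged only when one count is at least `ratio` times the other.
    """
    larger, smaller = max(count_a, count_b), min(count_a, count_b)
    return larger >= ratio * smaller


# Counts from the example: the first sequence has 2 reads, the last has 1.
print(passes_cluster_ratio(2, 1))             # False: 2 < 5 * 1
print(passes_cluster_ratio(2, 1, ratio=2.0))  # True:  2 >= 2 * 1
print(passes_cluster_ratio(10, 2))            # True:  10 >= 5 * 2
```

With the example counts (2 and 1), the default ratio of 5 rejects the merge, which matches the output shown in the question.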

So, to solve this, do one of the following:

- Run starcode with the whole dataset (but make sure that each canonical sequence is expected to be over-represented compared to the others).
- Run starcode with a smaller --cluster-ratio.
- Use the spheres clustering algorithm (this is set with the parameter -s).

Hope it helps.

bettycatherine · 2020-07-27T03:01:11Z

I am really confused. Starcode was used in that paper for UMI collapse, so I think they used starcode-umi but not starcode. Am I correct? I am also wondering if there is any advice on how to set sequence distance when we use starcode-umi. Thank you very much!

wangjianing-web · 2020-08-03T04:50:01Z

> I am really confused. Starcode was used in that paper for UMI collapse, so I think they used starcode-umi but not starcode. Am I correct? I am also wondering if there is any advice on how to set sequence distance when we use starcode-umi. Thank you very much!

But the UMI (10 bp) is in the R2.fq file. The paper says the cDNA reads (Read 1) were mapped to the genome, and then Starcode (45) was used to collapse UMIs of aligned reads that were within 1 nt mismatch of another UMI, assuming the two aligned reads were also from the same UBC. I don't know whether I should combine the UMI and Read 1, but then it cannot be mapped to the genome. I don't know what the correct method is.

ezorita · 2020-08-21T08:58:37Z

Hi @wangjianing-web. I can't tell which is the correct method they used. You should contact the authors for more details on how they used starcode in their work. What I understand from your description is that they followed these steps:

1. Map reads to genome
2. Take mapped reads and append UMI to them
3. Use starcode to cluster reads with similar UMI (1 mismatch)
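One possible reading of these steps, sketched under stated assumptions: after mapping Read 1, each mapped read is paired with the UMI from Read 2 by read name, and the UMIs are grouped by where the read aligned, so that each group can be clustered separately (for instance with starcode -d 1). This is not the paper's confirmed method. The names `group_umis` and `raw_lines`, the read names, and the alignment keys (`geneA`, `geneB`) are all hypothetical; in practice the key would probably also include the cell barcode (UBC).

```python
from collections import Counter, defaultdict


def group_umis(alignments: dict, umis: dict) -> dict:
    """Group UMIs by alignment key.

    alignments: read name -> alignment key (hypothetical, e.g. a gene name)
    umis:       read name -> UMI sequence taken from Read 2
    Reads without a UMI entry are skipped.
    """
    groups = defaultdict(Counter)
    for read_name, key in alignments.items():
        umi = umis.get(read_name)
        if umi is not None:
            groups[key][umi] += 1
    return groups


def raw_lines(umi_counts: Counter) -> str:
    """Render one UMI per line, repeated by its count (plain sequence text)."""
    return "\n".join(
        umi for umi, n in sorted(umi_counts.items()) for _ in range(n)
    )


# Toy data, invented for illustration only.
alignments = {"r1": "geneA", "r2": "geneA", "r3": "geneB", "r4": "geneA"}
umis = {
    "r1": "AAAAAAAAAA",
    "r2": "AAAAAAAAAT",
    "r3": "AAAAAAAAAA",
    "r4": "AAAAAAAAAA",
}

groups = group_umis(alignments, umis)
for key in sorted(groups):
    print(key)
    print(raw_lines(groups[key]))
```

Each group's text could then be saved to its own file and clustered independently, so that identical UMIs attached to reads from different locations are never merged.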

gui11aume added the question label Oct 9, 2018
