SureCell / ddSeq support #42

AskPascal · 2018-08-09T12:02:55Z

Hi,

I just got some data generated with SureCell libraries on a ddSeq machine (i.e., the Illumina/Bio-Rad protocol). I would like to test your pipeline for the analysis, but I'm not sure whether it can be used and, if so, how to fill in the config.yaml.
The barcodes are in Read 1; however, they are not at a fixed position, and the cell barcode is split into three parts by spacer sequences.

Below is a small example from the Read 1 FASTQ file of one of my samples.

Is it possible to process this data with dropSeqPipe?

Cheers

@D00457:259:HKWJNBCX2:1:1105:1128:2079 1:N:0:CCTAAGAC
CTCGGCGTTAGCCATCGCATTGCGGATTGTACCTCTGAGCTGAATCGCCTACGTCCCCGGAGACCNNT
+
<DDD0<CFHHHIIIIIIIIIIIIIIIHIHIGHHHIHHHGHFHHHIHHHIIIIHIIIIIEHHHIII##<
@D00457:259:HKWJNBCX2:1:1105:1168:2089 1:N:0:CCTAAGAC
AATGGAGTAGCCATCGCATTGCACCTTCTACCTCTGAGCTGAAGAAATAACGCCTACGAAGACTTNNT
+
<<<D01<<D1ECH?F0=CEE?<1DG@<1CGEH@HHHHIIHGEGCGEHFHIHGHHHHHIEHHHHEF##<
@D00457:259:HKWJNBCX2:1:1105:1122:2104 1:N:0:CCTAAGAC
ACCCAATAGCCATCGCATTGCCCGTAATACCTCTGAGCTGAATAAGCTACGAAACTGTGGACTTTNNT
+
0<DDDIHHIIEEHHGHIIEHIFDGHHHIIIHIIIH?GHHIIH1<FH1FGHIGHIIHIFHIHE@FH##<
@D00457:259:HKWJNBCX2:1:1105:1102:2126 1:N:0:CCTAAGAC
TTCGTAGAGGTAGCCATCGCATTGCTGAGACTACCTCTGAGCTGAACTCAATACGCTTCGAGCGANNT
+
0<<DBDHHHFCFHEGHIHIHIIIIHHIHGEHIHHIHIHIHI?1<1GHHIHIIIIIGIIGHHGHIH##<
@D00457:259:HKWJNBCX2:1:1105:1158:2127 1:N:0:CCTAAGAC
ACATAGATAGCCATCGCATTGCTAATAGTACCTCTGAGCTGAAGCGAATACGTCCCCCCTGACTTNNT
+
@@B@0<CEGHIIHHI=GEEHCGHEHHEEHHIHFHCHEHCHIHIIHIHIIHHHHI0EHHIII?@1<##<
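For reference, a minimal bash sketch that splits one Read 1 sequence into its parts. The layout is an assumption inferred from the example reads above and consistent with the sed pattern posted later in this thread: a variable-length phase block, then BC1 (6 bp), the linker TAGCCATCGCATTGC, BC2 (6 bp), the linker TACCTCTGAGCTGAA, BC3 (6 bp), ACG, an 8 bp UMI and GAC. The function name `split_surecell_r1` is hypothetical.

```bash
# Hypothetical helper: split_surecell_r1 SEQUENCE
# Assumed layout (inferred from the example reads, not from official docs):
#   [phase block] BC1(6) TAGCCATCGCATTGC BC2(6) TACCTCTGAGCTGAA BC3(6) ACG UMI(8) GAC ...
split_surecell_r1() {
  local re='([ACGTN]{6})TAGCCATCGCATTGC([ACGTN]{6})TACCTCTGAGCTGAA([ACGTN]{6})ACG([ACGTN]{8})GAC'
  if [[ $1 =~ $re ]]; then
    # Everything before the match is the phase block; its length is the shift
    local prefix=${1%%"${BASH_REMATCH[0]}"*}
    printf 'phase=%d bc1=%s bc2=%s bc3=%s umi=%s\n' \
      "${#prefix}" "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" \
      "${BASH_REMATCH[3]}" "${BASH_REMATCH[4]}"
  else
    echo "no match" >&2
    return 1
  fi
}
```

Applied to the first example read, it reports two phase bases before BC1 (`phase=2 bc1=CGGCGT bc2=GGATTG bc3=TCGCCT umi=TCCCCGGA`), while the third read has none (`phase=0`). That varying offset is what breaks fixed-position barcode extraction. The fourth example read does not match at all, because the last base of its GAC motif is called N.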


Hoohm · 2018-08-09T14:04:42Z

Hello @pascal-git

As of now, it is definitely not compatible. The split barcode pattern is not the main issue: you could give those positions as arguments and it should work, although I haven't tried it.

The main issue is the variable shift of the bases in the first read.

Right now, the barcodes are picked from fixed base positions in R1, so they can't be shifted. One way to overcome this would be to first "deshift" R1 and then run dropSeqPipe.
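A minimal sketch of such a "deshift" step, assuming the layout seen in the example reads (a 6 bp BC1 directly followed by the linker TAGCCATCGCATTGC). It trims everything before BC1 from both the sequence and the quality line, so BC1 starts at base 1 and the other parts sit at fixed positions (BC2 at 22-27, BC3 at 43-48, UMI at 52-59). Reads without the linker are passed through unchanged so that R1 stays in sync with R2. The function name `deshift_r1` is hypothetical and not part of dropSeqPipe.

```bash
# Hypothetical helper (not part of dropSeqPipe): reads R1 FASTQ on stdin and
# trims the phase block so that BC1 starts at base 1 of every matching read.
deshift_r1() {
  awk -v linker="TAGCCATCGCATTGC" -v bclen=6 '
    NR % 4 == 1 { header = $0; next }
    NR % 4 == 2 { seq = $0; next }
    NR % 4 == 3 { plus = $0; next }
    {
      qual = $0
      pos = index(seq, linker)      # 1-based start of linker 1, 0 if absent
      if (pos > bclen) {            # a full 6 bp BC1 must precede the linker
        start = pos - bclen         # 1-based start of BC1
        seq = substr(seq, start)    # keep from BC1 to the end of the read
        qual = substr(qual, start)  # trim the same number of quality characters
      }
      print header; print seq; print plus; print qual
    }'
}
```

Usage would be along the lines of `deshift_r1 < Sample_S1_L001_R1_001.fastq > Sample_S1_L001_R1_001.deshifted.fastq`. Trimming the sequence and quality lines by the same offset keeps every quality score attached to its base.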

I don't have the time to try it out now, but it might be a good idea to change the way I find the barcode and UMI and use an approach similar to umi-tools, which I recommend you try out.

This construct seems overly complicated, though. Do you know what its advantages are over 10x, for example?

AskPascal · 2018-08-10T07:33:11Z

Hi @Hoohm

No, I am also still wondering what the advantage of this approach over the 10x way would be. Maybe not having all barcodes at the same position hedges against systematic sequencing biases? Or it's just an intellectual property thing...

Thanks for clarifying what the problem with getting the data into dropSeqPipe would be and how it could be solved. umi-tools certainly looks interesting. However, yesterday I found another tool, umis, which even has example code for SureCell / ddSeq, and it looks promising in my preliminary tests. I might use it in combination with dropSeqPipe or just standalone...

Hoohm · 2018-08-10T15:40:17Z

Hey @pascal-git
I've come up with a small script that should be able to handle funky barcode structures.

You can check it out here

I'm working on a new version of dropSeqPipe (see the develop branch) that is going to use cutadapt instead of trimmomatic. The main reason is to check for adapter presence in both R1 and R2, instead of trimming only R2 as it does now.

To do this, I'm also changing a lot in the filtering: I'm trimming R1 and R2 separately and repairing the pairs after trimming. This cuts down running time and gives more insight into potential problems with the protocol.

Since I no longer depend on Drop-seq tools for this first part, I'm capturing barcodes differently, which should make it easier to support fancy barcode structures.

So keep checking; your protocol might be compatible in a month or so.

AskPascal · 2018-08-13T08:05:18Z

Hi @Hoohm

This is really interesting. I'll keep my eyes open for the new version then!

Hoohm · 2018-11-28T13:00:33Z

As you can see, this is not implemented at all yet.

Sadly, I haven't found the time to work on it, since this is not a technology we use.

I hope someone can help out by integrating a universal cell barcode structure module.

TomKellyGenetics · 2020-03-19T02:26:29Z

I've written a sed solution to extract the barcodes and UMI from R1. It returns a Read 1 containing an 18 bp cell barcode (three 6 bp blocks) followed by an 8 bp UMI, and it overwrites the R1 files in place.

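    Read1s=("Sample_S1_L001_R1_001.fastq" "Sample_S1_L002_R1_001.fastq")
    Read2s=("Sample_S1_L001_R2_001.fastq" "Sample_S1_L002_R2_001.fastq")  # R2 files are not modified

    # Remove the SureCell linkers from R1 (and correct the phase blocks).
    # Each R1 file is rewritten in place.
    for File in "${Read1s[@]}"; do
        # On a sequence line matching the layout, keep BC1, BC2, BC3 (6 bp each) and the 8 bp UMI;
        # the two `n` commands print it and the "+" line, then load the quality line,
        # which is cut down to the same 26 characters.
        # Caveat: the quality characters come from the last 62 positions of the line,
        # so they line up with the extracted bases only when GAC ends the read.
        sed -E '
            /.*(.{6})TAGCCATCGCATTGC(.{6})TACCTCTGAGCTGAA(.{6})ACG(.{8})GAC/ {
                s/.*(.{6})TAGCCATCGCATTGC(.{6})TACCTCTGAGCTGAA(.{6})ACG(.{8})GAC.*/\1\2\3\4/g
                n
                n
                s/.*(.{6}).{15}(.{6}).{15}(.{6}).{3}(.{8}).{3}/\1\2\3\4/g
            }' "$File" > .temp
        mv .temp "$File"
    done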
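In the sed script, the quality line is cut from its last 62 characters rather than from the positions matched on the sequence line, so in the example reads (which have a few bases after GAC) the quality scores end up shifted relative to the extracted bases. A minimal awk-based sketch that takes bases and quality characters from the same offsets, assuming the same layout as the sed pattern; `extract_surecell_r1` is a hypothetical name. Like the sed version, it leaves non-matching reads untouched so R1 and R2 stay paired.

```bash
# Hypothetical helper (not part of dropSeqPipe): reads R1 FASTQ on stdin and
# writes R1 FASTQ with BC1+BC2+BC3+UMI (26 bp) on stdout.
extract_surecell_r1() {
  awk '
    NR % 4 == 1 { header = $0; next }
    NR % 4 == 2 { seq = $0; next }
    NR % 4 == 3 { plus = $0; next }
    {
      qual = $0
      s = index(seq, "TAGCCATCGCATTGC") - 6   # 1-based start of BC1
      # Layout check relative to BC1: linker 2 at +27, ACG at +48, GAC at +59
      if (s >= 1 &&
          substr(seq, s + 27, 15) == "TACCTCTGAGCTGAA" &&
          substr(seq, s + 48, 3) == "ACG" &&
          substr(seq, s + 59, 3) == "GAC") {
        # BC1 at +0, BC2 at +21, BC3 at +42, UMI (8 bp) at +51
        seq  = substr(seq, s, 6)  substr(seq, s + 21, 6)  substr(seq, s + 42, 6)  substr(seq, s + 51, 8)
        qual = substr(qual, s, 6) substr(qual, s + 21, 6) substr(qual, s + 42, 6) substr(qual, s + 51, 8)
      }
      # Non-matching records pass through unchanged so R1 stays paired with R2
      print header; print seq; print plus; print qual
    }'
}
```

Inside the loop it would take the place of the sed call, e.g. `extract_surecell_r1 < "$File" > .temp`. The checks use fixed string comparisons (`index` and `substr`) rather than regex intervals, so they should behave the same in gawk, mawk and BusyBox awk.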

Hoohm · 2020-03-21T15:53:54Z

Thank you @TomKellyGenetics !

I'm gonna add this to the documentation :)

Hoohm added the enhancement label Oct 9, 2018

Hoohm self-assigned this Oct 9, 2018

Hoohm added the help wanted label Nov 28, 2018
