Compatibility with inDrops data? #33

Open
kfontanez opened this issue Oct 10, 2019 · 2 comments

@kfontanez

Hello,

I am attempting to use Starcode-UMI with data produced by inDrops, whose reads have the structure CellBarcode[8-12 bp]-FixedSequence[22 bp]-CellBarcode[8 bp]-UMI[6 bp]-PolyT.

I am able to cluster the UMI portion with a UMI length of 14 bases, but no matter what value I give to `--seq-trim`, the program hangs at the sequence clustering step of the pipeline. I tried trimming every base after the first 14, so that sequence clustering would have zero bases to work with, and I also tried trimming nothing. In both cases the program hangs at sequence clustering (I left it running for over 14 hours with no progress). I'm running with 32 virtual cores and 64 GB of RAM, so I don't think it's a memory issue.

Here is what I ran:
```
./starcode-umi --umi-len 14 --umi-threads 8 --seq-threads 8 --umi-cluster s --seq-cluster s --umi-d 2 --seq-d 2 --seq-trim 15 ~/path/to/file/filename_R1.fastq
```

Here is an example input read, which is 51 bases long:

```
TGACANTACTTGAGTGATTGCTTGTGACGCCTTAGTCCCTTCTTTTATTTT
```
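To make this layout concrete, here is a minimal Python sketch that splits the example read into its parts. It assumes the 22-bp fixed sequence is `GAGTGATTGCTTGTGACGCCTT`, which is what appears at positions 12-33 of the example read, and it only searches for an exact match; the helper name `split_indrops_read` is hypothetical. On this read, the first cell barcode turns out to be 11 bp long.

```python
# Hypothetical helper, not part of Starcode-UMI.
# Assumption: the 22-bp fixed sequence (linker) is the one visible in the
# example read; only exact matches are found (no mismatch tolerance).
W1 = "GAGTGATTGCTTGTGACGCCTT"


def split_indrops_read(seq, linker=W1, cb2_len=8, umi_len=6):
    """Return (cell_barcode_1, cell_barcode_2, umi, tail), or None if the
    linker does not occur in the read."""
    start = seq.find(linker)
    if start < 0:
        return None
    after = start + len(linker)          # first base after the linker
    cb1 = seq[:start]                    # variable-length (8-12 bp) barcode
    cb2 = seq[after:after + cb2_len]     # 8-bp second barcode
    umi = seq[after + cb2_len:after + cb2_len + umi_len]  # 6-bp UMI
    tail = seq[after + cb2_len + umi_len:]                # poly-T and beyond
    return cb1, cb2, umi, tail


read = "TGACANTACTTGAGTGATTGCTTGTGACGCCTTAGTCCCTTCTTTTATTTT"
print(split_indrops_read(read))
# ('TGACANTACTT', 'AGTCCCTT', 'CTTTTA', 'TTTT')
```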

I get several thousand UMI clusters, but the program hangs at sequence clustering, with a cluster of size 1 that never increases.

Has starcode ever been tested with inDrops data that has this cell barcode structure?

Thank you.

@gui11aume
Owner

Thank you for raising this issue. We will get back to you as soon as we can reproduce it.

@ezorita
Collaborator

ezorita commented Oct 21, 2019

Hi @kfontanez,

Thank you for reporting this, and sorry for the late reply. As I understand it, your UMI part is the combination of CellBarcode[8 bp]+UMI[6 bp], 14 bp in total. Some questions:

  • Have you moved this part to the beginning of the read? Starcode assumes that the read starts with the UMI.
  • Can you be more specific about the behavior? By "hangs", do you mean that it stops making progress, that it crashes with an error message, or that it crashes without one?
  • How many sequences are you clustering?
  • Can you provide a minimal file that reproduces the behavior?
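Since Starcode-UMI expects each read to start with the UMI, one way to prepare inDrops reads is to move the 14-bp CellBarcode[8 bp]+UMI[6 bp] block to the front of the read and rearrange the quality string in the same way. The Python sketch below does this. The function names are hypothetical, the linker sequence is taken from the example read in the report, reads without an exact linker match are dropped, and the input is assumed to be plain 4-line FASTQ. It has not been tested against Starcode-UMI itself.

```python
# Hypothetical preprocessing sketch, not part of Starcode-UMI.
# Assumptions: linker copied from the example read; exact match only;
# input is well-formed 4-line FASTQ without blank lines.
W1 = "GAGTGATTGCTTGTGACGCCTT"
BLOCK_LEN = 14  # CellBarcode (8 bp) + UMI (6 bp), matching --umi-len 14


def move_block_to_front(seq, qual, linker=W1, block_len=BLOCK_LEN):
    """Move the block that follows the linker to the start of the read.

    Returns (new_seq, new_qual), or None if the linker is missing or the
    read is too short to contain the whole block."""
    start = seq.find(linker)
    if start < 0:
        return None
    after = start + len(linker)
    end = after + block_len
    if end > len(seq):
        return None
    # Same rearrangement for bases and qualities keeps them aligned.
    new_seq = seq[after:end] + seq[:after] + seq[end:]
    new_qual = qual[after:end] + qual[:after] + qual[end:]
    return new_seq, new_qual


def rewrite_fastq(lines):
    """Yield rewritten FASTQ records, skipping reads without the linker."""
    it = iter(lines)
    # zip(it, it, it, it) groups the line stream into 4-line records.
    for header, seq, _plus, qual in zip(it, it, it, it):
        result = move_block_to_front(seq.strip(), qual.strip())
        if result is None:
            continue
        new_seq, new_qual = result
        yield f"{header.rstrip()}\n{new_seq}\n+\n{new_qual}\n"


def rewrite_fastq_file(in_path, out_path):
    """Rewrite an uncompressed FASTQ file record by record."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(rewrite_fastq(src))
```

The rewritten file could then be passed to `starcode-umi` with `--umi-len 14`, as in the original command. Note that this leaves the first cell barcode, the linker, and the poly-T tail in the sequence part of the read.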

Thank you!
