Documentation "quick-start" #103

Open
schmucr1 opened this issue Feb 2, 2023 · 11 comments

@schmucr1

schmucr1 commented Feb 2, 2023

Hello

I have some questions regarding the documentation in https://github.com/COMBINE-lab/alevin-fry#a-quick-start-run-through-on-sample-data

I read and worked through the quick-start and pulled the latest Singularity image with singularity pull docker://combinelab/usefulaf:latest

There are some discrepancies between that version and the documentation:

  1. "Building the splici reference and index": there is no parameter -l 91, but there is a parameter -r.
  2. "Quantifying the sample": error: Invalid value 'u' for '--forced-cells <FORCED_CELLS>': invalid digit found in string

So I am not sure whether the documentation matches the latest (docker/singularity) version.

Thank you and best regards,
R.

PS: When I run it with, for example, the --knee option, I get an unknown error:

singularity exec --cleanenv --bind $AF_SAMPLE_DIR:/workdir --pwd /usefulaf/bash usefulaf_latest.sif simpleaf quant -1 /workdir/data/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R1_001.fastq.gz,/workdir/data/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R1_001.fastq.gz -2 /workdir/data/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R2_001.fastq.gz,/workdir/data/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R2_001.fastq.gz -i /workdir/human_CR_3.0_splici/index -o /workdir/quants/pbmc1k_v3 --knee -c v3 -r cr-like -m /workdir/human_CR_3.0_splici/ref/transcriptome_splici_fl86_t2g_3col.tsv -t 8
2023-02-02T12:28:23.288555Z  INFO simpleaf: deserializing from File { fd: 3, path: "/afhome/simpleaf_info.json", read: true, write: false }
2023-02-02T12:28:23.288601Z  INFO simpleaf: prog info = ReqProgs { salmon: Some(ProgInfo { exe_path: "/opt/conda/bin/salmon", version: "1.9.0" }), alevin_fry: Some(ProgInfo { exe_path: "/opt/conda/bin/alevin-fry", version: "0.8.1" }), pyroe: Some(ProgInfo { exe_path: "/opt/conda/bin/pyroe", version: "0.7.1" }) }
Error: custom geometry string doesn't contain ';' character
@rob-p
Contributor

rob-p commented Feb 2, 2023

Thanks! Can you please share the full command you are using for quantification? We'll fix bugs in the documentation that may have arisen from the continued development of the software.

@schmucr1
Author

schmucr1 commented Feb 2, 2023

Hi!

This is the command I used, the same as in my previous post but with line breaks:

singularity exec --cleanenv    \
--bind $AF_SAMPLE_DIR:/workdir     \
--pwd /usefulaf/bash usefulaf_latest.sif     \
simpleaf quant     \
-1 /workdir/data/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R1_001.fastq.gz,/workdir/data/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R1_001.fastq.gz     \
-2 /workdir/data/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R2_001.fastq.gz,/workdir/data/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R2_001.fastq.gz     \
-i /workdir/human_CR_3.0_splici/index     \
-o /workdir/quants/pbmc1k_v3     \
--knee -c v3 -r cr-like     \
-m /workdir/human_CR_3.0_splici/ref/transcriptome_splici_fl86_t2g_3col.tsv     \
-t 8

In the doc there is this line, -f u -c v3 -r cr-like, but I replaced -f u with --knee because -f u raises an error.

I hope this helps.
Thank you!

@rob-p
Contributor

rob-p commented Feb 2, 2023

Ohh! I see. The problem is that we have massively upgraded simpleaf but not this particular documentation page. Great catch! The simpleaf docs can be found here. We will update this QuickStart to be correct.

In addition to the -l to -r difference you found during indexing, you should change your quant command in the following way.

  1. change -c v3 to -c 10xv3
  2. replace --knee with -u

This should then work equivalently to what is currently in the tutorial. Please let us know if this works, and great catch. Thanks for reporting it!

--Rob

@schmucr1
Author

schmucr1 commented Feb 2, 2023

Thanks, Rob!

The above parameters for the docker/singularity images worked. No more errors regarding input parameters.
The workflow finished with the message Error: quant failed with exit status ExitStatus(unix_wait_status(512))

singularity exec --cleanenv     --bind $AF_SAMPLE_DIR:/workdir     --pwd /usefulaf/bash usefulaf_latest.sif     simpleaf quant     -1 /workdir/data/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R1_001.fastq.gz,/workdir/data/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R1_001.fastq.gz     -2 /workdir/data/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R2_001.fastq.gz,/workdir/data/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R2_001.fastq.gz     -i /workdir/human_CR_3.0_splici/index     -o /workdir/quants/pbmc1k_v3     -u -c 10xv3 -r cr-like     -m /workdir/human_CR_3.0_splici/ref/transcriptome_splici_fl86_t2g_3col.tsv     -t 8
2023-02-02T13:04:55.123087Z  INFO simpleaf: deserializing from File { fd: 3, path: "/afhome/simpleaf_info.json", read: true, write: false }
2023-02-02T13:04:55.123152Z  INFO simpleaf: prog info = ReqProgs { salmon: Some(ProgInfo { exe_path: "/opt/conda/bin/salmon", version: "1.9.0" }), alevin_fry: Some(ProgInfo { exe_path: "/opt/conda/bin/alevin-fry", version: "0.8.1" }), pyroe: Some(ProgInfo { exe_path: "/opt/conda/bin/pyroe", version: "0.7.1" }) }
2023-02-02T13:04:55.123209Z  INFO simpleaf: cmd : "/opt/conda/bin/salmon" "alevin" "--index" "/workdir/human_CR_3.0_splici/index" "-l" "A" "-1" "/workdir/data/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R1_001.fastq.gz" "/workdir/data/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R1_001.fastq.gz" "-2" "/workdir/data/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R2_001.fastq.gz" "/workdir/data/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R2_001.fastq.gz" "--threads" "8" "-o" "/workdir/quants/pbmc1k_v3/af_map" "--sketch" "--chromiumV3"
2023-02-02T13:07:22.899766Z  INFO simpleaf: cmd : "/opt/conda/bin/alevin-fry" "generate-permit-list" "-i" "/workdir/quants/pbmc1k_v3/af_map" "-d" "fw" "--unfiltered-pl" "/afhome/plist/10x_v3_permit.txt" "--min-reads" "10" "-o" "/workdir/quants/pbmc1k_v3/af_quant"
2023-02-02T13:07:32.961164Z  INFO simpleaf: cmd : "/opt/conda/bin/alevin-fry" "collate" "-i" "/workdir/quants/pbmc1k_v3/af_quant" "-r" "/workdir/quants/pbmc1k_v3/af_map" "-t" "8"
2023-02-02T13:07:38.327151Z  INFO simpleaf: cmd : "/opt/conda/bin/alevin-fry" "quant" "-i" "/workdir/quants/pbmc1k_v3/af_quant" "-o" "/workdir/quants/pbmc1k_v3/af_quant" "-t" "8" "-m" "/workdir/human_CR_3.0_splici/ref/transcriptome_splici_fl86_t2g_3col.tsv" "-r" "cr-like"
Error: quant failed with exit status ExitStatus(unix_wait_status(512))
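
The number in that message is a raw Unix wait status, not the exit code itself. A minimal sketch of decoding it follows; the function name decode_wait_status is hypothetical, and the sketch ignores the "stopped" and "core dumped" encodings. What exit code 2 means for alevin-fry is not stated anywhere in this thread.

```shell
# Hypothetical helper: decode a raw Unix wait status.
# Low 7 bits hold the terminating signal (0 = normal exit);
# the next 8 bits hold the exit code.
decode_wait_status() {
  status=$1
  sig=$(( status & 127 ))
  if [ "$sig" -eq 0 ]; then
    echo "exit code $(( (status >> 8) & 255 ))"
  else
    echo "killed by signal $sig"
  fi
}

decode_wait_status 512   # prints "exit code 2"
```

So the final alevin-fry quant step exited normally with a non-zero code rather than being killed by a signal.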

Maybe I should now switch to the documentation here, https://alevin-fry.readthedocs.io/en/latest/index.html, and tutorials referenced there.

Also, the docker/singularity images are no longer mentioned in the "readthedocs" documentation. Would you then recommend using only the versions from bioconda and not the ones from docker/singularity?

Many thanks!

@rob-p
Contributor

rob-p commented Feb 2, 2023

@DongzeHE any thoughts about this error above? It gets to the very last step. Is there any difference in where e.g. pyroe puts the t2g vs roe?

@rob-p
Contributor

rob-p commented Feb 2, 2023

@schmucr1,

Again, apologies for all of the difficulty here. The problem this time is that the tool that builds the splici transcriptome was also updated and now uses a different default name for the t2g file. The proper command should be:

singularity exec --cleanenv --bind $AF_SAMPLE_DIR:/workdir \
--pwd /usefulaf/bash usefulaf_latest.sif \
simpleaf quant \
-1 /workdir/data/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R1_001.fastq.gz,/workdir/data/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R1_001.fastq.gz \
-2 /workdir/data/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R2_001.fastq.gz,/workdir/data/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R2_001.fastq.gz \
-i /workdir/human_CR_3.0_splici/index \
-o /workdir/quants/pbmc1k_v3 \
-u -c 10xv3 -r cr-like \
-m /workdir/human_CR_3.0_splici/ref/splici_fl86_t2g_3col.tsv \
-t 8

Note that the only difference from your command is that we pass

-m /workdir/human_CR_3.0_splici/ref/splici_fl86_t2g_3col.tsv

instead of

/workdir/human_CR_3.0_splici/ref/transcriptome_splici_fl86_t2g_3col.tsv 

which represents the new name of the t2g file. Sorry for the confusion. Hopefully this works, and we'll update the documentation accordingly!

--Rob

@schmucr1
Author

schmucr1 commented Feb 2, 2023

Hi @rob-p

Yes, now everything worked perfectly, without warnings or errors.
I will continue using the Singularity/Docker image and will try out the tutorials provided through the readthedocs manual.

Many thanks for your rapid support!

Best regards,
Roland

@DongzeHE
Contributor

DongzeHE commented Feb 2, 2023

Hi @schmucr1,

Sorry for the inconvenience. simpleaf copies the t2g file into the index folder when building the index, so if you don't want to check the name of the t2g_3col.tsv file in ref every time, you can instead pass the copy in the index folder (which sits in the same directory as ref) to -m. This file is always named t2g_3col.tsv.

-m /workdir/human_CR_3.0_splici/index/t2g_3col.tsv
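
For scripted runs, the same idea can be sketched as a small lookup that prefers the fixed name in the index folder and only falls back to whatever *t2g_3col.tsv file exists in ref. The function name find_t2g is hypothetical and not part of simpleaf; the directory layout is assumed to match the one used in this thread.

```shell
# Hypothetical helper (not part of simpleaf): choose a t2g file to pass to -m.
# Assumes <base>/index and <base>/ref exist side by side, as in this thread.
find_t2g() {
  base=$1
  # Preferred: the copy in the index folder, which has a fixed name.
  if [ -f "$base/index/t2g_3col.tsv" ]; then
    printf '%s\n' "$base/index/t2g_3col.tsv"
    return 0
  fi
  # Fallback: whatever name the reference builder used in ref/.
  for f in "$base"/ref/*t2g_3col.tsv; do
    if [ -f "$f" ]; then
      printf '%s\n' "$f"
      return 0
    fi
  done
  echo "no t2g_3col.tsv found under $base" >&2
  return 1
}

# Usage (inside the container): -m "$(find_t2g /workdir/human_CR_3.0_splici)"
```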

Best,
Dongze

@schmucr1
Author

schmucr1 commented Feb 2, 2023

Hi @DongzeHE and @rob-p

Thank you for the explanations. I will keep these naming conventions in mind for the upcoming runs.

On another note, unrelated to this issue: do you think that alevin-fry is able to process data generated with the DNBelab C Series Single-Cell Library Prep Set, i.e. not 10x data? I am mostly interested in a macaque data set published here: https://www.nature.com/articles/s41586-022-04587-3#Sec8

Many thanks and kind regards,
R.

@DongzeHE
Contributor

DongzeHE commented Feb 2, 2023

Hi @schmucr1,

Short answer:

Yes. You need to pass the locations of the UMI and the cellular barcode in the technical read (read 1) to simpleaf as a custom geometry. See here for details.

Long answer:

Upon checking, I think alevin-fry is able to handle this dataset. However, it will be a little bit tricky because the geometry specification (the positions of UMI and cellular barcode in read1) of that protocol is unclear to me.

The paper you shared mentions that the authors used the same prep set as another study, and that study in turn refers to the original preprint describing the prep set.
The interesting thing is that these three studies did not apply the same sequencing strategy.

  • In the original preprint introducing that prep set, the author said

    For C4 scRNA-seq data, the cell barcodes (base 1 to base 10 and base 17 to base 26) and UMIs (base 32 to 41) are in read 1 and the cDNA reads are in read 2.

  • In the paper you shared, the authors used a different sequencing strategy and said they used PISA for processing the sequencing reads before aligning. However, I cannot find the commands for processing FASTQ files in the corresponding GitHub repository.

    41-bp read length for read 1 and 100-bp read length for read 2.

  • In the other study, the sequencing strategy was

    The read structure was paired-end with Read 1, covering 30 bases inclusive of 10-bp cell barcode 1, 10-bp cell barcode 2 and 10-bp unique molecular identifier (UMI), and Read 2 containing 100 bases of transcript sequence, and 10-bp sample index.

Therefore, I would suggest you ask the authors about the geometry specification of the prep set they used and set the appropriate custom geometry in simpleaf accordingly.

@rob-p: Please correct me if I misunderstood what's happening. Thanks.

Best,
Dongze

@rob-p
Contributor

rob-p commented Feb 2, 2023

@DongzeHE,

I agree, the right course of action, first, is to really understand the geometry. From there, figuring out how to provide it should be straightforward.

One point of information: while the mapping tools (both salmon and piscem) could support such a geometry, I think simpleaf may currently have some silly limitations that assume a single contiguous barcode. That is, looking at the docs, I'm not sure you could easily write something like B1[1-10;17-26]. There is no fundamental difficulty in supporting this, since both salmon and piscem support multi-part barcodes in the backend; we'd just have to write that support. Ultimately, the "simpleaf" format is parsed and then converted to the appropriate geometry type for either salmon or piscem. Once we know the library type we are dealing with, we can figure out the easiest way to get the data into the pipeline (a stop-gap solution would be to preprocess read 1, e.g. with awk or some such, so that the barcode is contiguous).

--Rob

Update: Further inspection suggests that multiple ranges probably will work with salmon. It's only an issue parsing custom geometry with multiple ranges per piece for piscem. But, since the next release of simpleaf will be the first to support piscem, we can just fix it first ;P.
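
As a rough illustration of the awk stop-gap mentioned above, the sketch below rewrites read 1 so that the two barcode pieces and the UMI sit next to each other. It assumes the layout quoted from the original preprint (barcode in bases 1-10 and 17-26, UMI in bases 32-41), which may not match the macaque dataset; the function name make_contiguous_r1 is hypothetical. The real geometry should be confirmed with the data authors before using anything like this.

```shell
# Hypothetical helper: make the barcode contiguous in a read 1 FASTQ stream.
# ASSUMED layout (from the preprint quoted earlier, unverified for other data):
#   barcode = bases 1-10 and 17-26, UMI = bases 32-41.
# Output read 1 is 30 bases: barcode in 1-20, UMI in 21-30.
make_contiguous_r1() {
  awk '
    NR % 4 == 2 || NR % 4 == 0 {
      # sequence and quality lines get the same slicing
      print substr($0, 1, 10) substr($0, 17, 10) substr($0, 32, 10)
      next
    }
    { print }   # header and "+" lines pass through unchanged
  '
}

# Usage: gzip -dc R1.fastq.gz | make_contiguous_r1 | gzip > R1.contiguous.fastq.gz
```

The rewritten read 1 could then be described with a single contiguous barcode range followed by the UMI range. Read 2 is untouched, so read order and pairing are preserved.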
