Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

the resulting file has only the title and no content #102

Open
suhuanhou opened this issue Aug 30, 2023 · 6 comments
Open

the resulting file has only the title and no content #102

suhuanhou opened this issue Aug 30, 2023 · 6 comments

Comments

@suhuanhou
Copy link

seq_name="lcc54"
conda activate vireo
cd /share/home/shh/wes_out/vireo_out
bam_path=/share/home/shh/data/${seq_name}_cellranger5/outs/possorted_genome_bam.bam
barcode_path=/share/home/shh/data/${seq_name}_cellranger5/outs/filtered_feature_bc_matrix/barcodes.tsv
OUT_DIR=/share/home/shh/wes_out/cellsnp-lite
REGION_VCF=/share/home/shh/download/cellSNP/genome1K.phase3.SNP_AF5e4.chr1toX.hg38.vcf

(vireo) [shh@node4 vireo_out]$ cellsnp-lite -s ${bam_path} -b ${barcode_path} -O ${OUT_DIR} -R ${REGION_VCF} -p 20 --minMAF 0.1 --minCOUNT 20 --gzip
[I::main] start time: 2023-08-30 18:54:10
[W::check_args] Max depth set to maximum value (2147483647)
[W::check_args] Max pileup set to maximum value (2147483647)
[I::main] global settings after checking:
num of input files = 1
out_dir = /share/home/shh/wes_out/cellsnp-lite
is_out_zip = 1, is_genotype = 0
is_target = 0, num_of_pos = 0
num_of_barcodes = 25859, num_of_samples = 0
refseq file = (null)
0 chroms:
cell-tag = CB, umi-tag = UB
nthreads = 20, tp_max_open = 1024
mthreads = 20, tp_errno = 0, tp_ntry = 0
min_count = 20, min_maf = 0.10, double_gl = 0
min_len = 30, min_mapq = 20
rflag_filter = 772, rflag_require = 0
max_depth = 2147483647, max_pileup = 2147483647, no_orphan = 1
[I::main] loading the VCF file for given SNPs ...
[I::main] fetching 36305578 candidate variants ...
[I::main] mode 1a: fetch given SNPs in 25859 single cells.
[I::csp_fetch_core][Thread-0] 2.00% SNPs processed.
.
.
.
[I::csp_fetch_core][Thread-11] 98.00% SNPs processed.
[I::csp_fetch_core][Thread-8] 98.00% SNPs processed.
[I::main] All Done!
[I::main] Version: 1.2.3 (htslib 1.17)
[I::main] CMD: cellsnp-lite -s /share/home/shh/data/lcc54_cellranger5/outs/possorted_genome_bam.bam -b /share/home/shh/data/lcc54_cellranger5/outs/filtered_feature_bc_matrix/barcodes.tsv -O /share/home/shh/wes_out/cellsnp-lite -R /share/home/shh/download/cellSNP/genome1K.phase3.SNP_AF5e4.chr1toX.hg38.vcf -p 20 --minMAF 0.1 --minCOUNT 20 --gzip
[I::main] end time: 2023-08-30 20:37:12
[I::main] time spent: 6182 seconds.

cellsnp-lite works, but the resulting file has only the title and no content

@hxj5
Copy link
Collaborator

hxj5 commented Aug 31, 2023

Hi, is your data 10x scDNA-seq or scATAC-seq? If so, you may add --UMItag None. Or you may also check whether the cell barcodes you specified in the cmdline are compatible with the CB tag in the BAM? (See issue #44 for details)

@suhuanhou
Copy link
Author

Hi, is your data 10x scDNA-seq or scATAC-seq? If so, you may add --UMItag None. Or you may also check whether the cell barcodes you specified in the cmdline are compatible with the CB tag in the BAM? (See issue #44 for details)

cellSNP.base.vcf
##fileformat=VCFv4.2
#CHROM POS ID REF ALT QUAL FILTER INFO

cellSNP.tag.AD.mtx
%%MatrixMarket matrix coordinate integer general
%
0 25859 0

My data is 10X single-cell RNA-seq data.
I tried to add the argument "--chrom GRCh38_chr1", but the file is still blank.

@suhuanhou
Copy link
Author

cellSNP.samples.tsv
AAACCCAAGCGTTAGG-1
AAACCCAAGCTTACGT-1
AAACCCAAGTCACTAC-1
AAACCCACACAAATGA-1
AAACCCACACCCTAGG-1
AAACCCACAGACCAGA-1
AAACCCACAGACCTAT-1
AAACCCACAGGCAATG-1
AAACCCACAGTATTCG-1

@suhuanhou
Copy link
Author

[shh@node4 ~]$ samtools view /share/home/shh/data/${seq_name}_cellranger5/outs/possorted_genome_bam.bam

A00212:80:HWYWHDSX2:1:2215:27959:32252 16 GRCh38_chr1 10004 0 91M * 0 0 CCCTAACCCTAAACATAACCATAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC ,F,FF:FF,FFF,F,FF:FF,FF:F:FFFF,F,FF,:F,FFFF,FFFF:FFFFFFFFFFFFFF:FF,FFFFFFFFFFFFFFFFFF:,FFFF NH:i:9 HI:i:1 AS:i:83 nM:i:RG:Z:lcc54_cellranger5:0:1:HWYWHDSX2:1 RE:A:I xf:i:0 CR:Z:TACATTCAGATTGATG CY:Z:FFFFFFF:FFFFFFFF CB:Z:TACATTCAGATTGATG-1 UR:Z:TAGGAAGTGT UY:Z:FFFFFFFFFF UB:Z:TAGGAAGTGT
A00212:80:HWYWHDSX2:1:2362:28908:1016 16 GRCh38_chr1 10004 0 91M * 0 0 CCCTATCCCTAACGCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTATCCCTTACCCTAACCCTAACCCTAACCCTAACCCTAAC F,FFFF:FFFFFF,F,FF:FFF,:FFFF:FFFFFFFF:F,FFFFF:FFFFFFF,F:FF,:FFFFF,FFFFFFFFF:FFFF,FFFFFFF:F: NH:i:6 HI:i:1 AS:i:81 nM:i:RG:Z:lcc54_cellranger5:0:1:HWYWHDSX2:1 RE:A:I xf:i:0 CR:Z:GATTTCTTCCCATAGA CY:Z:FFF:FFFFFFFFFFF: CB:Z:GATTTCTTCCCATAGA-1 UR:Z:CTCCCCACGT UY:Z:FFFFFFFFFF UB:Z:CTCCCCACGT
A00212:80:HWYWHDSX2:1:1477:26078:14841 16 GRCh38_chr1 10005 1 91M * 0 0 CCTCACCATAACCCTTAACCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC F,F,:F:,:,FF,,F,,,F:F:::FFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:3 HI:i:1 AS:i:81 nM:i:RG:Z:lcc54_cellranger5:0:1:HWYWHDSX2:1 RE:A:I xf:i:0 CR:Z:TCATTGTTCTTCTTCC CY:Z:FFFFFFFFFFFFFFFF CB:Z:TCATTGTTCTTCTTCC-1 UR:Z:TAACCCTAAC UY:Z:FFFFFFFFFF UB:Z:TAACCCTAAC

@hxj5
Copy link
Collaborator

hxj5 commented Aug 31, 2023

Hi, thanks for the detailed information. The cell and umi tags seem all right. The issue is probably caused by the difference of chromosome names in the input VCF and BAM files, noted the chrom names in your BAM have prefix "GRCh38_" that does not exist in VCF. You may change the chromosome names in either file to make them compatible with each other.

@hxj5
Copy link
Collaborator

hxj5 commented Aug 31, 2023

BTW, it seems you were using the VCF file "genome1K.phase3.SNP_AF5e4.chr1toX.hg38.vcf". You may try the VCF of AF5e2 version, which contains much less SNPs (hence much faster) and is sufficient for most downstream analysis.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants