Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Total alignments : 0 / featureCount Results #115

Open
cagataykos opened this issue Jun 26, 2021 · 1 comment
Open

Total alignments : 0 / featureCount Results #115

cagataykos opened this issue Jun 26, 2021 · 1 comment

Comments

@cagataykos
Copy link

I have human RNA-Seq dataset it has two different barcodes in the different folder. I aligned with that command
minimap2 -ax splice -uf -k14 ref.fa direct-rna.fq > aln.sam
I try to quantify and counts using with Subread featureCounts function. In the subread results, there is a problem with one of the bam files. I downloaded reference and gtf files from GENCODE.
I checked the bam file with samtools view -H first.bam-second.bam I saw that I followed the same steps for each bam file.
In the IGV results, I saw matches and alignment for all bam files.

Do you have any suggestions the solve this problem? What am I doing wrong?

featureCounts -T 8 -a gencode.v38.chr_patch_hapl_scaff.annotation.gtf -g 'transcript_id' -o readcouts.txt bam/*.bam

|| Total alignments : 11214480 ||
|| Successfully assigned alignments : 4051945 (36.1%) ||
|| Running time : 2.67 minutes

|| Total alignments : 0 ||
|| Successfully assigned alignments : 0 ||
|| Running time : 2.89 minutes

I also tried with Salmon in the salmon alignment-based quantification results bam file has huge differences between each other.
salmon quant --ont -t reference.fa -l A -a first.bam -o salmon_quant1
Total # of mapped reads : 5465357

of uniquely mapped reads : 328808350000000

ambiguously mapped reads : 2177274
salmon quant --ont -t reference.fa -l A -a second.bam -o salmon_quant2
Completed first pass through the alignment file.

Total # of mapped reads : 3843632

of uniquely mapped reads : 2552463

ambiguously mapped reads : 1291169

@casmerifle
Copy link

Apologies for the necro but I've encountered a similar issue and would like to find out the issue. I ran nanoseq using a custom fasta and gtf, opting for stringtie2 to perform transcript quantification. Like @cagataykos, I've had transcripts identified (that is, a subset of the unique transcripts in my gtf) but counts of 0 for all of them.

To troubleshoot, I've verified that:
(i) both chromosome naming styles ("chr1" v. "1") result in the same output,
(ii) running stringtie2 independently using the intermediate minimap2 output of nanoseq results in the error:

WARNING: no reference transcripts were found for the genomic sequences where reads were mapped!
Please make sure the -G annotation file uses the same naming convention for the genome sequences.

I was considering the possibility that the reference used for minimap2 may not be compatible with the reference used in stringtie2. However, given I supplied to the parameters csv a unique transcriptome fa and gtf, and that transcript identification was still possible by stringtie2 as part of the nanoseq pipeline, I'm not certain how to proceed further.

The csv file I used is as follows:

group,replicate,barcode,input_file,fasta,gtf
HUMAN_FTO_WT_DMSO_M,1,,/home/user/data/nanoseq/raw/PAQ63378_FTO_WT_DMSO_M/PAQ63378_FTO_WT_DMSO_M_basecalled.fastq.gz,/home/user/data/nanoseq/gencode.v44.transcripts.fa,/home/user/data/nanoseq/gencode.v44.annotation.ChrPresent.gtf
HUMAN_FTO_WT_BT2,1,,/home/user/data/nanoseq/raw/PAQ87300_FTO_WT_BT2/PAQ87300_FTO_WT_BT2_basecalled.fastq.gz,/home/user/data/nanoseq/gencode.v44.transcripts.fa,/home/user/data/nanoseq/gencode.v44.annotation.ChrPresent.gtf

Executed this command:
nextflow run nf-core/nanoseq --input /home/samuel/data/nanoseq/raw/nanoseq_pars.csv --protocol directRNA --skip_demultiplexing --skip_fusion_analysis -profile singularity

Based on this, it seems like there's an inherent incompatibility with my ref files (genocode) and what stringtie2 is able to process?

For reference, the ref gtf I created (version where naming is "1" not "Chr1"):

1       HAVANA  gene    11869   14409   .       +       .       gene_id "ENSG00000290825.1"; gene_type "lncRNA"; gene_name "DDX11L2"; level 2; tag "overlaps_pseudogene";
1       HAVANA  transcript      11869   14409   .       +       .       gene_id "ENSG00000290825.1"; transcript_id "ENST00000456328.2"; gene_type "lncRNA"; gene_name "DDX11L2"; transcript_type "lncRNA"; transcript_name "DDX11L2-202"; level 2; transcript_support_level "1"; tag "basic"; tag "Ensembl_canonical"; havana_transcript "OTTHUMT00000362751.1";
1       HAVANA  exon    11869   12227   .       +       .       gene_id "ENSG00000290825.1"; transcript_id "ENST00000456328.2"; gene_type "lncRNA"; gene_name "DDX11L2"; transcript_type "lncRNA"; transcript_name "DDX11L2-202"; exon_number 1; exon_id "ENSE00002234944.1"; level 2; transcript_support_level "1"; tag "basic"; tag "Ensembl_canonical"; havana_transcript "OTTHUMT00000362751.1";
1       HAVANA  exon    12613   12721   .       +       .       gene_id "ENSG00000290825.1"; transcript_id "ENST00000456328.2"; gene_type "lncRNA"; gene_name "DDX11L2"; transcript_type "lncRNA"; transcript_name "DDX11L2-202"; exon_number 2; exon_id "ENSE00003582793.1"; level 2; transcript_support_level "1"; tag "basic"; tag "Ensembl_canonical"; havana_transcript "OTTHUMT00000362751.1";
1       HAVANA  exon    13221   14409   .       +       .       gene_id "ENSG00000290825.1"; transcript_id "ENST00000456328.2"; gene_type "lncRNA"; gene_name "DDX11L2"; transcript_type "lncRNA"; transcript_name "DDX11L2-202"; exon_number 3; exon_id "ENSE00002312635.1"; level 2; transcript_support_level "1"; tag "basic"; tag "Ensembl_canonical"; havana_transcript "OTTHUMT00000362751.1";

Would there be an issue with the gencode source and Ensembl content?

Appreciate any help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants