Number of reads increased after Porechop #97

Open
sumitra20 opened this issue Nov 1, 2023 · 1 comment
sumitra20 commented Nov 1, 2023

Hi,

I have been using Porechop to trim my Nanopore raw reads, and I noticed that after trimming the total number of bases decreases, but the total number of reads increases compared with the raw reads. The Porechop log does indicate that adapter trimming was done. This seems a little strange to me: shouldn't the number of reads decrease after Porechop? Any advice would be appreciated. Thank you

porechop -i nanopore_strato_barcode2_TEST.fastq.gz -o ./porechop_strato_barcode2_TEST.fastq.gz

OUTPUT:

Looking for known adapter sets
10,000 / 10,000 (100.0%)
Set                    Best read start %ID   Best read end %ID
SQK-NSK007                           100.0                79.2
Rapid                                 68.4                 0.0
RBK004_upstream                       80.0                 0.0
SQK-MAP006                            77.4                82.6
SQK-MAP006 short                      80.0                76.0
PCR adapters 1                        79.2                79.2
PCR adapters 2                        82.6                82.6
PCR adapters 3                        78.3                80.0
1D^2 part 1                           72.4                74.1
1D^2 part 2                           84.8                74.2
cDNA SSP                              70.0                73.2
Barcode 1 (reverse)                  100.0                80.0
Barcode 2 (reverse)                  100.0               100.0
Barcode 3 (reverse)                   75.0                77.8
Barcode 4 (reverse)                   83.3                80.8
Barcode 5 (reverse)                   80.8                80.8
Barcode 6 (reverse)                   77.8                84.0
Barcode 7 (reverse)                   76.9                76.0
Barcode 8 (reverse)                   81.5                76.9
..

Trimming adapters from read ends
SQK-NSK007_Y_Top: AATGTACTTCGTTCAGTTACGTATTGCT
SQK-NSK007_Y_Bottom: GCAATACGTAACTGAACGAAGT
BC01_rev: CACAAAGACACCGACAACTTTCTT
BC01: AAGAAAGTTGTCGGTGTCTTTGTG
BC02_rev: ACAGACGACTACAAACGGAATCGA
BC02: TCGATTCCGTTTGTAGTCGTCTGT
NB01_start: AATGTACTTCGTTCAGTTACGTATTGCTAAGGTTAACACAAAGACACCGACAACTTTCTTCAGCACCT
NB01_end: AGGTGCTGAAGAAAGTTGTCGGTGTCTTTGTGTTAACCTTAGCAATACGTAACTGAACGAAGT
NB02_start: AATGTACTTCGTTCAGTTACGTATTGCTAAGGTTAAACAGACGACTACAAACGGAATCGACAGCACCT
NB02_end: AGGTGCTGTCGATTCCGTTTGTAGTCGTCTGTTTAACCTTAGCAATACGTAACTGAACGAAGT

277,493 / 277,493 (100.0%)

270,816 / 277,493 reads had adapters trimmed from their start (20,279,939 bp removed)
234,960 / 277,493 reads had adapters trimmed from their end (12,093,349 bp removed)

Splitting reads containing middle adapters
277,493 / 277,493 (100.0%)

655 / 277,493 reads were split based on middle adapters

Saving trimmed reads to file
pigz found - using it to compress instead of gzip

RAW DATA:
General summary:
Mean read length: 5,901.0
Mean read quality: 10.1
Median read length: 4,650.0
Median read quality: 10.8
Number of reads: 277,493.0
Read length N50: 7,197.0
STDEV read length: 4,579.3
Total bases: 1,637,482,075.0

TRIMMED DATA:
General summary:
Mean read length: 5,774.1
Mean read quality: 10.2
Median read length: 4,529.0
Median read quality: 10.9
Number of reads: 277,956.0
Read length N50: 7,128.0
STDEV read length: 4,555.5
Total bases: 1,604,957,251.0
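
To confirm the raw and trimmed counts independently of the summary tool, a minimal Python sketch can count reads and bases directly from the FASTQ files. It assumes standard four-line FASTQ records without wrapped sequence lines; `fastq_stats` is a hypothetical helper name, not part of Porechop.

```python
import gzip
import sys
from typing import Tuple


def fastq_stats(path: str) -> Tuple[int, int]:
    """Return (number of reads, total bases) for a FASTQ or FASTQ.gz file.

    Assumes plain four-line records (header, sequence, '+', quality)
    with no line-wrapped sequences.
    """
    opener = gzip.open if path.endswith(".gz") else open
    reads = 0
    bases = 0
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # The second line of every four-line record is the sequence.
            if line_number % 4 == 1:
                reads += 1
                bases += len(line.rstrip("\r\n"))
    return reads, bases


if __name__ == "__main__":
    # Print read and base counts for every file given on the command line.
    for fastq in sys.argv[1:]:
        n_reads, n_bases = fastq_stats(fastq)
        print(f"{fastq}\treads={n_reads:,}\tbases={n_bases:,}")
```

For example, saved as `fastq_stats.py` (a hypothetical filename): `python fastq_stats.py nanopore_strato_barcode2_TEST.fastq.gz porechop_strato_barcode2_TEST.fastq.gz`.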

@ombystoma-young

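Hello @sumitra20,
I suspect this happened because Porechop splits reads that contain middle adapters: a read cut at an internal adapter is written out as two (or more) separate reads, so the read count can rise even while the total number of bases falls. In your case: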

655 / 277,493 reads were split based on middle adapters
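
A quick arithmetic check on the numbers posted in this issue is consistent with that explanation. The sketch only reuses figures from the Porechop log and the read summaries; attributing the leftover bases and the gap between 655 splits and 463 extra reads to middle-adapter splitting is an assumption, since the log does not report those pieces directly.

```python
# Figures copied from the Porechop log and the read summaries in this issue.
raw_reads, trimmed_reads = 277_493, 277_956
raw_bases, trimmed_bases = 1_637_482_075, 1_604_957_251
start_trim_bp, end_trim_bp = 20_279_939, 12_093_349
split_reads = 655

extra_reads = trimmed_reads - raw_reads
bases_lost = raw_bases - trimmed_bases
end_trimming = start_trim_bp + end_trim_bp
# Assumption: the remainder comes from cutting out middle adapters
# and from split pieces too short to be kept.
remainder = bases_lost - end_trimming

print(f"extra reads after trimming: {extra_reads}")           # 463
print(f"reads split at middle adapters: {split_reads}")        # 655
print(f"total bases lost: {bases_lost:,}")                     # 32,524,824
print(f"removed by start/end trimming: {end_trimming:,}")      # 32,373,288
print(f"remainder (assumed middle-adapter splitting): {remainder:,}")  # 151,536
```

Fewer extra reads (463) than split reads (655) fits a scenario where some post-split pieces were too short to be written out, though the log does not confirm this.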

See the "Split reads with internal adapters" and "Discard reads with internal adapters" sections for a more detailed description.
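
To illustrate why splitting can raise the read count, here is a toy model. It is not Porechop's code: `split_read` is a hypothetical helper, adapter positions are given rather than found by alignment, the extra bases Porechop trims around a middle adapter are ignored, and the 1,000 bp minimum piece size is assumed to mirror Porechop's `--min_split_read_size` default (check `porechop --help` for your version).

```python
from typing import List, Tuple


def split_read(seq: str, adapter_spans: List[Tuple[int, int]], min_piece: int) -> List[str]:
    """Cut `seq` at each (start, end) adapter span; keep pieces >= min_piece bp.

    Toy illustration only: the adapter spans are supplied by the caller.
    """
    pieces = []
    cursor = 0
    for start, end in sorted(adapter_spans):
        pieces.append(seq[cursor:start])  # sequence before this adapter
        cursor = max(cursor, end)         # skip over the adapter itself
    pieces.append(seq[cursor:])           # sequence after the last adapter
    return [p for p in pieces if len(p) >= min_piece]


read = "A" * 5000
# Adapter near the middle: both pieces are long enough, so 1 read becomes 2.
print([len(p) for p in split_read(read, [(3000, 3100)], min_piece=1000)])  # [3000, 1900]
# Adapter near one end: the short piece is dropped, so the read count is unchanged.
print([len(p) for p in split_read(read, [(400, 500)], min_piece=1000)])    # [4500]
```

If you would rather drop such reads than split them, Porechop's `--discard_middle` option is intended for that; again, confirm it with `porechop --help`.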
