n50 prereads & GC content #583

hermeseduardo · 2017-10-01T21:08:46Z

Hi there,
Is it normal to lose considerable N50 length after the correction step? In my case it went from an N50 of 13,000 to an N50 of 7,000.
I have about 40X coverage; my pre_assembly_stats.json and fc_run.cfg are below.
I suspect the GC content (30%) may be affecting DALIGNER. Any clues about this?

thanks

{
"genome_length": 550000000,
"length_cutoff": 1000,
"preassembled_bases": 13073338353,
"preassembled_coverage": 23.77,
"preassembled_esize": 8290.033,
"preassembled_mean": 4874.457,
"preassembled_n50": 7271,
"preassembled_p95": 12989,
"preassembled_reads": 2682009,
"preassembled_seed_fragmentation": 1.443,
"preassembled_seed_truncation": 3720.872,
"preassembled_yield": 0.583,
"raw_bases": 22478005278,
"raw_coverage": 40.869,
"raw_esize": 14585.894,
"raw_mean": 9948.45,
"raw_n50": 13241,
"raw_p95": 22760,
"raw_reads": 2259448,
"seed_bases": 22442989421,
"seed_coverage": 40.805,
"seed_esize": 14607.423,
"seed_mean": 10139.719,
"seed_n50": 13254,
"seed_p95": 22877,
"seed_reads": 2213374
}

[General]
input_fofn = input.fofn
input_type = raw
length_cutoff = 1000
genome_size = 550000000
length_cutoff_pr = 10000

sge_option_da = --ntasks 1 --nodes 1 --cpus-per-task 8 --mem 30gb --time 5:30:00
sge_option_la = --ntasks 1 --nodes 1 --cpus-per-task 4 --mem 32gb --time 4:56:00
sge_option_cns = --ntasks 1 --nodes 1 --cpus-per-task 5 --mem 32gb --time 3:00:00
sge_option_pda = --ntasks 1 --nodes 1 --cpus-per-task 8 --mem 30gb --time 3:30:00
sge_option_pla = --ntasks 1 --nodes 1 --cpus-per-task 4 --mem 35gb --time 3:58:00
sge_option_fc = --ntasks 1 --nodes 1 --cpus-per-task 8 --mem 20gb --time 59:00

da_concurrent_jobs = 396
la_concurrent_jobs = 396
cns_concurrent_jobs = 396
pda_concurrent_jobs = 396
pla_concurrent_jobs = 396

pa_HPCdaligner_option = -v -B70 -t16 -e.70 -l1000 -s1000
ovlp_HPCdaligner_option = -v -B70 -t32 -h60 -e.96 -l500 -s1000

pa_DBsplit_option = -x500 -s120
ovlp_DBsplit_option = -x500 -s120

falcon_sense_option = --output_multi --min_idt 0.70 --min_cov 2 --max_n_read 200

overlap_filtering_setting = --max_diff 100 --max_cov 200 --min_cov 1 --bestn 1

skip_checks = true

gconcepcion · 2017-10-02T16:33:45Z

Hi,

Yes. Although it depends on the nature (read length and quality) and quantity of your input data, a decrease in N50 from raw reads to corrected preads is typical, especially in a coverage-limited situation. Long reads are often broken during the correction process when coverage is low, resulting in an overall decrease in N50.
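
As a rough illustration of the point above, here is a minimal Python sketch. It does two things: it recomputes the yield and pread coverage from the numbers in the posted pre_assembly_stats.json, and it shows on a toy read set how breaking long reads lowers N50 even when no bases are lost. The `n50` helper uses the common textbook definition and is an assumption, not FALCON's own code, which may differ in detail. The `intact` and `broken` read lengths are made up for illustration and are not taken from the real data.

```python
def n50(lengths):
    """Return the N50: the length L such that reads of length >= L
    together contain at least half of all bases (textbook definition)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


# Values copied from the pre_assembly_stats.json posted in this thread.
stats = {
    "genome_length": 550000000,
    "seed_bases": 22442989421,
    "preassembled_bases": 13073338353,
}

# Fraction of seed bases that survive correction, and the resulting coverage.
yield_fraction = stats["preassembled_bases"] / stats["seed_bases"]
preassembled_cov = stats["preassembled_bases"] / stats["genome_length"]

# Hypothetical toy reads: the same 60 kb of sequence, before and after the
# two longest reads are each split into two pieces.
intact = [20000, 15000, 12000, 8000, 5000]
broken = [11000, 9000, 8000, 7000, 12000, 8000, 5000]

if __name__ == "__main__":
    print(f"yield: {yield_fraction:.3f}")
    print(f"pread coverage: {preassembled_cov:.2f}X")
    print(f"toy N50 intact: {n50(intact)}, broken: {n50(broken)}")
```

In the toy set the total bases are unchanged, yet N50 drops because the longest reads no longer exist as single pieces. In the real run both effects apply: a little over 40% of the seed bases are lost, and the surviving reads are shorter.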

hermeseduardo · 2017-10-02T16:47:02Z

OK, thanks. Do you know if there is anything that can be done to help? E.g., reducing -e.70 to -e.60, or would that be 'bad' for the final assembly?
I am also currently trying the -b option for daligner; apparently it helps when there is compositional bias.
pa_HPCdaligner_option = -vb ..........
ovlp_HPCdaligner_option = -vb ........
