CCS data assemble far too small #260

Open
fkyoung1992 opened this issue Apr 19, 2023 · 4 comments

@fkyoung1992

Dear Prof. Ruan,
I assembled a plant genome (~600 Mb, ~1.94% heterozygosity) from ~400G of PacBio Sequel II CCS data with the following command:
wtdbg2 -t 0 -x ccs -g 600m -i ccs23.fastq.gz -o beichai34 -e 2
The kmer distribution was like this:
|
|
|
|
|
|
|
|
|
||
||
||
||
||
|||
|||
||||
|||||
|||||||
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
********************** 1 - 201 **********************
Quatiles:
10% 20% 30% 40% 50% 60% 70% 80% 90% 95%
1 2 4 6 11 20 55 269 1779 9742
** PROC_STAT(0) **: real 2439.237 sec, user 6326.840 sec, sys 773.720 sec, maxrss 95177552.0 kB, maxvsize 130789572.0 kB
[Wed Apr 19 12:10:56 2023] - high frequency kmer depth is set to 13776
[Wed Apr 19 12:10:56 2023] - Total kmers = 728629161
[Wed Apr 19 12:10:56 2023] - average kmer depth = 7
[Wed Apr 19 12:10:56 2023] - 368836231 low frequency kmers (<2)
[Wed Apr 19 12:10:56 2023] - 4011 high frequency kmers (>13776)
In the end I obtained only 4 contigs (TOT 54784).
How should I adjust the parameters to get a reliable result in this case? Thanks a lot.
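
A quick way to read the k-mer summary above is to work out what share of the k-mers fall in the "low frequency (<2)" class. The sketch below parses the quoted log lines; `kmer_summary` is a hypothetical helper, not part of wtdbg2, and it assumes that "Total kmers" counts distinct k-mers.

```python
import re

# Log lines copied from the wtdbg2 output quoted in this comment.
LOG = """\
[Wed Apr 19 12:10:56 2023] - high frequency kmer depth is set to 13776
[Wed Apr 19 12:10:56 2023] - Total kmers = 728629161
[Wed Apr 19 12:10:56 2023] - average kmer depth = 7
[Wed Apr 19 12:10:56 2023] - 368836231 low frequency kmers (<2)
[Wed Apr 19 12:10:56 2023] - 4011 high frequency kmers (>13776)
"""


def kmer_summary(log_text):
    """Extract k-mer counts from wtdbg2-style log text (hypothetical helper)."""
    total = int(re.search(r"Total kmers = (\d+)", log_text).group(1))
    low = int(re.search(r"(\d+) low frequency kmers", log_text).group(1))
    high = int(re.search(r"(\d+) high frequency kmers", log_text).group(1))
    return {
        "total": total,
        "low": low,
        "high": high,
        # Assumption: "Total kmers" is the number of distinct k-mers.
        "low_fraction": low / total,
    }


summary = kmer_summary(LOG)
print(f"{summary['low_fraction']:.1%} of k-mers are in the low-frequency class")
```

With these numbers roughly half of the k-mers are seen only once, which is worth keeping in mind when reading the histogram and the quantiles.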

@fkyoung1992
Author

I then ran wtdbg2 with the default parameters, without the -x option (please ignore the different input name; it is actually the same file):
wtdbg2 -t 36 -i SRR16122634.fastq.gz -g 600m -e 3 -o beichai
and obtained the following kmer distribution:
|
|
|
|
|
|
|
|
|
|
|
||
||
||
||
|||
|||
||||
|||||
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
********************** 1 - 201 **********************
Quatiles:
10% 20% 30% 40% 50% 60% 70% 80% 90% 95%
1 2 3 6 10 22 99 507 3494 20155

If the kmer distribution is not good, please kill me and adjust -k, -p, and -K

Cannot get a good distribution anyway, should adjust -S -s, also -A -e in assembly

** PROC_STAT(0) **: real 3155.775 sec, user 6807.670 sec, sys 931.790 sec, maxrss 118617004.0 kB, maxvsize 189408212.0 kB
[Mon Apr 17 16:54:12 2023] - high frequency kmer depth is set to 23594
[Mon Apr 17 16:54:12 2023] - Total kmers = 674175558
[Mon Apr 17 16:54:12 2023] - average kmer depth = 6
[Mon Apr 17 16:54:12 2023] - 364160349 low frequency kmers (<2)
[Mon Apr 17 16:54:12 2023] - 2058 high frequency kmers (>23594)
[Mon Apr 17 16:54:12 2023] - indexing 310013151 kmers, 2157318418 instances (at most)

This time the assembly result looked much better:
[Mon Apr 17 18:33:17 2023] Estimated: TOT 305317120, CNT 11722, AVG 26047, MAX 212992, N50 31744, L50 2891, N90 14848, L90 8550, Min 5120
[Mon Apr 17 18:33:35 2023] output 11722 contigs
But the total is still only about half of the expected size. How should I adjust the parameters? Any response would be much appreciated!
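
The "half of the expected size" figure can be checked directly from the summary line. This is a minimal sketch that parses the "Estimated:" line quoted above and compares TOT with the 600m given to -g; `parse_estimated` is a hypothetical helper name, not a wtdbg2 function.

```python
import re

# Summary line copied from the log quoted in this comment.
LINE = (
    "[Mon Apr 17 18:33:17 2023] Estimated: TOT 305317120, CNT 11722, "
    "AVG 26047, MAX 212992, N50 31744, L50 2891, N90 14848, L90 8550, Min 5120"
)


def parse_estimated(line):
    """Turn an 'Estimated:' summary line into a dict of integers."""
    # Keep only the part after the label so the timestamp is not parsed.
    _, _, stats = line.partition("Estimated:")
    return {key: int(value) for key, value in re.findall(r"(\w+) (\d+)", stats)}


stats = parse_estimated(LINE)
expected_size = 600_000_000  # the -g 600m value from the command line
fraction = stats["TOT"] / expected_size
print(f"assembled {stats['TOT']:,} bp, {fraction:.0%} of the expected genome size")
```

The same function can be applied to the summary lines of other runs to compare them side by side.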

@ruanjue
Owner

ruanjue commented Apr 23, 2023

Please have a look at #259

@fkyoung1992
Author

> Please have a look at #259

Thank you for your reply.
Following your suggestion in the cited issue, I tried
wtdbg2 -g 600m -t 0 -p 0 -k 19 -AS 4 -K 0.05 -s 0.3 -i SRR16122634.fastq.gz -o bei424
but obtained a much worse result (see below) than with the default parameters in my second comment.
[Tue Apr 25 15:20:13 2023] Estimated: TOT 528640, CNT 21, AVG 25174, MAX 81664, N50 59136, L50 4, N90 10240, L90 14, Min 5376
[Tue Apr 25 15:20:13 2023] output 21 contigs.
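
For readers comparing the N50 and L50 values in these summary lines, here is a minimal sketch of the usual definition: N50 is the length of the contig at which the running total, taken from the longest contig down, first reaches half of the assembly size, and L50 is the number of contigs needed to get there. The lengths below are made up for illustration, and the "Estimated" figures printed by wtdbg2 may differ from values computed on the final consensus FASTA.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            # `length` is N50; `count` contigs were needed, which is L50.
            return length, count
    raise ValueError("no contigs given")


toy_lengths = [80, 70, 50, 40, 30, 20, 10]  # made-up contig lengths
n50, l50 = n50_l50(toy_lengths)
print(n50, l50)
```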

I also tried hifiasm, but it ran for weeks without producing any log or output files, which seems abnormal for a 600 Mb genome. So I really need your help. Do you have any other suggestions for wtdbg2?

@ruanjue
Owner

ruanjue commented Apr 26, 2023

Please check your fastq data first, and read #259 again.
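
One possible way to check the fastq data is to count the reads and bases in the file and divide by the expected genome size, which shows whether the input really holds the amount of data it is supposed to. The sketch below is only an illustration: `fastq_stats` and `coverage` are hypothetical helpers, the demo file is made up, and it assumes a plain four-line-per-record FASTQ.

```python
import gzip
import os
import tempfile


def fastq_stats(path):
    """Count reads and bases in a 4-line-per-record FASTQ file (gzipped or not)."""
    opener = gzip.open if path.endswith(".gz") else open
    reads = bases = 0
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline()
            if not header:
                break  # end of file
            if not header.strip():
                continue  # tolerate stray blank lines
            seq = handle.readline().rstrip("\r\n")
            plus = handle.readline()
            qual = handle.readline().rstrip("\r\n")
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"malformed record near read {reads + 1}")
            if len(seq) != len(qual):
                raise ValueError(f"sequence/quality length mismatch at read {reads + 1}")
            reads += 1
            bases += len(seq)
    return reads, bases


def coverage(bases, genome_size):
    """Approximate sequencing depth as total bases over genome size."""
    return bases / genome_size


# Tiny made-up file so the sketch runs without the real data.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.fastq.gz")
    with gzip.open(demo, "wt") as out:
        out.write("@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGT\n+\nIIII\n")
    reads, bases = fastq_stats(demo)

print(reads, bases, coverage(bases, genome_size=6))
```

On the real input this would be called as `fastq_stats("SRR16122634.fastq.gz")`, with `genome_size=600_000_000` to match the -g 600m setting.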
