mem_mb for Clumpify rule #6

Open
szsctt opened this issue Oct 20, 2021 · 4 comments

szsctt commented Oct 20, 2021

Clumpify sometimes seems to use a huge amount of memory (I usually run with subs=2). For example, with input files like these:

-r--r--r-- 1 sco305 hpc-users 1.6G Sep  9 14:33 /scratch1/sco305/intvi_cmri/data/reads/AGRF_CAGRF20083481_JBH4G/21F_B_1in20_JBH4G_ATCCAGAG-CGACGTTA_L001_R2.fastq.gz
-r--r--r-- 1 sco305 hpc-users 1.4G Sep  9 14:33 /scratch1/sco305/intvi_cmri/data/reads/AGRF_CAGRF20083481_JBH4G/21F_B_1in20_JBH4G_ATCCAGAG-CGACGTTA_L001_R1.fastq.gz

When I run the clumpify rule on a cluster and check the job with sacct, I see this:

       JobID    JobName  Partition      User  AllocCPUS   NNodes    Elapsed   TotalCPU      State  MaxVMSize     MaxRSS     ReqMem        NodeList 
------------ ---------- ---------- --------- ---------- -------- ---------- ---------- ---------- ---------- ---------- ---------- --------------- 
56175482     snakejob.+         h2    sco305          1        1   00:30:23  28:43.027    TIMEOUT                          30968Mn            c204 
56175482.ba+      batch                               1        1   00:30:26  28:43.027  CANCELLED  33022032K  18715348K    30968Mn            c204 

This particular job didn't even finish, but it already had a MaxVMSize of ~33 GB and a MaxRSS of ~18 GB.

Sometimes I also see errors in the clumpify rule like so:

Activating singularity image /scratch1/sco305/intvi_cmri/tools_align/.snakemake/singularity/8c5d4fe7802fb686d4db98d5a5773fce.simg
java -ea -Xmx21912m -Xms21912m -cp /opt/conda/opt/bbmap-38.86-0/current/ clump.Clumpify -Xmx21912m in1=/scratch1/sco305/intvi_cmri/data/reads/AGRF_CAGRF20083481_JBH4G/6M_A_1in5_JBH4G_CGGCTAAT-AGAACGAG_L001_R1.fastq.gz in2=/scratch1/sco305/intvi_cmri/data/reads/AGRF_CAGRF20083481_JBH4G/6M_A_1in5_JBH4G_CGGCTAAT-AGAACGAG_L001_R2.fastq.gz out1=/scratch2/sco305/intvi_cmri/out_align/AGRF_CAGRF20083481_JBH4G/dedup_reads/6M_A_1in5_JBH4G_CGGCTAAT-AGAACGAG.1.fastq.gz out2=/scratch2/sco305/intvi_cmri/out_align/AGRF_CAGRF20083481_JBH4G/dedup_reads/6M_A_1in5_JBH4G_CGGCTAAT-AGAACGAG.2.fastq.gz dedupe=t ac=f subs=2 threads=1
Executing clump.Clumpify [-Xmx21912m, in1=/scratch1/sco305/intvi_cmri/data/reads/AGRF_CAGRF20083481_JBH4G/6M_A_1in5_JBH4G_CGGCTAAT-AGAACGAG_L001_R1.fastq.gz, in2=/scratch1/sco305/intvi_cmri/data/reads/AGRF_CAGRF20083481_JBH4G/6M_A_1in5_JBH4G_CGGCTAAT-AGAACGAG_L001_R2.fastq.gz, out1=/scratch2/sco305/intvi_cmri/out_align/AGRF_CAGRF20083481_JBH4G/dedup_reads/6M_A_1in5_JBH4G_CGGCTAAT-AGAACGAG.1.fastq.gz, out2=/scratch2/sco305/intvi_cmri/out_align/AGRF_CAGRF20083481_JBH4G/dedup_reads/6M_A_1in5_JBH4G_CGGCTAAT-AGAACGAG.2.fastq.gz, dedupe=t, ac=f, subs=2, threads=1]
Version 38.86

java.lang.Exception: 
Mismatch between length of bases and qualities for read 1010 (id=M00859:339:000000000-JBH4G:1:1101:10421:3670 1:N:0:CGGCTAAT+AGAACGAG).
# qualities=223, # bases=300

CCCCCGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGGGGGGGEGGFDDGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG8FGGGGGGGGGGGGGGGGGGGGGGGGGGGGG@FFGGGCFFFGGGEGGGGGGGGGGFGGGGGGGGGGGGFGGGGGGGGCGGGGFGFCFGG+2@FFEEGCCFGEGGGG9++5<CEG
AAATTAGCCGGGTGTGGTGGCAGGCGCCTGTAGTCCCAGCTACTCAGGAGGCTGAGGCAGGAGAATTGCTTGAACCTAGGAGGCGGAGGTTGCAGTGAGCAGAGATCGCTCCATTGCACTCCAGCCTGGGCGACGAGCGAAACTCCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGGCTAATTCTTCTGCCATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAACACTATACTGACAACTTATTCACAACGACATCGAGAGCTATGCTGTATGGTTTTGTGGCCTGCGGGT

This can be bypassed with the flag 'tossbrokenreads' or 'nullifybrokenquality'
	at shared.KillSwitch.kill(KillSwitch.java:96)
	at stream.Read.validateQualityLength(Read.java:216)
	at stream.Read.validate(Read.java:104)
	at stream.Read.<init>(Read.java:76)
	at stream.Read.<init>(Read.java:50)
	at stream.FASTQ.quadToRead_slow(FASTQ.java:824)
	at stream.FASTQ.toReadList(FASTQ.java:659)
	at stream.FastqReadInputStream.fillBuffer(FastqReadInputStream.java:107)
	at stream.FastqReadInputStream.hasMore(FastqReadInputStream.java:73)
	at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:667)
	at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:656)

However, when I check the sequence and quality of the read in question, they're the same length. I assume this is a memory-related issue.

I haven't yet worked out the best way to calculate how much memory to give these jobs. I suppose it depends on the size of the input files and the number of substitutions?
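
As a stopgap, maybe something like this in the Snakefile (just a sketch, not what the rule currently does; the starting value and scaling are guesses):

# Sketch: guess an initial request and let failed jobs be resubmitted with
# more memory (e.g. together with --restart-times), via the attempt counter.
def clumpify_mem_mb(wildcards, attempt):
    return 16000 * attempt  # 16 GB on the first try, 32 GB on the first retry, ...

# resources:
#     mem_mb = clumpify_mem_mb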

@TheBready

I haven't had this issue yet. Do you think it could be related to the cluster execution? For my latest runs, setting the Java VM memory to the sum of all input file sizes worked fine.
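
Roughly this, as a resources callable (just a sketch of what I mean; the names are made up):

import os

def sum_of_inputs_mb(wildcards, input):
    # total size of the (gzipped) input files, in MB
    return sum(os.path.getsize(f) for f in input) // 10**6

# resources:
#     mem_mb = sum_of_inputs_mb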


szsctt commented Oct 20, 2021

I can't think of a reason it would be specific to the cluster, except that on the cluster memory limits are enforced. If you run everything on one machine they're not (unless you use all of that machine's memory), so if the jobs aren't all running at the same time, maybe you just get away with it? Have you checked how much memory these jobs use in your case?

@szsctt
Copy link
Collaborator Author

szsctt commented Oct 21, 2021

Perhaps it could be related to this: https://www.mail-archive.com/slurm-dev@schedmd.com/msg09340.html
But in this case I'm only using one thread for Clumpify, so it's probably not exactly the same issue.


szsctt commented Oct 21, 2021

Did a few more tests today on these files:

-r--r--r-- 1 sco305 hpc-users 394M Sep  9 14:33 /scratch1/sco305/intvi_cmri/data/reads/AGRF_CAGRF20083481_JBH4G/13F_A_1in5_JBH4G_TACGCTAC-ATCACACG_L001_R2.fastq.gz
-r--r--r-- 1 sco305 hpc-users 340M Sep  9 14:33 /scratch1/sco305/intvi_cmri/data/reads/AGRF_CAGRF20083481_JBH4G/13F_A_1in5_JBH4G_TACGCTAC-ATCACACG_L001_R1.fastq.gz

I first requested an interactive session via salloc --mem 100gb.

Then I ran Clumpify in a Singularity container as follows:

/usr/bin/time singularity exec bbmap_1.sif clumpify.sh in1=/scratch1/sco305/intvi_cmri/data/reads/AGRF_CAGRF20083481_JBH4G/13F_A_1in5_JBH4G_TACGCTAC-ATCACACG_L001_R1.fastq.gz in2=/scratch1/sco305/intvi_cmri/data/reads/AGRF_CAGRF20083481_JBH4G/13F_A_1in5_JBH4G_TACGCTAC-ATCACACG_L001_R2.fastq.gz out1=test.1.fastq.gz out2=test.2.fastq.gz dedupe=t ac=f subs=2 threads=1

The output of this was:

java -ea -Xmx65103m -Xms65103m -cp /opt/conda/opt/bbmap-38.86-0/current/ clump.Clumpify in1=/scratch1/sco305/intvi_cmri/data/reads/AGRF_CAGRF20083481_JBH4G/13F_A_1in5_JBH4G_TACGCTAC-ATCACACG_L001_R1.fastq.gz in2=/scratch1/sco305/intvi_cmri/data/reads/AGRF_CAGRF20083481_JBH4G/13F_A_1in5_JBH4G_TACGCTAC-ATCACACG_L001_R2.fastq.gz out1=test.1.fastq.gz out2=test.2.fastq.gz dedupe=t ac=f subs=2 threads=1
Executing clump.Clumpify [in1=/scratch1/sco305/intvi_cmri/data/reads/AGRF_CAGRF20083481_JBH4G/13F_A_1in5_JBH4G_TACGCTAC-ATCACACG_L001_R1.fastq.gz, in2=/scratch1/sco305/intvi_cmri/data/reads/AGRF_CAGRF20083481_JBH4G/13F_A_1in5_JBH4G_TACGCTAC-ATCACACG_L001_R2.fastq.gz, out1=test.1.fastq.gz, out2=test.2.fastq.gz, dedupe=t, ac=f, subs=2, threads=1]
Version 38.86

Read Estimate:          38429734
Memory Estimate:        29319 MB
Memory Available:       51566 MB
Set groups to 1
Executing clump.KmerSort1 [in1=/scratch1/sco305/intvi_cmri/data/reads/AGRF_CAGRF20083481_JBH4G/13F_A_1in5_JBH4G_TACGCTAC-ATCACACG_L001_R1.fastq.gz, in2=/scratch1/sco305/intvi_cmri/data/reads/AGRF_CAGRF20083481_JBH4G/13F_A_1in5_JBH4G_TACGCTAC-ATCACACG_L001_R2.fastq.gz, out1=test.1.fastq.gz, out2=test.2.fastq.gz, groups=1, ecco=false, rename=false, shortname=f, unpair=false, repair=false, namesort=false, ow=true, dedupe=t, threads=1]

Set threads to 1
Making comparator.
Made a comparator with k=31, seed=1, border=1, hashes=4
Starting cris 0.
Fetching reads.
Making fetch threads.
Starting threads.
Waiting for threads.
Fetch time: 	31.904 seconds.
Closing input stream.
Combining thread output.
Combine time: 	0.015 seconds.
Sorting.
Sort time: 	0.217 seconds.
Making clumps.
Clump time: 	0.351 seconds.
Deduping.
Dedupe time: 	0.927 seconds.
Writing.
Waiting for writing to complete.
Write time: 	232.567 seconds.
Done!
Time:                         	266.141 seconds.
Reads Processed:         3653k 	13.73k reads/sec
Bases Processed:         1096m 	4.12m bases/sec

Reads In:              3653538
Clumps Formed:          769778
Duplicates Found:        36292
Reads Out:             3617246
Bases Out:          1085173800
Total time: 	266.260 seconds.
257.46user 4.19system 4:26.99elapsed 98%CPU (0avgtext+0avgdata 4811160maxresident)k
1561081inputs+1382890outputs (83major+32448minor)pagefaults 0swaps

So in this case GNU time says that the max RSS is 4.8 GB.

After this run, sstat outputs the following:

sstat 56215666
       JobID  MaxVMSize  MaxVMSizeNode  MaxVMSizeTask  AveVMSize     MaxRSS MaxRSSNode MaxRSSTask     AveRSS MaxPages MaxPagesNode   MaxPagesTask   AvePages     MinCPU MinCPUNode MinCPUTask     AveCPU   NTasks AveCPUFreq ReqCPUFreqMin ReqCPUFreqMax ReqCPUFreqGov ConsumedEnergy  MaxDiskRead MaxDiskReadNode MaxDiskReadTask  AveDiskRead MaxDiskWrite MaxDiskWriteNode MaxDiskWriteTask AveDiskWrite TRESUsageInAve TRESUsageInMax TRESUsageInMaxNode TRESUsageInMaxTask TRESUsageInMin TRESUsageInMinNode TRESUsageInMinTask TRESUsageInTot TRESUsageOutAve TRESUsageOutMax TRESUsageOutMaxNode TRESUsageOutMaxTask TRESUsageOutMin TRESUsageOutMinNode TRESUsageOutMinTask TRESUsageOutTot 
------------ ---------- -------------- -------------- ---------- ---------- ---------- ---------- ---------- -------- ------------ -------------- ---------- ---------- ---------- ---------- ---------- -------- ---------- ------------- ------------- ------------- -------------- ------------ --------------- --------------- ------------ ------------ ---------------- ---------------- ------------ -------------- -------------- ------------------ ------------------ -------------- ------------------ ------------------ -------------- --------------- --------------- ------------------- ------------------- --------------- ------------------- ------------------- --------------- 
56215666.0    69933236K           c114              0    734384K   4882584K       c114          0     46064K      195         c114              0        128  03:59.000       c114          0  00:03.000        1      2.59M       Unknown       Unknown       Unknown              0    798023040            c114               0    798023040    724771006             c114                0    724771006 cpu=00:00:03,+ cpu=00:03:59,+ cpu=c114,energy=c+ cpu=00:00:00,fs/d+ cpu=00:03:59,+ cpu=c114,energy=c+ cpu=00:00:00,fs/d+ cpu=00:00:03,+ energy=0,fs/di+ energy=0,fs/di+ energy=c114,fs/dis+           fs/disk=0 energy=0,fs/di+ energy=c114,fs/dis+           fs/disk=0 energy=0,fs/di+ 

SLURM seems to agree that the max RSS was 4.8 GB, so I don't think the problem is that SLURM is mis-measuring the memory used.

Trying with some different files:

-r--r--r-- 1 sco305 hpc-users 869M Sep  9 14:33 /scratch1/sco305/intvi_cmri/data/reads/AGRF_CAGRF20083481_JBH4G/21F_B_1in5_JBH4G_AGAGTAGC-TACGCCTT_L001_R2.fastq.gz
-r--r--r-- 1 sco305 hpc-users 757M Sep  9 14:33 /scratch1/sco305/intvi_cmri/data/reads/AGRF_CAGRF20083481_JBH4G/21F_B_1in5_JBH4G_AGAGTAGC-TACGCCTT_L001_R1.fastq.gz

Running this command:

/usr/bin/time singularity exec bbmap_1.sif clumpify.sh in1=/scratch1/sco305/intvi_cmri/data/reads/AGRF_CAGRF20083481_JBH4G/21F_B_1in5_JBH4G_AGAGTAGC-TACGCCTT_L001_R1.fastq.gz in2=/scratch1/sco305/intvi_cmri/data/reads/AGRF_CAGRF20083481_JBH4G/21F_B_1in5_JBH4G_AGAGTAGC-TACGCCTT_L001_R2.fastq.gz out1=test.1.fastq.gz out2=test.2.fastq.gz dedupe=t ac=f subs=2 threads=1

I immediately get this output:

java -ea -Xmx65413m -Xms65413m -cp /opt/conda/opt/bbmap-38.86-0/current/ clump.Clumpify in1=/scratch1/sco305/intvi_cmri/data/reads/AGRF_CAGRF20083481_JBH4G/21F_B_1in5_JBH4G_AGAGTAGC-TACGCCTT_L001_R1.fastq.gz in2=/scratch1/sco305/intvi_cmri/data/reads/AGRF_CAGRF20083481_JBH4G/21F_B_1in5_JBH4G_AGAGTAGC-TACGCCTT_L001_R2.fastq.gz out1=test.1.fastq.gz out2=test.2.fastq.gz dedupe=t ac=f subs=2 threads=1
Executing clump.Clumpify [in1=/scratch1/sco305/intvi_cmri/data/reads/AGRF_CAGRF20083481_JBH4G/21F_B_1in5_JBH4G_AGAGTAGC-TACGCCTT_L001_R1.fastq.gz, in2=/scratch1/sco305/intvi_cmri/data/reads/AGRF_CAGRF20083481_JBH4G/21F_B_1in5_JBH4G_AGAGTAGC-TACGCCTT_L001_R2.fastq.gz, out1=test.1.fastq.gz, out2=test.2.fastq.gz, dedupe=t, ac=f, subs=2, threads=1]
Version 38.86

java.lang.Exception: 
Mismatch between length of bases and qualities for read 1042 (id=M00859:339:000000000-JBH4G:1:1101:8373:2725 1:N:0:AGAGTAGC+TACGCCTT).
# qualities=186, # bases=300

CCCCCFGGGGGGEGGGGGGGGDFGGGEFGGGGGGGGGGGGDEGGGGGGGGGGGFFGFGGGGGGGGGGGGGGFGFGGGGGGGGGGGGGGFFFGGGGGGGGGGGGGGGGGGGGFGGGGGGGDGGGGGGGGGGGGCGGGGGGGFGG8FGGGGGGGGCGGGGGGGFFGGGGGGGGGGGGGGGGGGDGFGG
TCCACTCACAGCTCCAGCGCTGGGCTGTGGTTGAGGTCGCCTGCCCTCGGTAGCTCCTGGGCATTTCTTCCCCTCTCTGGGCCTTTGTTTTCCCATCTGCACAATGACCCCCACTCTAAGCCCTGCTGTCCCTCCCACCTGTGGAACTGAGTGAGCAGCAGCAATGTCCCACCTTTCCTGCTCTCCTCAAGCTCTCCTCAAGCTCTGTCTCTTCTGGCAGGCACAGGAGAGTGGCCTGAAGGCTGGCAGGAGGTTGCCGCCCCTCCAACCTGAGACCGGAAGAGCACACGTCTGAACTCC

This can be bypassed with the flag 'tossbrokenreads' or 'nullifybrokenquality'
	at shared.KillSwitch.kill(KillSwitch.java:96)
	at stream.Read.validateQualityLength(Read.java:216)
	at stream.Read.validate(Read.java:104)
	at stream.Read.<init>(Read.java:76)
	at stream.Read.<init>(Read.java:50)
	at stream.FASTQ.quadToRead_slow(FASTQ.java:824)
	at stream.FASTQ.toReadList(FASTQ.java:659)
	at stream.FastqReadInputStream.fillBuffer(FastqReadInputStream.java:107)
	at stream.FastqReadInputStream.hasMore(FastqReadInputStream.java:73)
	at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:667)
	at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:656)
0.38user 0.35system 0:00.91elapsed 80%CPU (0avgtext+0avgdata 179860maxresident)k
25868inputs+64outputs (70major+15129minor)pagefaults 0swaps

Clumpify has already given itself a 65413 MB heap, which seems like a lot, but if I try to allocate more by adding the argument -Xmx90000m to the previous command:

/usr/bin/time singularity exec bbmap_1.sif clumpify.sh in1=/scratch1/sco305/intvi_cmri/data/reads/AGRF_CAGRF20083481_JBH4G/21F_B_1in5_JBH4G_AGAGTAGC-TACGCCTT_L001_R1.fastq.gz in2=/scratch1/sco305/intvi_cmri/data/reads/AGRF_CAGRF20083481_JBH4G/21F_B_1in5_JBH4G_AGAGTAGC-TACGCCTT_L001_R2.fastq.gz out1=test.1.fastq.gz out2=test.2.fastq.gz dedupe=t ac=f subs=2 threads=1 -Xmx90000m

This runs ok:

java -ea -Xmx90000m -Xms90000m -cp /opt/conda/opt/bbmap-38.86-0/current/ clump.Clumpify in1=/scratch1/sco305/intvi_cmri/data/reads/AGRF_CAGRF20083481_JBH4G/21F_B_1in5_JBH4G_AGAGTAGC-TACGCCTT_L001_R1.fastq.gz in2=/scratch1/sco305/intvi_cmri/data/reads/AGRF_CAGRF20083481_JBH4G/21F_B_1in5_JBH4G_AGAGTAGC-TACGCCTT_L001_R2.fastq.gz out1=test.1.fastq.gz out2=test.2.fastq.gz dedupe=t ac=f subs=2 threads=1 -Xmx90000m
Executing clump.Clumpify [in1=/scratch1/sco305/intvi_cmri/data/reads/AGRF_CAGRF20083481_JBH4G/21F_B_1in5_JBH4G_AGAGTAGC-TACGCCTT_L001_R1.fastq.gz, in2=/scratch1/sco305/intvi_cmri/data/reads/AGRF_CAGRF20083481_JBH4G/21F_B_1in5_JBH4G_AGAGTAGC-TACGCCTT_L001_R2.fastq.gz, out1=test.1.fastq.gz, out2=test.2.fastq.gz, dedupe=t, ac=f, subs=2, threads=1, -Xmx90000m]
Version 38.86

Read Estimate:          85193729
Memory Estimate:        64997 MB
Memory Available:       71301 MB
Set groups to 11
Executing clump.KmerSplit [in1=/scratch1/sco305/intvi_cmri/data/reads/AGRF_CAGRF20083481_JBH4G/21F_B_1in5_JBH4G_AGAGTAGC-TACGCCTT_L001_R1.fastq.gz, in2=/scratch1/sco305/intvi_cmri/data/reads/AGRF_CAGRF20083481_JBH4G/21F_B_1in5_JBH4G_AGAGTAGC-TACGCCTT_L001_R2.fastq.gz, out=test.1_clumpify_p1_temp%_4a077ca6c95af3e4.fastq.gz, out2=, groups=11, ecco=false, addname=f, shortname=f, unpair=false, repair=f, namesort=f, ow=true, dedupe=t, threads=1]

Set threads to 1
Reset INTERLEAVED to false because paired input files were specified.
Set INTERLEAVED to false
Input is being processed as paired
Writing interleaved.
Made a comparator with k=31, seed=1, border=1, hashes=4
Time:                         	189.051 seconds.
Reads Processed:       8539k 	45.17k reads/sec
Bases Processed:       2561m 	13.55m bases/sec
Executing clump.KmerSort3 [in1=test.1_clumpify_p1_temp%_4a077ca6c95af3e4.fastq.gz, in2=, out=test.1.fastq.gz, out2=test.2.fastq.gz, groups=11, ecco=f, addname=false, shortname=f, unpair=f, repair=false, namesort=false, ow=true, dedupe=t, threads=1]

Set threads to 1
Making comparator.
Made a comparator with k=31, seed=1, border=1, hashes=4
Making 2 fetch threads.
Starting threads.
Fetching reads.
Fetched 354080 reads: 	11.106 seconds.
Making clumps.
Clump time: 	0.143 seconds.
Deduping.
Dedupe time: 	4.065 seconds.
Writing.
Fetching reads.
Fetched 390256 reads: 	0.000 seconds.
Making clumps.
Clump time: 	0.377 seconds.
Deduping.
Dedupe time: 	5.042 seconds.
Writing.
Fetching reads.
Fetched 364669 reads: 	15.920 seconds.
Making clumps.
Clump time: 	0.211 seconds.
Deduping.
Dedupe time: 	4.567 seconds.
Writing.
Fetching reads.
Fetched 336018 reads: 	0.000 seconds.
Making clumps.
Clump time: 	0.102 seconds.
Deduping.
Dedupe time: 	1.336 seconds.
Writing.
Fetching reads.
Fetched 267952 reads: 	0.000 seconds.
Making clumps.
Clump time: 	0.067 seconds.
Deduping.
Dedupe time: 	0.995 seconds.
Writing.
Fetching reads.
Fetched 281561 reads: 	0.000 seconds.
Making clumps.
Clump time: 	0.091 seconds.
Deduping.
Dedupe time: 	0.626 seconds.
Writing.
Fetching reads.
Fetched 486399 reads: 	0.000 seconds.
Making clumps.
Clump time: 	0.165 seconds.
Deduping.
Dedupe time: 	6.414 seconds.
Writing.
Fetching reads.
Fetched 619054 reads: 	0.000 seconds.
Making clumps.
Clump time: 	0.193 seconds.
Deduping.
Dedupe time: 	9.388 seconds.
Writing.
Fetching reads.
Fetched 229377 reads: 	0.000 seconds.
Making clumps.
Clump time: 	0.063 seconds.
Deduping.
Dedupe time: 	0.264 seconds.
Writing.
Fetching reads.
Fetched 329919 reads: 	0.000 seconds.
Making clumps.
No more reads to fetch.
Adding poison.
Clump time: 	0.073 seconds.
Deduping.
Dedupe time: 	1.695 seconds.
Writing.
Fetching reads.
Encountered poison; count=1
Fetched 610275 reads: 	0.000 seconds.
Making clumps.
A fetch thread finished.
No more reads to fetch.
Adding poison.
Clump time: 	0.162 seconds.
Deduping.
Dedupe time: 	6.823 seconds.
Writing.
Closing fetch threads.
A fetch thread finished.
Closed fetch threads.
Waiting for writing to complete.
Write time: 	93.750 seconds.
Done!
Time:                         	530.144 seconds.
Reads Processed:         8539k 	16.11k reads/sec
Bases Processed:         2561m 	4.83m bases/sec

Reads In:              8539120
Clumps Formed:         1136193
Duplicates Found:       231488
Reads Out:             8307632
Bases Out:          2492289600
Total time: 	719.439 seconds.
674.98user 32.54system 12:00.19elapsed 98%CPU (0avgtext+0avgdata 17765540maxresident)k
7317969inputs+6972593outputs (72major+80465minor)pagefaults 0swaps

So in this case Clumpify crashed when I gave it less than ~70 GB, but its max RSS was only 17.8 GB.

And afterwards, I checked the memory usage with sstat:

      JobID  MaxVMSize  MaxVMSizeNode  MaxVMSizeTask  AveVMSize     MaxRSS MaxRSSNode MaxRSSTask     AveRSS MaxPages MaxPagesNode   MaxPagesTask   AvePages     MinCPU MinCPUNode MinCPUTask     AveCPU   NTasks AveCPUFreq ReqCPUFreqMin ReqCPUFreqMax ReqCPUFreqGov ConsumedEnergy  MaxDiskRead MaxDiskReadNode MaxDiskReadTask  AveDiskRead MaxDiskWrite MaxDiskWriteNode MaxDiskWriteTask AveDiskWrite TRESUsageInAve TRESUsageInMax TRESUsageInMaxNode TRESUsageInMaxTask TRESUsageInMin TRESUsageInMinNode TRESUsageInMinTask TRESUsageInTot TRESUsageOutAve TRESUsageOutMax TRESUsageOutMaxNode TRESUsageOutMaxTask TRESUsageOutMin TRESUsageOutMinNode TRESUsageOutMinTask TRESUsageOutTot 
------------ ---------- -------------- -------------- ---------- ---------- ---------- ---------- ---------- -------- ------------ -------------- ---------- ---------- ---------- ---------- ---------- -------- ---------- ------------- ------------- ------------- -------------- ------------ --------------- --------------- ------------ ------------ ---------------- ---------------- ------------ -------------- -------------- ------------------ ------------------ -------------- ------------------ ------------------ -------------- --------------- --------------- ------------------- ------------------- --------------- ------------------- ------------------- --------------- 
56215666.0    96174344K           c114              0     25756K  17800432K       c114          0      7348K      195         c114              0          0  11:36.000       c114          0  00:00.000        1      2.59M       Unknown       Unknown       Unknown              0   4574691674            c114               0   4574691674   4231048201             c114                0   4231048201 cpu=00:00:00,+ cpu=00:11:36,+ cpu=c114,energy=c+ cpu=00:00:00,fs/d+ cpu=00:11:36,+ cpu=c114,energy=c+ cpu=00:00:00,fs/d+ cpu=00:00:00,+ energy=0,fs/di+ energy=0,fs/di+ energy=c114,fs/dis+           fs/disk=0 energy=0,fs/di+ energy=c114,fs/dis+           fs/disk=0 energy=0,fs/di+ 

SLURM agrees that the max RSS was 17.8 GB.
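
Given that passing -Xmx explicitly changes the behaviour, maybe the rule should wire the requested memory straight through to clumpify.sh instead of letting it size the JVM from the node's total memory. A sketch of what I mean (the rule name, paths, retry scaling and 0.85 headroom factor are all placeholders, not the current rule):

# Sketch only, not the current rule.
rule dedup_reads:
    input:
        r1 = "{sample}_R1.fastq.gz",
        r2 = "{sample}_R2.fastq.gz",
    output:
        r1 = "{sample}.1.fastq.gz",
        r2 = "{sample}.2.fastq.gz",
    resources:
        mem_mb = lambda wildcards, attempt: 32000 * attempt,
    params:
        # leave some headroom for non-heap JVM memory
        xmx = lambda wildcards, resources: int(resources.mem_mb * 0.85),
    shell:
        "clumpify.sh -Xmx{params.xmx}m "
        "in1={input.r1} in2={input.r2} "
        "out1={output.r1} out2={output.r2} "
        "dedupe=t ac=f subs=2 threads=1"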
