1) AfterQC is slow 2) Aggregate results from many samples into a single report #18

gk-bioin4m8x · 2017-05-18T15:01:01Z

Hi,
I have been running 20 paired-end RNA-seq samples on a computer with 16 GB of RAM since yesterday (more than 24 hours so far), and only 7 samples have completed; the others are still running.
Is there any way to make it faster?

Secondly, is there any way to aggregate results from many samples into a single report? AfterQC is not in the list of tools supported by MultiQC [https://github.com/ewels/MultiQC].

Any suggestions would be appreciated!
Thanks!

alnf · 2017-05-18T15:04:44Z

I second this :) I had to run FastQC after AfterQC so that I could aggregate the results with MultiQC. Opening an issue on the MultiQC repo might motivate its authors to add AfterQC support.

sfchen · 2017-05-18T15:21:11Z

@gk-bioin4m8x Have you installed the editdistance module? Are you using PyPy or native Python?

alnf · 2017-05-18T15:24:52Z

@gk-bioin4m8x It's better to open separate issues for separate topics. I was replying only to the aggregation question.

gk-bioin4m8x · 2017-05-18T15:33:24Z

@sfchen
I am using AfterQC inside Cygwin (on 64-bit Windows 8.1). I did the following inside the AfterQC folder:

```bash
# editdistance
make editdistance    # displayed a message of successful installation

# bash script for multiple samples (only showing the syntax)
python after.py -d input_folder -g good_out -b bad_out
```

gk-bioin4m8x · 2017-05-18T15:37:16Z

@alnf
I was editing my issue and only saw your reply after I submitted the comment. I thought it might be slow because of the number of samples, so I combined the two issues.

sfchen · 2017-05-19T00:11:50Z

@gk-bioin4m8x How big are your samples? Did you run them concurrently, or one by one?

sfchen · 2017-05-19T00:13:51Z

@gk-bioin4m8x I see you ran it on the whole folder; that is correct. However, performance may be lower under Cygwin.

If you didn't see any editdistance warning when running AfterQC, then it is installed correctly.
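To confirm the module is importable without starting a full run, here is a minimal sketch. It assumes AfterQC imports the module under the name `editdistance`, and that this script is saved in and run from the AfterQC folder with the same interpreter used for `after.py` (Cygwin's `python`, or `pypy`).

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if `name` can be imported by the current interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Assumption: AfterQC looks for a top-level module called "editdistance".
    ok = module_available("editdistance")
    print(f"{sys.executable}: editdistance {'found' if ok else 'NOT found'}")
```

If it prints `NOT found` for the interpreter you use to run AfterQC, the compiled module is not visible to that interpreter.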

gk-bioin4m8x · 2017-05-19T07:23:00Z

@sfchen

Each sample is around 10-12 GB (R1 ~5-6 GB, R2 ~5-6 GB).
I ran them together via a bash script.
Yes, there was no warning while installing editdistance.
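Running all 20 samples at once on a 16 GB machine likely makes the jobs compete for memory and disk I/O. The sketch below caps how many AfterQC runs are in flight at a time. It reuses only the `-d`, `-g` and `-b` options shown earlier in this thread. The per-sample folder layout under `samples/`, the `qc_out/` output root and the `max_jobs` value are hypothetical and should be tuned.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List, Sequence


def afterqc_commands(sample_dirs: Sequence[Path], out_root: Path) -> List[List[str]]:
    """Build one AfterQC command per sample folder (hypothetical layout)."""
    cmds = []
    for d in sample_dirs:
        good = out_root / d.name / "good_out"
        bad = out_root / d.name / "bad_out"
        # sys.executable: run after.py under the same interpreter (e.g. pypy) as this script.
        cmds.append([sys.executable, "after.py", "-d", str(d), "-g", str(good), "-b", str(bad)])
    return cmds


def run_limited(cmds: Sequence[Sequence[str]], max_jobs: int = 2) -> List[int]:
    """Run commands with at most `max_jobs` at a time; return exit codes in input order."""
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        return list(pool.map(lambda c: subprocess.run(list(c)).returncode, cmds))


if __name__ == "__main__":
    # Assumption: run from the AfterQC folder, with each sample's R1/R2 files
    # in its own (hypothetical) folder under samples/.
    out_root = Path("qc_out")
    dirs = sorted(p for p in Path("samples").iterdir() if p.is_dir())
    for d in dirs:
        # Create output folders up front in case after.py does not create them.
        for sub in ("good_out", "bad_out"):
            (out_root / d.name / sub).mkdir(parents=True, exist_ok=True)
    codes = run_limited(afterqc_commands(dirs, out_root), max_jobs=2)
    print("failed:", [d.name for d, c in zip(dirs, codes) if c != 0])
```

Threads are enough here because each worker just waits on an external process.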

gk-bioin4m8x · 2017-05-25T14:34:15Z

@sfchen While AfterQC integration into MultiQC is in progress, do you recommend any other way to combine AfterQC output from multiple samples in the meantime?

sfchen · 2017-05-25T15:08:58Z

@gk-bioin4m8x How about a two-column framed homepage, where the left column contains links to the individual samples and the right column shows the QC report of the selected sample?
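Here is a minimal sketch of that two-column page. It assumes the per-sample AfterQC HTML reports sit in the same folder as the generated `index.html`, and the file names are hypothetical. Instead of the older `<frameset>` element, which HTML5 no longer supports, it uses a CSS flex layout with a named `<iframe>` as the right-hand column.

```python
import html
from pathlib import Path
from typing import Sequence


def build_index(report_files: Sequence[str], title: str = "AfterQC reports") -> str:
    """Return an HTML page: sample links on the left, the selected report on the right."""
    links = "\n".join(
        f'<li><a href="{html.escape(name, quote=True)}" target="report">'
        f"{html.escape(Path(name).stem)}</a></li>"
        for name in report_files
    )
    # Show the first report initially, or a blank pane if there are none.
    first = html.escape(report_files[0], quote=True) if report_files else "about:blank"
    return f"""<!DOCTYPE html>
<html><head><meta charset="utf-8"><title>{html.escape(title)}</title>
<style>
  body {{ margin: 0; display: flex; height: 100vh; font-family: sans-serif; }}
  nav {{ width: 220px; overflow-y: auto; border-right: 1px solid #ccc; }}
  iframe {{ flex: 1; border: 0; }}
</style></head>
<body>
<nav><ul>
{links}
</ul></nav>
<iframe name="report" src="{first}"></iframe>
</body></html>
"""


if __name__ == "__main__":
    # Assumption: run inside the folder holding the per-sample AfterQC reports.
    reports = sorted(p.name for p in Path(".").glob("*.html") if p.name != "index.html")
    Path("index.html").write_text(build_index(reports), encoding="utf-8")
```

Each link's `target="report"` loads that sample's report into the iframe, so the list on the left stays in place while browsing.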

gk-bioin4m8x · 2017-05-25T15:17:58Z

I have AfterQC output from 20 samples, i.e. 20 HTML files, each with the following sections:

1. AfterQC summary (for the General Stats table):
   - sequencing: pair end
   - estimated seq error: ...%
   - total reads: ...
   - filtered out reads: ... (...%)
   - total bases: ...
   - filtered out bases: ... (...%)
   - auto trimming front: ..., tail: ... (use -f0 -t0 to disable)
2. Good reads and bad reads after filtering (filtering statistics)
3. Sequencing error transform distribution
4. Pair Overlap length histogram
5. Read1 quality curve before filtering
6. Read1 base content distribution before filtering
7. Read1 GC curve before filtering
8. Read1 per base discontinuity before filtering
9. Read1 kmer strand bias before filtering
10. Read1 quality curve after filtering
11. Read1 base content distribution after filtering
12. Read1 GC curve after filtering
13. Read1 per base discontinuity after filtering
14. Read1 kmer strand bias after filtering
15. Read2 quality curve before filtering
16. Read2 base content distribution before filtering
17. Read2 GC curve before filtering
18. Read2 per base discontinuity before filtering
19. Read2 kmer strand bias before filtering
20. Read2 quality curve after filtering
21. Read2 base content distribution after filtering
22. Read2 GC curve after filtering
23. Read2 per base discontinuity after filtering
24. Read2 kmer strand bias after filtering

I am interested in:
(i) a common table with the summary-section columns for all samples (good for comparison);
(ii) a combined plot for each feature (2 to 24), i.e. 23 plots, each showing the aggregated results from all samples.
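For (i), here is a minimal sketch that merges the summary fields into one tab-separated table, with one row per sample and one column per field. It assumes the summary lines have already been extracted from each report as plain `key: value` text in the format listed above. Extracting them from AfterQC's HTML is not shown, and the sample names are hypothetical. Only the first colon on a line splits key from value, so a line like `auto trimming front: ..., tail: ...` keeps the tail part in the value.

```python
import csv
import io
from typing import Dict, List, Mapping


def parse_summary(text: str) -> Dict[str, str]:
    """Parse 'key: value' lines (e.g. 'total reads: 123') into a dict."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def summary_table(samples: Mapping[str, str]) -> str:
    """Return a TSV with one row per sample and one column per summary field."""
    parsed = {name: parse_summary(text) for name, text in samples.items()}
    columns: List[str] = []
    for fields in parsed.values():
        for key in fields:
            if key not in columns:  # keep first-seen field order
                columns.append(key)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample"] + columns)
    for name, fields in parsed.items():
        # Fields missing for a sample are left empty.
        writer.writerow([name] + [fields.get(c, "") for c in columns])
    return out.getvalue()
```

Usage: `print(summary_table({"S1": s1_text, "S2": s2_text}))`, where `s1_text` and `s2_text` are the extracted summary blocks. The resulting TSV opens directly in a spreadsheet for side-by-side comparison.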

sfchen · 2017-05-25T15:21:12Z

Got your idea. You mean each plot combines the results of all samples.
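One way to build such a combined plot is to line up each sample's curve by position (e.g. read cycle for a quality curve) into a single table with one column per sample, which a plotting tool can then draw as one multi-line chart. This is a minimal sketch, assuming the per-sample curves are already available as lists of numbers. Reading them out of the AfterQC reports is not shown, and the sample names are hypothetical. Column order follows the dict's insertion order, and shorter curves are padded with `None`.

```python
from typing import Dict, List, Optional, Sequence


def combine_curves(curves: Dict[str, Sequence[float]]) -> List[List[Optional[float]]]:
    """Return rows [position, value_sample1, value_sample2, ...] aligned by position."""
    names = list(curves)
    length = max((len(v) for v in curves.values()), default=0)
    rows: List[List[Optional[float]]] = []
    for pos in range(length):
        row: List[Optional[float]] = [pos + 1]  # 1-based position, e.g. read cycle
        for name in names:
            values = curves[name]
            # Pad samples whose curve is shorter than the longest one.
            row.append(values[pos] if pos < len(values) else None)
        rows.append(row)
    return rows
```

With matplotlib, for example, each sample's column can be passed to `plt.plot` on the same axes to get one combined figure per feature.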

gk-bioin4m8x · 2017-05-25T15:21:54Z

Yes, and it will be very helpful for comparisons. :)

sfchen · 2017-05-25T15:23:22Z

It seems a good idea, although it will take some effort to implement. I will figure out how to do it.

gk-bioin4m8x · 2017-05-25T15:25:29Z

Just curious: how long will it take? :P

sfchen · 2017-05-25T15:29:29Z

As I mentioned above, aggregating all results is not easy and needs more effort. I think it may take a couple of weeks, since I am also busy with other projects (e.g. MutScan).

gk-bioin4m8x · 2017-05-26T09:12:08Z

No problem, take your time. :)

sfchen · 2017-07-19T12:33:43Z

AfterQC should be much faster with PyPy now. Please try v0.9.4.

gk-bioin4m8x changed the title ~~Aggregate results from many samples into a single report~~ 1) AfterQC is slow 2) Aggregate results from many samples into a single report May 18, 2017
