/
OpenCytoTutorial.Rmd
641 lines (545 loc) · 28.6 KB
/
OpenCytoTutorial.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
---
title: "OpenCyto Tutorial<br><font
size=6>Robust and Reproducible Automated Gating of Cytometry Data<br>BioC 2014</font>"
author: "Greg Finak, PhD<br>Staff Scientist<br>Vaccine and Infectious Disease Division,
Fred Hutchinson Cancer Research Center"
date: "July, 2014"
output:
ioslides_presentation:
fig_caption: yes
fig_retina: null
pandoc_args:
- -c
- mystyles.css
widescreen: yes
---
## What is OpenCyto? {.smaller}
**Not an algorithm, but a <emph>*framework*</emph> for automated gating.**
**Goals**
* Easily build <emph>*reproducible gating pipelines*</emph>.
* <emph>*Use any gating algorithm*</emph>
* interchange any algorithm at any step (support gating plugins)
* Simplify <emph>data handling and data management</emph>.
* Easily pass subsets of the data (cell subsets) to different gating algorithms.
* <emph>Simple(r) pipeline template definitions</emph>
* Pipeline defined via text file (csv)
* Templates and code are <emph>*re-usable*</emph> for standardized assays and data.
* <emph>Facilitate comparative analysis</emph>
* Import manually gated data from FlowJo workspaces
* <emph>Scale to *large* data sets </emph>
* HDF5 support - data sets limited by disk space not RAM.
<footer>Fred Hutchinson Cancer Research Center</footer>
## Overview {.smaller}
<p align="center">Raw data ➙ Preprocessing ➙ Annotation ➙ Gating ➙ Statistical analysis ➙ Output</p>
```{r framework_figure, echo=FALSE,fig.cap='The OpenCyto Gating Framework is a collection of R/BioConductor packages for easily building reproducible flow data analysis pipelines.',message=FALSE,fig.align='center',fig.width=8,cache=TRUE}
require(png)
require(grid)
require(gplots)
img<-readPNG("resources/Framework.png")
grid.raster(img)
```
<footer>Fred Hutchinson Cancer Research Center</footer>
## Getting Started | Installation
**Requirements: R + Bioconductor**
* Install *release version* of R from [CRAN](http://cran.r-project.org/).
* Install *release version* of BioConductor from [bioconductor.org/install](http://www.bioconductor.org/install/)
* Install OpenCyto and its dependencies
* Within R type the following:
```{r eval=FALSE}
require(BiocInstaller)
biocLite("openCyto")
```
This installs all the required packages.
<p class="smaller">
Still have problems?
[Bioconductor mailing list](https://stat.ethz.ch/mailman/listinfo/bioconductor)
Email: [Mike Jiang](wjiang2@fhcrc.org) or [Greg Finak](gfinak@fhcrc.org)
Twitter: [\@OpenCyto](http://www.twitter.com/OpenCyto)
</p>
<footer>Fred Hutchinson Cancer Research Center</footer>
## Getting Started II
*Alternately* if you are brave and want the latest bug fixes and features - *github.com/RGLab*
```{r eval=FALSE}
require(devtools)
packages<-c("RGLab/flowStats","RGLab/flowCore","RGLab/flowViz","RGLab/ncdfFlow","RGLab/flowWorkspace","RGLab/openCyto")
install_github(packages,quick=TRUE)
```
You may use the *devtools* package to install the latest stable versions directly from github.
## A Worked Example | Intracellular Cytokine Staining of Antigen-stimulated T-cells
* Full data set at Flowrepository.org [FR-FCM-ZZ7U](http://flowrepository.org/experiments/254/public_id)
* Batch 0882, 76 sample files, 13 compensation controls.
```{r libs,echo=FALSE,message=FALSE,warning=FALSE}
require(openCyto)
require(data.table)
require(reshape2)
require(clue)
require(ggplot2)
```
```{r echo=1,results='markup',results='hide'}
ws<-openWorkspace("data/workspace/080 batch 0882.xml")
ws
```
<pre>
FlowJo Workspace Version 2.0
File location: data/workspace
File name: 080 batch 0882.xml
Workspace is open.
Groups in Workspace
Name Num.Samples
1 All Samples 158
2 0882-L-080 157
3 Comps 13
4 0882 Samples 76
</pre>
<footer>Fred Hutchinson Cancer Research Center</footer>
## Import Manual Gating (parseWorkspace) {.smaller}
Create a gating set of manual gates.
```{r parseWorkspace_echo,echo=TRUE,eval=FALSE}
gating_set<-parseWorkspace(ws,name="0882 Samples",path="data/FCS/",isNcdf=TRUE)
```
```{r parseWorkspace,echo=FALSE,results='hide',eval=TRUE}
if(!file.exists("data/manual_gating/")){
gating_set<-parseWorkspace(ws,name="0882 Samples",path="data/FCS/",isNcdf=TRUE,overwrite=TRUE)
save_gs(gating_set,path="data/manual_gating")
}else{
gating_set<-load_gs("data/manual_gating/")
}
```
<pre>
Parsing 76 samples
calling c++ parser...
...
</pre>
We now have gated, compensated and transformed data in an <emph>*HDF5*</emph> file represented in a <emph>*GatingSet*</emph> object. We can save it for later use.
```{r eval=FALSE,echo=TRUE,results='hide'}
save_gs(gating_set,path="data/manual_gating")
```
<pre>saving ncdf...
saving tree object...
saving R object...
Done
To reload it, use 'load_gs' function
</pre>
The archived gating set contains all the information on <emph>transformation, compensation, single-cell events, and gates</emph> and can be *shared with collaborators*.
<footer>Fred Hutchinson Cancer Research Center</footer>
## Visualizing the Gating Layout (plotGate)
```{r clean_manual,echo=FALSE,results='hide',warning=FALSE,message=FALSE}
Rm("Not 4+",gating_set)
```
```{r vis_manual_gates,echo=TRUE,eval=TRUE,results='hide',cache=FALSE,message=FALSE,warning=FALSE,fig.align='center',fig.cap='Layout of manual gates'}
plotGate(gating_set[[1]],xbin=16,gpar=list(ncol=5)) # Binning for faster plotting
```
<footer>Fred Hutchinson Cancer Research Center</footer>
## Visualizing the Gating Tree (plot)
Calling `plot` on the gating set gives us a view of the gating tree.
```{r plot_manual_tree,echo=2,results="hide",message=FALSE,warning=FALSE,fig.cap=''}
plot(gating_set)
```
<footer>Fred Hutchinson Cancer Research Center</footer>
## Annotation {.smaller}
We <emph>annotate</emph> our gating set from the <emph>keywords</emph> and <emph>flowrepository</emph>. We'll keep only the <emph>GAG</emph> and <emph>negative control</emph> stimulations
```{r keyword,echo=c(2,3,4,8,9,10),eval=TRUE}
getKeywords<-function(gs,kv){
r<-as.data.frame(do.call(cbind,lapply(kv,function(k){
keyword(gs,k)[1]
})))
data.table::setnames(r,"$FIL","name")
r
}
keyword_vars<-c("$FIL","Stim","Sample Order","EXPERIMENT NAME") #relevant keywords
pd<-data.table(getKeywords(gating_set,keyword_vars)) #extract relevant keywords to a data table
annotations<-data.table:::fread("data/workspace/pd_submit.csv") # read the annotations from flowrepository
setnames(annotations,"File Name","name")
setkey(annotations,"name")
setkey(pd,"name")
pd<-data.frame(annotations[pd]) #data.table style merge
setnames(pd,c("Timepoint","Individual"),c("VISITNO","PTID"))
pData(gating_set)<-pd #annotate
gating_subset<-gating_set[subset(pd,!is.na(Condition))$name] #subset only the GAG and negctrl
```
```{r keywords_tbl,eval=TRUE,echo=FALSE,results='asis',cache=FALSE}
knitr::kable(head(subset(na.omit(pd)[,c(1:5)],PTID%in%"080-17"),4),row.names = FALSE)
```
<footer>Fred Hutchinson Cancer Research Center</footer>
## Clone and save for automated gating {.smaller}
We want to perform automated gating of this data.
* We'll clone the gating set, delete existing nodes and re-save the data as a new gating set.
```{r copy_and_save,eval=FALSE,message=FALSE,warining=FALSE,results='hide'}
auto_gating<-clone(gating_subset)
Rm("S",auto_gating)
save_gs(auto_gating,path="data/autogating",overwrite=TRUE)
```
```{r load_empty,eval=TRUE,echo=FALSE,results='hide',message=FALSE,warning=FALSE}
if(!file.exists("data/autogating")){
auto_gating<-clone(gating_set)
try(Rm("S",auto_gating))
save_gs(auto_gating,path="data/autogating",overwrite=TRUE)
}else{
auto_gating<-load_gs("data/autogating/")
}
```
```{r echo=TRUE,eval=TRUE}
list.files("data/autogating")
```
* `.nc` file is the HDF5 file of event-level data..
* `.dat` file contains the gating set representation from the `C` data structure.
* `.rds` file is an `R` data file that contains the R-object information.
Send it to a friend, `load_gs()` will read it all in and the data will be available.
<footer>Fred Hutchinson Cancer Research Center</footer>
## Costructing a Template - I {.smaller}
```{r read_template,echo=FALSE,results='asis',cache=FALSE}
template<-read.csv("data//template//gt_080.csv",as.is=FALSE,allowEscapes=TRUE,stringsAsFactors=FALSE)
template$collapseDataForGating[is.na(template$collapseDataForGating)]<-" "
Kmisc::htmlTable(data.frame(template)[1:8,c(1:6,8,9)],attr="width=100%")
```
<footer>Fred Hutchinson Cancer Research Center</footer>
## Costructing a Template - II {.smaller}
Each row defines a cell population
* `alias`: how we refer to the population / shorthand
* `pop`: The population definition i.e. do we keep the positive (+) or negative (-) cells for a marker / pair of markers after gating.
* `parent`: The alias of the parent population on which the current population is defined
* `dims`: The dimensions / markers used to define this cell population.
* `gating_method`: Which gating algorithm to use.
* `gating_args`: additional arguments passed to the gating method to tweak various parameters
* `collaseDataForGating`: TRUE or FALSE. Together with groupBy will gate multiple samples with a common gate.
* `groupBy`: Specify metadata variables for combining samples (e.g. PTID)
* `preprocessing_method`: advanced use for some gating methods
* `preprocessing_args`: additional arguments
<footer>Fred Hutchinson Cancer Research Center</footer>
## Constructing a Template - III
We read in the template and visualize it
```{r read_plot_template,echo=TRUE,message=FALSE,warning=FALSE,results='hide',fig.align="center",fig.width=8,fig.height=5,out.width=600,dpi=200,fig.cap=''}
gt<-gatingTemplate("data/template/gt_080.csv")
plot(gt)
```
<footer>Fred Hutchinson Cancer Research Center</footer>
## Automated Gating
openCyto walks through the template and gates each population in each sample using the algoirthm named in the template.
```{r gate_subset,echo=TRUE,message=FALSE,warning=FALSE,results='hide',eval=FALSE,fig.cap=''}
gating(x = gt, y = auto_gating)
## Some output..
plot(auto_gating)
```
```{r plot_autogate_tree,echo=FALSE,message=FALSE,warning=FALSE,fig.align='center',out.width=600,results='hide'}
if(length(getNodes(auto_gating))<1){
gating(x = gt, y = auto_gating)
save_gs(auto_gating,path="data/autogating",overwrite = TRUE)
}
setNode(auto_gating,"nonNeutro",FALSE)
setNode(auto_gating,"DebrisGate",FALSE)
setNode(auto_gating,"cd4/cd57-/gzb_gate",FALSE)
setNode(auto_gating,"cd4/cd57-/prf_gate",FALSE)
setNode(auto_gating,"cd8/cd57-/gzb_gate",FALSE)
setNode(auto_gating,"cd8/cd57-",FALSE)
setNode(auto_gating,"cd4/cd57-",FALSE)
setNode(auto_gating,"cd8/cd57-/prf_gate",FALSE)
setNode(auto_gating,"cd4neg",FALSE)
setNode(auto_gating,"cd4pos",FALSE)
setNode(auto_gating,"cd8+",FALSE)
setNode(auto_gating,"cd8-",FALSE)
plot(auto_gating)
```
<footer>Fred Hutchinson Cancer Research Center</footer>
## Automated Gating - II {.smaller}
```{r compare_to_manual,echo=TRUE,fig.cap='Automated and Manual Gates for CD4/CD8',cache=FALSE,fig.align='center',fig.width=5,fig.height=2.5,out.width=600}
p1<-plotGate(auto_gating[[1]],c("cd8","cd4"),arrange=FALSE,
projections=list("cd4"=c(x="CD8",y="CD4"),"cd8"=c(x="CD8",y="CD4")),
main="Automated Gate",path=2)[[1]]
p2<-plotGate(gating_subset[[1]],c("8+","4+"),
projections=list("4+"=c(x="CD8",y="CD4"),"8+"=c(x="CD8",y="CD4")),
arrange=FALSE,main="Manual Gate",path=2)[[1]]
grid.arrange(arrangeGrob(p1,p2,ncol=2))
```
Stats and gates are comparable, could be tweaked if necessary, but importantly it's reproducible. Always generate the same result.
<footer>Fred Hutchinson Cancer Research Center</footer>
## Extract Stats and Compare Manual to Automated{.smaller}
```{r extract_stats,echo=FALSE,message=FALSE, warning=FALSE,error=FALSE,results='hide',fig.cap='Comparison of Manual vs Automated Gating Cell Subset Counts',fig.width=12,fig.height=5,cache=FALSE,fig.align='center',out.width=800}
auto_stats<-getPopStats(auto_gating,statistic="count")
manual_stats<-getPopStats(gating_subset,statistic="count")
gates_to_remove<-rownames(manual_stats)[which(sapply(basename(rownames(manual_stats)),function(x)length(which(strsplit(x,"")[[1]]=="+"))>1))]
gates_to_remove<-c(gates_to_remove,rownames(manual_stats)[which(sapply(basename(rownames(manual_stats)),function(x)length(which(strsplit(x,"")[[1]]=="-"))>1))],"/S/Lv/L/Not 4+","/S/Lv/L/3+/8+/Granzyme B-","/S/Lv/L/3+/8+/IFN\\IL2","/S/Lv/L/3+/4+/IFN\\IL2","/S/Lv/L/3+/8+/IFN+IL2-","/S/Lv/L/3+/4+/IFN+IL2-","/S/Lv/L/3+/8+/IFN-IL2+", "/S/Lv/L/3+/4+/Granzyme B-","4+/IFN-IL2+","/S/Lv/L/3+/8+/57-", "/S/Lv/L/3+/4+/57-" )
for(i in gates_to_remove){
try(Rm(i,gating_set),silent=TRUE)
}
.getCounts<-function(stat="count"){
auto_stats<-getPopStats(auto_gating,statistic=stat)
manual_stats<-getPopStats(gating_subset,statistic=stat)
#combine
melted_auto<-melt(auto_stats)
melted_manual<-melt(manual_stats)
setnames(melted_manual,c("population","file","counts"))
setnames(melted_auto,c("population","file","counts"))
melted_auto<-subset(melted_auto,!population%in%c("boundary","root"))
melted_auto$population<-factor(melted_auto$population)
melted_manual<-subset(melted_manual,!population%in%c("4+/57-","4+/Granzyme B-","4+/IFN+IL2-","4+/IFN-IL2+","4+/IFN\\IL2","8+/57-","8+/Granzyme B-","8+/IFN+IL2-", "8+/IFN-IL2+" ,"8+/IFN\\IL2","root"))
melted_manual$population<-factor(melted_manual$population)
D<-adist((levels(melted_auto$population)),(levels(melted_manual$population)),ignore.case=TRUE,partial=FALSE)
D<-solve_LSAP(t(D))
data.frame(levels(melted_auto$population)[D],levels(melted_manual$population))
levels(melted_auto$population)[D]<-levels(melted_manual$population)
merged<-rbind(cbind(melted_auto,method="auto"),cbind(melted_manual,method="manual"))
merged
}
merged_counts<-.getCounts(stat="count")
merged_props<-.getCounts(stat="freq")
corr.coef<-cor(dcast(merged_counts,file+population~method,value.var = "counts")[,c("auto","manual")],use="complete")
corr.coef.freq<-cor(dcast(merged_props,file+population~method,value.var = "counts")[,c("auto","manual")],use="complete")
et<-element_text(size=20)
et2<-element_text(size=12)
p1<-ggplot(dcast(merged_counts,file+population~method,value.var = "counts"))+geom_point(aes(x=auto,y=manual,color=basename(as.character(population))))+scale_y_log10(Kmisc::wrap("Manual Gating Population Count",30))+scale_x_log10(Kmisc::wrap("Autogating Population Count",30))+theme_bw()+geom_abline(lty=3)+theme(axis.text.x=et,axis.text.y=et,axis.title.x=et,axis.title.y=et,legend.text=et2,legend.position="none",plot.title=et)+scale_color_discrete("Population")+geom_text(aes(x=10,y=100000,label=sprintf("rho=%s",signif(corr.coef[1,2],4))),data=data.frame(corr.coef))+ggtitle("Counts")
p2<-ggplot(dcast(merged_props,file+population~method,value.var = "counts"))+geom_point(aes(x=auto,y=manual,color=basename(as.character(population))))+scale_y_log10(Kmisc::wrap("Manual Gating Population Proportion",30))+scale_x_log10(Kmisc::wrap("Autogating Population Proportion",30))+theme_bw()+geom_abline(lty=3)+theme(axis.text.x=et,axis.text.y=et,axis.title.x=et,axis.title.y=et,legend.text=et2,legend.position="left",plot.title=et)+scale_color_discrete("Population")+geom_text(aes(x=0.01,y=0.75,label=sprintf("rho=%s",signif(corr.coef.freq[1,2],4))),data=data.frame(corr.coef.freq))+ggtitle("Proportions")
grid.draw(cbind(ggplotGrob(p1), ggplotGrob(p2), size="last"))
```
```{r extract_stats_echo,echo=TRUE,eval=FALSE}
#Extract stats
auto_stats<-getPopStats(auto_gating,statistic="count")
manual_stats<-getPopStats(gating_subset,statistic="count")
```
Note Perforin is incorrectly gated in the manual analysis. Minor differences at low end, but <emph>reproducible and objective</emph>.
<footer>Fred Hutchinson Cancer Research Center</footer>
## Perforin - Cytokine gate and reference gate{.smaller}
```{r example_Perforin,echo=FALSE,eval=TRUE,fig.cap='Automated and Manual Gating of Perforin/CD8',cache=FALSE,fig.align='center',fig.width=7.5,fig.height=3.5,out.width=500}
p1<-plotGate(auto_gating[[sampleNames(gating_subset)[7]]],"cd8/Prf",default.y="<Alexa 680-A>",xbin=256,margin=FALSE,path=2,arrange=FALSE,main="Automated",xlab=list(cex=1.5),ylab=list(cex=1.5),scales=list(cex=1.5),par.strip.text=list(cex=1.5))[[1]]
p2<-plotGate(gating_set[[sampleNames(gating_subset)[7]]],"8+/Perforin+",default.y="<Pacific Blue-A>",xbin=256,margin=FALSE,path=2,xlab=list(cex=1.5),ylab=list(cex=1.5),scales=list(cex=1.5),par.strip.text=list(cex=1.5),arrange=FALSE,main="Manual")[[1]]
grid.arrange(p1,p2,ncol=2)
```
```{r example_derivative_gate,echo=FALSE,eval=TRUE,fig.align='center',out.width=500,fig.width=7.5,fig.height=3.5}
fr<-getData(auto_gating[[7]],"cd8/cd57-")
cut<-tailgate(fr,channel="<APC-A>",filter_id="perforin_gate")
dens<-density(exprs(fr[,"<APC-A>"]),adjust=2)
second_deriv <- diff(sign(diff(dens$y)))
which_maxima <- which(second_deriv == -2) + 1
which_maxima <- which_maxima[findInterval(dens$x[which_maxima], range(exprs(fr[,"<APC-A>"]))) == 1]
which_maxima <- which_maxima[order(dens$y[which_maxima], decreasing = TRUE)]
peaks <- dens$x[which_maxima]
deriv_out <- openCyto:::.deriv_density(x = exprs(fr[,"<APC-A>"]), adjust = 2, deriv = 1)
par(mfrow=c(1,2))
plot(dens,main="Density")
abline(v=peaks)
abline(v=cut@min,col="red")
plot(deriv_out,type="l",main="First Derivative",xlab="X",ylab="Density")
abline(v=cut@min,col="red")
abline(v=peaks)
```
Automated gate set on CD57- (reference). Perforin-negative cells included in the manual gate.
<footer>Fred Hutchinson Cancer Research Center</footer>
## TNFa
```{r example_TNFa,echo=FALSE,eval=TRUE,cache=FALSE,fig.cap="Automated and manual gating of TFNa",fig.align='center'}
p1<-plotGate(auto_gating[[sampleNames(gating_subset)[7]]],c("cd8/TNFa","cd4/TNFa"),default.y="<PE Cy7-A>",xbin=256,margin=FALSE,path=2,arrange=FALSE,main="Automated",xlab=list(cex=1.1),ylab=list(cex=1.1),scales=list(cex=1.1),par.strip.text=list(cex=1.1))
p2<-plotGate(gating_set[[sampleNames(gating_subset)[7]]],c("8+/TNFa+","4+/TNFa+"),default.y="<Pacific Blue-A>",xbin=256,margin=FALSE,path=2,xlab=list(cex=1.1),ylab=list(cex=1.1),scales=list(cex=1.1),par.strip.text=list(cex=1.1),arrange=FALSE,main="Manual")
do.call(grid.arrange,c(p1,p2,ncol=2))
```
<footer>Fred Hutchinson Cancer Research Center</footer>
## Some Useful Functions {.smaller}
Return a `flowSet` containing <emph>event-level data for the named cell population</emph>.
```{r getData,echo=TRUE,eval=FALSE}
cd3_population<-getData(auto_gating,"cd3")
```
<emph>Plot a named cell population</emph>.
```{r plotGate,echo=TRUE,eval=FALSE}
plotGate(auto_gating,"cd3")
```
<emph>Subset by FCS file(s)</emph>.
```{r subset,echo=TRUE,eval=FALSE}
first_ten_fcs_files<-auto_gating[1:10]
```
List <emph>supported gating methods</emph> (that can be used in a template).
```{r listgtMethods,echo=TRUE,eval=FALSE}
listgtMethods()
```
Register a <emph>new gating or preprocessing plugin</emph>.
```{r register_plugin,echo=TRUE,eval=FALSE}
registerPlugins(myfunction,methodName,dependencies, "preprocessing"|"gating")
```
<footer>Fred Hutchinson Cancer Research Center</footer>
## Some More Useful Functions {.smaller}
<emph>Generate a Basic GatingTemplate from a Manual Gating Hierarchy</emph>.
```{r gen_template,echo=TRUE,eval=FALSE,results='asis'}
templateGen(gating_subset[[1]])
```
```{r gen_template_noecho,echo=FALSE,eval=TRUE,results='asis'}
hd<-head(data.frame(templateGen(gating_subset[[1]])),7)
hd[is.na(hd)]<-""
Kmisc::htmlTable(hd)
```
Just fill in the `gating_method` and `dims` to get started.
<footer>Fred Hutchinson Cancer Research Center</footer>
## Some Typical Gating Methods Use Cases
* `mindensity`: Finds the minimum density cut point between two primary populations. Can be restricted to a range of the data.
* `cytokine / tailgate`: Identifies rare populations that are in the tails of a large primary population. Estimates 2st derivative of the denstiy. Smoothing and tolerance can be adjusted.
* `flowClust`: 1D, 2D, or n-Dimensional clustering. Generally useful for lymphocytes. Can infer a data-driven empirical-Bayes prior across samples.
* `singletGate`: Fits a model that approximates a typical singlet gate on scatter area vs height or width.
* `boundary`: Filters out boundary events.
* `refGate`: A *reference* gate. Used to refer to a gate defined elsewhere in the hierarchy, the data-driven threshold can be reused. Similar to *"back-gating"*.
* `flowDensity`: Supported via plugin, density-based gating from our good friends at the BC Cancer Agency.
* Other methods: `rangeGate`, `quadrantGate`, `quantileGate`
<footer>www.biocondcutor.org</footer>
## Where we've used OpenCyto
<emph>Studies wtih 100s of GB of data.</emp>
<emph>None take more than a couple of hours to run.</emph>
* Gating of Lyoplate standardized staining panels (FlowCAP III)
* Many large clinical trial ICS data sets at the HVTN
* ICS data - MTB infected vs. healthy subjects.
* CyTOF combinatorial cytokine data
* Exhaustive gating of polyfunctional T-cell subsets.
Event-level data and cell population memberships can easily be shared with collaborators.
Quickly pushed to downstream analysis.
* MIMOSA (PMC3862207)
* COMPASS (http://rglab.github.org/COMPASS/)
<emph>We're usually pretty friendly and can help you get started</emph>
## Summary
* A <emph>framework</emph> for standardizing analysis pipelines.
* <emph>Flexible</emph> support of different gating approaches via <emph>plugins</emph>
* e.g <emph>flowDensity</emph> is supported.
* <emph>Re-usable templates and code</emph>
* Work is done up front for set up.
* Fully reproducible and <emph>*objective, data-driven gating*</emph>.
* <emph>OpenCyto simplifies</emph>
* Data import (raw data and/or manual gating from FlowJo workspace)
* Preprocessing
* Data manipulation (interacting with cell subsets)
* Plotting and visualization
* Extracting statistics for reports
* Downstream analysis with the full power of R's tools.
<footer>Fred Hutchinson Cancer Research Center</footer>
## Online Resources {.bigger}
* OpenCyto website: http://opencyto.org
* Documentation
* Reproducible examples and data
* Our Github repository: http://github.org/RGLab/OpenCyto
* These slides: http://gfinak.github.io/OpenCytoTutorial
* Code: http://www.github.com/gfinak/OpenCytoTutorial
* The BioConductor website: http://www.bioconductor.org
<footer>Fred Hutchinson Cancer Research Center</footer>
## Acknowledgements {.columns-2 .lessbigger}
<emph>*RGLab @ Fred Hutchinson Cancer Research Center*</emph>
**Raphael Gottardo**
**Mike Jiang**
**Jacob Frelinger**
*John Ramey*
<emph>*Collaborators*</emph>
**Steve De Rosa** @ HIV Vaccine Trials Network
**Adam Asare** @ Immune Tolerance Network
**Evan Newell** @ Singapore Immunology Network
**Mark Davis** @ Stanford
**Adam Triester** @ TreeStar Inc.
**Jay Almarode** @ TreeStar Inc.
<emph>*Funding* </emph>
National Institutes of Health
Human Immune Project Consortium (HIPC)
<footer>Fred Hutchinson Cancer Research Center</footer>
## Heatmap of Antigen-producing Cell Population Proportions
```{r heatmap,echo=FALSE,result='hide'}
auto_stats<-melt(getPopStats(auto_gating))
setnames(auto_stats,c("population","name","value"))
pd<-pData(auto_gating)
auto_stats<-merge(pd,auto_stats,by="name",all.y=TRUE)
M<-dcast(subset(auto_stats,(population%like%"cd4/.*$"|population%like%"cd8/.*$")&!Stim%in%c("CMV","sebctrl")&!population%like%c("cd57")&!population%like%"GzB"&!population%like%"Prf"),population~PTID+VISITNO+Stim,fun.aggregate=mean)
rownames(M)<-M[,1]
M<-as.matrix(M[,-1L])
COMPASS::pheatmap(t(M))
```
<footer>Fred Hutchinson Cancer Research Center</footer>
## Heatmap code
```{r heatmap_code,echo=TRUE,eval=FALSE}
#Get the autogating stats and combine with annotations on samples
auto_stats<-melt(getPopStats(auto_gating))
setnames(auto_stats,c("population","name","value"))
pd<-pData(auto_gating)
auto_stats<-merge(pd,auto_stats,by="name",all.y=TRUE)
#Some reshaping and filtering to include only CD4 and CD8 subsets
#of cytokine producing cells, and only Pol, Gag,
#and negative controls
M<-dcast(subset(auto_stats,(population%like%"cd4/.*$"|population%like%"cd8/.*$")&
!Stim%in%c("CMV","sebctrl")&!population%like%c("cd57")&!population%like%"GzB"&
!population%like%"Prf"),population~PTID+VISITNO+Stim,fun.aggregate=mean)
rownames(M)<-M[,1]
M<-as.matrix(M[,-1L])
#Heatmap
COMPASS::pheatmap(t(M))
```
<footer>Fred Hutchinson Cancer Research Center</footer>
## Constructing a Compensation Matrix From FCS Files {.smaller}
```{r comp_and_transform,echo=FALSE,eval=TRUE,warning=FALSE}
suppressPackageStartupMessages({
require(plyr)
require(flowCore)
})
#Get the compensation controls
compensation_controls<-subset(getSamples(ws),sampleID%in%subset(getSampleGroups(ws),groupName%in%"Comps")$sampleID)$name
#Read them into a flowSet
comp_controls<-read.flowSet(file.path("data","FCS",compensation_controls))[-3]
#Grab the annotations from the keywords
comp_annotation<-ldply(fsApply(comp_controls,function(x)na.omit(pData(parameters(x))[,c("name","desc")])))
#Rename the samples to the stained channel
sampleNames(comp_controls)<-c(comp_annotation$name,"Unst 1","Unst 2")
#Determine the order of the channels to reorder the files
ord<-order(match(comp_annotation$name,pData(parameters(comp_controls[[1]]))$name))
comp_controls<-comp_controls[c(ord,11)]
#Compute a spillover matrix, supplying the unstained control, the forward and side scatter, and a pattern for the channels to compensate
spill<-spillover(x=comp_controls,unstained="Unst 1",fsc="FSC-A",ssc="SSC-A",useNormFilt=TRUE,pregate=FALSE,stain_match="order",patt=paste(sampleNames(comp_controls)[-11],collapse="|"),method="mean")
#Build a compensation object
comp_object<-compensation(spill)
#Get a few sample files
sample_files<-sample(subset(getSamples(ws),sampleID%in%subset(getSampleGroups(ws),groupName%in%"0882 Samples")$sampleID)$name,5)
#Read them into a flowSet
fs<-read.flowSet(sample_files,path = "data/FCS")
#Compensate them
fs_comp<-compensate(fs,comp_object)
#Merge them into one flowFrame
fs_merge<-flowFrame(fsApply(fs_comp,exprs))
#Estimate the parameters of the logicle transformation based on the merged data
trans<-estimateLogicle(fs_merge,channels=sampleNames(comp_controls)[-11])
#Transform the data
fs_trans<-trans%on%fs
xyplot(`PE Tx RD-A`~`APC Cy7-A`,fs_trans,smooth=FALSE,xbin=32)
#Now read in all your sample files and compensate and transform them with the above
```
<footer>Fred Hutchinson Cancer Research Center</footer>
## Compensation step by step
```{r comp_step_by_step, eval=FALSE, echo=TRUE }
#Get the compensation controls
compensation_controls<-subset(getSamples(ws),sampleID%in%subset(getSampleGroups(ws),
groupName%in%"Comps")$sampleID)$name
#Read them into a flowSet
comp_controls<-read.flowSet(file.path("data","FCS",compensation_controls))[-3]
#Grab the annotations from the keywords
comp_annotation<-ldply(fsApply(comp_controls,function(x)na.omit(pData(parameters(x))[,c("name","desc")])))
#Rename the samples to the stained channel
sampleNames(comp_controls)<-c(comp_annotation$name,"Unst 1","Unst 2")
#Determine the order of the channels to reorder the files
ord<-order(match(comp_annotation$name,pData(parameters(comp_controls[[1]]))$name))
comp_controls<-comp_controls[c(ord,11)]
#Compute a spillover matrix, supplying the unstained control,
#the forward and side scatter, and a pattern for the channels to compensate
spill<-spillover(x=comp_controls,unstained="Unst 1",fsc="FSC-A",ssc="SSC-A",
useNormFilt=TRUE,pregate=FALSE,stain_match="order",
patt=paste(sampleNames(comp_controls)[-11],collapse="|"),method="mean")
#Build a compensation object
comp_object<-compensation(spill)
```
<footer>Fred Hutchinson Cancer Research Center</footer>
## Transformation step by step
```{r transformation_step_by_step,eval=FALSE,echo=TRUE}
#Get a few sample files
sample_files<-sample(subset(getSamples(ws),sampleID%in%subset(getSampleGroups(ws),
groupName%in%"0882 Samples")$sampleID)$name,5)
#Read them into a flowSet
fs<-read.flowSet(sample_files,path = "data/FCS")
#Compensate them
fs_comp<-compensate(fs,comp_object)
#Merge them into one flowFrame
fs_merge<-flowFrame(fsApply(fs_comp,exprs))
#Estimate the parameters of the logicle transformation based on the merged data
trans<-estimateLogicle(fs_merge,channels=sampleNames(comp_controls)[-11])
#Transform the data
fs_trans<-trans%on%fs
xyplot(`PE Tx RD-A`~`APC Cy7-A`,fs_trans,smooth=FALSE,xbin=32)
#Now read in all your sample files and compensate and transform them with the above
```
<footer>Fred Hutchinson Cancer Research Center</footer>
## Compensation and Transformation Summary
- Easier if you already have a compensation matrix
- Easiest if you're importing data from FlowJo, for example
- Scales of the transformed parameters will differ between FlowJo ("channel space"), and R (logicle scale).
- Template arguments will have to change depending on the above (i.e. gate range etc.)