<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="generator" content="pandoc" />
<meta http-equiv="X-UA-Compatible" content="IE=EDGE" />
<title>Week 7: Statistical Inference Revision</title>
<style type="text/css">
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
span.underline{text-decoration: underline;}
div.column{display: inline-block; vertical-align: top; width: 50%;}
div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
ul.task-list{list-style: none;}
</style>
<style type="text/css">code{white-space: pre;}</style>
<script type="text/javascript">
if (window.hljs) {
hljs.configure({languages: []});
hljs.initHighlightingOnLoad();
if (document.readyState && document.readyState === "complete") {
window.setTimeout(function() { hljs.initHighlighting(); }, 0);
}
}
</script>
<style type = "text/css">
.main-container {
max-width: 940px;
margin-left: auto;
margin-right: auto;
}
img {
max-width:100%;
}
.tabbed-pane {
padding-top: 12px;
}
.html-widget {
margin-bottom: 20px;
}
button.code-folding-btn:focus {
outline: none;
}
summary {
display: list-item;
}
pre code {
padding: 0;
}
</style>
<!-- tabsets -->
<style type="text/css">
.tabset-dropdown > .nav-tabs {
display: inline-table;
max-height: 500px;
min-height: 44px;
overflow-y: auto;
border: 1px solid #ddd;
border-radius: 4px;
}
.tabset-dropdown > .nav-tabs > li.active:before {
content: "";
font-family: 'Glyphicons Halflings';
display: inline-block;
padding: 10px;
border-right: 1px solid #ddd;
}
.tabset-dropdown > .nav-tabs.nav-tabs-open > li.active:before {
content: "";
border: none;
}
.tabset-dropdown > .nav-tabs.nav-tabs-open:before {
content: "";
font-family: 'Glyphicons Halflings';
display: inline-block;
padding: 10px;
border-right: 1px solid #ddd;
}
.tabset-dropdown > .nav-tabs > li.active {
display: block;
}
.tabset-dropdown > .nav-tabs > li > a,
.tabset-dropdown > .nav-tabs > li > a:focus,
.tabset-dropdown > .nav-tabs > li > a:hover {
border: none;
display: inline-block;
border-radius: 4px;
background-color: transparent;
}
.tabset-dropdown > .nav-tabs.nav-tabs-open > li {
display: block;
float: none;
}
.tabset-dropdown > .nav-tabs > li {
display: none;
}
</style>
<!-- code folding -->
<style type="text/css">
.code-folding-btn { margin-bottom: 4px; }
</style>
<style type="text/css">
#section-TOC {
margin: 25px 0px 20px 0px;
}
@media (max-width: 768px) {
#section-TOC {
position: relative;
width: 100%;
}
}
@media print {
.toc-content {
/* see https://github.com/w3c/csswg-drafts/issues/4434 */
float: right;
}
}
.toc-content {
padding-left: 30px;
padding-right: 40px;
}
div.main-container {
max-width: 1200px;
}
div.tocify {
width: 20%;
max-width: 260px;
max-height: 85%;
}
@media (min-width: 768px) and (max-width: 991px) {
div.tocify {
width: 25%;
}
}
@media (max-width: 767px) {
div.tocify {
width: 100%;
max-width: none;
}
}
.tocify ul, .tocify li {
line-height: 20px;
}
.tocify-subheader .tocify-item {
font-size: 0.90em;
}
.tocify .list-group-item {
border-radius: 0px;
}
</style>
</head>
<body>
<div class="container-fluid main-container">
<!-- setup 3col/9col grid for toc_float and main content -->
<div class="row">
<div class="col-sm-12 col-md-4 col-lg-3">
<div id="section-TOC" class="tocify">
</div>
</div>
<div class="toc-content col-sm-12 col-md-8 col-lg-9">
<div id="section-header">
<div class="btn-group pull-right float-right">
<button type="button" class="btn btn-default btn-xs btn-secondary btn-sm dropdown-toggle" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false"><span>Code</span> <span class="caret"></span></button>
<ul class="dropdown-menu dropdown-menu-right" style="min-width: 50px;">
<li><a id="rmd-show-all-code" href="#">Show All Code</a></li>
<li><a id="rmd-hide-all-code" href="#">Hide All Code</a></li>
</ul>
</div>
<h1 class="title toc-ignore">Week 7: Statistical Inference Revision</h1>
</div>
<!-- Make output window wider for R output so it doesn't split at columns at end -->
<p style="text-align: center;">
<font size="+2">James McBroom - August 2020 </font>
</p>
<p><br><br></p>
<div id="section-statistical-inference-refresher---basic-hypothesis-tests" class="section level1">
<h1>Statistical Inference Refresher - Basic Hypothesis Tests</h1>
<div id="section-introduction" class="section level2">
<h2>Introduction</h2>
<p>In this workshop we’ll be going over some of the basic tests used for statistical inference. All of this material is revision from any decent first year statistics course, so if you have done any introductory statistics at university before, crack out your old lecture notes or text books if you want more information than what is presented here.</p>
<div id="section-course-r-library" class="section level3">
<h3>Course R Library</h3>
<p>I have created a very small R package for the material in this part of the course. It simply enables you to access the data sets we use in the examples below. You have two options for installing these packages. I would highly suggest trying to install the binary version first. If that is unsuccessful for some reason, try installing the source package.</p>
<p>To install the binary package pick the one that corresponds to your operating system (xxx.zip for Windows; xxx.tgz for MacOS), download and save it to your computer. Then in RStudio go to the Tools menu item and select the “Install Packages” item from the drop-down menu. In the box that opens up, choose “Package Archive File” in the “Install from:” section. Then under “Package archive:” click the “Browse” button and navigate to where you saved the downloaded package archive. You can safely ignore the “Install to Library:” section and just click on the “Install” button. It should then install for you with no issues.</p>
<p>If for some reason there are issues with installing the binary for your OS, try downloading and saving the source package instead (xxx.tar.gz). Use the same process as described in the previous paragraph to install it. If you are still having issues, contact me (<a href="mailto:j.mcbroom@griffith.edu.au" class="email">j.mcbroom@griffith.edu.au</a>) as soon as possible.</p>
</div>
</div>
<div id="section-hypothesis-testing-the-general-process" class="section level2">
<h2>Hypothesis Testing – The General Process</h2>
<p>Do the sample data support the claim made by the researcher? In such situations there are two main types of question:</p>
<ol style="list-style-type: decimal">
<li>Question asked is:
<ul>
<li>Is the value (parameter) as proposed?</li>
<li>Is the proportion of males equal to 0.5?</li>
<li>Is the standard deviation of leaf area greater than 10% of its mean?</li>
<li>Is the maximum energy output greater than 10kw?</li>
<li>Is the mean dissolved oxygen (DO) in the Brisbane river below the critical level for fish survival?</li>
</ul></li>
<li>Question asked is:
<ul>
<li>Are the parameters the same for different groups/situations/etc?</li>
<li>Is the mean level of NOX (nitrogen oxide) in the atmosphere increasing – time 1 versus time 2?</li>
<li>Is a particular grass species more tolerant to pressure from foot traffic than another grass species?</li>
<li>Is the average house loan through a particular bank the same this year as at the same time last?</li>
</ul></li>
</ol>
</div>
<div id="section-the-10-steps-of-hypothesis-testing" class="section level2">
<h2>The 10 Steps of Hypothesis Testing</h2>
<ol style="list-style-type: decimal">
<li><p>Identify clearly the scientific problem and question.</p></li>
<li><p>From the identified question, clearly define the research hypothesis at issue.</p></li>
<li><p>Decide on the resources, required detectable difference and significance level.</p></li>
<li><p>Formulate the statistical hypotheses: null and alternative.</p></li>
<li><p>Determine the theoretical model - based on null hypothesis and assumptions.</p></li>
<li><p>Identify the test statistic, its null distribution, and the relevant critical region.</p></li>
<li><p>Obtain the sample data and calculate the sample test statistic.</p></li>
<li><p>Compare the sample test statistic with the null distribution using the critical region OR evaluate the p-value for the test.</p></li>
<li><p>Make statistical conclusion and interpret result in terms of original question.</p></li>
<li><p>Consider the possible errors - type I, type II.</p></li>
</ol>
<div id="section-the-scientific-problem-and-question" class="section level3">
<h3>1 The Scientific Problem and Question</h3>
<p>It is the duty of the researcher to identify and explain the problem being studied. If this is not carried out with care, improper, incorrect, and/or misleading conclusions may occur.</p>
</div>
<div id="section-the-research-hypothesis" class="section level3">
<h3>2 The Research Hypothesis</h3>
<ul>
<li>A specific belief about some feature of the population variable – eg a mean, proportion, range.</li>
<li>The feature will describe the variable in some way.</li>
<li>The feature must be measurable or observable (not necessarily quantitative).</li>
<li>Also known as a scientific hypothesis or an English hypothesis.</li>
<li>Refers to a situation, problem, question.</li>
</ul>
<p>Dictionary Definitions of the English word hypothesis</p>
<blockquote>
<p>Supposition made as basis for reasoning, without assumption of its truth, or as starting-point for investigation (The Concise Oxford Dictionary, 1975)</p>
</blockquote>
<blockquote>
<p>A proposition assumed as a premise in an argument; a proposition (or set of propositions) proposed as an explanation for the occurrence of some specified group of phenomena, either asserted or merely as a provisional conjecture to guide investigation (Macquarie Concise Dictionary, 1996)</p>
</blockquote>
<p>One of the most common problem areas in research design is inadequate clarification of the research hypothesis – it must be specific and unambiguous; it must be clear what is to be measured. What may seem obvious to the researcher at the time may be less than obvious to someone else, for example a research assistant actually collecting the data, and may be no longer obvious to anyone at a later date!</p>
<p><strong>Example:</strong></p>
<p>Decide whether each of the following is a good research hypothesis.</p>
<ol style="list-style-type: decimal">
<li><p>36% of Australian females between 15 and 24 years of age smoke cigarettes.</p></li>
<li><p>The probability that a cyclone first located in the Coral Sea will cross the Queensland coast is 0.20.</p></li>
<li><p>Budgerigars in inland Australia have a smaller range of body weights than do budgerigars on the coast.</p></li>
<li><p>The minimum temperature in Brisbane never goes below 0<span class="math inline">\(^{\circ}\)</span>C.</p></li>
<li><p>The average Mastercard debt is $600.</p></li>
<li><p>Toyota Corollas are better cars than Ford Lasers.</p></li>
<li><p>Most people eat meat.</p></li>
<li><p>OPs in Private Schools cover a smaller range than OPs in State Schools.</p></li>
<li><p>Five percent of women who take the contraceptive pill still fall pregnant.</p></li>
<li><p>The average level of hydrocarbon concentration in body tissues increases up the food chain indicating an accumulation process.</p></li>
<li><p>The noise levels from the freeway are above the maximum decibel level set by the Australian standards.</p></li>
</ol>
<p><strong>Difficulties in defining the research hypothesis</strong></p>
<p>The following are common difficulties encountered by researchers when they are attempting to define the research hypothesis:</p>
<ul>
<li>Identifying the problem of interest</li>
<li>Defining the population</li>
<li>Identifying the specific question which is being asked</li>
<li>Stating the specific belief</li>
</ul>
<p><strong>Remember, the feature describes the population variable</strong></p>
<p><strong>Example:</strong> Identify the variables, populations and research hypotheses for some of the examples given in the example above.</p>
</div>
<div id="section-resources-required-detectable-differences-significance-level-required" class="section level3">
<h3>3 Resources, Required Detectable Differences, Significance Level Required</h3>
<p><strong>Resources:</strong> The resources that are available for the study need to be assessed at the beginning of the project and compared with the resources required to achieve the desired aim. If the two are not compatible, proceeding with the research may be a complete waste of time and money. Statistical input can help with this process, and ‘clever’ designs may enable research that would otherwise not be possible.</p>
<p><strong>Detectable Differences:</strong> It is important to recognise the difference between ‘statistical difference’ and ‘observed difference’. For example, two sample means may have different values, but because of the variation associated with the measurement, it is not possible to say that they come from different populations – they are not statistically different. The researcher needs to think about the minimum difference he wishes to be able to detect – this will influence the size of the sample needed in the experimental design. It may also mean that the resources will not be sufficient; this will mean further thinking and maybe the decision not to go ahead with the study.</p>
<p><strong>The Significance Level:</strong> The chance the researcher is willing to take of incorrectly supporting the research hypothesis – usually designated by <span class="math inline">\(\alpha\)</span> (alpha).</p>
<ul>
<li>Traditionally the level is set at 0.05 or 0.01 – <strong>why?</strong></li>
<li>The level depends on the situation.</li>
<li>0.05 and 0.01 are like hair lengths: different people and/or problems require different reliabilities – be yourself!</li>
<li>It is the possible error if the conclusion is to reject the null hypothesis.</li>
</ul>
</div>
<div id="section-the-statistical-hypotheses" class="section level3">
<h3>4 The Statistical Hypotheses</h3>
<p><strong>The Alternative Hypothesis</strong> – <span class="math inline">\(H_1\)</span> or <span class="math inline">\(H_a\)</span></p>
<ul>
<li>The ‘research’ hypothesis – possibly reformulated in statistical jargon.</li>
<li>The ‘belief’ we want to prove true.</li>
<li>The opposite of the null hypothesis.</li>
<li>By disproving the null, we say we have ‘proved’ the alternative.</li>
<li>Usually represented as H1</li>
</ul>
<p><strong>The Null Hypothesis</strong> - <span class="math inline">\(H_0\)</span></p>
<ul>
<li>Restatement of the research hypothesis in a form that is testable – usually involves negation.</li>
<li>Expresses the belief about the feature describing the variable in a way that is testable.</li>
<li>There must be a known theoretical model relating to the distribution of the feature OR a way of obtaining an empirical null distribution (resampling or bootstrapping).</li>
<li>Is true if and only if the alternative is false. We can never prove it true.</li>
</ul>
<p><strong>Hypotheses are statements about the population not about the sample.</strong></p>
<p><strong><em>One and Two Tailed Hypotheses</em></strong></p>
<p>Where do the tails fit in?</p>
<p>Tails play a significant (pun intended!) role in statistical inference – the appropriate choice depends on the question being asked.</p>
<p><strong>Two Tailed:</strong><br />
Null contains: <em>equals</em></p>
<p>Alternative contains: <em>not equals</em></p>
<p><strong>One Tailed:</strong><br />
Null contains: <em>equals and greater than</em> <strong>OR</strong> <em>equals and less than</em></p>
<p>Alternative contains: <em>less than</em> <strong>OR</strong> <em>greater than</em></p>
<p><strong>Example:</strong></p>
<p>A comparative study is to be carried out on the populations of fiddler crabs in the Tweed River and the Brisbane River. One aspect to be studied is the weight of an adult crab, a component of interest to a potential marketing venture. Write the hypotheses for the following:</p>
<ol style="list-style-type: decimal">
<li><p>Belief: crabs in the Tweed River have a different weight from those in the Brisbane River.</p></li>
<li><p>Belief: crabs in the Tweed River weigh more than those in the Brisbane River.</p></li>
<li><p>Belief: crabs in the Tweed River weigh less than those in the Brisbane River.</p></li>
</ol>
</div>
<div id="section-theoretical-models-used-in-testing-hypotheses" class="section level3">
<h3>5 Theoretical Models used in Testing Hypotheses</h3>
<p>Theoretical models are used to specify the <strong><em>null distribution</em></strong>, that is, the distribution of the test statistic <strong>if the null hypothesis is true</strong>.</p>
<p>The model will depend on the measurement and on the feature of interest in the research hypothesis. For example:</p>
<ul>
<li>A study involves a series of Bernoulli trials; feature of interest is a count or proportion - the theoretical model will be the Binomial;</li>
<li>If a continuous measurement such as weight is to be investigated and the mean is of interest - the Normal or t- distribution may be an appropriate model;</li>
<li>If the aim is to test the goodness of fit of some data to a specified distribution - the chi-squared model could be used.</li>
</ul>
<p>The feature of interest is usually converted to a <em>test statistic</em> which has a known distribution, assuming the null hypothesis is true (the Null Distribution).</p>
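<p>As a minimal sketch (with arbitrary, assumed numbers), R supplies probability and quantile functions for each of these null distributions, which is how p-values and critical values are obtained in practice:</p>
<pre class="r"><code># Binomial null distribution: P(X >= 60) when X ~ Binomial(n = 100, p = 0.5)
pbinom(59, size = 100, prob = 0.5, lower.tail = FALSE)

# t null distribution: two-tailed critical value with 24 degrees of freedom, alpha = 0.05
qt(0.975, df = 24)

# Chi-squared null distribution: upper 5% critical value with 4 degrees of freedom
qchisq(0.95, df = 4)</code></pre>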
<p>All theoretical models involve assumptions. Violations of these assumptions may or may not have a dramatic effect on the outcome of any inference undertaken. If you are ever in any doubts regarding assumptions and your data, consult a statistician for advice.</p>
</div>
<div id="section-the-test-statistic-its-null-distribution-significance-level-and-critical-region" class="section level3">
<h3>6 The Test Statistic, its Null Distribution, Significance Level and Critical Region</h3>
<p><strong>The Test Statistic:</strong></p>
<ul>
<li><p>Usually a function of the ‘feature of interest’ and is known to have a particular distribution – this contributes to the ‘testability’ of the process.</p></li>
<li><p>Should be something that has meaning in the context of the feature of interest – if you want to determine if two things are different, you might decide to look at their absolute difference, and include some sort of weighting – a difference of two has more impact if the values are near 10 than if the values are near 1000.</p></li>
</ul>
<p>For example, when testing hypotheses about the population mean the equivalent Z score (or t- value if the standard deviation is estimated) becomes a test statistic.</p>
<p><strong>The null distribution</strong></p>
<ul>
<li>Is the probability distribution of the test statistic, assuming the null hypothesis is true.</li>
<li>If <span class="math inline">\(H_0\)</span> is true, this is the distribution we would expect the feature (or some expression based on it) to have.</li>
<li>The distribution for the population of ‘feature values’ if <span class="math inline">\(H_0\)</span> is true – eg, the distribution of the sample mean.</li>
</ul>
<p><strong>Significance Level – alpha, <span class="math inline">\(\alpha\)</span></strong></p>
<p>The risk you are willing to take that you will reject the null hypothesis when it is really true. The probability of a Type I error. It defines the ‘cut off’ point for the test statistic.</p>
<p><strong>Critical Region</strong></p>
<ul>
<li>Determined by the specified significance level, <span class="math inline">\(\alpha\)</span></li>
<li>The region of the null distribution where it is considered unlikely for a value of the test statistic to occur.</li>
<li>If sample value lies here, it is regarded as evidence to reject <span class="math inline">\(H_0\)</span> in favour of <span class="math inline">\(H_1\)</span>.</li>
</ul>
<p><strong><em>The relationships of the test statistic to the sample and population are critical.</em></strong></p>
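<p>To make the idea concrete, here is a minimal sketch of a two-tailed critical region for a <span class="math inline">\(t\)</span> test statistic; the significance level, degrees of freedom and sample test statistic below are assumed values used purely for illustration:</p>
<pre class="r"><code>alpha <- 0.05
nu    <- 20                            # assumed error degrees of freedom
crit  <- qt(1 - alpha / 2, df = nu)    # upper critical value; the lower one is -crit
crit

t.obs <- 2.45                          # hypothetical sample test statistic
abs(t.obs) > crit                      # TRUE means t.obs lies in the critical region</code></pre>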
</div>
<div id="section-sample-collection-and-calculation-of-sample-test-statistic" class="section level3">
<h3>7 Sample Collection and Calculation of Sample Test Statistic</h3>
<p>Ways of selecting the sample are discussed at length in various introductory texts. In general, samples should be random and representative of the population they are taken from. The test statistic is calculated as per the definition of whatever ‘meaningful’ feature has been selected, given the question asked and the available data – eg a count or a mean or a sum of deviations or …</p>
</div>
<div id="section-comparison-of-sample-test-statistic-with-null-distribution" class="section level3">
<h3>8 Comparison of Sample Test Statistic with Null Distribution</h3>
<ul>
<li>The sample test statistic is calculated from the observed data and compared with the null distribution which reflects the population if <span class="math inline">\(H_0\)</span> is true.</li>
<li>If the sample test statistic lies in the ‘critical region’ the null hypothesis is rejected at the specified level of significance.</li>
<li>If it does not lie in the critical region the null hypothesis is not rejected – the data do not provide evidence to reject the null hypothesis in favour of the research (alternative) hypothesis.</li>
</ul>
<p><strong><em>The p-Value of a Test</em></strong></p>
<ul>
<li>Probability of observing a value of the test statistic <strong>as extreme as, or more extreme than</strong>, that seen in the sample.</li>
<li>Calculated from the null distribution.</li>
<li>Called the p-value for the sample test statistic</li>
<li>Is the probability of selecting a sample <strong>at least as favourable to the research hypothesis</strong> (alternative) as the observed sample.</li>
<li>It represents the <strong>attained level of significance</strong> for the test.</li>
</ul>
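<p>A small sketch of how a p-value is evaluated from a <span class="math inline">\(t\)</span> null distribution (the test statistic and degrees of freedom below are hypothetical):</p>
<pre class="r"><code>t.obs <- 2.45   # hypothetical sample test statistic
nu    <- 20     # assumed degrees of freedom

# Two-tailed p-value: probability of a value at least as extreme as t.obs
2 * pt(abs(t.obs), df = nu, lower.tail = FALSE)

# One-tailed (upper) p-value
pt(t.obs, df = nu, lower.tail = FALSE)</code></pre>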
</div>
<div id="section-conclusion-and-interpretation" class="section level3">
<h3>9 Conclusion and Interpretation</h3>
<ul>
<li>Depends on whether we reject or fail to reject the null hypothesis.</li>
<li>Remember, <strong>failing to reject the null hypothesis does not mean the null hypothesis is true</strong></li>
</ul>
</div>
<div id="section-consider-possible-errors" class="section level3">
<h3>10 Consider Possible Errors:</h3>
<p>Two basic types of error can occur whenever hypothesis testing is carried out. These are summarised in the following table:</p>
<table>
<thead>
<tr class="header">
<th align="left"></th>
<th align="left"></th>
<th align="center">TRUE</th>
<th align="center">SITUATION</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="left"></td>
<td align="left"></td>
<td align="center"><span class="math inline">\(H_0\)</span> is True</td>
<td align="center"><span class="math inline">\(H_0\)</span> is False</td>
</tr>
<tr class="even">
<td align="left"><strong>TEST</strong></td>
<td align="left">Fail to Reject <span class="math inline">\(H_0\)</span></td>
<td align="center">Correct</td>
<td align="center">Type II Error (<span class="math inline">\(\beta\)</span>)</td>
</tr>
<tr class="odd">
<td align="left"><strong>CONCLUSION</strong></td>
<td align="left">Reject <span class="math inline">\(H_0\)</span></td>
<td align="center">Type I Error (<span class="math inline">\(\alpha\)</span>)</td>
<td align="center">Correct</td>
</tr>
</tbody>
</table>
<p>The <strong>LEVEL OF SIGNIFICANCE</strong> is the probability of making a Type I error and is under the control of the person carrying out the statistical test. The symbol used is <span class="math inline">\(\alpha\)</span> (alpha).</p>
<p>The <strong>PROBABILITY OF A TYPE II ERROR</strong> depends on the true alternative hypothesis (and several other things) and is thus usually unknown. The symbol used is <span class="math inline">\(\beta\)</span> (beta).</p>
<p><strong>Power of a Statistical Test</strong></p>
<ul>
<li>The power of a statistical test is the probability of correctly rejecting the null hypothesis.</li>
<li>The probability of correctly detecting a valid alternative hypothesis.</li>
<li>Power is calculated as one minus the probability of a Type II error. Power = 1 - <span class="math inline">\(\beta\)</span></li>
<li>A test with low power results in a higher chance of not rejecting the null hypothesis when it should in fact be rejected.</li>
</ul>
<p>For example, if we conclude that the null hypothesis: <em>equal numbers of males and females</em> cannot be rejected, then it may be that the test of proportion being used has a low power and we are simply not detecting the actual difference.</p>
<p>This may be a case of no <em>statistical difference</em> when there is a <em>meaningful real difference</em>.</p>
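<p>As an illustration of how power can be assessed, base R provides <code>power.t.test()</code> (and <code>power.prop.test()</code> for comparing proportions); the effect size, standard deviation and sample sizes below are assumptions chosen purely for illustration:</p>
<pre class="r"><code># Power of a two-sample t-test to detect a true difference of 5 units
# (common sd = 10) with 20 observations per group, testing at the 5% level
power.t.test(n = 20, delta = 5, sd = 10, sig.level = 0.05)

# Sample size per group needed to reach 80% power against the same difference
power.t.test(delta = 5, sd = 10, sig.level = 0.05, power = 0.80)</code></pre>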
<p><strong><em>Note: It is also possible to find a statistically significant difference that is not a scientifically significant or meaningful effect. Being a slave to p-values can lead you into trouble - there is no substitute for common sense and scientific knowledge. You should always ask yourself: “Does this result make sense?”</em></strong></p>
</div>
</div>
</div>
<div id="section-hypothesis-tests-specific-tests" class="section level1">
<h1>Hypothesis Tests: Specific Tests</h1>
<div id="section-chi2-test-of-independence---two-way-contingency-table" class="section level2">
<h2>1. <span class="math inline">\(\chi^2\)</span> Test of Independence - Two-Way Contingency Table</h2>
<p>The <span class="math inline">\(\chi^2\)</span> test of independence is a form of the goodness of fit test and occurs when the question raised concerns whether or not two categorical variables are independent of each other. For example,</p>
<ul>
<li>are hair colour and eye colour independent of each other?</li>
<li>Are sex and dexterity (right or left handedness) independent of each other?</li>
<li>Is the incidence of asthma in children related to the use of mosquito coils?</li>
<li>Does the type of credit card preferred depend on the sex of the customer?</li>
<li>Do males prefer Android phones and females prefer iPhones?</li>
</ul>
<p>You will recall that if two events, <span class="math inline">\(A\)</span> and <span class="math inline">\(B\)</span> are independent of each other, the probability that <span class="math inline">\(A\)</span> and <span class="math inline">\(B\)</span> occur simultaneously is the product of their marginal probabilities:</p>
<p><span class="math display">\[\begin{equation}
P(A \& B) = P(A)P(B) \label{eq:indep} \tag{1}
\end{equation}\]</span></p>
<p>This is the basis of the Test of Independence. We start with the hypotheses:</p>
<p><span class="math display">\[
\begin{align*}
H_0: & \text{ Variable A is independent of Variable B} \\
H_1: & \text{ Variable A and Variable B are dependent}
\end{align*}
\]</span></p>
<p>We then create a two-way contingency table - the rows are the categories of one of the variables, the columns are the categories of the other variable. If Variable A has <span class="math inline">\(a\)</span> categories and Variable B has <span class="math inline">\(b\)</span> categories, the table will contain <span class="math inline">\(a \times b\)</span> cells, and each cell will contain a count of how many observations fall into that particular combination of A and B. These are the <em>observed cell frequencies</em>, <span class="math inline">\(O_{ij}\)</span> for <span class="math inline">\(i=1, \dots, a\)</span> and <span class="math inline">\(j = 1, \ldots, b\)</span>.</p>
<p>We then calculate, for each cell, the <em>expected cell frequencies</em>, <span class="math inline">\(E_{ij}\)</span>, <span class="math inline">\(i=1, \dots, a\)</span>, <span class="math inline">\(j = 1, \ldots, b\)</span>, <strong><em>assuming the null hypothesis to be true</em></strong>, that is, assuming Variables A and B are independent of each other. We do this by using the definition of independence as shown in Equation <span class="math inline">\(\eqref{eq:indep}\)</span>. These expected values are calculated using:</p>
<p><span class="math display">\[\begin{equation}
E_{ij} = \frac{T_i \times T_j}{T} \label{eq:expecteds} \tag{2}
\end{equation}\]</span></p>
<p>where <span class="math inline">\(T_i\)</span> denotes the total of the <span class="math inline">\(i^{\text{th}}\)</span> row, <span class="math inline">\(T_j\)</span> denotes the total of the <span class="math inline">\(j^{\text{th}}\)</span> column, and <span class="math inline">\(T\)</span> denotes the total across all cells (the total sample size, <span class="math inline">\(n\)</span>).</p>
<p>Once the expected values have been calculated for each cell we compare them to the observed values using the test statistic:</p>
<p><span class="math display">\[\begin{equation}
T = \sum_{i = 1}^a \sum_{j = 1}^b \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
\label{eq:chisq-test} \tag{3}
\end{equation}\]</span></p>
<p>It can be shown that, <a href="#section-conditions">under certain conditions</a>, the statistic defined in equation <span class="math inline">\(\eqref{eq:chisq-test}\)</span> has a <span class="math inline">\(\chi^2_{(a-1) \times (b-1)}\)</span> distribution if the null hypothesis is true.</p>
<div id="section-example" class="section level3">
<h3>Example:</h3>
<p>One hundred students were selected at random from the ESC School first year Statistics course and their hair and eye colours recorded. These values have then been summarised into a two-way table as follows:</p>
<table>
<thead>
<tr class="header">
<th align="center"></th>
<th align="center"><strong>Hair</strong></th>
<th align="center"></th>
<th align="center"><strong>Colour</strong></th>
<th align="center"></th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="center"><strong>Eye Colour</strong></td>
<td align="center"><strong>Brown/Black</strong></td>
<td align="center"><strong>Blonde</strong></td>
<td align="center"><strong>Red</strong></td>
<td align="center"><strong>Total (Eye)</strong></td>
</tr>
<tr class="even">
<td align="center"><strong>Blue</strong></td>
<td align="center">5</td>
<td align="center">12</td>
<td align="center">1</td>
<td align="center"><strong>18</strong></td>
</tr>
<tr class="odd">
<td align="center"><strong>Green/Hazel</strong></td>
<td align="center">25</td>
<td align="center">2</td>
<td align="center">8</td>
<td align="center"><strong>35</strong></td>
</tr>
<tr class="even">
<td align="center"><strong>Brown</strong></td>
<td align="center">40</td>
<td align="center">6</td>
<td align="center">1</td>
<td align="center"><strong>47</strong></td>
</tr>
<tr class="odd">
<td align="center"><strong>Total (hair)</strong></td>
<td align="center"><strong>70</strong></td>
<td align="center"><strong>20</strong></td>
<td align="center"><strong>10</strong></td>
<td align="center"><strong>100</strong></td>
</tr>
</tbody>
</table>
<p>Do these data support or refute the belief that a person’s hair and eye colours are independent?</p>
<p><span class="math display">\[\begin{align*}
H_0: & \text{ Hair colour is independent of eye colour} \\
H_1: & \text{ Hair and eye colour are dependent}
\end{align*}\]</span></p>
<p>We calculate the expected values for this table using Equation <span class="math inline">\(\eqref{eq:expecteds}\)</span>. They are shown in the table below in <em>italics</em>:</p>
<table>
<thead>
<tr class="header">
<th align="center"></th>
<th align="center"><strong>Hair</strong></th>
<th align="center"></th>
<th align="center"><strong>Colour</strong></th>
<th align="center"></th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="center"><strong>Eye Colour</strong></td>
<td align="center"><strong>Brown/Black</strong></td>
<td align="center"><strong>Blonde</strong></td>
<td align="center"><strong>Red</strong></td>
<td align="center"><strong>Total (Eye)</strong></td>
</tr>
<tr class="even">
<td align="center"><strong>Blue</strong></td>
<td align="center">5 <em>12.6</em></td>
<td align="center">12 <em>3.6</em></td>
<td align="center">1 <em>1.8</em></td>
<td align="center"><strong>18</strong></td>
</tr>
<tr class="odd">
<td align="center"><strong>Green/Hazel</strong></td>
<td align="center">25 <em>24.5</em></td>
<td align="center">2 <em>7</em></td>
<td align="center">8 <em>3.5</em></td>
<td align="center"><strong>35</strong></td>
</tr>
<tr class="even">
<td align="center"><strong>Brown</strong></td>
<td align="center">40 <em>32.9</em></td>
<td align="center">6 <em>9.4</em></td>
<td align="center">1 <em>4.7</em></td>
<td align="center"><strong>47</strong></td>
</tr>
<tr class="odd">
<td align="center"><strong>Total (hair)</strong></td>
<td align="center"><strong>70</strong></td>
<td align="center"><strong>20</strong></td>
<td align="center"><strong>10</strong></td>
<td align="center"><strong>100</strong></td>
</tr>
</tbody>
</table>
<p>Using Equation <span class="math inline">\(\eqref{eq:chisq-test}\)</span>, we calculate the test statistic as:</p>
<p><span class="math display">\[\begin{align*}
T & = \sum_{i = 1}^3 \sum_{j = 1}^3 \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \\
& = \frac{(5 - 12.6)^2}{12.6} + \frac{(12 - 3.6)^2}{3.6} + \ldots + \frac{(1 - 4.7)^2}{4.7} \\
& = 39.582.
\end{align*}\]</span></p>
<p>If the assumptions of the test hold (see section below on <a href="#section-conditions">assumptions</a>), this is a realisation from a <span class="math inline">\(\chi^2_{4}\)</span> distribution, if the null is true. (Note that the number of cells with expected values < 5 indicates the approximation is not a good one in this case; however, in the interests of explanation we will continue with the example as if it were fine.) Using an <span class="math inline">\(\alpha = 0.05\)</span> level of significance, <span class="math inline">\(\chi^2_4(0.95) = 9.49\)</span> from tables.</p>
<p>Since our <span class="math inline">\(T = 39.582\)</span> lies further in the tail than the critical value of 9.49, we reject the null hypothesis in favour of the alternative and conclude that, at the 5% level of significance, hair and eye colour are dependent.</p>
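<p>As a check on the hand calculation, the table can be re-created as a matrix in R and the same quantities computed (this is a sketch only; the <code>chisq.test()</code> call at the end repeats the calculation and will warn about the small expected counts):</p>
<pre class="r"><code>obs <- matrix(c( 5, 12, 1,
                25,  2, 8,
                40,  6, 1),
              nrow = 3, byrow = TRUE,
              dimnames = list(Eye  = c("Blue", "Green/Hazel", "Brown"),
                              Hair = c("Brown/Black", "Blonde", "Red")))

# Expected counts under independence: row total x column total / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Test statistic and comparison with the chi-squared(4) null distribution
T.stat <- sum((obs - expected)^2 / expected)   # approx. 39.58
qchisq(0.95, df = 4)                           # critical value, approx. 9.49
pchisq(T.stat, df = 4, lower.tail = FALSE)     # p-value

chisq.test(obs)   # the same test using the built-in function</code></pre>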
<p><a id="section-conditions"></a></p>
</div>
<div id="section-assumptions-of-the-test-of-independence" class="section level3">
<h3>Assumptions of the Test of Independence</h3>
<ul>
<li><p>Cells should contain frequencies (counts);</p></li>
<li><p>Variable categories are <em>mutually exclusive</em>. Individuals cannot belong to more than one category;</p></li>
<li><p>Individuals can only contribute to one cell: the sum of the cells must equal the total number of individuals measured. This is related to the second dot point above;</p></li>
<li><p>Individuals must be independent of each other;</p></li>
<li><p>Variables must be categorical - usually nominal though can be ordinal, or interval or ratio that have been “categorised”. However, note that in these latter cases there is usually an associated loss of information and it is therefore not recommended that you categorise numerical data just to fit it into a test of independence - there are many other more suitable analyses available for numerical data.</p></li>
<li><p>The result that the test statistic, Equation <span class="math inline">\(\eqref{eq:chisq-test}\)</span>, has a <span class="math inline">\(\chi^2\)</span> distribution is approximate. There are several different versions of the rules of thumb needed for this approximation to be reasonable, and they usually boil down to the proportion of cells with expected frequencies greater than 5. For example, the <code>chisq.test()</code> function in <code>R</code> issues a warning if any expected cell frequency is less than 5. Other sources suggest that at least 80% of cells should have expected counts of 5 or more, with no expected counts less than 3. Essentially the result is <em>asymptotic</em>, and therefore the sample size needs to be large enough for the finite-sample approximation to hold to an acceptable degree.</p></li>
</ul>
</div>
<div id="section-using-r" class="section level3">
<h3>Using R:</h3>
<p>The function <code>chisq.test</code> can be used to do a <span class="math inline">\(\chi^2\)</span> test of independence. It expects a table object (or a matrix representing a two-way contingency table), or you can also enter the names of the two factor variables if the data are in non-aggregated form. Note that this function does other things as well, none of which are within the scope or level of the material we are looking at.</p>
<p>To demonstrate its use, let’s look at the hair and eye colour data again.</p>
<p>The hair and eye colour data is contained in the <code>SciDatAnalysis</code> library you (hopefully) successfully installed following the instructions at the beginning of these notes. Use this library in the usual way.</p>
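<p>That is, once installed, attach the library before trying to access the data:</p>
<pre class="r"><code>library(SciDatAnalysis)</code></pre>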
<p>If you want to see information about the hair and eye colour data, use the help facility:</p>
<div class="tutorial-exercise" data-label="he-dat" data-caption="Code" data-completion="1" data-diagnostics="1" data-startover="1" data-lines="0">
<pre class="text"><code>?hair.eyes</code></pre>
<script type="application/json" data-opts-chunk="1">{"fig.width":7,"fig.height":5,"fig.retina":2,"fig.align":"default","fig.keep":"high","fig.show":"asis","out.width":672,"warning":true,"error":false,"message":true,"exercise.df_print":"default","exercise.checker":"NULL"}</script>
</div>
<p>Do hair and eye colour depend on each other?</p>
<div class="tutorial-exercise" data-label="hetest" data-caption="Code" data-completion="1" data-diagnostics="1" data-startover="1" data-lines="0">
<pre class="text"><code>hair.eye <- chisq.test(hair.eyes$eye, hair.eyes$hair)
hair.eye$expected
hair.eye</code></pre>
<script type="application/json" data-opts-chunk="1">{"fig.width":7,"fig.height":5,"fig.retina":2,"fig.align":"default","fig.keep":"high","fig.show":"asis","out.width":672,"warning":true,"error":false,"message":true,"exercise.df_print":"default","exercise.checker":"NULL"}</script>
</div>
<p>We need to heed all warnings.</p>
<p>Four of the nine cells contain expected cell counts less than 5. We should be wary of the results we obtain from this analysis. For the sake of the example, let’s look at the results anyway.</p>
<p>The p-value is much smaller than the level of significance, 0.05. We reject the null in favour of the alternative, and conclude that hair colour and eye colour are not independent of each other (but note the warning above).</p>
</div>
<div id="section-excercise" class="section level3">
<h3>Exercise:</h3>
<ol style="list-style-type: decimal">
<li>The <code>SciDatAnalysis</code> library contains a data set called <code>survey</code>. Examine the <code>survey</code> data. You might want to start by accessing the help on the data for a description of the variables etc.</li>
</ol>
<div class="tutorial-exercise" data-label="survhlp" data-caption="Code" data-completion="1" data-diagnostics="1" data-startover="1" data-lines="0">
<script type="application/json" data-opts-chunk="1">{"fig.width":7,"fig.height":5,"fig.retina":2,"fig.align":"default","fig.keep":"high","fig.show":"asis","out.width":672,"warning":true,"error":false,"message":true,"exercise.df_print":"default","exercise.checker":"NULL"}</script>
</div>
<ol start="2" style="list-style-type: decimal">
<li>Examine the relationship between sex (<code>Sex</code>) and smoking status (<code>Smoke</code>). Try faceting with a barplot using both the raw counts and proportions:</li>
</ol>
<div class="tutorial-exercise" data-label="facbarsurvprop" data-caption="Code" data-completion="1" data-diagnostics="1" data-startover="1" data-lines="0">
<pre class="text"><code>library(tidyverse)
# Make a new dataset with counts and percentages. Note there are NAs in the data,
# so filter them out first to match chisq.test().
surv.prop <- survey %>%
filter(Sex != "NA", Smoke != "NA") %>%
count(Sex, Smoke) %>%
group_by(Sex) %>%
mutate(proportion = n / sum(n)) %>%
ungroup()
surv.prop
# Faceted Barplot - raw counts
ggplot(surv.prop, aes(x = Smoke, y = n)) +
geom_bar(stat = "identity", position = "dodge") +
geom_text(aes(label = n), vjust = 1.5, color = "white",
size = 3.5) +
labs(title = "Smoking Status by Sex",
x = "Smoking Status",
y = "Frequency") +
facet_wrap(~Sex) +
theme_bw() +
theme(panel.grid = element_blank()) </code></pre>
<script type="application/json" data-opts-chunk="1">{"fig.width":7,"fig.height":5,"fig.retina":2,"fig.align":"default","fig.keep":"high","fig.show":"asis","out.width":672,"warning":true,"error":false,"message":true,"exercise.df_print":"default","exercise.checker":"NULL"}</script>
</div>
<p><strong><em>Exercise:</em></strong> Try modifying the above code to create a barchart using proportions instead of frequencies - note you might have to round the proportions to say 2 decimal places if you want to label your bar heights (<code>aes(label = round(proportion, 2))</code>) in <code>geom_text</code>.</p>
<p>What do these graphs suggest regarding any dependency between smoking status and sex?</p>
<ol start="3" style="list-style-type: decimal">
<li>Do a test of independence for smoking status and sex by hand by following the example given above. To create the two-way contingency table use</li>
</ol>
<div class="tutorial-exercise" data-label="tabl" data-caption="Code" data-completion="1" data-diagnostics="1" data-startover="1" data-lines="0">
<pre class="text"><code>table(survey$Sex, survey$Smoke)</code></pre>
<script type="application/json" data-opts-chunk="1">{"fig.width":7,"fig.height":5,"fig.retina":2,"fig.align":"default","fig.keep":"high","fig.show":"asis","out.width":672,"warning":true,"error":false,"message":true,"exercise.df_print":"default","exercise.checker":"NULL"}</script>
</div>
<ol start="4" style="list-style-type: decimal">
<li>Check your working, including the expected values and test statistic, in R using the <code>chisq.test()</code> function.</li>
</ol>
<div class="tutorial-exercise" data-label="survtest" data-caption="Code" data-completion="1" data-diagnostics="1" data-startover="1" data-lines="0">
<pre class="text"><code>chisq.test(survey$Sex, survey$Smoke)</code></pre>
<script type="application/json" data-opts-chunk="1">{"fig.width":7,"fig.height":5,"fig.retina":2,"fig.align":"default","fig.keep":"high","fig.show":"asis","out.width":672,"warning":true,"error":false,"message":true,"exercise.df_print":"default","exercise.checker":"NULL"}</script>
</div>
<ol start="5" style="list-style-type: decimal">
<li>Try a test of independence on <code>Sex</code> and exercise status, <code>Exer</code>.</li>
</ol>
<div class="tutorial-exercise" data-label="sexexertest" data-caption="Code" data-completion="1" data-diagnostics="1" data-startover="1" data-lines="0">
<pre class="text"><code>chisq.test(survey$Sex, survey$Exer)</code></pre>
<script type="application/json" data-opts-chunk="1">{"fig.width":7,"fig.height":5,"fig.retina":2,"fig.align":"default","fig.keep":"high","fig.show":"asis","out.width":672,"warning":true,"error":false,"message":true,"exercise.df_print":"default","exercise.checker":"NULL"}</script>
</div>
</div>
</div>
<div id="section-students-two-independent---sample-t-test" class="section level2">
<h2>2. Student’s Two (Independent) - Sample T-Test</h2>
<p>Student’s two sample <span class="math inline">\(t\)</span>-test is used to test hypotheses comparing two population means. William Gosset was Chief Experimental Brewer at the Guinness brewery in Dublin. Guinness apparently had a policy of not allowing employees to publish their findings, so Gosset used the pseudonym “Student” to disseminate his work to the academic community. In fact the <span class="math inline">\(t\)</span>-distribution was derived (in a Bayesian context) well before Gosset’s work, but as with much of academia, popularisation often wins over correct attribution.</p>
<p>We assume we have two independent samples (<span class="math inline">\(Y_{1}, \ldots, Y_{m}\)</span> and <span class="math inline">\(X_{1}, \ldots, X_{n}\)</span>) taken from normally distributed populations, where</p>
<p><span class="math display">\[Y \sim N(\mu_Y, \sigma)\]</span> and <span class="math display">\[X \sim N(\mu_X, \sigma)\]</span></p>
<p>Note the assumption of equal variance in the two populations. There are “adjustments” that can be made if this assumption is not valid for any particular set of samples (R uses the “Welch” or “Satterthwaite” degrees of freedom approximation/adjustment - also note that R assumes unequal population variances by default). We will focus on the equal-variance case here, but note that Welch’s degrees of freedom adjustment is implemented in all good statistical software for situations where the assumption of equal population variance is suspect.</p>
<p>(<em>Question: Why would a test to compare means be of interest if populations have unequal standard deviations?</em>)</p>
<p>We are most commonly interested in testing the hypotheses:</p>
<p><span class="math display">\[\begin{align*}
H_0: & (\mu_Y - \mu_X) = \mu_0 \\
H_1: & (\mu_Y - \mu_X) \neq \mu_0
\label{eq:ttest-hyp} \tag{4}
\end{align*}\]</span></p>
<p>where <span class="math inline">\(\mu_0\)</span> denotes the hypothesised difference between population means (this is, more often than not, 0). There are, of course, one-tailed versions of these hypotheses also.</p>
<p>The test statistic used to test hypotheses <span class="math inline">\(\eqref{eq:ttest-hyp}\)</span> is:</p>
<p><span class="math display">\[\begin{equation}
T = \frac{(\overline{Y} - \overline{X}) -(\mu_Y - \mu_X)}{S_p \sqrt{\frac{1}{m} + \frac{1}{n}}} ,
\label{eq:ttest} \tag{5}
\end{equation}\]</span></p>
<p>where</p>
<p><span class="math display">\[
S_p^2 = \frac{(m-1)S_Y^2 + (n-1)S_X^2}{m+n-2},
\label{eq:pooled-var} \tag{6}
\]</span></p>
<p><span class="math inline">\(\overline{X}\)</span> and <span class="math inline">\(\overline{Y}\)</span> denote the sample means of the two samples, and <span class="math inline">\(S_Y\)</span> and <span class="math inline">\(S_X\)</span> denote the sample standard deviations of the two samples.</p>
<p>Under the above conditions it can be shown that the test statistic given by <span class="math inline">\(\eqref{eq:ttest}\)</span> has a <span class="math inline">\(t\)</span> distribution with degrees of freedom equal to <span class="math inline">\(m+n-2\)</span> if the null hypothesis is true.</p>
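<p>Equations (5) and (6) translate directly into a few lines of R. This is a minimal sketch using two small simulated samples (the data are generated here purely for illustration):</p>
<pre class="r"><code>set.seed(1)
y <- rnorm(12, mean = 52, sd = 6)   # simulated sample of size m = 12
x <- rnorm(15, mean = 48, sd = 6)   # simulated sample of size n = 15
m <- length(y); n <- length(x)

# Pooled variance, Equation (6)
sp2 <- ((m - 1) * var(y) + (n - 1) * var(x)) / (m + n - 2)

# Test statistic, Equation (5), with hypothesised difference mu0 = 0
t.stat <- (mean(y) - mean(x) - 0) / sqrt(sp2 * (1/m + 1/n))
t.stat

# Two-tailed p-value from the t distribution with m + n - 2 degrees of freedom
2 * pt(abs(t.stat), df = m + n - 2, lower.tail = FALSE)</code></pre>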
<div id="section-using-r-1" class="section level3">
<h3>Using R</h3>
<p>You can do a two independent sample <span class="math inline">\(t\)</span>-test in R in several ways (eg this test is a special case of a one-way ANOVA). However, the function <code>t.test</code> is specifically designed for this purpose (in fact, it will also do one-sample and paired versions of the <span class="math inline">\(t\)</span>-test).</p>
<p>The <code>t.test</code> function will accept input in two ways, depending on what form you have your data in:</p>
<ol style="list-style-type: decimal">
<li><p>each sample can be given as a separate variable: <code>t.test(x, y)</code></p></li>
<li><p>you can use a formula interface (only for two-sample tests): <code>t.test(y ~ group)</code>. Note <code>group</code> should be a two-level categorical factor identifying which group the corresponding <code>y</code> observation belongs to.</p></li>
</ol>
<p>See the help for more arguments (<code>?t.test</code>). In particular, look at the arguments <code>mu =</code>, <code>alternative =</code>, and <code>var.equal =</code>. Make sure you know what they do and how to use them.</p>
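<p>A quick sketch of both interfaces on simulated data (the vectors and grouping factor below are made up purely for illustration):</p>
<pre class="r"><code>set.seed(2)
x <- rnorm(15, mean = 48, sd = 6)   # simulated data for illustration
y <- rnorm(12, mean = 52, sd = 6)

# 1. Two separate vectors, assuming equal population variances
t.test(x, y, var.equal = TRUE)

# 2. Formula interface with a grouping factor (Welch adjustment by default)
dat <- data.frame(value = c(x, y),
                  group = factor(rep(c("A", "B"), times = c(length(x), length(y)))))
t.test(value ~ group, data = dat)</code></pre>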
</div>
<div id="section-exercises" class="section level3">
<h3>Exercises:</h3>
<p><strong>Q1.</strong></p>
<p>Researchers are interested in whether the mean weight for Tweed River crabs is greater than the mean weight for Brisbane River crabs. Using strict experimental protocols they randomly select and weigh 16 Tweed River crabs and 25 Brisbane River crabs. Based on this data they calculate the following sample statistics:</p>
<ul>
<li>Tweed River: <span class="math inline">\(\overline{X}_T = 240\)</span> grams; <span class="math inline">\(S_T = 24\)</span>.</li>
<li>Brisbane River: <span class="math inline">\(\overline{X}_B = 215\)</span> grams; <span class="math inline">\(S_B = 18\)</span>.</li>
</ul>
<p>Based on this sample data, and assuming the two crab weight populations satisfy the assumptions for the <span class="math inline">\(t\)</span>-test, test the researcher’s hypothesis using an <span class="math inline">\(\alpha = 0.05\)</span> level of significance.</p>
<p><strong>Q2.</strong></p>
<p><em>Before doing the tests below, check the model assumptions graphically (a boxplot is fine) and adjust your test arguments accordingly. Note carefully the way R orders factor levels - this impacts the calculated test statistic.</em></p>
<p>Using the <code>survey</code> data in the <code>SciDatAnalysis</code> library, test the hypothesis that male students are, on average, at least 10cm taller than female students in the following two ways: (Note, assume missing values are “missing at random”.)</p>
<ol style="list-style-type: decimal">
<li>By hand. Note there are missing values in the <code>Height</code> observations so you will need to “filter them out” (ie remove them, as we used to say) when using R to calculate the mean, sd, and sample size for each sex. As with most things in R, there are lots of ways to do this. For example, a traditional R approach using the <code>by</code> (or <code>tapply</code>) function:</li>
</ol>
<div class="tutorial-exercise" data-label="bysurv" data-caption="Code" data-completion="1" data-diagnostics="1" data-startover="1" data-lines="0">
<pre class="text"><code>boxplot(Height ~ Sex, data = survey)
by(survey$Height, survey$Sex,
function(x) c(mean = mean(x, na.rm = T), sd = sd(x, na.rm = T), n = sum(!is.na(x))))</code></pre>
<script type="application/json" data-opts-chunk="1">{"fig.width":7,"fig.height":5,"fig.retina":2,"fig.align":"default","fig.keep":"high","fig.show":"asis","out.width":672,"warning":true,"error":false,"message":true,"exercise.df_print":"default","exercise.checker":"NULL"}</script>
</div>
<ol start="2" style="list-style-type: decimal">
<li>Using the <code>t.test</code> function. Try it with and without the assumption of equal variance. What are the differences?</li>
</ol>
<div class="tutorial-exercise" data-label="survttest" data-caption="Code" data-completion="1" data-diagnostics="1" data-startover="1" data-lines="0">
<pre class="text"><code>t.test(Height ~ Sex, data = survey, mu = -10,
alternative = "less", var.equal = TRUE)</code></pre>
<script type="application/json" data-opts-chunk="1">{"fig.width":7,"fig.height":5,"fig.retina":2,"fig.align":"default","fig.keep":"high","fig.show":"asis","out.width":672,"warning":true,"error":false,"message":true,"exercise.df_print":"default","exercise.checker":"NULL"}</script>
</div>
<ol start="3" style="list-style-type: decimal">
<li>Write your conclusion to the hypothesis test. Ensure you include both a statistical conclusion and a plain English conclusion. If you don’t know what the difference is, ask.</li>
</ol>
</div>
</div>
<div id="section-analysis-of-variance---multiple-treatment-comparisons" class="section level2">
<h2>3. Analysis of Variance - Multiple Treatment Comparisons</h2>
<p>Once an ANOVA has been carried out and has identified that there are some differences between the treatment means, it is necessary to identify which treatments are different from each other. The familiar t-test was designed to compare between <strong>two</strong> treatments and there are potential problems when pairwise comparisons are made between more than two treatments. Because of the element of chance and uncertainty involved in any statistical test, in a situation where a large number of similar tests are carried out there is a potential to find significant differences simply by chance. The use of the t-test is one such situation; a similar problem arises when correlation coefficients are found between all possible pairs of variables in a multivariate situation.</p>
<p>Many tests exist for this purpose and four will be considered here. The decision as to which test to use is partly made by the underlying assumptions of each test, and the degree to which they can be met by the data set of interest. However, often the final decision becomes a personal choice between several that are equally possible; alternatively an editor or referee may dictate the test preferred if an article is to be published in a particular journal.</p>
<div id="section-using-the-protected-t-test-and-the-least-significant-difference" class="section level3">
<h3>3.1 Using the Protected t-test and the Least Significant Difference</h3>
<p>The original t-test was designed to compare two treatment means. If this test is simply extended and used to carry out all possible pairwise comparisons between more than two treatments, spurious significances may be found simply because so many of the tests are done - each test may be at a prescribed level of significance related to its specific type I error probability, but over all the possible tests this probability of error (the experimentwise error) may be quite different. To overcome this problem a requirement is imposed that the <span class="math inline">\(F\)</span>-test in the ANOVA <strong>must</strong> be significant before any <span class="math inline">\(t\)</span>-tests are carried out. If the overall test for significance says that there are no significant differences then no further testing is carried out.</p>
<p>The <span class="math inline">\(t\)</span>-test with this conditioning on the outcome of the <span class="math inline">\(F\)</span>-test is known as the <strong>Protected <span class="math inline">\(t\)</span>-test</strong>. It is implemented in R by installing and attaching the <code>agricolae</code> library and using the <code>LSD.test()</code> function contained in that library. Recall that the standard deviation used in the protected <span class="math inline">\(t\)</span>-test is the best estimate of <span class="math inline">\(s\)</span>, which is the square root of the error mean square in the ANOVA.</p>
<p>Even though significant differences may occur in pairwise <span class="math inline">\(t\)</span>-testing, if the <span class="math inline">\(F\)</span>-test in the ANOVA is non-significant, the null hypothesis that all treatment means are equal must be accepted.</p>
<p>A number of authors do not recommend the protected <span class="math inline">\(t\)</span>-test, believing that it leads to too many spurious significances. Other authors favour its logical and consistent approach and believe that, provided the protection of the global <span class="math inline">\(F\)</span>-test is used, it is to be preferred over its more conservative alternatives.</p>
<p><strong>NOTE:</strong> Strictly speaking there is no such thing as an <strong>LSD test</strong> – a test is named after the probability distribution it uses, which here is <span class="math inline">\(t\)</span>; the LSD is simply a computational shortcut for carrying out the pairwise <span class="math inline">\(t\)</span>-tests.</p>
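<p>As a rough sketch of how this looks in R: the data frame <code>mydata</code>, response <code>Response</code> and factor <code>Treatment</code> below are placeholders, not names from this tutorial.</p>
<pre class="text"><code>library(agricolae)
# Fit the one-way ANOVA first; the protected t-test only proceeds if the
# global F-test in this model is significant.
fit <- aov(Response ~ Treatment, data = mydata)
summary(fit)                                 # check the F-test before any pairwise testing
LSD.test(fit, "Treatment", console = TRUE)   # protected t-test / LSD groupings</code></pre>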
</div>
<div id="section-tukeys-honestly-significant-difference" class="section level3">
<h3>3.2 Tukey’s Honestly Significant Difference</h3>
<p>An analysis of variance (ANOVA) is carried out to test the standard hypotheses that the treatment means are all equal.</p>
<p><span class="math display">\[\begin{align*}
H_0 & : \mu_1 = \mu_2 = \ldots = \mu_k & & \textit{treatment means are all equal}\\
H_1 & : \mu_i \neq \mu_j \text{ for some } (i, j), i \neq j & & \textit{treatment means are not all equal}
\end{align*}\]</span></p>
<p>Test the Variance Ratio using the F-test – a global test.</p>
<p><strong>If H0 is accepted</strong>, i.e. the means are not significantly different, then <strong>STOP</strong>.</p>
<p><strong>If H0 is rejected</strong> proceed to some form of multiple comparison testing to identify which treatment means are different to each other.</p>
<p>If the null hypothesis is rejected using the global test in the ANOVA then proceed with pair-wise comparisons of the means.</p>
<p>Calculate the standard error of a mean as: <span class="math inline">\(SE_{\bar{X}} = \frac{s}{\sqrt{n}}\)</span>, where <span class="math inline">\(s\)</span> = Root MSE in the ANOVA and <span class="math inline">\(n\)</span> is the number of replicates in the treatment.</p>
<p>From the Studentised Range, <span class="math inline">\(q\)</span> tables, Table B5 in Zar or <code>qtukey</code> in R, find the critical value which depends on:</p>
<ul>
<li><span class="math inline">\(\alpha\)</span> - level of significance;</li>
<li><span class="math inline">\(\nu\)</span> - error degrees of freedom in the ANOVA table;</li>
<li><span class="math inline">\(k\)</span> - number of treatments.</li>
</ul>
<p>Calculate Tukey’s Honestly Significant Difference:</p>
<p><span class="math display">\[ \textit{THSD}_{\alpha} = q_{\alpha} \times SE_{\bar{X}} \]</span></p>
<p>Obtain a <strong>table of differences</strong> for the treatment means.</p>
<p>Compare the mean differences given in the table of differences with the <span class="math inline">\(\textit{THSD}_{\alpha}\)</span> to determine the significant ones. Summarise your results clearly in written English.</p>
<p><strong>NOTE:</strong> The Table of Studentised Ranges, or <em>q</em> values, is two-tailed; an <span class="math inline">\(\alpha\)</span> = 0.05 means two tails each of 0.025. This table is used for both the Tukey test and the SNK test which follows.</p>
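<p>The calculation can also be done by hand in R. The sketch below uses made-up values for the ANOVA quantities purely to illustrate the steps:</p>
<pre class="text"><code>alpha <- 0.05
s2 <- 4.2; nu <- 18   # error mean square and error df (illustrative values only)
k  <- 4;  n  <- 6     # number of treatments and replicates per treatment (illustrative)
se   <- sqrt(s2) / sqrt(n)                     # standard error of a treatment mean
q    <- qtukey(1 - alpha, nmeans = k, df = nu) # critical Studentised range value
THSD <- q * se
THSD  # any pair of means differing by more than this is significantly different</code></pre>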
<div id="section-unequal-replication-of-treatments" class="section level4">
<h4>Unequal Replication of Treatments</h4>
<p>For unequal sample sizes (i.e. unequal replication of the treatments), calculate an average standard error of the mean:</p>
<p><span class="math display">\[\textit{SE}_{\textit{avg}\bar{X}}= \sqrt{ \frac{s^2}{2} \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}\]</span></p>
<p>Note that separate THSD’s will be needed for each comparison with different replicates.</p>
<p><strong>OR</strong></p>
<p>Carry out a direct test as:</p>
<p><span class="math display">\[ T = \frac{\overline{X}_i - \overline{X}_j}{\textit{SE}_{\textit{avg}\bar{X}}} \]</span></p>
<p>Then use the fact that under <span class="math inline">\(H_0:\mu_1 = \mu_2,\)</span> <span class="math inline">\(T \sim q_{\nu, k}\)</span>.</p>
<p>A direct test is needed for <strong>each pair</strong> of means.</p>
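<p>A direct test for one pair of means can be sketched in R as follows; all numbers are placeholders used only to show the arithmetic:</p>
<pre class="text"><code>s2 <- 4.2; nu <- 18; k <- 4      # error mean square, error df, number of treatments (illustrative)
xbar <- c(12.3, 9.8)             # the two treatment means being compared (illustrative)
n <- c(5, 7)                     # their (unequal) replications
se.avg <- sqrt((s2 / 2) * (1 / n[1] + 1 / n[2]))
Tstat  <- abs(xbar[1] - xbar[2]) / se.avg
ptukey(Tstat, nmeans = k, df = nu, lower.tail = FALSE)   # p-value from the Studentised range</code></pre>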
</div>
<div id="section-using-r-2" class="section level4">
<h4>Using R</h4>
<p>To carry out a Tukey HSD test we include the <code>agricolae</code> library in R and use the <code>HSD.test()</code> function instead of the <code>LSD.test()</code> function which was used for the protected t-test.</p>
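<p>A minimal sketch, again with placeholder names for the data and factor:</p>
<pre class="text"><code>library(agricolae)
fit <- aov(Response ~ Treatment, data = mydata)
HSD.test(fit, "Treatment", console = TRUE)   # Tukey HSD groupings
# Base R also offers TukeyHSD(fit), which reports adjusted p-values instead.</code></pre>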
</div>
</div>
<div id="section-student-newman-keuls-least-significant-range-snklsr" class="section level3">
<h3>3.3 Student-Newman-Keuls Least Significant Range (SNKLSR)</h3>
<p>The SNK test takes into consideration the number of treatments, arranged in rank order, over which a pair of treatments is being compared; the difference needed for significance for two treatment means which are ranked side by side is different from the difference needed for two means which have other treatments in between them in the rank order.</p>
<p>The SNK test is essentially the same as the THSD except it uses a different <span class="math inline">\(q\)</span> value depending on the range of treatment means involved instead of a fixed <span class="math inline">\(k\)</span>, representing the total number of treatments.</p>
<p>It is known as a <strong>MULTIPLE RANGE TEST</strong>.</p>
<p>Other multiple range tests such as Duncan’s Multiple Range Test, which is used in agricultural research, are also available.</p>
<p>The SNK is <strong>less conservative</strong> than Tukey’s test - i.e. it is less likely to accept <span class="math inline">\(H_0\)</span> when it should reject it.</p>
<p>The SNK is <strong>more conservative</strong> than the <span class="math inline">\(t\)</span>-test – i.e. it is more likely to accept <span class="math inline">\(H_0\)</span> when it should reject it.</p>
<p>An analysis of variance (ANOVA) is carried out to test the standard hypotheses that the treatment means are all equal.</p>
<p><span class="math display">\[\begin{align*}
H_0 & : \mu_1 = \mu_2 = \ldots = \mu_k & & \textit{treatment means are all equal}\\
H_1 & : \mu_i \neq \mu_j \text{ for some } (i, j), i \neq j & & \textit{treatment means are not all equal}
\end{align*}\]</span></p>
<p>Test the Variance Ratio using the F-test.</p>
<p><strong>If H0 is accepted</strong>, i.e. the means are not significantly different, then <strong>STOP</strong>.</p>
<p><strong>If H0 is rejected</strong> proceed to some form of multiple comparison testing to identify which treatment means are different to each other.</p>
<p>Calculate the standard error of a mean as: <span class="math inline">\(SE_{\bar{X}} = \frac{s}{\sqrt{n}}\)</span>, where <span class="math inline">\(s\)</span> = Root MSE in the ANOVA and <span class="math inline">\(n\)</span> is the number of replicates in the treatment.</p>
<p>From the Studentised Range, <span class="math inline">\(q\)</span> tables, Table B5 in Zar or <code>qtukey</code> in R, find the Studentised Significant Ranges which depend on the following four variables – note that for SNK there will be <span class="math inline">\(k-1\)</span> <span class="math inline">\(q\)</span> values to read from the table, giving <span class="math inline">\(k-1\)</span> SNK ranges:</p>
<ul>
<li><span class="math inline">\(\alpha\)</span> - level of significance;</li>
<li><span class="math inline">\(\nu\)</span> - error degrees of freedom in the ANOVA table;</li>
<li><span class="math inline">\(k\)</span> - number of treatments;</li>
<li><span class="math inline">\(p\)</span> - the number of means in rank order (smallest to largest) between and including the two means to be compared.</li>
</ul>
<p>Note that <span class="math inline">\(p\)</span> = 2 when the means are adjacent, <span class="math inline">\(p\)</span> = 3 if there is one mean in between the two being compared, and the most extreme comparison will be between the largest and smallest means when <span class="math inline">\(p\)</span> will equal <span class="math inline">\(k\)</span>, the number of treatments.</p>
<p>Calculate the SNK Least Significant Ranges:</p>
<p><span class="math display">\[ \textit{SNKLSR}_{\alpha, p} = q_{\alpha, p} \times SE_{\overline{X}} \]</span></p>
<p>Obtain a table of differences for the treatment means.</p>
<p>Compare the mean differences given in the table of differences with the relevant <span class="math inline">\(\textit{SNKLSR}_{\alpha, p}\)</span> values to determine the significant differences. Summarise your results clearly in written English.</p>
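<p>The <span class="math inline">\(k-1\)</span> least significant ranges can be computed directly in R; the values below are placeholders chosen only to illustrate the calculation:</p>
<pre class="text"><code>alpha <- 0.05
s2 <- 4.2; nu <- 18   # error mean square and error df (illustrative values only)
k  <- 4;  n  <- 6     # number of treatments and replicates per treatment (illustrative)
se <- sqrt(s2) / sqrt(n)
p  <- 2:k             # number of means spanned by each comparison
snk.lsr <- sapply(p, function(pp) qtukey(1 - alpha, nmeans = pp, df = nu)) * se
names(snk.lsr) <- paste0("p = ", p)
snk.lsr               # one least significant range for each value of p</code></pre>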
<div id="section-using-r-3" class="section level4">
<h4>Using R</h4>
<p>To carry out an SNK test we include the <code>agricolae</code> library (if it isn’t already attached from a previous analysis) and use the <code>SNK.test()</code> function on the fitted <code>aov()</code> model.</p>
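<p>For example (placeholder names as before):</p>
<pre class="text"><code>library(agricolae)
fit <- aov(Response ~ Treatment, data = mydata)
SNK.test(fit, "Treatment", console = TRUE)   # Student-Newman-Keuls groupings</code></pre>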
</div>
</div>
<div id="section-the-bonferroni-adjustment-to-the-lsd" class="section level3">
<h3>3.4 The Bonferroni Adjustment to the LSD</h3>
<p>The problem with the <span class="math inline">\(t\)</span>-test is that the set level of significance applies to a single comparison of two means – this is known as a <em>comparison-wise Type I error rate</em>. When there are many such comparisons, one or more may be significant simply by chance, and there is no way of knowing which. Multiple range tests such as the SNK make an allowance for this and provide control across the whole set of comparisons; this is known as an <em>experimentwise error rate</em>. What is needed is a way of controlling the experimentwise error rate, that is, the possible error <strong>across all the paired comparisons</strong>, when the <span class="math inline">\(t\)</span>-test is used.</p>
<p>A simple modification to the <span class="math inline">\(t\)</span>-test was proposed by Bonferroni to achieve this. To obtain a ‘reliable’ critical value for <span class="math inline">\(t\)</span>, the nominated significance level is divided by the number of tests that will be carried out. For example, if four treatment means are to be compared there will be six individual tests. Recall that if a significance level of 0.05 is required, the probability in each tail is 0.025, since the multiple comparisons of treatment means are two-tailed <span class="math inline">\(t\)</span>-tests. Therefore the probability used to obtain the ‘reliable’ critical <span class="math inline">\(t\)</span> is 0.025/6 ≈ 0.00417. Printed critical tables with such a probability are not usually available; however, the following R code will produce the required value for a given <span class="math inline">\(\alpha\)</span>, error degrees of freedom <span class="math inline">\(\nu\)</span>, and number of treatments <span class="math inline">\(k\)</span>.</p>
<div class="tutorial-exercise" data-label="bonf" data-caption="Code" data-completion="1" data-diagnostics="1" data-startover="1" data-lines="0">
<pre class="text"><code>bonf.val <- function(alpha = 0.05, k, df) qt(1-((alpha/2)/choose(k, 2)), df = df)
# eg alpha = 0.05, 4 treatment groups, error df = 16 from ANOVA table:
bonf.val(k = 4, df = 16)</code></pre>
<script type="application/json" data-opts-chunk="1">{"fig.width":7,"fig.height":5,"fig.retina":2,"fig.align":"default","fig.keep":"high","fig.show":"asis","out.width":672,"warning":true,"error":false,"message":true,"exercise.df_print":"default","exercise.checker":"NULL"}</script>
</div>
<div id="section-using-r-4" class="section level4">
<h4>Using R</h4>
<p>To carry out a Bonferroni test we use the <code>LSD.test()</code> function from the <code>agricolae</code> library with the argument <code>p.adj = "bonferroni"</code>, since the Bonferroni test is just an adjusted version of the LSD.</p>
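<p>For example (placeholder names as before):</p>
<pre class="text"><code>library(agricolae)
fit <- aov(Response ~ Treatment, data = mydata)
LSD.test(fit, "Treatment", p.adj = "bonferroni", console = TRUE)   # Bonferroni-adjusted LSD</code></pre>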
</div>
</div>
<div id="section-exercise" class="section level3">
<h3>Exercise</h3>
<p>A study has been carried out to determine the strontium concentration (mg/ml) in six different bodies of water - it is suspected that this pollutant may become a problem in the future and a base line set of data is required. The data are given below. Note that the names of the water bodies have been altered to protect the innocent.</p>
<table>
<thead>
<tr class="header">
<th align="center">Joe Lake</th>
<th align="center">Wallaby Pond</th>
<th align="center">Teatree Swamp</th>
<th align="center">Rock River</th>
<th align="center">Flo Dam</th>
<th align="center">Edna Bay</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="center">56.3</td>
<td align="center">39.6</td>
<td align="center">28.2</td>
<td align="center">32.5</td>
<td align="center">46.3</td>
<td align="center">41.0</td>
</tr>
<tr class="even">
<td align="center">54.1</td>
<td align="center">40.8</td>
<td align="center">33.2</td>
<td align="center">36.1</td>
<td align="center">42.1</td>
<td align="center">44.1</td>
</tr>
<tr class="odd">
<td align="center">59.4</td>
<td align="center">37.9</td>
<td align="center">36.4</td>
<td align="center">38.1</td>
<td align="center">43.5</td>
<td align="center">46.4</td>
</tr>
<tr class="even">
<td align="center">62.7</td>
<td align="center">37.1</td>
<td align="center">34.6</td>
<td align="center">39.2</td>
<td align="center">48.8</td>
<td align="center">40.2</td>
</tr>
<tr class="odd">
<td align="center">60.0</td>
<td align="center">43.6</td>
<td align="center">29.1</td>
<td align="center">34.2</td>
<td align="center">43.7</td>
<td align="center">38.6</td>
</tr>
<tr class="even">
<td align="center">57.3</td>
<td align="center">42.4</td>
<td align="center">31.0</td>
<td align="center">36.1</td>
<td align="center">40.1</td>
<td align="center">36.3</td>
</tr>
</tbody>
</table>
<p>This data is available in the <code>SciDatAnalysis</code> library in a dataframe called <code>strontium</code>.</p>
<ol style="list-style-type: decimal">
<li><p>Write down the hypotheses you will be testing using this data.</p></li>
<li><p>Use the help function to get information about this data set.</p></li>
<li><p>Do some exploratory data analysis (EDA) on the data.</p></li>
<li><p>Write down the model you propose to fit to this data (one-way ANOVA) in order to test these hypotheses.</p></li>
<li><p>Fit this model to the data using the <code>aov</code> function in R and critique this fit graphically (residual diagnostics).</p></li>
<li><p>If you believe the residual diagnostic plots show the fit is adequate, examine and interpret the ANOVA output using the <code>summary</code> function.</p></li>
<li><p>If you don’t already have it, install the <code>agricolae</code> library. Load this library and run the 4 multiple treatment comparison (MTC) tests on the data to determine where significant mean differences in strontium lie across the 6 water bodies. Compare and contrast your findings across the 4 MTC methods. Make sure you understand the differences in findings you see. A sketch of one possible workflow is given after this list.</p></li>
</ol>
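<p>If you are unsure where to start, the sketch below outlines one possible workflow. The column names <code>conc</code> and <code>site</code> are assumptions used for illustration only – check the help page for <code>strontium</code> to find the actual variable names.</p>
<pre class="text"><code>library(SciDatAnalysis)
library(agricolae)
?strontium                                   # step 2: data set documentation
boxplot(conc ~ site, data = strontium)       # step 3: simple EDA (assumed column names)
fit <- aov(conc ~ site, data = strontium)    # step 5: one-way ANOVA
par(mfrow = c(2, 2)); plot(fit)              # residual diagnostics
summary(fit)                                 # step 6: ANOVA table
LSD.test(fit, "site", console = TRUE)                        # protected t-test
HSD.test(fit, "site", console = TRUE)                        # Tukey HSD
SNK.test(fit, "site", console = TRUE)                        # SNK
LSD.test(fit, "site", p.adj = "bonferroni", console = TRUE)  # Bonferroni</code></pre>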
</div>
</div>
<div id="section-multiple-linear-regression" class="section level2">
<h2>4. Multiple Linear Regression</h2>
<p>Multiple linear regression is a big topic (indeed, you could take several undergraduate courses on this and related topics). We will, of necessity, only scratch the surface here and deal with the most fundamental aspects.</p>
<p>Multiple linear regression refers to the situation where a single response variable, <span class="math inline">\(Y_i, i = 1, \ldots, n\)</span>, is being modelled by multiple (say <span class="math inline">\(p\)</span>) explanatory variables, <span class="math inline">\(X_{ji}, i = 1, \ldots, n, j = 1, \ldots, p\)</span>. Importantly, the model is <strong>linear</strong>. In statistics, the word <strong>linear</strong> refers to the model parameters, not the explanatory variables <span class="math inline">\(X_{ji}\)</span>. Thus linear models also include, for example, polynomials in the <span class="math inline">\(X_{ji}\)</span>.</p>
<p>The response variable must be numerical, whereas the explanatory variables can be of any type.</p>
<p>Statistically modelling a variable using other variables is done for two main reasons - <em>explanation</em> or <em>prediction</em>. Each brings its own considerations which we will not go into here. Whatever the reason for modelling, you should always bear in mind the following quote by the statistician (and modeller) George E.P. Box:</p>
<blockquote>
<p>Now it would be very remarkable if any system existing in the real world could be exactly represented by any simple model. However, cunningly chosen parsimonious models often do provide remarkably useful approximations. For example, the law PV = RT relating pressure P, volume V and temperature T of an “ideal” gas via a constant R is not exactly true for any real gas, but it frequently provides a useful approximation and furthermore its structure is informative since it springs from a physical view of the behavior of gas molecules. For such a model there is no need to ask the question “Is the model true?”. If “truth” is to be the “whole truth” the answer must be “No”. The only question of interest is “Is the model illuminating and useful?”.</p>
</blockquote>
<p>Or, the more popular version:</p>
<blockquote>
<p>Essentially, all models are wrong, but some are useful.</p>
</blockquote>
<p>Box uses this notion repeatedly in his academic works, and more often than not goes on to say that the approximate nature of the model must always be borne in mind whenever drawing conclusions from them. He even has something to say on linear models:</p>
<blockquote>
<p>In other words, any model is at best a useful fiction — there never was, or ever will be, an exactly normal distribution or an exact linear relationship. Nevertheless, enormous progress has been made by entertaining such fictions and using them as approximations.</p>
</blockquote>
<div id="section-model-and-assumptions" class="section level3">
<h3>Model and Assumptions</h3>
<p>The model takes the general form</p>
<p><span class="math display">\[\begin{equation*}
Y_i = \beta_0 + \beta_1X_{1i} + \beta_2 X_{2i} + \ldots + \beta_p X_{pi} + \epsilon_i, \text{ }i = 1, \ldots, n.
\label{eq:mlr} \tag{7}
\end{equation*}\]</span></p>
<p>The model parameters <span class="math inline">\(\beta_j, j = 1, \ldots, p\)</span> are known as <em>regression coefficients</em> and measure the average population effect that the corresponding explanatory variable, <span class="math inline">\(X_j\)</span>, has on the response, <span class="math inline">\(Y\)</span>. The parameter <span class="math inline">\(\beta_0\)</span> is the <em>intercept term</em>: the average value of <span class="math inline">\(Y\)</span> when all the <span class="math inline">\(X_j = 0\)</span>. It is included in the model for mathematical reasons, but its estimate is often nonsensical because it is usually an extrapolation (i.e. values of <span class="math inline">\(Y\)</span> were not measured at <span class="math inline">\(X_j = 0, j = 1, \ldots, p\)</span>).</p>
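<p>In R such a model is fitted with <code>lm()</code>. A minimal sketch, where the data frame <code>mydata</code> and its columns <code>y</code>, <code>x1</code> and <code>x2</code> are hypothetical placeholders:</p>
<pre class="text"><code>fit <- lm(y ~ x1 + x2, data = mydata)   # fits y = b0 + b1*x1 + b2*x2 + error
summary(fit)    # estimated coefficients, their tests, and R-squared
confint(fit)    # confidence intervals for the regression coefficients
# A polynomial such as y ~ x1 + I(x1^2) is still a linear model,
# because it is linear in the parameters.</code></pre>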
<p>Model <span class="math inline">\(\eqref{eq:mlr}\)</span> is based upon the following assumptions:</p>
<ol style="list-style-type: decimal">
<li><p>Normality: The error terms <span class="math inline">\(\epsilon_i, i = 1, \ldots, n\)</span> are normally distributed with mean zero and constant variance, <span class="math inline">\(\sigma^2\)</span>: <span class="math inline">\(\epsilon_i \sim N(0, \sigma^2)\)</span>. Note: This assumption implies that the <span class="math inline">\(Y_i\)</span> are also normally distributed, but <em>this is an unusable assumption in terms of model checking</em>, since by definition the mean of <span class="math inline">\(Y_i\)</span> changes with the <span class="math inline">\(X_j\)</span> for each <span class="math inline">\(i\)</span>. You will often hear people talk about checking <span class="math inline">\(Y\)</span> to see if it is normal – <em>this is absolutely pointless and in fact can be completely misleading!</em></p></li>
<li><p>Homoscedasticity: the variance of the error term is constant.</p></li>