
In a scientific study, one typically aims for a statistical power of 80\%, a quantity proposed by \citet{Cohen1988}, implying that a true effect in the population is detected with an 80\% chance.  Power computations allow researchers to determine the minimal number of subjects needed to obtain the desired statistical power.  As such, power calculations help avoid spending time and money on studies that are futile, and also prevent wasting resources on extra subjects when sufficient power is already available.
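
As a minimal illustration, assuming a one-sided one-sample $z$-test with known variance (a simplification introduced here only for exposition, not the model used in this paper), the power $1-\beta$ at significance level $\alpha$, for a standardised effect size $\delta$ and $n$ subjects, is
\[
1-\beta = 1 - \Phi\left(z_{1-\alpha} - \delta\sqrt{n}\right),
\quad\mbox{so that}\quad
n \geq \left(\frac{z_{1-\alpha}+z_{1-\beta}}{\delta}\right)^{2},
\]
where $\Phi$ is the standard normal distribution function and $z_q$ its $q$-quantile.  For example, with $\alpha=0.05$, a target power of 80\% and $\delta=0.5$, this gives $n \geq (1.645+0.842)^2/0.25 \approx 24.7$, i.e.\ 25 subjects.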

Analyses prior to fMRI experiments can optimise power in two ways.  The first is the optimisation of the experimental design to ensure maximal statistical power for a given scanning duration and the various constraints of behavioural paradigms.  Methods have been developed to find the optimal number and arrangement of stimuli over the duration of the experiment for each subject \citep{Henson2007,Wager2003,Friston1999,Smith2007}.  The second is the determination of the necessary number of subjects \citep{Desmond2002,Mumford2008}.

While it is straightforward to compute power for a single, univariate response, determining the power of an fMRI study is a formidable task.  An array of parameters must be specified, such as the within- and between-subject variance, the first- and second-level designs, the temporal autocorrelation and the size of the hypothesised effect, all of which may vary voxel-by-voxel.  Many of these parameters may be estimated from a pilot study that is independent of the study to be performed.  The most difficult parameter to specify is the location and configuration of voxels where activations are expected.

In the earliest work on power for neuroimaging, \citet{VanHorn1998} used the non-central $F$-distribution to visualise voxelwise power for PET data.  \citet{Desmond2002} computed sample sizes for fMRI blocked designs, in a procedure that included within- and between-subject variability and the mean effect.  This model was extended by \citet{Mumford2008}, where arbitrary designs and temporal autocorrelation were taken into account.  This work was intended for voxelwise or Region of Interest (ROI) analyses, where the multiple testing problem was accounted for by suitable adjustment of the alpha level.  A more elaborate implementation by \citet{Hayasaka2007} also considered the multiple testing problem, using non-central random field theory (RFT) to control the family-wise error rate.

In this work we present a simple way to characterise the spatial signal in an fMRI study, and a direct way to estimate power based on an existing pilot study.  Specifically, using (1) the volume of the brain activated and (2) the average effect size in activated brain regions, we can directly calculate power for a given sample size, brain volume and smoothness.  With such a basic formulation, we hope to make power analyses more prevalent, leading to better use of scarce research funding and better communication of the potential reproducibility of a study.

The present method is an extension of the procedure presented in \citet{Durnez2014} based on peak statistics.  Peaks, local maxima in the statistic image, are particularly tractable as they are approximately spatially independent and have reliable random field theory (RFT) results for their uncorrected and family-wise error (FWE) corrected $p$-values \citep{Durnez2014}.  In contrast, individual voxel values have complex dependency, and clusters have unreliable RFT $p$-values \citep{Woo2014,Hayasaka2003,Durnez2014,Silver2011,Eklund2016}.  In \citet{Durnez2014} we presented a method to estimate retrospective power for local maxima, using only an estimate of the prevalence of activation and no further distributional assumptions on the effect of interest.  In the present procedure we take a statistic image from a pilot study and use the peaks above a threshold $u$ to fit a mixture model, where a proportion $(1-\pi_1)$ of the peaks follow a known null distribution, and the remainder follow a Gaussian distribution with unknown mean and variance.  Once the alternative distribution under $H_a$ is estimated, it can be transformed to account for a different sample size.  As such, not only can the post hoc power of the pilot study be estimated, but also the power of a new study with the same experiment and a different sample size, allowing general sample size calculations.
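
The mixture model described above can be sketched as follows, in notation introduced here for illustration only: the symbols $f_0$, $f_a$, $\mu$, $\sigma$, $n$, $n^*$ and $u_\alpha$, the truncation of the Gaussian at $u$, and the scaling rule are assumptions of this sketch.  The density of a peak height $t>u$ is written as
\[
f(t \mid t>u) = (1-\pi_1)\, f_0(t \mid t>u) + \pi_1\, f_a(t \mid t>u;\, \mu, \sigma),
\]
with $f_0$ the known null density of peaks above $u$ and $f_a$ a Gaussian density with mean $\mu$ and standard deviation $\sigma$, truncated at $u$.  If, as for a one-sample $t$-statistic, the expected peak height is taken to scale with the square root of the sample size while $\sigma$ is held fixed, then moving from the pilot sample size $n$ to a new sample size $n^*$ gives $\mu^* = \mu\sqrt{n^*/n}$.  The power at a significance threshold $u_\alpha \geq u$ would then be
\[
P(T > u_\alpha \mid H_a) = \frac{1-\Phi\left((u_\alpha-\mu^*)/\sigma\right)}{1-\Phi\left((u-\mu^*)/\sigma\right)},
\]
where $\Phi$ is the standard normal distribution function.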

In the remainder of this paper, we present our procedure and evaluate it based on simulations that explore different fMRI characteristics, such as the spatial extent of the signal and the signal intensity.  Next, we present an evaluation using 180 genetically unrelated subjects from the Human Connectome Project (HCP) \citep{VanEssen2012}.  These HCP data are used for two reasons.  First, the data are of very high quality, resulting in very high power when including all subjects, and thus offer a high level of certainty about the location of the effect.  Second, with 180 subjects, we can use subsamples of the data to create many smaller fMRI studies, whose results can then be compared to the results of the full dataset.  The added value of using real data is that they contain the various unknown noise sources of fMRI that would be impossible to simulate.  We then demonstrate the procedure on a typical example of an fMRI experiment using an existing fMRI dataset \citep{Seurinck2011}.  Finally, we conclude with a discussion of the method and of the implementation of the procedure in a toolbox.