
Is there an error in the design matrix from the second level two sample test example? #4400

Open
YCHuang0610 opened this issue Apr 25, 2024 · 10 comments · May be fixed by #4407

Comments

@YCHuang0610 commented Apr 25, 2024

Hello,
I'm trying to follow the tutorial on two-sample t-tests with nilearn, and I'm finding something a bit confusing.

In the code:

condition_effect = np.hstack(([1] * n_subjects, [-1] * n_subjects))
# %%
# The design matrix for the unpaired test doesn't need any more columns
# For the paired test, we include an intercept for each subject.
subject_effect = np.vstack((np.eye(n_subjects), np.eye(n_subjects)))
subjects = [f"S{i:02d}" for i in range(1, n_subjects + 1)]
# %%
# We then assemble those into design matrices
unpaired_design_matrix = pd.DataFrame(
    condition_effect[:, np.newaxis], columns=["vertical vs horizontal"]
)

and the document page: https://nilearn.github.io/dev/auto_examples/05_glm_second_level/plot_second_level_two_sample_test.html#sphx-glr-auto-examples-05-glm-second-level-plot-second-level-two-sample-test-py

the author conducted an unpaired two-sample t-test on two groups of participants (vertical vs horizontal, 16 participants in each group, totaling 32 participants). In setting up the design matrix, the author assigned the vertical group as 1 and the horizontal group as -1, and did not include an intercept term:

condition_effect = np.hstack(([1] * n_subjects, [-1] * n_subjects))
unpaired_design_matrix = pd.DataFrame(
    condition_effect[:, np.newaxis], columns=["vertical vs horizontal"]
)

However, in my opinion, the design matrix should be assigned as:

condition_effect = np.hstack(([1] * n_subjects, [0] * n_subjects))
unpaired_design_matrix = pd.DataFrame(
    {"vertical vs horizontal": condition_effect, "intercept": 1}
)

which assigns the vertical and horizontal groups as 1 and 0 and adds an intercept to the linear model.

The two design matrices above can be plotted as:

[image: the two design matrices plotted side by side]

In my understanding, for a GLM, the design matrix either encodes the two groups as separate indicator columns (1, 0 and 0, 1) without an intercept, combined with a contrast vector such as (1, -1) for the final comparison (SPM style), or it encodes one group as 1, 0 with an intercept term added. (https://imaging.mrc-cbu.cam.ac.uk/imaging/PrinciplesStatistics#Statistics:_the_simplest_case_-_a_regression)
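A quick numpy-only sketch of this point (hypothetical data; the group means, the seed, and the helper `ols_t` are my own, not from the example): with an indicator-plus-intercept design, the GLM t statistic for the group column reproduces the pooled two-sample t-test, while a single 1/-1 column without an intercept generally does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16  # subjects per group, as in the example
vertical = rng.normal(1.0, 1.0, n)    # hypothetical effect values
horizontal = rng.normal(0.3, 1.0, n)
y = np.concatenate([vertical, horizontal])

def ols_t(X, y, j=0):
    """t statistic of coefficient j from an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[j] / np.sqrt(cov[j, j])

# Design with a 1/0 group indicator plus an intercept column
X_indicator = np.column_stack([np.r_[np.ones(n), np.zeros(n)], np.ones(2 * n)])
t_indicator = ols_t(X_indicator, y)

# Single 1/-1 column without an intercept (as in the example)
X_single = np.r_[np.ones(n), -np.ones(n)][:, None]
t_single = ols_t(X_single, y)

# Classic pooled-variance two-sample t statistic for reference
sp2 = (vertical.var(ddof=1) + horizontal.var(ddof=1)) / 2
t_pooled = (vertical.mean() - horizontal.mean()) / np.sqrt(sp2 * 2 / n)

print(t_indicator, t_single, t_pooled)
```

Here `t_indicator` equals `t_pooled` exactly, while `t_single` differs whenever the grand mean of the data is nonzero, because the single-column model forces the two group means to be symmetric around zero.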

Also, I compared the T map with the result from SPM12:

[image: comparison of the nilearn T map with SPM12's spmT_0001.nii]

It shows that the spmT_0001.nii is exactly the same as the result from the second design matrix.

So I am wondering whether something is wrong with this example.

@Remi-Gau (Collaborator)

Had not noticed this.

Agree that adding -1 in the design matrix can be confusing.

With just 1 and 0, I remember that for a two-sample t-test there are at least three design matrices one can use, but the interpretation of the betas is different with each. See slides 102 to 122 here:
https://figshare.com/articles/presentation/SPM_and_some_of_its_maths_an_introduction/4578487

Maybe it's better to simplify this example. @bthirion am I missing something obvious here?

@YCHuang0610 (Author)

> Had not noticed this.
>
> Agree that adding -1 in the design matrix can be confusing.
>
> With just 1 and 0, I remember that for a two-sample t-test there are at least three design matrices one can use, but the interpretation of the betas is different with each. See slides 102 to 122 here: https://figshare.com/articles/presentation/SPM_and_some_of_its_maths_an_introduction/4578487
>
> Maybe it's better to simplify this example. @bthirion am I missing something obvious here?

Indeed, the dummy coding in the design matrix depends on the interpretation of the betas. But I think the point is that the intercept term should be added to the design matrix to ensure the T map is right, if nilearn's GLM doesn't automatically add one.

@Remi-Gau (Collaborator)

but for second level, SPM does not add an intercept automatically, no?

https://www.fil.ion.ucl.ac.uk/spm/docs/tutorials/fmri/group/two_sample_ttest/#viewing-the-results

@YCHuang0610 (Author) commented Apr 25, 2024

> but for second level, SPM does not add an intercept automatically, no?
>
> https://www.fil.ion.ucl.ac.uk/spm/docs/tutorials/fmri/group/two_sample_ttest/#viewing-the-results

Yes, SPM does not add an intercept when calculating a two-sample t-test.
As I remember, SPM uses Model I from the slides you mentioned (page 104) for the two-sample t-test, which needs one column per group (e.g. a column 111000 representing the first group and a column 000111 representing the second group, with no intercept). Like this:

[image: SPM-style design matrix with one indicator column per group]

When computing the T map, you enter (1 -1) under SPM's Results button to define a contrast, so the resulting T map is $T = (\beta_1 - \beta_2)/SE$.

However, I think nilearn uses Model II (page 110, is that right?). Model II needs a column of all ones as the intercept:

[image: design matrix with a group indicator column and an intercept column]

Or, even if Model I is used, the design matrix should still contain two columns representing the two groups, rather than the single column used in the plot_second_level_two_sample_test.py example:

[image: two-column group-indicator design matrix]

@bthirion (Member)

No, you cannot add an intercept because it would make the design matrix rank deficient: the intercept is the sum of all the subject indicators. SPM somehow handles rank-deficient designs, but we don't.
The current Nilearn estimate is correct AFAIK.
Does that make sense to you?

@YCHuang0610 (Author)

> No, you cannot add an intercept because it would make the design matrix rank deficient. Indeed the intercept is the sum of all the subject indicators. SPM somehow handles rank-deficient designs but we don't. The current Nilearn estimate is correct AFAIK. Does that make sense to you?

Hi @bthirion ,

I don't doubt the Nilearn estimation itself; I am just confused about the design matrix setting for the unpaired two-sample t-test in this example: https://nilearn.github.io/dev/auto_examples/05_glm_second_level/plot_second_level_two_sample_test.html#sphx-glr-auto-examples-05-glm-second-level-plot-second-level-two-sample-test-py

The plots below show the design matrices for unpaired design and paired design from the original example document. I find the design matrix of the paired design on the right to be reasonable, but there seems to be an issue with the unpaired design matrix on the left:
[image: unpaired (left) and paired (right) design matrices from the example]

You are right when it comes to the paired design (right plot). The design matrix for the paired design in that example is:

$$\begin{bmatrix} 1 & 1 & 0 & \cdots & 0\\\ 1 & 0 & 1 & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\\ 1 & 0 & 0 & \cdots & 1\\\ -1 & 1 & 0 & \cdots & 0\\\ -1 & 0 & 1 & \cdots & 0\\\ \vdots & \vdots & \vdots & \ddots & \vdots\\\ -1 & 0 & 0 & \cdots & 1 \end{bmatrix}$$

Another intercept column cannot be added to it, as that would make the design matrix rank deficient.

However, the design matrix for the unpaired design (left plot) is just a single column (Line 66 & 76-78 of plot_second_level_two_sample_test.py):

$$\begin{bmatrix} 1 \\\ 1 \\\ 1 \\\ \vdots \\\ -1 \\\ -1 \\\ -1 \end{bmatrix}$$

Its rank is 1, so it has full column rank.
When I add an intercept column, it becomes:

$$\begin{bmatrix} 1 & 1 \\\ 1 & 1 \\\ 1 & 1 \\\ \vdots & \vdots \\\ -1 & 1 \\\ -1 & 1 \\\ -1 & 1 \end{bmatrix}$$

The two columns of that matrix are linearly independent, so its rank is 2: the matrix still has full column rank, and adding the intercept does not cause rank deficiency.
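To make the rank argument concrete, here is a small numpy check (a sketch; `n` and the column layouts follow the example's 16-subjects-per-group setup):

```python
import numpy as np

n = 16  # subjects per group
cond = np.r_[np.ones(n), -np.ones(n)][:, None]  # 1/-1 condition column

# Unpaired design: single condition column; an added intercept keeps full rank
unpaired_plus_intercept = np.column_stack([cond, np.ones(2 * n)])
print(np.linalg.matrix_rank(unpaired_plus_intercept))  # 2 = full column rank

# Paired design: condition column plus one indicator column per subject
subjects = np.vstack([np.eye(n), np.eye(n)])
paired = np.column_stack([cond, subjects])
print(np.linalg.matrix_rank(paired))  # n + 1 = full column rank

# The intercept equals the sum of the subject indicators, so the rank
# does not increase: n + 2 columns but still rank n + 1
paired_plus_intercept = np.column_stack([paired, np.ones(2 * n)])
print(np.linalg.matrix_rank(paired_plus_intercept))  # still n + 1
```

This confirms both points in the thread: the paired design cannot take an intercept, while the unpaired single-column design can.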

@Remi-Gau (Collaborator)

Would it clarify things if the example encoded the conditions in 2 different columns and we modified the contrast computation accordingly?

vertical horizontal
1 0
1 0
... ...
1 0
0 1
0 1
... ...
0 1
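A minimal pandas sketch of that two-column encoding (the column names follow the table above; `n_subjects = 16` as in the example):

```python
import numpy as np
import pandas as pd

n_subjects = 16  # per group, as in the example

# One indicator column per condition instead of a single 1/-1 column
design = pd.DataFrame(
    {
        "vertical": np.r_[np.ones(n_subjects), np.zeros(n_subjects)],
        "horizontal": np.r_[np.zeros(n_subjects), np.ones(n_subjects)],
    }
)
print(design.shape)  # (32, 2)
# The group difference would then be tested with the contrast [1, -1],
# e.g. compute_contrast([1, -1]) or "vertical - horizontal" in nilearn.
```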

@Remi-Gau (Collaborator)

@YCHuang0610 would you be OK opening a pull request to update the example so that the design matrices encode each condition in a different column?

#4400 (comment)

@YCHuang0610 (Author)

> Would it clarify things if the example encoded the conditions in 2 different columns and we modified the contrast computation accordingly?
>
> vertical horizontal
> 1 0
> 1 0
> ... ...
> 1 0
> 0 1
> 0 1
> ... ...
> 0 1

Yes!

One way is to encode the design matrix as:

$$\begin{bmatrix} 1 & 0 \\\ 1 & 0 \\\ 1 & 0 \\\ \vdots & \vdots \\\ 0 & 1 \\\ 0 & 1 \\\ 0 & 1 \end{bmatrix}$$

and then modify the contrast computation as:

stat_maps_unpaired = second_level_model_unpaired.compute_contrast(
    [1, -1], output_type="all"
)
# OR
stat_maps_unpaired = second_level_model_unpaired.compute_contrast(
    "vertical - horizontal", output_type="all"
)

The other way is to add an intercept column (two columns: "vertical vs horizontal", "intercept") as:

$$\begin{bmatrix} 1 & 1 \\\ 1 & 1 \\\ 1 & 1 \\\ \vdots & \vdots \\\ 0 & 1 \\\ 0 & 1 \\\ 0 & 1 \end{bmatrix}$$

and there is no need to modify the contrast computation; just use:

stat_maps_unpaired = second_level_model_unpaired.compute_contrast(
    "vertical vs horizontal", output_type="all"
)

I'll try to open a pull request to update the example.

@bthirion
Copy link
Member

Sorry, the problem is that if you add the subject indicator to it, it makes the design rank deficient.
