
Is there an error in the design matrix from the second level two sample test example? #4400

Open
YCHuang0610 opened this issue Apr 25, 2024 · 10 comments · May be fixed by #4407

Comments

@YCHuang0610 commented Apr 25, 2024

Hello,
I'm trying to follow the tutorial on two-sample t-tests with nilearn, and I'm finding something a bit confusing.

In the code:

condition_effect = np.hstack(([1] * n_subjects, [-1] * n_subjects))
# %%
# The design matrix for the unpaired test doesn't need any more columns
# For the paired test, we include an intercept for each subject.
subject_effect = np.vstack((np.eye(n_subjects), np.eye(n_subjects)))
subjects = [f"S{i:02d}" for i in range(1, n_subjects + 1)]
# %%
# We then assemble those into design matrices
unpaired_design_matrix = pd.DataFrame(
    condition_effect[:, np.newaxis], columns=["vertical vs horizontal"]
)

and the document page: https://nilearn.github.io/dev/auto_examples/05_glm_second_level/plot_second_level_two_sample_test.html#sphx-glr-auto-examples-05-glm-second-level-plot-second-level-two-sample-test-py

the author conducted an unpaired two-sample t-test on two groups of participants (vertical vs horizontal, 16 participants in each group, totaling 32 participants). In setting up the design matrix, the author assigned the vertical group as 1 and the horizontal group as -1, and did not include an intercept term:

condition_effect = np.hstack(([1] * n_subjects, [-1] * n_subjects))
unpaired_design_matrix = pd.DataFrame(
    condition_effect[:, np.newaxis], columns=["vertical vs horizontal"]
)

However, in my opinion, the design matrix should be assigned as:

condition_effect = np.hstack(([1] * n_subjects, [0] * n_subjects))
unpaired_design_matrix = pd.DataFrame(
    {"vertical vs horizontal": condition_effect, "intercept": 1}
)

which assigns the vertical and horizontal groups as 1 and 0 and adds an intercept to the linear model.

The two design matrices above can be plotted as:

[image: the two design matrices plotted side by side]

In my understanding, for a GLM, the design matrix either encodes the two groups as separate indicator columns (1, 0 and 0, 1) without an intercept, combined with a contrast vector such as (1, -1) for the final comparison (SPM style), or it encodes one group as 1, 0 with an intercept term added. (https://imaging.mrc-cbu.cam.ac.uk/imaging/PrinciplesStatistics#Statistics:_the_simplest_case_-_a_regression)
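A quick numpy-only sketch of this point (hypothetical data; the group means, the seed, and the helper `ols_t` are my own, not from the example): with an indicator-plus-intercept design, the GLM t statistic for the group column reproduces the pooled two-sample t-test, while a single 1/-1 column without an intercept generally does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16  # subjects per group, as in the example
vertical = rng.normal(1.0, 1.0, n)    # hypothetical effect values
horizontal = rng.normal(0.3, 1.0, n)
y = np.concatenate([vertical, horizontal])

def ols_t(X, y, j=0):
    """t statistic of coefficient j from an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[j] / np.sqrt(cov[j, j])

# Design with a 1/0 group indicator plus an intercept column
X_indicator = np.column_stack([np.r_[np.ones(n), np.zeros(n)], np.ones(2 * n)])
t_indicator = ols_t(X_indicator, y)

# Single 1/-1 column without an intercept (as in the example)
X_single = np.r_[np.ones(n), -np.ones(n)][:, None]
t_single = ols_t(X_single, y)

# Classic pooled-variance two-sample t statistic for reference
sp2 = (vertical.var(ddof=1) + horizontal.var(ddof=1)) / 2
t_pooled = (vertical.mean() - horizontal.mean()) / np.sqrt(sp2 * 2 / n)

print(t_indicator, t_single, t_pooled)
```

Here `t_indicator` equals `t_pooled` exactly, while `t_single` differs whenever the grand mean of the data is nonzero, because the single-column model forces the two group means to be symmetric around zero.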

Also, I compared the T map with the result from SPM12:

[image: comparison of the nilearn T map with SPM12's spmT_0001.nii]

It shows that the spmT_0001.nii is exactly the same as the result from the second design matrix.

So I am wondering whether something is wrong with this example.

@Remi-Gau (Collaborator)

Had not noticed this.

Agree that adding -1 in the design matrix can be confusing.

With just 1 and 0, I remember that for a two-sample t-test there are at least three design matrices one can use, but the interpretation of the betas is different with each. See slides 102 to 122 here:
https://figshare.com/articles/presentation/SPM_and_some_of_its_maths_an_introduction/4578487

Maybe it's better to simplify this example. @bthirion am I missing something obvious here?

@YCHuang0610 (Author)

> Had not noticed this.
>
> Agree that adding -1 in the design matrix can be confusing.
>
> With just 1 and 0, I remember that for a two-sample t-test there are at least three design matrices one can use, but the interpretation of the betas is different with each. See slides 102 to 122 here: https://figshare.com/articles/presentation/SPM_and_some_of_its_maths_an_introduction/4578487
>
> Maybe it's better to simplify this example. @bthirion am I missing something obvious here?

Indeed, the dummy coding in the design matrix depends on the interpretation of the betas. But I think the point is that the intercept term should be added to the design matrix to ensure the T map is right, if nilearn's GLM doesn't automatically add one.

@Remi-Gau (Collaborator)

but for second level, SPM does not add an intercept automatically, no?

https://www.fil.ion.ucl.ac.uk/spm/docs/tutorials/fmri/group/two_sample_ttest/#viewing-the-results

@YCHuang0610 (Author) commented Apr 25, 2024

> but for second level, SPM does not add an intercept automatically, no?
>
> https://www.fil.ion.ucl.ac.uk/spm/docs/tutorials/fmri/group/two_sample_ttest/#viewing-the-results

Yes, SPM does not add an intercept when calculating a two-sample t-test.
As I remember, SPM uses Model I from the slides you mentioned (page 104) for the two-sample t-test, which needs one column per group (e.g. a column 111000 representing the first group and a column 000111 representing the second group, with no intercept). Like this:

[image: SPM-style design matrix with one indicator column per group]

When computing the T map, you enter (1 -1) under SPM's Results button to define a contrast, so the resulting T map is $T = (\beta_1 - \beta_2)/SE$.

However, I think nilearn uses Model II (page 110, is that right?). Model II needs a column of all ones as the intercept:

[image: design matrix with a group indicator column and an intercept column]

Or, even if Model I is used, the design matrix should still contain two columns representing the two groups, rather than the single column used in the plot_second_level_two_sample_test.py example:

[image: two-column group-indicator design matrix]

@bthirion (Member)

No, you cannot add an intercept because it would make the design matrix rank deficient: the intercept is the sum of all the subject indicators. SPM somehow handles rank-deficient designs, but we don't.
The current Nilearn estimate is correct AFAIK.
Does that make sense to you?

@YCHuang0610 (Author)

> No, you cannot add an intercept because it would make the design matrix rank deficient. Indeed the intercept is the sum of all the subject indicators. SPM somehow handles rank-deficient designs but we don't. The current Nilearn estimate is correct AFAIK. Does that make sense to you?

Hi @bthirion ,

I don't doubt the Nilearn estimation itself; I am just confused about the design matrix setting for the unpaired two-sample t-test in this example: https://nilearn.github.io/dev/auto_examples/05_glm_second_level/plot_second_level_two_sample_test.html#sphx-glr-auto-examples-05-glm-second-level-plot-second-level-two-sample-test-py

The plots below show the design matrices for unpaired design and paired design from the original example document. I find the design matrix of the paired design on the right to be reasonable, but there seems to be an issue with the unpaired design matrix on the left:
[image: unpaired (left) and paired (right) design matrices from the example]

You are right when it comes to the paired design (right plot). The design matrix for the paired design in that example is:

$$\begin{bmatrix} 1 & 1 & 0 & \cdots & 0\\\ 1 & 0 & 1 & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\\ 1 & 0 & 0 & \cdots & 1\\\ -1 & 1 & 0 & \cdots & 0\\\ -1 & 0 & 1 & \cdots & 0\\\ \vdots & \vdots & \vdots & \ddots & \vdots\\\ -1 & 0 & 0 & \cdots & 1 \end{bmatrix}$$

Another intercept column cannot be added to it, as that would make the design matrix rank deficient.

However, the design matrix for the unpaired design (left plot) is just a single column (Line 66 & 76-78 of plot_second_level_two_sample_test.py):

$$\begin{bmatrix} 1 \\\ 1 \\\ 1 \\\ \vdots \\\ -1 \\\ -1 \\\ -1 \end{bmatrix}$$

Its rank is 1, so it has full column rank.
When I add an intercept column, it becomes:

$$\begin{bmatrix} 1 & 1 \\\ 1 & 1 \\\ 1 & 1 \\\ \vdots & \vdots \\\ -1 & 1 \\\ -1 & 1 \\\ -1 & 1 \end{bmatrix}$$

The two columns of that matrix are linearly independent, so its rank is 2: the matrix still has full column rank, and adding the intercept does not cause rank deficiency.
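To make the rank argument concrete, here is a small numpy check (a sketch; `n` and the column layouts follow the example's 16-subjects-per-group setup):

```python
import numpy as np

n = 16  # subjects per group
cond = np.r_[np.ones(n), -np.ones(n)][:, None]  # 1/-1 condition column

# Unpaired design: single condition column; an added intercept keeps full rank
unpaired_plus_intercept = np.column_stack([cond, np.ones(2 * n)])
print(np.linalg.matrix_rank(unpaired_plus_intercept))  # 2 = full column rank

# Paired design: condition column plus one indicator column per subject
subjects = np.vstack([np.eye(n), np.eye(n)])
paired = np.column_stack([cond, subjects])
print(np.linalg.matrix_rank(paired))  # n + 1 = full column rank

# The intercept equals the sum of the subject indicators, so the rank
# does not increase: n + 2 columns but still rank n + 1
paired_plus_intercept = np.column_stack([paired, np.ones(2 * n)])
print(np.linalg.matrix_rank(paired_plus_intercept))  # still n + 1
```

This confirms both points in the thread: the paired design cannot take an intercept, while the unpaired single-column design can.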

@Remi-Gau (Collaborator)

Would it clarify things if the example encoded the conditions in 2 different columns and we modified the contrast computation accordingly?

vertical horizontal
1 0
1 0
... ...
1 0
0 1
0 1
... ...
0 1
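A minimal pandas sketch of that two-column encoding (the column names follow the table above; `n_subjects = 16` as in the example):

```python
import numpy as np
import pandas as pd

n_subjects = 16  # per group, as in the example

# One indicator column per condition instead of a single 1/-1 column
design = pd.DataFrame(
    {
        "vertical": np.r_[np.ones(n_subjects), np.zeros(n_subjects)],
        "horizontal": np.r_[np.zeros(n_subjects), np.ones(n_subjects)],
    }
)
print(design.shape)  # (32, 2)
# The group difference would then be tested with the contrast [1, -1],
# e.g. compute_contrast([1, -1]) or "vertical - horizontal" in nilearn.
```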

@Remi-Gau (Collaborator)

@YCHuang0610 would you be OK opening a pull request to update the example so that the design matrices encode each condition in a different column?

#4400 (comment)

@YCHuang0610 (Author)

> Would it clarify things if the example encoded the conditions in 2 different columns and we modified the contrast computation accordingly?
>
> vertical horizontal
> 1 0
> 1 0
> ... ...
> 1 0
> 0 1
> 0 1
> ... ...
> 0 1

Yes!

One way is to encode the design matrix as:

$$\begin{bmatrix} 1 & 0 \\\ 1 & 0 \\\ 1 & 0 \\\ \vdots & \vdots \\\ 0 & 1 \\\ 0 & 1 \\\ 0 & 1 \end{bmatrix}$$

and then modify the contrast computation as:

stat_maps_unpaired = second_level_model_unpaired.compute_contrast(
    [1, -1], output_type="all"
)
# OR
stat_maps_unpaired = second_level_model_unpaired.compute_contrast(
    "vertical - horizontal", output_type="all"
)

The other way is to add an intercept column (two columns: "vertical vs horizontal", "intercept") as:

$$\begin{bmatrix} 1 & 1 \\\ 1 & 1 \\\ 1 & 1 \\\ \vdots & \vdots \\\ 0 & 1 \\\ 0 & 1 \\\ 0 & 1 \end{bmatrix}$$

and there is no need to modify the contrast computation; just use:

stat_maps_unpaired = second_level_model_unpaired.compute_contrast(
    "vertical vs horizontal", output_type="all"
)

I'll try to open a pull request to update the example.

@bthirion
Copy link
Member

Sorry, the problem is that if you add the subject indicator to it, it makes the design rank deficient.
