
Fix issues in statistical testing #107

Closed
wants to merge 4 commits into from

Conversation

daehrlich
Collaborator

@daehrlich daehrlich commented Jan 18, 2024

This pull request addresses two issues with the current implementation of the statistical testing:

  1. Wrong unsorting
    In both _perform_fdr_correction and max_statistic_sequential in stats.py, the p-values are first sorted and their argsort is saved. After the statistical testing, the resulting significance arrays are supposed to be unsorted, i.e. brought back into the original order, so that the non-significant targets or links can be filtered out.

However, this unsorting is not implemented correctly: the saved argsort is applied in the wrong direction, permuting the array a second time instead of inverting the original permutation. The implementation reads

significance_array = significance_array[argsort_idxs]

where it should instead read

significance_array[argsort_idxs] = significance_array.copy()

where the copy is necessary to avoid reading from the array while it is being overwritten in place.
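A minimal NumPy sketch (with made-up p-values, not IDTxl's actual arrays) illustrating why indexing with the argsort differs from inverting it:

```python
import numpy as np

# Hypothetical p-values in original order; entries 0 and 2 are significant.
pvals = np.array([0.03, 0.50, 0.01])
argsort_idxs = np.argsort(pvals)           # [2, 0, 1]

# Significance computed on the SORTED p-values (0.01 and 0.03 pass, 0.50 fails).
significance_sorted = np.array([True, True, False])

# Buggy unsorting: applies the permutation a second time.
wrong = significance_sorted[argsort_idxs]  # [False, True, True] -- misassigned

# Correct unsorting: scatter each entry back to its original position.
significance = significance_sorted.copy()
significance[argsort_idxs] = significance_sorted
print(significance)                        # [ True False  True] -- matches pvals
```

The buggy version marks the p=0.03 entry non-significant and the p=0.50 entry significant, which is exactly the kind of misassignment the fix addresses.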

  2. Incorrect number of total significance tests in the FDR test
    To obtain the correct p-value thresholds for the FDR correction, the formula requires the total number of tests performed. In _perform_fdr_correction, this number n is computed from the number of p-values that are submitted. However, this is not equal to the total number of comparisons in this multiple-comparison testing problem, because in the function network_fdr the targets are first filtered by their omnibus significance.
    I believe this is wrong, and the filtering step should be eliminated so that the n in the FDR correction does in fact coincide with the total number of targets (if correct_by_target==True) or links (if correct_by_target==False).
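A simplified Benjamini-Hochberg sketch (not the actual _perform_fdr_correction code) showing how the choice of n changes the thresholds; the parameter n_total is a hypothetical stand-in for the true number of comparisons:

```python
import numpy as np

def fdr_significance(pvals, alpha=0.05, n_total=None):
    """Simplified Benjamini-Hochberg step-up procedure. The threshold
    denominator must be the TOTAL number of tests performed (n_total),
    not merely the number of p-values left after pre-filtering."""
    pvals = np.asarray(pvals)
    if n_total is None:
        n_total = len(pvals)  # only valid if no p-values were filtered out
    argsort_idxs = np.argsort(pvals)
    sorted_p = pvals[argsort_idxs]
    # Thresholds alpha * i / n_total for ranks i = 1, ..., len(pvals).
    thresholds = alpha * np.arange(1, len(pvals) + 1) / n_total
    below = sorted_p <= thresholds
    sig_sorted = np.zeros(len(pvals), dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        sig_sorted[: k + 1] = True  # all p-values up to the largest passing rank
    # Unsort back to the original order (the inverse-permutation fix).
    significance = np.empty_like(sig_sorted)
    significance[argsort_idxs] = sig_sorted
    return significance
```

With pvals = [0.001, 0.02, 0.04] and alpha = 0.05, using n_total=3 marks all three significant, while the correct total of, say, six comparisons (n_total=6) marks only the first: an inflated denominator-free n makes the test anti-conservative.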

I have implemented fixes for both issues in the branch dev_fdr_fix. Since these fixes affect the results that IDTxl produces, I strongly recommend a thorough code review of the changes before merging.

@mwibral
Collaborator

mwibral commented Jan 25, 2024

I agree with the detected bugs.
The sorting should be corrected.

For the case correct_by_target==True, the number of comparisons should indeed be the number of targets (as per Novelli et al., Network Neuroscience, 2019).

For the case correct_by_target==False, one should correct by:
number_of_candidates = number_of_unique_combinations_of_timelags_and_source_processes.

This is because, at the moment, IDTxl only provides p-values per candidate, not per link, as far as I know.
(But this should be double-checked! Although I doubt it, there may be per-link mTE values and p-values present in the returned results -- if so, correcting the per-link p-values by the number of links in the network would be preferable.)

Note: an efficient test at the level of candidates (sources x lags) would require testing all candidates across all targets simultaneously and entering them into the maximum statistic for inclusion. This non-hierarchical testing is not supported in IDTxl at the moment and would require a major rework of the full code path.
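As a toy illustration of the correct_by_target==False count (hypothetical candidate tuples, not IDTxl's actual results structure), the number of comparisons would be the number of unique (source process, time lag) combinations rather than the raw number of candidate entries:

```python
# Hypothetical selected candidates across all targets, as
# (source process, time lag) pairs; (0, 1) appears for two targets.
candidates = [(0, 1), (0, 2), (1, 1), (0, 1)]

# Number of comparisons = unique combinations of source process and lag.
n_comparisons = len(set(candidates))
print(n_comparisons)  # 3, not 4
```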

@daehrlich daehrlich added the bug label Jan 25, 2024
@daehrlich daehrlich linked an issue Jan 25, 2024 that may be closed by this pull request
@pwollstadt
Owner

Merged into develop

@pwollstadt pwollstadt closed this Feb 2, 2024
Successfully merging this pull request may close these issues.

Issues in statistical testing