Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This pull request addresses two issues with the current implementation of the statistical testing:
In both _perform_fdr_correction and max_statistic_sequential in stats.py, the p-values are first sorted and their argsort is saved. After the statistical testing, the resulting significance-arrays are supposed to be unsorted, i.e. brought back into the original order, in order to filter out the non-significant targets or links.
However, this unsorting is not correctly implemented, as the p-value argsort from before is used in a wrong way. The implementation reads
significance_array = significance_array[argsort_idxs],
where it should instead read
significance_array[argsort_idxs] = significance_array.copy()
where the copy is necessary to avoid concurrent modification.
To get the correct p-value thresholds for the fdr correction, the formula requires the total number of tests that are performed. In _perform_fdr_correction, this number n is computed from the number of p-values that are submitted. However, this is not equal to the total number of comparisons in this multiple comparison testing problem, since in the function network_fdr, the targets are first filtered by their omnibus significance.
I believe this is wrong, and the filtering step should be eliminated so that the n in the fdr correction does in fact coincide with the total number of targets (if correct_by_target==True) or links (if correct_by_target==False).
I have implemented fixes to both issues in the branch dev_fdr_fix. Since these fixes affect the results that idtxl produces, I strongly recommend a thorough code review of the changes before a merge is performed.