Add diagnostics; evaluate row/column p-value separately #9
This PR covers two loosely related issues.

Support diagnostics by providing ways to access the estimated centroids, smoothed centroids, etc., with a new `include_diagnostics` parameter in the `centroid_test()` function.

The p-value currently used for centroid shift detection is the mean of the p-values of the X and Y centroids, which seems problematic. E.g., for TIC 13023738, sector 2, referenced in the associated paper, `centroid_test()` reports no centroid shift because the mean p-value is large (~0.25). In reality, there is a clear centroid shift in X (tiny p-value, ~6e-7); in the test it is overshadowed / averaged out by the large p-value in Y (~0.50). To handle such cases, the second commit of this PR changes the detection from using the mean of the X / Y p-values to using the minimum of the X and Y p-values.

E.g., for TIC 13023738, sector 2, referenced in the associated paper, the diagnostics could help pinpoint that the X / column centroids have a noticeable shift.
Another use is to manually inspect whether the smoothing is over/under aggressive.
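
To make the mean versus minimum change concrete, here is a minimal sketch using the approximate p-values quoted above for TIC 13023738, sector 2. The helper names and the 0.05 threshold are assumptions for illustration only; they are not taken from `centroid_test()` or from this PR's code.

```python
# Illustrative only: compares the two ways of combining the X and Y p-values.
# ALPHA and the helper names are hypothetical, not part of the package.

ALPHA = 0.05  # assumed detection threshold


def combine_mean(p_x: float, p_y: float) -> float:
    """Previous behavior as described in the PR: average the two p-values."""
    return (p_x + p_y) / 2.0


def combine_min(p_x: float, p_y: float) -> float:
    """Proposed behavior: keep the smaller of the two p-values."""
    return min(p_x, p_y)


# Approximate values quoted in the PR description.
p_x = 6e-7   # X / column centroid
p_y = 0.50   # Y / row centroid

mean_p = combine_mean(p_x, p_y)
min_p = combine_min(p_x, p_y)

# A shift is flagged when the combined p-value falls below the threshold.
shift_by_mean = mean_p < ALPHA
shift_by_min = min_p < ALPHA

print(f"mean p-value = {mean_p:.2f}, shift detected: {shift_by_mean}")
print(f"min  p-value = {min_p:.1e}, shift detected: {shift_by_min}")
```

With these inputs the mean stays well above the assumed threshold, so the shift in X is missed, while the minimum falls far below it.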
Note: If the PR is to be accepted, some polish probably needs to be done (e.g., documentation, consistent labeling of the planet candidate, deciding what p-value(s) to report, etc.). I'm holding off pending feedback.