ultimate saddle by dist, remove deprecated [WIP] #484

sergpolly · 2024-01-19T23:27:25Z

Ultimate saddle by distance: instead of specifying min_diag and max_diag, one receives 3D arrays of saddle data (one stack of saddle sums and one of saddle counts), where the 1st dimension is the index of the distance diagonal it corresponds to - same as in #469 ...
The min_diag/max_diag behavior can then be recovered as

np.divide(
   np.nansum(saddle_sum_stack[min_diag:max_diag], axis=0),
   np.nansum(saddle_count_stack[min_diag:max_diag], axis=0)
)

Potential use-case - by-distance saddles seem to be useful overall: short-range and longer-range interactions can tell slightly different stories. min_diag/max_diag is OK for the purpose, but often the choice of distance ranges isn't obvious at the time of running the function, so one has to run saddle multiple times, which can get annoying ... With this new functionality one would instead receive a 3D stack of saddles and slice it however they desire in an agony of exploratory data analysis!
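A minimal sketch of that slicing, assuming the stacks have shape (n_diags, n_bins, n_bins). The helper name `saddle_from_stack` and the toy data are hypothetical and only illustrate the idea; they are not part of the proposed API.

```python
import numpy as np

def saddle_from_stack(sum_stack, count_stack, min_diag=None, max_diag=None):
    """Collapse a (n_diags, n_bins, n_bins) stack into one 2D saddle.

    Hypothetical helper: sums and counts are added over the requested
    range of diagonals, then divided to get the average signal.
    """
    diag_slice = slice(min_diag, max_diag)
    sums = np.nansum(sum_stack[diag_slice], axis=0)
    counts = np.nansum(count_stack[diag_slice], axis=0)
    # Pixels with zero counts become NaN/inf instead of raising warnings.
    with np.errstate(divide="ignore", invalid="ignore"):
        return sums / counts

# Toy stacks (assumed shapes): 6 diagonals, 3x3 saddle bins.
rng = np.random.default_rng(0)
n_diags, n_bins = 6, 3
count_stack = rng.integers(1, 5, size=(n_diags, n_bins, n_bins)).astype(float)
sum_stack = count_stack * 2.0  # every pixel averages to exactly 2.0

# Try different distance ranges without recomputing the stacks.
short_range = saddle_from_stack(sum_stack, count_stack, 0, 3)
long_range = saddle_from_stack(sum_stack, count_stack, 3, None)
```

The point of the design is that the expensive step (building the stacks) runs once, while each choice of distance range is a cheap slice-and-sum.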

Also remove the plotting part from saddles - it hasn't been maintained anyway #313

Update - in the same vein of generalization - one can imagine splitting saddles even further -> by regions ... The public API would look as follows:

saddle(
    # ... existing parameters with no or little change
    aggregate_by_region = True,
    aggregate_by_distance = True,
)

so, default behavior would not change (maybe retire min_diag/max_diag but that's it), and aggregate flags would work as follows:

aggregate_by_region = False -> saddle would return a dictionary of 2D (or 3D, depending on aggregate_by_distance) ndarrays keyed by (region1, region2)
aggregate_by_distance = False -> saddle would return 3D ndarrays of sums and counts, where the 1st index corresponds to the distance (aka diagonal) for cis data, and potentially just a dummy length-1 dimension for trans
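To make the flag semantics concrete, here is a rough sketch of how the final aggregation could be finished outside the accumulation step. Everything here is an assumption for illustration: the helper name `aggregate_saddles`, the `{(region1, region2): (S, C)}` input layout, and the requirement that all per-region stacks share one shape (e.g. padded to the same number of diagonals).

```python
import numpy as np

def aggregate_saddles(per_region, by_region=True, by_distance=True):
    """Finish aggregation of per-region, per-distance saddle data.

    per_region: dict {(region1, region2): (S, C)}, where S and C are
    3D arrays (n_diags, n_bins, n_bins) of sums and counts.
    Assumption: all stacks have the same shape.
    """
    if by_distance:
        # Collapse the distance (diagonal) axis within each region pair.
        per_region = {
            key: (np.nansum(S, axis=0), np.nansum(C, axis=0))
            for key, (S, C) in per_region.items()
        }
    if not by_region:
        # Keep the (region1, region2)-keyed dictionary.
        return per_region
    # Sum across all region pairs.
    S_total = np.nansum([S for S, _ in per_region.values()], axis=0)
    C_total = np.nansum([C for _, C in per_region.values()], axis=0)
    return S_total, C_total

# Toy input: two cis region pairs, 4 diagonals, 2x2 saddle bins.
toy = {
    ("chr1", "chr1"): (np.ones((4, 2, 2)), np.ones((4, 2, 2))),
    ("chr2", "chr2"): (2 * np.ones((4, 2, 2)), np.ones((4, 2, 2))),
}
S_all, C_all = aggregate_saddles(toy)                    # 2D, default behavior
S_dist, C_dist = aggregate_saddles(toy, by_distance=False)  # 3D stacks
by_reg = aggregate_saddles(toy, by_region=False)         # dict of 2D arrays
```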

Potential use-cases - similar to the "by-distance" case - one might want to spot-check some chromosomes individually - typical advanced exploratory data analysis ... One could also separate out inter-arm interactions (still not supported by saddles, which is a shame) - that could become closer to mainstream once inter-arm is supported.

This wouldn't be hard to implement in the current dense-matrix-based framework - simply reuse the existing _accumulate-type functions that keep accumulating into the S, C ndarrays, and make them return S, C per region instead - then finish the aggregation outside (if requested).
A potential sparse-saddle implementation should be straightforward as well - just one more groupby ...
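The "one more groupby" idea might look roughly like the sketch below. It assumes a pixel table that already carries the digitized track bins of both pixel ends; the column names (`region1`, `region2`, `diag`, `q1`, `q2`, `value`) and the function `sparse_saddle` are hypothetical, not an existing implementation.

```python
import pandas as pd

# Toy pixel table (assumed layout): one row per observed pixel.
pixels = pd.DataFrame({
    "region1": ["chr1", "chr1", "chr1", "chr2"],
    "region2": ["chr1", "chr1", "chr1", "chr2"],
    "diag": [1, 1, 2, 1],          # distance in bins
    "q1": [0, 0, 1, 0],            # digitized track bin of the 1st end
    "q2": [1, 1, 1, 1],            # digitized track bin of the 2nd end
    "value": [1.0, 3.0, 5.0, 7.0], # e.g. observed/expected
})

def sparse_saddle(pixels, by_region=True, by_distance=True):
    """Sum and count pixel values per saddle bin.

    Turning an aggregate flag off simply adds the corresponding
    columns to the groupby keys.
    """
    keys = ["q1", "q2"]
    if not by_distance:
        keys = ["diag"] + keys
    if not by_region:
        keys = ["region1", "region2"] + keys
    return pixels.groupby(keys)["value"].agg(["sum", "count"])

fully_aggregated = sparse_saddle(pixels)
per_distance = sparse_saddle(pixels, by_distance=False)
per_region = sparse_saddle(pixels, by_region=False)
```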

add saddle_stack by dist, deprecate plotting

3c24915
