Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Fix patch sampling for discontiguous case #219

Closed
wants to merge 10 commits into from
Closed

[ENH] Fix patch sampling for discontiguous case #219

wants to merge 10 commits into from

Conversation

j1c
Copy link
Member

@j1c j1c commented Feb 13, 2024

Fixes #215

Changes proposed in this pull request:

sktree/tree/manifold/_morf_splitter.pyx

  • Rewrote the logic for sampling indices of patches (I couldn't really understand the discontiguous loops)
    1. Instead of working in raveled index space, we work in the unraveled index space
    2. For each dimension, if dim_contiguous, then we sample indices sequentially from top_left_seed to the top_left_seed+patch_dim. If not, then we randomly sample indices based on patch_dim.
    3. We take the cartesian product of all the indices across dimensions, then ravel each element in the product.
  • Some of the datatypes changed to using vectors because I was getting Python object/GIL errors on compilation when trying to use arrays and vector[vectors] interchangeably.
  • Removed some uses of class attributes that are rewritten constantly

sktree/tree/_utils.pyx

  • Added cartesian product functions
  • Changed unravel_index_cython to return rather than inplace replacement of one of the inputs
  • Changed intp_t[:] to vector[intp_t] for inputs/outputs for some functions for compatibility with changes to _morf_splitter.pyx.

Before submitting

  • I've read and followed all steps in the Making a pull request
    section of the CONTRIBUTING docs.
  • I've updated or added any relevant docstrings following the syntax described in the
    Writing docstrings section of the CONTRIBUTING docs.
  • If this PR fixes a bug, I've added a test that will fail without my fix.
  • If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.

After submitting

  • All GitHub Actions jobs for my pull request have passed.

@j1c j1c changed the title [ENH] Fix patch sampling for discontiguous case (#215) [ENH] Fix patch sampling for discontiguous case Feb 13, 2024
Copy link

codecov bot commented Feb 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (9792c85) 88.48% compared to head (896ec47) 88.50%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #219      +/-   ##
==========================================
+ Coverage   88.48%   88.50%   +0.01%     
==========================================
  Files          54       54              
  Lines        4891     4896       +5     
==========================================
+ Hits         4328     4333       +5     
  Misses        563      563              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@j1c j1c requested review from adam2392 and SUKI-O February 13, 2024 22:22
@adam2392
Copy link
Collaborator

adam2392 commented Feb 16, 2024

Is it possible to refashion something like this benchmark to see if there is a runtime slowdown? I'm unsure if the vector vs memoryview needs to be careful (i.e. does it create a copy instead of a reference unnecessarily)?

Just run once on main and once on your PR branch w/o anything else running on your computer.

benchmark_morf.txt

@adam2392 adam2392 deleted the branch temp2 April 26, 2024 07:58
@adam2392 adam2392 closed this Apr 26, 2024
@adam2392
Copy link
Collaborator

Apologies @j1c! I messed up the main branch, and as a result this PR auto-closed. Do you mind resubmitting it to main?

I will try to take a look at this issue this upcoming month after NeurIPS submission time

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Specifying dim_contiguous causes seg faults in PatchObliqueForestClassifier
2 participants