[ENH] Allow sampling feature sets separately in MultiViewDTC #152

Merged
adam2392 merged 47 commits into neurodata:main from the multiview branch on Nov 15, 2023

adam2392 (Collaborator) commented on Oct 25, 2023

Changes proposed in this pull request:

  • Allows each feature set (view) to be sampled separately, with the same scaling factor applied to every view, when max_features is a float, 'sqrt', or 'log'
  • Adds an experiment demonstrating the advantage of the multi-view approach on a set of simulations in which the number of features in the second view of a 2-view dataset increases while all other simulation parameters are held constant (examples/hypothesis_testing/plot_co_MIGHT_alternative.py and examples/hypothesis_testing/plot_co_MIGHT_null.py)
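
As a rough illustration of the first bullet, the sketch below applies one max_features rule to each view's own feature count, so every view is scaled by the same factor instead of the rule being applied once to the total feature count. The helper name `max_features_per_view` and its arguments are hypothetical and are not taken from sktree; `'log2'` is assumed here, following scikit-learn's naming, as the meaning of 'log' above.

```python
import math


def max_features_per_view(n_features_per_view, max_features):
    """Hypothetical helper: number of features to sample from each view.

    The same rule (a fraction, 'sqrt', or 'log2') is applied to every
    view's own feature count rather than to the total feature count.
    """
    counts = []
    for n_features in n_features_per_view:
        if max_features == "sqrt":
            k = math.sqrt(n_features)
        elif max_features == "log2":
            k = math.log2(n_features)
        elif isinstance(max_features, float):
            k = max_features * n_features
        else:
            raise ValueError(f"unsupported max_features: {max_features!r}")
        # Always sample at least one feature from a view.
        counts.append(max(1, int(k)))
    return counts


# Two views with 100 and 16 features respectively.
print(max_features_per_view([100, 16], "sqrt"))
print(max_features_per_view([100, 16], 0.5))
```

With a single rule applied to the combined 116 features, the larger view would tend to dominate the candidate set; applying the rule per view keeps each view represented in proportion to its own size. This is only one plausible reading of "same scaling factor"; the actual logic lives in sktree/tree/_multiview.py.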

The main files to take a look at are:

  • sktree/stats/forestht.py
  • sktree/tree/_multiview.py
  • sktree/tree/_oblique_splitter.pxd
  • sktree/tree/_oblique_splitter.pyx

The rest are cosmetic fixes or fixes for small documentation issues I noticed.

Before submitting

  • I've read and followed all steps in the Making a pull request
    section of the CONTRIBUTING docs.
  • I've updated or added any relevant docstrings following the syntax described in the
    Writing docstrings section of the CONTRIBUTING docs.
  • If this PR fixes a bug, I've added a test that will fail without my fix.
  • If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.

After submitting

  • All GitHub Actions jobs for my pull request have passed.

Signed-off-by: Adam Li <adam2392@gmail.com>

codecov bot commented Oct 25, 2023

Codecov Report

Attention: 16 lines in your changes are missing coverage. Please review.

Comparison is base (030a064) 89.59% compared to head (bb08d5d) 90.52%.
Report is 2 commits behind head on main.

Files                                 Patch %   Lines
sktree/stats/forestht.py              79.48%    8 Missing ⚠️
sktree/tree/_multiview.py             83.33%    5 Missing ⚠️
sktree/experimental/monte_carlo.py    95.52%    3 Missing ⚠️

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #152      +/-   ##
==========================================
+ Coverage   89.59%   90.52%   +0.92%     
==========================================
  Files          46       48       +2     
  Lines        3710     4020     +310     
==========================================
+ Hits         3324     3639     +315     
+ Misses        386      381       -5     

@adam2392 adam2392 marked this pull request as ready for review October 26, 2023 18:12
@adam2392 adam2392 changed the title from "[WIP] sample feature sets separately" to "[ENH] sample feature sets separately" on Nov 15, 2023
@adam2392 adam2392 changed the title from "[ENH] sample feature sets separately" to "[ENH] Allow sampling feature sets separately in MultiViewDTC" on Nov 15, 2023

adam2392 (Collaborator, Author) left a comment:

The main files to take a look at are:

  • sktree/stats/forestht.py
  • sktree/tree/_multiview.py
  • sktree/tree/_oblique_splitter.pxd
  • sktree/tree/_oblique_splitter.pyx

The rest are cosmetic fixes or fixes for small documentation issues I noticed.

Review thread on sktree/stats/forestht.py (outdated, resolved)

sampan501 (Member) left a comment:

Few minor things, but looks good to me

Review thread on sktree/tree/_multiview.py (outdated, resolved)
@adam2392 adam2392 requested a review from PSSF23 November 15, 2023 17:33
@adam2392 adam2392 merged commit 9dcf913 into neurodata:main Nov 15, 2023
26 checks passed
@adam2392 adam2392 deleted the multiview branch November 15, 2023 19:34
Linked issue (successfully merging this pull request may close it): Add parameter to multi-view splitter to add non_uniform_sampling
3 participants