Fit transform #145

Open
wants to merge 34 commits into
base: master

tomlincr
Contributor

@tomlincr tomlincr commented Aug 10, 2023

Adds a .fit_transform method to all node embedding algorithms, primarily so that karateclub algorithms can be used in a scikit-learn pipeline.

Adds:

  • y=None argument, for scikit-learn compatibility
  • Passthrough of y when it is not None, so that e.g. node attributes can be passed through to a downstream task in the pipeline
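
A minimal sketch of the pattern described above, not the code from this PR. ToyEmbedder is a hypothetical stand-in for a karateclub estimator, and the adjacency-dict input is an assumption made to keep the example dependency-free (karateclub itself fits on networkx graphs). The thread does not show how y is passed through, so the sketch only accepts it and leaves it untouched.

```python
class ToyEmbedder:
    """Hypothetical stand-in for a karateclub node embedding estimator."""

    def fit(self, graph):
        # Assumption: graph is a dict mapping each node to its neighbour list.
        # Toy embedding per node: [degree, sum of the neighbours' degrees].
        self._embedding = [
            [float(len(neighbours)), float(sum(len(graph[n]) for n in neighbours))]
            for _, neighbours in sorted(graph.items())
        ]

    def get_embedding(self):
        return self._embedding

    def fit_transform(self, graph, y=None):
        # y is accepted only for scikit-learn signature compatibility;
        # it is neither modified nor used when computing the embedding.
        self.fit(graph)
        return self.get_embedding()


# Usage: a three-node star graph, with hypothetical node labels as y.
star = {0: [1, 2], 1: [0], 2: [0]}
embedding = ToyEmbedder().fit_transform(star, y=[0, 1, 1])
print(embedding)
```

The (X, y=None) signature is the one scikit-learn's Pipeline uses when it calls fit_transform on intermediate steps during fitting.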

Tests:

  • The method is tested for each algorithm
  • For deterministic methods, the tests check that the output matches that of .get_embedding()
  • For stochastic methods, the tests only check that the shapes match
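
The testing strategy in the list above could be sketched roughly as follows. This is an illustration under stated assumptions rather than the PR's actual test code: ToyRandomEmbedder, its seed parameter, the dense adjacency-matrix input, and the check_fit_transform helper are all hypothetical names introduced for the example.

```python
import numpy as np


class ToyRandomEmbedder:
    """Hypothetical estimator: a random projection of the adjacency matrix."""

    def __init__(self, dimensions=4, seed=None):
        self.dimensions = dimensions
        self.seed = seed  # None means a different projection on every fit

    def fit(self, adjacency):
        rng = np.random.default_rng(self.seed)
        projection = rng.normal(size=(adjacency.shape[0], self.dimensions))
        self._embedding = adjacency @ projection

    def get_embedding(self):
        return self._embedding

    def fit_transform(self, adjacency, y=None):
        self.fit(adjacency)
        return self.get_embedding()


def check_fit_transform(model, adjacency, stochastic):
    """Compare fit_transform with a separate fit followed by get_embedding."""
    transformed = model.fit_transform(adjacency)
    model.fit(adjacency)
    reference = model.get_embedding()
    # The shapes must agree in every case.
    assert transformed.shape == reference.shape
    if not stochastic:
        # Values are only compared when repeated fits are reproducible.
        np.testing.assert_allclose(transformed, reference)
    return transformed.shape


adjacency = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
check_fit_transform(ToyRandomEmbedder(seed=42), adjacency, stochastic=False)
check_fit_transform(ToyRandomEmbedder(), adjacency, stochastic=True)
```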

@tomlincr
Contributor Author

Apologies, it has been a long day and I thought I had opened this PR on my fork to test coverage, CI, etc.

@tomlincr
Contributor Author

Interesting, everything passes locally.
There seems to be some variation in the embeddings generated by multiple fits when run by GitHub Actions.
I will test that the shapes match instead for these offenders.

@codecov-commenter

codecov-commenter commented Aug 10, 2023

Codecov Report

Merging #145 (c7ceb75) into master (d750b33) will increase coverage by 0.12%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #145      +/-   ##
==========================================
+ Coverage   97.41%   97.53%   +0.12%     
==========================================
  Files          63       63              
  Lines        2707     2845     +138     
==========================================
+ Hits         2637     2775     +138     
  Misses         70       70              
Files Changed                                           Coverage Δ
karateclub/estimator.py                                 100.00% <100.00%> (ø)
karateclub/node_embedding/attributed/ae.py              100.00% <100.00%> (ø)
karateclub/node_embedding/attributed/asne.py            100.00% <100.00%> (ø)
karateclub/node_embedding/attributed/bane.py            100.00% <100.00%> (ø)
...arateclub/node_embedding/attributed/feathernode.py   100.00% <100.00%> (ø)
karateclub/node_embedding/attributed/fscnmf.py          100.00% <100.00%> (ø)
karateclub/node_embedding/attributed/musae.py           100.00% <100.00%> (ø)
karateclub/node_embedding/attributed/sine.py            100.00% <100.00%> (ø)
karateclub/node_embedding/attributed/tadw.py            100.00% <100.00%> (ø)
karateclub/node_embedding/attributed/tene.py            100.00% <100.00%> (ø)
... and 18 more

@LucaCappelletti94
Collaborator

I have tried to run the test suite of this pull request, but it is currently failing at the HOPE model test. I see that you are comparing the two embeddings - maybe there are numerical instabilities that lead to different results over different runs? I am not familiar with the internals of numpy & scipy that much.
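
One way to make such a comparison robust to small numerical differences is to compare with a tolerance instead of exact equality. The sketch below also aligns column signs first, because singular vectors and eigenvectors are only defined up to sign. Whether this applies to the HOPE failure is an assumption; the thread does not confirm the cause. The helper names align_signs and embeddings_match are hypothetical.

```python
import numpy as np


def align_signs(embedding, reference):
    """Flip columns of embedding so each correlates positively with reference."""
    signs = np.sign(np.sum(embedding * reference, axis=0))
    signs[signs == 0] = 1.0  # leave columns with zero correlation unchanged
    return embedding * signs


def embeddings_match(first, second, rtol=1e-5, atol=1e-8):
    """Tolerance-based comparison that ignores per-column sign flips."""
    if first.shape != second.shape:
        return False
    return bool(np.allclose(align_signs(first, second), second, rtol=rtol, atol=atol))


# Usage: the second array differs by a flipped column and tiny noise.
first = np.array([[1.0, 2.0], [3.0, 4.0]])
second = first * np.array([1.0, -1.0]) + 1e-10
print(embeddings_match(first, second))
```

If the differences across runs are larger than floating-point noise, for example because of unseeded randomness, a tolerance will not help and comparing shapes remains the safer check.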


3 participants