Faster sparse reachability #416

Open
wants to merge 4 commits into
base: master

Conversation

markopy (Contributor) commented Oct 3, 2020

This replaces sparse_mutual_reachability with a version that is 137 times faster on my datasets. The algorithm is essentially identical, but it operates on the CSR matrix directly and skips the conversion to LIL and back, which is both very slow and uses a lot of unnecessary memory.
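To illustrate the idea of working on the CSR arrays directly, here is a minimal NumPy sketch. It is not the code from this PR: the function name `mutual_reachability_csr` is hypothetical, the core distance is assumed to be the `min_points`-th smallest stored distance in each row, and handling of infinite values is left out for brevity.

```python
import numpy as np
from scipy.sparse import csr_matrix


def mutual_reachability_csr(graph, min_points=5):
    """Overwrite the stored distances of a CSR matrix with mutual
    reachability distances (hypothetical sketch, modifies `graph`)."""
    indptr, indices, data = graph.indptr, graph.indices, graph.data
    n = graph.shape[0]

    # Core distance per row: the min_points-th smallest stored distance,
    # or infinity when the row has too few stored neighbours.
    core = np.full(n, np.inf, dtype=data.dtype)
    for i in range(n):
        row = data[indptr[i]:indptr[i + 1]]
        if min_points - 1 < row.size:
            core[i] = np.partition(row, min_points - 1)[min_points - 1]

    # Row index of every stored entry, recovered from indptr.
    rows = np.repeat(np.arange(n), np.diff(indptr))

    # max(d(i, j), core[i], core[j]) written back in place,
    # so no LIL conversion and no second matrix are needed.
    np.maximum(data, core[rows], out=data)
    np.maximum(data, core[indices], out=data)
    return graph
```

Because only the `data` array is rewritten, the sparsity pattern and the dtype of the input (float32 or float64) are preserved.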

The code supports both float32 and float64 natively, which allows further memory savings when double precision is not required. Additionally, passing overwrite=True to csgraph.minimum_spanning_tree saves another unnecessary copy.
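For reference, a small sketch of the `overwrite=True` call on SciPy's `minimum_spanning_tree`. The three-node graph is made-up illustration data, not taken from this PR, and the input should be treated as scratch space after the call since SciPy is allowed to modify it.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

# Toy symmetric distance graph: edges 0-1 (1.0), 1-2 (2.0), 0-2 (4.0).
dense = np.array([[0.0, 1.0, 4.0],
                  [1.0, 0.0, 2.0],
                  [4.0, 2.0, 0.0]])
graph = csr_matrix(dense)

# overwrite=True lets SciPy reuse parts of the input instead of copying it.
# Do not rely on the contents of `graph` afterwards.
mst = minimum_spanning_tree(graph, overwrite=True)

total_weight = float(mst.sum())  # edges 0-1 and 1-2 are kept
```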

Lastly, the user can pass overwrite=True when calling hdbscan to indicate that the distance matrix they pass may be modified in place, saving yet another copy.

Overall, memory usage can be cut to a quarter, which makes it possible to handle very large distance matrices. I have successfully clustered graphs with 10M nodes and 1B edges.

@pep8speaks

Hello @markopy, thank you for submitting the pull request!

Line 38:63: W504 line break after binary operator
Line 46:1: E402 module level import not at top of file
Line 48:1: E302 expected 2 blank lines, found 1
Line 52:1: E101 indentation contains mixed spaces and tabs
Line 52:1: W191 indentation contains tabs
Line 52:6: E128 continuation line under-indented for visual indent
Line 53:1: E101 indentation contains mixed spaces and tabs
Line 64:1: E101 indentation contains mixed spaces and tabs
Line 64:1: W191 indentation contains tabs
Line 64:16: E128 continuation line under-indented for visual indent
Line 66:1: E101 indentation contains mixed spaces and tabs
Line 127:25: E128 continuation line under-indented for visual indent
Line 177:25: E128 continuation line under-indented for visual indent
Line 213:25: E128 continuation line under-indented for visual indent
Line 250:25: E128 continuation line under-indented for visual indent
Line 316:25: E128 continuation line under-indented for visual indent
Line 327:101: E501 line too long (104 > 100 characters)
Line 360:1: E101 indentation contains mixed spaces and tabs
Line 360:1: W191 indentation contains tabs
Line 361:1: W191 indentation contains tabs
Line 362:1: E101 indentation contains mixed spaces and tabs
Line 482:76: W291 trailing whitespace
Line 483:1: E101 indentation contains mixed spaces and tabs
Line 483:1: W191 indentation contains tabs
Line 484:1: E101 indentation contains mixed spaces and tabs
Line 640:1: E101 indentation contains mixed spaces and tabs
Line 640:1: W191 indentation contains tabs
Line 640:10: E128 continuation line under-indented for visual indent
Line 641:1: E101 indentation contains mixed spaces and tabs
Line 641:13: E127 continuation line over-indented for visual indent
Line 681:1: E101 indentation contains mixed spaces and tabs
Line 681:1: W191 indentation contains tabs
Line 682:1: E101 indentation contains mixed spaces and tabs
Line 841:76: W291 trailing whitespace
Line 842:1: E101 indentation contains mixed spaces and tabs
Line 842:1: W191 indentation contains tabs
Line 844:1: E101 indentation contains mixed spaces and tabs
Line 974:5: E303 too many blank lines (2)
Line 1005:5: E303 too many blank lines (2)
Line 1041:5: E303 too many blank lines (2)
Line 1074:18: E128 continuation line under-indented for visual indent
Line 1089:18: E128 continuation line under-indented for visual indent
Line 1090:18: E128 continuation line under-indented for visual indent
Line 1110:72: W504 line break after binary operator
Line 1111:76: W504 line break after binary operator
Line 1112:77: W504 line break after binary operator
Line 1113:75: W504 line break after binary operator


2 participants