Torch implementation of ParametricUMAP #1103

jh83775 · 2024-03-23T13:03:41Z

Hi All,
I've popped in a pull request with a version of ParametricUMAP written in PyTorch. It mostly follows the discussions in the threads below:

I'm not quite sure where it best fits in the codebase. Currently I've just put it in umap/torch.py, so you can import it with from umap.torch import ParametricUMAP, which feels intuitive. It should be fairly easy to edit it to include a nonparametric UMAP as well, via the torch.nn.Embedding class, but I can put in a fresh pull request for that when I find the time.

There could also be a discussion about which parameters are accessible to the user without bloating the number of optional args. For example, in my implementation the user can manually change the batch size and learning rate, but not the optimizer or negative sample rate. Which options made it into the optional args is fairly arbitrary, based on what I tended to change most for my use case. It will be relatively easy to expose any of the currently hardcoded parameter choices to the user, though.
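To make the nonparametric option and the negative sample rate a bit more concrete, here is a minimal NumPy sketch. It is not the PR's code, which isn't reproduced in this thread. It shows a UMAP-style edge loss with negative sampling, and the only difference between the two variants: a nonparametric embedding is a free table of coordinates (the role torch.nn.Embedding would play), while a parametric one computes coordinates from the input features with an encoder. The function names, the linear encoder, and the a/b values (assumed to be roughly what UMAP fits for min_dist=0.1) are illustrative assumptions.

```python
import numpy as np

def umap_edge_loss(emb_head, emb_tail, emb_neg, a=1.577, b=0.895, eps=1e-4):
    """UMAP-style cross-entropy for B edges plus B * k negative samples.

    a, b: curve parameters (assumed values, roughly UMAP's fit for min_dist=0.1).
    """
    def q(x, y):
        # Low-dimensional similarity 1 / (1 + a * dist^(2b)).
        d2 = np.sum((x - y) ** 2, axis=1)
        return 1.0 / (1.0 + a * d2 ** b)

    batch = emb_head.shape[0]
    k = emb_neg.shape[0] // batch  # negative sample rate
    # Attractive term: endpoints of a graph edge should be close.
    pos = -np.log(np.clip(q(emb_head, emb_tail), eps, 1.0))
    # Repulsive term: each head is pushed away from k randomly drawn points.
    heads_rep = np.repeat(emb_head, k, axis=0)
    neg = -np.log(np.clip(1.0 - q(heads_rep, emb_neg), eps, 1.0))
    return (pos.sum() + neg.sum()) / batch

rng = np.random.default_rng(0)
n_points, n_features, n_components = 100, 10, 2
X = rng.normal(size=(n_points, n_features))

# Nonparametric: one free row of coordinates per point (what an nn.Embedding would hold).
table = rng.normal(size=(n_points, n_components))
def embed_nonparametric(idx):
    return table[idx]

# Parametric: coordinates computed from the raw features (a linear "encoder" here).
W = rng.normal(size=(n_features, n_components)) * 0.1
def embed_parametric(idx):
    return X[idx] @ W

negative_sample_rate = 5
# Random indices stand in for edges sampled from the fuzzy graph.
heads = rng.integers(0, n_points, size=32)
tails = rng.integers(0, n_points, size=32)
negs = rng.integers(0, n_points, size=32 * negative_sample_rate)
for embed in (embed_nonparametric, embed_parametric):
    print(embed.__name__, umap_edge_loss(embed(heads), embed(tails), embed(negs)))
```

In a torch version, only the lookup would change (an nn.Embedding versus an nn.Module encoder); the loss could be shared, which is why adding a nonparametric variant should be a small edit.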

I've also added a few examples in a Jupyter notebook (see notebooks/Parametric_UMAP/08.0-torch-parametric-umap.ipynb) to give an indication of how to get started.

If any PyTorch experts want to take a look, I'll gladly add any optimizations (I'm sure there are a few places where small changes could improve the runtime).

I've tested it with pytorch==1.12.1 & CUDA==11.6. I think it should work with other versions too, but haven't tested more widely.

pep8speaks · 2024-03-23T13:03:47Z

Hello @jh83775! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

In the file umap/torch.py:

Line 297:1: W293 blank line contains whitespace

Comment last updated at 2024-03-26 11:59:02 UTC

lmcinnes · 2024-03-23T13:20:41Z

Thanks for this. There is currently work underway to support Keras 3 (the initial support is done) such that multiple backends, including pytorch, could be used. You may want to check in on the refactoring discussion on PR #1101.

jh83775 · 2024-03-23T13:37:31Z

That looks helpful, thanks. I'll update to reflect the discussions on that thread. It looks like the Keras version with a pytorch backend might supersede this work, although hopefully this can still be useful for those more familiar with torch than keras.

lmcinnes · 2024-03-23T13:41:27Z

I think there's still merit in this; I just don't have a good sense of where things will settle to know what the right way to support the various options will be.

lmcinnes · 2024-04-04T22:05:03Z

So as far as I can tell, this will just sit alongside the existing Keras 3 implementation (which should now support a pytorch backend) as an extra pytorch-specific ParametricUMAP. I would be happy to just merge this if you think it has sufficient value as-is as a standalone pytorch-specific implementation.

timsainb · 2024-04-04T22:41:23Z

I'm also glad to see this implementation. Thanks!

I do think the overlap in functionality between this and the keras implementation is pretty high.

This code is missing some key functionality (though it's been a few days since I read it). One of the benefits of parametric UMAP is that it allows you to balance a global MDS loss and an autoencoder loss alongside the UMAP loss, which iirc are not in this implementation.
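For readers unfamiliar with that kind of balancing, here is a rough NumPy sketch of a weighted objective. It is an assumption about how such terms can be combined, not the Keras implementation: the global term is taken as 1 minus the Pearson correlation between pairwise distances in data space and embedding space (one MDS-like way to reward global structure), the reconstruction term is a mean squared error on decoder outputs, and global_weight and recon_weight are hypothetical names.

```python
import numpy as np

def pairwise_dists(Z):
    # Full (n, n) Euclidean distance matrix; O(n^2) memory, so meant for a mini-batch.
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def global_distance_loss(X, Z):
    """1 - Pearson correlation between data-space and embedding-space distances."""
    iu = np.triu_indices(len(X), k=1)  # each unordered pair once
    dx = pairwise_dists(X)[iu]
    dz = pairwise_dists(Z)[iu]
    return 1.0 - np.corrcoef(dx, dz)[0, 1]

def reconstruction_loss(X, X_hat):
    # Mean squared error between inputs and decoder outputs.
    return np.mean((X - X_hat) ** 2)

def total_loss(umap_loss, X, Z, X_hat, global_weight=0.1, recon_weight=1.0):
    """Hypothetical weighted objective: UMAP term plus scaled global and reconstruction terms."""
    return (umap_loss
            + global_weight * global_distance_loss(X, Z)
            + recon_weight * reconstruction_loss(X, X_hat))

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 10))                 # a mini-batch of inputs
Z = X[:, :2]                                  # stand-in embedding (first two features)
X_hat = X + 0.05 * rng.normal(size=X.shape)   # stand-in decoder output
print(total_loss(umap_loss=1.2, X=X, Z=Z, X_hat=X_hat))
```

Setting both weights to zero recovers a plain UMAP objective, so in a design like this the extra terms act as optional knobs rather than changing the default behavior.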

One thing the keras code is missing that this code has is the data iterator. Currently, to run the keras code you still rely on the tensorflow dataset even if you use torch as a backend, which means you need to have keras, tensorflow, and torch installed. If we pull the iterator from this code into the keras implementation, we will overcome that issue (this is something I've been meaning to do but haven't had time for yet).
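A framework-agnostic edge iterator along those lines might look like the following minimal sketch, assuming the fuzzy graph is available as COO arrays (head, tail, weight). The function name edge_batches, the max_repeats parameter, and the rule of repeating stronger edges more often per epoch (a rough approximation of UMAP's weight-proportional edge sampling) are illustrative choices, not the PR's iterator. It only depends on NumPy, so it needs no TensorFlow and could, for example, be wrapped in a torch.utils.data.IterableDataset.

```python
import numpy as np

def edge_batches(head, tail, weight, batch_size=256, max_repeats=5, seed=0):
    """Yield shuffled (head, tail) index batches for one epoch over a COO graph.

    Stronger edges are repeated more often (between 1 and max_repeats times),
    a rough stand-in for UMAP's weight-proportional edge sampling.
    """
    rng = np.random.default_rng(seed)
    reps = np.maximum(1, np.round(weight / weight.max() * max_repeats)).astype(int)
    h = np.repeat(head, reps)
    t = np.repeat(tail, reps)
    order = rng.permutation(len(h))
    for start in range(0, len(h), batch_size):
        idx = order[start:start + batch_size]
        yield h[idx], t[idx]

# Tiny example graph: edges 0->1, 1->2, 2->0 with decreasing membership strength.
head = np.array([0, 1, 2])
tail = np.array([1, 2, 0])
weight = np.array([1.0, 0.6, 0.2])
for h, t in edge_batches(head, tail, weight, batch_size=4):
    print(h, t)
```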

turian · 2024-05-09T17:47:23Z

@timsainb As per the discussion on HN by Max Woolf, one big limitation of the current parametric UMAP implementation is running out of memory (OOM), caused by the tensorflow dataset pulling the entire graph into memory.

So a parametric UMAP implementation that works iterating over the dataset would be great.

timsainb · 2024-05-15T16:50:31Z

@turian I think this is out of the scope of the umap python library, but I give an example of how to build your own graph and iterator here: https://colab.research.google.com/drive/1WkXVZ5pnMrm17m0YgmtoNjM_XHdnE5Vp?usp=sharing

You could swap out the graph and the iterator for a graph that supports being larger than memory and an iterator that pulls from that graph.
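As a rough illustration of the iterator half of that suggestion, the sketch below streams edges from disk instead of holding the whole graph in memory. It assumes the graph has already been built and saved as three COO arrays in .npy files; the file layout and the helpers save_edges and chunked_edge_batches are hypothetical. The arrays are memory-mapped with np.load(..., mmap_mode="r"), chunks are visited in random order, and edges are shuffled within each chunk, so only one chunk is in RAM at a time. Shuffling within chunks is weaker than a global shuffle, and building a graph that is itself larger than memory is a separate problem not covered here.

```python
import os
import tempfile
import numpy as np

def save_edges(path, head, tail, weight):
    # One .npy per COO array so each can be memory-mapped independently.
    for name, arr in (("head", head), ("tail", tail), ("weight", weight)):
        np.save(os.path.join(path, name + ".npy"), arr)

def chunked_edge_batches(path, chunk_size=100_000, batch_size=256, seed=0):
    """Stream (head, tail, weight) batches from disk, loading one chunk at a time."""
    rng = np.random.default_rng(seed)
    head, tail, weight = (np.load(os.path.join(path, name + ".npy"), mmap_mode="r")
                          for name in ("head", "tail", "weight"))
    n = len(head)
    # Visit chunks in random order, then shuffle edges within each chunk.
    for start in rng.permutation(np.arange(0, n, chunk_size)):
        start = int(start)
        stop = min(start + chunk_size, n)
        h = np.array(head[start:stop])  # copy just this chunk into RAM
        t = np.array(tail[start:stop])
        w = np.array(weight[start:stop])
        order = rng.permutation(stop - start)
        for b in range(0, len(order), batch_size):
            idx = order[b:b + batch_size]
            yield h[idx], t[idx], w[idx]

with tempfile.TemporaryDirectory() as d:
    rng = np.random.default_rng(1)
    n_edges = 2500
    save_edges(d, np.arange(n_edges), rng.integers(0, n_edges, n_edges), rng.random(n_edges))
    batches = list(chunked_edge_batches(d, chunk_size=1000, batch_size=300))
print(len(batches), sum(len(b[0]) for b in batches))
```

The input features for each edge endpoint would still need to come from somewhere; the same memory-mapping approach could be applied to the data matrix.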

jh83775 added 2 commits March 23, 2024 12:37

Added first pass at torch implimentation + example notebook

4f3f034

moving example to notebooks folder

906d4ee

pep8 compatability

9765057

jh83775 added 2 commits March 23, 2024 13:50

Adjusted optional parameters in ParametricUMAP

7c6366d

convert np to torch in transform method

f7bb5a8