Explore embeddings using tsne #132
I have not yet managed to run this notebook in full, but here are some initial comments.
"# Read ALL the files and save it as a pickle.\n", | ||
"\n", | ||
"clay = gpd.GeoDataFrame()\n", | ||
"DIRECTORY_PATH = \"/home/brunosan/data/Clay/embeddings_e2\"\n", |
We should also document how to obtain these embeddings. We have previously used the data/ directory within the repo as the place to put notebook-related data (such as checkpoints and image chips).
I saw we are getting a Source Cooperative account. Shall we wait until we have published the embeddings there, and then fetch them from there?
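Until the embeddings have a published home, one way to avoid the hardcoded home-directory path is to resolve the location at runtime. The following is only a minimal sketch: the environment variable name `CLAY_EMBEDDINGS_DIR`, the `data/embeddings` default, and the `*.gpkg` file pattern are all assumptions made for illustration, not something the notebook or the repo defines.

```python
import os
from pathlib import Path

# Hypothetical names: neither the variable nor the default path is defined by the PR.
ENV_VAR = "CLAY_EMBEDDINGS_DIR"
DEFAULT_DIR = Path("data") / "embeddings"


def resolve_embeddings_dir(env=None):
    """Return the directory named in the environment, else the repo-relative default."""
    env = os.environ if env is None else env
    return Path(env.get(ENV_VAR, DEFAULT_DIR))


def list_embedding_files(directory, pattern="*.gpkg"):
    """Return matching files in sorted order so repeated runs read them identically.

    The default pattern is an assumption; adjust it to the real file format.
    """
    return sorted(Path(directory).glob(pattern))
```

In the notebook, `DIRECTORY_PATH = resolve_embeddings_dir()` would then work on any machine that either sets the variable or places the files under data/.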
"outputs": [], | ||
"source": [ | ||
"import numpy as np\n", | ||
"import openTSNE\n", |
openTSNE is not currently in the mamba environment. This makes me think we could add optional dependencies for running the notebooks, but I am not sure how to do this using mamba. @weiji14, would that be complicated?
Conda doesn't allow optional dependencies yet, but we could maintain a separate environment-docs.yml file, for example. Alternatively, we could just install openTSNE in the docs GitHub Actions CI if this is only a one-off.
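Whichever route is chosen, the notebook itself could fail early with a readable message when an optional package is absent. This is a rough sketch using only the standard library. The helper names are hypothetical, and the install hint is illustrative and should match whatever the project ends up documenting.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def require_modules(names, hint):
    """Raise ImportError carrying an install hint if any module is unavailable."""
    missing = missing_modules(names)
    if missing:
        raise ImportError(f"Missing optional packages: {', '.join(missing)}. {hint}")


# Possible first cell of the notebook (hint text is an assumption):
# require_modules(["openTSNE"], "Try: mamba install -c conda-forge opentsne")
```

This checks availability without importing the package, so the check stays cheap even when the dependency is heavy.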
" return round(coordinates[0], precision), round(coordinates[1], precision)\n", | ||
"\n", | ||
"\n", | ||
"def get_mapbox_image(polygon, access_token):\n", |
Could we do this by visualizing the Clay image chips instead? Less detail would be visible, but it would work without a Mapbox token, which would be nice for our notebooks.
This PR adds a sample notebook to explore the embedding space locally using openTSNE. Depending on your compute resources, it can scale up to the full v0 training set.
It uses Mapbox to pull RGB context imagery, and it also uses and documents a few tricks for running t-SNE on such a large, high-dimensional corpus.
This is an example of the output.
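The description does not list the tricks themselves. Purely as an illustration, and not necessarily what the notebook does, two common ones are reducing dimensionality with PCA first, and fitting t-SNE on a random subset before placing the remaining points with `transform`. The function names, `n_components=50`, and `sample_size=10_000` below are assumptions chosen for the sketch.

```python
import numpy as np


def pca_reduce(X, n_components=50):
    """Project the rows of X onto their top principal components via SVD."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T


def tsne_with_subsample(X, sample_size=10_000, seed=42):
    """Fit openTSNE on a random subset, then place the remaining points."""
    import openTSNE  # imported lazily because it is an optional dependency

    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    fit_idx, rest_idx = idx[:sample_size], idx[sample_size:]
    # fit() returns an embedding object that behaves like an (n, 2) array.
    embedding = openTSNE.TSNE(n_jobs=-1, random_state=seed).fit(X[fit_idx])
    coords = np.empty((len(X), 2))
    coords[fit_idx] = embedding
    if len(rest_idx):
        # transform() positions new points relative to the fitted embedding.
        coords[rest_idx] = embedding.transform(X[rest_idx])
    return coords
```

A typical call would be `coords = tsne_with_subsample(pca_reduce(X))`, where `X` is the array of embeddings. The subset should stay well above the perplexity (openTSNE defaults to 30) for the fit to be meaningful.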