Explore embeddings using tsne #132

Open. Wants to merge 8 commits into base: main.

brunosan
Member

This PR adds a sample notebook for exploring the embedding space locally with openTSNE. Depending on your compute resources, it can scale up to the full training set of v0.

It uses Mapbox to pull RGB context imagery, and it uses (and documents) a few tricks for running t-SNE on such a large corpus with so many dimensions.

This is an example of the output.

[Image: Screenshot 2024-01-23 at 10 48 32]
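
The description above does not list the tricks the notebook uses. As one illustration only, a common way to make t-SNE tractable on a large corpus is to fit on a random subsample and then place the remaining points into the fitted layout. The sketch below assumes that approach, which may differ from what the notebook does; the function names `split_indices` and `tsne_in_two_stages` are hypothetical, and `vectors` is assumed to be a 2-D NumPy array of embeddings.

```python
import random


def split_indices(n_rows, n_fit, seed=42):
    """Return (fit_idx, rest_idx): a random subsample and the remaining rows."""
    if not 0 < n_fit <= n_rows:
        raise ValueError("n_fit must be between 1 and n_rows")
    rng = random.Random(seed)
    fit_idx = sorted(rng.sample(range(n_rows), n_fit))
    chosen = set(fit_idx)
    rest_idx = [i for i in range(n_rows) if i not in chosen]
    return fit_idx, rest_idx


def tsne_in_two_stages(vectors, n_fit, seed=42):
    """Fit t-SNE on a subsample, then place the remaining rows (assumed approach)."""
    # Imported lazily so split_indices works without these packages installed.
    import numpy as np
    from openTSNE import TSNE

    fit_idx, rest_idx = split_indices(len(vectors), n_fit, seed)
    tsne = TSNE(initialization="pca", n_jobs=-1, random_state=seed)
    fitted = tsne.fit(vectors[fit_idx])  # 2-D layout of the subsample only
    layout = np.empty((len(vectors), 2))
    layout[fit_idx] = fitted
    if rest_idx:
        # Position the remaining rows relative to the already fitted layout.
        layout[rest_idx] = fitted.transform(vectors[rest_idx])
    return layout
```

The subsample size `n_fit` is the knob that trades layout quality against memory and run time, which matches the note that the notebook scales with available compute.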

Member

yellowcap left a comment

I have not yet managed to run this notebook fully, but here are some initial comments.

"# Read ALL the files and save it as a pickle.\n",
"\n",
"clay = gpd.GeoDataFrame()\n",
"DIRECTORY_PATH = \"/home/brunosan/data/Clay/embeddings_e2\"\n",
Member

We should also document how to obtain these embeddings. We have previously used the data/ directory within the repo as the place to put notebook-related data (such as checkpoints and image chips).

I saw we are getting a Source Cooperative account. Shall we wait until we have published the embeddings there, and then fetch them from there?
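
One way to avoid a user-specific absolute path in the notebook is to resolve the embeddings folder relative to the repo's data/ directory, with an optional override. This is a minimal sketch, not the notebook's code: the environment variable `CLAY_EMBEDDINGS_DIR` and the `embeddings` subfolder name are assumptions made for illustration.

```python
import os
from pathlib import Path


def embeddings_dir(repo_root=".", env=None):
    """Pick the embeddings folder: env override first, else <repo>/data/embeddings."""
    env = os.environ if env is None else env
    override = env.get("CLAY_EMBEDDINGS_DIR")  # hypothetical variable name
    if override:
        return Path(override).expanduser()
    return Path(repo_root) / "data" / "embeddings"  # hypothetical subfolder name


DIRECTORY_PATH = embeddings_dir()
```

With this, the notebook would run unchanged on any checkout, and someone who keeps the embeddings elsewhere only needs to set the variable.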

"outputs": [],
"source": [
"import numpy as np\n",
"import openTSNE\n",
Member

openTSNE is not currently in the mamba environment. This makes me think that we could add optional dependencies for running the notebooks, but I am not sure how to do this with mamba. @weiji14, would that be complicated?

Contributor

Conda doesn't support optional dependencies yet, but we could maintain a separate environment-docs.yml file, for example. Alternatively, we could just install TSNE in the docs GitHub Actions CI job if this is only a one-off.
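
Whichever option is chosen, the notebook itself could fail early with a clear message when an extra package is missing. The sketch below uses only the standard library; the helper name `missing_packages` and the contents of `NOTEBOOK_EXTRAS` are assumptions for illustration.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level package names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Packages the notebook needs beyond the base environment (assumed list).
NOTEBOOK_EXTRAS = ["openTSNE"]

missing = missing_packages(NOTEBOOK_EXTRAS)
if missing:
    print("Install before running this notebook:", ", ".join(missing))
```

Placed in the first cell, this tells the reader what to install instead of surfacing an `ImportError` halfway through the notebook.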

" return round(coordinates[0], precision), round(coordinates[1], precision)\n",
"\n",
"\n",
"def get_mapbox_image(polygon, access_token):\n",
Member

Could we do this by visualizing the Clay image chips instead? Less detail would be visible, but it would work without a Mapbox token, which would be nice for our notebooks.
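
A middle ground would be to use Mapbox only when a token is available and fall back to the image chips otherwise. This is a rough sketch of that selection logic only; the environment variable name `MAPBOX_ACCESS_TOKEN` and the function `context_source` are assumptions, and the code that renders either source is left out.

```python
import os


def context_source(env=None):
    """Return ("mapbox", token) when a token is set, else ("chips", None)."""
    env = os.environ if env is None else env
    token = env.get("MAPBOX_ACCESS_TOKEN", "").strip()  # assumed variable name
    if token:
        return "mapbox", token
    return "chips", None


source, access_token = context_source()
```

The notebook would then run end to end without credentials, and readers with a token would still get the more detailed RGB context.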

3 participants