Example for above-ground biomass / tree cover assessment? #236

Open
jaanli opened this issue Apr 28, 2024 · 1 comment

Comments

jaanli commented Apr 28, 2024

Has anyone assessed the spatial autocorrelation error of Clay vs. standard models in downstream prediction/fine-tuning tasks?

Here's an example assessment vs Bayesian models: https://www.mdpi.com/1660-4601/18/13/6856

I'm considering generating embeddings at the patch level and using these to classify tree cover based on this tutorial:

https://clay-foundation.github.io/model/tutorial_digital_earth_pacific_patch_level.html

The tree dataset is here: https://tree-map.nycgovparks.org/

If there is a more appropriate starting point, let me know!
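
For concreteness, here is a minimal sketch of the kind of baseline I have in mind: a plain logistic regression fit on embedding vectors to predict a binary tree-cover label. Everything in it is a placeholder. The `embeddings` array is random noise standing in for model output (it does not come from Clay), `has_tree_cover` is a synthetic label, and `fit_logistic` is a hypothetical helper written only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data: 400 random vectors standing in for embeddings (NOT Clay output).
n, dim = 400, 16
embeddings = rng.normal(size=(n, dim))

# Synthetic binary label that depends linearly on the embedding, plus noise.
true_w = rng.normal(size=dim)
has_tree_cover = (embeddings @ true_w + 0.5 * rng.normal(size=n) > 0).astype(float)


def fit_logistic(X, y, lr=0.1, steps=500):
    """Logistic regression via batch gradient descent; returns (weights, bias)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        grad = p - y                             # gradient of the log loss w.r.t. logits
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


# Simple holdout: first 300 rows for training, last 100 for evaluation.
train = np.arange(n) < 300
test = ~train
w, b = fit_logistic(embeddings[train], has_tree_cover[train])
pred = (embeddings[test] @ w + b > 0).astype(float)
accuracy = (pred == has_tree_cover[test]).mean()
print(f"holdout accuracy: {accuracy:.2f}")
```

With real data, a random holdout like this one would likely be too optimistic, because nearby locations tend to share both imagery and labels; splitting by spatial block would be a fairer test.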

Use case context if it's helpful:

I've been working on health equity metrics at the neighborhood level, and think Clay could be a good fit for applying this framework: https://treesasinfrastructure.com/

To this data: https://jaanli.github.io/new-york-real-estate/

Linked to these demographics that have spatial components: https://jaanli.github.io/american-community-survey/new-york-area/income-by-race

Where every Census Bureau-defined "microdata area" is linked to health outcomes computed from claims datasets such as: https://onefact.github.io/synthetic-healthcare-data/

The hardest part here will be the error analysis: comparing the spatial autocorrelation of this deep model's errors against that of conventional models like logistic regression. Moran plots are helpful debugging tools (https://connordonegan.github.io/geostan/articles/spatial-me-models.html).
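
As a rough illustration of the check I mean, here is a sketch that computes global Moran's I on model residuals, using binary k-nearest-neighbour spatial weights. The helper names (`morans_i`, `knn_weights`), the choice of k = 4, and the synthetic coordinates and residuals are all assumptions made for the example, not anything taken from Clay or geostan.

```python
import numpy as np


def morans_i(values, weights):
    """Global Moran's I: (n / S0) * (z' W z) / (z' z), with z the centred values."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()
    return (x.size / w.sum()) * (z @ w @ z) / (z @ z)


def knn_weights(coords, k=4):
    """Binary weights: w[i, j] = 1 if j is one of i's k nearest neighbours."""
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is never its own neighbour
    nearest = np.argsort(d, axis=1)[:, :k]
    w = np.zeros_like(d)
    w[np.arange(len(coords))[:, None], nearest] = 1.0
    return w


# Synthetic stand-ins for residuals at 200 random locations in the unit square.
rng = np.random.default_rng(0)
coords = rng.uniform(size=(200, 2))
w = knn_weights(coords, k=4)

# Residuals that vary smoothly in space (spatially structured errors) ...
smooth_resid = np.sin(4 * coords[:, 0]) + np.cos(4 * coords[:, 1]) + 0.1 * rng.normal(size=200)
# ... versus residuals with no spatial structure.
noise_resid = rng.normal(size=200)

i_smooth = morans_i(smooth_resid, w)
i_noise = morans_i(noise_resid, w)
print(f"Moran's I, smooth residuals: {i_smooth:.2f}")
print(f"Moran's I, noise residuals:  {i_noise:.2f}")
```

The idea would be to run the same statistic on the residuals of each model (embedding-based and conventional) over the same locations and weights. Values near zero suggest little leftover spatial structure, while clearly positive values suggest the model is missing something that varies smoothly in space.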

(Before using downstream fine-tuning predictions for resource allocation and public health use cases, we need to carefully benchmark against the byzantine Census Bureau methods, spatial lag methods, etc.)

brunosan (Member) commented May 3, 2024

Thanks for creating this issue, @jaanli!
This sounds very interesting; please do keep us in the loop on your progress.

I'd say pass on patch-level embeddings. As I explain in #223, I think they are fundamentally skewed by their context in ways that make them less valuable, in most cases, than chip-level embeddings.

The good news, if you can wait a couple of weeks, is that Clay v1 can create embeddings at any chip size. Keep an eye out for the v1 release.
