Save the losses when creating embeddings #207

Open · 2 tasks
brunosan opened this issue Apr 2, 2024 · 1 comment · May be fixed by #210
brunosan (Member) commented Apr 2, 2024

In the process of generating embeddings from a trained deep learning model, we perform inference on an input image and save the resulting embedding, which captures the salient semantic features of the image as learned by the model. To enhance the utility of these embeddings, I propose saving the reconstruction loss alongside the embedding vector.

The reconstruction loss, calculated as the difference between the input image and the model's reconstructed output, provides valuable insight into the semantic content and anomalies present in the input image (see the sketch after the list below):

  1. Images with expected semantics that align well with the model's training data will exhibit a smaller reconstruction loss, indicating that the model can effectively capture and reconstruct the salient features.
  2. Images containing rare, unexpected, or anomalous semantics will result in a larger reconstruction loss, as the model may struggle to accurately reconstruct the input due to the presence of features outside its learned representation.
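
To make the idea concrete, here is a minimal sketch of computing both outputs in one inference pass. The `encode`/`decode` method names are assumptions for illustration, not the actual model API:

```python
import torch
import torch.nn.functional as F


def embed_with_loss(model, image: torch.Tensor):
    """Return the embedding and a per-image reconstruction loss.

    `model` is assumed to expose `encode` and `decode` methods
    (hypothetical names; adapt to the real model interface).
    """
    model.eval()
    with torch.no_grad():
        embedding = model.encode(image)           # latent vectors, (B, D)
        reconstruction = model.decode(embedding)  # reconstructed image, (B, C, H, W)
        # Per-pixel squared error, averaged over channels and pixels
        # so each image gets a single scalar loss.
        loss = F.mse_loss(reconstruction, image, reduction="none").mean(dim=(1, 2, 3))
    return embedding, loss
```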

Real-world applications:

  1. Monitoring changes in satellite imagery: By comparing embeddings and reconstruction losses of a region (e.g., Kiev) before and after significant events (war), we can detect and quantify the extent of semantic changes. Pre-event images will likely have smaller losses, while post-event images containing destruction, damage, and other anomalies will have higher losses.
  2. Anomaly detection in various domains: The reconstruction loss can serve as a valuable metric for detecting anomalies, such as rare events (city floods, locust plagues), unusual semantics (algae blooms, green pools), or "noise" (fog, smog, ships in the ocean, image artifacts). By setting appropriate thresholds on the reconstruction loss, we can flag images containing such anomalies for further analysis (see the sketch after this list).
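
Once the losses are stored next to the embeddings, flagging anomalous tiles becomes a simple filter. A minimal sketch, assuming the losses end up in a `loss` column of the same table as the embeddings (file and column names are illustrative):

```python
import pandas as pd

# Assumed file layout: one row per tile, with "embedding" and "loss" columns.
df = pd.read_parquet("embeddings.parquet")

# Flag tiles whose reconstruction loss is unusually high for this batch,
# here above the 99th percentile; the threshold is application-specific.
threshold = df["loss"].quantile(0.99)
anomalies = df[df["loss"] > threshold]
print(f"{len(anomalies)} tiles flagged for further analysis")
```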

Implementation: Modify the embedding generation pipeline to calculate and save the reconstruction loss alongside the embedding vector. This can be achieved by comparing the input image with the model's reconstructed output, using the same loss function as during training or another metric (e.g., MSE).
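
In the pipeline, the change could look roughly like the following. This is a sketch only; the Parquet output and the `tile_id`/`embedding`/`loss` column names are assumptions, not the pipeline's actual schema:

```python
import numpy as np
import pandas as pd


def save_embeddings(records, out_path="embeddings_with_loss.parquet"):
    """Persist embeddings together with their reconstruction losses.

    `records` is an iterable of (tile_id, embedding, loss) triples,
    e.g. as produced per image by `embed_with_loss` above.
    """
    rows = [
        {
            "tile_id": tile_id,
            "embedding": np.asarray(emb, dtype=np.float32).tolist(),
            "loss": float(loss),
        }
        for tile_id, emb, loss in records
    ]
    pd.DataFrame(rows).to_parquet(out_path)
```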

Action Items:

  • @yellowcap to add saving the reconstruction losses in the embedding generation pipeline.
  • @stephen-downs to assess how to surface this in the app.

I think this change not only makes our outputs much more useful, but also gives them a measurable relative confidence, AND highlights operational bias (making the loss a feature, not a thing to get rid of).

cc @MaceGrim @danhammer for the utility feedback.

brunosan (Member, Author) commented Apr 6, 2024

+1 on this based on customer needs to lean on embedding shift for anomaly detection. Not having the losses undermines the value of the anomalies.
