In the process of generating embeddings from a trained deep learning model, we perform inference on an input image and save the resulting embedding, which captures the salient semantic features of the input as determined by the model. To enhance the utility of these embeddings, I propose saving the reconstruction loss alongside the embedding vector.
The reconstruction loss, calculated as the difference between the input image and the model's reconstructed output, provides valuable insights into the semantic content and anomalies present in the input image:
Images with expected semantics that align well with the model's training data will exhibit a smaller reconstruction loss, indicating that the model can effectively capture and reconstruct the salient features.
Images containing rare, unexpected, or anomalous semantics will result in a larger reconstruction loss, as the model may struggle to accurately reconstruct the input due to the presence of features outside its learned representation.
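To make the distinction above concrete, here is a minimal sketch of the loss computation, assuming an MSE-style reconstruction loss (the function name and the synthetic "reconstructions" are illustrative, not part of our pipeline):

```python
import numpy as np

def reconstruction_loss(image: np.ndarray, reconstruction: np.ndarray) -> float:
    """Mean squared error between an input image and the model's reconstruction."""
    return float(np.mean((image - reconstruction) ** 2))

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))

# Simulate a faithful reconstruction (in-distribution semantics) vs. a poor one
# (anomalous semantics the model cannot represent well).
good = image + rng.normal(0, 0.01, image.shape)
bad = image + rng.normal(0, 0.5, image.shape)

assert reconstruction_loss(image, good) < reconstruction_loss(image, bad)
```

The ordering is the point: in-distribution inputs sit low on this scalar, anomalous inputs sit high, which is what makes it worth persisting.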
Real-world applications:
Monitoring changes in satellite imagery: By comparing embeddings and reconstruction losses of a region (e.g., Kiev) before and after significant events (war), we can detect and quantify the extent of semantic changes. Pre-event images will likely have smaller losses, while post-event images containing destruction, damage, and other anomalies will have higher losses.
Anomaly detection in various domains: The reconstruction loss can serve as a valuable metric for detecting anomalies, such as rare events (city floods, locust plagues), unusual semantics (algae blooms, green pools), or "noise" (fog, smog, ships in the ocean, image artifacts). By setting appropriate thresholds on the reconstruction loss, we can flag images containing such anomalies for further analysis.
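The thresholding step could be as simple as the following sketch; the function name and the percentile fallback are my assumptions, since the issue does not specify how thresholds would be chosen:

```python
import numpy as np

def flag_anomalies(losses, threshold=None, percentile=99.0):
    """Flag images whose reconstruction loss exceeds a threshold.

    If no absolute threshold is given, fall back to a percentile of the
    batch, i.e. flag the top (100 - percentile)% of losses.
    """
    losses = np.asarray(losses, dtype=float)
    if threshold is None:
        threshold = np.percentile(losses, percentile)
    return losses > threshold

flags = flag_anomalies([0.1, 0.1, 0.1, 5.0], threshold=1.0)
# Only the last image exceeds the threshold and is flagged for review.
```

An absolute threshold suits monitoring a fixed region over time; a percentile fallback is more robust when loss scales drift between model versions.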
Implementation: Modify the embedding generation pipeline to calculate and save the reconstruction loss alongside the embedding vector. This can be achieved by comparing the input image with the model's reconstructed output, using either the same loss function as during training or another metric such as MSE.
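As a sketch of the pipeline change, assuming the model exposes encode/decode steps (the `ToyAutoencoder` below is a stand-in for the real model, and the dict record format is just one option for storage):

```python
import numpy as np

class ToyAutoencoder:
    """Stand-in for the real model: a random linear projection down and back."""
    def __init__(self, dim=8, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(size=(32 * 32 * 3, dim)) / np.sqrt(32 * 32 * 3)

    def encode(self, image):
        return image.ravel() @ self.w

    def decode(self, embedding):
        return (embedding @ self.w.T).reshape(32, 32, 3)

def embed_with_loss(model, image):
    """Return the embedding together with its reconstruction loss."""
    embedding = model.encode(image)
    reconstruction = model.decode(embedding)
    loss = float(np.mean((image - reconstruction) ** 2))
    return {"embedding": embedding, "reconstruction_loss": loss}

record = embed_with_loss(ToyAutoencoder(), np.random.default_rng(1).random((32, 32, 3)))
```

The only substantive change to the existing pipeline is running the decoder once more per image and attaching one extra scalar to each saved record.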
Action Items:
@yellowcap to add saving the reconstruction losses in the embedding generation pipeline.
@stephen-downs to assess how to surface this in the app.
I think this change not only makes our outputs much more useful, but also gives them a measurable relative confidence, and highlights operational bias (making the loss a feature, not a thing to get rid of).
cc @MaceGrim @danhammer for the utility feedback.