
Running SAM on Large Orthomosaic and Implementing Encoder-Decoder Workflow #153

Open
Ankit-Vohra opened this issue Jul 25, 2023 · 4 comments

Comments

@Ankit-Vohra

I am currently facing challenges while trying to load and run the Segment Anything model on a large orthomosaic, which is approximately 8-10 GB in size. I would like to request your assistance in understanding the necessary steps to achieve this successfully.

Request 1: Load and Run Segment Anything Model on Large Orthomosaic
I kindly request detailed guidance on how to load and run the Segment Anything model on a large orthomosaic dataset, specifically the steps and configurations required to handle such a large image efficiently. If possible, a demo or example of running the Segment Anything model on a sample large orthomosaic would be immensely helpful; it would let users like me understand the workflow and apply it to our own datasets.

Request 2: Implementing Encoder-Decoder Workflow
Additionally, I would like to explore the implementation of an encoder-decoder workflow using the provided SAM model. I am particularly interested in learning how to pass a large image through the encoder, store the encoded vector, and then perform real-time inference using the decoder model based on the encoded vector.

Guidance Request:
To achieve the encoder-decoder workflow, I request detailed steps or guidelines on how to:

Input a large image into the encoder of the SAM model.
Store the encoded vector generated by the encoder.
Implement the decoder model to run on top of the encoded vector in real-time for semantic segmentation tasks.
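The three steps above can be sketched with the official `segment_anything` package (an assumption, not part of this repo; the checkpoint filename, the `image_rgb` variable, and the helper names below are illustrative):

```python
import numpy as np

# SAM's ViT image encoder emits a fixed-size embedding of this shape.
EMBED_SHAPE = (1, 256, 64, 64)

def save_embedding(embedding: np.ndarray, path: str) -> None:
    """Step 2: store the encoder output so the decoder can reuse it later."""
    np.save(path, embedding)

def load_embedding(path: str) -> np.ndarray:
    """Reload a cached embedding for real-time decoding."""
    return np.load(path)

def encode_then_prompt(image_rgb, checkpoint="sam_vit_h_4b8939.pth"):
    """Illustrative end-to-end flow; requires torch + segment_anything."""
    from segment_anything import sam_model_registry, SamPredictor

    sam = sam_model_registry["vit_h"](checkpoint=checkpoint)
    predictor = SamPredictor(sam)

    # Step 1: run the slow encoder once per image (or per tile).
    predictor.set_image(image_rgb)  # image_rgb: HxWx3 uint8 array

    # Step 2: cache the encoded vector.
    emb = predictor.get_image_embedding().cpu().numpy()
    save_embedding(emb, "embedding.npy")

    # Step 3: prompt the lightweight decoder in (near) real time.
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[500, 375]]),  # example point prompt
        point_labels=np.array([1]),           # 1 = foreground
        multimask_output=True,
    )
    return masks, scores
```

The key point is that `set_image` (the heavy encoder pass) runs once, while `predict` (the decoder) is cheap enough to call per prompt, which is what enables interactive use.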

@Ankit-Vohra Ankit-Vohra changed the title from "Request for Guidance on Running Segment Anything Model on Large Orthomosaic and Implementing Encoder-Decoder Workflow" to "Running SAM on Large Orthomosaic and Implementing Encoder-Decoder Workflow" on Jul 25, 2023
@giswqs
Member

giswqs commented Jul 25, 2023

How big is your GPU? GPU memory needs to be several times the file size. Given an 8-10 GB image, you probably need a GPU with at least 40 GB of RAM to process the image in one pass. Otherwise, you will need to subdivide the image into small tiles and segment them tile by tile. See this example: https://samgeo.gishub.org/examples/text_prompts_batch
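A minimal sketch of the tiling idea (the tile and overlap sizes are illustrative assumptions, not values from samgeo): compute overlapping pixel windows so each tile fits in GPU memory, while the overlap gives objects on a tile border a chance to appear whole in an adjacent tile.

```python
def tile_windows(width, height, tile=1024, overlap=128):
    """Yield (x0, y0, x1, y1) pixel windows that cover an image.

    Neighbouring windows share `overlap` pixels so that an object
    cut by one tile boundary can appear whole in an adjacent tile.
    """
    step = tile - overlap
    for y0 in range(0, max(height - overlap, 1), step):
        for x0 in range(0, max(width - overlap, 1), step):
            yield (x0, y0, min(x0 + tile, width), min(y0 + tile, height))

# Example: windows for a hypothetical 4096 x 3072 orthomosaic;
# each window would be read (e.g. with rasterio) and segmented separately.
windows = list(tile_windows(4096, 3072))
```

Per-tile masks then need to be merged back in the mosaic's coordinate system, typically by offsetting each mask by its window's (x0, y0) and deduplicating objects in the overlap zones.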

I don't have an example of the encoder-decoder workflow, but you are welcome to contribute one.

@Ankit-Vohra
Author

@giswqs I have access to an A100, which has around 80 GB of GPU RAM, but it still doesn't work. Could you also advise on how to run this model on large images in real time? We want to run the SAM model in a web browser on large satellite images, similar to how the model runs on the demo website provided by Meta.

@Ankit-Vohra
Author

@giswqs I have a couple of concerns about the batch approach: a tile has no information about its neighbouring tiles, so an input prompt (such as a bounding box or multiple points) might not capture the complete object when part of it lies in another tile.
I would like to share the two links below, where an organisation has integrated SAM as an annotation tool for large satellite imagery. Their response time is in milliseconds, so I suspect they are not tiling the images.
Please check the resources:
https://picterra.ch/blog/faster-ml-production-meta-ai-segment-anything-picterra/
https://www.youtube.com/watch?v=usN-5zBm_E0

Please help me understand how this tool can be used in real time for geospatial imagery.

@Fanchengyan

Hi @Ankit-Vohra. Based on your description, our project Geo-SAM may meet most of your needs. We have separated the encoding of the image from the encoding of the prompts, which allows millisecond-level interactive segmentation. For large ground objects, lowering the resolution solves the problem in most cases. We have also developed an Encoder Copilot for visualizing whether a patch can cover an object when the resolution is reduced. For more details, you can refer to the documentation: https://geo-sam.readthedocs.io/en/latest/
