Kosmos-2.5 implementation in transformers #30877

Open
2 tasks done
Natyren opened this issue May 17, 2024 · 4 comments


@Natyren
Contributor

Natyren commented May 17, 2024

Model description

Hello everyone,

Kosmos-2.5 is a multimodal literate model for tasks such as OCR and text-rich image comprehension. It consists of a ViT encoder, a resampler, and a shared decoder module. To the best of my knowledge, its architecture is similar to Kosmos-2 but differs enough that using it in Transformers requires a standalone implementation.
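
For readers unfamiliar with this kind of pipeline, here is a rough, toy-sized sketch of the data flow described above: image tokens from an encoder are pooled by a resampler into a fixed number of visual tokens, which are then placed in the same sequence as the text embeddings for the decoder. This is not the Kosmos-2.5 code. The cross-attention pooling, the function names (`resample`, `random_matrix`), and all sizes are assumptions chosen for illustration, and the encoder and decoder are replaced by random stand-ins.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def resample(image_tokens, latent_queries):
    """Hypothetical resampler: each latent query attends over all image tokens.

    The output has len(latent_queries) rows no matter how many image
    tokens come in, which is the point of a resampler.
    """
    dim = len(latent_queries[0])
    keys_t = [list(col) for col in zip(*image_tokens)]  # dim x n_tokens
    scores = matmul(latent_queries, keys_t)  # n_latents x n_tokens
    weights = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
    return matmul(weights, image_tokens)  # n_latents x dim


def random_matrix(rows, cols, rng):
    """Random stand-in for learned weights or real embeddings."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
DIM = 8  # illustrative hidden size, not the real model's
image_tokens = random_matrix(50, DIM, rng)    # stand-in for ViT encoder output
latent_queries = random_matrix(4, DIM, rng)   # stand-in for learned resampler queries
text_embeddings = random_matrix(6, DIM, rng)  # stand-in for embedded text tokens

visual_tokens = resample(image_tokens, latent_queries)
# Visual and text tokens share one sequence that a single decoder would consume.
decoder_input = visual_tokens + text_embeddings
print(len(visual_tokens), len(decoder_input))  # 4 10
```

The 50 image tokens are reduced to 4 visual tokens here only to make the shape change visible; the real token counts and the exact resampler design should be taken from the paper and the reference code linked below.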

Open source status

  • The model implementation is available
  • The model weights are available

Provide useful links for the implementation

Paper: https://arxiv.org/pdf/2309.11419
Code: https://github.com/microsoft/unilm/tree/master/kosmos-2.5
Authors: @Dod-o @wolfshow

@amyeroberts
Collaborator

cc @ydshieh

@Natyren
Contributor Author

Natyren commented May 17, 2024

I would like to help with this implementation. If there are any guidelines on how to do it effectively, I would be glad to join the implementation process.

@amyeroberts
Collaborator

Hi @Natyren, there's a guide in the documentation here: https://huggingface.co/docs/transformers/add_new_model

@ydshieh self-assigned this May 21, 2024
@ydshieh
Collaborator

ydshieh commented May 29, 2024

Hi @Natyren, sorry for the late reply. I am planning to ask the model author whether they are interested in porting this model into transformers. I will come back here with updates.
