Add RNA Models to Pretrained Folder #161

Closed
VBHerrenC opened this issue Jan 25, 2024 · 5 comments

Comments

@VBHerrenC

Hello,

We are trying to use Remora starting from a pretrained model, but we are working with RNA. Would it be possible to add the RNA models that are currently available in Bonito to the pretrained models folder in Remora? I'm not sure how to link the Bonito models to Remora as the documentation describes, since each Bonito model is a folder and doesn't seem to include the PyTorch file that is present for all of the other pretrained Remora models. I'm also happy to use another workaround if it's not possible to add new pretrained models to Remora; we are just trying to avoid training a model from scratch. Thanks!

@marcus1487
Collaborator

I'm not sure I understand this request. Are you looking for the basecalling models to be made available through Remora? Or are you looking for specific modified base models? If it is the latter, please use the `remora model list_pretrained` command to list the models and `remora model download` to download a model.

@VBHerrenC
Author

Hi Marcus,

Thanks for the reply! I was hoping for the basecalling models to be made available through Remora. The output of the `remora model list_pretrained` command doesn't include any RNA models. I tried to add the RNA004 Bonito model to the Remora paths but haven't had any luck yet. I'm happy to do it that way as well, but I'm not sure how to go about it, since the Remora pretrained models all seem to have a PyTorch file that I don't think the Bonito folder includes. If neither of these options works, is training from scratch our best option for an RNA model for N1-methyl-pseudouridine? Thanks for any advice.

@VBHarrisN

Remora uses a compiled .pt format for training. Given that we are trying to train a modified basecaller for pseudouridine, we would like to use the Remora training framework, as it is well documented. However, the RNA004 models have only been uploaded in a format readable by the Bonito framework. While I understand that Bonito has its own training framework, the documentation on how to use it is significantly less clear than Remora's. To that end, we were wondering if the RNA004 model could be uploaded in a format compatible with the Remora training framework.

@marcus1487
Collaborator

I'm a bit confused by the goal here. Remora is the framework for training modified base detection models. Bonito is the framework for training canonical basecallers. Remora does not have the capability to train a canonical basecaller. If you intend to train a canonical basecalling model that calls modified bases as the correct canonical base, then Bonito is the correct tool. If you then want to identify the positions of the modified bases within the sequence, a Remora model is required.

Note that Remora does come with a modified base model for m6A in DRACH (`remora model list_pretrained --pore rna004_130bps`), but this is not going to be helpful if your goal is to train a basecalling model that works with N1-methyl-pseudouridine.

I hope this helps clear up the function of the training frameworks. If you have any further questions about training a modified base detection model please post them here.

@marcus1487
Collaborator

Closing as this issue seems to be unrelated to modified base calling.

3 participants