The existing trained models are intended to take only sequences as input. https://github.com/eternagame/KaggleOpenVaccine/blob/main/scripts/degscore_xgboost_inference.py and https://github.com/eternagame/KaggleOpenVaccine/blob/main/scripts/nullrecurrent_inference.py are scripts that let you do that, and https://github.com/eternagame/KaggleOpenVaccine/tree/main/scripts/example_inputs provides some sample input files.

None of the models use SHAPE data as an input, so you'd need to add those features, retrain the models (https://github.com/eternagame/KaggleOpenVaccine/tree/main/notebooks), and update the inference scripts to take them into account.

As for dot-bracket structures, you could in principle replace the mfe function calls in the inference scripts with your own values. However, the models were trained on structures predicted by a specific in-silico model, so substituting your own structures may or may not actually improve your results. In both cases, be aware that you'd need to do your own validation of the modified models.
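If you do try supplying your own dot-bracket structures, it is probably worth checking each one before it reaches a model: it should be the same length as the sequence and have balanced brackets. Here is a minimal sketch of such a check. The function names `pair_table` and `check_structure` are hypothetical and not part of the KaggleOpenVaccine repository, and the sketch assumes plain `(`, `)` and `.` notation without pseudoknot symbols.

```python
from typing import List


def pair_table(structure: str) -> List[int]:
    """Return the 0-based partner index of each position, or -1 if unpaired."""
    partners = [-1] * len(structure)
    open_positions = []  # stack of indices of '(' still waiting for a partner
    for i, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(i)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {i}")
            j = open_positions.pop()
            partners[i] = j
            partners[j] = i
        elif symbol != ".":
            # Assumption: pseudoknot symbols such as '[' or '{' are rejected.
            raise ValueError(f"unsupported symbol {symbol!r} at position {i}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return partners


def check_structure(sequence: str, structure: str) -> List[int]:
    """Validate a structure against its sequence and return the pair table."""
    if len(sequence) != len(structure):
        raise ValueError(
            f"length mismatch: sequence has {len(sequence)} nt, "
            f"structure has {len(structure)} symbols"
        )
    return pair_table(structure)


if __name__ == "__main__":
    # Toy hairpin: two base pairs closing a four-nucleotide loop.
    print(check_structure("GGAAAACC", "((....))"))
```

A structure that fails this check would raise an error early rather than silently producing misaligned per-nucleotide features.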
-
Hello,
I'm not a bioinformatician, just an RNA biologist, but I would like to use this pipeline to predict RNA degradation hot spots, and to supply SHAPE reactivity data and dot-bracket structures to get more accurate predictions. Where do I start (once the pipeline is installed)?