Workflow to extract chunks for a randomer dataset #170

Closed
moa4020 opened this issue May 1, 2024 · 1 comment

@moa4020

moa4020 commented May 1, 2024

Hi Marcus,

We have finally managed to get a dataset that has good coverage of randomers with and without 8-oxoG at the center, surrounded by 4 random bases on either side. I would like to train a model on this dataset by extracting 5-mer chunks, and would appreciate your help extracting these chunks from my dataset.

Do I start off by trimming the reads to isolate the randomer by itself, and then segment/extract 5-mer chunks out of each 9-mer? Or is there a better way to do this with Remora?

Thanks,
Mohith

@marcus1487
Collaborator

Remora does not directly support randomer processing. Randomer processing is quite a bit more involved and thus has been stored in the Betta repository. I would recommend contacting technical/customer support in order to apply for access to Betta.

At a high level, though, 5-mers are unlikely to provide a large enough random context to train a robust model. Remora does not extract chunks of fixed sequence length; instead it extracts chunks of fixed signal length. Each chunk therefore spans a variable number of bases, so the constant sequence outside your randomer would be included in many chunks. Applying such a model to new data without the same flanking context may give unexpected results. We would recommend at least 20, and ideally more than 40, random bases around the focus base of the randomer.
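
To illustrate why a fixed signal length does not mean a fixed sequence length, here is a minimal, hypothetical Python sketch. It is not Remora's API or its actual chunking code. It assumes a simplified per-base mapping `seq_to_sig` (the signal index where each base starts, plus a final end index), a chunk window of `sig_before`/`sig_after` samples anchored at the start of the focus base, and made-up uniform dwell times for a "fast" and a "slow" read. All function and parameter names here are invented for illustration.

```python
from itertools import accumulate
from typing import List, Optional, Tuple


def seq_to_sig_from_dwells(dwells: List[int]) -> List[int]:
    """Signal index where each base starts, plus a final end index."""
    return [0] + list(accumulate(dwells))


def bases_in_chunk(
    seq_to_sig: List[int], focus_base: int, sig_before: int, sig_after: int
) -> Tuple[Optional[int], Optional[int]]:
    """Return (first, last) indices of bases overlapping a fixed-length
    signal window anchored at the start of the focus base (simplified)."""
    center = seq_to_sig[focus_base]
    start = max(0, center - sig_before)
    end = min(seq_to_sig[-1], center + sig_after)
    first: Optional[int] = None
    last: Optional[int] = None
    for i in range(len(seq_to_sig) - 1):
        # Base i occupies signal samples [seq_to_sig[i], seq_to_sig[i + 1])
        if seq_to_sig[i] < end and seq_to_sig[i + 1] > start:
            if first is None:
                first = i
            last = i
    return first, last


def constant_bases_in_chunk(
    first: int, last: int, focus_base: int, random_flank: int
) -> int:
    """Count chunk bases outside the randomer (focus +/- random_flank)."""
    rand_lo, rand_hi = focus_base - random_flank, focus_base + random_flank
    return sum(1 for i in range(first, last + 1) if i < rand_lo or i > rand_hi)


if __name__ == "__main__":
    # Hypothetical reads: 41 bases, focus (e.g. 8-oxoG) at index 20,
    # 4 random bases on each side, uniform dwell times per read.
    for name, dwell in (("fast", 5), ("slow", 12)):
        s2s = seq_to_sig_from_dwells([dwell] * 41)
        first, last = bases_in_chunk(s2s, 20, 50, 50)
        n_const = constant_bases_in_chunk(first, last, 20, 4)
        print(name, (first, last), "constant bases:", n_const)
    # fast (10, 29) constant bases: 11
    # slow (15, 24) constant bases: 1
```

With the same 100-sample window, the fast read's chunk covers 20 bases, 11 of which fall outside the 9-base randomer, while the slow read's chunk covers only 10 bases. Under these assumptions, a model trained on such chunks could pick up on the constant flanking sequence, which is consistent with the recommendation for a wider random context.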

I hope this helps a bit, and would be happy to help further if you are able to gain access to Betta.

marcus1487 closed this as not planned on May 22, 2024