restrict-regions and short contigs #36

Open
RvV1979 opened this issue Nov 29, 2023 · 2 comments

Comments

RvV1979 commented Nov 29, 2023

Hi Lucas,

I wish to call specific sites and have found that pointing the restrict-regions option at an intervals.bed file that lists those sites works for this purpose (in combination with HaplotypeCaller-extra: "--output-mode EMIT_ALL_CONFIDENT_SITES").

However, my reference genome includes some very small contigs, and to avoid an overwhelming number of jobs I have previously been using the contig-group-size option. Unfortunately, this is not yet possible in combination with restrict-regions.

Is there a way to specify these sites without generating overwhelmingly many jobs? Perhaps by adding a single job making empty vcf files for contigs that do not occur in the intervals.bed file?

But of course, ensuring compatibility between both contig-group-size and restrict-regions would be ideal...

Thanks

Author

RvV1979 commented Nov 30, 2023

Just to update (for the benefit of others who may want to do something similar):
As expected, the combination of both contig-group-size and restrict-regions resulted in a large number of jobs. However, the jobs for contigs that do not overlap the intervals.bed file finished within seconds. In conclusion: spurious jobs and files, but fortunately not much of an increase in total computation time.
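
A rough way to estimate in advance how many of those jobs will be empty is to compare the contig names in the reference index with the first column of the intervals.bed file. The sketch below is not part of the pipeline. It assumes a samtools-style .fai index (tab-separated, contig name in the first column) and a plain BED file, and the function names are made up for illustration.

```python
from typing import Iterable, List, Set, Tuple


def contigs_in_fai(fai_lines: Iterable[str]) -> List[str]:
    """Return contig names, in file order, from the lines of a .fai index."""
    names = []
    for line in fai_lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # The contig name is the first tab-separated column of a .fai line.
        names.append(line.split("\t")[0])
    return names


def contigs_in_bed(bed_lines: Iterable[str]) -> Set[str]:
    """Return the set of contig names that have at least one BED interval."""
    names = set()
    for line in bed_lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # blank line or comment
        if fields[0] in ("track", "browser"):
            continue  # optional BED header lines
        names.add(fields[0])
    return names


def split_contigs(
    fai_lines: Iterable[str], bed_lines: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Split reference contigs into (with intervals, without intervals)."""
    targeted = contigs_in_bed(bed_lines)
    with_sites, without_sites = [], []
    for name in contigs_in_fai(fai_lines):
        if name in targeted:
            with_sites.append(name)
        else:
            without_sites.append(name)
    return with_sites, without_sites
```

Both arguments can be open file handles, for example `split_contigs(open("genome.fa.fai"), open("intervals.bed"))` with hypothetical file names. The length of the second list is then the number of contigs whose jobs would produce no calls.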

Member

lczech commented Dec 2, 2023

Dear @RvV1979,

indeed, that is a combination of features that is a bit tricky to implement, and I will unfortunately not have the time for that in the foreseeable future. Hope it's okay that I cannot promise a timeline on implementing this.

Let's keep this issue open for now. And in the future, anyone else who wants to do something similar, please feel free to comment here to bump this up in priority in my implementation backlog :-)

Also, thank you for reporting your experience with just running it as-is. That sounds like it's generally workable, with some overhead in terms of job submissions and files created. That is good to know!

All the best
Lucas
