Why not use dxfuse throughout the entire regenie workflow #33

Open
iamyingzhou opened this issue Dec 12, 2023 · 6 comments

Comments

@iamyingzhou

Thanks for your wonderful work.

I've noticed that running the regenie workflow with dxfuse is quite convenient and can significantly reduce the time spent downloading data, especially in step 2. Why doesn't the example code use dxfuse throughout?

@Arkarachai

@anastazie-dnanexus

@anastazie-dnanexus
Collaborator

Hello,
I am not sure which code you are referring to. Is it this code? https://github.com/dnanexus/UKB_RAP/blob/main/GWAS/regenie_workflow/partF-step2-regenie.sh

@iamyingzhou
Author

Thank you for your response. I'd like to suggest that for the regenie_workflow, from Parts C to G, utilizing dxfuse seems more convenient. This method might allow us to skip the -iin arguments and simply use /mnt/project in our commands. I'm not sure if I'm correct; perhaps there are downsides to dxfuse that I'm not aware of?
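
To make the comparison concrete, here is a minimal dry-run sketch of the two styles. It only prints the `dx run` commands and does not submit anything. The folder, file name, and `head` command are hypothetical placeholders, not taken from the workflow scripts, and it assumes the project is mounted at /mnt/project inside the job.

```shell
#!/usr/bin/env bash
# Dry-run sketch: builds two Swiss Army Knife submissions and prints them
# instead of running them. All paths below are hypothetical placeholders.

project_dir="/Data/gwas"   # hypothetical project folder
pheno="phenotypes.txt"     # hypothetical file name

# Style 1: stage the file into the job with -iin, then refer to it by name.
iin_cmd() {
  printf 'dx run swiss-army-knife -iin="%s/%s" -icmd="head -n 5 %s"\n' \
    "$project_dir" "$pheno" "$pheno"
}

# Style 2: no -iin; read the file through the dxfuse mount at /mnt/project
# (assumed to be available inside the job).
dxfuse_cmd() {
  printf 'dx run swiss-army-knife -icmd="head -n 5 /mnt/project%s/%s"\n' \
    "$project_dir" "$pheno"
}

iin_cmd
dxfuse_cmd
```

In the second style the path inside the command has to be the full /mnt/project path, since nothing is copied into the job's working directory.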

@anastazie-dnanexus
Collaborator

@oklempir-cf Do you have experience with using -iin arguments versus dxfuse?

@oklempir-cf
Contributor

-iin arguments versus dxfuse in Swiss Army Knife

Yes, @iamyingzhou, I think you are right. Using dxfuse can also be a viable working solution for specific parts of this pipeline. I consider it an alternative to -iin in terms of functionality, and it may be even more useful in some cases, especially for processing larger files that can be read in sequential order. (In my experience dxfuse needs non-random read access: it might fail when a program reads in random order, "jumping from one place to another" in the file being processed.) There might be other reasons to avoid dxfuse and use -iin instead:

https://github.com/dnanexus/dxfuse (see section about several limitations)

Therefore, for advanced users who can use dxfuse efficiently, dxfuse might well be the better choice.

Now, why I would prefer -iin and go with it first, IMO:

  1. it is pretty straightforward to use in SAK
  2. it is the same way you "normally" pass input data to DNAnexus app(let)s, as covered in the getting-started webinar/docs
  3. I consider it easier to understand and to explain during a live webinar, which helps when learning a new concept quickly

Finally, to complicate things even more :), have a look at the option in SAK called "dx-mount-all-inputs" (or something similar), which is a kind of hybrid of the two solutions above (if I understand and remember it correctly; it has been a while since I used it in my work).
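
A rough dry-run sketch of what that hybrid might look like: inputs are still declared with -iin, but the job is asked to mount them rather than download them. This assumes the Swiss Army Knife input is named `mount_inputs`, which I have not verified here; check `dx run swiss-army-knife -h` for the actual input names. The file path and command are hypothetical, and the script only prints the command.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints a Swiss Army Knife submission that asks for inputs
# to be mounted rather than downloaded. Nothing is submitted.
# ASSUMPTION: the app input is named mount_inputs; verify with
#   dx run swiss-army-knife -h
# The file path and job command below are hypothetical placeholders.

mount_cmd() {
  local input_path="$1" job_cmd="$2"
  printf 'dx run swiss-army-knife -iin="%s" -imount_inputs=true -icmd="%s"\n' \
    "$input_path" "$job_cmd"
}

mount_cmd "/Data/gwas/chr1.bgen" "ls -l"
```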

Ondrej

@iamyingzhou
Author

Thank you for your insightful suggestions and detailed explanation.
