Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend Faidx so it can be backed by memory mapped file #95

Open
dhauge opened this issue Aug 4, 2016 · 1 comment
Open

Extend Faidx so it can be backed by memory mapped file #95

dhauge opened this issue Aug 4, 2016 · 1 comment

Comments

@dhauge
Copy link
Contributor

dhauge commented Aug 4, 2016

It would be nice if a Faidx instance could be constructed that accesses the fasta file via a memory mapped file rather than explicit file seek/read. This would have several benefits, including

  • Removing the need to protect reads against multiple threads
  • Avoiding the need to make system calls for each sequence read

I'd imagine adding a parameter 'file_mapped=True' to the 'Faidx' constructor, and leveraging the 'mutable' parameter to determine if the memory mapping is read-only. The 'Faidx.file' object should probably be wrapped in a small object that implements random access read/write of sequences, which could then work for both normal file access and memory mapped access.

If you would consider this an acceptable addition, and would accept a patch, I should be able to provide a potential implementation in the next few days.

Thanks,
Doug

@mdshw5
Copy link
Owner

mdshw5 commented Aug 5, 2016

Thanks for the suggestion! I actually implemented Faidx using a memory-mapped file in very early versions, but didn't see much performance benefit. I think that you only avoid a system call when the data resides in the OS buffer. Otherwise you generate a page fault which causes the OS to read the data from disk, which isn't really faster than reading from disk using a system call. Also I wanted to avoid any issues with mapping larger FASTA files under 32-bit OSes.

However, if you would like to submit a PR which adds the type of wrapper you describe then I'd be all for it, since there are definitely use cases where there could be a performance benefit. I'd prefer to leave the locks in place, unless there is a clear performance benefit to be had from removing them. Also, the locks around writing a new index file should stay in place no matter what.

Looking forward to seeing what you come up with!

@mdshw5 mdshw5 modified the milestone: v0.4.8 Aug 5, 2016
@mdshw5 mdshw5 removed this from the v0.4.9 milestone Jan 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants