Feature request: Improve block management for uncompressed blocks to save memory and enhance deduplication #139

wychen · 2023-04-27T04:33:27Z

I would like to propose optimizing block management for uncompressed blocks in DwarFS. Currently, uncompressed blocks are treated the same way as compressed blocks: they are loaded into memory and read from disk sequentially, starting at the beginning of the block. This can be inefficient, especially when uncompressed blocks are accessed frequently. Allowing random access within a block, without first reading everything that precedes the segment we need, or even not loading the block into memory at all, could save a significant amount of private memory.

mmap() could potentially enable efficient random access to uncompressed blocks and possibly eliminate the need to manually load them into memory entirely.
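To make the idea concrete, here is a minimal sketch in Python of reading one segment out of an uncompressed block through a memory mapping. It is not DwarFS code: the function name `read_segment`, its parameters, and the flat "header followed by raw block" file layout in the demo are all assumptions made for illustration.

```python
import mmap
import os
import tempfile


def read_segment(path, block_offset, segment_offset, length):
    """Return `length` bytes located `segment_offset` bytes into an
    uncompressed block that starts at `block_offset` in the file.

    Hypothetical helper: the names and the file layout are assumptions.
    """
    with open(path, "rb") as f:
        # Map the whole file read-only. Pages are faulted in on demand,
        # so bytes before the requested segment need not be read.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            start = block_offset + segment_offset
            # Slicing copies only the requested bytes out of the mapping.
            return m[start:start + length]


if __name__ == "__main__":
    # Demo file: a 16-byte stand-in header followed by a 256-byte "block".
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"H" * 16 + bytes(range(256)))
        demo_path = tmp.name
    try:
        print(read_segment(demo_path, 16, 100, 4))
    finally:
        os.remove(demo_path)
```

Because the mapping is file-backed and read-only, the pages it touches live in the page cache and can be dropped by the kernel under memory pressure, which is where the potential saving in private memory would come from.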

This feature would also be beneficial for the mkdwarfs process. If uncompressed blocks do not occupy private memory, they would not need to be counted toward the --max-lookback-blocks (-B) quota. This could effectively enlarge the deduplication lookup window without increasing the memory footprint. The idea is orthogonal to the proposal in #138, and the two methods can be combined to further optimize the deduplication process. Matches in uncompressed blocks can still be extended with byte granularity, since mmap() allows for cheap random access.
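The accounting change could look roughly like the sketch below. It is purely illustrative and does not reflect how mkdwarfs implements its lookback window: `Block`, `LookbackWindow`, and `max_private_blocks` are hypothetical names, and the only point is that blocks readable through a mapping are not counted against the quota.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Block:
    number: int
    mapped: bool  # assumption: True means uncompressed and readable via mmap()


class LookbackWindow:
    """Hypothetical lookback window in which only blocks held in private
    memory count toward the quota."""

    def __init__(self, max_private_blocks):
        self.max_private_blocks = max_private_blocks
        self.blocks = deque()

    def private_count(self):
        # Mapped blocks are skipped: they occupy no private memory.
        return sum(1 for b in self.blocks if not b.mapped)

    def add(self, block):
        self.blocks.append(block)
        # Evict the oldest private block until the quota holds again.
        while self.private_count() > self.max_private_blocks:
            for i, b in enumerate(self.blocks):
                if not b.mapped:
                    del self.blocks[i]
                    break


if __name__ == "__main__":
    window = LookbackWindow(max_private_blocks=2)
    for number, mapped in [(0, False), (1, True), (2, False), (3, False), (4, True)]:
        window.add(Block(number, mapped))
    print([b.number for b in window.blocks])
```

In this sketch mapped blocks are never evicted, so the window can grow without bound; a real implementation would presumably still cap it in some way.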

I hope this proposal makes sense and I look forward to hearing your thoughts on its feasibility.

mhx · 2023-05-25T13:27:13Z

This is a great observation, and the first case is trivial to implement. I've got it working in a branch and will push the code once I have a proper internet connection.

mhx added a commit that referenced this issue May 27, 2023

Bypass cache for uncompressed blocks (partially addresses gh #139)

9e88686
