TicKV Fragmentation #3758

Open
AndrewImwalle opened this issue Dec 12, 2023 · 4 comments

@AndrewImwalle
Contributor

As mentioned in the SPEC for TicKV, one huge limitation of TicKV is fragmentation: a region can be classified as full while containing only a single valid object. This issue is especially prevalent when randomly writing to and erasing from TicKV, as the flash will “fill up” without holding many valid entries, preventing further use of TicKV.

I would like to implement an advanced runtime garbage collection which would solve this problem, and I would like input on the possible solutions.

The first solution I would like to propose would be a per-region operation.

  • Called when a key is invalidated/zeroized.
  • Checks how much space of the region would be recovered by performing the operation.
    This would likely be a % of the region.
  • Valid keys that are inside the region would be appended to nearby regions.
    This would be done in the same way as when the current region is already full.
  • After all valid keys are appended elsewhere, the region is erased.

The benefit of this solution is that it would maintain uniform wear across the regions and would not require a reserved region. The drawback is that keys would end up outside their original regions, which would increase lookup time. Additionally, I am unsure what threshold would be best and would appreciate feedback on that.
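
To make the per-region idea concrete, below is a minimal sketch of the threshold check and relocation step, written against a simplified in-memory model rather than the real TicKV flash layout; the `Region`/`Entry` types and the percentage threshold parameter are illustrative assumptions, not existing TicKV code.

```rust
/// A simplified model of a flash region holding key/value entries.
#[derive(Clone)]
struct Entry {
    key: u64,
    value: Vec<u8>,
    valid: bool,
}

struct Region {
    entries: Vec<Entry>,
    capacity: usize, // usable bytes in this region
}

impl Region {
    fn bytes_used(&self) -> usize {
        self.entries.iter().map(|e| e.value.len()).sum()
    }

    /// Bytes that would be reclaimed by erasing this region, i.e. the space
    /// occupied by invalidated/zeroized entries.
    fn recoverable_bytes(&self) -> usize {
        self.entries
            .iter()
            .filter(|e| !e.valid)
            .map(|e| e.value.len())
            .sum()
    }
}

/// The proposed threshold check: only defrag if at least `threshold_percent`
/// of the region's capacity would be recovered.
fn worth_defragging(region: &Region, threshold_percent: usize) -> bool {
    region.recoverable_bytes() * 100 >= region.capacity * threshold_percent
}

fn defrag_region(regions: &mut [Region], idx: usize, threshold_percent: usize) {
    if !worth_defragging(&regions[idx], threshold_percent) {
        return;
    }

    // Collect the entries that are still valid in the target region.
    let survivors: Vec<Entry> = regions[idx]
        .entries
        .iter()
        .filter(|e| e.valid)
        .cloned()
        .collect();

    // Append each survivor to a nearby region with enough free space,
    // exactly as would happen if the original region were already full.
    for entry in survivors {
        for (i, region) in regions.iter_mut().enumerate() {
            if i != idx && region.bytes_used() + entry.value.len() <= region.capacity {
                region.entries.push(entry.clone());
                break;
            }
        }
    }

    // Once every valid entry lives elsewhere, the region can be erased.
    regions[idx].entries.clear();
}
```

The real implementation would of course operate on flash regions and reuse the existing append path, but the control flow would look roughly like this.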

The second solution I would like to propose would be a part-wide operation. This one is similar to the idea listed in the SPEC, with a few optimizations to limit the amount of wear on the chip.

  • Move the valid keys across multiple regions to the reserved region
  • When the reserved region is full, erase the regions that have their valid keys in the reserved region
  • Move the valid objects back to their original regions
  • Erase the reserved region

This ensures that we don’t lose data on power loss, but it still has the same problems as the solution in the SPEC. There would have to be a garbage-collecting state to restore after power loss, and the requirement on wear leveling would still be broken for the reserved region. I am also unsure when this operation should be called.
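
Since the main difficulty with this variant is resuming after power loss, one way to think about it is as an explicit, persisted state machine. The following is a rough sketch under that assumption; the state names and the `persist_state` hook are hypothetical, not part of the current TicKV implementation.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum GcState {
    Idle,
    /// Copying valid keys from source regions into the reserved region.
    CopyingToReserved { next_region: usize },
    /// Erasing the source regions whose keys now live in the reserved region.
    ErasingSources,
    /// Copying objects back from the reserved region to their home regions.
    CopyingBack,
    /// Erasing the reserved region to finish the cycle.
    ErasingReserved,
    Done,
}

/// Persist the current state somewhere that survives power loss (for example
/// a well-known location in flash) so the operation can be resumed on boot.
fn persist_state(_state: GcState) {
    // Placeholder: a real implementation would write this to flash.
}

fn run_reserved_region_gc(mut state: GcState, region_count: usize) {
    while state != GcState::Done {
        state = match state {
            GcState::Idle => GcState::CopyingToReserved { next_region: 0 },
            GcState::CopyingToReserved { next_region } if next_region < region_count => {
                // ... copy the valid keys of `next_region` into the reserved region ...
                GcState::CopyingToReserved { next_region: next_region + 1 }
            }
            GcState::CopyingToReserved { .. } => GcState::ErasingSources,
            GcState::ErasingSources => GcState::CopyingBack,
            GcState::CopyingBack => GcState::ErasingReserved,
            GcState::ErasingReserved => GcState::Done,
            GcState::Done => break,
        };
        // Record each transition before acting on the next step, so a power
        // loss leaves behind enough information to resume.
        persist_state(state);
    }
}
```

On boot, the persisted state would be read back and the operation restarted from wherever it left off.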

If you can think of another solution that would fix this problem, I would love to discuss it.

@alistair23
Contributor

alistair23 commented Dec 15, 2023

Thanks for raising this.

You are correct that fragmentation is an issue. It's a hard problem to solve though. Two of the key goals of TicKV are:

 * Power loss resilient
 * Maintain data integrity and detect media errors

So implementing garbage collection is difficult, as we want to ensure that no data is lost on power loss.

> The first solution I would like to propose would be a per-region operation.
>
> * Called when a key is invalidated/zeroized.

I don't think this is a good idea. This would add overhead to removing keys, which the user doesn't necessarily want. If we do add a defrag option it should be something that is explicitly called.

> * Checks how much space of the region would be recovered by performing the operation.
>   This would likely be a % of the region.
>
> * Valid keys that are inside the region would be appended to nearby regions.
>   This would be done in the same way as when the current region is already full.
>
> * After all valid keys are appended elsewhere, the region is erased.

Do we move the keys back? If not, that increases lookup time for future gets. If we do move them back, we are fragmenting other parts.

> The benefit of this solution is that it would maintain uniform wear across the regions and would not require a reserved region. The drawback is that keys would end up outside their original regions, which would increase lookup time. Additionally, I am unsure what threshold would be best and would appreciate feedback on that.

Exactly!

The threshold really is going to be application specific. We could make it an option, but that seems confusing. It might be best to just defrag the first region that qualifies and then return, allowing an application to call the function repeatedly until all regions have been searched. That way applications can do it in the background when required, without taking on the full overhead of going through all of flash at once.
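
A minimal sketch of that "defrag one region per call" API shape is below; `KvStore`, `region_is_fragmented`, and `defrag_region` are hypothetical stand-ins for illustration, not the real tickv interface.

```rust
struct KvStore {
    region_count: usize,
    next_region_to_check: usize,
}

#[derive(Debug)]
enum DefragResult {
    /// One region was defragmented; call again to continue.
    Defragmented(usize),
    /// Every region has been checked since the last full pass.
    Done,
}

impl KvStore {
    fn region_is_fragmented(&self, _region: usize) -> bool {
        // Placeholder for the threshold check (% of recoverable space).
        false
    }

    fn defrag_region(&mut self, _region: usize) {
        // Placeholder for relocating valid keys and erasing the region.
    }

    /// Defragment the first eligible region and return, so the caller can
    /// spread the work out over time instead of walking all of flash at once.
    fn defrag_step(&mut self) -> DefragResult {
        while self.next_region_to_check < self.region_count {
            let region = self.next_region_to_check;
            self.next_region_to_check += 1;
            if self.region_is_fragmented(region) {
                self.defrag_region(region);
                return DefragResult::Defragmented(region);
            }
        }
        self.next_region_to_check = 0;
        DefragResult::Done
    }
}
```

An application could then call `defrag_step()` from its idle path until it returns `Done`.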

@alistair23
Contributor

> The second solution I would like to propose would be a part-wide operation. This one is similar to the idea listed in the SPEC, with a few optimizations to limit the amount of wear on the chip.
>
> * Move the valid keys across multiple regions to the reserved region
>
> * When the reserved region is full, erase the regions that have their valid keys in the reserved region
>
> * Move the valid objects back to their original regions
>
> * Erase the reserved region
>
> This ensures that we don’t lose data on power loss, but it still has the same problems as the solution in the SPEC. There would have to be a garbage-collecting state to restore after power loss, and the requirement on wear leveling would still be broken for the reserved region. I am also unsure when this operation should be called.

That's a good improvement, but it still has a lot of drawbacks.

@alistair23
Contributor

I don't have any other solutions.

Is fragmentation an issue for you? If it's blocking a use case of yours then we should focus on a solution. I would lean towards your first option. One big worry is what happens if we lose power after copying the keys: we would end up with two copies of the data on resume.

I do think having a defrag function that is specifically called and only runs on one or a small number of regions is the way to go. That way we don't block flash accesses for long while doing this.

Either way, we somehow need to track state to resume the operation if we get interrupted. That might be easier with a reserved region.

@AndrewImwalle
Contributor Author

Fragmentation is blocking a use case of mine.

I am going to assume that we would use parts of my first proposal, applying it to one region at a time and allowing it to be explicitly called. For power loss, setting a state and tracking the region being defragmented would let us restart/resume the process; duplicate copies would be prevented because trying to append a key that already exists returns the KeyAlreadyExists error code. Similar region-tracking functionality is already built into the garbage collect with the State and Rubbish State.

The threshold will have to be application specific, but we will need one; otherwise we could end up calling this function every time a key is invalidated/zeroised, which would cause unnecessary wear on the part. This threshold would simply be the % of the region that can be recovered.
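
As a toy illustration of why the resume is safe, re-running the copy step after a power loss skips anything that was already written, because appending a key that already exists fails with a KeyAlreadyExists-style error. The error enum, `append_key`, and the in-memory `HashMap` standing in for flash are simplified assumptions for this sketch, not the real tickv API.

```rust
use std::collections::HashMap;

#[derive(Debug)]
enum ErrorCode {
    KeyAlreadyExists,
}

/// Append a key only if it is not already present, mirroring the behaviour
/// the resume logic relies on.
fn append_key(store: &mut HashMap<u64, Vec<u8>>, key: u64, value: Vec<u8>) -> Result<(), ErrorCode> {
    if store.contains_key(&key) {
        return Err(ErrorCode::KeyAlreadyExists);
    }
    store.insert(key, value);
    Ok(())
}

/// Copy all valid keys out of the region being defragmented. If power was
/// lost part-way through a previous attempt, the keys copied earlier are
/// detected and skipped, so no duplicates are created.
fn resume_copy(store: &mut HashMap<u64, Vec<u8>>, pending: &[(u64, Vec<u8>)]) {
    for (key, value) in pending {
        match append_key(store, *key, value.clone()) {
            Ok(()) => {}
            // Already copied before the power loss: safe to skip.
            Err(ErrorCode::KeyAlreadyExists) => {}
        }
    }
}
```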
