TicKV Fragmentation #3758

Open
AndrewImwalle opened this issue Dec 12, 2023 · 4 comments

@AndrewImwalle
Contributor

As mentioned in the SPEC for TicKV, one huge limitation of TicKV is fragmentation: a region can be classified as full while containing only a single valid object. This issue is especially prevalent when randomly writing to and erasing from TicKV, as the flash will “fill up” without holding many valid entries, preventing further use of TicKV.

I would like to implement an advanced runtime garbage collection which would solve this problem, and I would like input on the possible solutions.

The first solution I would like to propose would be a per-region operation.

  • Called when a key is invalidated/zeroized.
  • Checks how much space of the region would be recovered by performing the operation.
    This would likely be a % of the region.
  • Valid keys that are inside the region would be appended to nearby regions.
    This would be done in the same way as when the current region is already full.
  • After all valid keys are appended elsewhere, the region is erased.

The benefit of this solution is that it would maintain uniform wear across the regions and would not require a reserved region. The drawback is that keys would end up outside their original regions, which would increase lookup time. Additionally, I am unsure what threshold would be best and would appreciate feedback on that.
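
To make the per-region idea concrete, below is a minimal sketch of the threshold check and relocation step, written against a simplified in-memory model rather than the real TicKV flash layout; the `Region`/`Entry` types and the percentage threshold parameter are illustrative assumptions, not existing TicKV code.

```rust
/// A simplified model of a flash region holding key/value entries.
#[derive(Clone)]
struct Entry {
    key: u64,
    value: Vec<u8>,
    valid: bool,
}

struct Region {
    entries: Vec<Entry>,
    capacity: usize, // usable bytes in this region
}

impl Region {
    fn bytes_used(&self) -> usize {
        self.entries.iter().map(|e| e.value.len()).sum()
    }

    /// Bytes that would be reclaimed by erasing this region, i.e. the space
    /// occupied by invalidated/zeroized entries.
    fn recoverable_bytes(&self) -> usize {
        self.entries
            .iter()
            .filter(|e| !e.valid)
            .map(|e| e.value.len())
            .sum()
    }
}

/// The proposed threshold check: only defrag if at least `threshold_percent`
/// of the region's capacity would be recovered.
fn worth_defragging(region: &Region, threshold_percent: usize) -> bool {
    region.recoverable_bytes() * 100 >= region.capacity * threshold_percent
}

fn defrag_region(regions: &mut [Region], idx: usize, threshold_percent: usize) {
    if !worth_defragging(&regions[idx], threshold_percent) {
        return;
    }

    // Collect the entries that are still valid in the target region.
    let survivors: Vec<Entry> = regions[idx]
        .entries
        .iter()
        .filter(|e| e.valid)
        .cloned()
        .collect();

    // Append each survivor to a nearby region with enough free space,
    // exactly as would happen if the original region were already full.
    for entry in survivors {
        for (i, region) in regions.iter_mut().enumerate() {
            if i != idx && region.bytes_used() + entry.value.len() <= region.capacity {
                region.entries.push(entry.clone());
                break;
            }
        }
    }

    // Once every valid entry lives elsewhere, the region can be erased.
    regions[idx].entries.clear();
}
```

The real implementation would of course operate on flash regions and reuse the existing append path, but the control flow would look roughly like this.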

The second solution I would like to propose would be a part-wide operation. This one is similar to the idea listed in the SPEC, with a few optimizations to limit the amount of wear on the chip.

  • Move the valid keys across multiple regions to the reserved region
  • When the reserved region is full, erase the regions that have their valid keys in the reserved region
  • Move the valid objects back to their original regions
  • Erase the reserved region

This ensures that we don’t lose data on power loss, but it still has the same problems as the solution in the SPEC. There would have to be a garbage-collecting state to restore after power loss, and the requirement on wear leveling would still be broken for the reserved region. I am also unsure when this operation should be called.
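
Since the main difficulty with this variant is resuming after power loss, one way to think about it is as an explicit, persisted state machine. The following is a rough sketch under that assumption; the state names and the `persist_state` hook are hypothetical, not part of the current TicKV implementation.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum GcState {
    Idle,
    /// Copying valid keys from source regions into the reserved region.
    CopyingToReserved { next_region: usize },
    /// Erasing the source regions whose keys now live in the reserved region.
    ErasingSources,
    /// Copying objects back from the reserved region to their home regions.
    CopyingBack,
    /// Erasing the reserved region to finish the cycle.
    ErasingReserved,
    Done,
}

/// Persist the current state somewhere that survives power loss (for example
/// a well-known location in flash) so the operation can be resumed on boot.
fn persist_state(_state: GcState) {
    // Placeholder: a real implementation would write this to flash.
}

fn run_reserved_region_gc(mut state: GcState, region_count: usize) {
    while state != GcState::Done {
        state = match state {
            GcState::Idle => GcState::CopyingToReserved { next_region: 0 },
            GcState::CopyingToReserved { next_region } if next_region < region_count => {
                // ... copy the valid keys of `next_region` into the reserved region ...
                GcState::CopyingToReserved { next_region: next_region + 1 }
            }
            GcState::CopyingToReserved { .. } => GcState::ErasingSources,
            GcState::ErasingSources => GcState::CopyingBack,
            GcState::CopyingBack => GcState::ErasingReserved,
            GcState::ErasingReserved => GcState::Done,
            GcState::Done => break,
        };
        // Record each transition before acting on the next step, so a power
        // loss leaves behind enough information to resume.
        persist_state(state);
    }
}
```

On boot, the persisted state would be read back and the operation restarted from wherever it left off.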

If you can think of another solution that would fix this problem, I would love to discuss it.

@alistair23
Contributor

alistair23 commented Dec 15, 2023

Thanks for raising this.

You are correct that fragmentation is an issue. It's a hard problem to solve though. Two of the key goals of TicKV are:

 * Power loss resilient
 * Maintain data integrity and detect media errors

So implementing garbage collection is difficult, as we want to ensure that no data is lost on power loss.

> The first solution I would like to propose would be a per-region operation.
>
> * Called when a key is invalidated/zeroized.

I don't think this is a good idea. This would add overhead to removing keys, which the user doesn't necessarily want. If we do add a defrag option it should be something that is explicitly called.

> * Checks how much space of the region would be recovered by performing the operation.
>   This would likely be a % of the region.
>
> * Valid keys that are inside the region would be appended to nearby regions.
>   This would be done in the same way as when the current region is already full.
>
> * After all valid keys are appended elsewhere, the region is erased.

Do we move the keys back? If not, that increases lookup time for future gets. If we do move them back, we are fragmenting other parts.

> The benefit of this solution is that it would maintain uniform wear across the regions and would not require a reserved region. The drawback is that keys would end up outside their original regions, which would increase lookup time. Additionally, I am unsure what threshold would be best and would appreciate feedback on that.

Exactly!

The threshold really is going to be application specific. We could make it an option, but that seems confusing. It might be best to just defrag the first region that qualifies and then return, allowing an application to call the function repeatedly until all regions have been searched. That way applications can do it in the background when required, without taking on the full overhead of going through all of flash at once.
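
A minimal sketch of that "defrag one region per call" API shape is below; `KvStore`, `region_is_fragmented`, and `defrag_region` are hypothetical stand-ins for illustration, not the real tickv interface.

```rust
struct KvStore {
    region_count: usize,
    next_region_to_check: usize,
}

#[derive(Debug)]
enum DefragResult {
    /// One region was defragmented; call again to continue.
    Defragmented(usize),
    /// Every region has been checked since the last full pass.
    Done,
}

impl KvStore {
    fn region_is_fragmented(&self, _region: usize) -> bool {
        // Placeholder for the threshold check (% of recoverable space).
        false
    }

    fn defrag_region(&mut self, _region: usize) {
        // Placeholder for relocating valid keys and erasing the region.
    }

    /// Defragment the first eligible region and return, so the caller can
    /// spread the work out over time instead of walking all of flash at once.
    fn defrag_step(&mut self) -> DefragResult {
        while self.next_region_to_check < self.region_count {
            let region = self.next_region_to_check;
            self.next_region_to_check += 1;
            if self.region_is_fragmented(region) {
                self.defrag_region(region);
                return DefragResult::Defragmented(region);
            }
        }
        self.next_region_to_check = 0;
        DefragResult::Done
    }
}
```

An application could then call `defrag_step()` from its idle path until it returns `Done`.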

@alistair23
Contributor

> The second solution I would like to propose would be a part-wide operation. This one is similar to the idea listed in the SPEC, with a few optimizations to limit the amount of wear on the chip.
>
> * Move the valid keys across multiple regions to the reserved region
>
> * When the reserved region is full, erase the regions that have their valid keys in the reserved region
>
> * Move the valid objects back to their original regions
>
> * Erase the reserved region
>
> This ensures that we don’t lose data on power loss, but it still has the same problems as the solution in the SPEC. There would have to be a garbage-collecting state to restore after power loss, and the requirement on wear leveling would still be broken for the reserved region. I am also unsure when this operation should be called.

That's a good improvement, but it still has a lot of drawbacks.

@alistair23
Contributor

I don't have any other solutions.

Is fragmentation an issue for you? If it's blocking a use case of yours then we should focus on a solution. I would lean towards your first option. One big worry is what happens if we lose power after copying the keys: we would end up with two copies of the data on resume.

I do think having a defrag function that is specifically called and only runs on one or a small number of regions is the way to go. That way we don't block flash accesses for long while doing this.

Either way, we somehow need to track state to resume the operation if we get interrupted. That might be easier with a reserved region.

@AndrewImwalle
Contributor Author

Fragmentation is blocking a use case of mine.

I am going to assume that we would use parts of my first proposal, applying it to one region at a time and allowing it to be explicitly called. For power loss, setting a state and tracking the region being defragmented would let us restart/resume the process; duplicate copies would be prevented because trying to append a key that already exists returns the KeyAlreadyExists error code. Similar region-tracking functionality is already built into the garbage collect with the State and Rubbish State.

The threshold will have to be application specific, but we will need one; otherwise we could end up calling this function every time a key is invalidated/zeroised, which would cause unnecessary wear on the part. This threshold would simply be the % of the region that can be recovered.
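
As a toy illustration of why the resume is safe, re-running the copy step after a power loss skips anything that was already written, because appending a key that already exists fails with a KeyAlreadyExists-style error. The error enum, `append_key`, and the in-memory `HashMap` standing in for flash are simplified assumptions for this sketch, not the real tickv API.

```rust
use std::collections::HashMap;

#[derive(Debug)]
enum ErrorCode {
    KeyAlreadyExists,
}

/// Append a key only if it is not already present, mirroring the behaviour
/// the resume logic relies on.
fn append_key(store: &mut HashMap<u64, Vec<u8>>, key: u64, value: Vec<u8>) -> Result<(), ErrorCode> {
    if store.contains_key(&key) {
        return Err(ErrorCode::KeyAlreadyExists);
    }
    store.insert(key, value);
    Ok(())
}

/// Copy all valid keys out of the region being defragmented. If power was
/// lost part-way through a previous attempt, the keys copied earlier are
/// detected and skipped, so no duplicates are created.
fn resume_copy(store: &mut HashMap<u64, Vec<u8>>, pending: &[(u64, Vec<u8>)]) {
    for (key, value) in pending {
        match append_key(store, *key, value.clone()) {
            Ok(()) => {}
            // Already copied before the power loss: safe to skip.
            Err(ErrorCode::KeyAlreadyExists) => {}
        }
    }
}
```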
