[badgerdb] vlog growing unbounded - consider adding GC and exposing options #370

Open
woodzy opened this issue Feb 19, 2024 · 6 comments
Labels
pending ok Released, awaiting confirmation of resolution

Comments

@woodzy

woodzy commented Feb 19, 2024

I noticed that BadgerDB storage started growing unbounded (many files and several gigabytes, quite quickly). It looks like the Badgerhold/Badger storage example doesn't give the user a way to run GC as Badger's docs recommend. https://github.com/dgraph-io/badger/blob/main/db.go#L1225

I'm fairly sure it would also be useful to expose option tuning somehow. For example, ValueLogFileSize has a pretty large default.

I didn't plumb up a full fix. I just copied the default and started experimenting, and that showed that option tuning plus GC can resolve the unbounded growth. My modified copy of the default adds some signalling and a regular GC call.

I would recommend at least adding the GC as a default. I would also suggest lowering some of the defaults or exposing them.

        go func() {
                ticker := time.NewTicker(5 * time.Minute)
                defer ticker.Stop()
                for {
                        select {
                        case <-ticker.C:
                                _ = h.db.Badger().RunValueLogGC(0.7)
                        case <-h.doneCh:
                                return
                        }
                }
        }()
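
For anyone who wants to try the loop above outside the hook, here is a minimal, self-contained sketch. It is not the project's implementation. The names `gcRunner`, `startGCLoop`, and `signalDB` are made up for illustration; the only thing assumed about Badger is the `RunValueLogGC(discardRatio float64) error` method already used above, which `*badger.DB` provides. The stand-in `signalDB` lets the sketch run without Badger installed.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// gcRunner is the single method this sketch needs from the database.
// *badger.DB has a method with this signature.
type gcRunner interface {
	RunValueLogGC(discardRatio float64) error
}

// startGCLoop calls db.RunValueLogGC once per interval in a background
// goroutine. The returned stop function signals the goroutine and waits
// for it to exit; calling stop more than once is safe.
func startGCLoop(db gcRunner, interval time.Duration, discardRatio float64) (stop func()) {
	done := make(chan struct{})
	exited := make(chan struct{})

	go func() {
		defer close(exited)
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				// Errors are ignored here, as in the snippet above.
				_ = db.RunValueLogGC(discardRatio)
			case <-done:
				return
			}
		}
	}()

	var once sync.Once
	return func() {
		once.Do(func() { close(done) })
		<-exited
	}
}

// signalDB is a hypothetical stand-in for a Badger database. It reports
// each GC call on a buffered channel and drops the signal if the buffer
// is full, so the loop never blocks on it.
type signalDB struct{ called chan float64 }

func (s *signalDB) RunValueLogGC(discardRatio float64) error {
	select {
	case s.called <- discardRatio:
	default:
	}
	return nil
}

func main() {
	db := &signalDB{called: make(chan float64, 1)}
	stop := startGCLoop(db, 10*time.Millisecond, 0.7)
	fmt.Println("GC ran with discard ratio", <-db.called)
	stop()
}
```

Returning a stop function that also waits for the goroutine to exit makes it easier to shut the loop down before closing the database, which is what the `doneCh` signalling above is for.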
@werbenhu
Member

werbenhu commented Feb 27, 2024

I've checked the official Badger documentation on garbage collection. @woodzy, you're right.

Since neither of us has experience running BadgerDB in a production environment, it would be great if you could submit a PR. It would be very helpful for most users who use BadgerDB as a persistent database.

Garbage Collection

Badger values need to be garbage collected, because of two reasons:

  • Badger keeps values separately from the LSM tree. This means that the compaction operations that clean up the LSM tree do not touch the values at all. Values need to be cleaned up separately.

  • Concurrent read/write transactions could leave behind multiple values for a single key, because they are stored with different versions. These could accumulate, and take up unneeded space beyond the time these older versions are needed.

DB.RunValueLogGC(): This method is designed to do garbage collection while Badger is online. Along with randomly picking a file, it uses statistics generated by the LSM-tree compactions to pick files that are likely to lead to maximum space reclamation. It is recommended to be called during periods of low activity in your system, or periodically. One call would only result in removal of at max one log file. As an optimization, you could also immediately re-run it whenever it returns nil error (indicating a successful value log GC), as shown below.

ticker := time.NewTicker(5 * time.Minute)
defer ticker.Stop()
for range ticker.C {
again:
    err := db.RunValueLogGC(0.7)
    if err == nil {
        goto again
    }
}
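
The goto-based retry above can also be written as a plain loop. The sketch below is only an illustration of that idea, not code from Badger or from this project: `valueLogGC`, `drainValueLogGC`, `fakeDB`, and `errNothingToCollect` are hypothetical names, and `fakeDB` stands in for a real database so the example runs on its own. With a real `*badger.DB`, the error that ends the loop would typically be Badger's own sentinel for "nothing was rewritten" (`badger.ErrNoRewrite`, as far as I know).

```go
package main

import (
	"errors"
	"fmt"
)

// valueLogGC is the single method this sketch needs; *badger.DB has a
// method with this signature.
type valueLogGC interface {
	RunValueLogGC(discardRatio float64) error
}

// drainValueLogGC keeps calling RunValueLogGC until it returns a non-nil
// error. It reports how many passes succeeded and the error that ended
// the loop. Each successful pass removes at most one value log file.
func drainValueLogGC(db valueLogGC, discardRatio float64) (int, error) {
	passes := 0
	for {
		if err := db.RunValueLogGC(discardRatio); err != nil {
			return passes, err
		}
		passes++
	}
}

// errNothingToCollect is a made-up sentinel used only by fakeDB.
var errNothingToCollect = errors.New("nothing to collect")

// fakeDB is a hypothetical stand-in: GC succeeds `rewritable` times,
// then reports that nothing is left to collect.
type fakeDB struct{ rewritable int }

func (f *fakeDB) RunValueLogGC(discardRatio float64) error {
	if f.rewritable == 0 {
		return errNothingToCollect
	}
	f.rewritable--
	return nil
}

func main() {
	passes, err := drainValueLogGC(&fakeDB{rewritable: 3}, 0.7)
	fmt.Println("successful passes:", passes, "stopped on:", err)
}
```

Calling `drainValueLogGC` from inside the ticker loop gives the same behaviour as the goto version, and the returned count is handy for logging how many files each GC cycle reclaimed.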

Memory usage

Badger’s memory usage can be managed by tweaking several options available in the Options struct that is passed in when opening the database using DB.Open.

  • Number of memtables (Options.NumMemtables)
      • If you modify Options.NumMemtables, also adjust Options.NumLevelZeroTables and Options.NumLevelZeroTablesStall accordingly.
  • Number of concurrent compactions (Options.NumCompactors)
  • Size of table (Options.BaseTableSize)
  • Size of value log file (Options.ValueLogFileSize)

If you want to decrease the memory usage of a Badger instance, tweak these options (ideally one at a time) until you achieve the desired memory usage.
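
As a rough illustration of the options listed above, the sketch below opens a database with smaller settings through Badger's `With...` option builders. Several things here are assumptions rather than anything from this thread: the import path (`github.com/dgraph-io/badger/v4`, so the module must be fetched first), the directory `/tmp/badger-example`, and every numeric value, which is picked only to show the shape of the call and is not a recommendation.

```go
package main

import (
	"log"

	badger "github.com/dgraph-io/badger/v4"
)

func main() {
	// Illustrative values only; change one at a time and measure.
	opts := badger.DefaultOptions("/tmp/badger-example").
		WithValueLogFileSize(64 << 20). // smaller value log files (64 MiB)
		WithNumMemtables(2).            // fewer in-memory tables
		WithNumLevelZeroTables(2).      // adjusted together with NumMemtables
		WithNumLevelZeroTablesStall(4). // adjusted together with NumMemtables
		WithNumCompactors(2)            // fewer concurrent compactions

	db, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```

Smaller value log files mean each successful `RunValueLogGC` pass reclaims less space, so the GC interval and the file size are worth tuning together.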

@werbenhu
Member

werbenhu commented Mar 4, 2024

@woodzy Does #371 fix the issue?

@woodzy
Author

woodzy commented Mar 5, 2024

Thanks! These look like better implementations of my local workarounds during experimentation.

mochi-co added a commit that referenced this issue Mar 18, 2024
* For issues #370, #369, and #363, add BadgerDB garbage collection.

* Add default configuration for defaultGcInterval.

* Solve DATA RACE.

* Place Badger's configuration in main.go for users to adjust as needed.

* Add TestGcLoop() for coverage.

* Modify GcInterval to shorten test time.

* Add the GcDiscardRatio option for the Badger hook, and include more detailed comments in the example.

---------

Co-authored-by: JB <28275108+mochi-co@users.noreply.github.com>
@mochi-co
Collaborator

This should be fixed in v2.6.0 - please let us know!

@OpenJarvisAI

image

This is not fixed; memory usage is still growing.

@werbenhu
Member

@woodzy @OpenJarvisAI Could this issue be closed?

mochi-co added the "pending ok" label (Released, awaiting confirmation of resolution) on Apr 16, 2024