
Calling issueFullFileCheck() does not seem to remove old unreferenced objects #655

Open
andriikovalov opened this issue Sep 15, 2023 · 2 comments

Comments

@andriikovalov

Environment Details

  • MicroStream Version: 08.01.01-MS-GA
  • JDK version: 17
  • OS: linux

Describe the bug

When I reset and store the application root, the old, now-unreachable data still seems to be in the storage after calling issueFullFileCheck() (judging from the directory size).

To Reproduce

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

import one.microstream.storage.embedded.types.EmbeddedStorage;
import one.microstream.storage.embedded.types.EmbeddedStorageManager;

public class App {
    public static void main(String[] args) throws IOException {
        Path storagePath = Files.createTempDirectory("microstream");
        final EmbeddedStorageManager storageManager = EmbeddedStorage.start(storagePath);

        for (int i = 0; i < 50; i++) {
            storageManager.setRoot(new byte[1_000_000]); // ~1 MB
            storageManager.storeRoot();
        }

        storageManager.issueFullFileCheck();
        storageManager.shutdown();

        System.out.println("Storage size " + getSize(storagePath)); // Expected ~1 MB, actual ~50 MB
    }

    public static long getSize(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.map(Path::toFile).filter(File::isFile).mapToLong(File::length).sum();
        }
    }
}
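For context on the "actual ~50 MB" observation: MicroStream's storage files are append-only, so each storeRoot() writes a fresh copy of the array and the stale copies stay on disk until housekeeping rewrites the files. A minimal pure-JDK sketch of that append-only effect (no MicroStream involved; the file name is made up for illustration):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class AppendOnlyDemo {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("append-only-demo");
        Path dataFile = dir.resolve("channel_0_data.dat"); // hypothetical file name

        byte[] payload = new byte[1_000_000]; // ~1 MB, like the root array in the repro
        for (int i = 0; i < 50; i++) {
            // Each "store" appends a new copy; the old copies remain on disk.
            Files.write(dataFile, payload,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
        System.out.println("Size after 50 appends: " + Files.size(dataFile)); // 50000000

        // "Compaction" rewrites the file so only the live copy survives.
        Files.write(dataFile, payload, StandardOpenOption.TRUNCATE_EXISTING);
        System.out.println("Size after compaction:  " + Files.size(dataFile)); // 1000000
    }
}
```

This is only an analogy for the file-growth behaviour, not MicroStream's actual file format; the point is that the directory stays at ~50 MB until something analogous to the compaction step runs.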

Expected behavior

The storage is shrunk to contain only the current root.

Additional context

I observe the same behaviour when I wrap my byte array into a root object, and repeatedly re-initialize the array and call storeRoot().

class Root {
    public byte[] data;
}
@hg-ms
Contributor

hg-ms commented Sep 18, 2023

Hello,
This is not a bug. In your example the storage has no time to clean up the old data.
The point of time when data gets deleted depends on several factors, the most important ones are:

  • The Java GC
  • The storage GC
  • The storage GC configuration
  • The storage internal cache
  • The write load

In your example you create a heavy write load that requires some tweaking of the storage housekeeping.

The example below should perform better regarding the cleanup.
It ensures that the Java GC and the storage GC are executed, sets a very short entity lifetime for the storage cache, and increases the time budget for housekeeping.

final EmbeddedStorageManager storageManager = EmbeddedStorage
	.start(Storage.ConfigurationBuilder()
		// evict cached entities aggressively: ~1 s timeout, very low threshold
		.setEntityCacheEvaluator(Storage.EntityCacheEvaluator(1000, 10))
		// run housekeeping every 100 ms with a 1 s (1_000_000_000 ns) time budget
		.setHousekeepingController(Storage.HousekeepingController(100, 1000_000_000))
		.setStorageFileProvider(Storage.FileProvider(storagePath))
		.createConfiguration());

for (int i = 0; i < 50; i++) {
	storageManager.setRoot(new byte[1000000]); // 1 MB
	storageManager.storeRoot();
}

System.gc();                                 // let the JVM release Java-side references
storageManager.issueFullGarbageCollection(); // mark unreachable entities in the storage
storageManager.issueFullFileCheck();         // clean up / consolidate the storage files
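As a side note, roughly the same tuning can be expressed declaratively via MicroStream's external configuration module (microstream-storage-embedded-configuration) instead of the builder API. The sketch below is an assumption, not taken from this thread; the exact key names and value units vary between versions, so check the configuration reference for your 08.x release before relying on it:

```properties
# Hypothetical microstream.properties sketch; verify key names against your version.
# Evict cached entities quickly (short timeout, low cache threshold).
entity-cache-timeout = 1s
entity-cache-threshold = 10
# Run housekeeping often and give it a large time budget per cycle.
housekeeping-interval = 100ms
housekeeping-time-budget = 1s
```

The intent mirrors the builder calls above: a short entity cache lifetime plus frequent, generously budgeted housekeeping cycles.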

@andriikovalov
Author

Okay, thank you. I thought that explicitly calling housekeeping would delete unreachable data (as mentioned in #179), but in fact that is not guaranteed and cannot be enforced, right?
Your snippet with "aggressive housekeeping" also gives me the same result (the storage is not shrunk).
