
Calling issueFullFileCheck() does not seem to remove old unreferenced objects #655

Open
andriikovalov opened this issue Sep 15, 2023 · 2 comments

Comments

@andriikovalov

Environment Details

  • MicroStream Version: 08.01.01-MS-GA
  • JDK version: 17
  • OS: linux

Describe the bug

When I reset and store the application root, the old, now-unreachable data still seems to be in the storage after calling issueFullFileCheck() (judging from the directory size).

To Reproduce

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

import one.microstream.storage.embedded.types.EmbeddedStorage;
import one.microstream.storage.embedded.types.EmbeddedStorageManager;

public class App {
    public static void main(String[] args) throws IOException {
        Path storagePath = Files.createTempDirectory("microstream");
        final EmbeddedStorageManager storageManager = EmbeddedStorage.start(storagePath);

        for (int i = 0; i < 50; i++) {
            storageManager.setRoot(new byte[1_000_000]); // ~1 MB
            storageManager.storeRoot();
        }

        storageManager.issueFullFileCheck();
        storageManager.shutdown();

        System.out.println("Storage size " + getSize(storagePath)); // Expected ~1 MB, actual ~50 MB
    }

    public static long getSize(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.map(Path::toFile).filter(File::isFile).mapToLong(File::length).sum();
        }
    }
}
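For context on the "actual ~50 MB" observation: MicroStream's storage files are append-only, so each storeRoot() writes a fresh copy of the array and the stale copies stay on disk until housekeeping rewrites the files. A minimal pure-JDK sketch of that append-only effect (no MicroStream involved; the file name is made up for illustration):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class AppendOnlyDemo {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("append-only-demo");
        Path dataFile = dir.resolve("channel_0_data.dat"); // hypothetical file name

        byte[] payload = new byte[1_000_000]; // ~1 MB, like the root array in the repro
        for (int i = 0; i < 50; i++) {
            // Each "store" appends a new copy; the old copies remain on disk.
            Files.write(dataFile, payload,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
        System.out.println("Size after 50 appends: " + Files.size(dataFile)); // 50000000

        // "Compaction" rewrites the file so only the live copy survives.
        Files.write(dataFile, payload, StandardOpenOption.TRUNCATE_EXISTING);
        System.out.println("Size after compaction:  " + Files.size(dataFile)); // 1000000
    }
}
```

This is only an analogy for the file-growth behaviour, not MicroStream's actual file format; the point is that the directory stays at ~50 MB until something analogous to the compaction step runs.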

Expected behavior

The storage is shrunk to contain only the current root.

Additional context

I observe the same behaviour when I wrap my byte array into a root object, and repeatedly re-initialize the array and call storeRoot().

class Root {
    public byte[] data;
}
@hg-ms
Contributor

hg-ms commented Sep 18, 2023

Hello,
This is not a bug. In your example the storage has no time to clean up the old data.
The point of time when data gets deleted depends on several factors, the most important ones are:

  • The Java GC
  • The storage GC
  • The storage GC configuration
  • The storage internal cache
  • The write load

In your example you create a heavy write load that requires some tweaking of the storage housekeeping.

The example below should perform better regarding the cleanup.
It ensures that the Java GC and the storage GC are executed, sets a very short entity lifetime for the storage cache, and increases the time budget for housekeeping.

final EmbeddedStorageManager storageManager = EmbeddedStorage
	.start(Storage.ConfigurationBuilder()
		// evict cached entities aggressively: ~1 s timeout, very low threshold
		.setEntityCacheEvaluator(Storage.EntityCacheEvaluator(1000, 10))
		// run housekeeping every 100 ms with a 1 s (1_000_000_000 ns) time budget
		.setHousekeepingController(Storage.HousekeepingController(100, 1000_000_000))
		.setStorageFileProvider(Storage.FileProvider(storagePath))
		.createConfiguration());

for (int i = 0; i < 50; i++) {
	storageManager.setRoot(new byte[1000000]); // 1 MB
	storageManager.storeRoot();
}

System.gc();                                 // let the JVM release Java-side references
storageManager.issueFullGarbageCollection(); // mark unreachable entities in the storage
storageManager.issueFullFileCheck();         // clean up / consolidate the storage files
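As a side note, roughly the same tuning can be expressed declaratively via MicroStream's external configuration module (microstream-storage-embedded-configuration) instead of the builder API. The sketch below is an assumption, not taken from this thread; the exact key names and value units vary between versions, so check the configuration reference for your 08.x release before relying on it:

```properties
# Hypothetical microstream.properties sketch; verify key names against your version.
# Evict cached entities quickly (short timeout, low cache threshold).
entity-cache-timeout = 1s
entity-cache-threshold = 10
# Run housekeeping often and give it a large time budget per cycle.
housekeeping-interval = 100ms
housekeeping-time-budget = 1s
```

The intent mirrors the builder calls above: a short entity cache lifetime plus frequent, generously budgeted housekeeping cycles.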

@andriikovalov
Author

Okay, thank you. I thought that explicitly calling housekeeping would delete unreachable data (as mentioned in #179), but in fact that is not guaranteed and cannot be enforced, right?
Your snippet with "aggressive housekeeping" also gives me the same result (the storage is not shrunk).
