Test failures on CentOS 8 #169

Open
igchor opened this issue Oct 21, 2022 · 3 comments

igchor (Contributor) commented Oct 21, 2022

Describe the bug
List of failing tests (some fail intermittently):

  • NvmCacheTest.EvictToNvmGet
I1021 16:07:49.816931 345143 Factory.cpp:289] Cache file: /tmp/nvmcache-cachedir/345143/navy size: 104857600 truncate: 0
E1021 16:07:49.816977 345143 Factory.cpp:305] Failed to open with o-direct, trying without. Error: open("/tmp/nvmcache-cachedir/345143/navy", 040102, 00666) failed: Invalid argument
I1021 16:07:49.818993 345143 NavySetup.cpp:89] metadataSize: 4194304 bigHashCacheOffset: 52428800 bigHashCacheSize: 52428800
I1021 16:07:49.819003 345143 NavySetup.cpp:121] blockcache: starting offset: 4194304, block cache size: 46137344
I1021 16:07:49.819077 345143 LruPolicy.cpp:37] LRU policy: expected 11 regions
I1021 16:07:49.823701 345143 RegionManager.cpp:50] 11 regions, 4194304 bytes each
I1021 16:07:49.823733 345143 Allocator.cpp:43] Enable priority-based allocation for Allocator. Number of priorities: 1
I1021 16:07:49.823761 345143 BlockCache.cpp:143] Block cache created
I1021 16:07:49.823954 345143 BigHash.cpp:106] BigHash created: buckets: 51200, bucket size: 1024, base offset: 52428800
I1021 16:07:49.823960 345143 BigHash.cpp:111] Reset BigHash
I1021 16:07:49.823976 345143 Driver.cpp:65] Max concurrent inserts: 1000000
I1021 16:07:49.823980 345143 Driver.cpp:66] Max parcel memory: 268435456
I1021 16:07:49.823986 345143 Driver.cpp:325] Reset Navy
I1021 16:07:49.823995 345143 BigHash.cpp:111] Reset BigHash
I1021 16:07:49.824006 345143 BlockCache.cpp:665] Reset block cache
I1021 16:07:49.860485 345143 BigHash.cpp:476] Flush big hash
I1021 16:07:49.860508 345143 BlockCache.cpp:659] Flush block cache
/opt/workspace/cachelib/allocator/nvmcache/tests/NvmCacheTests.cpp:189: Failure
Expected: (nullptr) != (hdl), actual: (nullptr) vs nullptr
  • NvmCacheTest.EvictToNvmGetCheckCtime
I1021 16:07:49.899204 345143 Factory.cpp:289] Cache file: /tmp/nvmcache-cachedir/345143/navy size: 104857600 truncate: 0
E1021 16:07:49.899278 345143 Factory.cpp:305] Failed to open with o-direct, trying without. Error: open("/tmp/nvmcache-cachedir/345143/navy", 040102, 00666) failed: Invalid argument
I1021 16:07:49.901315 345143 NavySetup.cpp:89] metadataSize: 4194304 bigHashCacheOffset: 52428800 bigHashCacheSize: 52428800
I1021 16:07:49.901327 345143 NavySetup.cpp:121] blockcache: starting offset: 4194304, block cache size: 46137344
I1021 16:07:49.901403 345143 LruPolicy.cpp:37] LRU policy: expected 11 regions
I1021 16:07:49.905984 345143 RegionManager.cpp:50] 11 regions, 4194304 bytes each
I1021 16:07:49.906016 345143 Allocator.cpp:43] Enable priority-based allocation for Allocator. Number of priorities: 1
I1021 16:07:49.906047 345143 BlockCache.cpp:143] Block cache created
I1021 16:07:49.906245 345143 BigHash.cpp:106] BigHash created: buckets: 51200, bucket size: 1024, base offset: 52428800
I1021 16:07:49.906251 345143 BigHash.cpp:111] Reset BigHash
I1021 16:07:49.906269 345143 Driver.cpp:65] Max concurrent inserts: 1000000
I1021 16:07:49.906273 345143 Driver.cpp:66] Max parcel memory: 268435456
I1021 16:07:49.906279 345143 Driver.cpp:325] Reset Navy
I1021 16:07:49.906288 345143 BigHash.cpp:111] Reset BigHash
I1021 16:07:49.906300 345143 BlockCache.cpp:665] Reset block cache
F1021 16:07:54.940975 345143 NvmCacheTests.cpp:244] Check failed: hdl
Aborted (core dumped)
  • NvmCacheTest.Delete
  • NvmCacheTest.NvmEvicted
  • BaseAllocatorTest/2.LruRecordAccess (fails rarely):
/opt/workspace/cachelib/../cachelib/allocator/tests/BaseAllocatorTest.h:1822: Failure
Expected: (handle) != (nullptr), actual: nullptr vs (nullptr)
/opt/workspace/cachelib/../cachelib/allocator/tests/BaseAllocatorTest.h:1840: Failure
Expected equality of these values:
  evictedKeys.find(hotKey)
    Which is: 8-byte object <E0-0F 75-01 00-00 00-00>
  evictedKeys.end()
    Which is: 8-byte object <58-41 B8-E0 FE-7F 00-00>
00NSc7eMcs09h670JUBO6aGFi400TiH66D5SNV9iY2jUP8PBDf68E0l06O0LXWX1puAt32974RCRbx6C4k09LlTKOg73rM0srRWX
[  FAILED  ] BaseAllocatorTest/2.LruRecordAccess, where TypeParam = facebook::cachelib::CacheAllocator<facebook::cachelib::TinyLFUCacheTrait> (52 ms)

To Reproduce
Steps to reproduce the behavior:

  1. Compile CacheLib.
  2. Run ./allocator-test-NvmCacheTests and ./allocator-test-AllocatorTypeTest.
    I'm using a Docker image with the following steps (assuming CacheLib is in /root/CacheLib):
docker run --net=host --shm-size=100G --privileged --cap-add=SYS_ADMIN --cap-add=SYS_PTRACE --tmpfs /tmp -v /root/CacheLib/:/opt/workspace:z -w /opt/workspace/ -it ghcr.io/pmem/cachelib:centos-8streams-main /bin/bash
# inside the container
mkdir build
cd build
cmake ../cachelib/ -DCMAKE_BUILD_TYPE=Debug -DBUILD_TESTS=1 -DCMAKE_INSTALL_PREFIX=/opt
make install -j
/opt/tests/allocator-test-NvmCacheTests
/opt/tests/allocator-test-AllocatorTypeTest
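Because several of these failures are intermittent, it can help to rerun only the affected tests many times in a row. The sketch below is not part of the original report: `run_flaky_nvm_tests` is a hypothetical helper name, the default repeat count of 20 is arbitrary, and the default binary path assumes the `/opt` install prefix used above. It relies only on googletest's standard `--gtest_filter` and `--gtest_repeat` flags.

```shell
# Hypothetical helper (not part of CacheLib): rerun the NVM tests reported as
# flaky in this issue several times using googletest's standard flags.
# Default binary path assumes CMAKE_INSTALL_PREFIX=/opt as in the steps above.
run_flaky_nvm_tests() {
  local bin="${1:-/opt/tests/allocator-test-NvmCacheTests}"
  local repeat="${2:-20}"
  # --gtest_filter accepts a colon-separated list of fully qualified test names.
  local filter="NvmCacheTest.EvictToNvmGet:NvmCacheTest.EvictToNvmGetCheckCtime"
  filter="${filter}:NvmCacheTest.Delete:NvmCacheTest.NvmEvicted"
  # --gtest_repeat runs the selected tests the given number of times.
  "$bin" --gtest_filter="$filter" --gtest_repeat="$repeat"
}
```

For example, `run_flaky_nvm_tests /opt/tests/allocator-test-NvmCacheTests 50`. Note that a failed `CHECK`, such as the one reported for NvmCacheTest.EvictToNvmGetCheckCtime, aborts the whole process, so the remaining repetitions will not run after it fires.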

Expected behavior
No test failures

Desktop:

  • OS: CentOS 8
  • Docker image: ghcr.io/pmem/cachelib:centos-8streams-main
  • CacheLib commit: ac52e8e

Additional context
Some of the NVM tests seem to pass when I insert a sleep after each operation.

agordon (Contributor) commented Oct 25, 2022

Hi @igchor, could you run the same tests on the host machine (outside Docker) to rule out Docker as a factor?

igchor (Contributor, Author) commented Oct 25, 2022

Hi, unfortunately CacheLib's dependencies do not compile on that host machine (Fedora 34). I tried running ./contrib/build.sh -j -d -t, and the build fails in folly, fizz, or wangle. I also tried different versions, with no luck. Do you have a CacheLib commit ID that is known to build on Fedora 34?

These dependency problems are the reason we created that Docker image: it always uses the same dependency versions, which we know work.

byrnedj (Contributor) commented Oct 26, 2022

Hi, I can confirm that the following tests fail on Ubuntu 18.04 with GCC 8.4:

  • NvmCacheTest.EvictToNvmGet
  • NvmCacheTest.EvictToNvmGetCheckCtime


I did not get BaseAllocatorTest/2.LruRecordAccess to fail.

facebook-github-bot pushed a commit that referenced this issue Mar 1, 2023
Summary:
This change fixes the following flaky tests in NvmCacheTests:
* NvmCacheTest.Delete
* NvmCacheTest.NvmEvicted
* NvmCacheTest.EvictToNvmGetCheckCtime

The root cause of these failures is essentially the same as the one fixed for NvmCacheTest.EvictToNvmGet
by D42443647 (5e7ff9a): the tests insert enough items that some are spilled to the NVM cache, and if the
delete operations (and tombstones) issued as part of an insertion are still outstanding at that point,
NvmCache::put() can be dropped and the item is evicted completely. To fix the problem, this change
flushes the NVM cache periodically during the insertions.

Since this can cause more regions to be used, the NVM cache size must also be increased: this change
raises the default NVM cache size from 100MB to 200MB. Accordingly, the persistent storage size used in
the PersistenceCache test is increased by 100MB, from 400MB to 500MB.

This change addresses GitHub issue #169.

Reviewed By: therealgymmy

Differential Revision: D43592888

fbshipit-source-id: f0968884eb39fb5728b59129e98345df3240f01e