This repository has been archived by the owner on Apr 5, 2024. It is now read-only.

failed run Percona-Lab/sysbench-tpcc #109

Open
Laisky opened this issue Jul 29, 2022 · 6 comments

Comments

@Laisky
Contributor

Laisky commented Jul 29, 2022

Environment

CPU

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 57 bits virtual
CPU(s):                          104
On-line CPU(s) list:             0-103
Thread(s) per core:              2
Core(s) per socket:              26
Socket(s):                       2
NUMA node(s):                    2
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           106
Model name:                      Intel(R) Xeon(R) Gold 5320 CPU @ 2.20GHz

Memory: 247 GB

Disk: 2 TB

Reproduce

Steps:

  1. Start EdgelessDB in Docker.
  2. Send the manifest to create a test user (password authentication instead of mTLS).
  3. Run sysbench-tpcc; the database crashes.

1. Run EdgelessDB

Run with Docker Compose:

version: '3'
services:
  edgelessdb:
    image: ghcr.io/edgelesssys/edgelessdb-sgx-4gb
    restart: always
    network_mode: host
    volumes:
      - /var/lib/edgelessdb:/data
    devices:
      - /dev/sgx_enclave
      - /dev/sgx_provision
    environment:
      - PCCS_ADDR=127.0.0.1:8081

manifest.json:

{
    "sql": [
        "CREATE USER root@'%' REQUIRE ISSUER '/CN=rootCA' SUBJECT '/CN=root'",
        "CREATE USER test@'%' IDENTIFIED BY 'test1234'",
        "GRANT ALL PRIVILEGES ON *.* TO root WITH GRANT OPTION",
        "GRANT ALL PRIVILEGES ON *.* TO test",
        "FLUSH PRIVILEGES",
        "CREATE DATABASE test",
        "CREATE TABLE test.data (i INT)"
    ],
    "ca": "xxx",
    "debug": false,
    "recovery": "xxx"
}

db logs:

debconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package libsgx-dcap-default-qpl.
(Reading database ... 4980 files and directories currently installed.)
Preparing to unpack .../libsgx-dcap-default-qpl_1.14.100.3-focal1_amd64.deb ...
Unpacking libsgx-dcap-default-qpl (1.14.100.3-focal1) ...
Setting up libsgx-dcap-default-qpl (1.14.100.3-focal1) ...
Processing triggers for libc-bin (2.31-0ubuntu9.9) ...
PCCS_URL: https://127.0.0.1:8081/sgx/certification/v3/
[EDB] 2022/07/29 01:29:53 EdgelessDB v0.3.0 (e712b823a0469e6f96cc57029477323ac0d47e2e)
[EDB] 2022/07/29 01:29:53 DB has not been initialized, waiting for manifest.
[EDB] 2022/07/29 01:29:53 HTTP REST API listening on :8080
[EDB] 2022/07/29 01:31:21 initializing ...
2022-07-29  1:31:21 0 [Note] edb (server 10.6.8-MariaDB) starting as process 35 ...
restarting ...
[EDB] 2022/07/29 01:31:34 EdgelessDB v0.3.0 (e712b823a0469e6f96cc57029477323ac0d47e2e)
[EDB] 2022/07/29 01:31:34 starting up ...
2022-07-29  1:31:34 0 [Note] edb (server 10.6.8-MariaDB) starting as process 35 ...

2. Run TPCC

./tpcc.lua \
    --mysql-host=127.0.0.1 \
    --mysql-user=test \
    --mysql-db=test \
    --mysql-password=test1234 \
    --mysql_storage_engine=rocksdb \
    --time=300 \
    --threads=64 \
    --report-interval=1 \
    --tables=10 \
    --scale=100 \
    --use_fk=0 \
    --mysql_table_options='COLLATE latin1_bin' \
    --trx_level=RC \
    --db-driver=mysql prepare

P.S. After several attempts, I found that limiting the number of threads to 8 does not crash the database.

P.P.S. The threshold turns out to be 46: EdgelessDB crashes when the number of threads is >= 47. Maybe this is caused by NumTCS?
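
The threshold search described above (46 threads survive, 47 crash) can be automated with a bisection instead of manual retries. The following is a minimal sketch, not the procedure actually used in this report. The names `max_safe_threads` and `fake_probe` are hypothetical, and the probe is a stand-in that only mimics the reported behaviour; it does not run sysbench.

```python
from typing import Callable


def max_safe_threads(survives: Callable[[int], bool], lo: int, hi: int) -> int:
    """Return the largest n in [lo, hi] for which survives(n) is True.

    Assumes the outcome is monotonic: once a thread count crashes the
    server, every larger thread count crashes it as well.
    """
    if not survives(lo):
        raise ValueError("server does not survive even the lowest thread count")
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if survives(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


def fake_probe(threads: int) -> bool:
    # Stand-in for a real run: mimics the behaviour reported in this issue,
    # where up to 46 threads work and 47 or more crash the database.
    return threads <= 46


print(max_safe_threads(fake_probe, 8, 64))  # prints 46
```

A real probe would have to start from a freshly initialized EdgelessDB instance for every attempt, because a crash can leave the data directory in a state from which the database does not restart.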

With --threads=64 as in the command above, the database crashed:

sysbench log:

sysbench 1.0.18 (using system LuaJIT 2.1.0-beta3)

Initializing worker threads...

Waiting on tables 30 sec

Creating tables: 4

Waiting on tables 30 sec

Waiting on tables 30 sec

Creating tables: 9

FATAL: unable to connect to MySQL server on host '127.0.0.1', port 3306, aborting...
FATAL: error 2013: Lost connection to MySQL server at 'reading initial communication packet', system error: 0

db log:

./edb: line 3:    15 Aborted                 (core dumped) erthost "$DIR/edb-enclave.signed" "$@"

After that, the database cannot restart:

[erthost] loading enclave ...
[erthost] entering enclave ...
PCCS_URL: https://127.0.0.1:8081/sgx/certification/v3/
[EDB] 2022/07/29 01:42:59 EdgelessDB v0.3.0 (e712b823a0469e6f96cc57029477323ac0d47e2e)
[EDB] 2022/07/29 01:42:59 starting up ...
2022-07-29  1:42:59 0 [Note] edb (server 10.6.8-MariaDB) starting as process 15 ...
2022-07-29  1:42:59 0 [Note] RocksDB: 4 column families found
2022-07-29  1:42:59 0 [Note] RocksDB: Column Families at start:
eRocksDB failed to initialize correctly.
This likely failed due to an incorrect key being used to decrypt the database or the database being corrupted.
Make sure you run edb on the same machine as it was initialized on.
edb has exited unexpectedly (exit code: 1).

Another problem

BTW, EdgelessDB sometimes gets stuck on startup.

In that case, the last line in the db log is PCCS_URL: https://127.0.0.1:8081/sgx/certification/v3/.

[screenshot: screenshop-2022-07-29T01-29-30Z]
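
One thing that could be ruled out when the startup hangs right after the PCCS_URL line is whether the PCCS endpoint accepts connections at all. That the hang is related to the PCCS is purely an assumption; the thread does not establish the cause. The sketch below only performs a TCP connect to the address from PCCS_ADDR in the compose file; the function name `pccs_reachable` is hypothetical.

```python
import socket


def pccs_reachable(host: str = "127.0.0.1", port: int = 8081, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This checks reachability only; it does not validate TLS or the
    PCCS API itself.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("PCCS reachable:", pccs_reachable())
```

Because the container uses network_mode: host, running this check on the host against 127.0.0.1:8081 targets the same address the container sees.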

@Laisky
Contributor Author

Laisky commented Jul 29, 2022

I confirmed that the crash is indeed caused by NumTCS: after I changed NumTCS to 1024, the crash no longer occurred.

@Laisky
Contributor Author

Laisky commented Jul 29, 2022

My guess is that my machine has so many CPU cores that MariaDB's thread_pool_size exceeds NumTCS.
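
To check this guess, the server's thread_pool_size can be compared with the enclave's NumTCS. The sketch below parses the tab-separated output that the mysql client prints in batch mode (`mysql -B -N -e "SHOW GLOBAL VARIABLES LIKE 'thread_pool_size'"`). The sample output and the NumTCS value of 64 are illustrative assumptions, not values measured in this issue, and how EdgelessDB actually maps server threads to TCS slots is not stated in this thread, so the plain comparison is an assumption as well.

```python
from typing import Dict


def parse_variables(batch_output: str) -> Dict[str, str]:
    """Parse `mysql -B -N` output of SHOW GLOBAL VARIABLES (name<TAB>value per line)."""
    variables: Dict[str, str] = {}
    for line in batch_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _, value = line.partition("\t")
        variables[name] = value
    return variables


def pool_exceeds_tcs(variables: Dict[str, str], num_tcs: int) -> bool:
    """Return True if thread_pool_size is larger than the given NumTCS."""
    return int(variables["thread_pool_size"]) > num_tcs


# Illustrative sample only: these values were not taken from the reporter's server.
sample_output = "thread_pool_size\t104\n"
variables = parse_variables(sample_output)
print(pool_exceeds_tcs(variables, num_tcs=64))
```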

@thomasten
Member

Thanks for reporting and investigating the problem! We'll check whether we can configure MariaDB so that it does not exceed NumTCS. We'll probably also increase NumTCS for the 4GB heap variant of EdgelessDB.

@Laisky
Contributor Author

Laisky commented Aug 1, 2022

Another major concern is that after a crash, the data is sometimes corrupted, which prevents the database process from restarting:

eRocksDB failed to initialize correctly.
This likely failed due to an incorrect key being used to decrypt the database or the database being corrupted.
Make sure you run edb on the same machine as it was initialized on.
edb has exited unexpectedly (exit code: 1).
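
When scripting repeated runs, it helps to tell the two failure signatures in this issue apart: the enclave host aborting during the benchmark, and the later failure to reopen the data. The sketch below classifies log lines using marker strings copied from the logs quoted in this issue. The function name `classify_log` and its return labels are hypothetical, and the markers may differ in other EdgelessDB versions.

```python
from typing import Iterable

# Marker strings copied from the logs quoted in this issue.
CRASH_MARKER = "Aborted"
CORRUPTION_MARKER = "eRocksDB failed to initialize correctly."


def classify_log(lines: Iterable[str]) -> str:
    """Return 'corrupt', 'crashed', or 'ok' for a sequence of log lines.

    'corrupt' takes precedence over 'crashed' because it means the
    database can no longer start at all.
    """
    state = "ok"
    for line in lines:
        if CORRUPTION_MARKER in line:
            return "corrupt"
        if CRASH_MARKER in line:
            state = "crashed"
    return state


crash_log = ['./edb: line 3:    15 Aborted                 (core dumped) erthost "$DIR/edb-enclave.signed" "$@"']
restart_log = [
    "2022-07-29  1:42:59 0 [Note] RocksDB: Column Families at start:",
    "eRocksDB failed to initialize correctly.",
]
print(classify_log(crash_log), classify_log(restart_log))  # prints: crashed corrupt
```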

@thomasten
Member

Thanks for pointing this out. I didn't read it carefully enough. We'll investigate.

@Laisky
Contributor Author

Laisky commented Aug 2, 2022

One interesting thing I found is that edb's TPCC metrics do not increase linearly with the number of client threads.

[chart: EdgelessDB Vs. MyRocks]

(EdgelessDB with NumTCS = 1024 and an 8 GB heap size)
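
One way to quantify "not linear" is to compute per-thread throughput relative to the smallest thread count. This is a minimal sketch; the function name `scaling_efficiency` is hypothetical, and the numbers are made up for illustration. They are not read from the chart above.

```python
from typing import List, Tuple


def scaling_efficiency(results: List[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """Return (threads, efficiency) pairs relative to the first entry.

    results holds (threads, transactions_per_second) pairs. An efficiency
    of 1.0 means per-thread throughput matches the baseline, i.e. linear
    scaling; lower values mean the extra threads add less than
    proportional throughput.
    """
    base_threads, base_tps = results[0]
    base_per_thread = base_tps / base_threads
    return [(threads, (tps / threads) / base_per_thread) for threads, tps in results]


# Made-up numbers for illustration only, not measurements from this issue.
example = [(8, 800.0), (16, 1400.0), (32, 2000.0)]
for threads, efficiency in scaling_efficiency(example):
    print(threads, efficiency)
```

With these example numbers, the efficiency drops from 1.0 at 8 threads to 0.875 at 16 and 0.625 at 32.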

cc @thomasten
