
TPC-H Benchmark: item run limitation does not work in shuffled mode #2389

Open
mweisgut opened this issue Jul 30, 2021 · 8 comments

@mweisgut (Collaborator) commented Jul 30, 2021

Executing the TPC-H benchmark in shuffled mode (-m Shuffled) with a runs-per-item limit of x and a time limit high enough that each query item could be executed x times results in an incorrect number of executions.


Steps to Reproduce

Execute ./hyriseBenchmarkTPCH -t 9999999 -r 10 -m Shuffled -s 1 -o output.json

Expected Behavior

Each query item is executed 10 times.

Actual Behavior

The total number of query item executions is 10, i.e., each item is executed at most once. With a higher runs-per-item limit, individual items can also be executed more than once.
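For illustration only (this is not Hyrise code; the loop structure and names below are invented): a minimal standalone sketch of what the reported numbers are consistent with, namely the -r value effectively capping the total number of runs across all items rather than the runs of each individual item. With 22 items and a cap of 10, each item is then executed at most once.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

int main() {
  const std::size_t item_count = 22;  // TPC-H queries 1-22
  const std::size_t max_runs = 10;    // value passed via -r

  std::vector<std::size_t> runs_per_item(item_count, 0);
  std::size_t total_runs = 0;

  // If the limit is checked against a global counter (total_runs) instead of
  // runs_per_item[item], the whole benchmark stops after max_runs executions
  // in total, producing the "executed 0/1 times" pattern shown in the log.
  for (std::size_t item = 0; total_runs < max_runs; item = (item + 1) % item_count) {
    ++runs_per_item[item];
    ++total_runs;
  }

  for (std::size_t item = 0; item < item_count; ++item) {
    std::cout << "TPC-H " << (item + 1) << ": executed " << runs_per_item[item] << " times\n";
  }
  return 0;
}
```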

Log
- Writing benchmark results to 'output.json'
- Running in single-threaded mode
- 1 simulated client is scheduling items
- Running benchmark in 'Shuffled' mode
- Encoding is 'Dictionary'
- Chunk size is 65535
- Max runs per item is 10
- Max duration per item is 9999999 seconds
- No warmup runs are performed
- Caching tables as binary files
- Not tracking SQL metrics
- Benchmarking Queries: [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22 ]
- TPC-H scale factor is 1
- Using prepared statements: no
- Loading/Generating tables
-  Loading table 'supplier' from cached binary "tpch_cached_tables/sf-1.000000/supplier.bin" (3 ms 20 µs)
-  Loading table 'lineitem' from cached binary "tpch_cached_tables/sf-1.000000/lineitem.bin" (559 ms 440 µs)
-  Loading table 'orders' from cached binary "tpch_cached_tables/sf-1.000000/orders.bin" (157 ms 628 µs)
-  Loading table 'region' from cached binary "tpch_cached_tables/sf-1.000000/region.bin" (34 µs 209 ns)
-  Loading table 'part' from cached binary "tpch_cached_tables/sf-1.000000/part.bin" (24 ms 364 µs)
-  Loading table 'customer' from cached binary "tpch_cached_tables/sf-1.000000/customer.bin" (40 ms 356 µs)
-  Loading table 'nation' from cached binary "tpch_cached_tables/sf-1.000000/nation.bin" (64 µs 127 ns)
-  Loading table 'partsupp' from cached binary "tpch_cached_tables/sf-1.000000/partsupp.bin" (123 ms 406 µs)
- Loading/Generating tables done (908 ms 616 µs)
- Encoding tables (if necessary) and generating pruning statistics
-  Encoding 'nation' - no encoding necessary (1 ms 178 µs)
-  Encoding 'region' - no encoding necessary (2 ms 982 µs)
-  Encoding 'supplier' - no encoding necessary (5 ms 707 µs)
-  Encoding 'part' - no encoding necessary (19 ms 369 µs)
-  Encoding 'customer' - no encoding necessary (21 ms 407 µs)
-  Encoding 'partsupp' - no encoding necessary (38 ms 36 µs)
-  Encoding 'orders' - no encoding necessary (151 ms 94 µs)
-  Encoding 'lineitem' - no encoding necessary (473 ms 42 µs)
- Encoding tables and generating pruning statistic done (474 ms 89 µs)
- Writing tables into binary files if necessary
- Writing tables into binary files done (33 µs 780 ns)
- Adding tables to StorageManager and generating table statistics
-  Added 'nation' (5 ms 17 µs)
-  Added 'region' (7 ms 503 µs)
-  Added 'supplier' (105 ms 547 µs)
-  Added 'customer' (348 ms 481 µs)
-  Added 'part' (406 ms 82 µs)
-  Added 'partsupp' (1 s 338 ms)
-  Added 'orders' (2 s 166 ms)
-  Added 'lineitem' (8 s 104 ms)
- Adding tables to StorageManager and generating table statistics done (8 s 105 ms)
- No indexes created as --indexes was not specified or set to false
- Starting Benchmark...
[PERF] Unresolved iterator created for AbstractPosList at src/lib/storage/pos_lists/abstract_pos_list.cpp:5
	Performance can be affected. This warning is only shown once.

[PERF] ColumnVsColumnTableScan using type-erased iterators at src/lib/operators/table_scan/column_vs_column_table_scan_impl.cpp:113
	Performance can be affected. This warning is only shown once.

[PERF] Using type-erased accessor as the ReferenceSegmentIterable is type-erased itself at src/lib/storage/reference_segment/reference_segment_iterable.hpp:93
	Performance can be affected. This warning is only shown once.

- Results for TPC-H 01
  -> Executed 1 times
- Results for TPC-H 02
  -> Executed 0 times
- Results for TPC-H 03
  -> Executed 1 times
- Results for TPC-H 04
  -> Executed 0 times
- Results for TPC-H 05
  -> Executed 1 times
- Results for TPC-H 06
  -> Executed 0 times
- Results for TPC-H 07
  -> Executed 1 times
- Results for TPC-H 08
  -> Executed 0 times
- Results for TPC-H 09
  -> Executed 1 times
- Results for TPC-H 10
  -> Executed 0 times
- Results for TPC-H 11
  -> Executed 1 times
- Results for TPC-H 12
  -> Executed 1 times
- Results for TPC-H 13
  -> Executed 0 times
- Results for TPC-H 14
  -> Executed 0 times
- Results for TPC-H 15
  -> Executed 1 times
- Results for TPC-H 16
  -> Executed 0 times
- Results for TPC-H 17
  -> Executed 1 times
- Results for TPC-H 18
  -> Executed 0 times
- Results for TPC-H 19
  -> Executed 1 times
- Results for TPC-H 20
  -> Executed 0 times
- Results for TPC-H 21
  -> Executed 0 times
- Results for TPC-H 22
  -> Executed 0 times

JSON output: output.json.log

Build Information

CMake command

cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -GNinja -DCMAKE_CXX_FLAGS=-fcolor-diagnostics

LLVM

➜  ~ brew info llvm
llvm: stable 12.0.1 (bottled), HEAD [keg-only]
@mweisgut added the Bug label Jul 30, 2021
@mweisgut (Collaborator, Author)

Or is it actually expected behavior?

@Bouncner (Collaborator)

> Or is it actually expected behavior?

I would definitely say no. I am not even sure if we have ever considered using runs with multiple clients.

@Bensk1 (Collaborator) commented Aug 2, 2021

> Or is it actually expected behavior?

Even if it were, the documentation would be wrong: "Maximum number of runs per item".

> I am not even sure if we have ever considered using runs with multiple clients.

This issue is independent of multiple clients but rather an issue of the shuffled mode, isn't it?

@mweisgut (Collaborator, Author) commented Aug 2, 2021

> This issue is independent of multiple clients but rather an issue of the shuffled mode, isn't it?

Right

@Bouncner (Collaborator) commented Aug 3, 2021

Yes, but not counting it as runs per client doesn’t make sense imo.

@Bouncner (Collaborator)

What about not supporting runs when the shuffled mode is used? Simply an assert and a message such as: "The shuffled mode does not support limiting the number of benchmark runs. Use --time to set a time limit for the benchmark run."
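A minimal, self-contained sketch of such a guard. The BenchmarkMode enum, the BenchmarkConfig struct with its member names, and the Assert helper below only mimic Hyrise's benchmark configuration and its Assert macro (utils/assert.hpp) for illustration; they are assumptions, not the actual types or the place where such a check would live.

```cpp
#include <stdexcept>
#include <string>

// Stand-ins for Hyrise's types: assumptions for illustration only.
enum class BenchmarkMode { Ordered, Shuffled };

struct BenchmarkConfig {
  BenchmarkMode benchmark_mode{BenchmarkMode::Ordered};
  bool max_runs_specified{false};  // assumed flag: true if the user passed -r/--runs
};

// Stand-in for Hyrise's Assert macro: fail with a message if the condition does not hold.
void Assert(bool condition, const std::string& message) {
  if (!condition) throw std::logic_error(message);
}

// Reject the unsupported combination early, with the message proposed above.
void validate_benchmark_config(const BenchmarkConfig& config) {
  Assert(config.benchmark_mode != BenchmarkMode::Shuffled || !config.max_runs_specified,
         "The shuffled mode does not support limiting the number of benchmark runs. "
         "Use --time to set a time limit for the benchmark run.");
}
```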

@Bouncner (Collaborator)

And while we're at it: for shuffled runs, we could also change the following output: "- Max duration per item is 2400 seconds".

@Bensk1 (Collaborator) commented Aug 13, 2021

> What about not supporting runs when the shuffled mode is used?

Sounds good to me.
