This repository has been archived by the owner on May 9, 2024. It is now read-only.

Aggregation executed on a table projection is much slower than on an imported table #696

Open
AndreyPavlenko opened this issue Oct 11, 2023 · 4 comments

Comments

@AndreyPavlenko
Contributor

AndreyPavlenko commented Oct 11, 2023

Code to reproduce:

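import pyhdk
from time import time
from numpy.random import randint

hdk = pyhdk.hdk.HDK()
# 300M random integers in [0, 1000]; randint's upper bound is exclusive.
# (random_integers, used originally, is deprecated in NumPy.)
ht = hdk.import_pydict({"a": randint(0, 1001, 300_000_000)})

# Case 1: aggregate directly on top of the imported table.
t = time()
result1 = ht.proj("a").agg("a", "count").run()
print(f"Imported table time: {time() - t}")

# Case 2: materialize the projection first, then aggregate on its result.
ht = ht.proj("a").run()
t = time()
result2 = ht.agg("a", "count").run()
print(f"Projected table time: {time() - t}")
assert result1.to_arrow() == result2.to_arrow()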

Output:

Imported table time: 0.17155766487121582
Projected table time: 18.890812397003174
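
For repeated measurements, the two timing blocks in the reproducer can share a small helper. The following is a minimal sketch using only the standard library; `timed` is a hypothetical name, not part of pyhdk, and the lambda is a stand-in for a call such as `ht.agg("a", "count").run()`.

```python
from time import perf_counter


def timed(label, fn, repeats=3):
    """Call fn() `repeats` times; print and return every wall-clock timing."""
    timings = []
    result = None
    for i in range(repeats):
        start = perf_counter()
        result = fn()
        timings.append(perf_counter() - start)
        print(f"{label}, run {i + 1}: {timings[-1]:.6f}")
    return result, timings


# Stand-in workload (hypothetical); replace with the query under test.
result, timings = timed("Stand-in workload time", lambda: sum(range(1_000_000)))
```

Reporting every run, rather than only the fastest one, keeps one-off costs paid on the first execution visible.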
@ienkovich
Contributor

With default options, running a projection produces a row-wise, lazily fetched result set, and fetching data from such a result set takes time. You can improve the timings by initializing HDK with
hdk = pyhdk.init(enable_columnar_output=True, enable_lazy_fetch=False)
I guess we use these options in Modin by default. There will still be some difference, because metadata has to be recomputed for the projection result.
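
To illustrate why recomputing metadata costs time: assuming the metadata in question is per-fragment column statistics such as minimum, maximum, and a null flag (an assumption; the thread does not list the fields), producing it requires a full pass over every value of the projection result. A minimal pure-Python sketch, where `fragment_stats` and the fragment size are hypothetical and not HDK's implementation:

```python
def fragment_stats(values, fragment_size):
    """Return one (min, max, has_nulls) tuple per fragment of `values`."""
    stats = []
    for start in range(0, len(values), fragment_size):
        fragment = values[start:start + fragment_size]
        non_null = [v for v in fragment if v is not None]
        stats.append((
            min(non_null) if non_null else None,
            max(non_null) if non_null else None,
            len(non_null) != len(fragment),  # True if any value is None
        ))
    return stats


# Tiny illustrative column split into fragments of three values.
column = [5, 1, None, 9, 7, 3, 8]
stats = fragment_stats(column, fragment_size=3)
print(stats)
```

The work is linear in the number of rows, so for 300 million values it is not negligible.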

@AndreyPavlenko
Contributor Author

> hdk = pyhdk.init(enable_columnar_output=True, enable_lazy_fetch=False)

These options make the query faster, but it is still much slower than on the imported table:

Imported table time: 0.12773728370666504
Projected table time: 7.614111661911011

> I guess we use these options in Modin by default

It depends on the query: https://github.com/modin-project/modin/blob/master/modin/experimental/core/execution/native/implementations/hdk_on_native/partitioning/partition_manager.py#L261

@ienkovich
Contributor

I don't see such a difference. Here is what I got for this test:

Imported table time: 0.07821106910705566
Projected table time: 0.7270689010620117

The difference is about 650 ms and, according to the debug timers, all of it comes from metadata computation for the projection result. Can you check the debug timers to see what causes the difference in your case? (You can add enable_debug_timer=True to the init call.)

@AndreyPavlenko
Contributor Author

AndreyPavlenko commented Oct 12, 2023

Debug timers:

[2023-10-12 21:13:54.133309] [0x00007f75b3f12740] [info]    0 0 RelAlgExecutor.cpp:162 DEBUG_TIMER thread_id(0)
442ms total duration for executeRelAlgQuery
  442ms start(0ms) executeRelAlgQueryNoRetry RelAlgExecutor.cpp:218
    0ms start(0ms) Query pre-execution steps RelAlgExecutor.cpp:219
    442ms start(0ms) execute RelAlgExecutor.cpp:386
      442ms start(0ms) executeStep RelAlgExecutor.cpp:935
        442ms start(0ms) executeWorkUnit RelAlgExecutor.cpp:1429
          12ms start(0ms) compileWorkUnit NativeCodegen.cpp:1423
            0ms start(1ms) markDeadRuntimeFuncs NativeCodegen.cpp:812
            3ms start(1ms) generateNativeCPUCode Backend.cpp:95
              3ms start(1ms) optimize_ir HelperFunctions.cpp:163
          0ms start(12ms) ExecutionKernel::run ExecutionKernel.cpp:126
          0ms start(12ms) fetchChunks Execute.cpp:2975
          0ms start(12ms) create QueryExecutionContext.cpp:94
              New thread(1)
                0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:126
                0ms start(0ms) fetchChunks Execute.cpp:2975
                0ms start(0ms) create QueryExecutionContext.cpp:94
                15ms start(0ms) executePlan Execute.cpp:3344
                  15ms start(0ms) launchCpuCode QueryExecutionContext.cpp:565
              End thread(1)
            New thread(2)
              0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:126
              0ms start(0ms) fetchChunks Execute.cpp:2975
              0ms start(0ms) create QueryExecutionContext.cpp:94
              15ms start(0ms) executePlan Execute.cpp:3344
                15ms start(0ms) launchCpuCode QueryExecutionContext.cpp:565
            End thread(2)
          5ms start(12ms) executePlan Execute.cpp:3344
            5ms start(12ms) launchCpuCode QueryExecutionContext.cpp:565
                New thread(3)
                  0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:126
                  0ms start(0ms) fetchChunks Execute.cpp:2975
                  0ms start(0ms) create QueryExecutionContext.cpp:94
                  15ms start(0ms) executePlan Execute.cpp:3344
                    15ms start(0ms) launchCpuCode QueryExecutionContext.cpp:565
                End thread(3)
                New thread(4)
                  0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:126
                  0ms start(0ms) fetchChunks Execute.cpp:2975
                  0ms start(0ms) create QueryExecutionContext.cpp:94
                  8ms start(0ms) executePlan Execute.cpp:3344
                    8ms start(0ms) launchCpuCode QueryExecutionContext.cpp:565
                End thread(4)
                New thread(5)
                  0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:126
                  0ms start(0ms) fetchChunks Execute.cpp:2975
                  0ms start(0ms) create QueryExecutionContext.cpp:94
                  15ms start(0ms) executePlan Execute.cpp:3344
                    15ms start(0ms) launchCpuCode QueryExecutionContext.cpp:565
                End thread(5)
                New thread(6)
                  0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:126
                  0ms start(0ms) fetchChunks Execute.cpp:2975
                  0ms start(0ms) create QueryExecutionContext.cpp:94
                  15ms start(0ms) executePlan Execute.cpp:3344
                    15ms start(0ms) launchCpuCode QueryExecutionContext.cpp:565
                End thread(6)
                New thread(7)
                  0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:126
                  0ms start(0ms) fetchChunks Execute.cpp:2975
                  0ms start(0ms) create QueryExecutionContext.cpp:94
                  7ms start(0ms) executePlan Execute.cpp:3344
                    7ms start(0ms) launchCpuCode QueryExecutionContext.cpp:565
                End thread(7)
                New thread(8)
                  0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:126
                  0ms start(0ms) fetchChunks Execute.cpp:2975
                  0ms start(0ms) create QueryExecutionContext.cpp:94
                  15ms start(0ms) executePlan Execute.cpp:3344
                    15ms start(0ms) launchCpuCode QueryExecutionContext.cpp:565
                End thread(8)
                New thread(9)
                  0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:126
                  0ms start(0ms) fetchChunks Execute.cpp:2975
                  0ms start(0ms) create QueryExecutionContext.cpp:94
                  11ms start(0ms) executePlan Execute.cpp:3344
                    11ms start(0ms) launchCpuCode QueryExecutionContext.cpp:565
                End thread(9)
          0ms start(28ms) collectAllDeviceResults Execute.cpp:2612
            0ms start(28ms) reduceMultiDeviceResults Execute.cpp:1164
              0ms start(28ms) reduceMultiDeviceResultSets Execute.cpp:1231
                0ms start(28ms) finalizeAggregates ResultSet.cpp:1213
          15ms start(28ms) compileWorkUnit NativeCodegen.cpp:1423
            0ms start(29ms) markDeadRuntimeFuncs NativeCodegen.cpp:812
            4ms start(29ms) generateNativeCPUCode Backend.cpp:95
              3ms start(29ms) optimize_ir HelperFunctions.cpp:163
          14ms start(44ms) compileWorkUnit NativeCodegen.cpp:1423
            0ms start(45ms) markDeadRuntimeFuncs NativeCodegen.cpp:812
            3ms start(45ms) generateNativeCPUCode Backend.cpp:95
              3ms start(45ms) optimize_ir HelperFunctions.cpp:163
          0ms start(58ms) ExecutionKernel::run ExecutionKernel.cpp:126
          0ms start(58ms) fetchChunks Execute.cpp:2975
              New thread(10)
                0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:126
                0ms start(0ms) fetchChunks Execute.cpp:2975
                0ms start(0ms) create QueryExecutionContext.cpp:94
                357ms start(0ms) executePlan Execute.cpp:3344
                  357ms start(0ms) launchCpuCode QueryExecutionContext.cpp:565
                  0ms start(358ms) getRowSet QueryExecutionContext.cpp:192
              End thread(10)
          0ms start(58ms) create QueryExecutionContext.cpp:94
              New thread(11)
                0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:126
                0ms start(0ms) fetchChunks Execute.cpp:2975
                0ms start(0ms) create QueryExecutionContext.cpp:94
                326ms start(0ms) executePlan Execute.cpp:3344
                  326ms start(0ms) launchCpuCode QueryExecutionContext.cpp:565
                  0ms start(326ms) getRowSet QueryExecutionContext.cpp:192
              End thread(11)
              New thread(12)
                0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:126
                0ms start(0ms) fetchChunks Execute.cpp:2975
                0ms start(0ms) create QueryExecutionContext.cpp:94
                382ms start(0ms) executePlan Execute.cpp:3344
                  382ms start(0ms) launchCpuCode QueryExecutionContext.cpp:565
                  0ms start(383ms) getRowSet QueryExecutionContext.cpp:192
              End thread(12)
              New thread(13)
                0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:126
                0ms start(0ms) fetchChunks Execute.cpp:2975
                0ms start(0ms) create QueryExecutionContext.cpp:94
                349ms start(0ms) executePlan Execute.cpp:3344
                  349ms start(0ms) launchCpuCode QueryExecutionContext.cpp:565
                  0ms start(349ms) getRowSet QueryExecutionContext.cpp:192
              End thread(13)
            New thread(14)
              0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:126
              0ms start(0ms) fetchChunks Execute.cpp:2975
              0ms start(0ms) create QueryExecutionContext.cpp:94
              379ms start(0ms) executePlan Execute.cpp:3344
                378ms start(0ms) launchCpuCode QueryExecutionContext.cpp:565
                0ms start(379ms) getRowSet QueryExecutionContext.cpp:192
            End thread(14)
            New thread(15)
              0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:126
              0ms start(0ms) fetchChunks Execute.cpp:2975
              0ms start(0ms) create QueryExecutionContext.cpp:94
              308ms start(0ms) executePlan Execute.cpp:3344
                308ms start(0ms) launchCpuCode QueryExecutionContext.cpp:565
                0ms start(308ms) getRowSet QueryExecutionContext.cpp:192
            End thread(15)
            New thread(16)
              0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:126
              0ms start(0ms) fetchChunks Execute.cpp:2975
              0ms start(0ms) create QueryExecutionContext.cpp:94
              376ms start(0ms) executePlan Execute.cpp:3344
                376ms start(0ms) launchCpuCode QueryExecutionContext.cpp:565
                0ms start(376ms) getRowSet QueryExecutionContext.cpp:192
            End thread(16)
            New thread(17)
              0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:126
              0ms start(0ms) fetchChunks Execute.cpp:2975
              0ms start(0ms) create QueryExecutionContext.cpp:94
              376ms start(0ms) executePlan Execute.cpp:3344
                376ms start(0ms) launchCpuCode QueryExecutionContext.cpp:565
                0ms start(376ms) getRowSet QueryExecutionContext.cpp:192
            End thread(17)
            New thread(18)
              0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:126
              0ms start(0ms) fetchChunks Execute.cpp:2975
              0ms start(0ms) create QueryExecutionContext.cpp:94
              380ms start(0ms) executePlan Execute.cpp:3344
                380ms start(0ms) launchCpuCode QueryExecutionContext.cpp:565
                0ms start(380ms) getRowSet QueryExecutionContext.cpp:192
            End thread(18)
          235ms start(59ms) executePlan Execute.cpp:3344
            234ms start(59ms) launchCpuCode QueryExecutionContext.cpp:565
            0ms start(294ms) getRowSet QueryExecutionContext.cpp:192
          0ms start(442ms) resultsUnion Execute.cpp:1119
          0ms start(442ms) put ResultSetRegistry.cpp:102
[2023-10-12 21:14:05.293148] [0x00007f75b3f12740] [info]    0 0 RelAlgExecutor.cpp:162 DEBUG_TIMER thread_id(0)
11159ms total duration for executeRelAlgQuery
  11159ms start(0ms) executeRelAlgQueryNoRetry RelAlgExecutor.cpp:218
    11042ms start(0ms) Query pre-execution steps RelAlgExecutor.cpp:219
      1609ms start(0ms) synthesizeMetadata ResultSetMetadata.cpp:74
      1041ms start(1610ms) synthesizeMetadata ResultSetMetadata.cpp:74
      1106ms start(2652ms) synthesizeMetadata ResultSetMetadata.cpp:74
      1163ms start(3758ms) synthesizeMetadata ResultSetMetadata.cpp:74
      879ms start(4922ms) synthesizeMetadata ResultSetMetadata.cpp:74
      1249ms start(5802ms) synthesizeMetadata ResultSetMetadata.cpp:74
      1225ms start(7051ms) synthesizeMetadata ResultSetMetadata.cpp:74
      1135ms start(8276ms) synthesizeMetadata ResultSetMetadata.cpp:74
      914ms start(9412ms) synthesizeMetadata ResultSetMetadata.cpp:74
      716ms start(10326ms) synthesizeMetadata ResultSetMetadata.cpp:74
    116ms start(11042ms) execute RelAlgExecutor.cpp:386
      116ms start(11042ms) executeStep RelAlgExecutor.cpp:935
        116ms start(11042ms) executeWorkUnit RelAlgExecutor.cpp:1429
          20ms start(11042ms) compileWorkUnit NativeCodegen.cpp:1423
            0ms start(11044ms) markDeadRuntimeFuncs NativeCodegen.cpp:812
            5ms start(11044ms) generateNativeCPUCode Backend.cpp:95
              5ms start(11044ms) optimize_ir HelperFunctions.cpp:163
          0ms start(11063ms) ExecutionKernel::run ExecutionKernel.cpp:126
          0ms start(11063ms) fetchChunks Execute.cpp:2975
          0ms start(11063ms) create QueryExecutionContext.cpp:94
              New thread(12)
                0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:126
                0ms start(0ms) fetchChunks Execute.cpp:2975
                0ms start(0ms) create QueryExecutionContext.cpp:94
                95ms start(0ms) executePlan Execute.cpp:3344
                  95ms start(0ms) launchCpuCode QueryExecutionContext.cpp:565
                  0ms start(95ms) getRowSet QueryExecutionContext.cpp:192
              End thread(12)
              New thread(18)
                0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:126
                0ms start(0ms) fetchChunks Execute.cpp:2975
                0ms start(0ms) create QueryExecutionContext.cpp:94
                95ms start(0ms) executePlan Execute.cpp:3344
                  95ms start(0ms) launchCpuCode QueryExecutionContext.cpp:565
                  0ms start(95ms) getRowSet QueryExecutionContext.cpp:192
              End thread(18)
          40ms start(11063ms) executePlan Execute.cpp:3344
            40ms start(11063ms) launchCpuCode QueryExecutionContext.cpp:565
                New thread(17)
                  0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:126
                  0ms start(0ms) fetchChunks Execute.cpp:2975
                  0ms start(0ms) create QueryExecutionContext.cpp:94
                  95ms start(0ms) executePlan Execute.cpp:3344
                    95ms start(0ms) launchCpuCode QueryExecutionContext.cpp:565
                    0ms start(95ms) getRowSet QueryExecutionContext.cpp:192
                End thread(17)
                New thread(13)
                  0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:126
                  0ms start(0ms) fetchChunks Execute.cpp:2975
                  0ms start(0ms) create QueryExecutionContext.cpp:94
                  95ms start(0ms) executePlan Execute.cpp:3344
                    95ms start(0ms) launchCpuCode QueryExecutionContext.cpp:565
                    0ms start(95ms) getRowSet QueryExecutionContext.cpp:192
                End thread(13)
                New thread(7)
                  0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:126
                  0ms start(0ms) fetchChunks Execute.cpp:2975
                  0ms start(0ms) create QueryExecutionContext.cpp:94
                  75ms start(0ms) executePlan Execute.cpp:3344
                    74ms start(0ms) launchCpuCode QueryExecutionContext.cpp:565
                    0ms start(75ms) getRowSet QueryExecutionContext.cpp:192
                End thread(7)
                New thread(15)
                  0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:126
                  0ms start(0ms) fetchChunks Execute.cpp:2975
                  0ms start(0ms) create QueryExecutionContext.cpp:94
                  95ms start(0ms) executePlan Execute.cpp:3344
                    95ms start(0ms) launchCpuCode QueryExecutionContext.cpp:565
                    0ms start(95ms) getRowSet QueryExecutionContext.cpp:192
                End thread(15)
                New thread(19)
                  0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:126
                  0ms start(0ms) fetchChunks Execute.cpp:2975
                  0ms start(0ms) create QueryExecutionContext.cpp:94
                  83ms start(0ms) executePlan Execute.cpp:3344
                    83ms start(0ms) launchCpuCode QueryExecutionContext.cpp:565
                    0ms start(83ms) getRowSet QueryExecutionContext.cpp:192
                End thread(19)
                New thread(14)
                  0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:126
                  0ms start(0ms) fetchChunks Execute.cpp:2975
                  0ms start(0ms) create QueryExecutionContext.cpp:94
                  95ms start(0ms) executePlan Execute.cpp:3344
                    95ms start(0ms) launchCpuCode QueryExecutionContext.cpp:565
                    0ms start(95ms) getRowSet QueryExecutionContext.cpp:192
                End thread(14)
                New thread(10)
                  0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:126
                  0ms start(0ms) fetchChunks Execute.cpp:2975
                  0ms start(0ms) create QueryExecutionContext.cpp:94
                  75ms start(0ms) executePlan Execute.cpp:3344
                    75ms start(0ms) launchCpuCode QueryExecutionContext.cpp:565
                    0ms start(76ms) getRowSet QueryExecutionContext.cpp:192
                End thread(10)
            0ms start(11103ms) getRowSet QueryExecutionContext.cpp:192
          0ms start(11158ms) collectAllDeviceResults Execute.cpp:2612
            0ms start(11158ms) reduceMultiDeviceResults Execute.cpp:1164
              0ms start(11158ms) reduceMultiDeviceResultSets Execute.cpp:1231
                0ms start(11159ms) finalizeAggregates ResultSet.cpp:1213
          0ms start(11159ms) put ResultSetRegistry.cpp:102
Projected table time: 11.159830570220947
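
The second timer dump can be summarized mechanically. Below is a minimal sketch that sums durations per timer name from an excerpt of that dump; the regular expressions are inferred from the lines shown above and are an assumption about the format, and `summarize` is a hypothetical helper, not an HDK tool.

```python
import re
from collections import defaultdict

# Excerpt of the second DEBUG_TIMER dump (pre-execution part only).
LOG_EXCERPT = """\
11159ms total duration for executeRelAlgQuery
  11159ms start(0ms) executeRelAlgQueryNoRetry RelAlgExecutor.cpp:218
    11042ms start(0ms) Query pre-execution steps RelAlgExecutor.cpp:219
      1609ms start(0ms) synthesizeMetadata ResultSetMetadata.cpp:74
      1041ms start(1610ms) synthesizeMetadata ResultSetMetadata.cpp:74
      1106ms start(2652ms) synthesizeMetadata ResultSetMetadata.cpp:74
      1163ms start(3758ms) synthesizeMetadata ResultSetMetadata.cpp:74
      879ms start(4922ms) synthesizeMetadata ResultSetMetadata.cpp:74
      1249ms start(5802ms) synthesizeMetadata ResultSetMetadata.cpp:74
      1225ms start(7051ms) synthesizeMetadata ResultSetMetadata.cpp:74
      1135ms start(8276ms) synthesizeMetadata ResultSetMetadata.cpp:74
      914ms start(9412ms) synthesizeMetadata ResultSetMetadata.cpp:74
      716ms start(10326ms) synthesizeMetadata ResultSetMetadata.cpp:74
    116ms start(11042ms) execute RelAlgExecutor.cpp:386
"""

# "<duration>ms start(<offset>ms) <timer name> <file>:<line>"
ENTRY = re.compile(r"^\s*(\d+)ms start\(\d+ms\) (.+?) \S+:\d+$")
TOTAL = re.compile(r"^\s*(\d+)ms total duration")


def summarize(log_text):
    """Return (overall_ms, {timer name: summed duration in ms})."""
    totals = defaultdict(int)
    overall = None
    for line in log_text.splitlines():
        match = TOTAL.match(line)
        if match:
            overall = int(match.group(1))
            continue
        match = ENTRY.match(line)
        if match:
            totals[match.group(2)] += int(match.group(1))
    return overall, dict(totals)


overall, totals = summarize(LOG_EXCERPT)
meta = totals["synthesizeMetadata"]
print(f"synthesizeMetadata: {meta} ms of {overall} ms ({meta / overall:.1%})")
# prints "synthesizeMetadata: 11037 ms of 11159 ms (98.9%)"
```

Summing by name is meaningful only for sequential entries at one nesting level, as the ten synthesizeMetadata calls are here. Nested entries (executePlan and its child launchCpuCode) and entries from parallel threads would be double-counted.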
