This repository has been archived by the owner on May 9, 2024. It is now read-only.

Joins fail in heterogeneous mode #577

Open
kurapov-peter opened this issue Jul 7, 2023 · 0 comments

kurapov-peter commented Jul 7, 2023

To reproduce the join failure with enable_heterogeneous=True, run the following Python script:

import pyhdk 

config = pyhdk.buildConfig(enable_heterogeneous=True,
                           force_heterogeneous_distribution=False)
pyhdk.initLogger(log_severity="DEBUG2")
storage = pyhdk.storage.ArrowStorage(1)
data_mgr = pyhdk.storage.DataMgr(config)
data_mgr.registerDataProvider(storage)
calcite = pyhdk.sql.Calcite(storage, config)
executor = pyhdk.Executor(data_mgr, config)

table_1_name = "taxi"
table_2_name = "numbers"
# Assuming you are in hdk/examples/
storage.importCsvFile("../omniscidb/Tests/ArrowStorageDataFiles/taxi_sample_header.csv", table_1_name, pyhdk.storage.TableOptions(5))
storage.importCsvFile("../omniscidb/Tests/ArrowStorageDataFiles/numbers_header.csv", table_2_name, pyhdk.storage.TableOptions(2))

# Perfect hash table OneToOne
sql = f"SELECT * FROM {table_1_name} a JOIN {table_1_name} b ON a.trip_id = b.trip_id"
# sql = f"SELECT * FROM {table_1_name} a JOIN {table_2_name} b ON a.trip_id = b.col1"

ra = calcite.process(sql)
rel_alg_executor = pyhdk.sql.RelAlgExecutor(executor, storage, data_mgr, ra)
res = rel_alg_executor.execute().to_arrow()

A simple hash join on a primary key, executed via a perfect hash table, crashes. The gdb output is not very informative about the crash location, but the failure seems to be nullptr-related:

Thread 1 "python3" received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737352709952) at ./nptl/pthread_kill.c:44
...
#8  0x00007fff5ce5f0ae in VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*) ()
   from .../anaconda3/envs/omnisci-dev/lib/jvm/lib/server/libjvm.so
#9  0x00007fff5cd06d69 in JVM_handle_linux_signal ()
   from .../anaconda3/envs/omnisci-dev/lib/jvm/lib/server/libjvm.so
#10 <signal handler called>
#11 0x00007ffff7e3e354 in ?? ()
#12 0x00007ffff7e3e2a0 in ?? ()
#13 0x0000555557cd67d0 in ?? ()
#14 0x0000555557be5060 in ?? ()
#15 0x0000000000000002 in ?? ()
#16 0x0000000000000001 in ?? ()
#17 0x0000000000000000 in ?? ()
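
For context on the "perfect hash table OneToOne" path that the query is expected to take, here is a minimal pure-Python sketch of the general technique. It is only an illustration and not HDK's implementation: the function names `build_perfect_hash` and `probe` are hypothetical, and it assumes non-empty integer keys that are unique on the build side.

```python
from typing import List, Optional, Tuple


def build_perfect_hash(keys: List[int]) -> Tuple[int, List[Optional[int]]]:
    """Build a one-to-one table mapping (key - min_key) to the build-side row index."""
    min_key, max_key = min(keys), max(keys)
    # One slot per possible key in [min_key, max_key]; None marks an empty slot.
    slots: List[Optional[int]] = [None] * (max_key - min_key + 1)
    for row, key in enumerate(keys):
        if slots[key - min_key] is not None:
            raise ValueError(f"duplicate key {key}: not a one-to-one join")
        slots[key - min_key] = row
    return min_key, slots


def probe(min_key: int, slots: List[Optional[int]], keys: List[int]) -> List[Tuple[int, int]]:
    """Return (probe_row, build_row) pairs for every probe key found in the table."""
    matches = []
    for row, key in enumerate(keys):
        offset = key - min_key
        # Keys outside the build-side range cannot match.
        if 0 <= offset < len(slots) and slots[offset] is not None:
            matches.append((row, slots[offset]))
    return matches
```

In the self-join from the script, both sides would be the trip_id column, so every probe row would match exactly one build row.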

The last lines of the DEBUG2 log show (shortened):

2023-07-12T08:08:41.525587 1 1573099 0 0 NvidiaKernel.cpp:154 Generated GPU binary code size: 469152 bytes
2023-07-12T08:08:41.526317 1 1573099 0 0 Execute.cpp:2881 Launching 1 kernels for query on: 
2023-07-12T08:08:41.526336 1 1573099 0 0 Execute.cpp:2883 	0 &CPU.
2023-07-12T08:08:41.526623 2 1573099 0 0 Execute.cpp:3476 bool(ra_exe_unit.union_all)=false ra_exe_unit.input_descs=(InputDescriptor(table_id(1),nest_level(0)) InputDescriptor(table_id(1),nest_level(1))) ra_exe_unit.input_col_descs=(InputColDescriptor(table_id=1, nest_level=0, col_id=1000) InputColDescriptor(table_id=1, nest_level=0, col_id=1001) ... MANY COL IDS ... 
ra_exe_unit.scan_limit=0 num_rows=((20 20)) frag_offsets=((0 0)) query_exe_context->query_buffers_->num_rows_=-1 query_exe_context->query_mem_desc_.getEntryCount()=1 device_id=0 outer_table_id=-1 scan_limit=-1 start_rowid=0 num_tables=2

The same error happens when joining with a different table (use the commented-out SQL statement).


Interestingly, in an IPython notebook the kernel sometimes crashes with the following last log lines instead:

2023-07-12T08:20:16.399125 W 1577588 0 0 Backend.cpp:833 Failed to generate PTX: NVVM IR ParseError: generatePTX: invalid redefinition of function 'pi'
declare double @pi();
               ^
. Switching to CPU execution target.
2023-07-12T08:20:16.399529 F 1577588 0 0 RelAlgExecutor.cpp:433 Check failed: co.device_type == ExecutorDeviceType::GPU 
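
Since the crash takes down the whole interpreter (or the notebook kernel), it may be convenient to run the reproducer in a child process and inspect how it terminated. The following is a minimal sketch using only the standard library; `run_isolated` and `describe_exit` are hypothetical helper names, and the negative-return-code convention assumed here applies to POSIX systems only.

```python
import signal
import subprocess
import sys


def run_isolated(code: str, timeout: float = 600.0) -> subprocess.CompletedProcess:
    """Run Python source in a child interpreter and capture its stdout and stderr."""
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def describe_exit(returncode: int) -> str:
    """Translate a POSIX return code into a short description."""
    if returncode < 0:
        # On POSIX, subprocess reports death by signal N as return code -N.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"
```

For example, passing the contents of the script above as `code` and printing `describe_exit(result.returncode)` together with the tail of `result.stderr` would keep the parent session alive while still recording the abort.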