
Intermittent SIGSEGV errors crashing HeavyDB #819

Open
anirudh-here-com opened this issue Dec 8, 2023 · 6 comments

@anirudh-here-com

Version: 6.4.0
While running some queries against HeavyDB, SIGSEGV errors occur randomly, causing the DB to crash and creating outages.
Is there any way to debug or fix this?
HeavyDB.cpp:332 Interrupt signal (11) received.

@cdessanti
Contributor

Could you please share the product logs? The logs can be found in the storage directory, typically located at /var/log/heavyai/storage/log. They are named heavydb.INFO.*

It is essential to check the logs to identify the problem. Is there a specific reason why you're using version 6.4 when versions 7.0 and 7.1 are available?

@anirudh-here-com
Author

I have done a detailed analysis and found the cause.
This happens when I do a select_ipc_gpu on the database and it returns 0 records.

@anirudh-here-com
Author

anirudh-here-com commented Dec 11, 2023

This can be easily replicated by using the heavyai library's select_ipc_gpu function:

import heavyai
conn = heavyai.connect(user=<user>, password=<pass>, dbname=<dbname>)
conn.select_ipc_gpu(<any select query which returns 0 rows>)
# raises thrift.transport.TTransport.TTransportException: TSocket read 0 bytes

The reason we're using 6.4 is that we have some custom patches for our use cases.

Is this fixed in the latest version, 7.1?
If so, we might migrate to the latest version.

Thanks,

@cdessanti
Contributor

Hi,

Thanks for reporting the issue. I will try to reproduce it on our end. If I am successful, I will create an internal case for our engineering team to fix the problem.

Can you try running your application without using GPU shared memory as a temporary solution?
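A minimal sketch of that workaround, assuming the heavyai connection also exposes the CPU-based select_ipc path and a standard execute cursor; the credentials, table, query, and fallback logic below are illustrative, not part of this report:

import heavyai

# Hypothetical connection parameters; substitute your own.
conn = heavyai.connect(user="admin", password="HyperInteractive",
                       host="localhost", dbname="heavyai")

# Hypothetical query that returns 0 rows, mirroring the reported crash case.
query = "SELECT * FROM my_table WHERE 1 = 0"

# Check the row count first and only take the GPU shared-memory path when
# there is something to transfer; empty results fall back to the CPU path.
count = list(conn.execute(f"SELECT COUNT(*) FROM ({query}) AS t"))[0][0]
if count > 0:
    df = conn.select_ipc_gpu(query)   # GPU shared memory (cudf DataFrame)
else:
    df = conn.select_ipc(query)       # CPU shared memory (pandas DataFrame)

The idea is simply to avoid calling select_ipc_gpu for an empty result set, which is the case that triggers the reported SIGSEGV.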

Also, I am interested in your modifications to the database to support your application. Could you please share what they do?

Best regards,
Candido

@anirudh-here-com
Author

Thanks for your reply.
Unfortunately, using GPU shared memory is required and cannot be dropped.
Regarding the modifications, I plan to raise a pull request for them.

Please let me know if you're able to replicate the issue on your end.
Thanks,
Anirudh

@cdessanti
Contributor

Hi,

Using CUDA 11.8 and the latest GA version (7.2.1), I was able to reproduce the issue on my end. I have created an internal ticket to get it resolved.

I'll come back here when the problem is fixed.
