Java py4j gateway server for Trino queries stays open after the query finishes and later makes queries hang #3223
Labels: BUG, good first issue
Is there an existing issue for this?
Description
The default connector for Trino is JDBC because, as far as I know from the SQLAlchemy community, there is no SQLAlchemy support for it.
When running Hue with multiple clients, the JDBC class calls the py4j gateway server, which keeps running instead of closing after we get the results. This increases memory usage, so queries get slower as time passes and eventually hang. For example, when executing a Trino query, the code flow goes like this:
I added various debugging points to find where the bottleneck is. It is usually this line:
data = curs.fetchmany(n)
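A minimal sketch of the kind of timing instrumentation described above, assuming a standard Python DB-API cursor that supports fetchmany(). The helper `fetch_in_batches` is hypothetical, and `sqlite3` only stands in for the Trino JDBC cursor so the example runs without a cluster:

```python
import sqlite3
import time


def fetch_in_batches(cursor, batch_size):
    """Drain a DB-API cursor with fetchmany(), timing each call.

    Returns (rows, timings), where timings[i] is the wall-clock seconds spent
    inside the i-th fetchmany() call, including the final empty one.
    """
    rows = []
    timings = []
    while True:
        start = time.perf_counter()
        batch = cursor.fetchmany(batch_size)
        timings.append(time.perf_counter() - start)
        if not batch:  # an empty list means the result set is exhausted
            break
        rows.extend(batch)
    return rows, timings


if __name__ == "__main__":
    # sqlite3 is a stand-in for the Trino JDBC cursor used by Hue.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10)])
    cur = conn.cursor()
    cur.execute("SELECT x FROM t ORDER BY x")
    rows, timings = fetch_in_batches(cur, 4)
    for i, seconds in enumerate(timings):
        print(f"fetchmany call {i}: {seconds * 1000:.3f} ms")
    conn.close()
```

If the per-call times grow as more clients run queries, the delay is inside fetchmany() itself rather than in the code around it.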
The issue should be solved by adding the following lines to the close() function of the JDBC class in this file:
desktop/libs/librdbms/src/librdbms/jdbc.py
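This is not the reporter's patch, only a hedged sketch of the idea, assuming the class stores its py4j gateway in `self.gateway` and its JDBC connection in `self.conn`: close() releases both. The `Jdbc` class here is a simplified, hypothetical stand-in, and `FakeGateway`/`FakeConnection` replace the py4j objects so the logic runs without a JVM. On a real py4j `JavaGateway`, `shutdown()` is the method that shuts the gateway down; whether that alone ends the launched JVM process may depend on how the gateway was started.

```python
class FakeConnection:
    """Stands in for a JDBC connection obtained through py4j."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class FakeGateway:
    """Stands in for py4j's JavaGateway; only shutdown() is modelled."""

    def __init__(self):
        self.shut_down = False

    def shutdown(self):
        self.shut_down = True


class Jdbc:
    """Hypothetical, simplified stand-in for the JDBC class in librdbms/jdbc.py."""

    def __init__(self, gateway):
        self.gateway = gateway  # assumption: the gateway is kept on the instance
        self.conn = None        # set once a connection has been opened

    def close(self):
        # Close the JDBC connection, if one is open.
        if self.conn is not None:
            self.conn.close()
            self.conn = None
        # Assumed fix: also shut down the py4j gateway so it does not
        # outlive the query and keep holding memory.
        if self.gateway is not None:
            self.gateway.shutdown()
            self.gateway = None


if __name__ == "__main__":
    gateway, conn = FakeGateway(), FakeConnection()
    db = Jdbc(gateway)
    db.conn = conn
    db.close()
    db.close()  # second call is a no-op
    print(conn.closed, gateway.shut_down)  # True True
```

Setting each resource to None after releasing it makes close() safe to call more than once, which helps when several code paths clean up the same connection.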
After adding these lines, the py4j child process gets killed. I verified this with the ps aux and pstree commands.
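If you hold a `subprocess.Popen` handle for the gateway's JVM (an assumption; how you obtain it depends on how the gateway was launched), the sketch below shows one way to terminate the child and confirm it has exited, a programmatic counterpart to checking with ps and pstree. `stop_child` is a hypothetical helper, and a sleeping Python process stands in for the JVM:

```python
import subprocess
import sys


def stop_child(proc: subprocess.Popen, timeout: float = 5.0) -> int:
    """Terminate a child process and reap it so it does not linger.

    terminate() is tried first; if the child is still running after
    `timeout` seconds, kill() is used. Returns the child's exit code.
    """
    if proc.poll() is None:  # still running
        proc.terminate()
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
    return proc.returncode


if __name__ == "__main__":
    # Stand-in for the gateway JVM: a process that would otherwise sleep for a minute.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    code = stop_child(child)
    # poll() returning a value means the process has exited and been reaped.
    print("exited:", child.poll() is not None, "return code:", code)
```

Waiting after terminate() matters: without it, the exited child can remain as a zombie entry in ps output until the parent reaps it.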
This could be a good first issue for someone to pick up and raise a PR for. I'm working on Hadoop these days, so I don't have the bandwidth to raise one here; I'm just trying to contribute to OSS. The same flow also applies to Presto.
@Harshg999 @bjornalm
Regards
Vinay Devadiga
Steps To Reproduce
As stated in the description, use Trino with Hue, create multiple Hue clients, and fire large Trino queries. After some time, the py4j servers take up the memory and the queries hang.
Logs
Attached above.
Hue version
Open Source 4.10