performance #8

Open
daroczig opened this issue Nov 11, 2015 · 3 comments

Comments

@daroczig

Fetching more than 100 rows seems rather slow compared to RJDBC (with the same jar files) or to exporting the data to CSV via impala-shell, so I suspect there might be some performance issues when converting the results returned by the JDBC driver to R. See some related benchmarks at http://datascience.la/r-and-impala-its-better-to-kiss-than-using-java

@austincv
Contributor

Thanks for letting me know. Yes, I agree with you. I think a huge amount of time is spent converting the JDBC result set to R data types. I assume that adding an option to the function to return the raw JDBC result as is, and letting the user decide how to use it, would give data fetch times similar to RJDBC. In our use case we didn't really need to transfer huge data sets back to R, which is probably why we overlooked this. I am a bit tied up right now. I will look into it and get back to you late next week.
Brilliant blog post by the way :)
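
For illustration only, here is a minimal Python sketch of the kind of row-to-column conversion discussed above. It is not the code of this package or of RJDBC; `rows_to_columns` and the sample data are hypothetical. The assumption is that a driver hands back rows one at a time while a data frame wants one vector per column, so each column is allocated once at its final length and filled in place instead of growing the result row by row.

```python
from typing import Any, Dict, List, Sequence


def rows_to_columns(names: Sequence[str],
                    rows: Sequence[Sequence[Any]]) -> Dict[str, List[Any]]:
    """Pivot row-oriented results into one preallocated list per column.

    `names` holds the column names and `rows` the fetched rows; both are
    hypothetical stand-ins for what a database driver would return.
    """
    n = len(rows)
    # Allocate every column once at its final length.
    columns: Dict[str, List[Any]] = {name: [None] * n for name in names}
    for i, row in enumerate(rows):
        # Write each value into its column slot; nothing is re-copied.
        for name, value in zip(names, row):
            columns[name][i] = value
    return columns


if __name__ == "__main__":
    sample_rows = [(1, "a"), (2, "b")]
    print(rows_to_columns(["id", "name"], sample_rows))
```

Whether this matches the actual bottleneck would need profiling; the sketch only shows why a single column-wise pass tends to be cheaper than appending to a growing structure for every row.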

@daroczig
Author

Thank you very much, @austincv, and keep up the good work!

@ankurmitujjain

Hello Team,

Any update on this?

Or please point me in the right direction, and I can submit a pull request for this.

Thanks
Ankur

3 participants