This repository has been archived by the owner on Nov 8, 2020. It is now read-only.

austincv/RImpala

 
 


#RImpala

RImpala is an R package that lets you connect to Cloudera Impala and execute distributed queries from R. Impala supports JDBC integration, and RImpala uses this feature to establish the connection between R and Impala.

##Installing RImpala

To use this package you must also have access to a Hadoop cluster running Cloudera Impala with at least one populated table defined in the Hive Metastore.

###Install JDBC jars for RImpala

  • Download the Impala JDBC zip file to the client machine that you will use to connect to Impala servers.
  • Extract the contents of the zip file to a location of your choosing. For example:
    • On Linux, you might extract this to a location such as /opt/jars/.
    • On Windows, you might extract this to a folder such as C:\Program Files\impala-jars.
  • Note this location; it is passed to rimpala.init() when you load the package.
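Before handing the directory to rimpala.init(), it can help to confirm that the extracted jars are actually there. The following is a minimal sketch in base R; `find_jdbc_jars` is a hypothetical helper written for this illustration and is not part of RImpala.

```r
# Hypothetical helper (not part of RImpala): list the .jar files in a
# directory and fail early with a clear message if there are none.
find_jdbc_jars <- function(jar_dir) {
  if (!dir.exists(jar_dir)) {
    stop("JDBC jar directory does not exist: ", jar_dir)
  }
  jars <- list.files(jar_dir, pattern = "\\.jar$", full.names = TRUE)
  if (length(jars) == 0) {
    stop("No .jar files found in: ", jar_dir)
  }
  jars
}

# Example: /opt/jars/ is the Linux location suggested above.
# The check only runs if that directory exists on this machine.
jar_dir <- "/opt/jars/"
if (dir.exists(jar_dir)) {
  print(basename(find_jdbc_jars(jar_dir)))
}
```

If the helper stops with an error, re-check where the zip file was extracted before calling rimpala.init().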

###Install RImpala

  1. Compressed package: R CMD INSTALL RImpala-0.1.6.tar.gz

  2. Source code: R CMD INSTALL ./RImpala

##Loading RImpala and connecting to Impala

  1. Find the IP address of the machine and the port where the Impala service is running.

  2. Find the location where you unzipped the JDBC jars in the section above.

  3. Launch R.

  4. Load the package, point it at the JDBC jars, and run a query:

     library("RImpala")
     rimpala.init(libs="/path/to/JDBC/jars/")
     result = rimpala.query("your query")

By default, rimpala.init() searches "/usr/lib/impala" for the JDBC jars.
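The steps above can be sketched as a single helper. This is an illustration under stated assumptions, not the package's documented workflow: `run_impala_query` is a hypothetical wrapper, and `rimpala.connect()` and `rimpala.close()`, along with their `IP` and `port` arguments, do not appear in this README and are assumed to be provided by the package. Check the package's help pages before relying on them. Port 21050 is Impala's usual JDBC port; the host is whatever you found in step one.

```r
# Hypothetical wrapper around the calls shown in this README.
run_impala_query <- function(query, jar_dir, host, port = "21050") {
  if (!requireNamespace("RImpala", quietly = TRUE)) {
    stop("The RImpala package is not installed.")
  }
  library("RImpala")

  # From this README: load the JDBC jars from the chosen directory.
  rimpala.init(libs = jar_dir)

  # ASSUMPTION: rimpala.connect(IP, port) opens the connection.
  rimpala.connect(IP = host, port = port)

  # ASSUMPTION: rimpala.close() releases the connection when we are done.
  on.exit(rimpala.close(), add = TRUE)

  # From this README: run the query and return its result.
  rimpala.query(query)
}

# Usage (placeholders; requires a reachable Impala service):
# result <- run_impala_query("your query",
#                            jar_dir = "/path/to/JDBC/jars/",
#                            host = "impala-host")
```

Wrapping the calls in one function keeps the jar path, host, and port together, and `on.exit` makes sure the connection is closed even if the query fails.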

Here are links to more information on Cloudera Impala:

##Requirements
