
Mu-Sigma/RImpala


RImpala

RImpala is an R package that lets you connect to Cloudera Impala and run distributed queries from R. Impala supports JDBC, and RImpala uses this JDBC interface to establish the connection between R and Impala.

Installing RImpala

To use this package, you need access to a Hadoop cluster running Cloudera Impala, with at least one populated table defined in the Hive Metastore.

Install RImpala

  1. Clone the repository
  2. The client machine needs the Impala JDBC jars, which are shipped as a zip file in the repository, to connect to Impala servers.
    • Extract the contents of the zip file to a location of your choosing. For example:
      • On Linux, you might extract this to a location such as /opt/jars/.
      • On Windows, you might extract this to a folder such as C:\Program Files\impala-jars.
    • Note this location; you will pass it to rimpala.init().
  3. Extract the package installer by decompressing RImpala_0.1.6.tar.gz from the install directory:
    • tar -xvf install/RImpala_0.1.6.tar.gz
  4. Install the package with the following command:
    • R CMD INSTALL ./RImpala
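Before calling rimpala.init(), it can help to confirm that the directory from step 2 really contains the extracted JDBC jars. The sketch below is a small pre-flight check in base R; check_impala_jars() is a hypothetical helper, not part of RImpala, and it only looks at the top level of the directory (it assumes the zip was extracted directly into it, not into a subfolder).

```r
# Hypothetical helper (NOT part of RImpala): check that a directory contains
# the extracted Impala JDBC .jar files before passing it to rimpala.init().
check_impala_jars <- function(jar_dir) {
  if (!dir.exists(jar_dir)) {
    stop("JDBC jar directory does not exist: ", jar_dir)
  }
  # List files ending in .jar (case-insensitive), top level only, full paths.
  jars <- list.files(jar_dir, pattern = "\\.jar$", full.names = TRUE,
                     ignore.case = TRUE)
  if (length(jars) == 0) {
    stop("No .jar files found in ", jar_dir,
         "; was the Impala JDBC zip extracted here?")
  }
  jars
}

# Example with the Linux location suggested above:
# jars <- check_impala_jars("/opt/jars/")
```

If the check passes, the same directory is what you would pass as libs to rimpala.init().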

Loading RImpala and connecting to Impala

  1. Find the IP address of the machine running the Impala service and the port it listens on.
  2. Find the location where you unzipped the JDBC jars in the previous section.
  3. Launch R.
  4. Load the package, point it at the JDBC jars, and run a query:
    • library("RImpala")
    • rimpala.init(libs="/path/to/JDBC/jars/")
    • result = rimpala.query("your query")
    • If libs is omitted, rimpala.init() searches /usr/lib/impala for the JDBC jars by default.
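The steps can be wrapped into a single function that initializes the package, connects, runs one query and closes the connection. This is a minimal sketch under stated assumptions: rimpala.connect(IP, port) and rimpala.close() are taken from the CRAN release of RImpala and are not part of the steps above, so check ?rimpala.connect in your installed version. The function name run_impala_query and the default port 21050 (Impala's usual JDBC port) are illustrative choices, not part of the package.

```r
# Sketch only. ASSUMPTION: rimpala.connect() and rimpala.close() exist with
# these arguments in your RImpala version (they come from the CRAN release,
# not from this README). run_impala_query() is a hypothetical wrapper.
run_impala_query <- function(query,
                             host = "localhost",
                             port = "21050",            # assumed Impala JDBC port
                             jar_dir = "/usr/lib/impala") {
  library("RImpala")
  rimpala.init(libs = jar_dir)              # load the Impala JDBC jars
  rimpala.connect(IP = host, port = port)   # assumed signature
  on.exit(rimpala.close(), add = TRUE)      # assumed: close the connection on exit
  rimpala.query(query)                      # result set returned by rimpala.query()
}
```

For example, with a placeholder hostname and the Linux jar location from the install steps: result <- run_impala_query("SHOW TABLES", host = "impala-host.example.com", jar_dir = "/opt/jars/"). Registering rimpala.close() with on.exit() means the connection is closed even if the query raises an error.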

Here are links to more information on Cloudera Impala:

Requirements
