Spark SQL Server

A Spark driver application with an embedded Hive Thrift server that allows Parquet files to be queried as tables via a JDBC SQL endpoint.

  • Runs as a standalone local Spark application, as a Spark driver application, or can be submitted to a Spark cluster.
  • Exposes a JDBC SQL endpoint using an embedded Hive Thrift server.
  • Registers Parquet files as tables to allow SQL querying by non-Spark applications over JDBC.
  • Provides a simple HTTP REST interface for table registration, de-registration, and listing of registered tables.

Motivation: To allow non-Spark applications fluent in SQL to query Parquet-backed tables in Spark over JDBC with the minimum of fuss.

Build

$ git clone https://github.com/nwrs/spark-sql-server.git
$ cd spark-sql-server
$ mvn clean install

Command Line Options

$ java -jar spark-sql-server-1.0-SNAPSHOT-packaged.jar --help

Custom Spark Hive SQL Server [github.com/nwrs/spark-sql-server]

Usage:

  -n, --appName  <app_name>                        Spark application name.
  -j, --jdbcPort  <n>                              Hive JDBC endpoint port, defaults to 10000.
  -m, --master  <spark://host:port>                Spark master, defaults to 'local[*]'.
  -p, --restPort  <n>                              Rest port, defaults to 8181.
  -k, --sparkOpts  <opt=value,opt=value,...>       Additional Spark/Hive options.
  -t, --tableConfig  </path/to/TableConfig.conf>   Table configuration file.
  -h, --help                                       Show help message

Running as Standalone Spark Driver Application

$ java -jar spark-sql-server-1.0-SNAPSHOT-packaged.jar \
    --appName ParquetSQLServer \
    --master spark://spark-master-server:7077 \
    --jdbcPort 10001 \
    --restPort 8181 \
    --sparkOpts spark.executor.memory=4g,spark.default.parallelism=50 \
    --tableConfig /Users/nwrs/parquet-tables.config

Example Table Config

$ cat parquet-tables.config

#Example table config file
tweets=hdfs://localhost:9000/tweets/tweets.parquet
users=hdfs://localhost:9000/users/users.parquet 

REST Interface

Verb    Path                   Action                        Request Body  Response Body  Success Code
GET     /api/v1/tables         List all registered tables    None          JSON          200
GET     /api/v1/table/{table}  Get registration for a table  None          JSON          200
DELETE  /api/v1/table/{table}  De-register (drop) a table    None          None          202
POST    /api/v1/table          Register a table              JSON          None          202

Example JSON to register a table via POST:

{
    "table": "my_table_name",
    "file": "hdfs://hadoop-server:9000/data/tweet/tweets.parquet"
}
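
For example, with the server running locally on the default REST port (8181), a table can be registered and the registered tables then listed using curl. The table name and HDFS path below are illustrative, borrowed from the example table config above:

$ curl -X POST http://localhost:8181/api/v1/table \
    -H "Content-Type: application/json" \
    -d '{"table":"tweets","file":"hdfs://localhost:9000/tweets/tweets.parquet"}'

$ curl http://localhost:8181/api/v1/tables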

Connecting to the JDBC Endpoint

  • The default JDBC endpoint connection string is: "jdbc:hive2://localhost:10000/default"
  • Driver class name: org.apache.hive.jdbc.HiveDriver
  • Use at least version 2.1.0 of the Hive driver jar file, available here.
  • In some circumstances hadoop-common-xxx.jar may be required on the class path; choose the correct version to match your HDFS installation here.

See here for more information on connecting to Hive via the JDBC driver.
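
As a quick sanity check, the endpoint can also be exercised from the command line with Hive's beeline client. A minimal sketch, assuming the server is running locally on the default port 10000 and that a table named tweets has been registered as in the example config above:

$ beeline -u "jdbc:hive2://localhost:10000/default" -e "SHOW TABLES"
$ beeline -u "jdbc:hive2://localhost:10000/default" -e "SELECT COUNT(*) FROM tweets"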
