EEL DSL for a CLI shell #211

Open
hannesmiller opened this issue Jan 12, 2017 · 2 comments

@hannesmiller
Contributor

hannesmiller commented Jan 12, 2017

EEL DSL for a CLI shell

  • A Scala DSL for EEL commands.
  • The Scala REPL serves as the interactive shell.
  • Scala variables, loops and conditional statements can be combined with DSL commands to script tasks easily.
  • A bootstrap shell script called eel-shell, similar to spark-shell, automatically imports the EEL DSL packages.
  • It also automatically imports OS packages so that OS commands can be run and combined with the DSL.
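
As a rough illustration of mixing Scala control flow and OS commands with DSL commands, here is a minimal sketch. It assumes the "OS packages" would be something like the standard `scala.sys.process` package; the object name, the table names and the `staging` database are hypothetical, and the DSL commands are only built as strings because no DSL implementation exists yet.

```scala
import scala.sys.process._

object ShellScriptingSketch {
  // Hypothetical table names; a real session might read these from a file or a query.
  val tables: Seq[String] = Seq("customers", "orders")

  // Builds one textual command per table, following the syntax proposed in this issue.
  // The jdbc options are left as the same "blah" placeholders used in the proposal.
  def importCommand(table: String): String =
    s"import from jdbc with driver=blah,url=blah,sql=blah to hive with db=staging,table=$table"

  // Runs an OS command and returns its standard output, trimmed.
  // `!!` comes from scala.sys.process and throws if the exit code is non-zero.
  def run(cmd: Seq[String]): String = cmd.!!.trim

  def main(args: Array[String]): Unit = {
    println(run(Seq("echo", "starting import")))
    // An ordinary Scala loop drives the (not yet existing) DSL.
    tables.map(importCommand).foreach(println)
  }
}
```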

Import

Import data with options from an EEL source to a sink

import from jdbc
with driver=blah,url=blah,sql=blah
to hive
with db=blah,table=blah

import from jdbc
with driver=blah,url=blah,sql=blah
to Parquet
with path=blah
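
One way the import command might look as an embedded Scala DSL is sketched below. Because `import` and `with` are reserved words in Scala, the sketch uses the hypothetical names `importFrom` and `options` instead; `Endpoint`, `Transfer` and the builder classes are likewise assumptions for illustration, not EEL types.

```scala
object ImportDslSketch {
  // An endpoint is a source or sink kind (e.g. "jdbc", "hive") plus its options.
  final case class Endpoint(kind: String, options: Map[String, String] = Map.empty)
  final case class Transfer(source: Endpoint, sink: Endpoint)

  // Builder state after `importFrom(...)`: options apply to the source.
  final class FromStep(source: Endpoint) {
    def options(kv: (String, String)*): FromStep =
      new FromStep(source.copy(options = source.options ++ kv))
    def to(kind: String): ToStep = new ToStep(source, Endpoint(kind))
  }

  // Builder state after `to(...)`: options now apply to the sink.
  final class ToStep(source: Endpoint, sink: Endpoint) {
    def options(kv: (String, String)*): ToStep =
      new ToStep(source, sink.copy(options = sink.options ++ kv))
    def build: Transfer = Transfer(source, sink)
  }

  // Entry point; stands in for the `import from <kind>` keyword pair.
  def importFrom(kind: String): FromStep = new FromStep(Endpoint(kind))

  // Mirrors the first command above: jdbc source, hive sink.
  val example: Transfer =
    importFrom("jdbc")
      .options("driver" -> "blah", "url" -> "blah", "sql" -> "blah")
      .to("hive")
      .options("db" -> "blah", "table" -> "blah")
      .build
}
```

Splitting the builder into two step classes means the compiler rejects a command that names a sink before a source, which a single mutable builder would not catch.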

Export

Export data with options to an EEL sink from a source

export to hive
with db=blah,table=blah
from jdbc
with driver=blah,url=blah,sql=blah

export to hive
with db=blah,table=blah
from Parquet
with path=blah
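
If the commands are accepted as text, each `with` clause has to be turned into key/value options. A minimal sketch of such a parser follows; `parseOptions` is a hypothetical helper, and it assumes values never contain a comma, since the proposal does not define any quoting or escaping.

```scala
object OptionClauseSketch {
  // Parses "k1=v1,k2=v2" into a Map.
  // Each pair is split on the FIRST '=' only, so values that themselves
  // contain '=' (as JDBC URLs often do) are kept intact.
  // Assumption: values never contain ','; no escaping rules are defined.
  def parseOptions(clause: String): Map[String, String] =
    clause.split(",").iterator
      .map(_.trim)
      .filter(_.nonEmpty)
      .map { pair =>
        pair.split("=", 2) match {
          case Array(k, v) => k.trim -> v.trim
          case _ =>
            throw new IllegalArgumentException(s"Expected key=value but got '$pair'")
        }
      }
      .toMap
}
```

A SQL option such as `sql=select a,b from t` would break under this comma assumption, so a real implementation would probably need quoting for that option.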

  • Note: allow a transform sub-command at the appropriate place for custom transformations on the underlying rows in the frame.
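
A transform sub-command could be modelled as a plain function from row to row, applied between source and sink. The sketch below uses a hypothetical `Row` alias (column name to value) rather than EEL's real row type, and `transfer` is an illustrative stand-in for the import/export pipeline.

```scala
object TransformSketch {
  // Hypothetical row model: column name -> value. EEL's actual row type will differ.
  type Row = Map[String, Any]

  // A transform is just a function, so transforms compose with `andThen`.
  type Transform = Row => Row

  // Upper-cases the "name" column when it is present and holds a String.
  val upperCaseName: Transform = row =>
    row.get("name") match {
      case Some(s: String) => row.updated("name", s.toUpperCase)
      case _               => row
    }

  // Adds a constant "source" column to every row.
  val addSource: Transform = row => row + ("source" -> "jdbc")

  // Applies the transform lazily to each row on its way from source to sink.
  def transfer(rows: Iterator[Row], transform: Transform): Iterator[Row] =
    rows.map(transform)
}
```

Usage would look like `transfer(rows, upperCaseName andThen addSource)`, which keeps the transform step optional: passing `identity` leaves rows untouched.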

More commands to follow...

@hannesmiller
Contributor Author

DDL

Display the DDL (a create table command) for a JDBC query:

ddl from JDBC
with driver=blah,
url=jdbc:blah,
sql=blah,
dialect=Parquet,
location=blah,
partitions=p1:string,p2:int

  • partitions are specified using the notation name:type
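
To make the name:type notation concrete, here is a minimal sketch that parses a partitions option and renders a Hive-style create table statement. The `Column` type, both function names and the exact shape of the rendered DDL are assumptions; the issue does not say what output the ddl command should produce, and in practice the non-partition columns would be derived from the JDBC query's result metadata.

```scala
object DdlSketch {
  final case class Column(name: String, dataType: String)

  // Parses "p1:string,p2:int" into columns, following the name:type notation.
  def parsePartitions(spec: String): Seq[Column] =
    spec.split(",").toSeq.map(_.trim).filter(_.nonEmpty).map { part =>
      part.split(":", 2) match {
        case Array(n, t) if n.trim.nonEmpty && t.trim.nonEmpty => Column(n.trim, t.trim)
        case _ =>
          throw new IllegalArgumentException(s"Expected name:type but got '$part'")
      }
    }

  // Renders Hive-style DDL. `format` stands in for the dialect option (e.g. PARQUET).
  def createTable(
      table: String,
      columns: Seq[Column],
      partitions: Seq[Column],
      format: String,
      location: String
  ): String = {
    def cols(cs: Seq[Column]): String =
      cs.map(c => s"${c.name} ${c.dataType}").mkString(", ")
    val partitionClause =
      if (partitions.isEmpty) "" else s"\nPARTITIONED BY (${cols(partitions)})"
    s"CREATE EXTERNAL TABLE $table (${cols(columns)})$partitionClause" +
      s"\nSTORED AS $format\nLOCATION '$location'"
  }
}
```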

@hannesmiller
Contributor Author

File Compaction

Compact several small files in a folder or Hive table into a single file.

Note: for Parquet and similar formats, the new file's schema should be the union of all the input files' schemas.

In addition, you may have to add columns to the Hive Metastore.

  • compact Parquet path
  • compact Orc path
  • compact Hive dbName table
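
The schema union mentioned above could work roughly as in this sketch. `Field` and `unionSchemas` are hypothetical names, and treating a type mismatch as an error is an assumption; a real compaction would need an explicit type promotion policy.

```scala
object SchemaUnionSketch {
  final case class Field(name: String, dataType: String)

  // Unions the schemas of all input files, keeping first-seen field order.
  // A field that appears with two different types is reported as a conflict
  // rather than silently picking one.
  def unionSchemas(schemas: Seq[Seq[Field]]): Either[String, Seq[Field]] = {
    val all = schemas.flatten
    val conflicts = all.groupBy(_.name).collect {
      case (name, fields) if fields.map(_.dataType).distinct.size > 1 => name
    }
    if (conflicts.nonEmpty)
      Left(s"Conflicting types for: ${conflicts.toSeq.sorted.mkString(", ")}")
    else
      Right(all.distinct) // no conflicts, so distinct fields == distinct names
  }
}
```

Any field in the result that is missing from the table's current definition is a candidate for the extra Hive Metastore columns noted above.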

@sksamuel sksamuel self-assigned this Feb 8, 2017
@sksamuel sksamuel added this to the 1.3 milestone Feb 8, 2017
@sksamuel sksamuel modified the milestones: 1.3, 1.5 Oct 13, 2017
@garyfrost garyfrost modified the milestones: 1.5, 1.4 Feb 5, 2018