gurunrao/spark-oracle

Spark_On_Oracle

  • Currently, data lakes comprising Oracle Data Warehouse and Apache Spark have these characteristics:
    • They have separate data catalogs, even if they access the same data in an object store.
    • Applications built entirely on Spark have to compensate for gaps in data management.
    • Applications that federate across Spark and Oracle usually suffer from inefficient data movement.
    • Operating Spark clusters is expensive because they lack administration tooling and have gaps in data management. As a result, the price-performance advantages of Spark are overstated.

[Figure: current deployments]

This project fixes those issues:

  • It provides a single catalog: Oracle Data Dictionary.
  • Oracle is responsible for data management, including:
    • Consistency
    • Isolation
    • Security
    • Storage layout
    • Data lifecycle
    • Data in an object store managed by Oracle as external tables
  • It supports the full Spark programming model.
  • Spark on Oracle has these characteristics:
    • Full pushdown on SQL workloads: Query, DML on all tables, DDL for external tables.
    • Pushdown of SQL operations in other workloads.
    • Surface Oracle capabilities like machine learning and streaming in the Spark programming model.
    • A co-processor on Oracle instances runs certain kinds of Scala code. Co-processors are isolated and limited, and are therefore easy to manage.
  • It enables simpler, smaller Spark clusters.
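
To make the data-movement point concrete, here is a minimal sketch of what pushdown buys. It does not use this library or Spark at all: an in-memory `sqlite3` database stands in for Oracle, and the `sales` table, its rows, and both function names are hypothetical. It only contrasts fetching every row and aggregating on the client with sending the filter and aggregate to the database.

```python
import sqlite3


def make_db():
    """Create an in-memory database that stands in for an Oracle table (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    rows = [("east", 10), ("west", 20), ("east", 30), ("north", 5)]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def total_without_pushdown(conn, region):
    """Fetch every row, then filter and aggregate on the client side."""
    fetched = conn.execute("SELECT region, amount FROM sales").fetchall()
    total = sum(amount for r, amount in fetched if r == region)
    return total, len(fetched)  # (result, rows moved to the client)


def total_with_pushdown(conn, region):
    """Send the filter and aggregate to the database; a single row comes back."""
    fetched = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE region = ?",
        (region,),
    ).fetchall()
    return fetched[0][0], len(fetched)  # (result, rows moved to the client)


conn = make_db()
no_push = total_without_pushdown(conn, "east")
push = total_with_pushdown(conn, "east")
print("without pushdown:", no_push)
print("with pushdown:   ", push)
```

Both calls return the same total, but the second moves one row instead of the whole table. The gap grows with table size, which is the inefficiency that federated applications run into.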

[Figure: Spark on Oracle]

Feature summary:

See Project Wiki for complete documentation.

Installation

Spark on Oracle can be deployed in any environment running Spark 3.1 or above. See the Quick Start Guide.
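
As a rough pre-flight check for the version requirement, the sketch below compares a Spark version string against the 3.1 minimum. The helper names `parse_version` and `meets_minimum` are hypothetical and not part of this project. The sketch assumes you get the version string yourself, for example from `spark.version` on an active session.

```python
import re


def parse_version(version):
    """Return the (major, minor) pair from a version string such as '3.1.2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version, minimum=(3, 1)):
    """True if the version is at least the minimum (Spark 3.1 by default)."""
    return parse_version(version) >= minimum


# Example version strings; in practice the value would come from your Spark session.
for candidate in ["3.0.3", "3.1.2", "3.2.0-SNAPSHOT"]:
    print(candidate, meets_minimum(candidate))
```

Tuples compare element by element, so `(3, 10)` correctly sorts after `(3, 2)`, which a plain string comparison would get wrong.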

Documentation

See the wiki.

Examples

The demo script walks you through the features of the library.

Help

Please file GitHub issues.

Contributing

This project welcomes contributions from the community. Before submitting a pull request, please review our contribution guide.

Security

Please consult the security guide for our responsible security vulnerability disclosure process.

License

Copyright (c) 2022 Oracle and/or its affiliates.

Released under the Universal Permissive License v1.0 as shown at https://oss.oracle.com/licenses/upl/.

About

On-the-fly translation of Spark programs to run natively on your Oracle DB. Your Spark programs require no changes.
