diff --git a/.readme-partials.yaml b/.readme-partials.yaml
index 275ee9ae..2c506be2 100644
--- a/.readme-partials.yaml
+++ b/.readme-partials.yaml
@@ -1,51 +1,44 @@
 custom_content: |
   ## Requirements

-  ### Enable the PubSub Lite API
+  ### Creating a new subscription or using an existing subscription

-  Follow [these instructions](https://cloud.google.com/pubsub/lite/docs/quickstart#before-you-begin).
+  Follow [the instructions](https://cloud.google.com/pubsub/lite/docs/quickstart#create_a_lite_subscription) to create a new subscription or use an existing subscription. If using an existing subscription, the connector will read from the oldest unacknowledged message in the subscription.

-  ### Create a new subscription or use existing subscription
+  ### Creating a Google Cloud Dataproc cluster (Optional)

-  Follow [the instruction](https://cloud.google.com/pubsub/lite/docs/quickstart#create_a_lite_subscription) to create a new
-  subscription or use existing subscription. If using existing subscription, the connector will read message from the
-  oldest unacknowledged.
+  If you do not have an Apache Spark environment, you can create a [Cloud Dataproc](https://cloud.google.com/dataproc/docs) cluster with pre-configured auth. The following examples assume you are using Cloud Dataproc, but you can use `spark-submit` on any cluster.

-  ### Create a Google Cloud Dataproc cluster (Optional)
-
-  If you do not have an Apache Spark environment you can create a Cloud Dataproc cluster with pre-configured auth. The following examples assume you are using Cloud Dataproc, but you can use `spark-submit` on any cluster.
-
-  ```
-  MY_CLUSTER=...
-  gcloud dataproc clusters create "$MY_CLUSTER"
-  ```
+  ```
+  MY_CLUSTER=...
+  gcloud dataproc clusters create "$MY_CLUSTER"
+  ```

   ## Downloading and Using the Connector

-  The latest version connector of the connector (Scala 2.11) is publicly available in
-  gs://spark-lib/pubsublite/spark-pubsublite-latest.jar.
+  The latest version of the connector (Scala 2.11) will be publicly available in `gs://spark-lib/pubsublite/spark-pubsublite-latest.jar`.

-  The connector is also available from the Maven Central
-  repository. It can be used using the `--packages` option or the
-  `spark.jars.packages` configuration property. Use the following value
+  The connector will also be available from the Maven Central repository. It can be used with the `--packages` option or the `spark.jars.packages` configuration property.

-  | Scala version | Connector Artifact |
-  | --- | --- |
-  | Scala 2.11 | `com.google.cloud.pubsublite.spark:pubsublite-spark-sql-streaming-with-dependencies_2.11:0.1.0` |
+

-  ## Usage
+  ## Usage

-  ### Reading data from PubSub Lite
+  ### Reading data from Pub/Sub Lite

-  ```
+  ```python
   df = spark.readStream \
-    .option("pubsublite.subscription", "projects/123456789/locations/us-central1-a/subscriptions/test-spark-subscription")
-    .format("pubsublite") \
-    .load
+    .option("pubsublite.subscription", "projects/$PROJECT_NUMBER/locations/$LOCATION/subscriptions/$SUBSCRIPTION_ID")
+    .format("pubsublite") \
+    .load
   ```

   Note that the connector supports both MicroBatch Processing and [Continuous Processing](https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#continuous-processing).
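The hunk above tells users to run the examples with `spark-submit` or on Dataproc, but it does not show a full submit command. As a rough sketch only (not part of the diff), assuming the pre-built jar path mentioned above and a hypothetical driver script named `read_pubsublite.py` containing the `readStream` example, a Dataproc submission could look like:

```sh
# Sketch only: submit a PySpark driver to the Dataproc cluster created earlier,
# attaching the connector jar from the GCS path mentioned in the README.
# read_pubsublite.py is a hypothetical placeholder for the user's driver script.
MY_CLUSTER=...
gcloud dataproc jobs submit pyspark read_pubsublite.py \
  --cluster="$MY_CLUSTER" \
  --jars=gs://spark-lib/pubsublite/spark-pubsublite-latest.jar
```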
@@ -76,51 +69,31 @@ custom_content: |
   | publish_timestamp | TimestampType | |
   | event_timestamp | TimestampType | Nullable |

-  ## Compiling with the connector
-
-  To include the connector in your project:
-
-  ### Maven
-
-  ```xml
-  <dependency>
-    <groupId>com.google.cloud.pubsublite.spark</groupId>
-    <artifactId>pubsublite-spark-sql-streaming-with-dependencies_2.11</artifactId>
-    <version>0.1.0</version>
-  </dependency>
-  ```
-
-  ### SBT
-
-  ```sbt
-  libraryDependencies += "com.google.cloud.pubsublite.spark" %% "pubsublite-spark-sql-streaming-with-dependencies_2.11" % "0.1.0"
-  ```
-
   ## Building the Connector

-  The connector is built using Maven. Following command creates a jar with shaded dependencies:
+  The connector is built using Maven. The following command creates a JAR file with shaded dependencies:

-  ```
+  ```sh
   mvn package
   ```

-  ## FAQ
+  ## FAQ

-  ### What is the Pricing for the PubSub Lite?
+  ### What is the cost of Pub/Sub Lite?

-  See the [PubSub Lite pricing documentation](https://cloud.google.com/pubsub/lite/pricing).
+  See the [Pub/Sub Lite pricing documentation](https://cloud.google.com/pubsub/lite/pricing).

-  ### Can I configure the number of spark partitions?
+  ### Can I configure the number of Spark partitions?

-  No, the number of spark partitions is set to be the number of PubSub Lite partitions of the topic that the supplied subscription is for.
+  No, the number of Spark partitions is set to be the number of Pub/Sub Lite partitions of the topic that the subscription is attached to.

-  ### How do I authenticate outside GCE / Dataproc?
+  ### How do I authenticate outside Compute Engine / Cloud Dataproc?

-  Use a service account JSON key and `GOOGLE_APPLICATION_CREDENTIALS` as described [here](https://cloud.google.com/docs/authentication/getting-started).
+  Use a service account JSON key and `GOOGLE_APPLICATION_CREDENTIALS` as described [here](https://cloud.google.com/docs/authentication/getting-started).

-  Credentials can be provided with `gcp.credentials.key` option, it needs be passed in as a base64-encoded string directly.
+  Credentials can be provided with the `gcp.credentials.key` option; it needs to be passed in as a base64-encoded string. Example:

-  ```
+  ```java
   spark.readStream.format("pubsublite").option("gcp.credentials.key", "<SERVICE_ACCOUNT_JSON_IN_BASE64>")
-  ```
\ No newline at end of file
+  ```
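The FAQ text above says the value of `gcp.credentials.key` must be a base64-encoded service account key, but the diff does not show how to produce that string. A minimal sketch, assuming a service account JSON key downloaded to a placeholder local path:

```sh
# Sketch only: base64-encode a downloaded service account JSON key so it can be
# passed to the connector's gcp.credentials.key option (the file path is a placeholder).
# On macOS, use `base64 -i service-account-key.json` instead of `-w 0`.
base64 -w 0 /path/to/service-account-key.json
```

The resulting single-line string is what the `.option("gcp.credentials.key", ...)` call shown in the diff expects.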