docs: Update readme about jar name. (#73)
* update

* update
jiangmichaellll committed Feb 11, 2021
1 parent b0eb9e0 commit 7f24f1e
Showing 3 changed files with 22 additions and 15 deletions.
.readme-partials.yaml (2 changes: 1 addition & 1 deletion)

@@ -22,7 +22,7 @@ custom_content: |
 <!--
 | Scala version | Connector Artifact |
 | --- | --- |
-| Scala 2.11 | `com.google.cloud.pubsublite.spark:pubsublite-spark-sql-streaming-with-dependencies_2.11:0.1.0` |
+| Scala 2.11 | `com.google.cloud.pubsublite.spark:pubsublite-spark-sql-streaming:0.1.0:with-dependencies` |
 -->
 <!--- TODO(jiangmichael): Add example code and brief description here -->
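An aside from the editor, not part of the commit: under the new naming scheme, the self-contained jar is selected by the `with-dependencies` classifier rather than baked into the artifact id. A minimal sketch of fetching it by those coordinates, assuming Maven is installed and 0.1.0 is the wanted release:

```sh
# Copy the connector jar (classifier "with-dependencies") from Maven Central
# into the current directory; the version and output directory are illustrative.
mvn dependency:copy \
  -Dartifact=com.google.cloud.pubsublite.spark:pubsublite-spark-sql-streaming:0.1.0:jar:with-dependencies \
  -DoutputDirectory=.
```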
samples/README.md (34 changes: 21 additions & 13 deletions)

@@ -19,15 +19,24 @@ PARTITIONS=1 # or your number of partitions to create
 CLUSTER_NAME=waprin-spark7 # or your Dataproc cluster name to create
 BUCKET=gs://your-gcs-bucket
 SUBSCRIPTION_PATH=projects/$PROJECT_NUMBER/locations/$REGION-$ZONE_ID/subscriptions/$SUBSCRIPTION_ID
-PUBSUBLITE_SPARK_SQL_STREAMING_JAR_LOCATION= # downloaded pubsublite-spark-sql-streaming-with-dependencies jar location
+CONNECTOR_VERSION= # latest pubsublite-spark-sql-streaming release version
+PUBSUBLITE_SPARK_SQL_STREAMING_JAR_LOCATION= # downloaded pubsublite-spark-sql-streaming-$CONNECTOR_VERSION-with-dependencies jar location
 ```
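For orientation, a fully populated configuration might look like the sketch below; every value is an illustrative placeholder chosen by the editor, not something prescribed by the diff:

```sh
# Example values only; substitute your own project, zone, bucket, and paths.
PROJECT_NUMBER=123456789012
REGION=us-central1
ZONE_ID=a
SUBSCRIPTION_ID=word-count-sub
PARTITIONS=1
CLUSTER_NAME=my-word-count-cluster
BUCKET=gs://my-word-count-bucket
SUBSCRIPTION_PATH=projects/$PROJECT_NUMBER/locations/$REGION-$ZONE_ID/subscriptions/$SUBSCRIPTION_ID
CONNECTOR_VERSION=0.1.0
PUBSUBLITE_SPARK_SQL_STREAMING_JAR_LOCATION=./pubsublite-spark-sql-streaming-$CONNECTOR_VERSION-with-dependencies.jar
```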

 ## Running word count sample
 
 To run the word count sample in a Dataproc cluster, follow these steps:
 
 1. `cd samples/`
-2. Create the topic and subscription, and publish word count messages to the topic.
+2. Set the current sample version.
+```sh
+SAMPLE_VERSION=$(mvn -q \
+  -Dexec.executable=echo \
+  -Dexec.args='${project.version}' \
+  --non-recursive \
+  exec:exec)
+```
+3. Create the topic and subscription, and publish word count messages to the topic.
 ```sh
 PROJECT_NUMBER=$PROJECT_NUMBER \
 REGION=$REGION \
@@ -37,32 +37,31 @@ To run the word count sample in a Dataproc cluster, follow these steps:
 PARTITIONS=$PARTITIONS \
 mvn compile exec:java -Dexec.mainClass=pubsublite.spark.PublishWords
 ```
-3. Create a Dataproc cluster
+4. Create a Dataproc cluster
 ```sh
 gcloud dataproc clusters create $CLUSTER_NAME --region=$REGION --zone=$REGION-$ZONE_ID --image-version=1.5-debian10 --scopes=cloud-platform
 ```
-4. Package sample jar
+5. Package sample jar
 ```sh
 mvn clean package -Dmaven.test.skip=true
 ```
-<!-- TODO: set up bots to update jar version, also provide link to maven central -->
-5. Download `pubsublite-spark-sql-streaming-with-dependencies-0.1.0.jar` from Maven Central and set `PUBSUBLITE_SPARK_SQL_STREAMING_JAR_LOCATION` environment variable.
-6. Create GCS bucket and upload both `pubsublite-spark-sql-streaming-with-dependencies-0.1.0.jar` and the sample jar onto GCS
+<!-- TODO: set up bots to update jar version -->
+<!-- TODO: provide link to maven central -->
+6. Download `pubsublite-spark-sql-streaming-$CONNECTOR_VERSION-with-dependencies.jar` from Maven Central and set the `PUBSUBLITE_SPARK_SQL_STREAMING_JAR_LOCATION` environment variable.
+7. Create a GCS bucket and upload both `pubsublite-spark-sql-streaming-$CONNECTOR_VERSION-with-dependencies.jar` and the sample jar to GCS
 ```sh
 gsutil mb $BUCKET
-gsutil cp snapshot/target/pubsublite-spark-snapshot-1.0.21.jar $BUCKET
+gsutil cp snapshot/target/pubsublite-spark-snapshot-$SAMPLE_VERSION.jar $BUCKET
 gsutil cp $PUBSUBLITE_SPARK_SQL_STREAMING_JAR_LOCATION $BUCKET
 ```
-7. Set Dataproc region
+8. Set Dataproc region
 ```sh
 gcloud config set dataproc/region $REGION
 ```
-<!-- TODO: set up bots to update jar version -->
-8. Run the sample in Dataproc
+9. Run the sample in Dataproc
 ```sh
 gcloud dataproc jobs submit spark --cluster=$CLUSTER_NAME \
---jars=$BUCKET/pubsublite-spark-snapshot-1.0.21.jar,$BUCKET/pubsublite-spark-sql-streaming-with-dependencies-0.1.0.jar \
+--jars=$BUCKET/pubsublite-spark-snapshot-$SAMPLE_VERSION.jar,$BUCKET/pubsublite-spark-sql-streaming-$CONNECTOR_VERSION-with-dependencies.jar \
 --class=pubsublite.spark.WordCount -- $SUBSCRIPTION_PATH
 ```
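The `gcloud dataproc jobs submit spark` invocation in step 9 streams the driver output (the word counts) back to the terminal. To find or replay a job afterwards, a hedged sketch from the editor; `YOUR_JOB_ID` is a placeholder:

```sh
# List jobs submitted to the cluster, then re-print one job's driver output.
gcloud dataproc jobs list --cluster=$CLUSTER_NAME --region=$REGION
gcloud dataproc jobs wait YOUR_JOB_ID --region=$REGION
```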

@@ -74,7 +82,7 @@ To run the word count sample in a Dataproc cluster, follow these steps:
 ```
 2. Delete GCS bucket.
 ```sh
-gsutil -m rm -rf $BUCKET_NAME
+gsutil -m rm -rf $BUCKET
 ```
 3. Delete Dataproc cluster.
 ```sh
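To double-check that the cleanup steps above took effect, an editor's sketch reusing the same variables:

```sh
# The bucket listing should fail once the bucket is deleted, and the
# cluster listing should no longer show $CLUSTER_NAME.
gsutil ls $BUCKET || echo "bucket $BUCKET is gone"
gcloud dataproc clusters list --region=$REGION
```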
samples/pom.xml (1 change: 0 additions & 1 deletion)

@@ -3,7 +3,6 @@
 <modelVersion>4.0.0</modelVersion>
 <groupId>com.google.cloud</groupId>
 <artifactId>google-cloud-pubsublite-spark-samples</artifactId>
-<version>0.0.1-SNAPSHOT</version><!-- This artifact should not be released -->
 <packaging>pom</packaging>
 <name>Google Pub/Sub Lite Spark Connector Samples Parent</name>
 <url>https://github.com/googleapis/java-pubsublite-spark</url>
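With the hard-coded `<version>` removed, the samples module presumably inherits its version from the parent POM, which is exactly what the new `SAMPLE_VERSION` step in the README resolves. An equivalent one-liner, assuming maven-help-plugin 3.1.0+ for `-DforceStdout`:

```sh
# Print the effective (inherited) version of the samples module.
mvn -q help:evaluate -Dexpression=project.version -DforceStdout --non-recursive
```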
