Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SUPPORT] Hudi SQL Based Transformer Fails when trying to provide SQL File as input #11258

Closed
soumilshah1995 opened this issue May 19, 2024 · 4 comments

Comments

@soumilshah1995
Copy link

Here is Delta Streamer

spark-submit \
  --class org.apache.hudi.utilities.streamer.HoodieStreamer \
  --packages org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.0 \
  --properties-file spark-config.properties \
  --master 'local[*]' \
  --executor-memory 1g \
   /Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E1/jar/hudi-utilities-slim-bundle_2.12-0.14.0.jar \
  --table-type COPY_ON_WRITE \
  --op UPSERT \
  --transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer \
  --source-ordering-field replicadmstimestamp \
  --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
  --target-base-path file:///Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E1/silver/ \
  --target-table invoice \
  --props hudi_tbl.props

Hudi prop

hoodie.streamer.source.hoodieincr.missing.checkpoint.strategy=READ_UPTO_LATEST_COMMIT
hoodie.streamer.source.hoodieincr.path=s3a://warehouse/default/table_name=orders
hoodie.datasource.write.recordkey.field=order_id
hoodie.datasource.write.partitionpath.field=
hoodie.datasource.write.precombine.field=ts


Tried following options


hoodie.streamer.transformer.sql.file=join.sql
OR
hoodie.streamer.transformer.sql.file=file:///Users/soumilshah/IdeaProjects/SparkProject/deltastreamerBroadcastJoins/join.sql
OR
hoodie.streamer.transformer.sql.file=/Users/soumilshah/IdeaProjects/SparkProject/deltastreamerBroadcastJoins/join.sql

Error Message


FO BaseHoodieTableFileIndex: Refresh table orders, spent: 15 ms
24/05/19 12:38:37 ERROR HoodieStreamer: Shutting down delta-sync due to exception
java.lang.IllegalArgumentException: Property hoodie.streamer.transformer.sql not found
	at org.apache.hudi.common.util.ConfigUtils.getStringWithAltKeys(ConfigUtils.java:334)
	at org.apache.hudi.common.util.ConfigUtils.getStringWithAltKeys(ConfigUtils.java:308)
	at org.apache.hudi.utilities.transform.SqlQueryBasedTransformer.apply(SqlQueryBasedTransformer.java:52)
	at org.apache.hudi.utilities.transform.ChainedTransformer.apply(ChainedTransformer.java:105)
	at org.apache.hudi.utilities.streamer.StreamSync.lambda$fetchFromSource$0(StreamSync.java:530)
	at org.apache.hudi.common.util.Option.map(Option.java:108)
	at org.apache.hudi.utilities.streamer.StreamSync.fetchFromSource(StreamSync.java:530)
	at org.apache.hudi.utilities.streamer.StreamSync.readFromSource(StreamSync.java:495)
	at org.apache.hudi.utilities.streamer.StreamSync.syncOnce(StreamSync.java:405)
	at org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.lambda$startService$1(HoodieStreamer.java:757)
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
24/05/19 12:38:37 INFO HoodieStreamer: Delta Sync shutdown. Error ?true
24/05/19 12:38:37 INFO HoodieStreamer: Ingestion completed. Has error: true
24/05/19 12:38:37 INFO StreamSync: Shutting down embedded timeline server
24/05/19 12:38:37 ERROR HoodieAsyncService: Service shutdown with error
java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Property hoodie.streamer.transformer.sql not found
	at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2005)
	at org.apache.hudi.async.HoodieAsyncService.waitForShutdown(HoodieAsyncService.java:103)
	at org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:65)
	at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
	at org.apache.hudi.utilities.streamer.HoodieStreamer.sync(HoodieStreamer.java:205)
	at org.apache.hudi.utilities.streamer.HoodieStreamer.main(HoodieStreamer.java:584)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1111)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.hudi.exception.HoodieException: Property hoodie.streamer.transformer.sql not found
	at org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.lambda$startService$1(HoodieStreamer.java:796)
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.IllegalArgumentException: Property hoodie.streamer.transformer.sql not found
	at org.apache.hudi.common.util.ConfigUtils.getStringWithAltKeys(ConfigUtils.java:334)
	at org.apache.hudi.common.util.ConfigUtils.getStringWithAltKeys(ConfigUtils.java:308)
	at org.apache.hudi.utilities.transform.SqlQueryBasedTransformer.apply(SqlQueryBasedTransformer.java:52)
	at org.apache.hudi.utilities.transform.ChainedTransformer.apply(ChainedTransformer.java:105)
	at org.apache.hudi.utilities.streamer.StreamSync.lambda$fetchFromSource$0(StreamSync.java:530)
	at org.apache.hudi.common.util.Option.map(Option.java:108)
	at org.apache.hudi.utilities.streamer.StreamSync.fetchFromSource(StreamSync.java:530)
	at org.apache.hudi.utilities.streamer.StreamSync.readFromSource(StreamSync.java:495)
	at org.apache.hudi.utilities.streamer.StreamSync.syncOnce(StreamSync.java:405)
	at org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.lambda$startService$1(HoodieStreamer.java:757)
	... 4 more
24/05/19 12:38:37 INFO SparkContext: SparkContext is stopping with exitCode 0.
24/05/19 12:38:37 INFO SparkUI: Stopped Spark web UI at http://soumils-mbp:8090
24/05/19 12:38:37 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
24/05/19 12:38:37 INFO MemoryStore: MemoryStore cleared
24/05/19 12:38:37 INFO BlockManager: BlockManager stopped
24/05/19 12:38:37 INFO BlockManagerMaster: BlockManagerMaster stopped
24/05/19 12:38:37 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
24/05/19 12:38:37 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.hudi.utilities.ingestion.HoodieIngestionException: Ingestion service was shut down with exception.
	at org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:67)
	at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
	at org.apache.hudi.utilities.streamer.HoodieStreamer.sync(HoodieStreamer.java:205)
	at org.apache.hudi.utilities.streamer.HoodieStreamer.main(HoodieStreamer.java:584)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020)
	at org.apache.spark.de

Delta Streamer works fine


hoodie.streamer.transformer.sql=SELECT a.customer_id, c.name AS customer_name, c.state AS state, c.city, c.email, a.order_id, a.name AS order_name, a.order_value, a.priority, a.order_date, a.ts FROM <SRC> a JOIN hudi_db.customers c ON c.customer_id = a.customer_id

tried all option it fails when I am proving .sql file

image

Here is my Joined.sql

SELECT
    a.customer_id,
    c.name AS customer_name,
    c.state AS state,
    c.city,
    c.email,
    a.order_id,
    a.name AS order_name,
    a.order_value,
    a.priority,
    a.order_date,
    a.ts
FROM
        <SRC> a
    JOIN
    hudi_db.customers c
ON
    c.customer_id = a.customer_id;

@soumilshah1995
Copy link
Author

when providing a sql file as I/p

java.lang.IllegalArgumentException: Property hoodie.streamer.transformer.sql not found

looks like it still looking for hoodie.streamer.transformer.sql

@ad1happy2go
Copy link
Contributor

@soumilshah1995 Your transformer class should be --transformer-class org.apache.hudi.utilities.transform.SqlFileBasedTransformer

@soumilshah1995
Copy link
Author

really let me try

@soumilshah1995
Copy link
Author

Thanks man

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

No branches or pull requests

3 participants