
Type Spark’s Structured Streaming #232

Open
OlivierBlanvillain opened this issue Jan 20, 2018 · 2 comments

Comments

@OlivierBlanvillain
Contributor

We are currently missing these two Dataset methods:

  • DataStreamWriter writeStream()
  • Dataset withWatermark(String eventTime, String delayThreshold)

These require some understanding of Spark streaming to be properly typed and tested. Here is the relevant documentation for anyone interested in getting started on this:

https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
https://databricks.com/blog/2017/05/08/event-time-aggregation-watermarking-apache-sparks-structured-streaming.html
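To illustrate what "properly typed" could mean for `withWatermark`, here is a minimal sketch of the idea: the event-time column reference carries the schema and the column type as type parameters, so only an existing `Timestamp` column of the dataset's schema can be passed. The `TypedColumn`/`TypedDataset` stand-ins below are simplified, illustrative stubs, not Frameless's actual API.

```scala
import java.sql.Timestamp

// A phantom-typed column reference: U is the schema type, V the column's type.
final case class TypedColumn[U, V](name: String)

// A stand-in for a typed dataset, keeping only what this sketch needs.
final case class TypedDataset[U](watermark: Option[(String, String)] = None) {
  // withWatermark only accepts a Timestamp column belonging to schema U,
  // so passing a non-timestamp column fails at compile time.
  def withWatermark(
      eventTime: TypedColumn[U, Timestamp],
      delayThreshold: String): TypedDataset[U] =
    copy(watermark = Some((eventTime.name, delayThreshold)))
}

final case class Click(userId: String, eventTime: Timestamp)

val clicks = TypedDataset[Click]()
val marked = clicks.withWatermark(TypedColumn[Click, Timestamp]("eventTime"), "10 minutes")
// clicks.withWatermark(TypedColumn[Click, String]("userId"), "10 minutes") // does not compile
```

In real Frameless, the column witness would presumably be derived from the case class (as `TypedDataset` does elsewhere) rather than constructed from a string, but the compile-time constraint is the same.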

@etspaceman
Contributor

etspaceman commented Jul 30, 2019

+1 - This was a big blocker to our adopting Frameless, as most of our jobs are structured streaming jobs.

@kyprifog

kyprifog commented Sep 18, 2019

I'm curious why this never took off. My guess is that most Typelevel people are using fs2 instead of Spark streaming, but fs2 is still limited in that it can't do distributed streaming out of the box. Maybe Typelevel people are using Flink instead, but that seems doubtful given how Flink is engineered.

This article is interesting; has anyone tried to extend this approach to the fs2/Frameless world?

http://mandubian.com/2014/02/13/zpark/
