Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SS]《4.2 Structured Streaming 之 Watermark 解析》讨论区 #35

Open
lw-lin opened this issue Jan 1, 2017 · 3 comments
Open

[SS]《4.2 Structured Streaming 之 Watermark 解析》讨论区 #35

lw-lin opened this issue Jan 1, 2017 · 3 comments

Comments

@lw-lin
Copy link
Owner

lw-lin commented Jan 1, 2017

如需要贴代码,请复制以下内容并修改:

public static final thisIsJavaCode;
val thisIsScalaCode

谢谢!

@lecssmi
Copy link

lecssmi commented Dec 22, 2019

文章里面提到,如果将watermark的生成放到source端,那么会更好。目前最新版本确实已经支持了。
但是,watermark的存在,本身是为了解决window操作中的数据迟到问题。如果在source端就将watermark生成,但是后面没有用到window操作,或者是window操作很少,生成的大量watermark就不会被利用起来,导致性能损失。那为啥在source端生成watermark要好一些呢?不解。

@judyzhoubaby
Copy link

您好,有一个疑问,文章里提到:“再次强调,(a+) 在对 event time 做 window() + groupBy().aggregation() 即利用状态做跨执行批次的聚合,并且 (b+) 输出模式为 Append 模式或 Update 模式时,才需要 watermark,其它时候不需要;”
但其实只要做基于event_time的filter,例如MapGroupsWithState中的GroupStateTimeout.EventTimeTimeout,也需要使用watermark。

@xza-m
Copy link

xza-m commented Sep 17, 2020

您好 如果我需要对当天全部数据进行groupBy+agg聚合操作,此时不使用window但是设置了watermark,会是什么样的情况?我不明白的是window不设置的情况下,会是无限增长的嘛

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants