[SS] Discussion thread for "3.1 Structured Streaming: State Store Explained" #33

Open
lw-lin opened this issue Jan 1, 2017 · 8 comments

Comments

@lw-lin
Owner

lw-lin commented Jan 1, 2017

If you need to paste code, please copy the following and modify it:

public static final thisIsJavaCode;
val thisIsScalaCode

Thanks!

@junhero

junhero commented Feb 16, 2017

@lw-lin
For UV-style scenarios that compute a count distinct, the StateStore approach can't handle it, can it?

@lw-lin
Owner Author

lw-lin commented Feb 19, 2017

@junhero

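This depends on the size of the data set. If the data set is very small, e.g. the user id space is small, then StateStore is fine. If the user id space is large but the number of distinct user ids per day is small, StateStore is also fine. But if the user id space is large and there are also many distinct user ids per day, StateStore becomes a problem. In that case you can consider other approaches such as HyperLogLog.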
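To make the HyperLogLog suggestion concrete, here is a minimal sketch in plain Scala. It is an illustration only and not Spark's implementation: the class name `HllSketch`, the precision parameter `p`, and the use of `MurmurHash3.stringHash` as the hash function are all assumptions made for this example. The point it shows is that the sketch keeps a fixed number of small registers (4096 bytes for `p = 12`) no matter how many distinct user ids arrive, whereas an exact count distinct has to keep every id as state.

```scala
import scala.util.hashing.MurmurHash3

// Illustrative HyperLogLog sketch (hypothetical class, not a Spark API).
final class HllSketch(p: Int = 12) {
  require(p >= 7 && p <= 16, "precision must be between 7 and 16")

  private val m = 1 << p                           // number of registers
  private val registers = new Array[Byte](m)       // fixed-size state
  private val alpha = 0.7213 / (1.0 + 1.079 / m)   // bias constant for m >= 128

  def add(userId: String): Unit = {
    val h = MurmurHash3.stringHash(userId)         // 32-bit hash of the id
    val idx = h >>> (32 - p)                       // top p bits pick a register
    val rest = h << p                              // remaining bits, left-aligned
    // 1-based position of the first 1 bit in the remaining (32 - p) bits
    val rank = math.min(Integer.numberOfLeadingZeros(rest), 32 - p) + 1
    if (rank > registers(idx)) registers(idx) = rank.toByte
  }

  def estimate: Double = {
    val raw = alpha * m * m / registers.map(r => math.pow(2.0, -r)).sum
    val zeros = registers.count(_ == 0)
    // Small-range correction: fall back to linear counting while registers are still empty
    if (raw <= 2.5 * m && zeros > 0) m * math.log(m.toDouble / zeros) else raw
  }
}
```

Usage would look like `val s = new HllSketch(); ids.foreach(s.add); s.estimate`. The result is an estimate, not an exact count; with `p = 12` the typical relative error is on the order of a couple of percent. In Spark itself, `org.apache.spark.sql.functions.approx_count_distinct` provides an approximate distinct count and is probably the more practical starting point.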

@junhero

junhero commented Feb 20, 2017

Thanks.

@KevinZwx

KevinZwx commented Aug 30, 2017

Hello, I'd like to ask what exactly is stored in the StateStore. In statefulOperators I see put operations on the state like the following:

// Generated projection that extracts the key columns from an input row
val getKey = GenerateUnsafeProjection.generate(keyExpressions, child.output)
...
while (iter.hasNext) {
  val row = iter.next().asInstanceOf[UnsafeRow]
  val key = getKey(row)        // key: the projected key columns
  store.put(key, row)          // value: the whole row
  numUpdatedStateRows += 1
}

@lw-lin
Owner Author

lw-lin commented Aug 31, 2017

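@KevinZwx It stores UnsafeRow; both the key and the value are UnsafeRows. In the SparkSQL module, UnsafeRow plays a role similar to that of Object in Java. An UnsafeRow holds the actual data of various types (numeric values, strings, etc.).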
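A simplified model of that key/value relationship, in plain Scala, may help. Everything here is a stand-in invented for illustration: `Row` (a `Vector[Any]`) stands in for UnsafeRow, `keyProjection` for the generated key projection, and a `mutable.Map` for the store. None of these are Spark classes, and the real store also handles versioning and persistence, which this sketch ignores.

```scala
import scala.collection.mutable

object StateStoreModel {
  // Stand-in for UnsafeRow: an ordered tuple of column values
  type Row = Vector[Any]

  // Stand-in for the key projection: pick the key columns out of a row
  def keyProjection(keyOrdinals: Seq[Int]): Row => Row =
    row => keyOrdinals.map(row).toVector

  // Same shape as the put loop: one entry per key, the latest row for a key wins
  def updateState(
      store: mutable.Map[Row, Row],
      rows: Iterator[Row],
      getKey: Row => Row): Long = {
    var numUpdatedStateRows = 0L
    while (rows.hasNext) {
      val row = rows.next()
      store.put(getKey(row), row)   // key and value are both rows
      numUpdatedStateRows += 1
    }
    numUpdatedStateRows
  }
}
```

For example, feeding the rows `("u1", 1L)`, `("u2", 1L)`, `("u1", 2L)` with column 0 as the key leaves two entries in the map, and the entry for key `("u1")` holds the row `("u1", 2L)`.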

@KevinZwx

OK, thanks.

@LinMingQiang

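Hello, I'd like to ask: when each batch performs a state update, does it have to pull the corresponding StateStore from HDFS every time, and then write it back to HDFS once the update is done?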

@lecssmi

lecssmi commented Mar 10, 2020

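I have a question that may not strictly be about state. In Structured Streaming, when two streams are joined and the join time range is fairly large, say several hours, will the buffered data spill to disk if it doesn't fit in memory?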
