StateStore implementation and exactly-once #54

Open
lecssmi opened this issue Sep 16, 2019 · 1 comment

Comments

@lecssmi

lecssmi commented Sep 16, 2019

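According to the introduction, the default implementation stores state in HDFS. If a given version of a given partition of some operator fails, the checkpointed partition data is re-read and the partition is rewritten. But at the sink end, if the sink has neither idempotency nor transactions, and a partition fails after only part of its data has been written, presumably the whole partition is retried. In that case the part written before the failure would still be duplicated. How, then, is the end-to-end exactly-once guarantee mentioned there achieved?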
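A minimal simulation of the scenario described above, assuming a sink that fails partway through a partition and an engine that retries the whole partition. The classes `FlakySink` and `IdempotentSink` and the helper `run_with_retry` are hypothetical and are not Spark APIs. The sketch only shows why an append-only sink duplicates the rows written before the failure, while a sink that overwrites by key does not.

```python
# Hypothetical sinks for illustration only; not Spark APIs.
class FlakySink:
    """Append-only sink with no idempotency and no transactions."""
    def __init__(self):
        self.rows = []

    def write_partition(self, records, fail_after=None):
        for i, rec in enumerate(records):
            if fail_after is not None and i == fail_after:
                raise RuntimeError("task failed mid-partition")
            self.rows.append(rec)


class IdempotentSink:
    """Upsert sink: each record overwrites by key, so a replay is harmless."""
    def __init__(self):
        self.rows = {}

    def write_partition(self, records, fail_after=None):
        for i, (key, value) in enumerate(records):
            if fail_after is not None and i == fail_after:
                raise RuntimeError("task failed mid-partition")
            self.rows[key] = value


def run_with_retry(sink, records, fail_after):
    """Inject one failure, then retry the entire partition from the start."""
    try:
        sink.write_partition(records, fail_after=fail_after)
    except RuntimeError:
        # The partition is recomputed and rewritten as a whole.
        sink.write_partition(records)
    return sink


partition = [("a", 1), ("b", 2), ("c", 3)]
append_sink = run_with_retry(FlakySink(), partition, fail_after=2)
upsert_sink = run_with_retry(IdempotentSink(), partition, fail_after=2)
print(len(append_sink.rows))  # 5: ("a", 1) and ("b", 2) were written twice
print(len(upsert_sink.rows))  # 3: the replay overwrote the same keys
```

Under these assumptions, retrying at partition granularity only yields exactly-once output when the sink itself absorbs the replay, for example through keyed overwrites.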

@lecssmi
Author

lecssmi commented Sep 16, 2019

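Spark's execution granularity is the batch. Guaranteeing exactly-once for every individual record would mean scanning the batch after each failure, which also seems fairly expensive, doesn't it?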
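One way to avoid a per-record scan is to deduplicate at batch granularity instead. The sketch below assumes a hypothetical `TransactionalSink` that stages a partition's rows and publishes them together with a `(batch_id, partition_id)` commit marker in one atomic step; a replay then costs a single set lookup rather than a pass over the batch. The atomicity is an assumption that a real sink would need a transaction to provide. As far as we understand, Spark's `ForeachWriter.open(partitionId, epochId)` exposes identifiers that a sink could use in this way, but the class here is illustrative and not Spark code.

```python
# Hypothetical batch-level deduplication; not a Spark API.
class TransactionalSink:
    """Commits each (batch_id, partition_id) at most once."""
    def __init__(self):
        self.rows = []
        self.committed = set()  # (batch_id, partition_id) pairs already durable

    def write_partition(self, batch_id, partition_id, records, fail_after=None):
        key = (batch_id, partition_id)
        if key in self.committed:
            # One set lookup per partition; no per-record scan of the batch.
            return "skipped"
        staged = []  # buffered rows, invisible to readers until commit
        for i, rec in enumerate(records):
            if fail_after is not None and i == fail_after:
                raise RuntimeError("task failed mid-partition")
            staged.append(rec)
        # Assumed atomic: publishing the rows and recording the marker
        # must succeed or fail together (a real sink needs a transaction).
        self.rows.extend(staged)
        self.committed.add(key)
        return "committed"


sink = TransactionalSink()
records = ["r1", "r2", "r3"]
try:
    sink.write_partition(7, 0, records, fail_after=2)  # fails; nothing published
except RuntimeError:
    pass
first = sink.write_partition(7, 0, records)   # retry publishes the partition
second = sink.write_partition(7, 0, records)  # a later replay is skipped
print(first, second, sink.rows)  # committed skipped ['r1', 'r2', 'r3']
```

The trade-off is that the sink must support staging plus an atomic commit; in exchange, recovery work is proportional to the number of partitions rather than the number of records.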
