
About Spark Streaming's join operation #28

Open
ddc496601562 opened this issue Dec 21, 2016 · 2 comments

Comments

@ddc496601562

I saw this description of join on the Spark Streaming official docs:

Here, in each batch interval, the RDD generated by stream1 will be joined with the RDD generated by stream2. You can also do leftOuterJoin, rightOuterJoin, fullOuterJoin. Furthermore, it is often very useful to do joins over windows of the streams. That is pretty easy as well.

In terms of the implementation, does this mean the join only operates on the streams within a single batch interval, and cross-batch joins are not yet possible?
If Spark Streaming cannot do cross-batch joins for now, what would the general approach be if we implemented it ourselves?
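For context, a minimal sketch of the per-batch join the docs describe, assuming two socket text sources on localhost (the ports, the "key,value" line format, and the 10-second batch interval are all illustrative):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object PerBatchJoin {
  def main(args: Array[String]): Unit = {
    // local[*]: enough cores for the two receivers plus processing
    val conf = new SparkConf().setAppName("PerBatchJoin").setMaster("local[*]")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Two key/value DStreams; here both are parsed from "key,value" lines.
    val stream1 = ssc.socketTextStream("localhost", 9998)
      .map { line => val Array(k, v) = line.split(",", 2); (k, v) }
    val stream2 = ssc.socketTextStream("localhost", 9999)
      .map { line => val Array(k, v) = line.split(",", 2); (k, v) }

    // Joins only the RDDs generated in the *same* batch interval;
    // records from different batches never see each other.
    val joined = stream1.join(stream2)
    joined.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```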

@AntikaSmith

@ddc496601562 One idea would be to implement a custom receiver yourself, and only send the relevant data over when it is actually needed for the join. By the way, how did you end up solving this?
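A rough sketch of that idea using Spark Streaming's Receiver API; fetchPending() is a hypothetical placeholder for however you decide which records are currently needed (e.g. reading them from Kafka, Redis, or a database):

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// The receiver only pushes records into the stream when they are needed
// for the join; fetchPending() below is a stub for that decision.
class OnDemandReceiver(pollIntervalMs: Long)
    extends Receiver[(String, String)](StorageLevel.MEMORY_AND_DISK_2) {

  def onStart(): Unit = {
    new Thread("on-demand-receiver") {
      override def run(): Unit = {
        while (!isStopped()) {
          fetchPending().foreach(store) // hand records to Spark Streaming
          Thread.sleep(pollIntervalMs)
        }
      }
    }.start()
  }

  def onStop(): Unit = {} // the polling thread exits once isStopped() is true

  private def fetchPending(): Seq[(String, String)] = Seq.empty // stub
}

// Usage: val stream2 = ssc.receiverStream(new OnDemandReceiver(1000))
```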

@351zyf

351zyf commented Oct 12, 2017

For cross-batch joins you can open up a window; everything inside the window can then be joined. In our case we join within a one-hour window.
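For reference, a small sketch of that windowed join, reusing the stream1/stream2 pair DStreams from the earlier sketch; the one-hour window and ten-minute slide are illustrative values:

```scala
import org.apache.spark.streaming.Minutes

// 1-hour windows; the slide interval must be a multiple of the batch interval.
val windowedStream1 = stream1.window(Minutes(60), Minutes(10))
val windowedStream2 = stream2.window(Minutes(60), Minutes(10))

// Records with matching keys join as long as both fall inside the last hour,
// even if they arrived in different batches.
val joinedOverWindow = windowedStream1.join(windowedStream2)
joinedOverWindow.print()
```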
