This repository has been archived by the owner on Aug 9, 2021. It is now read-only.

Propagate pin requests between 3box pinning nodes #279

Open

msterle opened this issue Jan 28, 2020 · 20 comments
@msterle
Contributor

msterle commented Jan 28, 2020

Because the pinning nodes are not peered with each other, pubsub messages sent to one don't propagate to the others. An ipfs peer bootstrapping against the exposed hostname/ip is directed by the load balancer to just one of the pinning nodes, so it will only receive a subset of pin request messages. For an external pinning node, we'd like to receive all pinning messages. To do this:

  • Forward pin request ipfs pubsub messages over redis (similar to the messageBroker implementation)
  • Forward pin requests received on redis over ipfs pubsub
  • Do not handle these pin requests unless the pinning node is the original recipient

The eventual goal is to use libp2p messaging instead of redis, but this will provide similar functionality in the meantime while minimizing changes to the architecture.
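
For illustration, a rough sketch of what that bridge could look like. This is a sketch only: the topic name, PINNING_NODE_ID, handlePinRequest and the ipfs/redis clients are placeholder assumptions, not the actual pinning-node code. `ipfs` is a js-ipfs instance; `pub`/`sub` are two redis connections (publisher and subscriber).

const PINNING_TOPIC = '3box-pinning'               // placeholder topic name
const NODE_ID = process.env.PINNING_NODE_ID        // assumed unique id per pinning node

async function startPinRelay (ipfs, pub, sub, handlePinRequest) {
  // 1. Forward pin request ipfs pubsub messages over redis
  await ipfs.pubsub.subscribe(PINNING_TOPIC, msg => {
    const payload = JSON.parse(msg.data.toString())
    if (payload.relayedBy) return        // relayed copy: we are not the original recipient
    handlePinRequest(payload)            // we are the original recipient, handle it
    pub.publish(PINNING_TOPIC, JSON.stringify({ ...payload, relayedBy: NODE_ID }))
  })

  // 2. Forward pin requests received on redis back out over ipfs pubsub,
  //    without handling them locally
  sub.subscribe(PINNING_TOPIC)
  sub.on('message', (channel, data) => {
    const payload = JSON.parse(data)
    if (payload.relayedBy === NODE_ID) return      // our own relay, ignore
    ipfs.pubsub.publish(PINNING_TOPIC, Buffer.from(data))
  })
}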

@oed oed added this to the Sprint 35 milestone Jan 28, 2020
@michaelsena michaelsena assigned msterle and zachferland and unassigned msterle Jan 29, 2020
@oed
Member

oed commented Feb 3, 2020

I still doubt this is the best solution. Brain dump below:

Propagating pin requests won't help much if the client isn't directly connected to the pinning node that wants to pin the data. Since each orbitdb database has a separate pubsub topic in which updates are shared, even if the pinning node that is not directly connected to the client gets the pin request, it won't be able to pin any data since it's not directly connected to the peer that has the data. So propagating the messages in this case will just be a waste of resources.

Propagating pin requests when needed. We can start using ipfs-pubsub-room; with it we can better detect when other peers come online, and it also allows sending direct messages without broadcasting to the entire room. With it we can detect when a peer comes online and always share the pin request with the new peer directly. This requires some reengineering on both the 3box-js and 3box-pinning-node sides, and potentially speccing out exactly how this would function. I realize that this might be out of scope, but it seems like something closer to the correct answer.
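
For reference, a minimal sketch of the ipfs-pubsub-room approach, assuming a placeholder room name and a hypothetical getPinRequestsFor() helper (the Room constructor shape differs between versions of the library):

const Room = require('ipfs-pubsub-room')

function joinPinningRoom (ipfs, getPinRequestsFor) {
  const room = new Room(ipfs, '3box-pinning-room') // older versions: Room(ipfs, topic)

  // detect when a peer comes online and share the pin request with it directly,
  // without broadcasting to the entire room
  room.on('peer joined', async (peer) => {
    const requests = await getPinRequestsFor(peer)  // hypothetical lookup
    for (const req of requests) {
      room.sendTo(peer, JSON.stringify(req))
    }
  })

  room.on('message', (msg) => {
    // msg.from is the sender's peer id, msg.data the payload;
    // handle incoming pin requests / responses here
  })

  return room
}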

@zachferland
Contributor

Yeah, but this iteration assumes that everyone still connects to our pinning node and that other nodes are redundant to it, so data is always available there. But yeah, it is definitely not a general solution. In the other cases where this is not true, I assume that for some time clients will be able to directly connect to the nodes that they want to pin data with.

Need to think through the second part, but yeah, I think we'll want to change how we deal with pin messages overall.

@oed
Member

oed commented Feb 3, 2020

Yeah, but this iteration assumes that everyone still connects to our pinning node and that other nodes are redundant to it, so data is always available there.

I don't think that's true, consider the following:

  1. We have pinning node A and B
  2. There is a pinning node C outside of our cluster
  3. C is connected to B
  4. Client connects to A, pin req routed to C
  5. C opens db of client, but client not connected to C
  6. C is also not connected to A which opens the DB for the user
  7. C won't replicate any of the data of the user

@zachferland
Contributor

Yeah! Thanks for the clear steps. It's simple to relay orbit updates through nodes where the dbs are not open, but that would still not do the initial handshake with a new peer, where the existing heads are sent in a 1-on-1 channel. Don't have a follow-up yet.

@zachferland
Contributor

It would still fix a very specific short-term problem, since we could assume that C connects to the client eventually (as in the current implementation, although not a reliable assumption). C at least doesn't miss the pin db message on client open, since it was relayed before, and then within a brief time it will get the heads. But yeah, it's not holistic, and it can be weighed against longer-term solutions vs. the time for a quick fix.

@oed
Member

oed commented Feb 3, 2020

Yeah, eventually they might connect. I guess the whole thing relies on this anyhow.

@michaelsena

Closing, pursuing another solution.

@msterle
Contributor Author

msterle commented Feb 28, 2020

I am suggesting we reopen this ticket and apply it next.


After the discussion this week, it seems that we won't be immediately pursuing ipfs nodes with unique peer ids; whether we want to do that at all is still an open question, and a consensus is unlikely to be reached soon given the current information.

For the reasons below, I suggest we prioritise this ticket over 3box/3box#943 and 3box/3box#944, which depended on this decision to reach production. Without a concrete plan for how to make ipfs HA and scalable, they would be of low value at this time, as a single ipfs node would not be deployed in production.

Reasons for pursuing this now:

  1. The ipfs deployment question delaying the planned work above is difficult to resolve because both options rely heavily on assumptions, and I believe testing those assumptions is critical in guiding our decision-making, as well as helping us gain a deeper understanding of ipfs, libp2p and orbitdb. One assumption we are making is that it is relatively simple to replace ipfs/libp2p functionality with distributed equivalents. This ticket will let us test that assumption if we first clearly update the scope of work and acceptance criteria and estimate it, and then, once the work is done, explore any unexpected issues that arose and determine whether more are likely (@zachferland @oed: suggest we do the first part on Monday).

  2. If we decide to proceed with a single ipfs id, this will have set up the first version of the pubsub forwarding layer, and would allow a relatively short path for the two tickets being deprioritised (split ipfs and pinning, with pinning workers)

  3. If successful, this would allow us to show immediate value to our partners who want to run their own pinning nodes as part of the 3box network.

@msterle msterle reopened this Feb 28, 2020
@zachferland
Contributor

Yes, this sounds good; that's a good summary of the drivers here 👍

I can help create some well-defined stories here; I just want to sync on one last part first that I didn't have a chance to get to today: basically whether to do iteration 1, or iteration 2 (+1), in one go, as referenced in notion. Can start here, since it's not well defined in notion yet:

One option for iteration 1 is to work entirely at the application level rather than the libp2p layer: relay messages as new messages from our node. The alternative is to forward messages from other peers, not as new messages but with the same original payload and sender (this would be more native to libp2p, and would allow peers to handle them a lot more intelligently).

But the second is not necessary yet (and the current messages are not done this way yet). It does affect the seen-message cache implementation though. If it's solely application level, then yeah, as you said in discord, you could simplify even further by just adding a bool to the relayed message as a form of memory. With that we would just have to rely on the assumption that we are the only node relaying pin db messages like this. Other providers/nodes couldn't do it, otherwise the same message would change many times over and be handled redundantly.

At the more libp2p-level layer after that, where we do forward messages instead, we would have to have a shared seen-message cache (since we can't change messages, just forward them). libp2p pubsub does this already, but pindb messages are the only part of the message space where this cache has to be shared.

After writing this out, it seems like doing the application layer is good for a first iteration, since it's very simple and similar to the current message handling, and it doesn't add anything that would need backwards compatibility later. Yes, it has some overhead, but it's close to the same as the current orbit messages, and it's a known thing to improve afterwards in iteration 1.5/2.

iter1: Relay messages like now (as new messages at the app layer), but with a flag. Only handle/relay pin db messages that are not flagged. Assume/configure other pinning nodes to not relay pindb messages. (A sketch follows at the end of this comment.)

iter1.5: Move to forwarding the pin db message through libp2p, with a seen-message cache based on unique message ids.

iter2: Forward all messages, including orbit messages, through libp2p hooks.

Overall, I think either iter1, or going straight to iter1.5, is a valid path for the first iteration.
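
For iter1 above, a minimal sketch of the flag check, assuming an illustrative relayed flag and placeholder handler/relay functions (not names from the codebase):

// Only handle/relay pin db messages that are not already flagged as relayed.
function onPinDbMessage (msg, handlePinDb, relayAsNewMessage) {
  const payload = JSON.parse(msg.data.toString())
  if (payload.relayed) return                       // relayed copy: ignore
  handlePinDb(payload)                              // handle locally
  relayAsNewMessage({ ...payload, relayed: true })  // re-publish as a new, flagged message
}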

@msterle
Contributor Author

msterle commented Mar 3, 2020

Thanks for condensing the above; I'd aim for the smaller amount of work here. Since I still disagree with following the "supernode" path at the stage 3box is at (it goes against recommended IPFS usage, which involves risk and unknowns that are being underestimated), I think we should aim for a tighter scope than 3-6 weeks of work.

I'm recommending we do the work primarily because we're blocked by this decision, so I'm hoping it will be useful in surfacing examples of the types of unexpected issues I'm going to encounter, and how those affect the amount of work involved. Do you think your 1 iteration version, or an even smaller chunk of work, would be useful for this kind of test, and do you know what kind of results you would find convincing one way or the other?

@zachferland
Contributor

Yup, I think with that framing iter1.5 is the minimal "interesting" work to answer some of those questions, and it's also easily less than 3 weeks. iter1 is only days, and would not give any insight into the actual pubsub layer.

I don't think anything here will invalidate either direction, but it will give an idea of the magnitude of work for each direction at this layer.

@zachferland
Contributor

zachferland commented Mar 3, 2020

As said above, application level only would not lead to much insight, so I'm suggesting the second option (1.5) above, where we hook in at the libp2p pubsub layer to consume/relay messages. A rough outline of the implementation follows; there could still be some nuances, or it could be hooked in at other levels.

  1. Extend and wrap https://github.com/ChainSafe/gossipsub-js, primarily the _processRpcMessage function: https://github.com/ChainSafe/gossipsub-js/blob/master/src/index.js#L147
const GossipSub = require('libp2p-gossipsub')
const { utils } = require('libp2p-pubsub') // msgId(from, seqno) helper; exact import path may vary by version

class GossipSubWrap extends GossipSub {
  // constructor to pass in the redis pubsub client and redis seen cache,
  // or some other creator function; GossipSub's own args are passed through
  constructor (redis, pinningSeenCache, ...gossipsubArgs) {
    super(...gossipsubArgs)
    this.redis = redis
    this.pinningSeenCache = pinningSeenCache
  }

  _processRpcMessage (msg) {
    // probably want a parse util for msg first
    if (msg.topicIDs.includes('3box-pinning')) {
      const id = utils.msgId(msg.from, msg.seqno)
      if (this.pinningSeenCache.has(id)) {
        // an internal node already consumed this from the external network,
        // ignore and don't relay it internally
        return
      }
      // cache as seen and relay to all other nodes, which relay to their clients
      this.pinningSeenCache.put(id)
      this.redis.publish('3box-pinning', msg)
    }
    // now emit to the node, so orbit/3box can handle the pindb request,
    // and relay to clients with the normal gossipsub logic
    super._processRpcMessage(msg)
  }

  _processInternalMessage (msg) {
    // not sure of the shortest implementation here yet:
    // relay through the logic in super._processRpcMessage(msg),
    // just don't emit with this line https://github.com/ChainSafe/gossipsub-js/blob/master/src/index.js#L148
  }
}
  2. Use the existing redis pubsub layer; subscribe to "3box-pinning" (or another topic shared between all instances) on start:
redis.subscribe('3box-pinning', msg => gossipsubWrap._processInternalMessage(msg))
  3. Pass the configured gossipsubWrap to the libp2p and ipfs instance on init.

Notes:
The pinning-message seen cache can also just live in redis, with values that have a TTL of a few minutes.
Seems like a 5-point story, more if there are still unknowns or it does not behave as expected; maybe 8 depending on how much testing or which test case we use.
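
A minimal sketch of that redis-backed seen cache, assuming an ioredis client and illustrative class/method names. SET with EX and NX writes the key with a TTL only if it doesn't already exist, so the has/put pair collapses into one atomic call:

const Redis = require('ioredis')

class RedisSeenCache {
  constructor (redis = new Redis(), ttlSeconds = 300, prefix = 'pin-seen:') {
    this.redis = redis
    this.ttl = ttlSeconds
    this.prefix = prefix
  }

  // resolves true only for the first caller to see this message id
  async markSeen (id) {
    const res = await this.redis.set(this.prefix + id, '1', 'EX', this.ttl, 'NX')
    return res === 'OK'
  }
}

Note that this cache is inherently async, which is relevant to the synchronous _processRpcMessage issue discussed further down.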

@msterle
Contributor Author

msterle commented Mar 4, 2020

@zachferland what benefit would there be in doing all this rather than adding a flag to the payload before forwarding it, such as seenBy3box: true?

I'm assuming that we would also send out the pinning room messages received over redis externally over pubsub so that external nodes receive them, as I understood from earlier discussion?

* CLIENT *             * COMM. NODE *
    |                        ^
 pubsub                      |
    |                      pubsub
    v                        |
* PINNING1 * --redis--> * PINNING2 *

This solution will end up sending out X-1 copies of each pubsub message to the swarm (where X is the number of pinning nodes we have up), where these copies will look different because they will have unique seqno and from entries (how ipfs pubsub tracks sender and message identity).

This also won't work with the pubsub workers built this sprint, because they use these properties of pubsub (sender and message identifier)

@zachferland
Contributor

It can, but as said above, doing that alone won't lend much insight into future work, and some of its pitfalls are already known. But it can be done as a step toward the last comment, to test some parts.

  1. They won't be "forwarded" but will be new messages, so you will get some amplification of redundant messages across the network, rather than letting pubsub handle the same messages naturally.

  2. No one else can forward messages like this; same as 1, except now what started as the same message will be consumed as many new messages.

  3. It doesn't give insight into relaying all messages in a similar way.

@zachferland
Contributor

Oh, I see the comment changed! Reading now.

@zachferland
Contributor

OK, I think we are saying the same thing; yeah, that's mainly why you would want to forward them like pubsub does (relay/forward with the same message id) and not create new messages.

The current orbit messages create new messages to relay, and that was from before peers were directly connected. So the assumption was okay then, but ideally they would be changed to use the same improvement as here.

I don't know the context for the pubsub workers this sprint.

@msterle msterle assigned msterle and unassigned zachferland Mar 5, 2020
@msterle
Contributor Author

msterle commented Mar 5, 2020

@zachferland The above sounds like it might work, so I've been trying it out. The first big hurdle I've encountered is that libp2p's pubsub processes messages synchronously, whereas any shared structure like redis involves async in some way. This would at least involve rewriting more of gossipsub to be async.

https://github.com/ChainSafe/gossipsub-js/blob/1852ec82f1/src/index.js#L134

@zachferland
Contributor

Hmm, right, I see that, but it looks like nothing checks the return value of _processRpcMessage, so it would be fine if we had a promise in there that was not awaited before returning, or I guess we could make it async?
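
A sketch of that non-awaited variant, reusing the placeholders from the outline above (the msgId util, a redis-backed seen cache with an async markSeen, and a serialize helper are all assumptions): _processRpcMessage stays synchronous and the relay decision moves inside the un-awaited promise.

  _processRpcMessage (msg) {
    if (msg.topicIDs.includes('3box-pinning')) {
      const id = utils.msgId(msg.from, msg.seqno)
      // not awaited: markSeen resolves true only for the first node to see this id
      this.pinningSeenCache.markSeen(id)
        .then(firstSeen => {
          if (firstSeen) this.redis.publish('3box-pinning', serialize(msg))
        })
        .catch(err => console.error('pin relay failed', err)) // don't leak rejections
    }
    // emit to the node and relay to peers via the normal gossipsub logic
    super._processRpcMessage(msg)
  }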

@zachferland
Contributor

It might also be possible to hook in at another function layer, like message validate:

https://github.com/ChainSafe/gossipsub-js/blob/97515f7022b305867440f61acde365c61c802159/src/pubsub.js#L183

which is an async function called on every message; if it does not return true, the message will be dropped before _processRpcMessage is called. That may be a good spot to hook in for extensions.
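
A sketch of what hooking in at validate could look like; the seen cache, serialize helper and redis client are the same assumed placeholders as before, and the method shape just follows the linked pubsub.js:

class GossipSubWrap extends GossipSub {
  async validate (msg) {
    if (!(await super.validate(msg))) return false     // keep the normal checks

    if (msg.topicIDs.includes('3box-pinning')) {
      const id = utils.msgId(msg.from, msg.seqno)
      const firstSeen = await this.pinningSeenCache.markSeen(id)
      if (!firstSeen) return false                     // already consumed internally: drop it
      this.redis.publish('3box-pinning', serialize(msg)) // relay to the other pinning nodes
    }
    return true                                        // continue to _processRpcMessage
  }
}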

@msterle msterle closed this as completed Mar 10, 2020
@msterle msterle reopened this Mar 10, 2020
@zachferland
Contributor

Yeah, I think it's worth keeping in the backlog, as it's still an option worth considering later, maybe testing with the test infra we'll have later, and maybe still a short-term option depending on the timelines of the other paths/work.

Also, an implementation note for later: if we made a small PR to gossipsub to pull the forwarding logic out into a _forwardRpcMessage function (a really minor change there), then the code example above would be almost all the code needed to implement this.
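
With that hypothetical _forwardRpcMessage hook (not in upstream gossipsub yet), the internal handler could shrink to roughly:

  _processInternalMessage (msg) {
    // msg arrived from another pinning node over redis: mark it seen so we don't
    // relay it back, then let gossipsub forward it to our directly connected peers
    // without re-emitting it to this node's own handlers
    this.pinningSeenCache.put(utils.msgId(msg.from, msg.seqno))
    this._forwardRpcMessage(msg)
  }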

@msterle msterle removed their assignment Mar 11, 2020