
[Networking] FLIP: Message Forensic (MF) System #195

Open · wants to merge 21 commits into base: main
Conversation

@yhassanzadeh13 commented Sep 8, 2023

#259

Summary

This FLIP discusses and compares two potential solutions for the Message Forensic (MF) system in the Flow protocol — a system that identifies and attributes protocol violations to the original malicious sender. The two solutions under consideration are: (1) GossipSub Message Forensic (GMF), and (2) Enforced Flow-level Signing Policy For All Messages. We delve into both, listing their pros and cons, to determine which would be more feasible given the considerations of ease of implementation, performance efficiency, and security guarantees.

Our analysis finds the "Enforced Flow-level Signing Policy For All Messages" to be the more promising option: it offers a generalized solution that does not depend on the protocol used to send the message, avoids the complexities of maintaining GossipSub envelopes, and avoids duplicating the GossipSub router's signature verification procedure at the engine level. Furthermore, it fits well with the Flow protocol's existing state.

Review Guide

This FLIP is presented as a Pull Request (PR) in the flow-go repository. We welcome reviewers to express their opinions and share feedback directly on the PR page, aiming for a structured and productive discussion. To aid this, please adhere to one of the following response frameworks:

  1. I favor the "Enforced Flow-level Signing Policy For All Messages" and here are my thoughts:
  2. I support the "GossipSub Message Forensic (GMF)" approach, articulating my views as follows.
  3. I find both propositions unsatisfactory, and here is my reasoning:


## Review Guide
This FLIP is presented as a Pull Request (PR) in the `flow-go` repository. We welcome reviewers to express their opinions and share feedback directly on the PR page, aiming for a structured and productive discussion. To aid this, please adhere to one of the following response frameworks:
@gomisha commented Sep 13, 2023

The referenced PR in flow-go is closed. Should this be updated to ask for feedback on the current PR (195) instead?

@gomisha commented Sep 13, 2023

I favor the "Enforced Flow-level Signing Policy For All Messages" (I have suggested to rename this to FLS) and here are my thoughts:

  • more generic solution vs GMF - I like that the solution can be used for generic forensics that aren't only for GossipSub messages
  • less overhead, since the engines neither have to carry around the GossipSub envelope nor duplicate GossipSub verification logic

My biggest concern in adopting FLS is the backward compatibility issues this will cause, as it will be a major breaking change. But from a design and maintainability perspective, I like this approach over GMF.

yhassanzadeh13 and others added 4 commits September 13, 2023 15:10 (Co-authored-by: Misha <15269764+gomisha@users.noreply.github.com>)
yhassanzadeh13 and others added 6 commits September 13, 2023 15:11 (Co-authored-by: Misha <15269764+gomisha@users.noreply.github.com>)
@bluesign (Collaborator) commented Sep 14, 2023

I support the "GossipSub Message Forensic (GMF)" approach, articulating my views as follows:

  • The network layer is clearly separated from the internal messaging logic.
  • By carrying the GossipSub envelope without its data, we can avoid the overhead (without the data, the GossipSub envelope should be pretty small).
  • I think slashing / protocol violations are a very rare case. Signing in GossipSub is pretty basic, and duplicate GossipSub verification is only needed when a protocol violation is raised, and only by the parts responsible for deciding on protocol violations (otherwise we have two signature verifications: one at the network layer, one at the application layer).

I raised before (on Discord) the idea of moving all communication onto GossipSub (unicast messages too), but it turned out some messages are assumed to be sent over a direct 1:1 connection. I think solving that is also important. (I don't have much information about the current topology, but I think it may be possible to have direct peering between components that require 1:1 connections.) Eventually, I think the objective is to allow anyone to join the network (say, any AN); building on GossipSub's security and scalability guarantees would benefit us in the long run.

@yhassanzadeh13 (Author) commented Sep 14, 2023

@bluesign, in GMF it is imperative for an engine to retain the GossipSub envelope in full; the scheme fundamentally depends on the completeness of the data it contains. This is not merely a procedural requirement: the signature covers the entire envelope, so it is infeasible to relay the envelope without its data contents and still verify it.

By carrying the GossipSub envelope without its data, we can avoid the overhead (without the data, the GossipSub envelope should be pretty small).

⚠️ This view bypasses a critical part of the verification process, which involves:

  1. Ascertaining the correlation between the event and the envelope, i.e., that the event belongs to the envelope (requires the data part of the envelope).
  2. Authenticating the signature of the envelope, i.e., the envelope is attributable to the sender through its networking key (requires the entire envelope).

This is not about blindly trusting the networking layer; it is about endorsing a methodologically sound practice that uses cryptographic primitives to furnish verifiable proofs, so that trust among Flow nodes rests not on assumptions but on cryptographically verifiable primitives. Otherwise, we may not even need a forensic mechanism in the first place.

Reducing the envelope to a fragment impedes the engine's ability to verify it on its own and undermines the core objective of establishing irrefutable proofs. The cardinal principle of GMF is the complete and unaltered state of the data encompassed in the envelope. Thus, any GMF solution must preserve the intact envelope, which underpins both the verification of the event's association with the envelope and the verification of the envelope's signature.

@bluesign (Collaborator)

What I was proposing is something like this:

A struct like message.Message as the Flow message (enriched with seqno, topic, signature, and key from pb.Message, plus the decoded event as interface{}) is passed to the engine. So technically the engine will have everything it needs to prove a violation (with little overhead). But as GossipSub is already defending against impersonation etc., I don't think the engine needs to do a signature check here.

In case of conflict, this new struct is self-sufficient to raise a claim. Then whoever is responsible for checking this violation can get the struct from the node, reconstruct the pb.Message, and do the signature verification. If the signature verification succeeds, it can punish the offender (if not, it can punish the claimer).

@yhassanzadeh13 (Author) commented Sep 14, 2023

I don't think the engine needs to do a signature check here.

@bluesign yes, technically the engine doesn't have to check the signature itself and can rely on the data from its networking layer. But the core idea behind this FLIP is to make sure that any proof a node gives when reporting a rule-breaking move is fully self-standing. This means that if node A claims that node B did something wrong, using evidence E, then anyone else should be able to see that node B was indeed in the wrong just by looking at E.

For this to work, node A needs to share the original message as it was, signature and all. Even a tiny change to the message stops others from being able to confirm the signature is valid, because the signature covers the entire envelope, not just parts of it. Below is a copy of the entire pb.Message; based on the signing code, every field except XXX_sizecache and XXX_NoUnkeyedLiteral is required for signature verification. Those two fields are currently unused, so skipping them will not save any substantial overhead. Moreover, including the Seqno, From, Data, Topic, Signature, Key, and XXX_unrecognized fields in the forensic data passed to the engine is effectively the same as sharing the pb.Message itself. Omitting any of these fields means the engine cannot build self-standing evidence. Notably, we must share the Data with the engine, hence "without data the GossipSub envelope should be pretty small" does not hold.

type Message struct {
	From                 []byte   `protobuf:"bytes,1,opt,name=from" json:"from,omitempty"`
	Data                 []byte   `protobuf:"bytes,2,opt,name=data" json:"data,omitempty"`
	Seqno                []byte   `protobuf:"bytes,3,opt,name=seqno" json:"seqno,omitempty"`
	Topic                *string  `protobuf:"bytes,4,opt,name=topic" json:"topic,omitempty"`
	Signature            []byte   `protobuf:"bytes,5,opt,name=signature" json:"signature,omitempty"`
	Key                  []byte   `protobuf:"bytes,6,opt,name=key" json:"key,omitempty"`
	XXX_NoUnkeyedLiteral struct{} `json:"-"`
	XXX_unrecognized     []byte   `json:"-"`
	XXX_sizecache        int32    `json:"-"`
}
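
For illustration only (this is not part of the FLIP text): a minimal Go sketch of what re-verifying such an envelope could look like, assuming go-libp2p-pubsub's signing convention, i.e. the signature covers the marshaled message with the Signature and Key fields cleared, prefixed with "libp2p-pubsub:". The exact API, import paths, and prefix may differ across libp2p versions, so treat them as assumptions.

package forensics

import (
	"bytes"
	"errors"
	"fmt"

	pb "github.com/libp2p/go-libp2p-pubsub/pb"
	"github.com/libp2p/go-libp2p/core/crypto"
)

// VerifyEnvelope checks the two conditions above: (1) the claimed event bytes
// match the Data carried in the envelope, and (2) the envelope signature
// verifies under the sender's public networking key. It needs the *entire*
// pb.Message: dropping any signed field (From, Data, Seqno, Topic) makes the
// signature unverifiable.
func VerifyEnvelope(msg *pb.Message, eventData []byte, senderKey crypto.PubKey) error {
	// Step 1: the event must belong to the envelope.
	if !bytes.Equal(msg.GetData(), eventData) {
		return errors.New("event does not match envelope data")
	}

	// Step 2: re-create the exact bytes that were signed (assumed convention:
	// marshal the message with Signature and Key left unset, then prefix it).
	unsigned := &pb.Message{
		From:  msg.From,
		Data:  msg.Data,
		Seqno: msg.Seqno,
		Topic: msg.Topic,
		// Unknown fields must be preserved byte-for-byte for the marshaling to match.
		XXX_unrecognized: msg.XXX_unrecognized,
	}
	payload, err := unsigned.Marshal()
	if err != nil {
		return fmt.Errorf("marshal envelope: %w", err)
	}
	signed := append([]byte("libp2p-pubsub:"), payload...)

	ok, err := senderKey.Verify(signed, msg.GetSignature())
	if err != nil {
		return fmt.Errorf("verify signature: %w", err)
	}
	if !ok {
		return errors.New("invalid envelope signature")
	}
	return nil
}

This also shows why stripping Data (or any other signed field) from what is handed to the engine breaks the evidence: the verifier can no longer reconstruct the exact bytes that were signed.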

In conclusion, when we talk about “reducing overhead,” creating a new structure with only some details from the pb.Message isn't going to help. The proof needs the full pb.Message to be reliable. So, it's not really about more or less "overhead", it's about keeping the proof verifiable and trustworthy.

Then whoever is responsible for checking this violation can get the struct from the node, reconstruct the pb.Message, and do the signature verification.

It appears you are advocating for the alteration of the current one-step process into a more interactive two-step procedure, though the advantages of this aren't entirely clear. Let's take a closer look:

  1. As outlined in the original GMF proposal, when the networking layer sends an event to an engine, it should include all the necessary forensic data (i.e., pb.Message). This approach empowers the engine to craft evidence that is fully self-sufficient, meaning other nodes can validate it independently without further input or clarification from the originating engine.

  2. Your approach seems to suggest that not all forensic data is essential in building evidence. It implies we should select and use only certain parts of the data to form what might essentially be partial evidence. Subsequently, nodes wishing to authenticate this evidence would need to reach out for additional details. This fundamentally changes the protocol to one where pb.Message details must be stored and retrieved, creating an environment ripe for increased complexity and engineering challenges (as previously described here) without an obvious enhancement to the current system.

@peterargue (Contributor) left a comment

I have concerns with both options.

I agree that the GossipSub-specific option isn't universal enough to cover all of our use cases, but I worry about the overhead of adding an extra layer of signatures to every message. I wonder if there is a hybrid approach we could take that avoids generating and verifying two signatures for every (or most) messages, plus the additional network data.

At its core, the requirement is to collect cryptographic evidence that can be used by a 3rd party to prove that a protocol violation occurred, and that a specific node sent the message. The evidence itself is raw bytes of data making up the original message and the signature signed by the misbehaving node (and probably a block id/height to get the node's public keys from protocol state).

What if, instead of adding a new signature to all messages, we used a use-case-specific method and encapsulated it in a shared interface to reduce complexity?

something like:

  • Have a high level envelope interface with methods to get the underlying event and forensic data.
  • Implementations hold references to the decoded event plus source data/signature needed for attribution
  • Engines are updated to accept this new interface
  • If a misbehavior is detected, the envelope is passed along to a slashing system

Then, there could be different implementations for each of the message types.

  • GossipSub would include the pb.Message.
  • Unicast would include the new mechanism.
  • Future optimizations could use the actual event itself as the evidence if it already contained a signature from the sender.
  • The implementation could produce some consistent format that the slashing system used. I assume that would be something like NodeID plus 2 []byte, one for the raw data and one for the signature.

I think this addresses most of the disadvantages for both options (except for being a breaking change), while also avoiding additional signatures.
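
For illustration, here is a minimal Go sketch of what such a shared envelope interface and a GossipSub-backed implementation could look like. All names (Evidence, Envelope, gossipSubEnvelope) are hypothetical, not existing flow-go types, and the import paths are assumptions.

package forensics

import (
	"fmt"

	pb "github.com/libp2p/go-libp2p-pubsub/pb"
	"github.com/onflow/flow-go/model/flow"
)

// Evidence is the transport-agnostic output a slashing system would consume:
// the accused node, the raw bytes it signed, and its signature over them.
type Evidence struct {
	OriginID  flow.Identifier
	RawBytes  []byte
	Signature []byte
}

// Envelope is the high-level interface engines would accept instead of a bare event.
type Envelope interface {
	Event() interface{}           // decoded protocol-level event
	Forensics() (Evidence, error) // attribution data, only consulted on misbehavior
}

// gossipSubEnvelope implements Envelope for pub-sub traffic by retaining a
// reference to the original pb.Message received from the router.
type gossipSubEnvelope struct {
	originID flow.Identifier
	event    interface{}
	raw      *pb.Message
}

func (g *gossipSubEnvelope) Event() interface{} { return g.event }

func (g *gossipSubEnvelope) Forensics() (Evidence, error) {
	// For GossipSub, the signed bytes are (assumed to be) the marshaled envelope
	// with the Signature and Key fields left unset, as in the earlier sketch.
	unsigned := &pb.Message{
		From:             g.raw.From,
		Data:             g.raw.Data,
		Seqno:            g.raw.Seqno,
		Topic:            g.raw.Topic,
		XXX_unrecognized: g.raw.XXX_unrecognized,
	}
	data, err := unsigned.Marshal()
	if err != nil {
		return Evidence{}, fmt.Errorf("marshal envelope: %w", err)
	}
	return Evidence{OriginID: g.originID, RawBytes: data, Signature: g.raw.GetSignature()}, nil
}

A unicast-backed implementation could satisfy the same interface with whatever signing mechanism ends up covering direct 1:1 messages, so the slashing system only ever deals with Evidence.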

@AlexHentschel (Member)

This is going to be a bit of a longer reply, sorry 😅. I think there are quite a few nuances and application patterns that need to be considered.

Central design goals

  • We design for a scenario where, over longer periods of time, 99.9% of messages (or more) are honest. In the ideal case, none of the code paths leading to slashing of a node are ever executed. Resource consumption and runtime impact should be optimized for the happy-path scenario. We are willing to accept significant performance deterioration in case of attackers committing slashable protocol violations, as long as, on the order of some minutes, the network has slashed the offending nodes and ejected them. By virtue of being implemented, slashing will be enough of a deterrent.

  • For the overwhelming majority of messages, the additional information needed for message forensics is ephemeral. More concretely, for nearly all messages, their validity and protocol compliance will be confirmed within milliseconds. At this point, the additional message-forensics information can be discarded.

    Under normal operational scenarios (overwhelmingly dominant), there are zero to few unchecked messages in the engine's inbound queues. It is completely tractable to keep the additional information needed for message forensics temporarily in memory and let the garbage collector clean it up, once we know the message is honest and can drop the respective in-memory reference to the forensics data. My gut feeling is that this might cost us less than 200MB of extra ram per node, for almost all messages combined. There might be 1 or 2 exceptions (most prominently ChunkDataResponse messages), but there we can add special purpose optimizations. But for the large majority of messages, we can temporarily keep both the deserialized event in ram as well as pb.Message (that is including the event again in a serialized form). We can always optimize later.

  • It is generally highly desired to support the rotation of keys. We don't have to implement it now, but we also want to avoid designs making key rotation harder or impossible.

Thoughts on the metrics to rank designs by

  • runtime impact
  • reliability and strength of protection
  • complexity (implementation modularity, depth of required context to maintain code, probability of future extensions introducing subtle security vulnerabilities)

Analyzing the proposals

Regarding proposal-2: Flow-level Signing Policy (FSP)

  • runtime impact:
    BLS signatures (the staking key is a BLS key) are computationally much more costly to generate and verify than ECDSA signatures (networking key).

    With the FSP proposal, we would incur the additional cost of an added BLS signature consistently on the happy path of the protocol.

  • strength of protection:
    It is important to note that the FSP reduces the surface for attributable but not provable protocol violations, but does not eliminate it. Essentially, we are still (with a very small surface) violating Moxie Marlinspike's Cryptographic Doom Principle [1] -- we are leaving some surface, where a node expends resources but cannot prove protocol violations.

I would argue that with the peer ranking system, we already have a good foundational defense. But it leaves that gap of attributable but not provable protocol violations. FSP makes this gap smaller, but it doesn't close it entirely. Therefore, I question whether FSP is impactful enough compared to the required engineering time and the runtime cost, especially since I believe there are ways to close this gap entirely.

  • complexity: I agree that the engineering work is probably not too big of a lift.

Regarding proposal-1: GossipSub Message Forensic (GMF)

Generally I think this is the direction to go. Let's look at our ranking criteria:

  • runtime impact:
    If we keep pb.Message in memory for the short time we need it and then just garbage collect it, we would only expend a bit of ram. No latency or significant computational cost.

  • strength of protection:

    No gap in the security surface. Every attributable protocol violation is conceptually provable. Very strong security guarantee!

  • complexity:
    While I understand the concerns about Implementation Complexities and Disadvantages, I think there are quite pragmatic solutions for those concerns. Overall, I am optimistic that we can achieve a more comprehensive solution without significantly extending the engineering cost compared to FSP.

My thoughts on complexity

  • In my mind, the responsibility to adjudicate slashing challenges lies completely at the protocol layer. Slashing challenges are separate messages that are exchanged within the protocol. They are self-contained and should either reference other messages that the protocol has already embedded in blocks (e.g. execution receipts), or contain the offending message itself (the entire message, incl. all envelopes).

  • I am of the opinion that adjudicating slashing evidence is entirely out of the scope of the networking layer or the engines. They are focused on processing the newest messages necessary for extending the chain. The networking layer and the core protocol logic underlying the engines must be modular enough to provide the primitives for validating slashing evidence outside of the happy-path logic (engines). In other words, the networking layer and core protocol layer should expose very low-level functions that can be called by the adjudication logic when processing slashing challenges and the evidence contained therein.

Suggestions:

  • I think the networking layer should continue doing what it does already: authenticate inbound messages before handing them to the engines (protocol layer).

  • separate VerifyGossipSubMessage from EngineRegistry (see the sketch after this list). Reasoning:

    The adjudication logic should provide the networking key (maybe better: Identity?) for each message it requests to be verified. This is necessary to allow key rotation (e.g. at epoch boundaries) while still being able to verify the authenticity of a message from a past epoch.

    The API for interfacing with the engines is therefore very different from the API for interfacing with the adjudication logic, so I think both should be separate interfaces.

  • Encapsulate all the auxiliary forensics information and the sub-cases for unicast vs. multicast into a ForensicsContext. The attached "Message Forensics" diagram compares the current implementation to my flavour of the GMF proposal.

    The ForensicsContext could look something like this:

    type ForensicsContext interface {
    	Channel() channels.Channel
    	OriginID() flow.Identifier
    
    	// BinaryMessage returns the binary representation of the message from the perspective of the networking layer.
    	// This representation should contain the necessary information to:
    	//  * deserialize the Protocol-Layer message (e.g. for proving a protocol violation to another staked node)
    	//  * cryptographically confirm authenticity and integrity of the Protocol-Layer message
    	//    via the origin's networking key.
    	//  * The origin's public networking key is _not_ contained in the BinaryMessage
    	// In a nutshell, the returned value should present the necessary evidence to prove to a third party that the
    	// message was really sent by the origin. The protocol guarantees that the party inspecting this evidence knows
    	// the origin's public networking key. We cannot include the origin's public networking key here, 
    	// as this would _not_ be BFT.
    	BinaryMessage() []byte // TBD: suitable return type here. Not sure whether byte is the best.
    }
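
For illustration, one possible shape for the interface separation mentioned in the second bullet above could be an adjudication-facing verifier like the following sketch; the interface name, method signature, and import paths are hypothetical assumptions, not existing flow-go APIs.

package adjudication

import (
	pb "github.com/libp2p/go-libp2p-pubsub/pb"
	"github.com/libp2p/go-libp2p/core/crypto"
)

// GossipSubMessageVerifier is what the slashing/adjudication logic would call,
// kept separate from the engine-facing EngineRegistry. The caller supplies the
// public networking key it resolved for the accused node (possibly from a past
// epoch), which keeps key rotation possible.
type GossipSubMessageVerifier interface {
	// VerifyGossipSubMessage checks the envelope signature against the given
	// networking key and, on success, returns the decoded protocol-level event
	// so the adjudicator can check it against the alleged violation.
	VerifyGossipSubMessage(msg *pb.Message, networkingKey crypto.PubKey) (interface{}, error)
}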

My take on the Disadvantages

We need to extend the signature verification mechanism to account for translation of originId from flow.Identifier to networking key and peer.ID (i.e., LibP2P level identifier). As the engines are operating based on the flow.Identifier, while the GossipSub signatures are generated using the Networking Key of the node.

  • I think verifying networking signatures is not the responsibility of engines. The adjudication logic might need to bridge flow.Identifier and peer.ID. Though, the adjudication logic presumably already knows which node is being accused of a protocol violation (that information would be part of the slashing challenge) and has looked up its networking key.

    Not really sure I follow why this is outside of the implementation that we already have. Potentially our existing code needs to be refactored to be more modular, but it is hard for me to see why this would be super complex.

The first step is to ensure the event is wrapped in a GossipSub envelope. If not, the verification fails. For this we need to replicate the entire encoding path down to the GossipSub level as wrapping the Flow message in the GossipSub envelope is done internally at the GossipSub and is not exposed to the Flow codebase. The replication may also cause another layer of coupling that causes breaking changes in the future upgrades of GossipSub.

  • I don't understand why we need to "replicate the entire encoding path down to the GossipSub level". The way I understood the description, we have the raw GossipSub message, which contains all information. We can decode the protocol-layer event from that raw GossipSub message, can't we?

    In a nutshell, this is all that Charlie needs to do when he wants to adjudicate a slashing challenge raised by Alice (see my picture above for context):

    1. verify that pb.Message (contained in Alice's slashing challenge) is in fact originating from Edward. From what I understood, I think that is possible.
    2. from pb.Message, decode the event and verify that it violates the protocol in accordance with Alice's complaint

@jordanschalm (Member) commented Sep 18, 2023

I support the "GossipSub Message Forensic (GMF)" approach, with some of the extensions suggested by Peter & Alex.

Interface Changes

To accommodate a ForensicsContext accompanying messages, we need to modify the MessageProcessor interface. The FLIP proposes this interface, where the envelope is added as a separate parameter.

Process(channel channels.Channel, originID flow.Identifier, event interface{}, envelope *pb.Message) error

I agree with Peter's suggestion of instead including the event itself and forensic data within a single higher-level structure:

Have a high level envelope interface with methods to get the underlying event and forensic data.

There are benefits to passing a Message or envelope interface type rather than an any type, for example:

  • We can uniformly associate metric IDs, resource names, etc. to the Message type, rather than manually selecting the label at each callsite, which is much more susceptible to human error (example)
  • We can uniformly assign Messages identifiers to be able to trace their progress through different components.

Since we need to change the interface for this proposal, we should change it in a way that opens up some of these options in the future without further API changes.

Process(channel channels.Channel, originID flow.Identifier, message flow.Message) error

type Message interface {
    Event() any
    ForensicsCtx() ForensicsContext
    // ...
}

Side Note: As the ForensicsContext will include the origin ID, we could remove this parameter from Process as well.

Adding some colour to the comparison of runtime impact

To explicitly state what Alex touched on in his comment: To enable linking messages to their corresponding network signature and envelope, we need to retain a reference to already-allocated memory for (slightly) longer. We do not need to allocate additional memory for each message.

What is the memory impact of retaining pb.Message references for the complete processing duration of a message?

Here are some back-of-the-envelope calculations:

GC cycles on Mainnet average around 0.5-2 per minute, depending on the node role (metrics link; screenshot omitted).

Suppose it takes 100ms on average to process a message. Then the memory impact would be upper-bounded by the proportion of messages we receive within 100ms of a GC cycle. If we assume the larger 2 GCs/min (i.e. one cycle every 30s), then roughly 100ms / 30s ≈ 0.3% of the memory associated with pb.Message allocations would be retained for an extra GC cycle as a result of the longer retention. Even if all the suggested 16GB RAM were used for pb.Messages, we would incur only a maximum of ~50MB additional cost by retaining the message reference as suggested. In practice, this should be much lower (likely <10MB).
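
For what it's worth, a tiny Go snippet reproducing this bound with the assumed inputs above (2 GC cycles per minute, 100 ms average processing time, all 16 GB pessimistically attributed to pb.Messages); these numbers are the stated assumptions, not measurements:

package main

import "fmt"

func main() {
	const (
		gcPeriodSeconds   = 30.0      // 2 GC cycles per minute
		processingSeconds = 0.1       // assumed average message processing time
		totalRAMMB        = 16 * 1024 // pessimistic: all 16 GB used for pb.Messages
	)
	// Messages still referenced when a GC cycle runs are those received within
	// the last 100 ms of a ~30 s cycle.
	retainedFraction := processingSeconds / gcPeriodSeconds
	fmt.Printf("retained fraction: %.2f%%\n", retainedFraction*100)                // ≈ 0.33%
	fmt.Printf("upper bound on extra RAM: %.0f MB\n", retainedFraction*totalRAMMB) // ≈ 55 MB
}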

@kc1116 commented Sep 19, 2023

I support the "GossipSub Message Forensic (GMF)" approach, articulating my views as follows.

  • I am afraid enforcing the message signing policy at the application layer will add non-negligible overhead to message processing.
  • Although the Flow signing policy will add support for signature verification of messages sent via unicast, the functionality will not be used effectively since we are not doing any "message forwarding" with unicast.
  • Alex's suggestion to keep an in-memory reference to the original GossipSub envelope solves the issue of additional overhead from "passing the full original envelope" to the engine layer.

@tarakby (Contributor) commented Sep 20, 2023

I do not support the "Enforced Flow-level Signing Policy For All Messages"; I support the idea of "GossipSub Message Forensic (GMF)", but some changes may be necessary.

First, I wanted to clarify two concepts or cryptographic services, since they will be used twice in my comment:

  • authentication: the ability to attribute a message to a certain party. For instance, Alice is convinced that a message she received was sent by Bob.
  • non-repudiation: the ability to prove to other parties the origin of a message. For instance, Alice is able to prove to Oscar that a message she received was sent by Bob.

These definitions may have social/legal aspects so we can assume for simplicity that we are not attributing messages to Bob as a party, but we are attributing messages to a party controlling a private key corresponding to some public key shared with the public.
Non-repudiation is a stronger concept than authentication, and it prevents Bob from denying being the origin of some message. I believe @AlexHentschel has mentioned the same concepts and called them attributability (for authentication) and provability (for non-repudiation).
As an example from cryptographic primitives, signature schemes offer both properties, while a message authentication code (MAC) between two parties only offers authentication.
The purpose of this FLIP is to provide protocol-level non-repudiation, since authentication seems to be already implemented on the network layer.

I do not support "Enforced Flow-level Signing Policy For All Messages" because:

  1. It seems to me that it does not provide non-repudiation (I believe @AlexHentschel also pointed this out). If a message has a valid network authentication, but invalid protocol level signature, there is no way the protocol can attribute the message to the original sender, because the protocol does not recognize the network level authentication. I believe this is the issue we wanted to solve in the first place.
  2. It is a redundant level of authentication IMO, as @yhassanzadeh13 pointed out. The networking layer is already authenticating messages, and some protocol messages are already being authenticated using the staking key (consensus votes, for instance). We could disable the extra Flow-level signature for those "already-signed" messages, but that may interfere with engine modularity. Note that the protocol signatures do not always sign the message envelope (some signed messages are even omitted from the payload itself).
  3. A less important point is about the chosen signature scheme. Protocol-level authentication currently uses BLS. BLS is only relevant when multi-signatures are needed (for instance aggregation and batch verification), but it is not optimized for basic signatures (on my laptop, one BLS verification is 13x slower than ECDSA and 16.5x slower than EdDSA, considering our new fast BLS implementation).

While I support "GossipSub Message Forensic (GMF)", I wanted to clarify a few points, in particular about our current implementation using libp2p:

  1. I think it makes sense to differentiate the protocol specs from its implementation. We may want to base our specs on libp2p (since it is used in the current and only implementation of Flow nodes), but we should be able to describe the attribution data in a spec. The attribution data is part of a challenge that would eventually be posted in blocks and the protocol state, and should therefore be described regardless of libp2p or any other implementation.
  2. we are using libp2p unicast (for 1-1) and PubSub (for 1-many). I am going to only consider unicast messages (I didn't dig deep into libp2p's pubsub):
    1. libp2p isn't simply signing every network payload with the sender's networking private key and then verifying it on the receiving end with the sender's public key. This is the complexity I believe @yhassanzadeh13 mentioned (also answering your question @AlexHentschel). The networking key isn't used to sign any network message; it only signs a second-level key called a "static key", and therefore delegates authentication to the static key. This is a choice libp2p made to allow multiple signature schemes for the networking key (Flow doesn't need this choice). The static keys are then used in a Noise handshake involving other ephemeral keys to perform a key agreement. The symmetric keys are then used in an authenticated encryption (Flow doesn't need encryption here, but it is enabled by default in libp2p). The symmetric keys are what provide authentication, not the networking keys nodes share when staking. I discussed this a long time ago when we switched from SECIO to Noise (low-level security protocols used by libp2p). OK, all this sounds complicated, but can we still make libp2p export all the data needed for attribution and trace it back to the networking key?
    2. the key agreement above is a Diffie-Hellman (DH) involving both static and ephemeral keys. Proving correctness of the shared key to third parties includes exporting the static private key of the node ⚠️
    3. the shared key, as its name says, is shared between both ends of the communication. Authentication is provided by authenticated encryption (AEAD). You can think of this as encryption mixed with a MAC (if you're not familiar with a MAC, it is the symmetric version of a signature). As I mentioned at the beginning of my message, a MAC does not provide non-repudiation. Evil Bob can always claim he never signed the payload (even though honest Alice knows he did), because the signing key is shared ⚠️

I didn't look at PubSub in detail; it seems that the networking key (not the libp2p static key) is used to sign original payloads (which is great news), but we still need to confirm it.

If we want to implement GMF with libp2p we would need to make some changes to libp2p and its use of Noise for 1-1 (maybe through a fork). I will stop my reply here and we can get into ideas on how to update libp2p later.

@tarakby (Contributor) left a comment

Nice write-up 👏🏼

Within the Flow protocol, nodes converse through a networking layer, a medium which undertakes the dispatch and receipt of messages among nodes
via different communication methods: unicast, multicast, and publish. Unicast messages are transmitted to a single recipient over a direct connection.
The multicast and publish, on the other hand, utilize a pub-sub network constructed with the LibP2P GossipSub protocol for message
dissemination. However, this protocol encounters challenges in message attribution, particularly in determining the

The context here is great. It would also be clearer if we recalled the Flow protocol state assumptions (for each node, the protocol state tracks two public keys: a staking key and a networking key).

2. This approach adds a computation overhead to the Flow protocol engines, as the engines must sign all the messages that are sent through the `Conduit` interface, and on the receiving side, the
networking layer must verify the signature of the message against the Staking Key of the sender. This overhead is not negligible, as the Flow protocol engines are the most performance critical components of the Flow blockchain.
Hence, we must carefully evaluate the performance overhead of this approach. Moreover, with this approach, we are extending the size of data sent over the wire by piggybacking the signature.
Assuming that we are using ECDSA with `secp521r1` curve and SHA-512 for hashing, the signature size is ~100 bytes (in theory). Hence, we are adding ~100 bytes to the size of data sent over the wire.

This example is correct, but using secp256r1 as an example gives a number closer to what we have in Flow (secp521r1 provides higher security than we require for Flow, while secp256r1 is what the networking layer is using).
We can also omit the hashing algorithm because it doesn't impact the signature length.

Suggested change
Assuming that we are using ECDSA with `secp521r1` curve and SHA-512 for hashing, the signature size is ~100 bytes (in theory). Hence, we are adding ~100 bytes to the size of data sent over the wire.
Assuming that we are using ECDSA with `secp256r1` curve, the signature size is 64 bytes. Hence, we are adding 64 bytes to the size of data sent over the wire.

@AlexHentschel (Member)

Thanks @tarakby for your detailed comment. Let's try to use the established terminology of authentication and non-repudiation going forward. Thanks Tarak for clearly explaining the concepts.

@KshitijChaudhary666 (Contributor)

Hi @yhassanzadeh13 - this FLIP is not reflected on the FLIP project tracker. Did you follow the process outlined in https://github.com/onflow/flips? Specifically, please remember to do the following, without which the FLIP won't get visibility on the project tracker:

Create an issue by using one of the FLIP issue templates based on the type of the FLIP - application, governance, cadence or protocol. The title of the issue should be the title of your FLIP, e.g., "Dynamic Inclusion fees". Submit the issue. Note the issue number that gets assigned.
Then, create your FLIP as a pull request to this repository (onflow/flips). Use the issue number generated in the previous step as the FLIP number, and mention the FLIP issue by copying its GitHub URL in the comment section.

Thank you!

@KshitijChaudhary666 (Contributor) commented Apr 11, 2024

Hi @yhassanzadeh13 - following up on my message above.

This FLIP is not reflected on the FLIP project tracker. Can you please follow the process outlined in https://github.com/onflow/flips? Specifically the following, without which the FLIP won't get visibility on the project tracker:

  • Create an issue by using one of the FLIP issue templates based on the type of the FLIP - application, governance, cadence or protocol
  • The title of the issue should be the title of your FLIP, e.g., "Dynamic Inclusion fees". Submit the issue. Note the issue number that gets assigned
  • Use the issue number generated in the previous step as the FLIP number, and mention the FLIP issue by copying its GitHub URL in the comment section

@AlexHentschel (Member) commented Apr 11, 2024

@KshitijChaudhary666 Yahya has moved on to a new professional opportunity.
This FLIP is currently iceboxed (at least from the perspective of the Flow Foundation resourcing it - which does not preclude community contributors from picking it up, though they would need to align technical details with us). In summary, it is up to us now to do with this FLIP whatever we do with FLIPs currently on ice (please let me know).

@AlexHentschel (Member) left a comment

I am wondering what state we should be targeting for this PR. There is a lot of important and valuable discussion here in the PR, which might still lead to changes in the proposed FLIP.

@vishalchangrani (Contributor)

I am wondering what state we should be targeting for this PR. There is a lot of important and valuable discussion here in the PR, which might still lead to changes in the proposed FLIP.

hey @AlexHentschel - we will not close the PR.

Just to be clear,
The issue to track this FLIP is: #259
The FLIP ID is the same as the issue ID (as per the new process): 259.
This PR can remain open till there is agreement on the FLIP.
