
Architecture and Components of the Node

Overview of the tasks performed by the Node

The MultiversX Network is a peer-to-peer network of machines called Nodes. The Nodes perform all the tasks required by the MultiversX protocol: they receive Transactions from users and propagate them, execute Transactions (including SmartContract calls), participate in Consensus and produce Blocks, keep their clocks in sync with the rest of the Network, propagate and request information, and tolerate faults and malicious behavior.

Because the complicated tasks listed above must be performed reliably, in a reproducible manner and as efficiently as possible, the Node is necessarily a complex piece of software.


The tasks performed by the Node, in more detail

Receiving Transactions

The main purpose of the Nodes in the MultiversX Network is to execute the Transactions requested by its users. When a user wants a Transaction executed, they submit it to any Node's REST API; the receiving Node then propagates the Transaction throughout the rest of the Network, which leads to its execution and its eventual inclusion in the Blockchain. This means that the REST APIs of the Nodes collectively represent the "entry point" through which the Network receives information and requests from the outside world.

Note that people who run Nodes on their own machines may choose to disable the REST API, allowing their Nodes to focus strictly on producing blocks for the Network. On the other hand, there are Nodes in the Network that are dedicated "entry points" and never produce Blocks (but contribute meaningfully in other ways). See below for details.

The Wallet

While any user can submit a desired Transaction to any Node directly through its REST API, there is a more convenient method of submitting Transactions to the Network at large: MultiversX provides a Web application which, among other features, enables its users to easily submit Transactions to the Network without having to worry about technicalities such as composing a valid Transaction, signing it and selecting an appropriate Node to submit it to. This Web application is the MultiversX Wallet, which currently submits Transactions to the Testnet and is accessible at https://testnet-wallet.multiversx.com/.

To create a Transaction, the Wallet takes user input, such as the destination Account and the sum of EGLD to transfer, then uses this input to perform the operations described in the section Creating a Transaction. The resulting Transaction is passed to the Proxy, which is an intermediary application that handles incoming Transaction requests from the Wallet and submits them to certain dedicated Nodes in the MultiversX Network.
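As a rough illustration of the flow above, the sketch below composes a simplified Transaction and submits it to a Node's (or the Proxy's) REST API. The field names, the /transaction/send path and the localhost address are assumptions of this example rather than the authoritative wire format, and the signature would in reality be produced by the Wallet from the sender's secret key.

```go
// Sketch: composing a simplified Transaction and submitting it to a Node's REST API.
// Field names, the endpoint path and the address are illustrative assumptions.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

type Transaction struct {
	Nonce     uint64 `json:"nonce"`
	Value     string `json:"value"`    // amount of EGLD, in its smallest denomination
	Receiver  string `json:"receiver"` // Destination Account address
	Sender    string `json:"sender"`   // Sender Account address
	GasPrice  uint64 `json:"gasPrice"`
	GasLimit  uint64 `json:"gasLimit"`
	Signature string `json:"signature"` // produced by the Wallet with the sender's secret key
}

func submit(apiURL string, tx Transaction) error {
	body, err := json.Marshal(tx)
	if err != nil {
		return err
	}
	// The Proxy would pick an online Observer in the Sender's Shard; here we post directly.
	resp, err := http.Post(apiURL+"/transaction/send", "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	fmt.Println("submitted, status:", resp.Status)
	return nil
}

func main() {
	tx := Transaction{Nonce: 7, Value: "1000000000000000000", Receiver: "erd1...", Sender: "erd1...", GasPrice: 1_000_000_000, GasLimit: 50_000, Signature: "..."}
	_ = submit("http://localhost:8080", tx) // address of a Node's REST API (placeholder)
}
```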

The Proxy

While any Node in the Network can accept Transaction requests, the MultiversX Wallet submits Transactions to the Proxy application, which maintains a list of Nodes to forward Transaction requests to - these Nodes are selected in such a manner that any Transaction submitted to them will be processed by the Network as quickly and as efficiently as possible.

The Proxy will submit a Transaction on behalf of the user to the REST API of one of its listed Nodes, selected for (1) being online at the moment and (2) being located within the Shard to which the Sender's Account belongs (see Executing Transactions for the reason of this second criterion). After receiving the Transaction on its REST API, that specific Node will propagate the Transaction throughout the Network, which will lead to its execution.

Note that the Nodes enlisted by the Proxy for Transaction submission are not just some random Nodes - they are specific Nodes which do not ever participate in Consensus, also known as Observer Nodes (as opposed to the normal Nodes, called Validator Nodes). Observer Nodes thus act as a default dedicated "entry point" into the Network. Moreover, Observer Nodes play an important role in the health and stability of the Network, because they act as fast propagators of information and have a lot of storage space. This means that the performance requirements of running a Node can be lowered for normal Nodes, and the machines that run normal Nodes are not as stressed as they would be without Observers.

It is worth repeating here, though, that submitting a Transaction through the Wallet, and implicitly to the Observer Nodes, is completely optional - any Node of the Network will accept Transactions to propagate, provided it has not disabled its REST API.

Executing Transactions

In its simplest form, a Transaction is the atomic transfer of EGLD between two specific Accounts. After a Transaction is executed, committed to the Blockchain and declared final, the Transaction takes full effect and can never be reverted. More complex Transactions also require executing SmartContracts and storing the results, but the principle remains the same for all Transactions:

  • Transactions must arrive at Nodes through Gossip
  • Transactions involve two Accounts (Sender and Destination)
  • Transactions are executed atomically (a Transaction has no effect until it is executed completely)
  • Transactions are incremental changes of the global State
  • Transactions are stored in bulk in the form of Blocks
  • Transactions must be first declared final before taking full effect
  • Transactions cannot be reverted once declared final

Nodes receive Transactions from the outside world on their REST API, which they propagate throughout the Network for execution. The execution of a Transaction involves a set of verifications against any faults it may contain, then its inclusion in a Block that was proposed and validated by the Nodes participating in a Consensus process.

Executing any kind of Transaction requires a non-negligible computational effort to be expended by the Nodes. This effort involves the consumption of time, computational power and electrical energy - real-world resources which cost real-world money. To compensate people who run Nodes on their machines, the MultiversX Protocol dictates that Nodes will receive compensation for their work, paid in EGLD.

But the exact amount of real-world resources consumed by a Node to execute a Transaction is difficult to quantify. A solution to this problem, popular across many Blockchain platforms, is to use an abstract unit called Gas for quantifying computational effort. Gas is then used to calculate the amount of EGLD received by Nodes as compensation for their work. Measuring computational effort in Gas instead of CPU cycles, Joules or the market price of hardware is a simplification, of course, but it is very useful and flexible. The exact amounts of Gas consumed by the Node for the operations it performs are described in the section Gas consumed by operations.

In principle, executing a Transaction would be a straightforward process, regardless of its type, but in practice, it is complicated by the fact that the MultiversX Network is sharded (i.e. fragmented) in order to achieve higher efficiency. More specifically, Sharding affects the execution of Transactions due to State Sharding, which separates and distributes the information required to execute Transactions across a carefully controlled number of Shards. As a direct consequence, information about the Accounts themselves is divided throughout the Network. And to execute Transactions, a Node requires information about both the Sender Account and Destination Account. But because a Node is assigned to a single Shard at a given moment in time, it might find itself in the situation where it must execute a Cross-Shard Transaction, i.e. a Transaction where the Sender Account and Destination Account are located in different Shards. In contrast, Transactions where the Sender Account and Destination Account are located in the same Shard are called Intra-Shard Transactions.

It is important to remember that the execution of any Transaction is always initiated by Nodes belonging to the Shard of the Sender Account, regardless of the type of Transaction. This approach enables a relatively simple solution to the challenge posed by Cross-Shard Transactions, because the Nodes that initiate their execution already have half the required information: they have the Sender Account information by virtue of belonging to the Shard of the Sender Account.

To keep the Blocks tidy and organized, the Transactions they contain are grouped into Miniblocks, which are lists of Transactions of the same type. For example, there are Miniblocks containing value-transferring Cross-Shard Transactions where the current Shard is the Source Shard, and there are Miniblocks containing SmartContract results from Cross-Shard Transactions where the current Shard is the Destination Shard. See Types of Miniblocks within a Block for more details.

As one would expect, Intra-Shard Transactions are easier to execute and will finalize earlier than Cross-Shard Transactions, which require extra steps. The MultiversX Network aims to organize the Account Space in a way that minimizes the number of Cross-Shard Transactions and maximizes the number of Intra-Shard Transactions. See the subsections below for details on how both types of Transactions are executed.
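The sketch below illustrates the distinction just described: a Transaction is Intra-Shard or Cross-Shard depending on the Shards to which the Sender and Destination Accounts are assigned. The shardOf function is a hypothetical stand-in for the protocol's real address-to-shard assignment.

```go
// Sketch: classifying a Transaction as Intra-Shard or Cross-Shard.
// shardOf is a hypothetical stand-in for the protocol's real address-to-shard assignment.
package main

import "fmt"

func shardOf(address []byte, numShards uint32) uint32 {
	// Illustrative assignment: use the last byte of the address modulo the shard count.
	return uint32(address[len(address)-1]) % numShards
}

func isCrossShard(sender, receiver []byte, numShards uint32) bool {
	return shardOf(sender, numShards) != shardOf(receiver, numShards)
}

func main() {
	sender := []byte{0xaa, 0x01}   // placeholder addresses
	receiver := []byte{0xbb, 0x02}
	fmt.Println("cross-shard:", isCrossShard(sender, receiver, 3)) // shards 1 and 2 -> true
}
```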

Executing Intra-Shard Transactions

An Intra-Shard Transaction is a Transaction between two Accounts assigned to the same Shard. This is the simpler kind of Transaction to execute, as opposed to executing Cross-Shard Transactions, which require extra steps.

As emphasized in section Executing Transactions, the execution of a Transaction is always initiated by Nodes that belong to the Shard of the Sender Account, regardless of the type of Transaction. Since an Intra-Shard Transaction involves Accounts assigned to the same Shard (as its name implies), the Nodes executing it already have all the information required.

The steps of execution are as follows:

  1. Certain Nodes of the Shard begin the Consensus process.
  2. The Leader Node of the Consensus group proposes a Block which includes the Intra-Shard Transaction. This results in the sum of EGLD being deducted from the Sender Account and then added to the Destination Account.
  3. The rest of the Nodes in the Consensus group validate the proposed Block and verify the results available in the Block Header (e.g. root hashes).
  4. If all Nodes in the Consensus group agree on the same resulting Block, then the Block is notarized by the Metachain and the Consensus process ends.
  5. Later, after some more Blocks have been notarized by the Metachain, the Block that contained the executed Intra-Shard Transaction is declared final, which means that all the Transactions it contains are now final as well, including the Intra-Shard Transaction discussed here.
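The balance change produced in step 2 above can be sketched as follows. This is a minimal illustration, assuming simplified Account and nonce handling and ignoring fees; the real state transition is performed against the Shard's account trie.

```go
// Sketch of the state change in step 2: the amount is deducted from the Sender and
// added to the Destination atomically (fees and the account trie are omitted).
package main

import (
	"errors"
	"fmt"
	"math/big"
)

type Account struct {
	Nonce   uint64
	Balance *big.Int
}

func executeIntraShard(sender, dest *Account, value *big.Int, txNonce uint64) error {
	if txNonce != sender.Nonce {
		return errors.New("nonce mismatch")
	}
	if sender.Balance.Cmp(value) < 0 {
		return errors.New("insufficient balance")
	}
	// No partial effect: either both changes happen or neither does.
	sender.Balance.Sub(sender.Balance, value)
	dest.Balance.Add(dest.Balance, value)
	sender.Nonce++
	return nil
}

func main() {
	alice := &Account{Nonce: 4, Balance: big.NewInt(1000)}
	bob := &Account{Balance: big.NewInt(0)}
	if err := executeIntraShard(alice, bob, big.NewInt(250), 4); err != nil {
		panic(err)
	}
	fmt.Println(alice.Balance, bob.Balance) // 750 250
}
```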

Note that Intra-Shard Transactions are stored in Miniblocks within a Block, separate from Cross-Shard Transactions (which are also stored in their own Miniblocks). See the linked sections for more details.

Executing Cross-Shard Transactions

A Cross-Shard Transaction is a Transaction between two Accounts assigned to different Shards. This is the more complex kind of Transaction to execute, as opposed to executing Intra-Shard Transactions, which are simpler and require fewer steps.

As emphasized in section Executing Transactions, the execution of a Transaction is always initiated by Nodes that belong to the Shard of the Sender Account, regardless of the type of Transaction. This means that a Node executing a Cross-Shard Transaction has only half the information it requires to completely execute the Transaction, namely the Sender Account. To execute the Transaction in full, the Node would hypothetically require the Destination Account to be retrieved from the Shard it is assigned to. But this would go against the main advantage of Sharding, namely keeping information separated and processing it separately in order to gain performance. To avoid this contradiction, MultiversX Nodes employ an alternative method of executing Cross-Shard Transactions: they execute half of the Cross-Shard Transaction inside the Shard of the Sender Account, then send the partial result to the Shard of the Destination Account for completion. The steps are as follows:

  1. Certain Nodes of the Shard begin the Consensus process.
  2. The Leader Node of the Consensus group proposes a Block which includes the half-executed Cross-Shard Transaction. This results in the sum of EGLD being deducted from the Sender Account, but the sum is not added to the Destination Account, because the information on the Destination Account is stored by Nodes of a different Shard.
  3. The rest of the Nodes in the Consensus group execute the proposed Block and validate the result available in the Block Header (e.g. root hashes).
  4. If all Nodes in the Consensus group agree on the same resulting Block, then the Block is notarized by the Metachain and the Consensus process ends.
  5. Nodes from the Shard of the Destination Account receive, through Gossip, the notarized Block from the Metachain.
  6. The Nodes from the Shard of the Destination Account notice that the Block contains a half-executed Transaction, of which their Shard is the Destination.
  7. These Nodes then execute the second half of the Transaction, namely adding the transferred sum of EGLD to the Destination Account during their own Consensus process, and add the fully executed Transaction to the Block they produce.
  8. If all Nodes in the Consensus group of Shard of the Destination Account agree on the same resulting Block, then the block is notarized by the Metachain, ending their Consensus.
  9. Later, after some more Blocks have been notarized by the Metachain, the Block that contained the fully executed Cross-Shard Transaction is declared final, which means that all the Transactions it contains are now final as well, including the Cross-Shard Transaction discussed here.
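The two halves of this process can be sketched as below: the debit performed in step 2 in the Sender's Shard and the credit performed in step 7 in the Destination's Shard. As with the Intra-Shard sketch, the Account model, nonce handling and fee handling are simplified assumptions.

```go
// Sketch of the two halves of a Cross-Shard Transaction: the Sender's Shard debits
// the Sender (step 2); the Destination's Shard credits the Destination (step 7)
// after it sees the notarized, half-executed Transaction.
package main

import (
	"errors"
	"fmt"
	"math/big"
)

type Account struct {
	Nonce   uint64
	Balance *big.Int
}

// Phase 1: executed in the Shard of the Sender Account.
func debitAtSource(sender *Account, value *big.Int, txNonce uint64) error {
	if txNonce != sender.Nonce || sender.Balance.Cmp(value) < 0 {
		return errors.New("invalid transaction")
	}
	sender.Balance.Sub(sender.Balance, value)
	sender.Nonce++
	return nil
}

// Phase 2: executed in the Shard of the Destination Account, after the Metachain
// notarizes the Block produced in phase 1.
func creditAtDestination(dest *Account, value *big.Int) {
	dest.Balance.Add(dest.Balance, value)
}

func main() {
	sender := &Account{Nonce: 1, Balance: big.NewInt(500)}
	dest := &Account{Balance: big.NewInt(100)}
	if err := debitAtSource(sender, big.NewInt(200), 1); err == nil {
		// ... Consensus, Metachain notarization and Gossip happen in between ...
		creditAtDestination(dest, big.NewInt(200))
	}
	fmt.Println(sender.Balance, dest.Balance) // 300 300
}
```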

Note that Cross-Shard Transactions are stored in Miniblocks within a Block, separate from Intra-Shard Transactions (which are also stored in their own Miniblocks). See the linked sections for more details.

Executing SmartContracts

  • Describe what a Transaction with a SmartContract call looks like
  • Describe what a Transaction with a SmartContract deployment looks like
  • Connect this section with the Processor components
  • The VM needs access to info such as Accounts, Blocks, Transactions etc because SCs might need such info
  • Describe here what any SC VM must be able to do: take specialized input, provide specialized output, meter the execution in Gas units, provide "logs"

A Transaction is usually a value-transferring operation with a relatively simple execution, but there is a special kind of Transaction which, on top of transferring value, also requests the execution of custom code. Such custom code must have already been published to the Blockchain in the form of a SmartContract (TODO link) before anyone can request its execution. Publishing a SmartContract to the Blockchain is called deploying the SmartContract (performed as a special Transaction), while requesting the execution of code from within a SmartContract is referred to as calling the SmartContract. Both the deployment of a SmartContract and a call to one require a component dedicated to handling SmartContracts: the Virtual Machine.
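As a hedged illustration of what a SmartContract call Transaction might carry, the sketch below builds a data field following the "function@hex-encoded-argument@..." convention used in MultiversX documentation; treat the exact encoding (and the transferToken endpoint name) as assumptions of this example rather than the normative format.

```go
// Sketch: building the data field of a SmartContract call Transaction using the
// "function@hex-encoded-arg@..." convention; the encoding and the endpoint name
// are assumptions of this example.
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

func buildCallData(function string, args ...[]byte) string {
	parts := []string{function}
	for _, a := range args {
		parts = append(parts, hex.EncodeToString(a))
	}
	return strings.Join(parts, "@")
}

func main() {
	// e.g. calling a hypothetical "transferToken" endpoint with two arguments
	data := buildCallData("transferToken", []byte("TOKEN-abcdef"), []byte{0x64})
	fmt.Println(data) // transferToken@544f4b454e2d616263646566@64
}
```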

Participating in Consensus

Nodes of the MultiversX Network append new Blocks to the Blockchain with consistent regularity: put briefly, a new Block is produced and validated every Round, regardless of how many Transactions had to be executed. This means that even if there are no user-requested Transactions, a Block will still be produced, containing Reward Transactions. Ideally, there will be one Block added to the Blockchain in each Round, although Nodes are built to handle unexpected situations where no Block was produced in a Round or even situations where multiple conflicting Blocks were accidentally produced in the same Round due to severe connectivity issues (see Fork resolution).

The Consensus process lies at the heart of the Protocol, because almost everything the Nodes do is either in preparation for, or as a consequence of, a Consensus process. And because a Consensus process results in the next Block to be added to the Blockchain, its security is of high importance. The MultiversX Network relies on the Boneh-Lynn-Shacham (BLS) signing scheme, which allows for a very efficient Consensus process. This efficiency comes from the fact that BLS signing requires no randomness, so the "commitment hashes" needed by other signing schemes are not required. This eliminates a remarkable amount of overhead.

At any given moment, there are as many Blockchains in the MultiversX Network as there are Shards (Blockchains and Shards may merge and split, though - see Adaptive State Sharding). This grants the Network its efficiency and high throughput, but it also means that there is a separate Consensus process executed within each Shard per Round. The Metachain is no exception, although its Consensus process is only slightly different from the Consensus within Shards.

While each Shard executes a Consensus process in each Round, this does not involve all the Nodes. In fact, only a selected subset of the Nodes in a Shard will take part in Consensus, and it's always a different subset, as described in the linked section. The selected Nodes form what is called the Consensus Group, and out of this Group, one of the Nodes will be the Leader of Consensus. This selection is random and is designed to be unpredictable ahead of time.

During Consensus, the selected Leader produces a Block containing executed Transactions and their results and propagates it to the Consensus Group (and to the rest of the Shard, for speculative execution - see below). Then the members of the Consensus Group will validate it and will send their signed approval messages back to the Leader, which assembles an aggregated signature of the entire Consensus Group and seals the Block.

Nodes that have not been selected for Consensus in a Round will still follow the process and validate the Block themselves, in order to stay synchronized. This is a speculative approach, of course, because Consensus may fail in that Round and the Block that they validated will be invalid. But the Consensus is very likely to finish successfully, and in this case all Nodes in the Shard will already have the Block in their own working memory, and can commit it to storage. This saves both time and bandwidth, because the new Block doesn't need to be propagated through the Network anymore.

Note that all Nodes process and validate all the Blocks themselves, regardless of whether they're part of Consensus or not, in order to update their local State and thus to remain synchronized.

Step-by-step Consensus

An entire Consensus process lasts for a single Round, but a Round is further divided into Subrounds, corresponding to the tasks that the Nodes must complete in sequence and in finite amounts of time. Keep in mind that the end of a Subround is a deadline, and exceeding the time budgeted for any Subround results in the failure of the Consensus process.

The Consensus process is governed by the Consensus State Machine, which handles the sequence of Subrounds and their transitions.

The Subrounds of the Consensus are as follows:

  1. The Start Subround, in which the Consensus State Machine is initialized and Nodes are chosen for Consensus. Time allocated to this Subround: 5% of the total Round duration.
  2. The Block Subround, in which a Block is produced by the Leader and proposed to the Consensus Group. Nodes not part of Consensus receive the Block as well, and process it speculatively. Time allocated to this Subround: 20% of the total Round duration.
  3. The Signature Subround, in which the Leader awaits signatures from the Consensus Group, expressing their agreement on the Block. Time allocated to this Subround: 60% of the total Round duration.
  4. The End Subround, in which the Leader aggregates the signatures received from the Consensus Group, then broadcasts the Block and commits it to local storage. The rest of the Consensus Group commits it as well. Time allocated to this Subround: 10% of the total Round duration.

The remaining 5% of the Round time is TODO.

The Node will extend the Subrounds if needed. TODO expand TODO explain processingThresholdPercentage
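Using the percentages listed above, the deadlines of the Subrounds can be derived from the Round duration as in the sketch below (a 6-second Round is used purely as an example value; the real Node also applies the extension behavior mentioned above).

```go
// Sketch: deriving Subround deadlines from the Round duration, using the
// percentages listed above (5% Start, 20% Block, 60% Signature, 10% End).
package main

import (
	"fmt"
	"time"
)

func subroundDeadlines(roundStart time.Time, roundDuration time.Duration) map[string]time.Time {
	budget := []struct {
		name  string
		share float64
	}{{"start", 0.05}, {"block", 0.20}, {"signature", 0.60}, {"end", 0.10}}

	deadlines := map[string]time.Time{}
	elapsed := 0.0
	for _, s := range budget {
		elapsed += s.share
		deadlines[s.name] = roundStart.Add(time.Duration(elapsed * float64(roundDuration)))
	}
	return deadlines
}

func main() {
	// A 6-second Round is used purely as an example value.
	fmt.Println(subroundDeadlines(time.Now(), 6*time.Second))
}
```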

The Start Subround

At the beginning of the Consensus process, the Consensus State Machine is initialized and the Consensus Group is formed by randomly selecting Nodes from the Shard. All Nodes must calculate the Consensus Group, and some will, at this moment, discover that they will participate in Consensus, while the rest will discover that they won't participate. Because this process relies on the randomness generated by the Nodes, they cannot know beforehand what the Consensus Group will be - it can only be calculated after the previous Block has been generated (see the linked section for details).

In short, the Consensus Group in a Shard is a random subset of the Nodes of that Shard. The probability of any Node to be selected for Consensus is not uniform, however: Node Rating is taken into account, and Nodes with a higher Rating factor have a higher probability of being chosen for Consensus. The first Node that happens to be selected by this process will be designated Leader of Consensus. By design, this process prevents Nodes from influencing their election as Leader. Selecting Nodes for the Consensus process of a Round is a task fulfilled by the NodesCoordinator component.

The other members of the Consensus Group are called Consensus Validators. Note however that any Node currently connected to the Network is also called a Validator, because of their potential of eventually being selected in a Consensus Group. This sets them apart from non-selectable Nodes such as Observer Nodes, or Nodes placed in a Shard's waiting list (TODO link relevant section).

The size of the Consensus Group (i.e. the number of Nodes participating in a Consensus process) is currently a configurable parameter of the Network, set in the nodesSetup.json file, although in the future it will be a parameter subject to the Network's Governance system, or it may be computed dynamically as a function of Shard size.
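A minimal sketch of the rating-weighted selection described above is shown below, with the first selected Node acting as Leader. The real NodesCoordinator derives its randomness from the previous Block's randomness source; a seeded pseudo-random generator stands in for it here, and the weighting scheme is only illustrative.

```go
// Sketch: rating-weighted random selection of a Consensus Group; the first selected
// Node is the Leader. A seeded generator stands in for the randomness source that
// the real NodesCoordinator derives from the previous Block.
package main

import (
	"fmt"
	"math/rand"
)

type Node struct {
	Name   string
	Rating int // higher Rating => higher probability of selection
}

func selectConsensusGroup(shard []Node, groupSize int, seed int64) []Node {
	rng := rand.New(rand.NewSource(seed))
	pool := append([]Node(nil), shard...) // work on a copy of the shard's node list
	group := []Node{}
	for len(group) < groupSize && len(pool) > 0 {
		total := 0
		for _, n := range pool {
			total += n.Rating
		}
		pick := rng.Intn(total)
		for i, n := range pool {
			if pick < n.Rating {
				group = append(group, n) // group[0] acts as the Leader
				pool = append(pool[:i], pool[i+1:]...)
				break
			}
			pick -= n.Rating
		}
	}
	return group
}

func main() {
	shard := []Node{{"A", 50}, {"B", 80}, {"C", 100}, {"D", 60}}
	fmt.Println(selectConsensusGroup(shard, 3, 42))
}
```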

The behavior of Nodes in this Subround is defined by the SubroundStart component. As stated at the beginning of this section, the Start Subround is allocated 5% of the total Round time.

The Block Subround

The Leader must accomplish two tasks in this Subround: (1) produce a new Block out of the outstanding information waiting to be processed, and (2) broadcast the Block to the Consensus Group. This procedure is called proposing a Block. While the Leader is busy with these tasks, the Consensus Validators are waiting.

Once the Leader completes its tasks, it is the Validators' turn to do their part: they must process the proposed Block and verify it, in order to send their confirmations back to the Leader in the Signature Subround.

The behavior of Nodes in this Subround is defined by the SubroundBlock component. As stated at the beginning of this section, the Block Subround is allocated 20% of the total Round time.

Responsibilities of the Leader

In short, the Leader must pick Transactions from its current Data Pool to be processed. The Leader may freely choose any Transactions for processing, but by default the Leader will prioritize Cross-Shard Transactions over Intra-Shard Transactions, because Cross-Shard Transactions require extra time to be completed, as compared to Intra-Shard ones. Moreover, the Nodes are programmed to be greedy, which means that the Leader will also prioritize Transactions which yield the highest rewards.

After choosing the Transactions to be processed, the Leader executes each one in the correct order (it must respect the consecutive ordering of Account Nonces). The Leader then assembles a Block structure from these Transactions. The resulting Block structure is then hashed and the hash is written to its Header. Producing a Block is described in more detail in its dedicated section.

After the Block is complete, the Leader serializes it to a binary representation and broadcasts it to the rest of the Consensus Group, for validation. Note that the Block is broadcasted to the Consensus Group not as a single structure, but separately as Block Header and Block Body. The reason for this division lies in how P2P messages are broadcasted. This is subject to future optimization though. See Communicating during Consensus for details on how the Leader broadcasts the proposed Block to the Consensus Validators.

The Validators validate

During the Block Subround, the Consensus Validators await a Block from the Leader, doing nothing in particular until both components of the Block (Header and Body) have arrived. But once the components of the Block arrive, the Validators react by performing the following steps:

  1. They ensure that the information in the Header matches what they would expect, such as correct Epoch and correct Round;
  2. They identify what Transactions have been selected by the Leader for this Block, by inspecting the Miniblocks referenced in the Header;
  3. They ensure they have all the information needed before proceeding to execute the Transactions in the proposed Block, and request from their Peers what is missing (or from the Metachain as well, if needed);
  4. Once they have all the information, they iterate over the Transactions and execute them, modifying their local state;
  5. They verify whether the results of their own execution match the Block proposed by the Leader. This includes checking all Miniblock Hashes and their Shard references, comparing the Root Hash of the proposed Block with the Root Hash produced locally, and other in-depth verifications against the proposed Block. A single error or mismatch is enough for a Validator to reject the Block and withhold its signature in the following Signature Subround.

The steps listed above are performed by the BlockProcessor component.

See Communicating during Consensus for details on how the Leader broadcasts the proposed Block to the Consensus Validators.

The Signature Subround

In the previous Subround, the Validators should have received a proposed Block from the Leader and should have processed and validated it. If a Validator has found no problems with the proposed Block, then it will notify the Leader of its approval by broadcasting a special message. To construct this message, each Validator will feed the hash of the proposed Block into a multi-signature scheme: the hash is signed with the Validator's secret BLS key in a manner that allows this signature to be aggregated later with the signatures of the other Validators. This means that the Leader will be able to gather these messages from the Validators and construct an aggregate signature by combining the individual partial signatures into a single signature, which is now verifiable with the public BLS key of each Validator that contributed to it.

The Leader spends this Subround awaiting partial signatures of the hash of the proposed Block. Whenever it receives a partial signature from a Consensus Validator, the Leader stores it for the next Subround, when they'll be aggregated. Once the Leader has two thirds of the signatures plus one, the proposed Block is considered accepted and further signatures that arrive will be ignored.
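The acceptance threshold mentioned above is a simple integer check, sketched below for illustration.

```go
// Sketch: the acceptance threshold used by the Leader in the Signature Subround -
// two thirds of the Consensus Group plus one partial signatures must be collected.
package main

import "fmt"

func isBlockAccepted(signaturesReceived, consensusGroupSize int) bool {
	threshold := consensusGroupSize*2/3 + 1
	return signaturesReceived >= threshold
}

func main() {
	fmt.Println(isBlockAccepted(42, 63)) // 43 signatures needed out of 63 -> false
	fmt.Println(isBlockAccepted(43, 63)) // true
}
```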

It is implemented as the SubroundSignature component. As stated at the beginning of this section, the Signature Subround is allocated 60% of the total Round time.

The End Subround

The final steps of the Consensus process are performed mostly by the Leader, which must (1) aggregate the partial signatures of the Validators and add the aggregated signature to the Header of the proposed Block, (2) sign the Block with its own secret BLS key, and finally (3) broadcast the Block throughout the Shard (not just to the Consensus Validators). In the meantime, the Leader commits the complete Block to its local storage.

The Consensus Validators will now await the complete Block from the Leader, and when it is received, they commit it to their local storage.

Note that the entire Shard has been listening to the Consensus process; this means that the rest of the Nodes in the Shard should now have the new Block, which they must now analyze using their Sync mechanisms.

It is implemented as the SubroundEnd component. As stated at the beginning of this section, the End Subround is allocated 10% of the total Round time.

Communication during Consensus

  • Topics specific to Consensus
  • When do Nodes register to these Topics

Rewarding Nodes for work

TODO

Producing Blocks

TODO

Building a Block

TODO

Generating randomness

  • initial source: genesis
  • leader produces new random seed from prev block + own signature
  • how does a new genesis block for a new epoch affect the seed? is it calculated just the same?

Gas

  • TxFee = GasPrice ⨯ ConsumedGas
  • TxFeeLimit = GasPrice ⨯ GasLimit
  • Gas = quantum of computational work performed for the network, compensated in EGLD
  • Gas is "spent" executing Transactions and also storing data
  • Describe how Gas is metered during the execution of Transactions
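A minimal sketch of the two fee formulas listed above follows; the concrete gas price and gas limit values are purely illustrative.

```go
// Sketch of the fee formulas above: TxFee = GasPrice x ConsumedGas and
// TxFeeLimit = GasPrice x GasLimit (the most the Sender agrees to pay).
// The concrete values are purely illustrative.
package main

import (
	"fmt"
	"math/big"
)

func fee(gasPrice, gas uint64) *big.Int {
	return new(big.Int).Mul(new(big.Int).SetUint64(gasPrice), new(big.Int).SetUint64(gas))
}

func main() {
	gasPrice, gasLimit, consumedGas := uint64(1_000_000_000), uint64(70_000), uint64(50_000)
	fmt.Println("TxFeeLimit:", fee(gasPrice, gasLimit))    // upper bound, reserved up-front
	fmt.Println("TxFee:     ", fee(gasPrice, consumedGas)) // actually charged
}
```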

Gas consumed by operations

TODO

Timekeeping

Epochs and rounds

The MultiversX network expresses its timeline in units of rounds and epochs. Each round has a fixed time duration, decided at genesis and consistent across all nodes in the network. In each round, inside a shard, at most one block can be added to the shard's blockchain. There can be rounds where no block is added to the blockchain, for example when consensus is not reached or when the consensus group leader is offline and there is no one to propose a new block. The consensus over a block proposed in one round is done in several steps defined by the consensus state machine, which need to be completed before the round time expires. These steps have their own timeframes, defined as subdivisions of the round time (subrounds). The round index is used as a timestamp for a proposed block, is used in the calculation of the consensus group (together with the randomness source) and is also used to decide whether a new epoch needs to be started.

One epoch contains a predefined number of rounds, expressed as "best effort". This is because it might vary a bit due to the confirmation blocks used for the fork choice and block finality mechanism implemented in MultiversX. In this situation, for a shard, it would be correct to say that an epoch is the time between two confirmed start-of-epoch blocks generated by the metachain. For the metachain, one epoch is composed of at least the defined number of rounds in an epoch, plus the number of rounds needed to agree on the start-of-epoch block, with the mention that once the predefined number of rounds for an epoch has passed, a newly proposed block can only be a start-of-epoch block. There are, however, some exceptions where an epoch can have fewer than the defined number of rounds. One example is when one of the shards has not produced blocks for a predefined number of rounds, due to, say, multiple nodes being offline, in which case the shard is considered stuck by the metachain, and the metachain needs to generate an early start-of-epoch block, effectively triggering node shuffling in an effort to change the proportion of offline nodes in the shard. At the start of an epoch there are several actions a node needs to execute or validate, like the shuffling of nodes between different shards, the pruning of storage and account state for old epochs, the rating update for nodes according to their activity in the previous epoch, and the synchronization with the account state and blockchain data in the new shard, if the node was shuffled to a different shard. Because of these time-consuming events, a minimum number of rounds per epoch needs to be defined.
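Ignoring the "best effort" adjustments just described, the current round and epoch can be derived from wall-clock time as in the sketch below; the genesis time, round duration and rounds-per-epoch values are illustrative assumptions.

```go
// Sketch: deriving the current round and epoch from wall-clock time, assuming a
// fixed round duration decided at genesis and a fixed number of rounds per epoch
// (the "best effort" adjustments described above are ignored). Values are illustrative.
package main

import (
	"fmt"
	"time"
)

func roundAndEpoch(genesis, now time.Time, roundDuration time.Duration, roundsPerEpoch int64) (round, epoch int64) {
	round = int64(now.Sub(genesis) / roundDuration)
	epoch = round / roundsPerEpoch
	return round, epoch
}

func main() {
	genesis := time.Date(2023, 1, 1, 0, 0, 0, 0, time.UTC) // placeholder genesis time
	r, e := roundAndEpoch(genesis, time.Now(), 6*time.Second, 14_400)
	fmt.Println("round:", r, "epoch:", e)
}
```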

  • NTP and staying in sync with reality

Advancing an Epoch

TODO

Propagation of information

Each Node is constantly listening to its connected Peers for incoming information of all kinds, packaged as atomic Messages. A Message may arrive at the Node because it was either:

  • sent by a Peer directly to the Node (thus the Message has a sender Node and a destination Node)
  • broadcasted by a Node to all its Peers

Most often, Messages are propagated by being broadcast by a Node to its Peers. Nodes will automatically re-broadcast many of the received Messages to their Peers, to ensure the widest propagation of information. This repeated broadcasting is called Gossiping. Furthermore, to ensure efficient and orderly communication, Messages are assigned to predefined Topics (i.e. "categories" of Messages). The Node that constructs a new Message must assign a single Topic to it, based on the type of information it contains and what audience it is directed at. There are many Topics defined by the Network, but a single Node will only listen to a specific subset of them. See Topics of interest for a Node for details.

Whenever a Message arrives at the Node, the Node first assesses whether the Message is of interest or not. This is easy to verify: if the Topic of the Message is not of interest, then the Message is promptly discarded and forgotten. But if the Message is assigned to a Topic of interest to the Node, a complex process begins: the Message is passed to the specific Interceptor that handles the corresponding Topic, and it is now the responsibility of the Interceptor to deal with the Message somehow. There are multiple types of Interceptors - one for each type of information that a Message can contain.

All (?) Interceptors have one task: to put the information contained by the received Message into the Data Pool of the Node. The Data Pool thus acts as a reservoir of information for most tasks performed by the Node. After being exposed to the Network for a while, a Node will have its Data Pool teeming with Transactions, Block Headers and other types of information propagated through the Network, all arrived as Messages at some time in the past. Whenever the Node must perform some task, e.g. to produce a Block, it will use the information found in the Data Pool.

In case the Data Pool does not contain certain pieces of information required by the Node at a given moment, the Node must create Request Messages (?), which it sends to some of its Peers. If a Peer has the information requested by the Node, it packages that piece of information as a Message and sends it back directly to the Node. If the requested information is missing from the Peers that have been asked for it, they sit silently and do nothing (they don't propagate the original request further). The Node that needs information must try again with some other Peers. See Requesting information from Peers for details.

A special case where a different form of propagation of information takes place is the Consensus process. See the linked section for details.

The Topic-broadcasting functionality itself is available in the libp2p library, which the Node uses for all its peer-to-peer communication.

Gossip

A Node will automatically re-broadcast to its Peers most Messages that it receives (e.g. containing Transactions or other information), in order to propagate information throughout the Network. This is called Gossiping. Note that Direct Messages are never re-broadcast, thus they are not subject to Gossip. All other Messages are considered "of general interest", thus are broadcast to as many Nodes as possible. However, not all Peers of the Node will care about each and every Message - they will simply ignore those Messages that arrive on a Topic that doesn't interest them. Because of this selective interest on Topics, it would be a waste of bandwidth and processing power to always broadcast every Message to all Peers. Consider the following example:

  1. Message m1 arrives at Node A on the Topic Tx0. Let's say that this Message is both valid and interesting to A.
  2. A wants to propagate m1 to its Peers: they are Nodes B, C, D and E.
  3. But A knows that C and E do not care about any Message that arrives on Tx0, thus they would ignore m1 if it were to be sent to them.
  4. A knows that only B and D would accept m1.
  5. Therefore A will only send m1 to B and D, to save resources.

Thus, to make communication more efficient, Nodes will be mindful about the Topics of interest of their Peers and will actually perform selective broadcasting: a Node will exclude a Peer from the broadcasting of a Message if it knows that the Message will be uninteresting to the respective Peer, as described in the example above.

To summarize: any Message m on Topic T will be sent by Node X to a certain Node Y, if the following conditions are met:

  1. The Nodes X and Y must be Peers
  2. The Message m is valid. Invalid Messages are ignored and dropped.
  3. X must have declared interest in Topic T (see Topics of interest). Otherwise, the Message is ignored and dropped.
  4. Y must also have declared interest in Topic T
  5. X must be aware that Y has declared its interest in Topic T
  6. X chooses Y to relay the message (valid for gossipsub router implementation)
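The selective broadcasting rule described above can be sketched as follows; in the real Node the topic-aware routing is handled by libp2p's gossipsub, so this only illustrates the filtering idea from the example with Nodes A-E.

```go
// Sketch of selective broadcasting: a Message on Topic T is relayed only to Peers
// known to be interested in T. The real Node delegates this routing to libp2p's
// gossipsub; this only illustrates the filtering rule from the example above.
package main

import "fmt"

type Peer struct {
	ID     string
	Topics map[string]bool // Topics the Peer has declared interest in
}

func broadcast(msgTopic string, peers []Peer, send func(peerID string)) {
	for _, p := range peers {
		if p.Topics[msgTopic] {
			send(p.ID)
		}
	}
}

func main() {
	peers := []Peer{
		{"B", map[string]bool{"Tx0": true}},
		{"C", map[string]bool{"Hdr": true}},
		{"D", map[string]bool{"Tx0": true}},
		{"E", map[string]bool{}},
	}
	broadcast("Tx0", peers, func(id string) { fmt.Println("sending m1 to", id) }) // only B and D
}
```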

Direct Messages

Whenever a node finds out that it needs a piece of information, it will send request messages on specialized topics to n connected peers. These messages will not be broadcast further to other peers. The peers that receive a request message will try to resolve it by searching for the piece of information in their data pool or storage. The response message will be sent on the topic that is wired to an interceptor, so the components are reused. The request and response messages are called direct messages because they are sent between peers (requester - resolver) and not gossiped to the other peers.

The peers of a Node

There are different types of peers acknowledged by the node. There are known peers (discovered by the component responsible for network awareness) and connected peers. Connected peers are a subset of known peers. They are stored in a peerstore component that keeps their peer ID and connection information (supported protocols, addresses and so on).

Types of information propagated through the network

  • Transactions
  • Unsigned Transactions
  • Rewards
  • Miniblocks
  • Block Headers
  • Trie subcomponents
  • Requests

Requesting information from peers

As stated, whenever a node decides it requires a missing piece of information, it will send a request message to a subset of its connected peers. Those peers, if they have that piece of information, will send it back to the requester. The requests are triggered by the block processor instances, as they are the highest-level components: they know what to process, what pieces of information are already in the data pool and what data needs to be fetched from the network. Technically, the components responsible for requesting and resolving are located in the dataretriever package. Each requester/resolver instance is wired on a side topic. For example, suppose we have a topic called transactions_0_2 on which one will receive only transactions between shard 0 and shard 2. This topic has a side topic called transactions_0_2_REQUEST on which one will find the requests for the transactions between shards 0 and 2. The flow can be described as:

  1. Block processor needs to process a miniblock that contains a transaction hash called txHash1.
  2. Block processor tries to fetch that transaction from the data pool but it lacks the required transaction.
  3. It then passes the request to a component called RequestHandler, providing the transaction's hash txHash1, known to belong to a transaction between the self shard ID and the sender/destination shard ID.
  4. RequestHandler finds the resolver instance registered for the pair formed by the current shard ID and the provided (cross) shard ID.
  5. RequestHandler instructs the found resolver instance to fetch the required data; the latter creates a request message and sends it on the REQUEST topic it was assigned to.
  6. The requesting node enters a bounded waiting loop.
  7. One of the resolver nodes gets the request message, searches for the data in its data pool and storage and, if found, packs that data and sends it on the data topic (from the example above, transactions_0_2).
  8. The requester's interceptor that is listening on the data topic fetches the piece of data, validates it and stores it in the data pool.
  9. The block processor is notified that "something" was added to the data pool, finds out it has just received the required piece of information and resumes its processing loop.

The above process also works for intra-shard data: the cross shard ID becomes the self shard ID and thus the topics will resemble something like transactions_0, with its side topic transactions_0_REQUEST.
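A small sketch of how the data topic and its request-side topic might be composed, following the naming used in the example above (the exact naming rules in the dataretriever package may differ), is shown below.

```go
// Sketch: composing the data topic and its request-side topic for transactions
// exchanged between two shards, following the naming used in the example above
// (the exact naming rules in the dataretriever package may differ).
package main

import "fmt"

func transactionsTopic(selfShard, crossShard uint32) string {
	if selfShard == crossShard {
		return fmt.Sprintf("transactions_%d", selfShard) // intra-shard topic
	}
	lo, hi := selfShard, crossShard
	if lo > hi {
		lo, hi = hi, lo
	}
	return fmt.Sprintf("transactions_%d_%d", lo, hi) // cross-shard topic
}

func requestTopic(dataTopic string) string {
	return dataTopic + "_REQUEST"
}

func main() {
	t := transactionsTopic(0, 2)
	fmt.Println(t, requestTopic(t)) // transactions_0_2 transactions_0_2_REQUEST
}
```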

Communicating with the Metachain

Suppose the node we are analyzing is a shard node. All nodes in the MultiversX Network listen to the metablock topic; thus, the metablock topic is a global topic. Whenever a meta block header is produced by the metachain, the block will traverse the network, each node being notified that a new metablock has been produced. After being validated by the metablock interceptor, the block gets stored in the data pool. A process-specific component called BlockTracker will, as the name suggests, keep track of all produced blocks (intra-shard and metachain) and will change its state accordingly. This component gets to know the height of its own blockchain as notarized by the metachain (it can find out that it just produced a block on a wrong fork, in which case it will start the rollback process). Also, it will find the highest nonce produced on its blockchain (if it is in a synchronizing process) and will know how many headers to request from the network.

From the metachain point of view, each metachain node is wired on all shard header topics (shardBlocks_0, shardBlocks_1, shardBlocks_2...) and will receive the shard headers accordingly. The BlockTracker component embedded in each and every metachain node will further validate the shard headers, computing the longest chain each shard has built. Whenever a new metachain block header is produced, besides the regular information held (nonce, timestamp, signatures and so on), it holds the hashes of the notarized shard block headers and another list of miniblock hashes included in those notarized shard headers. These intermediate structures called miniblocks enable the inclusion of thousands of transactions in structures that are under 1 MB each, while at the same time ensuring the security of the whole system.

Ensuring tolerance to faults and malice

The Node is designed to behave properly in adverse situations, such as malicious Peers, missing information, weak connectivity and even forks in the Blockchain or a commandeered Consensus group. Most of these situations are naturally handled in the Protocol itself, mostly due to the resilience-first design of the Metachain and how the Nodes in Shards use the information stored in it.

A critical aspect of the stability of the Network is to avoid adding incorrect or maliciously crafted Blocks to the Blockchain. Correct Blocks are produced by a Consensus group in which a majority of at least two thirds of the Nodes act honestly. But a malicious actor may try to convince the Nodes of a Shard to accept a crafted Block by broadcasting its Block Header throughout the Shard as if it were a legitimate Block. A lot of effort went into building mechanisms that protect the Node against such Blocks.

These sections will first describe how Blockchain correctness is implemented for Shard Blockchains. Later, the Metachain will be discussed as well.

Architecture-wise, the features described here are implemented in the Block Header Interceptor (package process/interceptors) and in the Sync mechanism (package process/sync). The interaction between these two components happens as follows:

  1. The Node receives Block Headers from the Network, either through Gossip or by request
  2. A newly received Block Header is preliminarily validated by the Block Header Interceptor for integrity and authenticity
  3. After the preliminary validation done by the Interceptor, the Block Header is saved into a caching component (Cacher) owned by the Interceptor (see the headers Cacher of the HdrInterceptorProcessor struct in process/interceptors/processor/hdrInterceptorProcessor.go)
  4. The Cacher will notify the components listening to its storage that a Block Header has been saved into it; one component listening on the Cacher owned by the HdrInterceptorProcessor is the ShardBootstrap, the entry point into the Sync mechanism (process/sync/shardblock.go)
  5. The ShardBootstrap will react to any Block Header added to the aforementioned Cacher by adding the newly received Block Header into its forkDetector for analysis (see function processReceivedHeader() in process/sync/baseSync.go)
  6. The forkDetector must determine whether each received Block Header fits onto the Blockchain (according to the information held by the Node) and whether a choice must be made between conflicting Blocks
  7. Based on information from the forkDetector, the ShardBootstrap must choose whether to accept or reject the Block Header (and which of the potentially conflicting Block Headers should be accepted)

Filters in the Interceptors

The first line of defense against accepting an invalid or malicious Block are the Interceptors, specifically the Block Header Interceptor, which verifies the integrity and authenticity of a received Block Header. While integrity is easy to verify, the authenticity of the Block Header requires knowledge of the Consensus group that produced it. Retrieving this knowledge relies on the fact that every Node in a Shard can deterministically calculate which Nodes of the Shard were part of the Consensus group in the Round specified by the Block, and can also determine which Node was the Consensus Leader (i.e. the one which proposed the Block).

Calculating the entire Consensus group for any given Round is possible in the MultiversX Network because of how the randomness source for Consensus selection is calculated, described in detail in this Medium post. This randomness source has three important properties: (1) it cannot be made available to the next Consensus group until the current Consensus process ends, (2) it is deterministic and known by every Node without requiring a randomness beacon and (3) it is unique in each Round.

The Block Header Interceptor will use both the information present in the Block Header itself and information about the Consensus groups in the past to evaluate the Block Header. Thus it will reject the received Block Header for any of the following reasons:

  • The Block Header has already been blacklisted in the past (how?) (method HdrInterceptorProcessor.Validate() in process/interceptors/processor/hdrInterceptorProcessor.go)
  • The Block Header does not belong to the Shard to which the Node belongs (method InterceptedHeader.processFields() in process/block/interceptedBlocks/interceptedBlockHeader.go)
  • The Block Header contains a randomness source that has not been signed by the known Consensus Leader of the Round and Shard which were specified in the Block Header (method InterceptedHeader.CheckValidity() in process/block/interceptedBlocks/interceptedBlockHeader.go)
  • The Block Header contains a signature that does not belong to the known Consensus Leader of the Round and Shard specified in the Block Header (method InterceptedHeader.CheckValidity() in process/block/interceptedBlocks/interceptedBlockHeader.go)
  • The Block Header references an incorrect previous Block (add code link)
  • The Block Header contains an aggregated signature that does not match the signatures of the Nodes known to have formed the Consensus group of the Round and Shard specified by the Block Header (method headerSigVerifier.verifySig() in process/block/interceptedBlocks/common.go, and called in InterceptedHeader.CheckValidity() mentioned above)

Once a Block Header passes the validation checks above, it is sent to the Sync mechanism via the headers Cacher owned by ShardBootstrap (mentioned in the previous section). The Sync mechanism must now determine whether to accept this Block as valid not only on its own, but also in the greater context of the previous Blocks and the Metachain.

The Sync mechanism

The Sync mechanism is a critical component of the Node which ensures that every Node has up-to-date information from the Network and that the Node's currently stored Blockchain is complete and correct. It is also responsible for triggering requests for missing information and for ensuring correctness in case of conflicting information.

For Nodes assigned to Shards, the Sync mechanism is implemented in the ShardBootstrap structure (file process/sync/shardblock.go). When a Node is started, the method ShardBootstrap.StartSync() is called, which launches a go-routine that regularly verifies the synchronization state of the Node (currently, every 5 milliseconds); if the Node is out-of-sync, missing Blocks will be requested and any forks will be resolved. This all happens in the method baseBootstrap.syncBlock().

Detecting desynchronization is the first task of the method baseBootstrap.syncBlock(). This is done in the method baseBootstrap.ShouldSync(), and can be summarized as having the latest correct Block for the current Round. To verify this, the method starts by checking whether the current Round number is different from the Round number of the last synchronization check (boot.roundIndex). Then, the forkDetector is invoked to analyze whether there are forks in the Block Headers known to it (which were provided by the Block Header Interceptor, see the previous section). If there are no forks, the Node simply needs to have all the Blocks up to the current Round (this check always passes for Nodes that were in Consensus). But if the forkDetector detects a fork among the received Block Headers, it must also be determined whether the Node finds itself on the "correct" side of the fork or on the "incorrect" one. Being on the "correct" side of the fork implies no further action, but being on the "incorrect" side implies the need for a roll-back to the last known "correct" state of the Blockchain (performed after returning from baseBootstrap.ShouldSync() back to baseBootstrap.syncBlock()).

After finishing fork detection (and eventual roll-backs), the method baseBootstrap.ShouldSync() must continue the synchronization process by requesting missing Block Headers from the Network, processing them and committing them to the Blockchain in its local storage.

Once the synchronization process ends, it will be no longer required during the current Round: all calls to baseBootstrap.ShouldSync() will return false.

Fork resolution

The forkDetector (structure baseForkDetector in process/sync/baseForkDetector.go) is responsible for detecting conflicts among received Block Headers and must determine which Block should be accepted in case of a conflict.

Most forks are short and trivial, and naturally appear as a result of network latency or disconnections.

  1. If the Node receives the Block Headers of two different Blocks but which have the same Nonce, produced in different Rounds, it means that for some reason a Consensus group could not propagate its correctly produced Block fast enough through the Shard, so after the Round expired, a new Consensus has produced another Block instead (same Nonce, different Round). But after some time the first Block arrives at the Node, creating a conflict. In this case, the forkDetector will choose the first Block to be correct (earliest Round for the same Nonce). This is subject to change, though, and this criterion will be replaced with The Longest Chain criterion once development progresses, because the k-finality principle also applies to Shard Blockchains.
  2. If the Node receives the Block Headers of two Blocks with the same Nonce, produced by the same Consensus group in the same Round, it means that the Leader is not a single Node, but a cloned one (two machines using the same BLS key, selected to be Leader of Consensus). In this case, they will produce very similar Blocks (which might even contain the same Transactions), but due to network latencies, the Leaders will receive the signatures from the rest of the Consensus group in varying order. Even if they receive enough signatures, differences in the order in which the Leader clones received the signatures are enough to produce Blocks with different hashes. Since these Blocks are functionally identical, they are both correct, but the Block with the numerically lowest hash will be chosen as the "correct" one of the two.
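The two trivial fork-resolution rules above boil down to a simple comparison between conflicting Headers with the same Nonce, sketched below (the real forkDetector keeps much more bookkeeping than this).

```go
// Sketch of the two trivial fork-resolution rules above: for the same Nonce, prefer
// the Header from the earliest Round; for the same Nonce and Round, prefer the
// Header with the numerically lowest hash.
package main

import (
	"bytes"
	"fmt"
)

type HeaderInfo struct {
	Nonce uint64
	Round uint64
	Hash  []byte
}

// chooseBetween returns the Header to keep when two Headers with the same Nonce conflict.
func chooseBetween(a, b HeaderInfo) HeaderInfo {
	if a.Round != b.Round {
		if a.Round < b.Round {
			return a
		}
		return b
	}
	if bytes.Compare(a.Hash, b.Hash) <= 0 {
		return a
	}
	return b
}

func main() {
	a := HeaderInfo{Nonce: 10, Round: 12, Hash: []byte{0x02}}
	b := HeaderInfo{Nonce: 10, Round: 13, Hash: []byte{0x01}}
	fmt.Println(chooseBetween(a, b).Round) // 12: earliest Round wins for the same Nonce
}
```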

For other cases, the forkDetector will rely on the Metachain to determine "correct" Blocks, based on the k-finality principle. Note that the implementation of the forkDetector is currently in development, and will change once a new component will be finalized (the BlockTracker component).

Tolerance to faults and malice in the Metachain

The Metachain is far less vulnerable to forks, due to the configured size of the Consensus group, which is the entire Metashard. Still, trivial forks may appear due to network latencies, which are resolved by applying the same Longest Chain principle. Fork resolution on the Metachain is in development.

Protecting against a Shard takeover

The primary defense against Shard takeover and malicious collusion among Nodes is the Shard-Shuffling mechanism, which moves a proportion of the Nodes in a Shard to other Shards, randomly, while bringing in Nodes from different Shards. This process happens at the beginning of a new Epoch. This process is currently in development.

When the majority of a Shard has been compromised during an Epoch, a remaining honest Node may detect this issue and send a Fisherman challenge to the Metashard. This process is currently in development.

   


Info to integrate:

  • ☐ Synchronization: at the beginning of each round, the Node checks if it is synchronized with the rest of the peers. Synched = has all the current pieces of info (blocks, block headers, nonces etc) and can move on to processing. Not synched = must start requesting information from the Network.

Notes:

  • Narrative entry-point The Node is ready and connected to the network. It has been busy processing transactions, proposing and validating blocks for some days now. Let's see, in more detail, what happens to a transaction once it arrives at the Node, from initial validation until being saved in a final block on the blockchain, notarized by the Metachain.