Solana Consensus - From Forks to Finality
Introduction
Consensus is the essential ingredient that every blockchain is built on. It ensures that transactions, whether for spending coins or executing smart contracts, are properly validated and executed without a central authority. There are many ways to design and build a consensus protocol. In this blog post, we will give a detailed introduction to the Solana consensus protocol. This blog post is written for people who have foundational knowledge about how Proof-of-Stake (PoS) blockchain algorithms work. It provides an explanation of the Solana consensus process, including code references of the Solana Agave validator.
Why are we writing a blog post about the Solana consensus? The Solana consensus algorithm is not well documented at the time of writing, with only incomplete or outdated information being available. Right now, Solana follows the “code is law” approach, meaning that the consensus algorithm is defined in the validator codebase. Other documentation (e.g. the Solana whitepaper) contains outdated information. Compounding this, discussions around slashing for consensus-breaking behavior (like those in SIMD-0204: Slashable event verification) are gaining momentum.
As active participants in the ecosystem, we recently launched our own validator and recognized the need to bridge this knowledge gap. So we took the time to dive into the codebase again and document the Solana consensus algorithm. In this article, we go through the validator process in consecutive steps, linearizing everything in the validator that is consensus-related.
References to the Agave Repository are placed in blocks such as this one.
TL;DR
Solana is a Proof-of-Stake (PoS) blockchain in which a designated leader is chosen for each slot to create a new block. This leader sequence is randomly determined before the start of an epoch (a fixed period of slots). Blocks consist of the proof of history chain, a sequence of cryptographic hashes, where every hash is computed based on the previous hash and transaction data. This serves as proof of passed time until enough hashes are computed so the block is finished and the next leader builds on top of it. For a block to be confirmed, it must receive votes from validators representing two-thirds of the total stake. Finalization (the point at which a transaction is considered irreversible) occurs after two-thirds of the stake has sent 32 subsequent votes. In the case of forks (where multiple chains exist simultaneously), the TowerBFT mechanism encourages validators to vote for the heaviest fork (the one with the most stake-weighted votes) by using lockouts, which are penalties for switching chains. This ensures that the consensus converges on the chain with the most votes.
Consensus
Blockchain Consensus
A blockchain is a decentralised digital ledger that immutably records transactions across multiple computers, meaning that they cannot be changed or deleted. This ensures data integrity and prevents any alteration without consensus from the entire network. Blockchains are the foundation of cryptocurrencies and self-executing programs known as smart contracts.
In a decentralised system like a blockchain, there is no central authority responsible for decisions like confirming new transactions. Instead, transactions are recorded and executed across multiple nodes in the network. The challenge is to get all these nodes to agree on the state of the blockchain without a central coordinator. In Proof-of-Stake blockchains, the nodes are called validators.
A consensus mechanism is a protocol that allows blockchain networks to achieve agreement on the current state of the blockchain. It ensures that all participants in the network agree on which transactions are valid, maintaining the blockchain’s integrity and security.
In particular, consensus ensures:
- Prevention of Double Spending: Once a coin is used in a transaction, it cannot be reused elsewhere. This prevents digital currency from being copied and spent multiple times.
- Correct Execution of Smart Contracts: Transactions that change the state of a smart contract are executed as intended, according to the conditions that are encoded in the smart contract.
Types of Blockchains
While there are lots of different decentralized ledger protocols, blockchains are the most popular kind of distributed ledger. The basic structure of a blockchain is append-only, starting with the first block called the genesis block. This means that blocks that contain new transactions are appended to the existing chain of blocks. To be clear about what the previous block is, the appended block contains the hash of the previous block.
If two different blocks are created and appended to the same block, we get two contradicting chains (known as forks). To solve this problem, we need rules under which a node is allowed to append a block and, in the case of forks, we need a decision criterion for the valid fork. The two most common kinds of blockchain protocols to solve this problem are Proof of Work and Proof-of-Stake. Solana uses Proof-of-Stake. Here is a comparison of them:
Aspect | Proof of Work (PoW) | Proof of Stake (PoS) |
---|---|---|
Network Nodes | The nodes are called miners. | In PoS, the network nodes are called validators. |
Description | In PoW blockchains, nodes compete to solve cryptographic puzzles. The first to solve the puzzle gets to add the next block to the chain. This means that miners with more computing power are more likely to append a block to the blockchain. | In PoS, the more stake a validator possesses, the more likely it can append a block to the chain, i.e. a validator is chosen based on the amount of stake it has. Stake refers to the amount of cryptocurrency a validator locks up as collateral to participate in the consensus process. This stake acts as a financial incentive for validators to act honestly, as they can lose part of it (a process known as slashing) if they attempt to cheat the system. |
Consensus Chain Rule (If there are several forks, which one is the valid chain?) | In the event of a fork, the longest chain is the consensus chain. | To determine the consensus chain when there are multiple forks, PoS usually uses the “heaviest fork” rule. This means that the fork with the most stake-weighted votes, or (depending on the concrete implementation) the fork that most other validators append to is selected. |
Advantages | PoW protocols are simple. | Energy Efficiency: PoS does not require the extensive computational resources needed for PoW, making it cheaper to operate and more environmentally friendly. Speed: PoS can process transactions faster than PoW, as it doesn’t rely on solving complex puzzles. |
Disadvantages | Requires significant computational power, making it energy-intensive. | Implementations for PoS protocols tend to be quite complex, have lower liveness resilience and have difficulty bootstrapping from a low token valuation. |
Theoretical Background: The CAP Theorem
Blockchain consensus usually comes with vulnerabilities. There is a natural limit to “how good” a blockchain protocol can be, i.e. the designer of a blockchain protocol cannot have everything. The CAP theorem summarizes this restriction. CAP stands for consistency, availability and partition tolerance. The CAP theorem is a fundamental principle that demonstrates the limitations of distributed systems (and hence also of consensus protocols). It highlights that no consensus protocol can achieve perfection under network partition conditions.
The CAP theorem states that a system can only provide two out of the following three guarantees simultaneously:
- Consistency: Every participant sees the same system state at any time. An example of a blockchain without consistency would be that it becomes possible to double-spend coins or to reverse smart contract executions.
- Availability: The system is always operational and accessible. For instance, a blockchain without availability means that it stopped, i.e. no more transactions are accepted because no more blocks are finalized.
- Partition Tolerance: The system continues to run even when network partitions occur (i.e., some participants cannot communicate with others).
The CAP theorem proves mathematically the impossibility of achieving all three properties simultaneously. This means every consensus protocol must make trade-offs, particularly during network partitions. Solana’s consensus algorithm prioritises consistency over availability during network partitions. For a visual explanation of the CAP theorem, refer to this illustration.
Block Production on the Solana Blockchain
After this intro on the nature of blockchain consensus algorithms, let’s jump into how the Solana consensus algorithm works. Very broadly, the validator processes can be split into two areas: (1) Producing new blocks and (2) other validators voting on blocks. Blocks are only accepted in the network if sufficiently many validators have voted for that block. This ensures that the network selects the consensus block if multiple conflicting blocks were produced. Let’s start with how the block production works…
The block production process is controlled in the Transaction Processing Unit (TPU) and `replay_stage.rs`.1
1. Determining the Leader Sequence
Solana’s block production relies on a random but fixed sequence of leader validators. The leader sequence determines which validator is responsible for producing new blocks with new transactions during a specific period. The more stake a validator has, the more likely it is to be chosen as a leader.
`leader_schedule.rs` defines the leader sequence for the epoch.2
In Solana’s protocol, two important time intervals relate to block production:
- Slots: These are the time units where validators take turns producing blocks. Each slot can produce one block, with each slot lasting 400 milliseconds.
- Epochs: These are longer periods during which the leader schedule remains fixed. The transition between epochs occurs when a new block crosses the epoch boundary, with each epoch lasting approximately two days. The leader sequence for the next epoch is defined at the start of the current epoch. The amount of delegated coins (stake) is fixed for one epoch and is only changed during the epoch transition.
For instance, the leader sequence, that allows the validators to produce a block in the respective slot, could look like this:
In Solana, each leader is assigned a fixed number of `NUM_CONSECUTIVE_LEADER_SLOTS` = 4 slots to produce blocks.
Why should the leader sequence be random? Randomising the leader sequence ensures every validator has an opportunity to produce a block, which guarantees decentralisation. The sequence is determined using block data from the previous epoch as a source of pseudo-randomness.
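To make the idea of a random but fixed, stake-weighted leader sequence more tangible, here is a minimal sketch. It is not the algorithm from the Agave validator: the seed derivation, the use of Python's `random` module, and the helper name `leader_schedule` are assumptions made purely for illustration. Only the value of `NUM_CONSECUTIVE_LEADER_SLOTS` is taken from the text above.

```python
import hashlib
import random

NUM_CONSECUTIVE_LEADER_SLOTS = 4  # value quoted in the text above


def leader_schedule(stakes, epoch_seed, slots_in_epoch):
    """Return one validator name per slot, drawn with stake-weighted randomness.

    Hypothetical helper: seed derivation and RNG are illustrative
    assumptions, not the ones used by the Agave validator.
    """
    # A deterministic seed lets every node compute the same schedule.
    seed = int.from_bytes(hashlib.sha256(epoch_seed).digest(), "big")
    rng = random.Random(seed)

    names = sorted(stakes)  # fixed iteration order for determinism
    weights = [stakes[name] for name in names]

    schedule = []
    while len(schedule) < slots_in_epoch:
        # More stake means a higher chance of being drawn as leader.
        leader = rng.choices(names, weights=weights, k=1)[0]
        schedule.extend([leader] * NUM_CONSECUTIVE_LEADER_SLOTS)
    return schedule[:slots_in_epoch]


stakes = {"validator_a": 50, "validator_b": 30, "validator_c": 20}
schedule = leader_schedule(stakes, b"example-epoch-seed", slots_in_epoch=16)
print(schedule)
```

Because the seed is derived from shared data, every validator that runs this function with the same inputs ends up with the same sequence, which is what allows blocks from non-leaders to be rejected.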
The fixed leader sequence has advantages and disadvantages concerning the threat of DoS attacks:
- Advantages: Predictability allows validators to reject blocks from non-leaders, which counters potential DoS attacks through a form of whitelisting.
- Disadvantages: Scheduled leaders are known to everyone, so it is possible to plan a DoS attack that targets a future leader.
2. Receiving and Verifying Transactions
Before and during a validator’s assigned leader slot, it receives transactions from RPC (Remote Procedure Call) servers and other validators to include them in the next block. RPC servers have the task of enabling communication between clients and the network of validators. If a user spends some of their Solana coins, they send the transaction to an RPC server, which forwards it to the validator. Validators use peer-to-peer communication to forward new transactions to each other, which is called gossiping. It is called gossiping because its goal is to spread new information as quickly as possible.
After transactions are received, they are verified. This process includes (among others) verifying the transaction signature and the account balance. Note that this is a very early verification, just enough to verify that a transaction is likely fine to include in a block. It does not guarantee that a transaction execution will be successful! If a transaction is not included within one of the upcoming blocks, it expires. In practice, this expiry window is around 2 minutes right now. An expired transaction cannot be included in a block any more. In this case, the sender has to create a new transaction.
The transactions and votes are received in `fetch_stage.rs`. The verifying process is defined in `sigverify.rs` and `banking_stage.rs`.3
3. Block Creation with Proof of History
The Proof of History (PoH) hash chain is a feature specific to Solana and absent in most other blockchain protocols. It is combined with Solana’s PoS mechanism of creating a random stake-based leader sequence.
What is Proof of History?
The Proof of History hash chain is a cryptographic technique designed to establish a verifiable passage of time within the blockchain, i.e. a cryptographic clock. PoH is not standalone but works alongside Solana’s Proof of Stake (PoS) to compose the consensus protocol. PoH operates by creating a sequence of cryptographic hashes, where each hash verifies that a specific amount of time has elapsed since the previous one. This process uses the SHA256 algorithm due to its widespread optimization and availability across various hardware platforms. Each hash computation necessitates the previous hash value. This ensures that no parallelization is possible. Since each hash operation takes a minimum amount of time, we can make some assurances as to the time a hash was generated. By incorporating a data point (such as a transaction) along with the previous hash to compute the current hash, PoH can prove that this data point existed at the time when the hash was computed. This method ensures a verifiable link between the sequence of transactions and the passage of time.
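The following sketch illustrates the principle described above: a sequential SHA256 chain in which some steps mix in a data point. The function names and the way data is concatenated with the previous hash are assumptions for illustration; the Agave validator has its own encoding for entries.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def poh_chain(seed: bytes, events: dict, num_hashes: int):
    """Build a hash chain with `num_hashes` steps.

    `events` maps a step index to the data mixed in at that step.
    Returns a list of (step, hash, data_or_None) records.
    """
    records = []
    current = seed
    for step in range(1, num_hashes + 1):
        data = events.get(step)
        if data is None:
            current = sha256(current)  # plain step: hash the previous hash
        else:
            current = sha256(current + data)  # mix the data into the chain
        records.append((step, current, data))
    return records


def verify(seed: bytes, records) -> bool:
    """Recompute the chain and compare every recorded hash."""
    current = seed
    for _, expected, data in records:
        current = sha256(current if data is None else current + data)
        if current != expected:
            return False
    return True


records = poh_chain(b"genesis", {3: b"tx: alice pays bob"}, num_hashes=5)
print(verify(b"genesis", records))
```

Every hash after step 3 depends on the mixed-in transaction, so the transaction provably existed before those later hashes were computed. Changing or moving the data invalidates the rest of the chain.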
Block Production and PoH
After receiving transactions, the validator has all the necessary data to start block production. In Solana, a block consists of the PoH sequence for one slot, including the incorporated transaction data. The leader of the current slot is responsible for producing these blocks or PoH chains. In case the leader of the previous slot (or multiple leaders of the previous slots) does not publish a block, the current leader has to create the PoH sequence for the skipped slots. This is why the PoH recorder always runs in the background: If the current leader’s slot starts but it hasn’t received the previous block, it can still produce the block on time because it precomputed the PoH sequence. In this precomputation, the block is assumed empty, so this is only a valid precomputation if the previous leader was indeed offline.
The creation of new blocks is defined in `poh_recorder.rs`. The structure of an Entry is defined in `entry.rs`.4
Block Structure
The following image shows the structure of four consecutive blocks, each block containing two entries. The leader of the third slot is offline. Consequently, the leader in Slot 4 has to compute the PoH hashes for Slot 3, which acts as a placeholder for the missing block in Slot 3.
Every block must contain 12,500 hashes which are grouped into multiple entries. An entry is a sequence of hashes and a Merkle root of transactions. Including only the Merkle root is a way to save storage while allowing proof that a transaction is included in a block. Every transaction consists of a message, a recent block hash and the originator’s signature. The block hash is the last hash of the last entry in the block.
Which Problems does PoH solve?
- Eliminates Reliance on Time Servers: By generating a chronological record of events through concatenated hash values, PoH timestamps transactions independently, negating the need for nodes to communicate to agree on time.
- Incentivizes Inclusion of Blocks: The necessity of computing hashes makes it difficult to skip slots, promoting the inclusion of blocks from every leader and preventing the premature building of new forks. Block production should not be delayed, and with PoH a validator can prove that it gave the previous validator enough time to produce and send the block. PoH makes sure that the current leader can skip previous leaders but at the same time has no incentive to dishonestly do so, because computing the previous leader’s PoH sequence itself would not give it an advantage in finishing its block earlier.
While PoH ensures a tamper-proof record of time, PoS complements it by addressing naughty validator behaviour, together forming a robust and secure consensus protocol for Solana’s blockchain.
A little side note: PoH builds on the assumption that no validator can produce the PoH sequence significantly faster than the others, because the speed is limited by the processing speed of chips. However, some SHA256 implementations are multiple times faster than the one that is currently built into the validator, potentially undermining security.
Fork Creation
If a validator decides to skip previous blocks (e.g. because the block from the previous leader did not arrive at the current leader), it risks forking, which means that there are several possible versions of the blockchain now.
Forks are the reason why slot height and block height are not the same:
- Slot Height is the number of the current slot if we start counting at the genesis block.
- Block Height is the number of blocks that were included in the current fork from the start of the genesis block.
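As a small illustration of the difference, consider a fork in which one slot was skipped. The sketch below assumes that slots are counted from the first slot after the genesis block; the helper name `heights` is made up for this example.

```python
def heights(produced):
    """Return (slot_height, block_height) for one fork.

    `produced` holds one boolean per elapsed slot:
    True if a block for that slot is part of the fork, False if it was skipped.
    """
    slot_height = len(produced)  # every elapsed slot counts
    block_height = sum(produced)  # only slots that contain a block count
    return slot_height, block_height


# The leader of the third slot was offline, so its slot was skipped.
fork = [True, True, False, True]
slot_height, block_height = heights(fork)
print(slot_height, block_height)  # 4 3
```

Two forks can therefore share the same slot height while having different block heights, depending on how many slots each of them skipped.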
In general, a validator that has different choices of which fork to build on chooses the heaviest fork, i.e. the one with the most stake-weighted votes from other validators.
4. Distributing Blocks
The blocks are distributed in `broadcast_stage.rs` using a distribution protocol called Turbine, which has the goal that every block is transferred to other validators in a quick and error-tolerant manner.5
After block production, the blocks are distributed and sent to other validators. For easier transmission, blocks are split into so-called shreds. How exactly this works is another great topic for a blogpost, but isn’t related to consensus.
Validating and Voting on Blocks
The Transaction Validation Unit (TVU) and `replay_stage.rs` control the block verification and voting process.6
Why Do We Need Voting? Producing a block does not automatically make it a consensus block because there might be different forks of the blockchain. We need to have a mechanism that determines the consensus fork. This is achieved through votes from validators in the network. The more delegated stake (coins) a validator has, the more weight its votes carry. This is known as stake-weighted voting. In delegated PoS blockchains like Solana, coin owners can assign their ability to participate in the validating process (= delegation) without transferring ownership of the coins. Through voting, a validator signals exclusive support for a specific fork, excluding competing forks. In addition, the validator guarantees with its vote that it considers the ledger valid up to the current block. On Solana, votes are normal transactions that are also included in blocks. The votes can be sent and received in the following two ways:
- As a single loose vote that is transferred via gossip between validators.
- As a replay vote: This means that a vote has landed, i.e. it was included in a block.
The votes indicate a validator’s commitment to a block. There are different statuses to describe how committed a single validator is to a block:
Validator Commitment Level | Description |
---|---|
Frozen | Frozen means that the block has been successfully replayed (i.e. checked and verified) by the validator. |
Voted | The validator decided to vote for the block, committing to the block and excluding competing blocks. |
Rooted | The validator has sent 32 subsequent votes on the same fork. This means that the block has reached the maximum lockout of 32 (please refer to section “Lockouts with TowerBFT Protocol” for how the lockouts work). Once a block is rooted, the validator cannot switch away from it any more. |
The cluster is a term for describing the network of validators that build on the same Genesis block. Zooming out from an individual validator to the whole cluster, there are different stages of how ‘accepted’ a transaction is in the cluster, known as commitment status:
Cluster Commitment Level | Description |
---|---|
Processed | Transaction is included in a block (and the validator that was queried via RPC must have cast a vote on the block). |
Optimistically Confirmed | Two-thirds of the stake-weighted votes are for the block containing the transaction. |
Finalized | Finalized means that the block has been rooted by two-thirds of the cluster, which can happen after 32 blocks but not earlier. |
The most important commitment level is ‘finalized’. A transaction in a finalized block is impossible to undo without violating the consensus protocol. Some RPC clients, however, use the ‘confirmed’ commitment status to already get a good indication of a transaction’s state.
Why is Confirmation Not Enough? Confirmation alone is not enough to maintain consensus. A validator in the Solana network can switch forks and ‘retract’ a vote by waiting out a lockout period, meaning that a block with two-thirds of the votes may not end up in the consensus chain if some validators maliciously double-vote. To ensure a block remains in the consensus chain, it needs to reach finalized status. After 32 blocks, retracting a vote becomes infeasible, solidifying the block’s place in the chain.
Why Is a Two-Thirds Majority Necessary? A block is confirmed if it receives at least two-thirds of the stake-weighted votes. Now assume that Block A is confirmed and that some of its votes come from malicious validators who also vote for Block B in a different fork. These malicious validators undermine the rules but we still want to maintain consensus. Block B can collect at most the votes of these malicious validators plus the remaining stake that did not vote on Block A (at most one-third of the total stake). As long as the malicious validators hold less than one-third of the total stake (that is, less than 50% of a bare two-thirds majority), Block B stays below the two-thirds threshold and cannot be confirmed as well, ensuring consensus is maintained.
Needing a two-thirds majority for block confirmation, on the other hand, means that if more than one-third of the validators are offline or do not vote, the blockchain can come to a halt because no blocks can be confirmed any more. This is why the threshold of 1/3 is called a superminority.
Note that this is a choice. It is perfectly reasonable to choose slightly different values for optimistic confirmation or consensus rules; Solana just happened to choose these.
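The arithmetic behind the two-thirds threshold can be checked with a few lines of code. This is a sketch under stated assumptions: honest validators vote for only one of two conflicting blocks, malicious validators vote for both, and a block counts as confirmed with at least two-thirds of the stake. The function names are made up for this example.

```python
from fractions import Fraction

VOTE_THRESHOLD = Fraction(2, 3)


def max_stake_for_conflicting_block(stake_on_a, malicious_stake):
    """Upper bound on the stake that can vote for a conflicting Block B.

    Block B can collect the stake that did not vote for Block A
    plus the malicious stake that votes for both blocks.
    """
    return (1 - stake_on_a) + malicious_stake


def both_confirmable(stake_on_a, malicious_stake):
    """True if Block A and a conflicting Block B could both be confirmed."""
    stake_on_b = max_stake_for_conflicting_block(stake_on_a, malicious_stake)
    return stake_on_a >= VOTE_THRESHOLD and stake_on_b >= VOTE_THRESHOLD


# 30% malicious stake: Block B reaches at most 1/3 + 3/10 = 19/30, below 2/3.
print(both_confirmable(Fraction(2, 3), Fraction(3, 10)))  # False
# One-third malicious stake: Block B can reach exactly 2/3.
print(both_confirmable(Fraction(2, 3), Fraction(1, 3)))   # True
```

The second call shows why one-third is the critical boundary: once the double-voting stake reaches it, two conflicting blocks can both collect a two-thirds majority.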
The threshold for the percentage of stake that needs to vote on a slot is defined in `consensus.rs` as `VOTE_THRESHOLD_SIZE` = 2/3. `SUPERMINORITY_THRESHOLD` = 1/3 is defined in `replay_stage.rs`.7
Optimistic Confirmation
Solana uses optimistic confirmation to evaluate the commitment status of a slot. The core idea is that once a block receives votes from validators representing over two-thirds of the total stake, it becomes optimistically confirmed and is unlikely to be reverted unless a validator is slashed for misbehaviour. For a Block D to be optimistically confirmed, it needs two-thirds of the votes. For optimistic confirmation, it does not matter how the votes were received. This means votes received through the gossip protocol, even if not yet included in a block, are used to determine the commitment status. If we can replay the blocks from an ancestor Block B to Block D, the ancestor will also get optimistically confirmed. The optimistically confirmed blocks are confirmed before their votes are rooted, building on the assumption that gossiped vote transactions will eventually be included in a block.
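A minimal sketch of this tally could look as follows. It assumes that each validator's stake is counted once no matter through which channel its vote arrived, and that "over two-thirds" is a strict comparison. The function name and the data layout are hypothetical.

```python
from fractions import Fraction


def is_optimistically_confirmed(stakes, gossip_voters, replay_voters):
    """Check whether the voted stake for one block exceeds two-thirds.

    `stakes` maps validator name to stake. Votes seen via gossip and votes
    that already landed in a block are merged, so nobody is counted twice.
    """
    voters = set(gossip_voters) | set(replay_voters)
    voted_stake = sum(stakes[v] for v in voters)
    total_stake = sum(stakes.values())
    return Fraction(voted_stake, total_stake) > Fraction(2, 3)


stakes = {"a": 40, "b": 30, "c": 30}
# "a" voted via gossip, "b" has a vote that landed in a block: 70% in total.
print(is_optimistically_confirmed(stakes, {"a"}, {"b"}))  # True
# The same validator seen through both channels still counts only once: 40%.
print(is_optimistically_confirmed(stakes, {"a"}, {"a"}))  # False
```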
Now we know about the different states of voting that a block can be in. Let’s examine all steps that a validator performs to decide which blocks to vote for…
Optimistic confirmation is implemented in `optimistic_confirmation_verifier.rs`. `cluster_info_vote_listener.rs` verifies and processes votes from other validators. It outputs optimistically confirmed slots and statistics about the voting process. `window_service.rs` collects, verifies and stores shreds, which also includes handling conflicts/duplicates.8
1. Receiving Blocks and Votes
Block validation begins with validators receiving new blocks and votes from other validators. A vote is a transaction calling the vote program, including the hash of the bank to vote on, which is computed with the block hash and the current state of the bank. The bank maintains the current state of all accounts, including balances and smart contract states.
Validators receive new blocks in `shred_fetch_stage.rs` and gossip transactions (including votes) in `fetch_stage.rs`. New blocks are stored in `blockstore.rs`. In its current implementation, blockstore can only store one block per slot, leaving room for only one if there are duplicate blocks.9
2. Validating Blocks
The validation process of a block includes verifying block metadata and recomputing the PoH hashes. Computing the hashes sequentially (like during the PoH creation) would take a lot of time. This is why the validator splits the PoH hash chain into several pieces. Recomputing each of these pieces can be parallelized, hence making the verification process faster than the original PoH creation process.
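Why verification can be parallelized while creation cannot is easiest to see in code. In the sketch below, the leader publishes the hash at the end of each segment. A verifier then knows the start hash of every segment in advance and can check all segments independently. The segment layout and function names are assumptions for illustration and do not mirror the Agave implementation.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def hash_n(start: bytes, count: int) -> bytes:
    """Apply SHA256 `count` times, each time hashing the previous result."""
    current = start
    for _ in range(count):
        current = hashlib.sha256(current).digest()
    return current


def make_checkpoints(seed, segments, hashes_per_segment):
    """Leader side: strictly sequential, returns the end hash of each segment."""
    checkpoints = []
    current = seed
    for _ in range(segments):
        current = hash_n(current, hashes_per_segment)
        checkpoints.append(current)
    return checkpoints


def verify_checkpoints(seed, checkpoints, hashes_per_segment):
    """Verifier side: all segment start hashes are known, so segments are independent."""
    starts = [seed] + checkpoints[:-1]
    with ThreadPoolExecutor() as pool:
        ends = list(pool.map(lambda s: hash_n(s, hashes_per_segment), starts))
    return ends == checkpoints


checkpoints = make_checkpoints(b"previous block hash", segments=8, hashes_per_segment=1000)
print(verify_checkpoints(b"previous block hash", checkpoints, 1000))
```

Note that a thread pool in CPython mainly demonstrates the structure of the computation. A real speed-up would require multiple processes, native code or SIMD/GPU hashing.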
After recomputing the PoH hash chain, it verifies and replays all transactions from the blocks and updates the bank. Verifying and replaying the votes is very similar to transaction verification in leader mode.
Transaction verification and replay coordination occur in replay_stage.rs
. As the name implies, it replays all transactions from a block, i.e. checking transactions against account balances and re-executing smart contract programs to validate correct execution. Other steps include accounts_hash_verifier.rs
for calculating account hashes, sigverify_shreds.rs
for verifying the leader’s signature and retransmit_stage.rs
for helping the network to retransmit shreds to other validators that thay did not fully receive.10
3. Fork Choice
Partition Detection and Resolution
Solana aims to prevent forks, ideally maintaining a sequence of one block per slot. Each block in Solana includes a pointer to its parent, ensuring that a block can only be appended to one fork. If validators work perfectly and no network issues occur, there should be no partitions. However, issues like validators going offline, network problems, or malicious behaviour can cause a validator to skip a block and append its block to a non-direct ancestor, creating a partition with multiple versions of the chain.
Heaviest Fork Rule
To determine the valid chain in case of a partition, Solana uses the heaviest fork rule. This decision rule selects the fork with the heaviest block. To compute the weight of a block, all forks starting from that block are considered using the heaviest subtree fork choice rule. This rule evaluates the block with the subtree that has the highest accumulated weight of votes. Validators calculate the total stake-weighted votes for each subtree and select the one with the most votes. This ensures that the chain with the most support from validators continues to grow. If there is a tie between two blocks, validators favour the earlier block.
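The heaviest subtree rule can be sketched in a few lines. The tree representation (a dictionary from slot to child slots), the function names, and the interpretation of "earlier block" as "lower slot number" are assumptions for this illustration and are much simpler than the data structures in the validator.

```python
def subtree_weight(slot, children, stake_votes):
    """Stake voting directly for `slot` plus all stake voting in the subtree below it."""
    return stake_votes.get(slot, 0) + sum(
        subtree_weight(child, children, stake_votes)
        for child in children.get(slot, [])
    )


def heaviest_fork_tip(root, children, stake_votes):
    """Walk down from `root`, always following the child with the heaviest subtree."""
    current = root
    while children.get(current):
        current = max(
            children[current],
            # Heavier subtree wins; on a tie the lower slot is preferred.
            key=lambda s: (subtree_weight(s, children, stake_votes), -s),
        )
    return current


# Slot 1 has two competing children: the fork 2 -> 4 and the fork 3 -> 5.
children = {0: [1], 1: [2, 3], 2: [4], 3: [5]}
stake_votes = {4: 40, 3: 25, 5: 20}
print(heaviest_fork_tip(0, children, stake_votes))  # 5
```

In this example, slot 4 has the most direct votes (40), but the subtree below slot 3 carries 45 in total, so the validator follows the fork ending in slot 5.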
`heaviest_subtree_fork_choice.rs` implements the fork choice concerning heaviest fork and duplicate slots. Partitions are detected in `replay_stage.rs` by the function `is_partition_detected()`, which checks if the last voted slot is not an ancestor of the heaviest slot.11
Handling Duplicate Blocks
The supermajority threshold for duplicate slots is defined as `DUPLICATE_THRESHOLD` in `replay_stage.rs` and set to 52% by default.12
Partitions can also arise from duplicate slots, where a faulty or malicious leader produces multiple blocks for the same slot. These duplicate slots are marked as “invalid fork choice”. This means that a validator ignores the fork and attempts to vote for a different fork. However, if a duplicate slot is voted on by 52% of the stake of validators, it is marked as a “duplicate confirmed” and can be considered again as a fork to vote on. Even though the term is a bit ambiguous, this has nothing to do with the commitment status being “confirmed”. A duplicate block still requires 2/3 of the votes to get an “(optimistically) confirmed” commitment status.
For duplicate confirmations, votes are treated a little differently compared to optimistic confirmation: A replay vote also counts as a vote on all its ancestor blocks to determine the duplicate confirmation status. Gossip votes, on the other hand, just count for the exact block they are meant for.
The following is an example of duplicate blocks in Slot 2 and 3. All red blocks are marked as invalid fork choice because they are duplicate blocks or descendants of a duplicate block. Why accept duplicate blocks at all? A single validator might not know that a block is a duplicate. Network partitions could lead to situations where a subset of validators see block B1, and another subset see block B2. After reconnecting, they need to figure out what to do. Allowing validators to keep building on one of the two versions makes the network more available when duplicate blocks occur.
4. Voting
Now we know how validators select a fork to vote for. However, there are a couple of conditions for voting. If these conditions are not fulfilled, the validator cannot vote and has to wait. The following four conditions can temporarily keep the validator from voting…
The function `make_check_switch_threshold_decision()` in `consensus.rs` makes the decision about how to deal with a given fork. The votes are generated in the function `generate_vote_tx()` in `replay_stage.rs` (when the validator is currently the leader) and in `record_vote()` in `consensus.rs` (when the validator is not the leader). In `replay_stage.rs`, the different failures that can happen when attempting to vote on the heaviest fork are defined as `HeaviestForkFailures`: `LockedOut`, `FailedSwitchThreshold`, `FailedThreshold`, `NoPropagatedConfirmation`.13
4.1 Lockouts with TowerBFT Protocol
Validators vote using Solana’s Tower Byzantine Fault Tolerance (TowerBFT) protocol. The goal of TowerBFT is to give an incentive to validators so that the consensus converges around one fork. TowerBFT uses the Practical Byzantine Fault Tolerance algorithm and combines it with PoH. Byzantine Fault Tolerance (BFT) is the name for a family of protocols used for voting in distributed systems where some components may fail or act maliciously. The term comes from the Byzantine Generals Problem, which illustrates the difficulties of achieving consensus in a network with unreliable nodes.
But what is TowerBFT?
TowerBFT introduces lockouts for voting: if you decide to vote for one fork and decide to switch to another fork later, you are locked out from voting for a certain period before you can submit a vote again. The Tower is a data structure that keeps track of a validator’s votes and the corresponding lockouts. It’s essentially a list of tuples with slot numbers and their corresponding lockout period. If a block is in the Tower, this means that the corresponding validator cast a vote on that block. The purpose of the Tower is to let every validator in the network commit to a certain fork while also leaving open the possibility of changing that decision and switching to another fork. If a validator votes on the wrong fork, it will get locked out and lose its ability to vote for a while. This prevents validators from quickly switching votes between forks, thereby enhancing the network’s security and making double-spending attacks more difficult.
This stack of votes for subsequent blocks is called the Vote Tower. The Tower is defined in `consensus.rs`.14
Here is an example of how the Tower may influence the possibility of voting when the validator decides to switch the fork:
The main idea behind the lockout mechanism is to double the lockout period for every subsequent vote in the same fork, so voting on the wrong fork results in an exponentially growing lockout period. Votes for one fork are stacked in the Tower. Lockout periods are only doubled if the Tower reaches a new height. Votes reach maximum lockout after 32 votes and are then dequeued, collecting credits (which can trigger rewards when the epoch ends). If the validator wants to vote on a different fork, it needs to roll back (pop the most recent votes from the Tower) and wait until enough blocks have been produced to fulfill the lockout period. If a validator is locked out, it has to wait for more blocks to be produced in the heaviest fork before it can vote again. This is where PoH comes in handy because it restricts the speed at which new blocks can be appended to the chain.
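To make the stacking, doubling, and rollback rules more concrete, here is a minimal sketch of a Tower in Rust. It is not the Agave implementation: the names TowerVote, Tower, and MAX_TOWER_HEIGHT are our own, the height of 32 simply follows the text above, and forks are reduced to a set of ancestor slots.

```rust
use std::collections::HashSet;

/// Maximum number of votes kept in the tower (32, following the text above).
const MAX_TOWER_HEIGHT: usize = 32;

/// One tower entry: a voted slot and how often it has been confirmed.
#[derive(Debug, Clone, PartialEq)]
struct TowerVote {
    slot: u64,
    confirmation_count: u32,
}

impl TowerVote {
    /// Lockout period in slots; it doubles with every confirmation.
    fn lockout(&self) -> u64 {
        1u64 << self.confirmation_count
    }

    /// Last slot at which this vote still blocks votes on other forks.
    fn last_locked_out_slot(&self) -> u64 {
        self.slot + self.lockout()
    }
}

#[derive(Debug, Default)]
struct Tower {
    votes: Vec<TowerVote>, // oldest vote first, newest vote last
    root: Option<u64>,
    credits: u64,
}

impl Tower {
    /// Returns true if an unexpired vote on another fork forbids voting on `slot`.
    /// `ancestors` holds the ancestors of `slot` on its own fork.
    fn is_locked_out(&self, slot: u64, ancestors: &HashSet<u64>) -> bool {
        self.votes
            .iter()
            .any(|v| !ancestors.contains(&v.slot) && v.last_locked_out_slot() >= slot)
    }

    /// Records a vote on `slot`. The caller is expected to check `is_locked_out` first.
    fn vote(&mut self, slot: u64) {
        // Roll back: pop the most recent votes whose lockout has expired.
        while self.votes.last().map_or(false, |v| v.last_locked_out_slot() < slot) {
            self.votes.pop();
        }
        // A full tower dequeues its oldest vote, which becomes the root and earns a credit.
        if self.votes.len() == MAX_TOWER_HEIGHT {
            let rooted = self.votes.remove(0);
            self.root = Some(rooted.slot);
            self.credits += 1;
        }
        self.votes.push(TowerVote { slot, confirmation_count: 1 });
        // Double a lockout only if the tower above that vote reached a new height.
        let height = self.votes.len();
        for (i, v) in self.votes.iter_mut().enumerate() {
            if (height - i) as u32 > v.confirmation_count {
                v.confirmation_count += 1;
            }
        }
    }
}
```

In this sketch, after voting on slots 1 to 4 the lockouts are 16, 8, 4, and 2 slots. A block at slot 5 on a fork that branched off after slot 2 cannot be voted on yet, whereas slot 8 on that fork can, at the cost of popping the votes on slots 3 and 4.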
4.2 Failed Switch Proof
If the last vote was on a different fork than the currently heaviest fork, the validator will attempt to switch and vote for the heavier fork. However, the heaviest fork needs to have a switch proof, which shows that the current fork will never reach finality. If the switch threshold is not fulfilled, the validator has to continue voting on the current fork.
This switch proof means that certain slots need to reach the switch threshold, which is 38% of stake-weighted votes. Only votes on certain slots (so-called candidate slots) count towards the switch threshold. A slot is a candidate slot if the following rule holds: take the greatest common ancestor of the last vote slot (the slot with the validator’s most recent vote) and the candidate slot. The switch slot (the slot on the heaviest fork that the validator wants to vote for) must be a descendant of this greatest common ancestor. See the following image for an illustration:
The switch fork threshold of 38% is composed of 1/3 plus an additional buffer of roughly 4.66%, which is the malicious buffer for optimistic confirmation. This means that if an optimistically confirmed block is rolled back, 4.66% or more of the stake must have committed a slashable offense.
The is_valid_switching_proof_vote() method in consensus.rs decides whether a slot counts towards the switch threshold.15
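The threshold arithmetic can be sketched in a few lines of Rust. This is an illustration under assumptions, not the Agave code: the function names are hypothetical, the stake voting on candidate slots is assumed to be already summed up, and we assume a strict "greater than" comparison.

```rust
/// Switch threshold as a fraction of total stake (38%, as described above).
const SWITCH_FORK_THRESHOLD: f64 = 0.38;

/// Returns true if the stake voting on candidate slots is large enough
/// to justify switching forks (hypothetical helper, strict comparison assumed).
fn switch_threshold_met(candidate_stake: u64, total_stake: u64) -> bool {
    total_stake > 0
        && (candidate_stake as f64 / total_stake as f64) > SWITCH_FORK_THRESHOLD
}

/// The buffer on top of one-third of the stake: 0.38 - 1/3, roughly 4.66%.
fn malicious_buffer() -> f64 {
    SWITCH_FORK_THRESHOLD - 1.0 / 3.0
}
```

With 100 units of total stake, 39 units on candidate slots are enough for a switch, while 30 units are not.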
4.3 Failed Threshold
The TowerBFT lockouts and the switch proof are relevant when the validator switches forks. There are other restrictions, namely the vote threshold and the propagation confirmation, which must be fulfilled for every vote.
Failed threshold means that the heaviest fork lacks sufficient stake (two-thirds of stake-weight) throughout the last eight blocks in the tower. Without this check, a validator could get locked out in the case of a partition (i.e. a different fork than the one last voted for is the heaviest fork). This check makes sure that the validator’s votes do not ‘run too far ahead’ without other validators catching up, which would cause long lockout periods. That’s why it covers the last eight blocks and not just the current block. If the validator’s latest vote is affected by a failed threshold, voting is not possible. Once a block fails the threshold check, it will never attain the threshold again. Consequently, the validator has to wait until a child garners more votes (or until its lockout expires and it can vote on a different fork). The vote depth of eight blocks is the default value and can be changed by the validator operator.
consensus.rs defines the threshold in VOTE_THRESHOLD_SIZE = 2/3 and the depth of eight slots is defined in the VOTE_THRESHOLD_DEPTH constant.16
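One way to read the threshold check is sketched below. It rests on simplifying assumptions: the tower is given as a plain list of voted slots, the stake observed per slot comes from a hypothetical lookup table, and only the vote sitting eight entries below the newest one is checked. The two constants mirror the names mentioned above; everything else is our own naming.

```rust
use std::collections::HashMap;

const VOTE_THRESHOLD_SIZE: f64 = 2.0 / 3.0;
const VOTE_THRESHOLD_DEPTH: usize = 8;

/// `tower_slots`: voted slots, oldest first, including the vote about to be cast.
/// `stake_on_slot`: stake observed voting for each slot (hypothetical lookup table).
fn passes_vote_threshold(
    tower_slots: &[u64],
    stake_on_slot: &HashMap<u64, u64>,
    total_stake: u64,
) -> bool {
    // A tower that is not yet deeper than the threshold depth cannot fail the check.
    if tower_slots.len() <= VOTE_THRESHOLD_DEPTH {
        return true;
    }
    // The vote that sits VOTE_THRESHOLD_DEPTH entries below the newest vote.
    let slot_at_depth = tower_slots[tower_slots.len() - 1 - VOTE_THRESHOLD_DEPTH];
    let stake = stake_on_slot.get(&slot_at_depth).copied().unwrap_or(0);
    (stake as f64 / total_stake as f64) > VOTE_THRESHOLD_SIZE
}
```

For a tower holding votes on slots 1 to 9, the check looks at slot 1: it passes if, say, 70% of the stake has been observed on that slot and fails at 60%.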
4.4 No Propagation Confirmation
The is_propagated field in PropagatedStats in progress_map.rs keeps track of whether a block is propagated.17
A validator will not vote for a fork unless more than a superminority (one-third) of the stake has received the last block it produced in leader mode. In other words, a leader slot is propagated when the validator stake acknowledging the slot exceeds SUPERMINORITY_THRESHOLD. Another validator can confirm that it received a block by (1) sending a vote for that block or (2) sending a frozen indicator through gossip. This frozen indicator means that a validator replayed a block but may not have voted for that block (e.g. if the validator voted on a conflicting block).
This ensures a large part of the network has received the validator’s block, preventing voting during network partitions. For a non-leader slot, its propagation status depends on its most recent ancestor leader slot. If that ancestor leader slot is propagated, then the non-leader slot is also considered propagated.
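The propagation rule, including the lookup of the most recent ancestor leader slot, can be sketched as follows. The SlotInfo struct and the is_propagated function are hypothetical simplifications and not the PropagatedStats type from Agave. We also assume that a slot without any known leader-slot ancestor counts as propagated.

```rust
use std::collections::HashMap;

const SUPERMINORITY_THRESHOLD: f64 = 1.0 / 3.0;

/// Simplified per-slot bookkeeping (hypothetical, for illustration only).
struct SlotInfo {
    is_leader_slot: bool, // did this validator produce the block in this slot?
    parent: Option<u64>,  // parent slot on the same fork
    acked_stake: u64,     // stake that voted for or froze this block
}

/// A leader slot is propagated once the acknowledging stake exceeds the
/// superminority threshold; other slots inherit the status of their most
/// recent ancestor leader slot.
fn is_propagated(slot: u64, slots: &HashMap<u64, SlotInfo>, total_stake: u64) -> bool {
    let mut current = Some(slot);
    while let Some(s) = current {
        match slots.get(&s) {
            Some(info) if info.is_leader_slot => {
                return (info.acked_stake as f64 / total_stake as f64)
                    > SUPERMINORITY_THRESHOLD;
            }
            Some(info) => current = info.parent,
            None => break,
        }
    }
    // Assumption: without a leader-slot ancestor there is nothing to wait for.
    true
}
```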
5. Sending Votes
After deciding which fork/block to vote for, and if all voting conditions are fulfilled and nothing keeps the validator from voting, the vote is sent to the other validators (especially to the upcoming leaders) through gossip.
The votes are sent via the gossip protocol to the next upcoming leaders in send_transaction_service.rs.18
Validator Economics
So far, we talked about what a validator should do to produce new blocks and vote for blocks to get them finalized. But what gives the validator an incentive to follow the consensus protocol? Validator operators do not just work out of goodwill but are rewarded for honest behaviour. While validator economics isn’t directly part of the consensus process, it sets incentives for validators to execute the consensus protocol correctly. This makes economics an important part of building a secure and smoothly running consensus algorithm.
The block reward structure is defined in bank.rs under update_fees(). The inflation commission is defined in bank.rs under calculate_previous_epoch_inflation_rewards(). Voting fees are defined in transaction_cost.rs under SIMPLE_VOTE_USAGE_COST.19
Validator Income
A validator node has the following sources of income (for a more detailed list, check out this blog article):
- Block Rewards = Incentive for Block Production: Validators that are assigned as the leader for a given block receive rewards, known as block rewards, when their block ends up in the finalized chain. These rewards are composed of 50% of the base fee and 50% of the priority fees, with the other half being burned. The allocation of block rewards incentivizes validators to become leaders and produce blocks, ensuring active participation in the network.
- Inflation Commission = Incentive for Voting: Inflation, which follows a disinflationary schedule towards a long-term rate of approximately 1.5% per year, is distributed relative to the validator’s stake and its voting regularity. Validators receive the reward at the end of the epoch, and a validator is rewarded for every vote that roots a new block. This is an incentive to vote on the heaviest fork and to contribute consistently to consensus because rewards depend on voting behaviour.
- MEV Rewards: There are third-party MEV providers on Solana, most notably Jito. These pay a validator an extra tip if it includes transaction bundles in the blocks it produces. This isn’t part of the core Solana protocol and is not examined further here. In practice, the tips are approximately the same size as block rewards as of the end of 2024, but this changes with network conditions and usage.
In leader mode, validators are incentivized to include as many transactions as possible and to build on the previous leader’s block rather than skipping it. The previous leader’s block will probably get more votes because it was published earlier, so a block built on top of it has a higher likelihood of ending up in the finalized chain.
A validator has an incentive to send its votes later than other validators, because waiting makes it more likely that its votes land on the fork that ends up finalized. To counter this, Timely Vote Credits reward validators that vote early.
Slashing
Slashing is the process of punishing validators for performing illegal operations in the network (e.g. producing two different block versions of the same slot or voting for different forks which is not allowed due to lockouts). While not currently implemented, there is a proposal to introduce slashing in Solana. If implemented, publishing duplicate blocks or conflicting votes will be punished. In its current state, Solana can be considered a manually slashing blockchain because validators can collectively decide to exclude a malicious or faulty validator.
However, the future of slashing in Solana remains uncertain. The concept is unpopular among users who stake their coins, as they could potentially lose their stake. Additionally, the network has been functioning effectively without slashing, reducing the immediate need for its implementation. Recently, however, activity has picked up around a new proposal. The first part, proving on-chain that slashable events have taken place, is the most complete and is currently under active discussion as a Solana Improvement Document: SIMD-0204: Slashable event verification.
Incentive Compatibility
In very simple terms, incentive compatibility means aligning validator interests with the network goals. In theoretical computer science and game theory, incentive compatibility refers to designing systems where the best strategy for every participant is to follow the protocol. For blockchains, this means ensuring validators are rewarded for honesty and punished for dishonest behaviour.
The TowerBFT mechanism, responsible for voting on blocks, is designed to be incentive-compatible. However, it is hard to prove full incentive compatibility (and there might be parts in the Solana protocol that are not fully incentive-compatible). While a lack of full incentive compatibility opens a theoretical attack surface, the practical implications seem to be negligible for the Solana blockchain. Solana has prioritised practical solutions, such as bug fixes and optimizations, over formally verifying its consensus algorithms. While some blockchains aim for 100% incentive compatibility, Solana finds it sufficient to achieve practical compatibility, ensuring the network runs smoothly.
Congratulations! Now you should have an overview of how the Solana consensus algorithm works. In the past sections, we talked about (1) how validators produce new blocks, (2) how validators finalize blocks through voting, and (3) the reason why validators should follow all these steps (incentivization through validator economics).
Summary
That was a lot of information! That’s why we have a quick recap for you with all validator rules concerning block production and voting for blocks:
Block Production in Leader Mode
- Build on the heaviest fork.
- If the validator voted on a different fork, but the heaviest fork does not have a switch proof, the validator will build on the previous (not the heaviest) fork.
Voting in Validation Mode
- Set Root: Identify the latest block which reached maximum lockout on the respective validator. This is the root that all active forks start from.
- Choose the Heaviest Fork:
  - Same Fork: If the heaviest fork is the same fork that the validator voted for recently, continue voting on it (SwitchForkDecision::SameFork).
  - Different Fork: If the validator voted on a different fork previously, attempt switching to vote on the heavier fork:
    - Successful Switch: A switch proof is needed, which means that at least the switch fork threshold (38%) of stake-weighted votes must support the switch (SwitchForkDecision::SwitchProof).
    - Failed Switch Threshold: If the switch fails, continue voting on the fork that the validator previously voted for (SwitchForkDecision::FailedSwitchThreshold).
- Handling Duplicate Slots:
  - Slot is Duplicate Confirmed: If a block has a majority (52%) of votes, reconsider the fork as part of the heaviest selection (basically treating the duplicate confirmed block like a non-duplicate) and choose the heaviest fork as usual.
  - Slot is not Duplicate Confirmed: If the block has less than 52% of votes, mark the fork as invalid for fork choice and do not consider it for voting. Proceed with fork choice among the remaining valid forks.
  - Rollback Because of Duplicate Slot: If there is no alternative fork to vote on, the validator resets to the last valid ancestor and does not vote until a clean fork is produced (SwitchForkDecision::FailedSwitchDuplicateRollback).
The SwitchForkDecision is defined in consensus.rs.20
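The voting rules in this recap can be condensed into a small decision function. The sketch below is a heavy simplification and not the Agave logic: Fork, Decision, and decide are hypothetical names, each fork is reduced to a single stake number, and the lockout, vote threshold, and propagation checks are left out.

```rust
const SWITCH_FORK_THRESHOLD: f64 = 0.38;
const DUPLICATE_THRESHOLD: f64 = 0.52;

/// Simplified stand-in for the outcomes listed in the recap above.
#[derive(Debug, PartialEq)]
enum Decision {
    SameFork,
    SwitchProof,
    FailedSwitchThreshold,
    FailedSwitchDuplicateRollback,
}

/// A fork reduced to the stake voting on it (hypothetical, for illustration only).
struct Fork {
    id: u32,
    stake: u64,
    has_duplicate_slot: bool,
}

fn decide(forks: &[Fork], last_voted_fork: u32, switch_stake: u64, total_stake: u64) -> Decision {
    let share = |stake: u64| stake as f64 / total_stake as f64;
    // Forks with a duplicate slot only stay eligible if they are duplicate confirmed.
    let heaviest = forks
        .iter()
        .filter(|f| !f.has_duplicate_slot || share(f.stake) > DUPLICATE_THRESHOLD)
        .max_by_key(|f| f.stake);
    match heaviest {
        // No valid fork left: roll back and wait for a clean fork.
        None => Decision::FailedSwitchDuplicateRollback,
        Some(f) if f.id == last_voted_fork => Decision::SameFork,
        Some(_) if share(switch_stake) > SWITCH_FORK_THRESHOLD => Decision::SwitchProof,
        Some(_) => Decision::FailedSwitchThreshold,
    }
}
```

Here switch_stake stands for the stake that voted on candidate slots, as described in the section on the switch proof.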
Conclusion
In this blog post, we’ve explored how the Solana consensus algorithm, based on Proof-of-Stake (PoS), leverages a predetermined leader sequence, TowerBFT, and Proof of History (PoH) to ensure efficient block creation, validation, and finalization. The use of delegated stake adjusts leader frequency and voting weight, while TowerBFT drives consensus toward the heaviest fork, promoting stability. This combination results in a robust system that supports fast, secure transaction processing and scalability, making Solana a strong candidate for decentralized applications.
Thanks to Ashwin Sekar from Anza for proofreading this post!
If you followed along to here and want to see more of this content, stake with us! We’ve recently launched our own validator at NdMV1C3XMCRqSBwBtNmoUNnKctYh95Ug4xb6FSTcAWr to create an easy avenue for anyone to support independent Solana security research.
Further Links
- Terminology | solana.com
- Solana Validator Architecture | docs.solanalabs.com
- Solana - How it Works | report.helius.dev
- Consensus on Solana | helius.dev
- Solana Validator Economics | helius.dev
- Halting the Solana Blockchain with Epsilon Stake | dl.acm.org
- Understanding Slots, Blocks, and Epochs on Solana | helius.dev
- Turbine — Solana’s Block Propagation Protocol Solves the Scalability Trilemma | medium.com
- Turbine: Block Propagation on Solana | helius.dev
- An Illustrated Proof of the CAP Theorem | mwhittaker.github.io
- Slashing Proposal
- SIMD-0204: Slashable event verification
Code References
Footnotes
1. The block production process is controlled in the Transaction Processing Unit (TPU) and replay_stage.rs.
2. leader_schedule.rs defines the leader sequence for the epoch.
3. The transactions and votes are received in the fetch_stage.rs. The verifying process is defined in sigverify.rs and banking_stage.rs.
4. The creation of new blocks is defined in poh_recorder.rs. The structure of an Entry is defined in entry.rs.
5. The blocks are distributed in broadcast_stage.rs using a distribution protocol called Turbine which has the goal that every block is transferred to other validators in a quick and error-tolerant manner.
6. The Transaction Validation Unit (TVU) and replay_stage.rs control the block verification and voting process.
7. The threshold for the percentage of stake that needs to vote on a slot is defined in consensus.rs as VOTE_THRESHOLD_SIZE = 2/3. SUPERMINORITY_THRESHOLD = 1/3 is defined in replay_stage.rs.
8. Optimistic confirmation is implemented in optimistic_confirmation_verifier.rs. cluster_info_vote_listener.rs verifies and processes votes from other validators. It outputs optimistically confirmed slots and statistics about the voting process. window_service.rs collects, verifies and stores shreds, which also includes handling conflicts/duplicates.
9. Validators receive new blocks in shred_fetch_stage.rs and gossip transactions (including votes) in fetch_stage.rs. New blocks are stored in blockstore.rs. In its current implementation, blockstore can only store one block per slot, leaving room for only one if there are duplicate blocks.
10. Transaction verification and replay coordination occur in replay_stage.rs. As the name implies, it replays all transactions from a block, i.e. checking transactions against account balances and re-executing smart contract programs to validate correct execution. Other steps include accounts_hash_verifier.rs for calculating account hashes, sigverify_shreds.rs for verifying the leader’s signature and retransmit_stage.rs for helping the network to retransmit shreds to other validators that they did not fully receive.
11. heaviest_subtree_fork_choice.rs implements the fork choice concerning heaviest fork and duplicate slots. Partitions are detected in replay_stage.rs by the function is_partition_detected(), which checks if the last voted slot is not an ancestor of the heaviest slot.
12. The supermajority threshold for duplicate slots is defined as DUPLICATE_THRESHOLD in replay_stage.rs and set to 52% by default.
13. The function make_check_switch_threshold_decision() in consensus.rs makes the decision about how to deal with a given fork. The votes are generated in the function generate_vote_tx() in the replay_stage.rs (when validator is currently the leader) and in record_vote() in consensus.rs (when validator is not the leader). In replay_stage.rs, the different failures that can happen when attempting to vote on the heaviest fork are defined as HeaviestForkFailures: LockedOut, FailedSwitchThreshold, FailedThreshold, NoPropagatedConfirmation.
14. This stack of votes for subsequent blocks is called the Vote Tower. The Tower is defined in consensus.rs.
15. The is_valid_switching_proof_vote() method in consensus.rs decides whether a slot counts towards the switch threshold.
16. consensus.rs defines the threshold in VOTE_THRESHOLD_SIZE = 2/3 and the depth of eight slots is defined in the VOTE_THRESHOLD_DEPTH constant.
17. The is_propagated field in PropagatedStats in progress_map.rs keeps track of whether a block is propagated.
18. The votes are sent via the gossip protocol to the next upcoming leaders in send_transaction_service.rs.
19. The block reward structure is defined in bank.rs under update_fees(). The inflation commission is defined in bank.rs under calculate_previous_epoch_inflation_rewards(). Voting fees are defined in transaction_cost.rs under SIMPLE_VOTE_USAGE_COST.
20. The SwitchForkDecision is defined in consensus.rs.