This is my attempt to describe the basic paxos algorithm in a programmer friendly way. I try to merge the description in Leslie Lamport's The Part-Time Parliament paper and subsequent TLA+ specification.
Process
- a Paxos participant (priest in PTP paper).Ballot
- a referendum on a single decree. Each Ballot is identified by a unique ballot number, and Ballots are ordered by the ballot number.Decree
- represents the value being agreed upon, i.e. the value being voted on.Ballot Number
- a unique number made up of a pair - proposal number, and process id. Ballot Numbers are sorted by proposal number, followed by process id.Process id
- a unique id given to each process.Proposal number
- a monotonically increasing sequence number maintained by each process,-1
indicates none, valid values are>= 0
.Vote v
- a vote cast by a process, is a tuple containing the process id of the voter, the ballot number, and the decree being voted. Votes are ordered by ballot numbers.Ledger
- each process must maintain some data in persistent storage - the ledger represents the storage data structure.quorumSize
-(Number of Participants + 1) / 2
, where Number of participants is an odd number.
MaxVote(votes)
- function that returns the max vote cast, wheremax()
is based on the ordering of votes by ballot number.owner(b)
- returns the process id that initiated the ballot numberb
.
outcome
- this is value of thedecree
in the ledger, or NULL if there is nothing written yet.lastTried
- The ballot number that the processp
last tried to initiate, or(-1,p.id)
if none.maxBal
- The maximum ballot number that processp
ever agreed to participate in, or(-1,p.id)
ifp
has never agreed to participate in a ballot.maxVBal
- The ballot number in whichp
last voted or(-1,p.id)
ifp
never voted.maxVal
- The value of the decree associated withmaxVBal
, i.e. the decree thatp
last voted, or NULL ifp
never voted.
-
status
- the status of processp
, which can be one of the following:idle
- not conducting or trying to begin a ballottrying
- trying to begin ballot numberledger.lastTried
polling
- Conducting ballot numberledger.lastTried
- On startup the status is assumed to be
idle
.
-
prevVotes
- the set of votes received inLastVote
messages for the current ballot (i.e. ballot number inledger.lastTried
). -
quorum
- the set of processes includingp
, that responded withLastVote
messages for current ballot, only meaningful whenstatus == polling
. -
voters
- the set of processes includingp
, from whomp
has receivedVoted
messages in the current ballot, only meaningful whenstatus == polling
. -
decree
- ifstatus == polling
, then the decree of the current ballot, otherwise meaningless.
NextBallot
- aka PREPARE 1a - message sent by the ballot conductor.LastVote
- aka PROMISE 1b - message sent by participant to ballot conductor.BeginBallot
- aka ACCEPT 2a - messages sent by the ballot conductor.Voted
- aka ACCEPTED 2b - message sent by participant to ballot conductor.Success
- message sent by ballot conductor to all processes once the ballot is successfully completed.
The content and timing of each message is described below.
The following algorithm must be implemented for each process that is participating in a Basic Paxos run.
This step is invoked when a new ballot must be started, perhaps on client request.
- If
status
!=idle
, reject request. Ifstatus
!=idle
,p
is already conducting a ballot. - Let
b = ledger.lastTried + 1
, where1
is added to the proposal number. First valid proposal number is thus0
, as initial value ofledger.lastTried
is(-1, p.id)
. - Set
ledger.lastTried
tob
. - Set
p.status
totrying
. - Set
p.prevVotes
to empty set. - To each process participating in basic paxos, send
NextBallot(ledger.lastTried)
PREPARE 1a message, including to itself.
This is executed by each process that receives the NextBallot
message.
- Let
b
=NextBallot.b
. - if
b > ledger.maxBal
then- Set
ledger.maxBal
tob
- To the sender of ballot
b
, i.e.owner(b)
, sendLastVote
(PROMISE 1b) message with following contentsvoter
-p.id
(i.e. the id of the process sending theLastVote
)v
Vote containingb
- ballot numberledger.maxVBal
ledger.maxVal
- Set
- if
b == ledger.lastTried
andstatus == trying
then- Set add vote
v
to the setprevVotes
. Note that two votes that are from different participants (i.e.LastVote.voter
is different), must be considered as distinct votes here. - If count of
prevVotes
with ballotb
is>=
toquorumSize
then start polling.
- Set add vote
This step is enabled when status=trying
and there is a quorum of votes in prevVotes
as described above.
- Set
status
topolling
- Set
quorum
to the set of processes inprevVotes
wherev.b=ledger.lastTried
- Set
voters
to the empty set. - Let
maxVote = MaxVote(prevVotes)
- If
maxVote
has a ballotb
with proposal number-1
, then setdecree
to the value requested by client. - Else if
maxVote
has a ballotb
with proposal number>=0
then setdecree
tomaxVote.decree
. - Send
BeginBallot(b,decree)
ACCEPT 2a to all the participants, including itself.
This is executed by each process that receives the BeginBallot
message.
- if
BeginBallot.b >= ledger.maxBal
- Set
ledger.maxBal
toBeginBallot.b
- Set
ledger.maxVBal
toBeginBallot.b
- Set
ledger.maxVal
toBeginBallot.decree
- Send to the owner of ballot
owner(BeginBallot.b)
, aVoted(b, id)
ACCEPTED 2b message whereb
isledger.maxBal
, andid
is the process sending theVoted
message.
- Set
This step is enabled when status=polling
.
- if
Voted.b == ledger.lastTried
andstatus == polling
- Add voter process
Voted.voter
to the set ofvoters
. - if count of
voters
is>=
toquorumSize
then- If
ledger.outcome
is NULL then setledger.outcome
todecree
- Send
Success(ledger.outcome)
to all participants, including itself - Set
status
toidle
- If
- Add voter process
- If
ledger.outcome
is NULL then setledger.outcome
toSuccess.outcome