Consistency Algorithm Background
1. What problem the consistency algorithm solves: in a distributed system, data cannot live on a single node (host), or that node becomes a single point of failure; multiple nodes (hosts) must be guaranteed to hold the same data.
2. What is consistency: consistency means the data stays the same; in a distributed system, it can be understood as multiple nodes holding the same value for the same piece of data.
3. Consistency model classification: models are generally divided into strong consistency and weak consistency. Strong consistency guarantees that the cluster's state changes immediately after a commit; common protocols include Paxos, Raft (Multi-Paxos), and ZAB (a Multi-Paxos variant). Weak consistency, also called eventual consistency, does not guarantee that the cluster's state changes immediately after a commit, but the state converges over time; common examples include the DNS system and Gossip protocols.
4. Consistency algorithm use cases: Google's Chubby distributed lock service uses the Paxos algorithm; the etcd distributed key-value store uses the Raft algorithm; ZooKeeper, a distributed coordination service and the open-source counterpart of Chubby, uses the ZAB algorithm.
Simple Paxos, which reaches agreement on a single static value, is not practical on its own; the clustered system we need to implement (a bank account service) wants to agree on a particular state (the account balances) that changes over time. So we use Paxos to agree on each operation, treating each modification as a state machine transition.
Multi-Paxos is essentially a sequence of simple Paxos instances (slots), each sequentially numbered. Each state transition is given a "slot number", and every member of the cluster performs transitions in strict numerical order. To change the state of the cluster (for example, to process a transfer operation), we try to achieve consensus on that operation in the next slot. Concretely, this means adding a slot number to each message and tracking all protocol state on a per-slot basis.
Running full Paxos for every slot, with its minimum of two round trips, would be too slow. Multi-Paxos optimizes this by using the same set of ballot numbers for all slots and performing the Prepare/Promise phase for all slots at once.
Client   Proposer      Acceptor     Learner
   |         |          |  |  |       |  | --- First Request ---
   X-------->|          |  |  |       |  |  Request
   |         X--------->|->|->|       |  |  Prepare(N)
   |         |<---------X--X--X       |  |  Promise(N,I,{Va,Vb,Vc})
   |         X--------->|->|->|       |  |  Accept!(N,I,V)
   |         |<---------X--X--X------>|->|  Accepted(N,I,V)
   |<---------------------------------X--X  Response
   |         |          |  |  |       |  |
Paxos Implementation
Implementing Multi-Paxos in practical software is notoriously difficult and has spawned a number of papers such as "Paxos Made Simple" and "Paxos Made Practical".
First, having multiple proposers can be a problem in a busy environment, as each cluster member tries to get its own state machine operation decided in every slot. The solution is to elect a "leader" that is responsible for submitting ballots for each slot. All other cluster nodes send new operations to the leader for execution. Thus, in normal operation with only one leader, ballot conflicts do not occur.
The Prepare/Promise phase can serve as a kind of leader election: whichever cluster member holds the most recently promised ballot number is considered the leader. The leader is then free to execute the Accept/Accepted phase directly without repeating the first phase. As we will see below, leader election is actually quite complex.
While simple Paxos guarantees that the cluster will not reach conflicting decisions, it does not guarantee that any decision will be made. For example, if the initial Prepare message is lost and never reaches the acceptors, the proposer will wait for a Promise message that never arrives. Solving this requires well-designed retransmissions: enough to eventually make progress, but not so many that the cluster generates a packet storm.
Another problem is the propagation of decisions. In the normal case, simply broadcasting a Decision message solves this. However, if the message is lost, a node may never learn of the decision and will be unable to apply state machine transitions for later slots. So implementations need some mechanism for sharing information about decided proposals.
Using a distributed state machine poses another challenge: when a new node is started, it needs to obtain the existing state of the cluster.
While it is possible to do this by catching up with the decisions of all slots since the first one, in a large cluster this could involve millions of slots. In addition, we need some way to initialize a new cluster.
Introduction to Cluster Library
The preceding is all theory; from here on we use Python to implement a simplified Multi-Paxos.
Business Scenarios and Pain Points
Let's take a simple bank account management service as a case study. In this service, each account has a current balance and is identified by an account number. Users can perform "deposit", "transfer", and "check current balance" operations on their accounts. A "transfer" operation involves two accounts, the source account and the destination account, and must be rejected if the source account's balance is insufficient.
If this service were deployed on just one server, it would be easy to implement: use a lock to ensure that "transfer" operations are not performed concurrently, and check the source account's balance before debiting it. However, a bank cannot rely on a single server to store critical information such as account balances; these services are typically distributed across multiple servers, each running an instance of the same code, and users can access their accounts from any of these servers.
In a naive implementation of such a distributed service, each server would keep a local copy of every account balance, process any incoming operation, and send the resulting balance updates to the other servers. But this approach has a serious problem: if two servers operate on the same account at the same time, which new balance is correct? Even if the servers share operations rather than balances, two simultaneous transfers out of an account could overdraw it.
Fundamentally, these failures occur because servers respond to operations using their local state, rather than first ensuring that their local state matches that of the other servers. For example, imagine that Server A receives an instruction to transfer money from account 101 to account 202, while Server B has already processed another request transferring all of the money in account 101 to account 202 without notifying Server A. Server A's local state then differs from Server B's, so Server A still allows the transfer from account 101 to proceed even though it overdraws the account.
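To make the failure concrete, here is a small, hypothetical sketch (not part of the Cluster library; the names balances_a, balances_b, and transfer are invented) in which two servers apply operations against their own local copies of the balances. Because each server checks only its stale local state, the two replicas diverge and account 101 is effectively overdrawn.

balances_a = {101: 100, 202: 0}   # Server A's local copy
balances_b = {101: 100, 202: 0}   # Server B's local copy

def transfer(balances, source, destination, amount):
    if balances[source] >= amount:   # check against *local* state only
        balances[source] -= amount
        balances[destination] += amount
        return True
    return False

# Server B transfers all of account 101's money, but Server A never hears about it
transfer(balances_b, 101, 202, 100)
# Server A, still seeing the stale balance of 100, approves another transfer
transfer(balances_a, 101, 202, 60)

print(balances_a)   # {101: 40, 202: 60}
print(balances_b)   # {101: 0, 202: 100}  -- the replicas now disagree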
Distributed State Machine
To prevent this from happening, we use a tool called a "distributed state machine". The idea is that each server runs the same state machine on the same inputs. By the nature of state machines, every server then produces the same output for the same input. For operations such as "transfer" and "check current balance", the operation parameters (account numbers and amounts) are the inputs to the state machine, and the account balances are its state.
The state machine for this application is relatively simple:
def execute_operation(state, operation):
    if operation.name == 'deposit':
        if not verify_signature(operation.deposit_signature):
            return state, False
        state.accounts[operation.destination_account] += operation.amount
        return state, True
    elif operation.name == 'transfer':
        if state.accounts[operation.source_account] < operation.amount:
            return state, False
        state.accounts[operation.source_account] -= operation.amount
        state.accounts[operation.destination_account] += operation.amount
        return state, True
    elif operation.name == 'get-balance':
        return state, state.accounts[operation.account]
It's worth noting that while the "check current balance" operation does not change the state, we still implement it as a state transition. This ensures that the returned balance is the most up-to-date information agreed on by the distributed system, and not based on a single server's local state.
This may not be quite the same as the typical state machine you learned about in a computer science course. Whereas a traditional state machine is a finite set of states with labeled transitions between them, the state of the state machine in this article is the collection of account balances, so there are infinitely many possible states. Still, the basic rule of state machines applies: from the same initial state, the same sequence of inputs always produces the same outputs.
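A quick, hypothetical check of the "same input, same output" property, using the execute_operation function above. The State and Transfer namedtuples are invented stand-ins, since this article does not show the application's actual state and operation objects:

from collections import namedtuple

# Hypothetical stand-ins for the state and operation objects
State = namedtuple('State', ['accounts'])
Transfer = namedtuple('Transfer', ['name', 'source_account', 'destination_account', 'amount'])

def apply_all(initial_accounts, operations):
    state = State(accounts=dict(initial_accounts))
    outputs = []
    for op in operations:
        state, output = execute_operation(state, op)
        outputs.append(output)
    return state.accounts, outputs

ops = [Transfer('transfer', 101, 202, 60),
       Transfer('transfer', 101, 202, 60)]   # the second must fail: insufficient funds

# Two replicas starting from the same state and applying the same operations in the
# same order always end up with identical balances and outputs.
assert apply_all({101: 100, 202: 0}, ops) == apply_all({101: 100, 202: 0}, ops)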
Thus, the distributed state machine ensures that every host produces the same results for the same operations. However, the problem mentioned in the previous section remains: every server must agree on the inputs to the state machine and on their order. This is a consensus problem, and to solve it we use a derivative of the Paxos algorithm.
Core Requirements
We use the Cluster library, a simplified Multi-Paxos implementation, to provide consistency for a larger application. Correctness is the most important property of this library, so it is important to structure the code so that we can see, and test, how it corresponds to the specification. Complex protocols exhibit complex failures, so we will build support for reproducing and debugging rare failures. We will implement proof-of-concept code: enough to demonstrate that the core concepts are practical, structured so that later additions of functionality require minimal changes to the core implementation.
Let's start coding.
Types and constants
The Cluster protocol requires fifteen different message types, each defined as a namedtuple from the collections module:
from collections import namedtuple

Accepted = namedtuple('Accepted', ['slot', 'ballot_num'])
Accept = namedtuple('Accept', ['slot', 'ballot_num', 'proposal'])
Decision = namedtuple('Decision', ['slot', 'proposal'])
Invoked = namedtuple('Invoked', ['client_id', 'output'])
Invoke = namedtuple('Invoke', ['caller', 'client_id', 'input_value'])
Join = namedtuple('Join', [])
Active = namedtuple('Active', [])
Prepare = namedtuple('Prepare', ['ballot_num'])
Promise = namedtuple('Promise', ['ballot_num', 'accepted_proposals'])
Propose = namedtuple('Propose', ['slot', 'proposal'])
Welcome = namedtuple('Welcome', ['state', 'slot', 'decisions'])
Decided = namedtuple('Decided', ['slot'])
Preempted = namedtuple('Preempted', ['slot', 'preempted_by'])
Adopted = namedtuple('Adopted', ['ballot_num', 'accepted_proposals'])
Accepting = namedtuple('Accepting', ['leader'])
Using named tuples to describe each message type keeps the code clean and helps avoid some simple errors. The named tuple constructor raises an exception if it is not given the correct attributes, making such mistakes obvious. Tuples also format nicely in log messages and do not use as much memory as dictionaries.
Create message:
msg = Accepted(slot=10, ballot_num=30)
Access messages:
got_ballot_num = msg.ballot_num
We'll learn what these messages mean later.
The code also introduces a number of constants, most of which define timeouts for various messages:
JOIN_RETRANSMIT = 0.7
CATCHUP_INTERVAL = 0.6
ACCEPT_RETRANSMIT = 1.0
PREPARE_RETRANSMIT = 1.0
INVOKE_RETRANSMIT = 0.5
LEADER_TIMEOUT = 1.0
NULL_BALLOT = Ballot(-1, -1)  # sorts before all real ballots
NOOP_PROPOSAL = Proposal(None, None, None)  # no-op to fill otherwise empty slots
Finally, we need to define the Proposal and Ballot data types used by the protocol:
Proposal = namedtuple('Proposal', ['caller', 'client_id', 'input'])
Ballot = namedtuple('Ballot', ['n', 'leader'])
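Because namedtuples compare like ordinary tuples, element by element, ballots can be compared directly with < and >=; this is also why NULL_BALLOT = Ballot(-1, -1) sorts before every real ballot. A small illustration, assuming the definitions above:

b1 = Ballot(n=0, leader='N1')
b2 = Ballot(n=1, leader='N0')

assert NULL_BALLOT < b1 < b2              # NULL_BALLOT sorts before all real ballots
assert Ballot(1, 'N0') < Ballot(1, 'N1')  # ties on n are broken by the leader field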
Component Model
The core components that implement Multi-Paxos are Role and Node.
To ensure testability and keep the code readable, we decompose Cluster into classes corresponding to the roles described in the protocol. Each is a subclass of Role.
class Role(object):

    def __init__(self, node):
        self.node = node
        node.register(self)
        self.running = True
        self.logger = node.logger.getChild(type(self).__name__)

    def set_timer(self, seconds, callback):
        return self.node.network.set_timer(self.node.address, seconds,
                                           lambda: self.running and callback())

    def stop(self):
        self.running = False
        self.node.unregister(self)
Roles for cluster nodes are glued together by the Node class, which represents a single node on the network. Roles will be added to and removed from nodes during the course of the program.
Messages arriving at a node are relayed to all active roles, calling a method named after the message type with a do_ prefix. These do_ methods receive the message's attributes as keyword arguments for easy access. The Node class also provides a send method as a convenience, using functools.partial to supply some arguments to the corresponding method of the Network class.
class Node(object):
    unique_ids = itertools.count()

    def __init__(self, network, address):
        self.network = network
        self.address = address or 'N%d' % self.unique_ids.next()
        self.logger = SimTimeLogger(
            logging.getLogger(self.address), {'network': self.network})
        self.logger.info('starting')
        self.roles = []
        self.send = functools.partial(self.network.send, self)

    def register(self, roles):
        self.roles.append(roles)

    def unregister(self, roles):
        self.roles.remove(roles)

    def receive(self, sender, message):
        handler_name = 'do_%s' % type(message).__name__

        for comp in self.roles[:]:
            if not hasattr(comp, handler_name):
                continue
            comp.logger.info("received %s from %s", message, sender)
            fn = getattr(comp, handler_name)
            fn(sender=sender, **message._asdict())
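To see the dispatch convention in isolation, here is a hypothetical example; EchoRole is a made-up class, and the Network, logging, and registration machinery are omitted. The three lines at the bottom are exactly what Node.receive does for each registered role:

class EchoRole(object):
    def do_Prepare(self, sender, ballot_num):
        print("Prepare from %s with ballot %s" % (sender, ballot_num))

msg = Prepare(ballot_num=Ballot(1, 'N1'))
role = EchoRole()

handler_name = 'do_%s' % type(msg).__name__   # -> 'do_Prepare'
handler = getattr(role, handler_name)
handler(sender='N1', **msg._asdict())         # message fields become keyword arguments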
Application Interface
A Member object is created and started on each cluster member, with an application-specific state machine and a list of peers. The Member object adds a Bootstrap role to the node if it is joining an existing cluster, or a Seed role if it is creating a new one, and then runs the protocol (Network.run) in a separate thread.
The application initiates a state transition by calling the invoke method. Once the proposal has been decided and the state machine has run, invoke returns the state machine's output. The method uses a simple synchronized Queue to wait for the result from the protocol thread.
class Member(object):

    def __init__(self, state_machine, network, peers, seed=None,
                 seed_cls=Seed, bootstrap_cls=Bootstrap):
        self.network = network
        self.node = network.new_node()
        if seed is not None:
            self.startup_role = seed_cls(self.node, initial_state=seed, peers=peers,
                                         execute_fn=state_machine)
        else:
            self.startup_role = bootstrap_cls(self.node,
                                              execute_fn=state_machine, peers=peers)
        self.requester = None

    def start(self):
        self.startup_role.start()
        self.thread = threading.Thread(target=self.network.run)
        self.thread.start()

    def invoke(self, input_value, request_cls=Requester):
        assert self.requester is None
        q = Queue.Queue()
        self.requester = request_cls(self.node, input_value, q.put)
        self.requester.start()
        output = q.get()
        self.requester = None
        return output
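The hand-off between the calling thread and the protocol thread is just a blocking Queue.get. The following stripped-down, hypothetical sketch shows the same pattern independently of the Cluster classes; fake_protocol stands in for the Requester role, which calls q.put(output) when it receives an Invoked message:

import threading
try:
    import Queue as queue   # Python 2, matching the library's Queue.Queue()
except ImportError:
    import queue            # Python 3

def invoke_blocking(submit_to_protocol, input_value):
    """Mimic Member.invoke: hand the request to another thread, block for the result."""
    q = queue.Queue()
    submit_to_protocol(input_value, q.put)
    return q.get()          # blocks until the protocol thread delivers the output

def fake_protocol(input_value, callback):
    # toy "protocol thread": deliver a result shortly after being asked
    threading.Timer(0.1, callback, args=[('done', input_value)]).start()

print(invoke_blocking(fake_protocol, 'deposit 100'))   # -> ('done', 'deposit 100')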
Role class
The roles in the Paxos protocol include client, acceptor, proposer, learner, and leader. In a typical implementation, a single processor plays one or more of these roles at the same time. This does not affect the correctness of the protocol, and roles are often merged to improve latency and/or reduce the number of messages in the protocol.
The following sections implement each role class in turn.
Acceptor
The Acceptor class implements the acceptor role in Paxos, so it must store the ballot number of its most recent promise along with the set of proposals accepted for each slot, and respond to Prepare and Accept messages accordingly. The result is a short class that corresponds directly to the protocol: for an acceptor, Multi-Paxos looks like simple Paxos with slot numbers added to the messages.
class Acceptor(Role):

    def __init__(self, node):
        super(Acceptor, self).__init__(node)
        self.ballot_num = NULL_BALLOT
        self.accepted_proposals = {}  # {slot: (ballot_num, proposal)}

    def do_Prepare(self, sender, ballot_num):
        if ballot_num > self.ballot_num:
            self.ballot_num = ballot_num
            # we've heard from a scout, so it might be the next leader
            self.node.send([self.node.address], Accepting(leader=sender))

        self.node.send([sender], Promise(
            ballot_num=self.ballot_num,
            accepted_proposals=self.accepted_proposals))

    def do_Accept(self, sender, ballot_num, slot, proposal):
        if ballot_num >= self.ballot_num:
            self.ballot_num = ballot_num
            acc = self.accepted_proposals
            if slot not in acc or acc[slot][0] < ballot_num:
                acc[slot] = (ballot_num, proposal)

        self.node.send([sender], Accepted(
            slot=slot, ballot_num=self.ballot_num))
Replica
The Replica class is the most complex of the Role subclasses, corresponding to the Learner and Proposer roles in the protocol. Its main responsibilities are: making new proposals; invoking the local state machine when proposals are decided; tracking the current leader; and adding newly started nodes to the cluster.
The replica creates a new proposal in response to an Invoke message from a client, selecting what it believes is an unused slot and sending a Propose message to the current leader. If consensus for the selected slot is reached on a different proposal, the replica must re-propose with a new slot.
The following diagram shows the message flow for the Replica role:
Requester   Local Replica   Current Leader
    X------------>|                |     Invoke
    |             X--------------->|     Propose
    |             |<---------------X     Decision
    |<------------X                |     Invoked
    |             |                |
Decision messages indicate slots on which the cluster has reached consensus. The Replica stores each new decision and then runs the state machine forward until it reaches an undecided slot. The Replica distinguishes decided slots, on which the cluster has agreed, from committed slots, which the local state machine has already processed. When slots are decided out of order, committed proposals may lag behind, waiting for the next slot in sequence to be decided. When a slot is committed, each replica sends an Invoked message back to the requester with the result of the operation.
In some cases a slot may have no active proposal and no decision. The state machine must execute slots one by one, so the cluster has to reach consensus on something to fill the slot. To protect against this possibility, the replica proposes a "no-op" proposal whenever it catches up on such a slot; if such a proposal is eventually decided, the state machine performs no operation for that slot.
Likewise, it is possible for the same proposal to be decided twice. The replica skips invoking the state machine for any such duplicate proposal, performing no transition for that slot.
Replicas need to know which node is the active leader in order to send Propose messages to it. Each replica tracks the active leader using three sources of information.
When the leader role becomes active, it sends an Adopted message to the replica on the same node (below):
Leader   Local Replica
   X---------->|     Adopted
When the acceptor role sends a Promise to a new leader, it sends an Accepting message to its local replica (below):
Acceptor   Local Replica
    X----------->|     Accepting
The active leader sends Active messages as a heartbeat. If no such message arrives before LEADER_TIMEOUT expires, the Replica assumes the leader is dead and moves on to the next leader. In this case it is important that all replicas choose the same new leader; we accomplish this by sorting the members and selecting the next one in the list.
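The selection itself is deterministic, so no extra communication is needed; a tiny sketch of the rule every replica applies (the peer names here are hypothetical):

peers = sorted(['N1', 'N2', 'N0'])     # every replica uses the same sorted peer list
dead_leader = 'N1'
next_leader = peers[(peers.index(dead_leader) + 1) % len(peers)]
assert next_leader == 'N2'             # all replicas arrive at the same choice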
When a node joins the network, the Bootstrap role sends a Join message (below). The Replica responds with a Welcome message containing its latest state, allowing the new node to come up to speed quickly.
BootStrap   Replica   Replica   Replica
    X--------->|         |         |     Join
    |<---------X         |         |     Welcome
    X------------------->|         |     Join
    |<-------------------X         |     Welcome
    X----------------------------->|     Join
    |<-----------------------------X     Welcome

class Replica(Role):

    def __init__(self, node, execute_fn, state, slot, decisions, peers):
        super(Replica, self).__init__(node)
        self.execute_fn = execute_fn
        self.state = state
        self.slot = slot
        self.decisions = decisions
        self.peers = peers
        self.proposals = {}
        # next slot num for a proposal (may lead slot)
        self.next_slot = slot
        self.latest_leader = None
        self.latest_leader_timeout = None

    # making proposals

    def do_Invoke(self, sender, caller, client_id, input_value):
        proposal = Proposal(caller, client_id, input_value)
        slot = next((s for s, p in self.proposals.iteritems() if p == proposal), None)
        # propose, or re-propose if this proposal already has a slot
        self.propose(proposal, slot)

    def propose(self, proposal, slot=None):
        """Send (or resend, if slot is specified) a proposal to the leader"""
        if not slot:
            slot, self.next_slot = self.next_slot, self.next_slot + 1
        self.proposals[slot] = proposal
        # find a leader we think is working - either the latest we know of, or
        # ourselves (which may trigger a scout to make us the leader)
        leader = self.latest_leader or self.node.address
        self.logger.info(
            "proposing %s at slot %d to leader %s" % (proposal, slot, leader))
        self.node.send([leader], Propose(slot=slot, proposal=proposal))

    # handling decided proposals

    def do_Decision(self, sender, slot, proposal):
        assert not self.decisions.get(self.slot, None), \
            "next slot to commit is already decided"
        if slot in self.decisions:
            assert self.decisions[slot] == proposal, \
                "slot %d already decided with %r!" % (slot, self.decisions[slot])
            return
        self.decisions[slot] = proposal
        self.next_slot = max(self.next_slot, slot + 1)

        # re-propose our proposal in a new slot if it lost its slot and wasn't a no-op
        our_proposal = self.proposals.get(slot)
        if (our_proposal is not None and
                our_proposal != proposal and our_proposal.caller):
            self.propose(our_proposal)

        # execute any pending, decided proposals
        while True:
            commit_proposal = self.decisions.get(self.slot)
            if not commit_proposal:
                break  # not decided yet
            commit_slot, self.slot = self.slot, self.slot + 1

            self.commit(commit_slot, commit_proposal)

    def commit(self, slot, proposal):
        """Actually commit a proposal that is decided and in sequence"""
        decided_proposals = [p for s, p in self.decisions.iteritems() if s < slot]
        if proposal in decided_proposals:
            self.logger.info(
                "not committing duplicate proposal %r, slot %d", proposal, slot)
            return  # duplicate

        self.logger.info("committing %r at slot %d" % (proposal, slot))
        if proposal.caller is not None:
            # perform a client operation
            self.state, output = self.execute_fn(self.state, proposal.input)
            self.node.send([proposal.caller],
                           Invoked(client_id=proposal.client_id, output=output))

    # tracking the leader

    def do_Adopted(self, sender, ballot_num, accepted_proposals):
        self.latest_leader = self.node.address
        self.leader_alive()

    def do_Accepting(self, sender, leader):
        self.latest_leader = leader
        self.leader_alive()

    def do_Active(self, sender):
        if sender != self.latest_leader:
            return
        self.leader_alive()

    def leader_alive(self):
        if self.latest_leader_timeout:
            self.latest_leader_timeout.cancel()

        def reset_leader():
            idx = self.peers.index(self.latest_leader)
            self.latest_leader = self.peers[(idx + 1) % len(self.peers)]
            self.logger.warning("leader timed out; trying the next one, %s",
                                self.latest_leader)
        self.latest_leader_timeout = self.set_timer(LEADER_TIMEOUT, reset_leader)

    # adding new cluster members

    def do_Join(self, sender):
        if sender in self.peers:
            self.node.send([sender], Welcome(
                state=self.state, slot=self.slot, decisions=self.decisions))
Leader, Scout, and Commander
The Leader's main task is to take Propose messages requesting new ballots and produce decisions. A leader is "active" when it has successfully completed the Prepare/Promise phase of the protocol. An active leader can immediately send an Accept message in response to a Propose.
In keeping with the class-per-role model, the Leader delegates to the Scout and Commander roles to carry out each portion of the protocol.
class Leader(Role):

    def __init__(self, node, peers, commander_cls=Commander, scout_cls=Scout):
        super(Leader, self).__init__(node)
        self.ballot_num = Ballot(0, node.address)
        self.active = False
        self.proposals = {}
        self.commander_cls = commander_cls
        self.scout_cls = scout_cls
        self.scouting = False
        self.peers = peers

    def start(self):
        # remind others we're active before LEADER_TIMEOUT expires
        def active():
            if self.active:
                self.node.send(self.peers, Active())
            self.set_timer(LEADER_TIMEOUT / 2.0, active)
        active()

    def spawn_scout(self):
        assert not self.scouting
        self.scouting = True
        self.scout_cls(self.node, self.ballot_num, self.peers).start()

    def do_Adopted(self, sender, ballot_num, accepted_proposals):
        self.scouting = False
        self.proposals.update(accepted_proposals)
        # note that we don't re-spawn commanders here; if there are undecided
        # proposals, the replicas will re-propose
        self.logger.info("leader becoming active")
        self.active = True

    def spawn_commander(self, ballot_num, slot):
        proposal = self.proposals[slot]
        self.commander_cls(self.node, ballot_num, slot, proposal, self.peers).start()

    def do_Preempted(self, sender, slot, preempted_by):
        if not slot:  # from the scout
            self.scouting = False
        self.logger.info("leader preempted by %s", preempted_by.leader)
        self.active = False
        self.ballot_num = Ballot((preempted_by or self.ballot_num).n + 1,
                                 self.ballot_num.leader)

    def do_Propose(self, sender, slot, proposal):
        if slot not in self.proposals:
            if self.active:
                self.proposals[slot] = proposal
                self.logger.info("spawning commander for slot %d" % (slot,))
                self.spawn_commander(self.ballot_num, slot)
            else:
                if not self.scouting:
                    self.logger.info("got PROPOSE when not active - scouting")
                    self.spawn_scout()
                else:
                    self.logger.info("got PROPOSE while scouting; ignored")
        else:
            self.logger.info("got PROPOSE for a slot already being proposed")
The Leader creates a Scout role when it wants to become active, in response to receiving a Propose while it is inactive (below). The Scout sends (and re-sends, if necessary) a Prepare message and collects Promise responses until it has heard from a majority of its peers or until it has been preempted. It then communicates the result back to the Leader with an Adopted or Preempted message.
Leader    Scout      Acceptor  Acceptor  Acceptor
  |         |           |         |         |
  |         X---------->|         |         |   Prepare
  |         |<----------X         |         |   Promise
  |         X-------------------->|         |   Prepare
  |         |<--------------------X         |   Promise
  |         X------------------------------>|   Prepare
  |         |<------------------------------X   Promise
  |<--------X           |         |         |   Adopted
class Scout(Role):
    def __init__(self, node, ballot_num, peers):
        super(Scout, self).__init__(node)
        self.ballot_num = ballot_num
        self.accepted_proposals = {}
        self.acceptors = set([])
        self.peers = peers
        self.quorum = len(peers) / 2 + 1
        self.retransmit_timer = None

    def start(self):
        self.logger.info("scout starting")
        self.send_prepare()

    def send_prepare(self):
        self.node.send(self.peers, Prepare(ballot_num=self.ballot_num))
        self.retransmit_timer = self.set_timer(PREPARE_RETRANSMIT, self.send_prepare)

    def update_accepted(self, accepted_proposals):
        acc = self.accepted_proposals
        for slot, (ballot_num, proposal) in accepted_proposals.iteritems():
            if slot not in acc or acc[slot][0] < ballot_num:
                acc[slot] = (ballot_num, proposal)

    def do_Promise(self, sender, ballot_num, accepted_proposals):
        if ballot_num == self.ballot_num:
            self.logger.info("got matching promise; need %d" % self.quorum)
            self.update_accepted(accepted_proposals)
            self.acceptors.add(sender)
            if len(self.acceptors) >= self.quorum:
                # strip the ballot numbers from self.accepted_proposals, now that it
                # represents a majority
                accepted_proposals = \
                    dict((s, p) for s, (b, p) in self.accepted_proposals.iteritems())
                # We're adopted; note that this does *not* mean that no other
                # leader is active.  Any such conflicts will be handled by the
                # commanders.
                self.node.send([self.node.address],
                               Adopted(ballot_num=ballot_num,
                                       accepted_proposals=accepted_proposals))
                self.stop()
        else:
            # this acceptor has promised another leader a higher ballot number,
            # so we've lost
            self.node.send([self.node.address],
                           Preempted(slot=None, preempted_by=ballot_num))
            self.stop()
The Leader creates a Commander role for each slot where it has an active proposal (below). Like a Scout, a Commander sends and re-sends Accept messages and waits for a majority of acceptors to reply with Accepted, or for news of its preemption. Once the proposal is accepted, the Commander broadcasts a Decision message to all nodes. It responds to the Leader with Decided or Preempted.
Leader    Commander  Acceptor  Acceptor  Acceptor
  |           |         |         |         |
  |           X-------->|         |         |   Accept
  |           |<--------X         |         |   Accepted
  |           X------------------>|         |   Accept
  |           |<------------------X         |   Accepted
  |           X---------------------------->|   Accept
  |           |<----------------------------X   Accepted
  |<----------X         |         |         |   Decided
class Commander(Role):
    def __init__(self, node, ballot_num, slot, proposal, peers):
        super(Commander, self).__init__(node)
        self.ballot_num = ballot_num
        self.slot = slot
        self.proposal = proposal
        self.acceptors = set([])
        self.peers = peers
        self.quorum = len(peers) / 2 + 1

    def start(self):
        self.node.send(set(self.peers) - self.acceptors, Accept(
            slot=self.slot, ballot_num=self.ballot_num, proposal=self.proposal))
        self.set_timer(ACCEPT_RETRANSMIT, self.start)

    def finished(self, ballot_num, preempted):
        if preempted:
            self.node.send([self.node.address],
                           Preempted(slot=self.slot, preempted_by=ballot_num))
        else:
            self.node.send([self.node.address], Decided(slot=self.slot))
        self.stop()

    def do_Accepted(self, sender, slot, ballot_num):
        if slot != self.slot:
            return
        if ballot_num == self.ballot_num:
            self.acceptors.add(sender)
            if len(self.acceptors) < self.quorum:
                return
            self.node.send(self.peers, Decision(
                slot=self.slot, proposal=self.proposal))
            self.finished(ballot_num, False)
        else:
            self.finished(ballot_num, True)
One problem, described in more detail later, is that the network simulator introduces packet loss even on messages within a node. When all Decision messages are lost, the protocol cannot proceed: the Replica keeps retransmitting Propose messages, but the Leader ignores them because it has already made a proposal for that slot, and since no replica has heard of the decision, the Replica's catch-up process cannot find the result. The solution, much like a real network stack's handling of loopback traffic, is to ensure that local messages are always delivered.
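A minimal sketch of such a delivery rule, written as a standalone function rather than the library's actual Network class (which this article does not show); nodes, schedule, and the delay/drop parameters are illustrative names:

import copy
import random

DROP_PROB = 0.05      # illustrative loss probability for remote messages
PROP_DELAY = 0.03     # illustrative propagation delay and jitter
PROP_JITTER = 0.02

def send(nodes, schedule, sender, destinations, message):
    for dest in (d for d in destinations if d in nodes):
        msg = copy.deepcopy(message)   # each recipient gets its own copy
        if dest == sender.address:
            # messages a node sends to itself are never dropped or delayed
            schedule(dest, 0.0, lambda m=msg: sender.receive(sender.address, m))
        elif random.uniform(0, 1.0) > DROP_PROB:
            delay = PROP_DELAY + random.uniform(-PROP_JITTER, PROP_JITTER)
            schedule(dest, delay,
                     lambda m=msg, d=dest: nodes[d].receive(sender.address, m))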
Bootstrap
When a node joins a cluster, it must obtain the current cluster state. The Bootstrap role cycles through the nodes, sending a Join message to each until it receives a Welcome. Bootstrap's communication diagram was shown above, in the Replica section.
Implementing the startup process in each role (replica, leader, acceptor), with each waiting for a Welcome message, would spread the initialization logic across every role and be cumbersome to test. Instead, we add a Bootstrap role which, once startup is complete, adds each of the other roles to the node, passing the initial state to their constructors.
class Bootstrap(Role):

    def __init__(self, node, peers, execute_fn,
                 replica_cls=Replica, acceptor_cls=Acceptor, leader_cls=Leader,
                 commander_cls=Commander, scout_cls=Scout):
        super(Bootstrap, self).__init__(node)
        self.execute_fn = execute_fn
        self.peers = peers
        self.peers_cycle = itertools.cycle(peers)
        self.replica_cls = replica_cls
        self.acceptor_cls = acceptor_cls
        self.leader_cls = leader_cls
        self.commander_cls = commander_cls
        self.scout_cls = scout_cls

    def start(self):
        self.join()

    def join(self):
        self.node.send([next(self.peers_cycle)], Join())
        self.set_timer(JOIN_RETRANSMIT, self.join)

    def do_Welcome(self, sender, state, slot, decisions):
        self.acceptor_cls(self.node)
        self.replica_cls(self.node, execute_fn=self.execute_fn, peers=self.peers,
                         state=state, slot=slot, decisions=decisions)
        self.leader_cls(self.node, peers=self.peers,
                        commander_cls=self.commander_cls,
                        scout_cls=self.scout_cls).start()
        self.stop()
The above is a detailed walkthrough of using Python to implement Paxos in a distributed system. For more on Python, please see my other related articles!