
Error in the replica message processing in mencius.go #21

@PasinduTennage


There is a design error in the Mencius message-processing logic in `mencius.go`:

```go
// When several cases are ready at once, Go's select chooses one of them
// pseudo-randomly, so arrival order across these channels is not preserved.
select {

case propose := <-r.ProposeChan:
	//got a Propose from a client
	dlog.Printf("Proposal with id %d\n", propose.CommandId)
	r.handlePropose(propose)
	break

case skipS := <-r.skipChan:
	skip := skipS.(*menciusproto.Skip)
	//got a Skip from another replica
	dlog.Printf("Skip for instances %d-%d\n", skip.StartInstance, skip.EndInstance)
	r.handleSkip(skip)

case prepareS := <-r.prepareChan:
	prepare := prepareS.(*menciusproto.Prepare)
	//got a Prepare message
	dlog.Printf("Received Prepare from replica %d, for instance %d\n", prepare.LeaderId, prepare.Instance)
	r.handlePrepare(prepare)
	break

case acceptS := <-r.acceptChan:
	accept := acceptS.(*menciusproto.Accept)
	//got an Accept message
	dlog.Printf("Received Accept from replica %d, for instance %d\n", accept.LeaderId, accept.Instance)
	r.handleAccept(accept)
	break

case commitS := <-r.commitChan:
	commit := commitS.(*menciusproto.Commit)
	//got a Commit message
	dlog.Printf("Received Commit from replica %d, for instance %d\n", commit.LeaderId, commit.Instance)
	r.handleCommit(commit)
	break

case prepareReplyS := <-r.prepareReplyChan:
	prepareReply := prepareReplyS.(*menciusproto.PrepareReply)
	//got a Prepare reply
	dlog.Printf("Received PrepareReply for instance %d\n", prepareReply.Instance)
	r.handlePrepareReply(prepareReply)
	break

case acceptReplyS := <-r.acceptReplyChan:
	acceptReply := acceptReplyS.(*menciusproto.AcceptReply)
	//got an Accept reply
	dlog.Printf("Received AcceptReply for instance %d\n", acceptReply.Instance)
	r.handleAcceptReply(acceptReply)
	break
```

Mencius requires FIFO channels between nodes, and this implementation provides them correctly. However, when a node receives a message, it pushes that message onto a Go channel specific to its message type. The receiver then drains these channels in the `select` loop, so messages are not processed in FIFO order on the receiving side. The following example shows how this design can break safety.

Assume there are three nodes: A, B, and C. Node A first sends an Accept message and later sends a Propose message. B receives both messages in the order A sent them. However, B pushes them onto two separate channels (one per message type), and another goroutine polls all of the channels with `select`.

If B processes the Propose before the Accept, the protocol is violated, and nothing in this design prevents it: when both channels are ready, Go's `select` picks one case pseudo-randomly. This is a problem in Mencius because each node derives piggybacked information from the messages it receives, so messages must be processed in exactly the order the sender sent them.
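The reordering can be reproduced in isolation. The sketch below is a minimal, hypothetical model of B's receive path, not code from `mencius.go`: `Accept` and `Propose` are stand-in structs, and `firstProcessed` enqueues the two messages in A's send order onto separate buffered channels, then runs a single `select` as the replica loop would.

```go
package main

import "fmt"

// Hypothetical stand-ins for the message types; only ordering matters here.
type Accept struct{ Instance int32 }
type Propose struct{ CommandId int32 }

// firstProcessed models B after both of A's messages have arrived:
// A sent Accept first, then Propose, but each lands in its own channel.
// It returns the name of the message the select loop handles first.
func firstProcessed() string {
	acceptChan := make(chan *Accept, 1)
	proposeChan := make(chan *Propose, 1)

	acceptChan <- &Accept{Instance: 1}    // sent first by A
	proposeChan <- &Propose{CommandId: 7} // sent second by A

	// Both cases are ready, so select picks one of them pseudo-randomly.
	select {
	case <-proposeChan:
		return "Propose"
	case <-acceptChan:
		return "Accept"
	}
}

func main() {
	const trials = 1000
	reordered := 0
	for i := 0; i < trials; i++ {
		if firstProcessed() == "Propose" {
			reordered++
		}
	}
	fmt.Printf("Propose handled before Accept in %d of %d trials\n", reordered, trials)
}
```

Running it should report the Propose being handled first in a substantial fraction of trials (roughly half), even though A always sent the Accept first.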

One fix is to use a single channel for all types of replica messages, so that each node processes messages in the order they arrive.
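A minimal sketch of that fix, under stated assumptions: `replica`, `msgChan`, and the `Skip`/`Accept`/`Commit` structs below are hypothetical simplifications, not the real `menciusproto` types or the repository's replica struct. Every replica-to-replica message goes onto one channel, and a type switch dispatches it, so messages are handled in the order they were enqueued.

```go
package main

import "fmt"

// Hypothetical message types standing in for the menciusproto structs.
type Skip struct{ StartInstance, EndInstance int32 }
type Accept struct{ Instance int32 }
type Commit struct{ Instance int32 }

// replica is a stripped-down, hypothetical receiver with one inbound
// channel shared by every replica-to-replica message type.
type replica struct {
	msgChan chan any
	log     []string // records the order in which messages were handled
}

// run drains msgChan and dispatches each message by its dynamic type,
// preserving the order in which the messages were enqueued.
func (r *replica) run() {
	for msg := range r.msgChan {
		switch m := msg.(type) {
		case *Skip:
			r.log = append(r.log, fmt.Sprintf("Skip %d-%d", m.StartInstance, m.EndInstance))
		case *Accept:
			r.log = append(r.log, fmt.Sprintf("Accept %d", m.Instance))
		case *Commit:
			r.log = append(r.log, fmt.Sprintf("Commit %d", m.Instance))
		default:
			r.log = append(r.log, "unknown")
		}
	}
}

func main() {
	r := &replica{msgChan: make(chan any, 8)}
	r.msgChan <- &Accept{Instance: 3}
	r.msgChan <- &Skip{StartInstance: 4, EndInstance: 6}
	r.msgChan <- &Commit{Instance: 3}
	close(r.msgChan)
	r.run()
	fmt.Println(r.log) // [Accept 3 Skip 4-6 Commit 3]
}
```

If one reader goroutine per peer connection enqueues messages onto this shared channel in the order it reads them from the wire, per-sender FIFO order is preserved end to end; messages from different senders may still interleave, which FIFO channels do not constrain anyway.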

Thanks
