Why does K8s HA use 3 masters instead of 5, 6, or 7?

Theory: To take any action, at least floor(n/2) + 1 of the n members must reach consensus.

Algorithm: Raft

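| Cluster size | Majority | Fault tolerance |
|--------------|----------|-----------------|
| 1            | 1        | 0               |
| 2            | 2        | 0               |
| 3            | 2        | 1               |
| 4            | 3        | 1               |
| 5            | 3        | 2               |
| 6            | 4        | 2               |
| 7            | 4        | 3               |
| 8            | 5        | 3               |
| 9            | 5        | 4               |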

majority = floor(cluster_size / 2) + 1

fault_tolerance = cluster_size - majority

For the system to keep working when members fail, a majority must still be reachable. The fault tolerance is therefore the cluster size minus the required majority.
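The two formulas can be checked with a short Python sketch. The function names `majority` and `fault_tolerance` are illustrative only and do not come from Kubernetes or etcd; integer division (`//`) is assumed to stand in for the floor.

```python
def majority(cluster_size: int) -> int:
    """Smallest number of members that still forms a quorum."""
    # Integer division plays the role of floor(cluster_size / 2).
    return cluster_size // 2 + 1


def fault_tolerance(cluster_size: int) -> int:
    """Number of members that can be lost while a quorum is still reachable."""
    return cluster_size - majority(cluster_size)


if __name__ == "__main__":
    # Print one row per cluster size: size, majority, fault tolerance.
    for n in range(1, 10):
        print(f"{n}\t{majority(n)}\t{fault_tolerance(n)}")
```

Run as a script, this prints one row per cluster size from 1 to 9, which can be compared against the table.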

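Why choose 3:

All else being equal, a larger cluster costs more. Three is the smallest size that tolerates a failure (fault tolerance 1). Going to 4 raises the majority to 3 but leaves the fault tolerance at 1, so the extra member adds cost without adding resilience. Going to 9 raises the fault tolerance to 4, but the cost of running nine masters has to be weighed against that gain.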

Sometimes people choose 5 in production as well:

A 5-member cluster has a fault tolerance of 2. That means one member can fail while another is down for maintenance, and the cluster still has its majority of 3.
