Introduction
In the following article, we'll be learning as much as possible about the CAP theorem. What is the CAP theorem, you ask? The CAP theorem aka Brewer's theorem states that in order to design a distributed system, one has to choose a combination of two out of three of the properties from the CAP in the theorem's name. We'll be learning more about these properties in the next section.
The 'CAP' in CAP Theorem
The CAP in the CAP theorem consists of the initial letters of three properties, which are:
Consistency
Consistency is a property that prioritizes the distribution of the same data across all nodes in a distributed system. Ideally, if you were to fetch data from any two random nodes from a consistent system, the data in these two nodes would be the same.
Let's take an example, there's a party, this party can be called the consistent system party (lol), The party organizer makes sure that everybody in the party gets fresh food, and if there isn't fresh food for everyone, the remaining people get nothing (sad). We can see that the priority here is 'fresh food' and not everybody getting some food.
Availability
Availability is quite the opposite of Consistency. While a consistent system makes sure that the data across all the nodes remains consistent, an Available system prioritizes the availability of data. This means that if we were to choose any two random nodes from a highly Available system, the data in these two nodes may or may not be the same.
If we consider the same 'party' example as before, with this time the party being an available system party, the priority of the host is making sure that everyone in the party gets something to eat, it may not be fresh (gross, ik) or even the same as what the other people in the party are eating, but there's food for everyone. We can see that unlike consistency, the priority here is making sure that everybody gets some food/ availability of food.
Partition Tolerance
When the connection between the nodes in a distributed data system is broken, the system gets partitioned. This breaks the flow of information and affects the functioning of the system. Partition tolerance is the ability of a system to continue functioning like it usually would when it has been partitioned.
Since ruling out the possibility of partitions in a distributed system is highly improbable, partition tolerance is a crucial property for distributed systems. We won't be using the 'party' example in this section (thank god).
Since we're now aware about what the properties in CAP stand for, let's bring our attention back to the fact that we can only choose two out of three of these properties. In the next section, we'll see and evaluate the choices that a system designer can make, given these options.
The Choice: CP, AP or CA Systems
We now have to make a choice between a CP, AP or a CA system. Let's learn more about these systems and explore the options we get:
CP System
A CP system chooses Consistency and Partition Tolerance as between the three. A CP System makes sure that the data available across all the nodes in a distributed data system remains consistent. In case of a partition, the system shuts down/ sacrifices the nodes that do not fall in line with the majority of the nodes. However, this is temporary since as soon as the system heals from the partition, the nodes are brought back on and consistency is maintained.
The trade-off that comes with this choice is that the system does not run where the nodes are sacrificed, which cannot be fixed since the system does not prioritize availability. However, the recovery is often quick, which makes this a viable option
AP System
An AP System chooses Availability and Partition Tolerance over Consistency. In an AP system, availability of data is ensured in case of a partition. This means that the data may not be the same across all the nodes, but every request receives data. As soon as the network heals from the partition failure, the nodes that are out of sync get synced again.
The trade-off that comes with this choice is that the data fetched from two different nodes may be different. Any system that relies on data being consistent must give up on the system being highly available, which can be a tough choice to make. For systems that do not care about being consistent, an AP system is highly desirable since each request receives data and since the system does not care about being consistent, there is enough room for the network to heal and the nodes to get synced again.
CA System
In the previous section, we discussed how it is very difficult and almost inevitable for distributed systems to avoid partition failures. Due to this, Partition Tolerance has to be one of the two choices. For a local system that isn't distributed, avoiding partition is possible, which is why a local system can be a CA system. Such is not the case for a distributed system. So, while a CA system is possible on paper, it is not feasible.
Continue Reading About The CAP Theorem
That is it for this article. I think I've covered the crux of the theorem, but there may be a few topics that I have not covered. If you'd like to learn more about the theorem, these resources do an amazing job at explaining the theorem: