Databases are used everywhere and are critical to most informatics work. They are also a critical part of big data analysis. Because databases store critical data, they are handled very carefully: they are backed up regularly, and they are also replicated and distributed. Large databases can be handled, accessed and manipulated in many ways. However, one of the critical limitations of distributed systems is best explained by the CAP theorem. Detailed information on the CAP theorem is available on Wikipedia, but the concept is simple.
First, the acronym: CAP stands for Consistency, Availability and Partition tolerance.
Consistency: Data is consistent if every replica on the different databases or servers holds the same data. For example, two computers serving the same data to different parts of the world need to hold identical copies, so that a read from either one returns the most recent write.
Availability: The distributed database stays available to answer requests. In the example above, both computers are available, or at least one of them is, and any computer that is still running responds to the requests it receives.
Partition tolerance: This is the ability to keep operating despite a loss of communication between the computers. In the example, if the two computers stop communicating with each other, it becomes difficult to keep the data current on both of them.
The theorem states that it is impossible to guarantee all three at once. A nice blog post explaining it is available at the link below, but much of the theorem seems obvious. If you want the data to be consistent across computers, they need to communicate with each other. This communication takes a finite amount of time, so the data on one computer will not be current while the other is being updated. Similarly, if the communication breaks down, it will be impossible to keep the data current on both computers while both continue to accept updates.
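The trade-off can be sketched with a toy model of the two computers from the example. This is only an illustration, not a real database: the names `Replica`, `TwoNodeStore`, and the `"CP"`/`"AP"` mode labels are hypothetical, and replication is assumed to be instantaneous whenever the link is up.

```python
class Replica:
    """One copy of the data, e.g. one of the two computers in the example."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class TwoNodeStore:
    """Toy two-replica store (hypothetical, for illustration only).

    mode="CP": during a partition, refuse writes so the copies stay equal.
    mode="AP": during a partition, accept writes locally and let copies diverge.
    """

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("A")
        self.b = Replica("B")
        self.partitioned = False  # True when A and B cannot communicate

    def write(self, replica, key, value):
        """Return True if the write was accepted, False if it was refused."""
        other = self.b if replica is self.a else self.a
        if not self.partitioned:
            # Link is up: update both copies (assumed instantaneous here).
            replica.data[key] = value
            other.data[key] = value
            return True
        if self.mode == "CP":
            # Give up availability to preserve consistency.
            return False
        # Give up consistency to preserve availability.
        replica.data[key] = value
        return True

    def consistent(self):
        """True when both replicas hold exactly the same data."""
        return self.a.data == self.b.data


if __name__ == "__main__":
    for mode in ("CP", "AP"):
        store = TwoNodeStore(mode)
        store.write(store.a, "x", 1)
        store.partitioned = True  # the two computers stop communicating
        accepted = store.write(store.a, "x", 2)
        print(mode, "accepted:", accepted, "consistent:", store.consistent())
```

Once the link is cut, the store has to pick: in `"CP"` mode the write is refused and the copies still match, while in `"AP"` mode the write is accepted and the copies differ.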
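This has also been proven mathematically (by Gilbert and Lynch in 2002), and engineers have devised workarounds. These do not guarantee all three properties at once; instead they relax one of them, for example by accepting temporary inconsistency and bringing the copies back in line once communication is restored.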
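One common workaround is to let the copies drift apart during a communication failure and reconcile them afterwards, an approach often called eventual consistency. The sketch below uses a simple "last write wins" rule as one possible reconciliation strategy; the function name `merge_last_write_wins` and the `(timestamp, value)` layout are assumptions made for this illustration, and the timestamps are assumed to be comparable across both computers.

```python
def merge_last_write_wins(a, b):
    """Merge two replicas whose entries are key -> (timestamp, value).

    For each key, keep the entry with the larger timestamp. Hypothetical
    helper for illustration; on a timestamp tie the entry from `a` is kept.
    """
    merged = dict(a)
    for key, (ts, value) in b.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged


if __name__ == "__main__":
    # State of the two computers after a partition, before reconciliation.
    replica_a = {"x": (1, "old"), "y": (5, "keep")}
    replica_b = {"x": (2, "new"), "z": (3, "added")}

    merged = merge_last_write_wins(replica_a, replica_b)
    # Both computers adopt the merged state once communication is restored.
    print(merged)
```

Note that this rule silently discards the older of two conflicting writes, which is the price paid for staying available during the failure.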
{youtube} Jw1iFr4v58M{/youtube}