Centralized and Distributed DBMS

Perhaps the best place to start comparing centralized and distributed DBMS instances is the architecture itself. As the names suggest, the distinction is largely whether the data resides in one physical location or in multiple locations with an underlying controller to bring it all together; the distinction is physical, not logical, so multiple volumes within a single location do not qualify as a distributed DBMS. It might be compared to disk RAID options, in which data on a storage system is mirrored or striped across multiple physical drives.


We can continue the RAID analogy in discussing replication and partitioning. Much like a distributed DBMS architecture, RAID storage allows disks to be seamlessly duplicated for high fault tolerance, or the data itself to be written across multiple disks to increase storage capacity and throughput. In RAID 0, data is striped across multiple disks; this is the equivalent of DDBMS partitioning, in which the nodes each store a different part of the complete database. This may be accomplished by horizontal partitioning (each node stores all columns, but a different subset of the records) or vertical partitioning (each node stores certain columns of all records). Alternatively, in RAID 1, a disk is mirrored to another disk; this is the equivalent of replication.
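The two partitioning schemes can be sketched in a few lines. This is an illustrative toy, not a real DBMS API: the records, node names, and the choice to split by region are all made up for the example.

```python
# Sample "table" as a list of row dicts; contents are hypothetical.
records = [
    {"id": 1, "name": "Ada", "region": "EU"},
    {"id": 2, "name": "Bo",  "region": "US"},
    {"id": 3, "name": "Cy",  "region": "EU"},
]

# Horizontal partitioning: every fragment keeps all columns,
# but holds only a subset of the records (here, split by region).
horizontal = {}
for row in records:
    horizontal.setdefault(row["region"], []).append(row)

# Vertical partitioning: every fragment keeps all records,
# but holds only a subset of the columns. The key ("id") is
# duplicated in both fragments so they can be rejoined later.
vertical = {
    "node_a": [{"id": r["id"], "name": r["name"]} for r in records],
    "node_b": [{"id": r["id"], "region": r["region"]} for r in records],
}
```

Note that rejoining the vertical fragments requires the shared key, which is why real systems replicate the primary key across every vertical fragment.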

A common misconception with DDBMS instances involves the CAP theorem. There is an assumption that while a CDBMS enjoys Consistency, Availability, and Partition tolerance all at the same time (the last being moot, since a single location has no network partitions to tolerate), DDBMS administrators must permanently choose CP, CA, or AP. It is more accurate to say that a DDBMS administrator, in the event of a network partition, must choose between availability and consistency: favoring availability may sacrifice consistency, and favoring consistency may sacrifice availability.
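That trade-off can be made concrete with a toy replica node. The `Replica` class and its `mode` flag are hypothetical, a minimal sketch of the two behaviors rather than any real database's interface:

```python
class Replica:
    """Toy replica illustrating the CAP choice during a partition."""

    def __init__(self, mode):
        self.mode = mode            # "CP" favors consistency, "AP" availability
        self.data = {}
        self.partitioned = False    # True when this node cannot reach its peers

    def write(self, key, value):
        if self.partitioned and self.mode == "CP":
            # Consistency first: refuse the write rather than let
            # replicas diverge while they cannot coordinate.
            raise RuntimeError("unavailable during partition")
        # Availability first: accept the write locally; replicas may
        # diverge until the partition heals and they reconcile.
        self.data[key] = value
        return "ok"
```

During normal operation both modes behave identically; the choice only bites when `partitioned` is set, which mirrors the point above that the trade-off applies in the event of a partition, not all the time.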

In terms of applications, a DDBMS is most appropriate for large volumes of data or for users spread across a large geographic area. A partitioned DDBMS architecture might be optimized to store specific columns on nodes local to the user groups that use those columns more frequently than other user groups do. Geographic spread is a relevant use case because of the network hops and latency differences that can exist between an otherwise central data center and users worldwide.
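The locality idea can be illustrated with a simple routing function that sends each user group to its lowest-latency node. The node names and millisecond figures below are made-up numbers for the sketch, not measurements:

```python
# Hypothetical round-trip latencies (ms) between user groups and nodes.
latency_ms = {
    ("eu-user", "eu-node"): 12,
    ("eu-user", "us-node"): 95,
    ("us-user", "eu-node"): 98,
    ("us-user", "us-node"): 10,
}

def nearest_node(user, nodes=("eu-node", "us-node")):
    """Route a query to the replica or partition with the lowest latency."""
    return min(nodes, key=lambda n: latency_ms[(user, n)])
```

Placing the data each group queries most often on its nearest node turns the cross-ocean hop into the rare case rather than the common one.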


Most content also appears on my LinkedIn page.
