Centralized and Distributed DBMS

Perhaps the best place to start with comparing centralized and distributed DBMS instances is the architecture itself. As the names suggest, it is mostly a matter of whether the data resides in one physical location—not necessarily logical, as multiple volumes within a single location do not qualify as a distributed DBMS—or multiple locations with an underlying controller to bring it all together. It might be compared to disk RAID options, in which data on a storage system is mirrored or striped across multiple physical drives.

No alt text provided for this image

We can continue the RAID analogy in discussing replication and partitioning. Much like Distributed DBMS architecture, RAID storage allows disks to be seamlessly duplicated for high fault tolerance or the data itself to be written across multiple disks to increase storage capacity and throughput. In RAID 0, data is striped across multiple disks; this is the equivalent of DDMBS partitioning. All the nodes in a DDMBS store different parts of the complete database. This may be accomplished by horizontal partitioning (in which all columns are stored, but different nodes have different subsets of records) or vertical partitioning (in which certain columns are stored in different nodes, of all records). Alternatively, in RAID 1, a disk is mirrored to another disk; this is the equivalent of replication.

A common misconception with DDBMS instances involves the CAP theorem. There is an assumption that while CDBMS instances enjoy Consistency, Availability, and Partition Tolerance all at the same time (the latter by virtue of being in a single location and it being a moot point), DDBMS administrators must choose either CP, CA, or AP. Rather, it is more accurate to say that a DDBMS administrator, in the event of a network partition, must choose between availability or consistency. The former may sacrifice consistency and the latter may sacrifice availability.

In terms of applications, a DDBMS is most appropriate for large volumes of data or for users spread across a large geographic area. A partitioned DDBMS architecture might be optimized to store specific columns on nodes local to user groups that use those columns more frequently than other user groups, even though they are not directly accessed. Geographic spread is a relevant use case due to the various network hops and latency differences that may exist between an otherwise central data center and users worldwide.

References

Connolly, T. & Begg, C. (2015).  Database Systems: A Practical Approach to Design, Implementation, and Management (6th ed.). London, UK: Pearson. 

 Mehra, A. (2017). Understanding the CAP theorem. Retrieved from https://dzone.com/articles/understanding-the-cap-theorem

On DBaas migrations

An increasing number of enterprise systems are moving to as-a-service models, reducing a company’s overhead and turning traditional facets of information technology—those that have used up both real estate and capital expenditures—into outsourced subscriptions that are managed by outside companies. Infrastructure, Networking, and Reporting as a service are already popular. Moving the databases themselves off a company’s property and balance sheet into a cloud architecture entails what is known as Database-as-a-service (DBaaS) (Bonthu, Thammiraju, & Murthy, 2014). There are many factors involved in establishing the DBaaS environment and migrating the data from on-premise boxes to cloud.

There are typically eight steps involved in moving from on-premise to cloud databases:

  1. Define the scope of migration
  2. Ensure data security
  3. Select service provider
  4. Map the data
  5. Schedule the migration
  6. Select tools for migration or develop migration scripts
  7. Test before (and after) the migration
  8. Actual data migration

 The actual migration, insofar as relational databases are concerned, typically consists of three steps:

  1. Relational schema migration – it includes the migration of tables, indexes and views.
  2. Data migration done via tools or migration scripts. The time required for data migration depends on the size of the database.
  3. Database stored programs migration – the migration of stored procedures and triggers. (Vodomin & Andročec, 2015)

 The different types of cloud databases available, relational and non-relational, make for a variety of ways to migrate and a number of considerations for enterprise migration. Regardless, a one-time expenditure on migration can save countless dollars and hours of ballooning infrastructure and database sprawl. It is much easier to handle such sprawl by responding with both storage and virtual machine elasticity as opposed to investing more in onsite resources (Bonthu, Thammiraju, & Murthy, 2014). Further research in this space is warranted as the options for cloud architecture increase and companies have more options for service-based managed IT.

 References

 Bonthu, S., Thammiraju, S. D. M., & Murthy, Y. S. S. R. (2014). Study on database virtualization for database as a service (dbaas). International Journal of Advanced Research in Computer Science, 5(2), 31-34.

 Vodomin, G., & Andročec, D. (2015). Problems during database migration to the cloud. Paper presented at the Central European Conference on Information and Intelligence Systems, Varaždin, Croatia.