Database Indexing and Clusters

A useful analogy for database indexing is highlighting an article or book. We highlight or mark passages to make them easier for retrieval at a later date, when we pick up the book and want to find something quickly. Likewise, an index provides quick access to key information in the database tables. It would be silly to highlight an entire book, or mark up all of the pages; by the same token, it would be functionally useless to index a large number of columns in a table. There is a point of diminishing returns here.

It is generally recommended to index columns that are involved in WHERE or JOIN clauses (Larsen, 2010). These columns are frequently sought out by multiple query operations and are typically as critical to the table as the Primary Key. It is important to choose wisely here because for every table operation done, an index update must be done. This work can become exponential if multiple indexes are placed on a single table. Again, we come back to the principle of diminishing returns.

There is also the matter of choosing between clustered and nonclustered indexes. The former typically reads like browsing through a telephone directory: in order. Primary Keys are typically used in clustered indexes. One drawback here is the need to re-order the index when information is updated, added, or deleted. On the other hand, a non-clustered index operates much like an index in the back of a textbook, or like a dimension table in a star-schema database. While the latter may seem more advantageous at all times, but it usually shines when values are constantly updated. In situations where most of the table data is returned in a query, or a Primary Key is the rational identifier, a clustered index is the type of choice.


Connolly, T. & Begg, C. (2015).  Database Systems – Database Systems: A Practical Approach to Design, Implementation, and Management (6th ed.). London, UK: Pearson. Larsen, G. A. (2010). The dos and don’ts of database indexing. Retrieved from

Wagner, B. (2017). Clustered vs nonclustered: What index is right for my data? Retrieved from