I’ve written on Data Mining before, as it is a fundamental step for higher-order predictive and prescriptive analytics work. Enterprise data warehouses are a trove of useful information, and data mining methods help to separate what is useful from what is not (Sharma, Sharma, & Sharma, 2013). Data mining is itself an analysis method; that is, “the analysis of data that was collected for other purposes but not the questions to be answered through the data mining process” (Maaß, Spruit, & de Waal, 2014, p. 2). Data mining takes on the unknown-unknowns of the dataset and begins to make sense of the vast amount of data points available. It involves both data transformation and reduction. These are necessary as “prediction algorithms have no control over the quality of the features and must accept it as a source of error” (Maaß, Spruit, & de Waal, 2014, p. 6). Data mining reduces the noise and eliminates the dilution of relevant data by irrelevant covariates. It provides the business intelligence framework with usable data and a minimum of error.
Tembhurkar, Tugnayat, & Nagdive (2014) outline five stages for successful data-to-BI transformation:
- Collection of raw data
- Data mining and cleansing
- Data warehousing
- Implementation of BI tools
- Analysis of outputs (p. 132).
Given the importance of Data Mining in the BI process, I do not see it going away or diminishing in stature. In fact, more attention may be coming to it because of the growing interest in data lakes and ELT over ETL (e.g., Meena & Vidhyameena, 2016; Rajesh & Ramesh, 2016). Increased attention will be paid to mining and cleansing practices. New developments will include advances in unstructured data, IoT data, distributed systems data mining, and NLP/multimedia data mining.
Maaß, D., Spruit, M., & de Waal, P. (2014). Improving short-term demand forecasting for short-lifecycle consumer products with data mining techniques. Decision Analytics, 1(1), 1–17.
Meena, S. D., & Vidhyameena, S. (2016). Data lake – a new data repository for big data analytics workloads. International Journal of Advanced Research in Computer Science, 7(5), 65-67.
Rajesh, K. V. N., & Ramesh, K. V. N. (2016). An introduction to data lake. i-Manager’s Journal on Information Technology, 5(2), 1-4.
Sharma, S. A., Sharma, A. K., & Sharma, D. M. (2013). Using Data Mining for Prediction: A Conceptual Analysis. I-Manager’s Journal on Information Technology, 2(1), 1–9.
Tembhurkar, M. P., Tugnayat, R. M., & Nagdive, A. S. (2014). Overview on data mining schemes to design business intelligence framework for mobile technology. International Journal of Advanced Research in Computer Science, 5(8).