Machine Learning: Supervised and Unsupervised

Supervised learning typically takes the form of classification or regression. Both the input and output variables are known, and the goal is to model the relationship between the two. Tembhurkar, Tugnayat, and Nagdive (2014) refer to this as Descriptive mining. Common methods include decision trees, the k-nearest neighbors (kNN) algorithm, regression, and discriminant analysis. The choice of method depends on the type of output variable: continuous outputs call for regression methods, while discrete outputs call for classification methods.
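To make the regression and classification split concrete, here is a minimal sketch using only the Python standard library. The data, the function names (fit_line, knn_predict), and the feature choices are all hypothetical and exist only for illustration. The first half fits a least-squares line to a continuous target; the second half uses a small kNN vote to predict a discrete label.

```python
from collections import Counter
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares with one input variable: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest labeled points."""
    def sq_dist(row):
        return sum((a - b) ** 2 for a, b in zip(row[0], query))
    nearest = sorted(train, key=sq_dist)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Continuous target -> regression.
# Hypothetical data: years of service vs. salary (in thousands).
years = [1, 2, 3, 4, 5]
salary = [42, 44, 46, 48, 50]
slope, intercept = fit_line(years, salary)
predicted_salary = slope * 6 + intercept  # estimate for year 6

# Discrete target -> classification.
# Hypothetical features: (age, commute miles); label: "stayed" or "left".
employees = [
    ((25, 5), "stayed"), ((30, 8), "stayed"), ((28, 3), "stayed"),
    ((50, 30), "left"), ((47, 28), "left"), ((55, 35), "left"),
]
predicted_label = knn_predict(employees, (49, 27))

print(predicted_salary, predicted_label)
```

In both cases the known outputs (salary, stayed/left) drive the fit, which is what makes the task supervised. In practice the features would usually be scaled before computing kNN distances.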

For example, a human resources division in a large multinational company wants to determine what factors have contributed to employee attrition over the past two years. A decision tree can produce a simple “if-then” map of which attributes combine to result in a separated employee. An example tree might point out that a male employee over the age of 45, working in Division X, who commutes more than 25 miles from home, has a manager 10 years or more his junior, and has been in the same unit for more than seven years is a prime candidate for attrition. Although many of the variables are continuous, a decision tree splits them at thresholds, which makes the data manageable and actionable for the human resources division.
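The “if-then” map can be sketched with a small decision tree written in plain Python. This is an illustrative toy, not a production implementation: the employee records, the three features, and the helper names (gini, best_split, build_tree, rules, predict) are all assumptions made up for this example. It picks thresholds by minimizing Gini impurity, one common splitting criterion.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0 means perfectly pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for feature in rows[0]:
        # Skip the largest value so the "greater than" side is never empty.
        for threshold in sorted({r[feature] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score - 1e-12:
                best, best_score = (feature, threshold), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Split recursively until a node is pure or max_depth is reached."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    feature, threshold = split
    left = [(r, y) for r, y in zip(rows, labels) if r[feature] <= threshold]
    right = [(r, y) for r, y in zip(rows, labels) if r[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }

def rules(tree, conditions=()):
    """Flatten the tree into readable if-then strings."""
    if not isinstance(tree, dict):
        return [f"IF {' AND '.join(conditions) or 'always'} THEN {tree}"]
    f, t = tree["feature"], tree["threshold"]
    return (rules(tree["left"], conditions + (f"{f} <= {t}",))
            + rules(tree["right"], conditions + (f"{f} > {t}",)))

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's class."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree

# Hypothetical employee records and outcomes.
rows = [
    {"age": 29, "commute_miles": 5, "years_in_unit": 2},
    {"age": 34, "commute_miles": 30, "years_in_unit": 3},
    {"age": 41, "commute_miles": 12, "years_in_unit": 8},
    {"age": 52, "commute_miles": 10, "years_in_unit": 9},
    {"age": 47, "commute_miles": 28, "years_in_unit": 4},
    {"age": 48, "commute_miles": 31, "years_in_unit": 9},
    {"age": 55, "commute_miles": 40, "years_in_unit": 12},
    {"age": 50, "commute_miles": 27, "years_in_unit": 8},
]
labels = ["stayed", "stayed", "stayed", "stayed", "stayed", "left", "left", "left"]

tree = build_tree(rows, labels)
for rule in rules(tree):
    print(rule)
```

Each printed rule is one root-to-leaf path, which is the form a human resources team could read directly. On real data the tree would be validated on held-out records, since a tree grown until every leaf is pure tends to overfit.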

Unsupervised learning usually takes the form of clustering or association. The output variables are not known, so there is no a priori knowledge to guide the model, and we rely on the system to make sense of the data. Tembhurkar et al. (2014) refer to this as Prescriptive mining. Common methods include certain neural networks (autoencoders, for example), anomaly detection, k-means clustering, and principal components analysis. The choice of method depends on the goal and on the type of input data: clustering methods such as k-means rely on distances and so are best suited to continuous variables, while association methods look for items that occur together and so typically work with discrete variables.

For example, a multi-level marketing company has a number of data points on its associates: units sold, associates recruited, years in the program, rewards program tier, and so on. The company knows the associates can be grouped into performance categories akin to novice and expert, but it is unclear on both how many categories to use and which factors are important. Principal components analysis and k-means clustering can reveal how the associates differentiate themselves based on the available variables and suggest an appropriate number of categories within which to classify them.
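The clustering half of this example can be sketched in plain Python. The associate data, the two features, and the function names (sq_dist, assign, kmeans) are hypothetical. The sketch runs k-means for several values of k and records the within-cluster sum of squares (often called inertia); a sharp drop that then levels off is one common heuristic, the "elbow", for choosing the number of categories. It leaves out the principal components step and uses a deterministic farthest-first initialization instead of the usual random start so that the output is repeatable.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            for p in points]

def kmeans(points, k, iterations=100):
    """Lloyd's algorithm with farthest-first initialization (an assumption of this sketch)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for i in range(k):
            members = [p for p, a in zip(points, labels) if a == i]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[i])  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: centroids stopped moving
        centroids = updated
    labels = assign(points, centroids)
    inertia = sum(sq_dist(p, centroids[a]) for p, a in zip(points, labels))
    return labels, centroids, inertia

# Hypothetical associates: (units sold, associates recruited).
associates = [
    (5, 0), (8, 1), (6, 1),        # low activity
    (40, 5), (45, 6), (42, 4),     # moderate activity
    (90, 20), (95, 22), (88, 19),  # high activity
]

# Elbow heuristic: compare inertia across candidate values of k.
inertia_by_k = {k: kmeans(associates, k)[2] for k in range(1, 6)}
for k, inertia in inertia_by_k.items():
    print(k, round(inertia, 1))

labels, centroids, _ = kmeans(associates, 3)
print(labels)
```

No outcome labels are supplied anywhere, which is what makes this unsupervised: the groups come entirely from the structure of the data. With real data the variables would normally be standardized first, because k-means is sensitive to the scale of each feature.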

References

Brownlee, J. (2016, September 22). Supervised and unsupervised machine learning algorithms. Retrieved from https://machinelearningmastery.com/supervised-and-unsupervised-machine-learning-algorithms/

Soni, D. (2018, March 22). Supervised vs. unsupervised learning. Towards Data Science. Retrieved from https://towardsdatascience.com/supervised-vs-unsupervised-learning-14f68e32ea8d

Tembhurkar, M. P., Tugnayat, R. M., & Nagdive, A. S. (2014). Overview on data mining schemes to design business intelligence framework for mobile technology. International Journal of Advanced Research in Computer Science, 5(8).

Most content also appears on my LinkedIn page.