As I go through various data analytics books, I slowly discover data mining tasks such as classification, regression, clustering, causal modelling and so on. But the same question keeps popping up in my mind, “How do I know when to do what task?”
So in this post, join me as I explore 9 popular data mining tasks, together with some common real-life applications. By the end of the post, I will also attempt a quick guide for framing business questions as data mining tasks, to answer my earlier question. Let’s go!

Classification
The question
Among a limited number of mutually exclusive classes, which class can we put an individual into?
Use case
- Churn analysis: Among all customers, who is likely to switch to a competitor? The 2 mutually exclusive classes here are will churn and will not churn.
- Credit application assessment: Given an application for credit cards, do we approve, reject or request further human evaluation based on personal details such as annual income and historical debts? 3 mutually exclusive classes here are approve, reject and flag for review.
Regression
The question
Given a variable of interest, what is the expected value of that variable?
Use case
- Sales forecast: Given historical monthly sales numbers for the past 3 years and other macroeconomic data, predict monthly sales of the next financial year.
- Pandemic prediction: Considering all information about a pandemic outbreak, calculate the projected death rate in Country A for the next 3 months.
- Electricity usage: Taking into account historical electricity usage of the city, predict the average electricity usage for each geographical area next summer.
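To make the sales forecast idea concrete, here is a minimal sketch of the simplest possible regression: fitting a straight line through past monthly sales and extending it one month ahead. The numbers and the `fit_line` helper are made up for illustration, and a real forecast would also need seasonality and the macroeconomic data mentioned above.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical monthly sales (in thousands) for months 1 to 6
months = [1, 2, 3, 4, 5, 6]
sales = [100, 110, 120, 130, 140, 150]

slope, intercept = fit_line(months, sales)

# The output of regression is a number, not a class label
forecast_month_7 = slope * 7 + intercept
print(round(forecast_month_7, 2))
```

The point to notice is that the answer is a quantity ("how much?"), which is what separates regression from classification.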
Similarity matching
The question
Which customers/products are similar to a targeted one?
Use case
- Product recommendation: Ever seen Netflix’s recommendations for similar movies? How about major online stores showing a section of “You may also like this…” The underlying assumption is people who like one product will likely enjoy a similar offering.
- Targeted ads: Based on Web browsing activities and online purchasing history, online advertisers identify targeted users (who share similar profiles with existing customers) to show them specific advertisements.
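One common way to measure "similar" is cosine similarity between feature vectors. The sketch below is only an illustration under my own assumptions: the movie names, the three genre scores and the `most_similar` helper are all hypothetical, and real recommenders use far richer signals.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical genre scores per movie: [action, romance, comedy]
movies = {
    "Movie A": [0.9, 0.1, 0.2],
    "Movie B": [0.8, 0.2, 0.3],
    "Movie C": [0.1, 0.9, 0.4],
}


def most_similar(target, catalogue):
    """Return (name, score) of the item closest to the target."""
    target_vec = catalogue[target]
    candidates = [
        (name, cosine_similarity(target_vec, vec))
        for name, vec in catalogue.items()
        if name != target
    ]
    return max(candidates, key=lambda pair: pair[1])


best_name, best_score = most_similar("Movie A", movies)
print(best_name)
```

Notice that the task always starts from one target item and ranks everything else against it.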
Clustering
The question
At first glance, how does the entire population organise itself into different groups?
So what’s the difference between clustering and classification? Classification requires pre-defined classes whereas clustering doesn’t start with any existing grouping. If you are doing exploratory data analysis to understand similarity among things, then clustering will more likely be used. On the other hand, if you have a specific purpose in mind, which is to sort an individual into one of the pre-defined buckets, then classification is the way to go.
How about clustering versus similarity matching? Both tasks look at the similarity among things, but the purpose is different. Clustering looks for the different groups or segments that the data naturally organises itself into, whereas similarity matching identifies individuals similar to a target.
Use case
- Customer segmentation: How many different types of customers do I have?
- Employee training plan: Considering all current employees, how many groups of roles/pathways should we plan for in career development and professional training?
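To show what "no pre-defined classes" looks like in practice, here is a rough sketch of k-means on a single feature. The annual spend figures and the `kmeans_1d` helper are hypothetical, and I picked k-means only because it is easy to read, not because it is the right choice for every segmentation problem.

```python
def kmeans_1d(values, k=2, iterations=10):
    """Very small k-means for one-dimensional data (assumes k >= 2)."""
    # Deterministic start: spread initial centroids evenly between min and max
    lo, hi = min(values), max(values)
    centroids = [lo + i * (hi - lo) / (k - 1) for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid)
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical annual spend per customer
annual_spend = [120, 150, 130, 900, 950, 1000]

centroids, clusters = kmeans_1d(annual_spend, k=2)
print(clusters)
```

Nobody told the code what a "low spender" or "high spender" is. The two groups come out of the data, and naming them is up to us afterwards.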
Association rule discovery
The question
What items or events usually occur together?
Use case
- Market basket analysis: Supermarkets analyse past transactions to understand which products are usually purchased together. The goal is to improve their store layout, conduct in-store promotion activity for cross-selling or create an online product catalog.
- Bioinformatics – Protein Sequences: By observing the sequence of different amino acids present in a protein, researchers can better understand the composition of protein sequences to facilitate the synthesis of artificial proteins.
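For market basket analysis, two numbers usually come up: support (how often items appear together) and confidence (how often B shows up given A is in the basket). Below is a small sketch with made-up baskets; the `support` and `confidence` helpers are my own illustrative names, not from any library.

```python
from collections import Counter
from itertools import combinations

# Hypothetical supermarket transactions
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    # Sort so that each pair is always stored in the same order
    pair_counts.update(combinations(sorted(basket), 2))


def support(item_a, item_b):
    """Share of all baskets that contain both items."""
    pair = tuple(sorted((item_a, item_b)))
    return pair_counts[pair] / len(baskets)


def confidence(antecedent, consequent):
    """Share of baskets with the antecedent that also contain the consequent."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]


print(support("bread", "butter"))
print(confidence("butter", "bread"))
```

Confidence is not symmetric: in this toy data everyone who buys butter also buys bread, but not everyone who buys bread buys butter.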
Behaviour description
The question
What is the typical behaviour of this specific individual or group?
Use case
- Fraud detection: The bank usually holds a profile of your typical spending behaviour. When someone makes transactions on your credit card without you knowing, the bank can automatically cancel your credit card and inform you about the suspicious transaction for potential refunds.
- Cybersecurity alert: Recently received a ‘suspicious sign-in prevented’ email from Google? Based on your historical log-in behaviour, Google has noticed the unusual activity and flagged it for your attention.
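A very simplified way to picture a "profile of typical behaviour" is the mean and standard deviation of past transaction amounts, with anything too far from the mean flagged as unusual. This is only a sketch under my own assumptions (hypothetical amounts, a threshold of 3 standard deviations); I am not claiming this is how any bank or Google actually does it.

```python
import statistics

# Hypothetical past card transactions for one customer
history = [42.0, 38.5, 51.0, 45.0, 40.0, 47.5, 44.0, 39.0]

# The "profile": typical amount and typical spread
mean = statistics.mean(history)
stdev = statistics.stdev(history)


def is_unusual(amount, threshold=3.0):
    """Flag an amount more than `threshold` standard deviations from the mean."""
    return abs(amount - mean) / stdev > threshold


print(is_unusual(46.0))
print(is_unusual(900.0))
```

The first step (building the profile) is the behaviour description itself. The flagging is an application built on top of it.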
Link prediction
The question
Based on the existing relationships, what are the missing links that are likely to exist?
Use case
- Social media’s friend suggestion: The logic goes something like this. “Since you and Julie have 15 mutual friends on Facebook, maybe you would like to add Julie as your friend?”
- Criminal intelligence analysis: Given the relationships between known terrorists and their social network, police can identify possible missing links to new suspects to detect and prevent potential terror attacks.
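The mutual friends logic can be sketched as a "common neighbours" score: count how many connections two unconnected people share and suggest the highest counts first. The tiny network and the `suggest_friends` helper below are hypothetical, and real platforms surely combine many more signals.

```python
# Hypothetical friendship pairs (each friendship goes both ways)
edges = [
    ("you", "ann"), ("you", "bob"), ("you", "cat"),
    ("julie", "ann"), ("julie", "bob"),
    ("dan", "cat"),
]

# Build an adjacency map: person -> set of friends
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def suggest_friends(person):
    """Rank non-friends by the number of mutual friends (common neighbours)."""
    own_friends = graph.get(person, set())
    scores = {}
    for other, their_friends in graph.items():
        if other == person or other in own_friends:
            continue  # skip yourself and existing friends
        mutual = len(own_friends & their_friends)
        if mutual > 0:
            scores[other] = mutual
    # Highest mutual count first, names alphabetically as a tie-breaker
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


print(suggest_friends("you"))
```

Each suggestion is a link that does not exist yet but looks likely given the links that do.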
Data reduction
The question
From a huge set of data, what is the main gist or what are the key points?
Use case
- Sentiment analysis: By analysing all existing posts on social media, companies can determine the key topics related to the brands and the products.
Causal modelling
The question
What factors can actually influence the outcome?
Use case
- Product pricing: A company offers different service plans at different price points to determine the best price point for new service offerings. The key here is to determine how different price points affect the decision to subscribe to the service.
- Predictive maintenance: Does missing preventive maintenance for your car in the past 12 months lead to early breakdown of the car?
Wrapping Up: How do we frame our business questions into data mining tasks?
Here is a quick summary of how to do it.
