I recently completed an assignment for my Big Data Technologies unit, and one of the questions was about topic modelling. Being a newbie in the data science domain, I have to confess that my head was spinning as I went through multiple articles, trying to understand what exactly topic modelling is and what I was supposed to do with it. If you are in the same boat, or simply want to expand your knowledge, this blog post will smooth out some of those wrinkles on your forehead by explaining topic modelling in absolutely simple language.
The Business: Hoot Cookies
Once upon a time, an owl named Hoot lived in the dark green forest. He was a professional baker who made the best melt-in-your-mouth cookies. His business, Hoot Cookies, always had a queue snaking around the block.
Autumn was well worn and Christmas was coming. As Hoot spent night after night at the kitchen counter, cookies of every kind piled up on it. From chocolate chip cookies to gingersnaps, there was something for everyone. But Hoot noticed something strange. Over the past week, business had dropped by half and regular customers no longer visited the shop as often.
Hoot knew he had to do something. But he had no idea what might have gone wrong or why his customers had suddenly stopped showing up. After thinking carefully, Hoot decided that gathering feedback from his customers was the first thing to do before reacting to the situation. And so he sent an email to his customers.
Key Point 1: Instead of being obsessed with the business, focus on what your customers have to say.
The Problem: Time-consuming Text Analysis
After the first hour, 3 customers came back with their thoughts.
Hoot stayed up all night, eagerly reading the replies and taking notes. But as soon as he finished reviewing the first 3 replies, 10 more came in. Another hour, 15 more. And the next morning, 200 unread emails appeared in Hoot’s inbox. As the number of replies kept rising, Hoot’s head started spinning, and he soon realised it might take him until next summer to finish eyeballing all the replies and understanding what customers were saying about his business.
But… Hoot being Hoot, he didn’t give up that easily. Instead, he went to visit his old friend Brownie to ask for help with his problem.
Key Point 2: But customers’ feedback can be massive, unstructured, unorganised and time-consuming to understand.
The Solution: Topic Modelling
“I have an idea, but you need to give me a few hours to look at all the emails you got, please. I won’t be long, promise!” Brownie said.
Being a helpful bear, Brownie sat down in front of his computer and started typing as quickly as possible. As the moments passed, the piles of freshly baked cookies grew in Brownie’s kitchen while Hoot kept making more sweet treats for his friend.
Suddenly the typing stopped. Brownie turned around and said, “Come here bro. Let me show you what you want to know.”
As soon as Hoot looked at the screen, a light bulb switched on in his head.
“Ah ha, this is awesome. How did you do that, buddy?”
“I used a technique called topic modelling to sort out all the feedback. When we aren’t sure what we are looking for, topic modelling helps to condense long chunks of text into a handful of key words that capture the main ideas. In your case, from hundreds of emails, we can see that the majority of your customers are hoping for improvements in online delivery, pricing and special dietary requirements.”
Key Point 3: Topic modelling quickly makes customers’ feedback more concise, leaving more time to understand the main ideas and decide what to do next for the business.
Where do we go from here?
This short blog post only offers a simple, easy-to-understand explanation of topic modelling, so we have barely scratched the surface of what it is capable of when mining text. For more technical details about topic modelling, don’t hesitate to check out this awesome post on ML+. This post would also be incomplete without mentioning Latent Dirichlet Allocation (LDA), which is arguably the most common technique for topic modelling.
To kickstart the journey with topic modelling, one can never go wrong with some reading up on Gensim, MALLET and pyLDAvis, 3 popular tools for the task. Gensim and pyLDAvis are Python libraries, while MALLET is a Java-based toolkit that is often used alongside them. I personally had lots of fun using LDA to perform topic modelling on a Twitter dataset as part of my Python assignment.
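If you are curious what LDA roughly does under the hood, here is a minimal sketch in plain Python. It is only a toy: the emails are invented for illustration (they are not Hoot's real inbox), the stopword list is hand-picked, and the hyperparameters `alpha` and `beta` are arbitrary assumptions. It uses a simple collapsed Gibbs sampler rather than the Gensim or MALLET implementations, so treat it as a learning aid, not as how those libraries work internally.

```python
import random
from collections import Counter

# Hypothetical feedback emails, invented purely for illustration.
EMAILS = [
    "delivery was late and the online order arrived cold",
    "online delivery took too long please improve shipping",
    "the price went up and cookies are too expensive now",
    "too expensive lower the price or offer a discount",
    "please offer gluten free and vegan cookies",
    "my kids need nut free and gluten free options",
]
# A tiny hand-picked stopword list (an assumption, not a standard one).
STOPWORDS = {"the", "and", "was", "are", "too", "or", "a", "an",
             "my", "please", "to", "up", "now"}


def tokenize(text):
    """Lowercase, split on whitespace and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def fit_lda(docs, num_topics=3, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy LDA fitted with collapsed Gibbs sampling."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # words in doc d assigned to topic k
    topic_word = [Counter() for _ in range(num_topics)]  # times word w is assigned to topic k
    topic_total = [0] * num_topics                       # total words assigned to topic k
    assignments = []

    # Start by giving every word a random topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    # Repeatedly re-sample each word's topic given all the other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word, doc_topic


docs = [tokenize(email) for email in EMAILS]
topic_word, doc_topic = fit_lda(docs)
for k, counts in enumerate(topic_word):
    top_words = [w for w, c in counts.most_common(4) if c > 0]
    print(f"Topic {k}: {', '.join(top_words)}")
```

Each printed line lists the words most strongly tied to one topic. With such a tiny dataset the groupings can be noisy and will change with the seed, so it is up to the reader (or Brownie) to look at the words and decide what each topic is about. For real work, a library such as Gensim is a far better choice.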
When baking cookies, from raw ingredients comes something of beauty. When performing topic modelling, from textual chaos comes order and understanding. Hope you have enjoyed reading this. Cheers everybody!