Computer vision is one of the most extraordinary gifts to come out of the artificial intelligence world. With computer vision, many companies have attempted to see the world through computers’ eyes and made great strides in solving complex business problems such as identifying product defects in real time, verifying customers’ identities or automating the insurance claims process. Overlooking such real-life applications of computer vision could mean missed opportunities to unlock growth, productivity and cost savings for businesses. So what is computer vision and how can it help?
- What is computer vision?
- How is computer vision applied in today’s world?
- How do we implement computer vision with machine learning?
What is computer vision?
A brief journey in time
Long before computers were invented, scientists tried to find ways to understand how our eyes and brain work together to recognise and react to what we see. Believe it or not, much of what we know today about visual perception can be traced back to neurophysiological research conducted on cats since the 1950s by David Hubel and Torsten Wiesel.
During the 1960s, artificial intelligence became an academic discipline. It was during this period that computer vision was first introduced as an MIT summer project, which was regarded as a stepping stone to creating a computer that can perform human cognitive functions such as seeing, learning, reasoning and solving problems. Although the summer project didn’t succeed, it marked the official birth of computer vision as a scientific field which seeks to enable computers to automatically see, identify and understand the visual world in the same way that human vision does.
Yes, that was the goal, but back then, our technology simply wasn’t ready. Luckily for us, we didn’t have to wait too long. During the 2000s, 4 important factors converged to make a whole new paradigm for computer vision a reality.
Today, we’ve come a very long way as computer vision is one of the hottest areas of artificial intelligence and machine learning with a wide range of business applications and tremendous potential. But before diving into real-life use cases, let’s attempt to define computer vision and understand what problems it can solve. So here we go, everyone.
Human vision versus Computer vision
“Computer vision is the automatic analysis of images and videos by computers in order to gain some understanding of the world.” (From A Practical Introduction to Computer Vision with OpenCV by Kenneth Dawson-Howe)
As you can see from this picture, what we try to achieve is to determine whether the dog in the image is our four-legged buddy. In the human vision system, we see with our eyes, then let our brain understand the image and recognise whether he is our dog through a very complex reasoning process. Similarly, a computer vision system aims to mimic the same process of understanding the image, matching it with known features of our dog and recognising whether he is our dog or not. So in a nutshell, human vision and computer vision are simply two different means to an end, which is to interpret visual information.
Understanding what problem is being solved
But why do we need another means if our human eyes and brain already have such powerful capabilities?
It’s because computers are very good at performing a single task extremely fast without getting distracted like a human. Over the past decade, convolutional neural networks have demonstrated object recognition accuracy comparable to or better than that of humans. For example, in 2015, the PReLU-Net deep network became the first computer model to surpass human accuracy on the ImageNet 2012 dataset.
Impressive! But what problem does it solve for businesses anyway? Well, let me explain.
- Computer vision can help to automate small, repetitive visual tasks related to analysing and interpreting images or videos to save costs and free up our time for more strategic activities.
- Computer vision can help to conduct more consistent and accurate visual assessments, thus enabling data-based decisions within seconds for better consumer experience, security or quality improvements.
How is computer vision applied in today’s world?
The advent of computer vision during the last decade brought about great changes across various industries. Here are 7 common tasks related to computer vision that we might have even come across in our daily lives.
- Object detection: Locate specific objects in an image or video
- Image classification: Categorise and label an image based on specific rules
- Image segmentation: Divide a single image into various segments to separately process relevant segments
- Feature matching: Match corresponding features from two similar images or videos
- Edge detection: Find the boundaries of objects within images by identifying points in an image at which the image brightness changes sharply
- Pattern detection: Automatically recognise patterns and regularities in images or videos
- Facial recognition: Identify or verify the identity of a person using their facial features
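For curious readers, here is a minimal sketch of the idea behind edge detection from the list above: finding the points where brightness changes sharply. It uses a made-up 3x6 grayscale image and a hypothetical helper called `find_brightness_jumps`; the threshold of 50 is an arbitrary assumption. Real applications typically rely on dedicated libraries and more robust methods, so please treat this as an illustration only.

```python
# Toy grayscale image (made up for illustration): each number is a pixel
# brightness from 0 (black) to 255 (white). The left half is dark and the
# right half is bright, so there is one boundary running down the middle.
IMAGE = [
    [10, 12, 11, 200, 205, 203],
    [9, 11, 13, 198, 202, 201],
    [12, 10, 12, 201, 204, 199],
]


def find_brightness_jumps(image, threshold=50):
    """Return (row, column) positions where brightness changes sharply
    between a pixel and its right-hand neighbour.

    The threshold is an assumed value, not a standard one.
    """
    jumps = []
    for r, row in enumerate(image):
        for c in range(len(row) - 1):
            if abs(row[c + 1] - row[c]) > threshold:
                jumps.append((r, c))
    return jumps


edges = find_brightness_jumps(IMAGE)
print(edges)
```

Each reported position marks the last dark pixel before the bright region begins, which together trace the boundary between the two halves of the image.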
It’s helpful to note that when the problem is simple, certain use cases of computer vision might only aim to fulfil one of these tasks. However, in the real world, things can get considerably more complicated, with a myriad of challenges and environmental variations. For example, a robotic harvester might need to employ object detection to locate exactly where each strawberry is among its stems and leaves, then use image classification to sort each strawberry into ripe and unripe categories to make sure it only picks the ripe ones. Therefore, complex business applications have to be able to execute various computer vision tasks at the same time to interpret images or videos. In the picture below, let’s take a look at some fairly complex business applications that are happening right here, right now!
How do we implement computer vision with machine learning?
We have talked a lot about what is possible with computer vision. It’s also no secret that machine learning (in the form of neural networks) can create computer vision applications with amazing accuracy. But here is the million-dollar question to answer: How can we achieve these amazing computer vision capabilities with machine learning?
Well, admittedly, it would probably take an entire book to cover this topic. But staying true to the spirit of giving all non-technical readers a comprehensive overview, here are 2 main approaches to build a computer vision model with machine learning in layman’s terms to get you started.
1st Approach: Traditional Machine Learning (ML)
Long before deep learning was even a thing in the world of machine learning, many computer vision models were built entirely on traditional ML algorithms such as decision trees, support vector machines or logistic regression.
To keep it simple, these ML algorithms are just modifiable math functions. Based on known pairs of inputs and outputs, computers learn to tweak and tailor the math functions to better associate certain inputs with certain outputs. Over time, if the inputs and outputs cover sufficient real-life complexities with all sorts of exceptions and unusual circumstances, the math function will be fine-tuned to represent reality as closely as possible, thus enabling more accurate detection or classification of objects, plants, animals or people. How cool is that?
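To make the idea of a “modifiable math function” a little more concrete, here is a minimal sketch using logistic regression, one of the algorithms mentioned above. Everything in it is an assumption made for illustration: the training pairs are invented, the single input (the share of red pixels in a strawberry photo) is hypothetical, and the number of steps and the learning rate are arbitrary choices.

```python
import math

# Hypothetical training pairs: (share of red pixels in a strawberry photo, label).
# Label 1 means "ripe", 0 means "unripe". The numbers are made up for illustration.
EXAMPLES = [(0.10, 0), (0.20, 0), (0.30, 0), (0.35, 0),
            (0.65, 1), (0.70, 1), (0.80, 1), (0.90, 1)]


def predict(weight, bias, x):
    """The modifiable math function: a value between 0 and 1 that can be
    read as the estimated probability that x belongs to a ripe berry."""
    return 1.0 / (1.0 + math.exp(-(weight * x + bias)))


def train(examples, steps=5000, learning_rate=0.5):
    """Tweak weight and bias a little on every pass so that the predictions
    move closer to the known outputs (plain gradient descent)."""
    weight, bias = 0.0, 0.0
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in examples:
            error = predict(weight, bias, x) - y
            grad_w += error * x
            grad_b += error
        weight -= learning_rate * grad_w / len(examples)
        bias -= learning_rate * grad_b / len(examples)
    return weight, bias


weight, bias = train(EXAMPLES)
print("mostly green berry judged ripe?", predict(weight, bias, 0.15) > 0.5)
print("mostly red berry judged ripe?", predict(weight, bias, 0.85) > 0.5)
```

The function itself never changes shape; only the two numbers `weight` and `bias` are adjusted until the known inputs line up with the known outputs.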
The accompanying picture illustrates a simplified process of traditional machine learning. As we look at this process, it’s crucial to keep in mind the following 2 points.
Firstly, building a machine learning model for a business application is an iterative process even after deployment. The golden model should be constantly monitored, updated or even re-created from scratch to stay relevant to business changes.
Secondly, traditional machine learning requires some serious human intervention to succeed. For example, since we can’t directly apply traditional ML algorithms to our raw data (e.g. images or videos) to perform computer vision tasks, data scientists have to perform an additional data preprocessing step called Feature Extraction, which translates raw data into structured, relevant features that serve as the essential inputs for machine learning algorithms. Don’t underestimate what it takes to properly extract features from raw data because this time-consuming task usually takes multiple iterations and demands proper domain knowledge.
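As a rough illustration of what Feature Extraction means in practice, the sketch below turns a tiny, made-up image into two hand-picked numbers. The image, the helper name `extract_features` and the two features themselves (share of mostly-red pixels and average brightness) are all assumptions chosen for simplicity; real projects usually need far richer features, which is exactly where the domain knowledge comes in.

```python
# A raw image is just a grid of pixel values. A traditional ML algorithm
# needs a short, fixed list of meaningful numbers (features) instead.
# Hypothetical 4x4 image of (red, green, blue) pixels with values 0-255.
RED = (220, 30, 40)
GREEN = (40, 180, 50)
IMAGE = [
    [GREEN, RED, RED, GREEN],
    [RED, RED, RED, RED],
    [RED, RED, RED, RED],
    [GREEN, RED, RED, GREEN],
]


def extract_features(image):
    """Turn raw pixels into two hand-crafted features:
    the share of mostly-red pixels and the average brightness."""
    pixels = [pixel for row in image for pixel in row]
    red_count = sum(1 for r, g, b in pixels if r > g and r > b)
    brightness = sum((r + g + b) / 3 for r, g, b in pixels) / len(pixels)
    return {
        "red_share": red_count / len(pixels),
        "avg_brightness": brightness,
    }


features = extract_features(IMAGE)
print(features)
```

Sixteen pixels with three colour values each have been reduced to two numbers that a traditional algorithm can work with. Deciding which numbers are worth computing is the part that takes human judgement and several iterations.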
2nd Approach: Deep Learning
When thinking about deep learning, many of us think of some deep dark mystery. But it isn’t a mystery at all. First and foremost, it’s helpful to understand that deep learning is a subset of machine learning. At its core, the above-mentioned traditional machine learning and deep learning for computer vision share the same goal: trying to find a math function that expresses reality (together with all of its complexities and exceptions) as closely as possible by examining a huge number of examples (a.k.a. huge training datasets).
So why might we want to consider deep learning over traditional machine learning? Arguably the biggest advantage here is that deep learning does not require humans to perform any feature extraction tasks (Remember how we said above that it’s time-consuming and requires very specific domain knowledge?).
But hang on a sec! Before we rush to the conclusion that deep learning is the silver bullet for all problems, here are the 2 most crucial aspects to note.
- You can’t use deep learning if you don’t have sufficient data to train it. And by sufficient, I mean not only in terms of quantity but also quality (i.e. relevant, complete and free from bias).
- Deep learning won’t work if you don’t already possess or aren’t willing to pay for substantial computing power to process a large amount of data and perform complex mathematical calculations.
As compared to traditional machine learning, deep learning requires much more training data and computing power to make it work. It’s not the only way and not always the best way. The technology itself is powerful yet not as mature as other traditional methods. Therefore, the decision to proceed with deep learning for computer vision should never be taken lightly.
Where do we go from here?
Like so many other things in life, we take our human vision for granted until we attempt to mimic it with computers. As of today, we’re nowhere near understanding, let alone being able to simulate, the way our eyes and brain work together to understand the beautiful world surrounding us. But that doesn’t mean computer vision remains a novel idea with zero relevance for businesses. We are already seeing it on our phones, on our streets, in our offices and even more so in those factories that manufacture different products that we buy every day.
The big names associated with the above-mentioned use cases also don’t mean you have to be Tesla or Walmart to learn how computer vision could help you work more efficiently. With various cloud-based pre-trained machine learning models such as Google’s Cloud Vision, Amazon Rekognition, Azure Computer Vision and other solutions in the market, there are plenty of options to explore and experiment with computer vision. Therefore, if your team is regularly dealing with a large number of images or videos, it’s never too late to reimagine how computer vision can help to get the job done with far less effort. The discovery journey has a cost, but so does refusing to experiment, fail, learn and grow.