What Is Computer Vision? Explanation, Types + Examples

From self-driving cars and medical tests to surveillance cameras, computer vision is used everywhere we look.

But what is computer vision exactly?

In this post, we will delve deeper into this field of artificial intelligence and explain what it is, how it works, and which types of computer vision techniques exist. Plus, we’ll show you real-world examples of computer vision applications.

What Is Computer Vision?

Computer vision is the field of artificial intelligence focused on enabling computers and systems to process and derive information from visual data, such as images and videos, much as a human would.

Computer vision enables computers to analyze visual input at a pixel level and understand it.

To better grasp the concept, let’s compare computer vision to human vision. Both are based on a similar principle. However, when we humans observe things, we already have context that helps us identify objects.

In computer vision, machines need data and algorithms in order to learn how to perform these functions. The difference is that computers can analyze vast amounts of visual data, inspecting countless images or products every minute, which humans, naturally, can’t.

Computer vision is used, for example, to classify objects in a photo or video.

How Computer Vision Works

Computer vision algorithms are based on pattern recognition. Pattern recognition, also sometimes referred to as pattern detection, is the ability of machines to identify patterns in data and use them to make predictions. 

How does that work?

We feed the computer massive amounts of visual data. The computer analyzes the data, labels the objects in it, and finds patterns.

For example, if we provide a million images of cats, the computer will analyze the images, identify the patterns common to all cats, and finally build a model of what a cat looks like. The next time we provide an image, the computer will be able to tell whether it shows a cat.
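One simple way to picture a "model" of a cat is as an average of the training examples. The sketch below is a toy nearest-prototype classifier, not how production vision systems are built. The four-value "images", the labels, and the function names `build_prototype`, `distance`, and `classify` are all made up for illustration.

```python
# Toy nearest-prototype classifier. Each "image" is a flat list of
# brightness values; real images have far more pixels.

def build_prototype(images):
    """Average the images pixel by pixel to get one prototype per class."""
    count = len(images)
    return [sum(pixels) / count for pixels in zip(*images)]

def distance(a, b):
    """Squared Euclidean distance between two equally sized images."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(image, prototypes):
    """Return the label whose prototype is closest to the image."""
    return min(prototypes, key=lambda label: distance(image, prototypes[label]))

# Hypothetical training data, invented for this example.
cats = [[200, 210, 40, 50], [220, 190, 60, 30]]
others = [[30, 40, 210, 200], [50, 20, 190, 220]]

prototypes = {
    "cat": build_prototype(cats),
    "not cat": build_prototype(others),
}

# A new image the classifier has not seen before.
prediction = classify([205, 195, 55, 45], prototypes)
print(prediction)
```

Real systems learn far richer patterns than a pixel-wise average, but the idea is the same: summarize many examples, then compare new input against that summary.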

Computer vision algorithms enable machines to learn independently without being programmed to recognize a specific image. 

Computers interpret an image as a grid of pixels, each with its own set of color values. In a typical 8-bit grayscale image, a pixel’s brightness is represented by a number that ranges from 0 (black) to 255 (white). The software does not see colors; it sees the numbers and knows which color each one represents.
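To make the "image as numbers" idea concrete, here is a minimal sketch in plain Python. The 3x3 grid and the helper names `to_gray` and `mean_brightness` are invented for illustration, and the plain-average grayscale conversion is a simplification (image libraries usually weight the color channels differently).

```python
# A 3x3 grayscale "image": each number is one pixel's brightness,
# from 0 (black) to 255 (white).
image = [
    [0, 128, 255],
    [64, 128, 192],
    [255, 128, 0],
]

# A color pixel is commonly stored as three such numbers (red, green, blue).
orange_pixel = (255, 165, 0)

def to_gray(rgb):
    """Convert an RGB pixel to one brightness value using a plain average."""
    return sum(rgb) // 3

def mean_brightness(img):
    """Average brightness over every pixel in the image."""
    pixels = [value for row in img for value in row]
    return sum(pixels) / len(pixels)

print(to_gray(orange_pixel))
print(mean_brightness(image))
```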

Basically, the process of computer vision consists of three steps:

  • Acquiring an image
  • Processing the image
  • Understanding the image
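The three steps above can be sketched as three small functions chained together. This is only an illustration under simplifying assumptions: the hard-coded pixel grid stands in for a camera or file, and the function names, the threshold of 128, and the "left or right" conclusion are hypothetical choices, not part of any real library.

```python
def acquire_image():
    """Step 1: get pixel data. A real system would read a camera or file;
    here a hard-coded 2x4 grayscale grid stands in for it."""
    return [
        [10, 20, 240, 250],
        [15, 25, 245, 235],
    ]

def process_image(image, threshold=128):
    """Step 2: simplify the raw pixels. Each pixel becomes 1 (bright)
    or 0 (dark) depending on the threshold."""
    return [[1 if value >= threshold else 0 for value in row] for row in image]

def understand_image(binary):
    """Step 3: draw a conclusion. Report which half of the image
    holds more bright pixels."""
    half = len(binary[0]) // 2
    left = sum(sum(row[:half]) for row in binary)
    right = sum(sum(row[half:]) for row in binary)
    if left == right:
        return "evenly lit"
    if left > right:
        return "bright object on the left"
    return "bright object on the right"

result = understand_image(process_image(acquire_image()))
print(result)
```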

The technologies used to train computers to recognize images are deep learning (a subset of machine learning) and convolutional neural networks (CNNs).

A CNN helps break an image down into pixels that are given labels, and it uses these labels to make predictions about what the system is “seeing”. It then runs a series of iterations to check the accuracy of its predictions and improve them.
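The core operation that gives a CNN its name is the convolution: a small grid of weights (a kernel) slides over the image and produces a weighted sum at each position. The sketch below shows that single operation in plain Python. It is not a full CNN: the function name `convolve2d` is hypothetical, and the kernel values are hand-written here, whereas a real network would learn them during training.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and return
    the grid of weighted sums. Like most deep learning libraries, this
    does not flip the kernel (strictly speaking, it is cross-correlation)."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

# 4x4 image: dark on the left, bright on the right.
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]

# Hand-written filter that responds to dark-to-bright vertical edges.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)
print(feature_map)
```

The output is large only where the image changes from dark to bright, which is how a filter "detects" a feature such as an edge.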

Types of Computer Vision Techniques

There are different computer vision techniques depending on the specific tasks they are performing and their applications. 

The main computer vision techniques are:

1. Image Classification

One of the essential types of computer vision is image classification.

Image classification, also called image recognition, is the ability of a machine to see an image and classify it, that is, to predict accurately which class the image belongs to.

Basically, given a set of labeled images for each category, the algorithm teaches itself to recognize similar images the next time it receives them.

2. Object Recognition

Object recognition or object detection uses image classification to detect objects or classes within images. 

Typically, the algorithm finds a class within an image and localizes each object using a bounding box. A bounding box is an imaginary rectangle that marks an object in an image, and it is used as a point of reference for object detection.
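A bounding box is usually stored as four numbers. One widely used way to judge how well a predicted box matches the true one is intersection over union (IoU), which the article does not mention by name but which follows directly from the rectangle definition. The sketch below assumes boxes in the form (x_min, y_min, x_max, y_max); the function name and the sample coordinates are illustrative.

```python
def intersection_over_union(box_a, box_b):
    """Overlap between two boxes given as (x_min, y_min, x_max, y_max).
    Returns a value from 0.0 (no overlap) to 1.0 (identical boxes)."""
    # Corners of the overlapping rectangle, if there is one.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height are clamped at 0 when the boxes do not overlap.
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

predicted = (0, 0, 10, 10)
ground_truth = (5, 5, 15, 15)
iou = intersection_over_union(predicted, ground_truth)
print(round(iou, 3))
```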

3. Object Tracking

As the name suggests, object tracking tracks an object within an image or a video once it’s detected. 

It refers to the ability of a computer to predict the position of a target object in each consecutive frame of a video once the initial position has been determined. A computer can track objects within a video offline or in real-time.

Object tracking can be:

  • Single object tracking: Tracking one specific object in a video.
  • Multiple object tracking: Detecting and tracking more than one object in a video while plotting the unique trajectory of each target object.
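Multiple object tracking can be illustrated with a very simple strategy: in each new frame, link every detection to the nearest object seen so far. The sketch below assumes an object detector has already produced the center point of each object per frame; the function name `track`, the `max_distance` cutoff, and the sample coordinates are hypothetical. Real trackers also handle occlusion, objects leaving the scene, and motion prediction, which this sketch ignores.

```python
import math

def track(frames, max_distance=50.0):
    """Greedy nearest-centroid tracking.

    frames: list of frames, each a list of (x, y) detection centers.
    Returns {track_id: [(frame_index, (x, y)), ...]}.
    """
    trajectories = {}
    last_position = {}
    next_id = 0
    for frame_index, detections in enumerate(frames):
        # Tracks that existed before this frame and are still unclaimed.
        unmatched = set(last_position)
        for point in detections:
            best_id, best_dist = None, max_distance
            for track_id in unmatched:
                dist = math.dist(point, last_position[track_id])
                if dist <= best_dist:
                    best_id, best_dist = track_id, dist
            if best_id is None:
                # Nothing close enough: start a new track.
                best_id = next_id
                next_id += 1
                trajectories[best_id] = []
            else:
                unmatched.discard(best_id)
            last_position[best_id] = point
            trajectories[best_id].append((frame_index, point))
    return trajectories

# Two objects moving slowly; the detection order changes in the last frame.
frames = [
    [(10, 10), (100, 100)],
    [(12, 11), (98, 103)],
    [(96, 105), (15, 13)],
]

trajectories = track(frames)
for track_id, path in trajectories.items():
    print(track_id, path)
```

Each track keeps its own ID and trajectory even though the detections arrive in a different order in the last frame.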

Computer Vision Applications

This technology is constantly evolving, and new research is being done as we speak. However, some real-world computer vision applications have been here for a while. Computer vision is used in many areas to improve business endeavors, transportation, healthcare, and more.

Here are a few examples of computer vision applications:

Google Translate

Google Translate is a great example of a computer vision application. By pointing a phone camera at a sign in another language, users can get an almost instant translation into their language.

The app’s camera feature automatically detects the language, even if we don’t know which language we’re looking at. Google Translate can translate 133 languages as of 2022.

Self-Driving Cars

Computer vision enables self-driving vehicles to make sense of visual input coming from the car’s cameras that capture video from different angles. 

The system processes the visual input it receives from the cameras and identifies cars, traffic signs, pedestrians, lights, and other objects in its surroundings. An example of this is Tesla’s Autopilot, which is now standard on every new Tesla car.

Augmented Reality

Augmented reality (AR) apps are supported by computer vision technology. Augmented reality is an interactive experience that combines the real world with computer-generated information to enhance one’s experience.

The computer-generated inputs include Global Positioning System (GPS) data, video, and sound. Computer vision helps detect objects in GPS-based settings and process and comprehend images and videos.

Thanks to computer vision, AR apps can detect physical objects in real-time and place virtual objects into their environment. 

Facial Recognition

Computer vision plays an important role in face recognition apps. Facial recognition technology enables computers to match images of people and their identities. 

These apps use computer vision algorithms to detect facial features within images. Neural networks are trained to detect human faces and distinguish the faces from other objects in an image. 

Facial recognition technology is now built into many apps we use daily. Facebook used computer vision to automatically identify people in photos, but the company shut the feature down in November 2021.

Surveillance

Artificial intelligence and computer vision have a number of applications in video surveillance and security. These applications include:

  • Human detection
  • Person recognition
  • Vehicle surveillance
  • Weapon detection
  • Traffic incident detection

Computer vision-enabled surveillance systems help increase accuracy and overcome some of the biggest challenges such as identifying faces or detecting a dangerous situation. For example, through high-definition surveillance cameras, surveillance systems can detect violent behavior and alert the authorities promptly. 

Healthcare

Computer vision is widely applied in healthcare for diagnosis. Magnetic resonance imaging (MRI), X-rays, and many other medical tests are based on image processing. Computer vision algorithms help process these images and detect diseases. 

Deep learning and computer vision are also used in cancer detection, where they are believed to achieve much higher accuracy than human doctors.

What is Computer Vision: Key Takeaways

Computer vision is a field of AI that enables computers to process and understand visual data such as images and videos. 

The computer vision process consists of three steps: acquiring an image, processing the image, and understanding the image.

Types of computer vision techniques include: 

  • Image classification
  • Object detection
  • Object tracking

Computer vision applications are already present in our everyday lives, from the Google Translate app and self-driving cars to augmented reality, facial recognition, surveillance, and healthcare.

As computer technology evolves, we are going to see more and more of these applications used to improve business processes, increase accuracy, and bring an array of other benefits. 

Neil Sahota
Neil Sahota (萨冠军) is an IBM Master Inventor, United Nations (UN) Artificial Intelligence (AI) Advisor, author of the best-seller Own the AI Revolution and sought-after speaker. With 20+ years of business experience, Neil works to inspire clients and business partners to foster innovation and develop next generation products/solutions powered by AI.