Your First Step Into The Field Of Computer Vision

Knowing various Computer Vision tasks and their formats

Image Source: Wikimedia (Creative Commons)

Introduction:

Computer Vision (CV) is the field concerned with making computers extract information from images, much as the human visual system does.

There are various Computer Vision tasks, all of which typically take an image of dimension W x H x 3 as input and produce an output that depends on the task at hand. W and H are the image’s resolution (width and height), and 3 is the number of channels (R, G, and B).

“The first step to solving a problem is to define it well.”

This article covers various Computer Vision tasks and their formats, defined mathematically, highlighting the inputs and outputs of each task.

Various Computer Vision tasks:

1. Image Classification
   - Binary Classification
   - Multi-class Classification
   - Multi-label Classification
2. Classification with Localization
3. Object Detection
4. Image Segmentation
   - Semantic Segmentation
   - Instance Segmentation
   - Panoptic Segmentation

1. Image Classification:

Image Classification is the task of assigning a given image to one or more pre-defined classes.

Binary Classification:

Binary classification assigns a given image to one of two pre-defined classes.

Input: X, of dimension W x H x 3
Output: Y, of dimension 1

Here, the ground truth value of the output Y is either 1 or 0 (for positive and negative examples, respectively).

However, a binary classification model’s output is typically a number between 0 and 1, which is treated as the probability that the input image belongs to the positive class.

Figure: Binary Classification Model (Image by Author)

The example below classifies an input image as either ‘Cat’ or ‘Not a Cat.’

Figure: Binary Classification Example (Image by Author)

Multi-class Classification:

Multi-class classification assigns a given image to exactly one of n pre-defined classes.

Input: X, of dimension W x H x 3
Output: Y, of dimension n x 1

Here, the ground truth value of the output Y is a one-hot encoded vector (the position of the 1 indicates the class assigned to the input image).

However, a multi-class classification model’s output is typically a vector of numbers between 0 and 1 that add up to 1. These are treated as the probabilities that the input image belongs to the respective classes.

Figure: Multi-class Classification Model (Image by Author)

The example below classifies an input image into one of the n pre-defined classes.

Figure: Multi-class Classification Example (Image by Author)

Multi-label Classification:

Multi-label classification is similar to multi-class classification, except that the input image can be assigned more than one of the n pre-defined classes.

Input: X, of dimension W x H x 3
Output: Y, of dimension n x 1

Here, the ground truth value of the output Y can have a 1 in more than one position, so it is a binary vector rather than a one-hot vector.

However, a multi-label classification model’s output is typically a vector of numbers between 0 and 1. Note that these numbers need not add up to 1, since an input image can belong to more than one pre-defined class at a time. Each number is treated as the probability that the input image belongs to the corresponding class.

Figure: Multi-label Classification Model (Image by Author)

The example below assigns every applicable class to a given input image.

Figure: Multi-label Classification Example (Image by Author)

2. Classification With Localization:

Classification with localization typically means multi-class classification combined with localizing the portion of the input image that determined the classification output.

Input: X, of dimension W x H x 3
Output: Y, of dimension (1+4+n) x 1

In the ground truth value of Y = (Pc, bx, by, bw, bh, C1, ..., Cn), Pc is 1 if any one of the n classes is present in the input image and 0 otherwise. bx, by, bw, and bh represent the bounding box around the portion that determined the classification output. C1 to Cn form the one-hot vector representing the classification.

Note that when none of the n classes is present in the input image, Pc is 0, and the rest of the values in the output are immaterial.

Figure: Classification with Localization Model (Image by Author)

The example below classifies the input image as a Horse and also outputs the localization information.

Figure: Classification with Localization Example (Image by Author)

While x, y, w, and h are absolute values in the image, bx and bw are expressed as fractions of W, and by and bh as fractions of H.

Figure: Input Image (left) and Localization information superimposed on Input Image (right). (Image by Author)

Note that the origin (0, 0) is always taken to be the top left corner of the image. x runs along the width of the image (from left to right) and y along the height (from top to bottom).

3. Object Detection:

Object detection is typically localization applied to multi-label classification.

YOLO is one example of an object detection model. It divides the input image into a g x g grid of cells and produces one classification-localization vector as output fo
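
To make the classification-with-localization format above concrete, here is a minimal Python sketch that builds a ground truth vector Y of dimension (1+4+n). The function names, the example image size, and the example box are hypothetical and chosen only for illustration. The sketch also assumes that (x, y) is whatever reference point the dataset uses for the box, since the article does not say whether it is the box centre or a corner, and it fills the "immaterial" entries with zeros when no class is present.

```python
from typing import List, Optional, Tuple


def one_hot(class_index: int, n: int) -> List[float]:
    """Multi-class ground truth: 1 at the assigned class, 0 elsewhere."""
    return [1.0 if i == class_index else 0.0 for i in range(n)]


def localization_target(
    image_size: Tuple[int, int],
    n: int,
    class_index: Optional[int] = None,
    box: Optional[Tuple[float, float, float, float]] = None,
) -> List[float]:
    """Build Y of length 1 + 4 + n: [Pc, bx, by, bw, bh, C1, ..., Cn].

    image_size is (W, H); box is (x, y, w, h) in absolute image units.
    """
    W, H = image_size
    if class_index is None or box is None:
        # No class present: Pc = 0 and the remaining values are immaterial.
        # Zeros are used here only as a placeholder (an assumption).
        return [0.0] * (1 + 4 + n)
    x, y, w, h = box
    bx, bw = x / W, w / W  # fractions of the image width W
    by, bh = y / H, h / H  # fractions of the image height H
    return [1.0, bx, by, bw, bh] + one_hot(class_index, n)


# Hypothetical example: a 640 x 480 image, n = 3 classes, class index 1,
# box reference point at (320, 240), box width 160 and height 120.
y_true = localization_target((640, 480), n=3, class_index=1, box=(320, 240, 160, 120))
print(y_true)  # [1.0, 0.5, 0.5, 0.25, 0.25, 0.0, 1.0, 0.0]
```

Dividing by W and H keeps the box values between 0 and 1 regardless of the image resolution, which matches the fractional definition of bx, by, bw, and bh given above.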
