CNNs

Image Classification Using Convolutional Neural Networks

1. What are Convolutional Neural Networks (CNNs)?

Convolutional Neural Networks (CNNs) are a class of deep learning models commonly used for image-related tasks such as gesture recognition. Through their convolutional layers, CNNs automatically learn spatial hierarchies of features from raw image data: early layers respond to simple patterns such as edges, and deeper layers combine them into larger shapes. In our application, the CNN learns to classify hand gestures from the processed input produced by Mediapipe.

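The fundamental operation in a CNN is the convolution. A small kernel (also known as a filter) slides across the image, and at each position the overlapping values are multiplied element-wise and summed. The result indicates how strongly a feature such as an edge, texture, or pattern is present at that position.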

Given an input image 𝐼 and a kernel 𝐾, the output of a convolution 𝑂 at a given position (𝑖,𝑗) is computed as:

$$O(i, j) = \sum_{m} \sum_{n} I(i + m, j + n) \, K(m, n)$$

Where:

𝐼 is the input image.
𝐾 is the convolutional filter (or kernel).
(𝑖,𝑗) are the coordinates of the output feature map.
𝑚 and 𝑛 are the row and column coordinates within the filter.

This process highlights certain features of the image, such as edges, which are crucial for recognizing the structure of hand gestures.
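
The sliding-window sum described above can be sketched in plain Python. This is a minimal illustration, not the application's implementation: the function name `conv2d_valid`, the 4x4 toy image, and the hand-written vertical-edge kernel are all assumptions made for the example. It uses no padding, a stride of 1, and does not flip the kernel, which is how most deep learning libraries implement the operation.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # O(i, j) = sum over m, n of I(i + m, j + n) * K(m, n)
            total = sum(
                image[i + m][j + n] * kernel[m][n]
                for m in range(kh)
                for n in range(kw)
            )
            row.append(total)
        output.append(row)
    return output


# Toy 4x4 "image" (hypothetical data): dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-written vertical-edge kernel (illustrative only; a CNN learns its kernels).
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map is non-zero only in the middle column, where the window straddles the dark-to-bright boundary. That is the sense in which a kernel "detects" an edge.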

2. CNN Structure

The typical CNN architecture used in this application includes the following layers:

Convolutional Layer: Applies multiple filters to the input image to detect features. Each filter learns different characteristics of the hand gestures.
Activation Function (ReLU): After each convolution, an activation function such as ReLU (Rectified Linear Unit) is applied to introduce non-linearity, which lets the network learn complex patterns. The ReLU function is defined as:

$$f(x) = \max(0, x)$$

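Pooling Layer: Reduces the spatial dimensions of the feature maps while retaining the most important information. A common pooling operation is Max Pooling, defined as:

$$P(i, j) = \max_{0 \le m,\, n < k} X(i \cdot s + m, \; j \cdot s + n)$$

Here 𝑋 denotes the input feature map, 𝑘 the pooling window size, and 𝑠 the stride.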

Fully Connected Layer: After feature extraction, the output from the convolutional and pooling layers is flattened into a vector and passed through one or more fully connected layers to produce the final classification. The last layer typically uses Softmax, which normalizes the outputs into a probability for each gesture class.
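
The layers after the convolution can be sketched end to end in plain Python. This is a rough illustration under stated assumptions, not the application's model: the helper names (`relu`, `max_pool`, `flatten`, `dense`, `softmax`), the input feature map, the weights and biases, and the three gesture class names are all hypothetical. A real network would learn its weights during training and would normally be built with a deep learning library.

```python
import math


def relu(feature_map):
    """Apply f(x) = max(0, x) to every element."""
    return [[max(0.0, x) for x in row] for row in feature_map]


def max_pool(feature_map, size=2, stride=2):
    """Keep the largest value in each size x size window."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(
                feature_map[i * stride + m][j * stride + n]
                for m in range(size)
                for n in range(size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def flatten(feature_map):
    """Turn a 2D feature map into a 1D feature vector."""
    return [x for row in feature_map for x in row]


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum (plus bias) per output class."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def softmax(logits):
    """Normalize scores into probabilities; subtracting the max avoids overflow."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 4x4 feature map, as if produced by a convolutional layer.
feature_map = [
    [-1.0, 2.0, 0.5, -3.0],
    [0.0, 1.0, -2.0, 4.0],
    [3.0, -1.0, 0.0, 1.0],
    [-2.0, 0.5, 2.0, -1.0],
]

# Hypothetical (untrained) weights: one row per gesture class.
class_names = ["fist", "open_palm", "thumbs_up"]
weights = [
    [0.1, 0.0, -0.1, 0.2],
    [0.0, 0.3, 0.1, -0.2],
    [-0.2, 0.1, 0.0, 0.1],
]
biases = [0.0, 0.1, -0.1]

activated = relu(feature_map)         # negatives become 0
pooled = max_pool(activated)          # 4x4 -> 2x2: [[2.0, 4.0], [3.0, 2.0]]
features = flatten(pooled)            # [2.0, 4.0, 3.0, 2.0]
probabilities = softmax(dense(features, weights, biases))

predicted = class_names[probabilities.index(max(probabilities))]
print(predicted, probabilities)
```

Pooling shrinks the 4x4 map to 2x2, so the fully connected layer only needs four weights per class. The Softmax outputs sum to 1, so they can be read as per-class confidence.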

