• Saharsh Shukla

Haptics Recognition for triggering Security alarms

Check it out here: Github


Convolutional neural networks learn visual features automatically from data, and their shared-weight structure makes them faster and more efficient, with far less compute time, than the fully connected networks that preceded them. With the introduction of OpenCV, we are capable of infinitely more, as nearly every physical aspect that was previously inaccessible can now be captured and converted into data, including video feeds as well as pictures. With deep learning, this data can be processed into meaningful outputs. This paper explores the application and recreation of hand sign recognition using open computer vision (OpenCV) and a CNN.


Convolutional Neural Networks (CNNs) are a class of deep learning neural networks used to analyse images and classify large sets of image-based data. Instead of the direct multiplication of large weight matrices that their predecessors relied on, these networks use the convolution operation, which reduces the total number of operations and yields faster as well as more accurate results. Even so, these operations are computationally demanding and require capable CPUs or GPUs; the breakthrough in CNN technology came only with the invention and advent of faster GPUs and TPUs (Tensor Processing Units). It is important to note that in mathematics a tensor is an algebraic object that describes relationships between objects in vector spaces; in deep learning, tensors (multidimensional arrays) are essentially what we supply to a convolutional neural network.

Convolutional neural networks were invented in the 1980s but could not be widely utilized because the hardware of the time was insufficient. These networks are shift or space invariant. In contrast, most of the data fed to MLPs must be heavily preprocessed and manipulated, because MLPs cannot handle variance in an object's location and cannot separate individual objects from the rest of the image.



Assume we have a data set of images. A neural network is to be created that is capable of recognizing the images in the data and classifying them with certain labels. Each image and image category has certain characteristics, called features, and detecting these features is the central challenge of image processing. To solve this challenge, neural nets were programmed: multilayer perceptrons (MLPs) modeled after the human brain. These MLPs simulated the brain as nodes, or neurons, connected to one another and activated only when their combined inputs reached a certain threshold.

When processing images, MLPs used one perceptron for each pixel input. For large images, the number of weights therefore grows dramatically and becomes unmanageable, so overfitting of the model was a very common occurrence.
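To see why fully connected layers become unmanageable, a quick back-of-the-envelope calculation helps; the image size and layer width below are illustrative assumptions, not values from the original project:

```python
# Parameter count for a single fully connected layer on a raw image input.
# Sizes are illustrative assumptions.
height, width, channels = 64, 64, 3   # a small 64x64 RGB image
inputs = height * width * channels    # 12,288 input values
hidden_units = 1024                   # one modest hidden layer

weights = inputs * hidden_units       # one weight per input-neuron pair
biases = hidden_units
total = weights + biases
print(total)  # 12583936 -- over 12.5 million parameters for one layer
```

Even at this modest resolution, a single dense layer already carries millions of weights, which is what drives the overfitting described above.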

MLPs are also sensitive to shifts in the input image: they are not translation invariant.

To solve the issues caused by MLPs, convolutional neural networks were developed.


The concept revolves around the fact that in any image, nearby pixels are more strongly related than distant ones. A small filter, mathematically a square matrix, is created and scans the image from the top left to the bottom right; at each position, a value is computed from the pixels under the filter using the convolution operation from linear algebra.
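The sliding-filter operation described above can be sketched in plain NumPy; this is a minimal "valid" convolution with stride 1, and the Laplacian kernel is an illustrative choice, not one from the original project:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (valid mode, stride 1),
    computing one output value per position."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i:i + kh, j:j + kw]   # pixels under the filter
            out[i, j] = np.sum(patch * kernel)  # the convolution value
    return out

image = np.ones((5, 5))                 # a flat, featureless image
kernel = np.array([[0,  1, 0],
                   [1, -4, 1],
                   [0,  1, 0]])         # Laplacian edge-detection kernel
print(convolve2d(image, kernel))        # all zeros: a flat image has no edges
```

A 3x3 filter over a 5x5 image yields a 3x3 output, which is why feature map dimensions shrink from layer to layer unless padding is used.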

These filters can correspond to any number of features, each scanning for a specific pattern, and they are updated automatically as the neural network is trained. Classic hand-designed kernels, such as edge detection and sharpening, are examples of the kind of filters a CNN learns.

Feature Map

Once a filter has passed over the image, a feature map is generated for that filter. An activation function is responsible for deciding whether a certain feature is present at a given location in the image; these individual responses are stored in the feature map, which can then be used in further operations. Once feature maps are constructed, any number of filters can be added to the neural network.

CNNs are made up of multiple layers that are not fully connected; they comprise cube-shaped blocks of weights that are applied across the whole image. The 2D slices of these filters are called kernels. Because the same filters are applied everywhere, they introduce translational invariance.

ReLu Activation (Rectified Linear Unit)

To introduce non-linearity into the convolutional neural network, a Rectified Linear Unit is used. It also combats the vanishing gradient problem that occurs with sigmoids: as more layers using such an activation function are added to the neural network, the gradients of the loss function approach zero, which renders the network hard to train.

With saturating functions, a large change in the input can produce only a minimal change in the derivative, so the change propagated to the output is also small. ReLU does not produce these vanishingly small derivatives and thus combats this issue.
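The contrast can be made concrete with a small numerical sketch: the sigmoid's derivative caps at 0.25 and shrinks toward zero for large inputs, while ReLU's derivative stays at 1 for any positive input:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # at most 0.25, and tiny for large |x|

def relu_grad(x):
    return 1.0 if x > 0 else 0.0  # constant 1 on the positive side

for x in (0.0, 5.0, 10.0):
    print(x, sigmoid_grad(x), relu_grad(x))
# sigmoid_grad(10.0) is about 4.5e-5, while relu_grad(10.0) is still 1.0
```

Multiplying many factors of at most 0.25 across layers is exactly how sigmoid gradients vanish; multiplying factors of 1 does not shrink the signal.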

Combining layers

To get the final result, multiple filters are combined: a 3D volume is built from the 2D slices produced by each filter. The feature map dimensions can change from one layer to the next.

Note: TensorFlow receives data in the form of tensors, which in practice are multidimensional arrays; for images these are typically 3D (height, width, and channels).

In conclusion, we find that each layer of a CNN learns filters of increasing complexity: the initial layers detect edges and corners, the middle layers detect more complex features such as parts of an object, and the last layers learn to recognize objects in their entirety.

The Network

The network comprises multiple layers, as is expected of a CNN: the input feeds into a convolutional layer, which connects to a max pooling layer; after the pooling operation, a second convolutional layer and pooling layer follow; the image is then flattened, and dense layers with dropout and ReLU activation are connected.
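That stack can be sketched in Keras as follows; the layer sizes, input shape, and number of output classes are assumptions for illustration, and the actual code is in the linked repository:

```python
from tensorflow.keras import layers, models

# Conv -> MaxPool -> Conv -> MaxPool -> Flatten -> Dense(+Dropout) -> output
model = models.Sequential([
    layers.Input(shape=(64, 64, 1)),            # grayscale input (assumed size)
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),                        # combats overfitting
    layers.Dense(10, activation='softmax'),     # assumed number of hand signs
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```

Calling `model.summary()` shows how the feature map dimensions shrink through the convolution and pooling layers, as described above.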


As soon as the program runs, the contour window pops up, showing the focus of the camera, i.e. the cropped area in which the processing and filtering take place.

A representation of the input data

The filters process the edges, and the added models convert the image from BGR to HSV color. This helps compress the image, reducing compute time and making image processing faster.

The processing framework includes a capture function using the read method, followed by a horizontal flip.

The image is then converted into the HSV color format, essentially a different representation of RGB; it is important to note that OpenCV receives all inputs in BGR format. cv2 has a method called ‘cvtColor’ that performs the color conversion, after which a mask is applied to separate the colors: a specified range of pixels is converted to gray (monochrome) format while the rest is processed normally. This essentially crops a part of the original capture and processes the color only inside that part into gray. The output emoticon is shown on the image to the right.

Accuracy and loss
