Hand Gesture Recognition using Support Vector Machine and Bag of Visual Words model

Abstract

This paper explores Bag of Visual Words, a well-known technique for image classification and recognition. The process involves feature extraction using the Canny edge detector and the Scale-Invariant Feature Transform (SIFT), followed by codebook construction using K-Means clustering and vector quantization. Finally, classification is performed with a Support Vector Machine (SVM) using a chi-squared kernel. With 10-fold cross-validation, the accuracy is approximately 70%.

View Paper:

Overview

As the final project for the Statistical Machine Learning course during my Master of Science at Rochester Institute of Technology, I developed a hand gesture recognition model under the additional constraint of a very small dataset.

Methodology

  • Feature Extraction: Canny Edge Detection suppresses background clutter by keeping only edges, while SIFT identifies keypoints and computes their descriptors.
  • Bag of Visual Words (BoVW) Construction: Features are clustered using K-Means, and histograms of visual word frequencies are created.
  • Vector Quantization: The histograms are transformed using TF-IDF vectorization to form input feature representations.
  • Classification with SVM: A chi-squared kernel is used for classification because it is well suited to comparing histograms of visual words.
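
The steps above can be sketched end to end with scikit-learn. This is a minimal illustration under stated assumptions, not the project's actual code: the descriptors are synthetic stand-ins generated by a hypothetical `synthetic_descriptors` helper (in the real pipeline they would come from OpenCV, for example `cv2.Canny` followed by `cv2.SIFT_create().detectAndCompute`), and the number of classes, the codebook size, and the default SVM settings are assumed values.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)
N_CLASSES, N_WORDS = 3, 6  # assumed values, not taken from the paper


def synthetic_descriptors(label, n_keypoints=40, dim=128):
    """Hypothetical stand-in for the SIFT descriptors of one image."""
    center = np.zeros(dim)
    center[label * 10:(label + 1) * 10] = 5.0  # class-specific pattern
    return np.abs(rng.normal(center, 1.0, size=(n_keypoints, dim)))


def make_images(per_class):
    """Return a list of per-image descriptor arrays and their labels."""
    labels = np.repeat(np.arange(N_CLASSES), per_class)
    return [synthetic_descriptors(c) for c in labels], labels


def to_histograms(kmeans, images):
    """Vector-quantize each image's descriptors into visual-word counts."""
    return np.array([
        np.bincount(kmeans.predict(desc), minlength=kmeans.n_clusters)
        for desc in images
    ])


train_images, y_train = make_images(per_class=10)
test_images, y_test = make_images(per_class=4)

# 1. Codebook: cluster all training descriptors into visual words.
kmeans = KMeans(n_clusters=N_WORDS, n_init=10, random_state=0)
kmeans.fit(np.vstack(train_images))

# 2. One histogram of visual-word frequencies per image.
train_hist = to_histograms(kmeans, train_images)
test_hist = to_histograms(kmeans, test_images)

# 3. TF-IDF weighting (converted to dense, since chi2_kernel rejects sparse input).
tfidf = TfidfTransformer()
X_train = tfidf.fit_transform(train_hist).toarray()
X_test = tfidf.transform(test_hist).toarray()

# 4. SVM with a chi-squared kernel on the non-negative histogram features.
clf = SVC(kernel=chi2_kernel)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

The synthetic classes are deliberately easy to separate, so the score printed here says nothing about real gesture images. The point is only the data flow: descriptors, codebook, histograms, TF-IDF, kernel SVM.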

The resulting accuracy is moderate. The study suggests it could be improved with deep learning or with additional feature extraction techniques such as HOG or SURF.

Architecture

[Figure: architecture diagram]

Conclusion

The classification accuracy is not good enough for deploying this model in production: with 10-fold cross-validation, we obtained an accuracy of approximately 70%. This could be improved with techniques such as deep learning and transfer learning, which are out of scope for this project because it explores image recognition without neural networks or transfer learning.
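
For readers who want to reproduce this kind of evaluation, here is a small sketch of 10-fold cross-validation for a chi-squared-kernel SVM in scikit-learn. The data is hypothetical (randomly generated visual-word histograms for three made-up classes), and the stratified, shuffled split is an assumption rather than the protocol confirmed by the paper.

```python
import numpy as np
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical data: 60 images, 3 gesture classes, 20-bin visual-word histograms.
n_per_class, n_words = 20, 20
y = np.repeat(np.arange(3), n_per_class)

# Each class favours a different block of five visual words (Poisson counts).
rates = np.ones((3, n_words))
for c in range(3):
    rates[c, c * 5:(c + 1) * 5] = 8.0
X = rng.poisson(rates[y]).astype(float)
X /= X.sum(axis=1, keepdims=True)  # normalise counts to frequencies

# Stratified folds keep the class balance in every train/test split.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(SVC(kernel=chi2_kernel), X, y, cv=cv)
print(f"mean accuracy over {len(scores)} folds: {scores.mean():.2f}")
```

With a very small dataset, the per-fold scores can vary widely, so it is worth looking at their spread (for example `scores.std()`) and not only at the mean.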

Technologies

Python, OpenCV, SIFT, Bag-of-visual-words, SVM, Edge Detection