Mediapipe

Mediapipe for Hand Gesture Detection


Mediapipe, developed by Google, provides a pre-built pipeline for real-time hand tracking, detecting hand landmarks with high accuracy. The system detects 21 key landmarks per hand, which are crucial for recognizing gestures.

Mathematical Representation of Hand Landmarks

Mediapipe outputs the positions of the 21 hand landmarks, each represented as 2D coordinates (x, y) or 3D coordinates (x, y, z). These coordinates correspond to the joint positions and the fingertip locations.

For example, the coordinates of the thumb-tip landmark might be $L_1 = (x_1, y_1, z_1)$, and similarly for the other landmarks.
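As a sketch of how one frame of landmarks might be stored (the index constants follow Mediapipe's published landmark numbering; the coordinate values are placeholders, not real detections):

```python
# Mediapipe's landmark numbering: 0 is the wrist, 4 the thumb tip,
# 8 the index fingertip, 20 the pinky tip.
WRIST, THUMB_TIP, INDEX_TIP, PINKY_TIP = 0, 4, 8, 20

# One frame: 21 landmarks, each an (x, y, z) tuple (placeholder values).
landmarks = [(0.1 * i, 0.2, 0.0) for i in range(21)]

thumb_tip = landmarks[THUMB_TIP]  # the L_1 = (x_1, y_1, z_1) from the text

# Flattening into a single 63-value vector is one common way
# to feed the landmarks into a classifier.
features = [coord for point in landmarks for coord in point]
```

The flattened feature vector has 21 × 3 = 63 values per hand, which keeps the classifier input a fixed size regardless of image resolution.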

The distances between landmarks, or the angles between them, are essential features for classification. For instance, the distance between the tip of the index finger and the tip of the thumb, $d_{\text{index-thumb}}$, is calculated as:

$$d_{\text{index-thumb}} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}$$

This distance can serve as a feature for distinguishing different hand shapes or gestures.
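A minimal sketch of that computation in plain Python (the coordinate values below are made up for illustration; Mediapipe reports landmark coordinates normalized to the image dimensions):

```python
import math

def landmark_distance(a, b):
    """Euclidean distance between two 3D landmarks (x, y, z)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

# Placeholder normalized coordinates for two landmarks.
index_tip = (0.43, 0.51, 0.02)  # (x2, y2, z2)
thumb_tip = (0.40, 0.55, 0.01)  # (x1, y1, z1)

d_index_thumb = landmark_distance(index_tip, thumb_tip)
```

Because the coordinates are normalized, such distances are comparable across frames and camera resolutions, which is what makes them usable as classifier features.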

Figure: the 21 hand landmarks detected by Mediapipe.