Mediapipe

Mediapipe for Hand Gesture Detection

Mediapipe, developed by Google, provides a pre-built pipeline for real-time hand tracking, detecting hand landmarks with high accuracy. The system detects 21 key landmarks per hand, which are crucial for recognizing gestures.

Mathematical Representation of Hand Landmarks

Mediapipe outputs the positions of 21 hand landmarks, which are represented as 2D or 3D coordinates (š‘„,š‘¦) for each point. These coordinates correspond to the joint positions and the fingertip locations.

21 hand lanmarks

For example, the coordinates of the landmark on the thumb might be L1{L}_1​ = (x1{x}_1, y1{y}_1, z1{z}_1), and similarly for other landmarks.

The distances between landmarks or their relative angles are essential features for classification. For instance, the distance between the tip of the index finger and the thumb dindex-thumb{d}_{\text{index-thumb}} ​ is calculated as:

dindex-thumb=(x2āˆ’x1)2+(y2āˆ’y1)2+(z2āˆ’z1)2d_{\text{index-thumb}} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}

This distance can serve as a feature for distinguishing different hand shapes or gestures.

Last updated