This project develops an AI-based hand gesture recognition system for controlling a drone in real time. By leveraging MediaPipe for hand tracking and a custom CNN (Convolutional Neural Network) for classification, the system translates hand gestures into drone commands (e.g., takeoff, land, movement) without the need for a traditional controller.
The system features a hybrid pipeline that combines precise hand landmark detection with specialized binary image classification to ensure robust performance across various lighting conditions and backgrounds.
- Dual-Model Pipeline: Uses MediaPipe for hand localization and a custom TFLite model for gesture classification.
- Binary Processing: Converts hand crops to black-and-white (binary) to focus on morphology rather than skin tone.
- Apple Silicon Optimized: Performance-tuned for Apple M5 chips using XNNPACK delegates.
The model was trained for 20 epochs using a CNN architecture:
- Training Accuracy: 98.41%
- Validation Accuracy: 99.93%
- Input Format: 96x96 Grayscale Binary Image
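The exact layer stack is not listed here, but a compact CNN for 96x96x1 binary inputs could be sketched as follows (layer sizes and the `build_model` name are illustrative assumptions, not the trained architecture):

```python
import tensorflow as tf

# Illustrative only: a compact CNN for 96x96x1 binary hand images.
# The actual trained architecture may differ; layer sizes are assumptions.
def build_model(num_classes: int = 20) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(96, 96, 1)),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```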
The model is trained on a specialized version of the Hand Gesture Recognition Dataset.
- Source: Hand Gesture Recognition Dataset (Kaggle)
- Total Images: 24,000 (18,000 Train / 6,000 Test)
- Classes: 20 distinct gesture categories (0-19)
- Input Specs: 96x96 pixels, Grayscale (Single Channel)
The following workflow describes the real-time inference process:
```
Camera Input (RGB)
        ↓
MediaPipe Hand Landmarker (Hand Localization)
        ↓
Crop & Preprocess (Grayscale + Otsu's Thresholding)
        ↓
Custom CNN Model (96x96x1 TFLite)
        ↓
Gesture Classification (20 Classes)
        ↓
Drone Command Mapping
```
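The crop-and-preprocess stage can be sketched with OpenCV as below; `preprocess_hand_crop` is a hypothetical helper name, but the grayscale conversion plus Otsu's thresholding follows the pipeline above:

```python
import cv2
import numpy as np

def preprocess_hand_crop(crop_bgr: np.ndarray) -> np.ndarray:
    """Sketch of the grayscale + Otsu step; the function name and
    normalization scheme are assumptions, not the project's exact API."""
    gray = cv2.cvtColor(crop_bgr, cv2.COLOR_BGR2GRAY)
    # Otsu's method picks the threshold automatically, which helps the
    # binary image stay stable across varied lighting conditions.
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    resized = cv2.resize(binary, (96, 96))
    # Scale to [0, 1] and add batch/channel dims: shape (1, 96, 96, 1).
    return resized.astype(np.float32)[None, ..., None] / 255.0
```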
| Gesture (ID) | Command | Description |
|---|---|---|
| OK (0) | Takeoff | Start the motors and hover. |
| Fist (11) | Land | Secure landing at current position. |
| Point (10) | Forward | Move the drone forward. |
| Rock (17) | Flip | Perform a 360 degree stunt flip. |
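In code, this mapping can be as simple as a dictionary keyed by predicted class ID (a sketch; the actual mapping in `gesture_detection.py` may use different names):

```python
# Illustrative mapping from predicted class ID to drone command.
# IDs follow the table above; the real script may differ.
GESTURE_COMMANDS = {
    0: "takeoff",   # OK
    11: "land",     # Fist
    10: "forward",  # Point
    17: "flip",     # Rock
}

def command_for(class_id: int) -> str | None:
    """Return the command for a class ID, or None for unmapped gestures."""
    return GESTURE_COMMANDS.get(class_id)
```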
- Python 3.11: Core development language.
- MediaPipe: For high-fidelity hand landmark detection.
- TensorFlow / Keras: Used for training the CNN classifier.
- TensorFlow Lite: For lightweight, real-time edge inference.
- OpenCV: For advanced image preprocessing and binary thresholding.
- XNNPACK: Optimized CPU inference for Apple M5 (a minimal model-loading sketch follows this list).
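Loading the TFLite classifier on CPU might look like the sketch below; recent TensorFlow Lite builds apply the XNNPACK delegate automatically, and the model filename is an assumption:

```python
import numpy as np
import tensorflow as tf

# The model filename is an assumption for illustration; recent TFLite
# builds enable the XNNPACK delegate on CPU by default.
interpreter = tf.lite.Interpreter(model_path="gesture_model.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

def classify(binary_image: np.ndarray) -> int:
    """Expects a (1, 96, 96, 1) float32 array; returns the class ID."""
    interpreter.set_tensor(input_details[0]["index"], binary_image)
    interpreter.invoke()
    probs = interpreter.get_tensor(output_details[0]["index"])[0]
    return int(np.argmax(probs))
```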
Create and activate the virtual environment, then install the dependencies:

```bash
# Create the virtual environment (only needed once)
python3 -m venv venv_detect

# Activate the virtual environment
source venv_detect/bin/activate

# Upgrade pip and install the required packages
pip install --upgrade pip
pip install mediapipe tensorflow opencv-python numpy
```

Before running the detection, you must download the official MediaPipe model:
- Hand Landmarker Bundle: Download the `hand_landmarker.task` file from the MediaPipe Official Models page.
- Placement: Ensure `hand_landmarker.task` is placed in the project root directory.
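As a sanity check that the bundle loads, initialization with the MediaPipe Tasks Python API looks roughly like this (a sketch, not necessarily the project's exact setup):

```python
import mediapipe as mp
from mediapipe.tasks import python as mp_python
from mediapipe.tasks.python import vision

# Load the downloaded bundle from the project root.
options = vision.HandLandmarkerOptions(
    base_options=mp_python.BaseOptions(model_asset_path="hand_landmarker.task"),
    num_hands=1,
)
landmarker = vision.HandLandmarker.create_from_options(options)

# Usage on a single RGB frame (frame_rgb: HxWx3 uint8 array):
# mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=frame_rgb)
# result = landmarker.detect(mp_image)
```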
To retrain the classifier using the grayscale binary approach:

```bash
python train_model.py
```

Execute the main detection script:

```bash
python gesture_detection.py
```

- Drone SDK Integration: Connecting the command outputs to DJI Tello or ArduPilot.
- 3D Gesture Tracking: Utilizing Z-axis data from MediaPipe for altitude control.
- Robustness: Adding more background-noise augmentation to the binary training set.
The trained TFLite model is designed to be integrated into drone systems using two primary architectural approaches:
This is the most accessible method for drones like the DJI Tello.
- Workflow: The drone streams live video over Wi-Fi to a laptop (Ground Station). The laptop runs `gesture_detection.py` on its CPU/GPU (optimized for Apple Silicon).
- Command Transmission: Recognized gestures are translated into SDK commands (e.g., `tello.takeoff()`, `tello.land()`) and sent back to the drone over the same Wi-Fi network.
- Tools: The `djitellopy` library for Python-based drone control (a minimal loop is sketched after this list).
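A minimal ground-station loop with `djitellopy` might look like the following; `gesture_to_command` is a hypothetical stand-in for the MediaPipe + CNN pipeline described above:

```python
from djitellopy import Tello  # pip install djitellopy

def gesture_to_command(frame):
    """Placeholder for the MediaPipe + CNN pipeline described above;
    should return a command string such as "takeoff" or "land"."""
    return None

tello = Tello()
tello.connect()
tello.streamon()
try:
    while True:
        frame = tello.get_frame_read().frame  # latest BGR frame from the drone
        command = gesture_to_command(frame)
        if command == "takeoff":
            tello.takeoff()
        elif command == "land":
            tello.land()
            break
finally:
    tello.streamoff()
    tello.end()
```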
For autonomous or semi-autonomous drones (e.g., custom builds with Pixhawk or Betaflight).
- Hardware: Mounting a lightweight companion computer such as a Raspberry Pi 4 or NVIDIA Jetson Nano on the drone.
- Efficiency: Since the model is in .tflite format, it is highly optimized for these edge devices.
- Communication: The companion computer processes the camera feed locally and sends MAVLink commands to the Flight Controller (FC) via a serial connection.
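Sending a command from the companion computer could look like this `pymavlink` sketch; the serial device, baud rate, and flight-mode prerequisites are assumptions for a typical Raspberry Pi UART link:

```python
from pymavlink import mavutil  # pip install pymavlink

# Serial device and baud rate are assumptions for a Pi-to-FC UART link.
master = mavutil.mavlink_connection("/dev/serial0", baud=57600)
master.wait_heartbeat()  # confirm the flight controller is responding

# Example: request a takeoff to 1 m. This assumes the FC is armed and in
# a mode that accepts MAV_CMD_NAV_TAKEOFF (e.g., GUIDED on ArduPilot).
master.mav.command_long_send(
    master.target_system, master.target_component,
    mavutil.mavlink.MAV_CMD_NAV_TAKEOFF,
    0,            # confirmation
    0, 0, 0, 0,   # params 1-4 (unused here)
    0, 0, 1.0,    # latitude, longitude, altitude (m)
)
```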
| Gesture (Folder ID) | Drone Command | Action Description |
|---|---|---|
| Folder_5 | Takeoff | Start the motors and hover at 1m altitude. |
| Folder_11 | Land | Perform a controlled vertical landing. |
| Folder_0 | Stop / Hover | Stop all movement and hover in place. |
| Folder_1 | Move Up | Increase drone altitude (Fly Upward). |
| Folder_12 | Move Down | Decrease drone altitude (Fly Downward). |
- Latency Control: Optimizing the Wi-Fi video stream to minimize command delay.
- Safety Interlocks: Implementing a "Command Confirmation" logic (e.g., a gesture must be held for 0.5 s) to prevent accidental maneuvers; a sketch follows this list.
- Dynamic Lighting: Enhancing binary thresholding stability for outdoor environments.
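A minimal version of the hold-to-confirm interlock mentioned above could look like this (the 0.5 s threshold follows the bullet; the class itself is a sketch, not the project's code):

```python
import time

# Sketch of the "hold to confirm" interlock: a gesture only fires its
# command after being detected continuously for HOLD_SECONDS.
HOLD_SECONDS = 0.5

class GestureConfirmer:
    def __init__(self) -> None:
        self._current: int | None = None
        self._since: float = 0.0

    def update(self, class_id: int | None) -> int | None:
        """Feed the latest prediction each frame; returns the class ID
        once it has been held for HOLD_SECONDS, else None."""
        now = time.monotonic()
        if class_id != self._current:
            # The prediction changed: restart the hold timer.
            self._current, self._since = class_id, now
            return None
        if class_id is not None and now - self._since >= HOLD_SECONDS:
            self._since = now  # re-arm so the command does not repeat every frame
            return class_id
        return None
```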