Our implementation went through several distinct development phases. After evaluating various computer vision libraries, we chose MediaPipe for its robust hand detection: it provides pre-built hand identification and tracking, including the detailed finger position data our gesture recognition system depends on. Rather than relying on pre-trained models limited to standard gestures, we built a pipeline for creating and training recognition of custom gestures, capturing images of our gestures in real time and using them to assemble a dataset for machine learning-based recognition.
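The sketch below illustrates roughly how such a capture step can work with MediaPipe's Hands solution and OpenCV; the function name, sample count, and feature layout are illustrative rather than exact details of our code.

```python
import cv2
import mediapipe as mp
import numpy as np

mp_hands = mp.solutions.hands

def capture_gesture_samples(label, num_samples=200, camera_index=0):
    """Capture labeled hand-landmark vectors from the webcam to build a
    custom gesture dataset (illustrative sketch)."""
    samples = []
    cap = cv2.VideoCapture(camera_index)
    with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.7) as hands:
        while len(samples) < num_samples:
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB images; OpenCV delivers BGR
            results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if results.multi_hand_landmarks:
                lm = results.multi_hand_landmarks[0].landmark
                # Flatten the 21 (x, y, z) landmarks into one feature vector
                features = np.array([[p.x, p.y, p.z] for p in lm]).flatten()
                samples.append((features, label))
    cap.release()
    return samples
```

Each captured vector is paired with its gesture label, so the resulting dataset can be fed directly to a standard classifier for recognition.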
We mapped specific gestures to robot behaviors, initially using arbitrary gesture assignments before transitioning to a more intuitive number-based system for better usability. The gesture control system evolved from basic commands to more sophisticated interactions: movement control switched from time-based commands to the robot's odometry, using its positioning data to execute precise, repeatable movements and turns for driving in various shapes.
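As an illustration, an odometry-based driver can poll the robot's pose and stop once the commanded distance or angle has been covered, rather than sleeping for a fixed duration. The `robot` object below is a hypothetical interface (a `get_pose()` returning x, y, yaw and a `set_velocity(linear, angular)` command); the real robot API will differ.

```python
import math
import time

def drive_distance(robot, distance_m, speed=0.2):
    """Drive straight for a target distance using odometry feedback
    instead of a timed command (hypothetical robot interface)."""
    x0, y0, _ = robot.get_pose()
    while True:
        x, y, _ = robot.get_pose()
        if math.hypot(x - x0, y - y0) >= distance_m:
            break
        robot.set_velocity(speed, 0.0)
        time.sleep(0.02)
    robot.set_velocity(0.0, 0.0)

def turn_angle(robot, angle_deg, angular_speed=0.5):
    """Rotate in place by a target angle, accumulating yaw change from odometry."""
    target = math.radians(angle_deg)
    _, _, prev_yaw = robot.get_pose()
    turned = 0.0
    while abs(turned) < abs(target):
        robot.set_velocity(0.0, math.copysign(angular_speed, target))
        _, _, yaw = robot.get_pose()
        # Wrap each increment to (-pi, pi] so crossing +/-pi doesn't corrupt the sum
        turned += math.atan2(math.sin(yaw - prev_yaw), math.cos(yaw - prev_yaw))
        prev_yaw = yaw
        time.sleep(0.02)
    robot.set_velocity(0.0, 0.0)

def drive_square(robot, side_m=0.5):
    """A square is four odometry-based straight segments joined by 90-degree turns."""
    for _ in range(4):
        drive_distance(robot, side_m)
        turn_angle(robot, 90)
```

Because each segment terminates on measured position rather than elapsed time, the shapes come out consistent even when battery level or surface friction changes the robot's actual speed.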
The number gesture system used intuitive commands, with each number triggering a specific robot behavior:
- Number 1 (pointer finger): drive forward
- Number 2 (peace sign): turn right
- Number 3 (three fingers): turn left
- Number 4 (four fingers): stop
- Number 0 (fist): spin continuously
- Open hand: draw a square
- Triangle gesture: draw a triangle
- Thumb-and-index pinch: speed control, with pinch distance proportional to speed

We also began work on an advanced path drawing feature that would allow users to "draw" paths in the air for the robot to follow. This involved tracking finger movement, recognizing the intended shape, and converting it into precise robot navigation commands.
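To make the dispatch concrete, a sketch like the one below maps recognized gesture labels to command names and converts the thumb-to-index pinch distance into a speed. The label strings, command names, and scaling constant are illustrative assumptions; the landmark indices (4 = thumb tip, 8 = index fingertip) follow MediaPipe's hand landmark model.

```python
import math

# Illustrative mapping from recognized gesture labels to robot commands,
# mirroring the behaviors listed above (names are assumptions, not our exact code).
GESTURE_COMMANDS = {
    "one": "forward",
    "two": "turn_right",
    "three": "turn_left",
    "four": "stop",
    "fist": "spin",
    "open_hand": "draw_square",
    "triangle": "draw_triangle",
}

def pinch_speed(landmarks, max_speed=0.5):
    """Map thumb-to-index-fingertip distance to a linear speed.

    `landmarks` are normalized MediaPipe hand landmarks; index 4 is the
    thumb tip and index 8 is the index fingertip."""
    thumb, index = landmarks[4], landmarks[8]
    distance = math.hypot(index.x - thumb.x, index.y - thumb.y)
    # Normalized image coordinates keep the pinch distance small; clamp and
    # scale (0.4 is an assumed full-open pinch) so a wider pinch means more speed.
    return min(distance / 0.4, 1.0) * max_speed
```

A controller loop would then look up the classified gesture in `GESTURE_COMMANDS` and, when a pinch is detected, call `pinch_speed` to set the robot's forward velocity.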