Project Overview:

An experimental project exploring the integration of real-time hand gesture recognition (Python and Google’s MediaPipe framework) with interactive animations in Unreal Engine, connected via OSC communication.

Context & Goal:

The primary motivation behind this project was to dive into the possibilities of real-time computer vision, specifically hand tracking and gesture recognition, as an input method for interactive experiences. The goal was to build a robust pipeline from a standard webcam feed to responsive animations within Unreal Engine, resulting in a simple yet engaging demonstration of this interaction loop. The playful concept of a virtual “protest” reacting to user gestures was chosen to provide clear visual feedback and a defined interaction goal.

1. Real-time Hand Gesture Recognition (Python & MediaPipe):

The core of the interaction relies on accurately detecting and interpreting hand gestures in real-time. Google’s MediaPipe framework was chosen for its efficiency and comprehensive solutions for on-device vision tasks.

  • MediaPipe Gesture Recognizer: Instead of solely relying on raw landmark data, this project leveraged MediaPipe’s integrated GestureRecognizer task. This task combines the powerful HandLandmarker (which identifies 21 key hand landmarks) with a pre-trained classification model. This model is capable of recognizing a set of common static hand gestures directly from the image or video stream.
  • Python Script & Webcam Input: A Python script utilized the OpenCV library (cv2) to capture the video feed from a standard webcam. Each frame was then processed by the MediaPipe GestureRecognizer.
  • Gesture Classification & Output: The GestureRecognizer directly outputs the classification results, typically including a category name (e.g. Peace, ThumbsUp, OpenPalm) and a confidence score for the most likely gesture detected in the frame. The Python script simply extracted the name of the highest-scoring recognized gesture above a certain confidence threshold to ensure reliability. This significantly simplified the process compared to manually implementing classification logic based on landmark coordinates. If no gesture was recognized with sufficient confidence, a default state was reported. A minimal sketch of this capture-and-classify loop follows this list.
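
For reference, here is what such a loop can look like with the MediaPipe Tasks API and OpenCV. The model file path, the 0.6 confidence threshold, and the frame-timestamp handling are assumptions for illustration rather than the exact values used in the project:

    import cv2
    import mediapipe as mp
    from mediapipe.tasks import python as mp_tasks
    from mediapipe.tasks.python import vision

    # The gesture_recognizer.task model file must be downloaded separately
    # (the path here is an assumption).
    options = vision.GestureRecognizerOptions(
        base_options=mp_tasks.BaseOptions(model_asset_path="gesture_recognizer.task"),
        running_mode=vision.RunningMode.VIDEO,
    )
    recognizer = vision.GestureRecognizer.create_from_options(options)

    cap = cv2.VideoCapture(0)  # standard webcam
    frame_index = 0
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB; OpenCV captures BGR.
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=rgb)
        # VIDEO mode needs a monotonically increasing timestamp in milliseconds.
        result = recognizer.recognize_for_video(mp_image, frame_index * 33)
        frame_index += 1

        gesture_name = "none"  # default state when nothing is recognized
        if result.gestures:
            top = result.gestures[0][0]  # best category for the first detected hand
            if top.score > 0.6:          # assumed confidence threshold
                gesture_name = top.category_name  # e.g. "Victory", "Open_Palm", "Thumb_Up"
        print(gesture_name)

    cap.release()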

2. Bridging Python and Unreal Engine (OSC Communication):

A seamless, low-latency communication channel was needed to send the recognized gestures from the Python script to Unreal Engine.

  • OSC Protocol: Open Sound Control (OSC) was chosen for its lightweight nature, its speed, and my familiarity with it from past projects. It’s well-suited for sending discrete event messages like recognized gestures.
  • Python OSC Client: The python-osc library was used within the Python script. Upon recognizing a gesture, the script formatted an OSC message with a specific address pattern (/gesture/gesture) and an argument carrying the gesture name as a string (e.g. peace, thumb_up, etc.). This message was sent via UDP to the local IP address and a designated port where Unreal Engine was listening (a minimal sender sketch follows this list).
  • Unreal Engine OSC Server: Inside Unreal Engine, the built-in OSC plugin was enabled and configured to listen for incoming messages on the specified UDP port.
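
The sending side amounts to only a few lines with python-osc. A minimal sketch follows; the port number is an assumption and simply has to match the port the Unreal OSC server listens on:

    from pythonosc.udp_client import SimpleUDPClient

    # Assumed values: Unreal running on the same machine, listening on port 8000.
    UE_IP = "127.0.0.1"
    UE_PORT = 8000

    client = SimpleUDPClient(UE_IP, UE_PORT)

    def send_gesture(gesture_name: str) -> None:
        # One address pattern, one string argument carrying the gesture name.
        client.send_message("/gesture/gesture", gesture_name)

    # Called from the recognition loop, e.g.:
    send_gesture("peace")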

3. Driving Interaction in Unreal Engine (Blueprints):

The logic within Unreal Engine translates the incoming OSC messages into visual changes in the scene.

  • Receiving & Parsing OSC: An event within Unreal Engine’s Blueprint system was bound to incoming OSC messages matching the address pattern used by the Python sender (/gesture/gesture). The Blueprint logic then extracted the gesture name string from the message arguments.
  • Triggering Logic: A Switch on String node was used to determine the appropriate action based on the received gesture name. This allowed for different execution paths for “peace”, “thumb_up”, “open_palm”, etc., as well as a default path for unrecognized gestures (a conceptual sketch of this receive-and-dispatch flow follows this list).
  • Controlling Animations & Materials: Each logical path in the Blueprint triggered specific events on the “Protester” Actor Blueprints in the scene. This primarily involved blending between pre-defined animation sequences (like waving different signs). Dynamic Material Instances were likely used on the sign meshes, allowing the Blueprint logic to change a texture parameter, effectively swapping the displayed emoji (e.g., switching between the ‘crossed-out thumbs up’ texture and the standard ‘peace’ texture). A default state was implemented to play the “We Want Peace” animation when no specific gesture was actively detected or sent via OSC.
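
The receiving and branching logic lives entirely in Blueprints, but the flow can be illustrated outside Unreal with a small python-osc server that mirrors the same receive-and-switch pattern. The port and the gesture-to-action mapping below are illustrative assumptions:

    from pythonosc.dispatcher import Dispatcher
    from pythonosc.osc_server import BlockingOSCUDPServer

    def handle_gesture(address: str, gesture_name: str) -> None:
        # Conceptual stand-in for the Blueprint "Switch on String" node:
        # each recognized gesture takes its own execution path.
        if gesture_name == "peace":
            print("blend to 'peace sign' wave animation")
        elif gesture_name == "thumb_up":
            print("swap sign texture to 'crossed-out thumbs up'")
        elif gesture_name == "open_palm":
            print("trigger 'open palm' reaction")
        else:
            print("default path: play the 'We Want Peace' idle wave")

    dispatcher = Dispatcher()
    dispatcher.map("/gesture/gesture", handle_gesture)

    # Port must match the Python sender; 8000 is an assumed value.
    server = BlockingOSCUDPServer(("127.0.0.1", 8000), dispatcher)
    server.serve_forever()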

4. Asset Creation (Blender & Unreal Engine):

Simple visual assets were needed to represent the scene and the interactive elements.

  • Rigging & Animation: The signs were rigged in Blender with a simple armature allowing for a basic waving motion. A few short animation cycles were created: an idle/default wave (“We Want Peace”) and distinct waves for showing the ‘protested’ gesture signs and the ‘peace’ sign.
  • Import & Integration: The models, rigs, animations, and textures were imported into Unreal Engine. Animation Blueprints within the Protester Actor Blueprints were used to blend between the imported animations based on the triggers received from the OSC processing logic.

Outcome & Learnings:

This project successfully demonstrated a functional pipeline for real-time hand gesture interaction within Unreal Engine using accessible tools like Python, MediaPipe, and OSC. It served as a valuable exploration into:

  • The capabilities and ease of use of MediaPipe for hand tracking.
  • Implementing reactive game logic and animation control in Blueprints, driven by external data streams.

The resulting interactive experience, while simple, effectively showcased the potential for using computer vision to create fun and engaging user interactions within virtual environments. Based on this experience, I am excited to dive deeper into combining computer vision techniques with the interactive capabilities of Unreal Engine in future projects.