A few days ago, Microsoft released version 1.7 of the Kinect™ for Windows® SDK. Microsoft has realized that Kinect isn’t just for gamers: people have been using the technology for business applications because of the incredibly powerful natural user interface it offers.
Amadeus Consulting has been looking into the technology because it has a lot of potential to help us solve complex technology problems. As custom application developers, we get to innovate on many of our client solutions. Kinect Interactions offers new controls for a more consistent user experience, including the ability to recognize up to four hands simultaneously! As part of our research into Kinect, we pulled together some information about gesture recognition that we thought our readers would find interesting.
Kinect Gesture Recognition Patterns
There are several schools of thought regarding gesture recognition, but they fall into two basic forms: one is data driven, while the other operates on heuristic algorithms for skeletal positioning. In computer science and artificial intelligence, a heuristic is a technique that solves a problem more quickly than a classic method would. The data-driven route involves recording actors performing gestures with the device and matching those recordings against live data for recognition. This method is widely used for lifelike model movements in video games, as the model’s movements closely match those of the actor at the time of the recording. The heuristic method reduces the gesture to a set of expected states, or frames, of the actor’s movements, and during detection compares the live data heuristically against the expected values in a predetermined template.
Data Driven Gesture Recognition
The process of recording gesture profiles includes defining a starting point, transition points along the way, and an end point by recording skeletal joint positions in the key frames. Timing from start to end, as well as from point to point, is measured along with the data points that describe the skeletal joint positions (or “pose”) marking the transition states. For example, a very simple gesture has four transition points: a start, an end, and two intermediate states. If an actor successfully moves from the starting point through the two intermediate states and finally to the ending state, within the described time period for each segment and for the overall gesture, the gesture is recognized algorithmically. The problem with this approach is that the recorded gesture is specific to the actor who recorded it, and only people with a similar build will have the gesture detected successfully. This means each required gesture would need to be configured for each person, resulting in a gesture profile for that actor.
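To make the four-transition-point example concrete, here is a minimal Python sketch of data-driven matching against a recorded profile. Everything here is an illustrative assumption: the joint name, positions, tolerance, and timing values are made up, and this is not the real Kinect SDK API.

```python
import math

# Hypothetical recorded gesture profile: four keyframes (start, two
# intermediates, end), each a pose (joint -> (x, y, z) in metres) plus
# the maximum time allowed to reach it from the previous keyframe.
GESTURE_WAVE = [
    ({"hand_right": (0.3, 0.5, 1.5)}, 0.0),  # start pose
    ({"hand_right": (0.4, 0.8, 1.5)}, 0.6),  # intermediate 1
    ({"hand_right": (0.2, 0.8, 1.5)}, 0.6),  # intermediate 2
    ({"hand_right": (0.3, 0.5, 1.5)}, 0.6),  # end pose
]

TOLERANCE = 0.15  # how close (metres) a live joint must be to the keyframe

def pose_matches(live, key, tol=TOLERANCE):
    """True if every joint in the keyframe is within tol of the live pose."""
    for joint, target in key.items():
        if joint not in live or math.dist(live[joint], target) > tol:
            return False
    return True

class GestureMatcher:
    """Tracks progress through a recorded gesture, one skeleton frame at a time."""
    def __init__(self, profile):
        self.profile = profile
        self.index = 0            # which keyframe we expect next
        self.segment_start = 0.0  # timestamp when the previous keyframe was hit

    def update(self, live_pose, now):
        pose, max_seconds = self.profile[self.index]
        if pose_matches(live_pose, pose):
            # Enforce the per-segment timing recorded with the profile.
            if self.index > 0 and now - self.segment_start > max_seconds:
                self.index = 0  # too slow: start over
                return False
            self.segment_start = now
            self.index += 1
            if self.index == len(self.profile):
                self.index = 0
                return True  # full gesture recognized
        return False
```

In a real system the live poses would come from the sensor’s skeleton stream; here a caller would simply feed each frame and its timestamp to `update`.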
Heuristic Gesture Recognition
The heuristic approach involves defining gestures as an ordered set of gesture states based on a general expectation of what each state (gesture segment or “pose”) looks like mathematically in 3D space. The data streams from the Kinect sensor contain all the data required for such an approach; what’s missing is the expectation of the gesture states. These could be built by recording the orientation matrices (or quaternions) of the segment states of a single actor, then abstracting them so they apply to any motion with a similar skeletal orientation. The benefit of this generalized approach is that no per-user configuration or profiling is needed.
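As a sketch of the idea, each gesture state can be expressed as a predicate on relative joint geometry rather than absolute positions, so the same template fits any body size with no per-user profiling. The joint names and the “raise right hand” gesture below are illustrative assumptions, not part of the Kinect SDK:

```python
# A skeleton frame is assumed to be a dict of joint name -> (x, y, z),
# with y pointing up. Each state predicate encodes an expectation about
# the pose, independent of the actor's size.

def hand_below_shoulder(s):
    return s["hand_right"][1] < s["shoulder_right"][1]

def hand_above_head(s):
    return s["hand_right"][1] > s["head"][1]

# An ordered set of gesture states: the gesture is recognized when the
# skeleton passes through each predicate in sequence.
RAISE_RIGHT_HAND = [hand_below_shoulder, hand_above_head]

def advance(state_index, skeleton, states=RAISE_RIGHT_HAND):
    """Advance through the gesture states; returns (new_index, recognized)."""
    if states[state_index](skeleton):
        state_index += 1
        if state_index == len(states):
            return 0, True
    return state_index, False
```

Because the predicates compare joints to each other, the same template matches a tall actor and a short one, which is exactly the per-user-configuration problem the data-driven approach runs into.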
If all of this sounds technically difficult, that’s because it is. This kind of technology is the stuff our innovation team thrives on, however, because we enjoy the hard stuff. If you have a really tough technical problem you need help solving, don’t hesitate to contact us!