Welcome to my 2nd post on using Kinect v2 with C++. If you haven’t read it, you can find my introduction to Kinect v2 post here. With Kinect v2 you can track up to 6 people. This tutorial does not deal with drawing the skeletons or anything like that; I want to focus only on the Kinect part here. You can use whatever tool you want to draw the data you get from Kinect.
First, let’s talk about the data types that you’ll be using for skeleton tracking.
IKinectSensor: Represents a Kinect sensor device. This is the interface you use to connect to a device, open new streams, or check the availability of the device.
IBodyFrameSource: You use this interface to get the body frame source and to create the IBodyFrameReader from it.
IBodyFrameReader: This is the interface you use to acquire the skeleton data. You cannot create an IBodyFrameReader unless you have a valid, open and available IKinectSensor.
IBodyFrame: Represents the current frame in the IBodyFrameReader. This holds the relative time from the sensor, the skeleton joint data and the floor clip plane.
IBody: This is the interface that holds the data for skeleton joints (joint positions, orientations), hand states and lean information.
So here’s the flow for getting the skeleton joint positions. First, you get the default Kinect device through an IKinectSensor interface; using that, you open a source for your IBodyFrameReader, from which you get an IBodyFrame that holds all the available skeleton information. Let’s get into some code.
Remember the flow? We first get an available Kinect device.
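A minimal sketch of that first step with the Kinect for Windows SDK 2.0 — the variable names are my own, and it needs a physical sensor plus the SDK headers to actually run:

```cpp
#include <Kinect.h>

int main()
{
    IKinectSensor* sensor = nullptr;

    // Ask the SDK for the default (and currently only) Kinect device.
    HRESULT hr = GetDefaultKinectSensor(&sensor);
    if (FAILED(hr)) {
        return -1;  // no sensor, or the Kinect runtime is missing
    }

    // Open the sensor so we can create frame sources on it.
    hr = sensor->Open();
    if (FAILED(hr)) {
        return -1;
    }

    // ... the rest of the setup goes here ...
    return 0;
}
```

Note that Open returns immediately even if no device is plugged in; you can check get_IsAvailable later if you need to know for sure.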
HRESULT is a Windows-specific data type that holds return codes for Windows API functions. You use the FAILED macro to determine whether a call failed (and SUCCEEDED for the opposite check). You can find all the error codes in the winerror.h file or here. From here we only continue if the operation succeeded. So the rest of the code looks like this.
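A sketch of that continuation — opening the body frame source and the reader on it, assuming the sensor variable from the previous step (names are mine):

```cpp
IBodyFrameSource* bodyFrameSource = nullptr;
IBodyFrameReader* bodyFrameReader = nullptr;

// Get the skeleton (body) frame source from the open sensor...
HRESULT hr = sensor->get_BodyFrameSource(&bodyFrameSource);
if (SUCCEEDED(hr)) {
    // ...and open a reader on it; all body frames arrive through this.
    hr = bodyFrameSource->OpenReader(&bodyFrameReader);
}

// Once the reader exists, the source is no longer needed.
if (bodyFrameSource != nullptr) {
    bodyFrameSource->Release();
    bodyFrameSource = nullptr;
}
```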
After we are done with error checking, we can go ahead and get our skeleton data.
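A sketch of that part. One thing worth knowing: AcquireLatestFrame fails harmlessly when no new frame has arrived yet, so inside a loop you simply try again on the next iteration.

```cpp
IBodyFrame* bodyFrame = nullptr;
HRESULT hr = bodyFrameReader->AcquireLatestFrame(&bodyFrame);
if (SUCCEEDED(hr)) {
    // Kinect v2 tracks at most BODY_COUNT (6) people at a time.
    IBody* bodies[BODY_COUNT] = { nullptr };
    hr = bodyFrame->GetAndRefreshBodyData(_countof(bodies), bodies);
    if (SUCCEEDED(hr)) {
        processBodies(BODY_COUNT, bodies);

        // Release the IBody pointers the SDK handed us.
        for (IBody* body : bodies) {
            if (body != nullptr) {
                body->Release();
            }
        }
    }
    bodyFrame->Release();
}
```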
An IBodyFrame itself doesn’t give you any seemingly meaningful data. It’s up to you to make sense of it; otherwise it’s just numbers on the screen. You don’t get a message saying the user swiped right or raised her hand. That’s why we use the processBodies function. Now, gesture recognition is not the topic of this post, but we’ll just do a simple one. Let’s look at processBodies.
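A sketch of what processBodies can look like. The gesture logic boils down to a single comparison — in camera space +Y points up, so a raised hand has a larger Y than the head (the printout is just my placeholder):

```cpp
#include <iostream>
#include <Kinect.h>

void processBodies(const unsigned int bodyCount, IBody** bodies)
{
    for (unsigned int i = 0; i < bodyCount; ++i) {
        IBody* body = bodies[i];
        if (body == nullptr) {
            continue;
        }

        BOOLEAN isTracked = FALSE;
        HRESULT hr = body->get_IsTracked(&isTracked);
        if (FAILED(hr) || !isTracked) {
            continue;  // this slot has no tracked person in it
        }

        // One entry per joint, indexed by the JointType enum.
        Joint joints[JointType_Count];
        hr = body->GetJoints(_countof(joints), joints);
        if (FAILED(hr)) {
            continue;
        }

        const CameraSpacePoint head = joints[JointType_Head].Position;
        const CameraSpacePoint leftHand = joints[JointType_HandLeft].Position;

        // Hand above head means a larger Y in camera space.
        if (leftHand.Y > head.Y) {
            std::cout << "Body " << i << ": left hand up!\n";
        }
    }
}
```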
Kinect has 3 data types for coordinate representation: CameraSpacePoint, DepthSpacePoint and ColorSpacePoint. You can guess what they mean. But keep in mind that the CameraSpacePoint coordinate system is relative to the Kinect itself. So the point (0, 0, 0) is the center of Kinect’s field of view, and it will change depending on where the sensor is looking. DepthSpacePoint and ColorSpacePoint, on the other hand, are 2D pixel coordinates and have their origin at the top left corner of the image.
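If you want to draw a camera-space joint on top of the color image, the SDK’s ICoordinateMapper converts between the three systems. A sketch, assuming a valid sensor and a filled joints array:

```cpp
ICoordinateMapper* mapper = nullptr;
HRESULT hr = sensor->get_CoordinateMapper(&mapper);
if (SUCCEEDED(hr)) {
    // Map a 3D camera-space position (meters) to 2D color-image pixels.
    CameraSpacePoint headPos = joints[JointType_Head].Position;
    ColorSpacePoint colorPos = { 0 };
    hr = mapper->MapCameraPointToColorSpace(headPos, &colorPos);
    // On success, colorPos.X / colorPos.Y are pixel coordinates
    // in the 1920x1080 color frame.
}
```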
As you can see from the code above, we only get the positions of the head and the left hand, and from those positions we deduce that the user has their left hand up. As you can imagine, this is a very simple way of doing gesture detection. You’ll need more code and sophistication for more complex gestures, like a swipe or a punch (they are not that complex, but those came to mind first).
You can also get hand states from IBody, and you can guess what each of them means, except maybe HandState_Lasso. You get HandState_Lasso when you extend only two fingers.
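Reading them is one call per hand on IBody; a short sketch:

```cpp
HandState leftHandState = HandState_Unknown;
HRESULT hr = body->get_HandLeftState(&leftHandState);  // get_HandRightState for the other hand
if (SUCCEEDED(hr)) {
    switch (leftHandState) {
    case HandState_Open:   std::cout << "Left hand open\n";   break;
    case HandState_Closed: std::cout << "Left hand closed\n"; break;
    case HandState_Lasso:  std::cout << "Left hand lasso\n";  break;
    default:               break;  // HandState_Unknown or HandState_NotTracked
    }
}
```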
Kinect does its thing at 30fps, so if you are developing a game or an interactive application with higher FPS needs, you might want to use Kinect in a different thread.
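A sketch of that threading setup. The Kinect calls are replaced here by a placeholder pollKinect so the snippet stands on its own; in a real app that function would call AcquireLatestFrame and processBodies:

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Stand-in for the real per-frame work (AcquireLatestFrame + processBodies).
std::atomic<int> framesPolled{0};
void pollKinect() { framesPolled.fetch_add(1); }

// Run the Kinect polling loop on its own thread for `duration`,
// so the ~30fps sensor loop never blocks the main/render loop.
int runKinectThread(std::chrono::milliseconds duration)
{
    std::atomic<bool> running{true};
    std::thread kinectThread([&running] {
        while (running.load()) {
            pollKinect();
            // New body frames arrive at ~30fps; no need to spin faster.
            std::this_thread::sleep_for(std::chrono::milliseconds(5));
        }
    });

    std::this_thread::sleep_for(duration);  // the "game loop" would live here
    running.store(false);
    kinectThread.join();
    return framesPolled.load();
}
```

Whatever synchronization you pick, make sure the thread that reads joint data and the thread that renders it don’t touch the same buffer at the same time.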
You can find the full source code here.