Getting joint data from the Kinect. The Kinect can track up to NUI_SKELETON_COUNT people simultaneously (in the SDK, NUI_SKELETON_COUNT is 6). The skeleton data structure, NUI_SKELETON_DATA, can be accessed in the SkeletonData array field of the frame. Note that a skeleton entry does not necessarily refer to an actual person that the Kinect can see.
The first installation is just some basic programming frameworks that the Kinect SDK uses. Once you have those installed, you can proceed directly to the Kinect SDK. The Kinect SDK even includes a viewer program that shows you the output of the Kinect. The details about what to download and what to install are further in.
This Viewer is No Longer Being Maintained. The CtrlAltStudio Viewer was developed in order to try out and share a number of ideas, primarily Oculus Rift support and stereoscopic 3D display, but also variable walk & fly speed, Xbox 360 controller support, and Kinect for Windows control.
I’m talking about the second version of the kinect. The data comes as four frames of TOF data that have to be reconstructed on the host machine running the application. Right now, doing that in the CPU yields 10fps on high end machines.
It can be accelerated nicely though using a GPU, as the xbox does, but that implies that there is some kind of OpenCL support and good bandwidth, which mobile platforms just don’t have yet. This will not be usable in the embedded space for quite some time and I just think that people should know what’s the situation before they invest in it. Yannis, I get what you mean now, but you are missing some points. First of all, Kinect v2 is currently in Developer Preview. SDKs, APIs and hardware requirements will change. Many things currently handled by the CPU will be moved to the GPU. Moreover, Kinect is an accessory for XBOX or desktops.
Power and hardware are critical. There are libraries and tools that transmit Kinect data to mobile and tablet devices (I have implemented such tools, too), so there is a kind of interoperability.
More and more people invest in Kinect because of the business opportunities it provides. Corporations invest in Kinect to accomplish really complex stuff, such as 3D body scanning. Software developers can now build natural user interfaces faster and more easily. Version 2 supports even more accurate and smoother body tracking.
People develop software and earn money right now, so the time to invest in Kinect is right now 🙂.
Things are already handled by the GPU in the SDK you’re using. If that wasn’t the case you wouldn’t get any sort of acceptable performance. Openkinect has already disassembled the shaders used in the SDK and they’re trivial, as expected, so this won’t get much better by software alone. I agree that the new kinect is something that will be usable only in the xbox and medium to high grade desktops.
Now about the libraries and tools that retransmit data, I’ve written all kinds of them too. You can see my latest one in action here. The head and hand tracking in the video is done on an ARM at 1 GHz and transmitted via UDP. The processing takes 15% of the ARM’s CPU and, as you understand, the latency is as low as it can get.
I was actually surprised that this thing would work considering the requirements of VR. An embedded board like that costs $30 and is smaller than a credit card. With the new kinect I’d need a $300 PC that’s bulky and harder to install where I need it, and I have very weird needs. So this is the definition of regression, especially for doing complex stuff.
As far as making NUI and stuff easier goes, the demo above was done with a 50 line Lua script in a matter of hours, networking and all.
I feel that the MS SDK and OpenNI approaches are dinosauric by now. The latter is also extinct as of yesterday.
I’ll also provide a tip. After finding a point of interest it’s useful to average a small cube around it. As depth data tend to be very linear in nature, this cheap step will stabilize the position. In that sense, the resolution of the original kinect is underutilized as it is, so the new one is not a breakthrough. Just more controlled by MS. They also have a long history of flopping perfectly good technologies which someone else adopts and makes mainstream, so I’m pretty reluctant to bet on that horse.
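The averaging tip above can be sketched roughly as follows. This is only an illustration under assumptions, not the commenter's actual script: the function name, the flat row-major frame layout in millimetres, the use of 0 for "no reading", and the `radius` and `tolerance_mm` defaults are all hypothetical choices.

```python
def stabilized_depth(depth, width, height, cx, cy, radius=2, tolerance_mm=50):
    """Average valid depth samples in a small window around pixel (cx, cy).

    Assumptions (hypothetical, for illustration): `depth` is a flat,
    row-major sequence of millimetre values and 0 means "no reading".
    Only samples within `tolerance_mm` of the centre value are used, so the
    window acts like a small cube in (x, y, depth) rather than a flat square.
    Returns None when the centre pixel itself has no reading.
    """
    centre = depth[cy * width + cx]
    if centre == 0:
        return None

    total = 0
    count = 0
    # Clamp the window to the frame so border pixels still work.
    for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
            d = depth[y * width + x]
            if d != 0 and abs(d - centre) <= tolerance_mm:
                total += d
                count += 1

    # The centre pixel always qualifies, so count is at least 1 here.
    return total / count
```

The depth tolerance is a design choice: without it, a window that straddles the edge of a hand would mix hand and background readings and pull the averaged position away from the object.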
Maybe the new CEO, who is an actual engineer, can change that, but probably not in this iteration of the hardware.
As far as money is concerned, yeah we all make money out of it but it’s just services for now. The real money is in actual marketable products and the internet of things. The new kinect will certainly not facilitate these, and not just because of the corporate distrust. The big showstopper is technical.
Sorry for being raw but I think that people should also hear the pragmatic perspective.
Yannis, I admire your projects, however this is not a pragmatic perspective. Microsoft SDK and OpenNI are “huge” because they provide body tracking algorithms, face tracking mechanisms and MUCH more!
I can’t see how such things can be accomplished via LuaVision for example. It’s a great project, but it is out of the scope of my series of blog posts.
I am showing people how to access the various Kinect streams. In the upcoming blog post, I’ll demonstrate body tracking and facial expressions via the Body stream. Such stuff can only be achieved using the Microsoft SDK.
No point in reinventing the wheel, right?
OpenNI is dead, haven’t you heard? I’ve criticized Stallman’s stance a lot but this time I feel like I owe him a beer or something. This is what happens when you trust corporations instead of the open community. Sooner or later, you’re gonna get burnt.
The reinventing the wheel argument does not hold well here.
I was recently involved in an EU project for medical rehabilitation. The requirements seemed simple at the beginning: just track the patient’s hands. Some university guys in the project said it’s ok, we can handle it with OpenNI and then the MS SDK, and we don’t need to reinvent the wheel with custom tracking and stuff. But then it turned out that the user was to be seated with the kinect placed 1m ahead of him, and we needed the palm orientation as well. It turned out that this was not part of the SDK, but it had to be done and it was done.
Still I understand the challenge thrown. I’ll see what I can do and get back to you.
Yannis, I am not sure that I understand your arguments. First of all you are complaining that Kinect can not be used in an embedded/mobile platform. Yes this is the case today. My iPad mini has almost the same processing power as XBOX 360 (nvidia claims that SoC graphic performance will match the one in XBOX 360 sometime in 2014). If you also take into account that Xbox One doesn’t have a state of the art GPU and it only uses 10% of it for Kinect, then I do not really see your point. It is just a matter of time (and to be honest I do not think that it is a matter of GPU processing power at all). As I’ve already said, 10% of a mediocre GPU -like the one in Xbox One- should be available in the current gen of mobile devices.
Now you can either complain why this is not feasible right now, or just envision and prepare for the world of tomorrow.
Then, you hijack this post in order to advertise your work and compare apples with oranges. You are comparing kinect, a system that performs full body recognition/tracking (which is transparent for the user), with a system that does some kind of “hand” tracking (head tracking is performed with the traditional gyr/acc/mag combination, and I am not going to mention all those cables/devices that you have to “wear”). To be honest, I am not sure that this is hand tracking at all.
What makes kinect special is that it recognizes all your body parts and understands that what you move is actually your hand. In your demo it isn’t clear to me if you recognize a hand, or just something moving. With kinect you can achieve some degree of “self-awareness”.
It doesn’t simply recognize some pixels moving around; it recognizes a hand that is part of a whole body that is performing an action. But of course, Kinect is not a panacea. If you want to do something that it isn’t built for, then you have to do it yourself.
(And I am not going to comment on your effort to make it a M$ vs flame).
Well, you can’t do positional tracking of the head using just the rift’s gyr/acc/mag combination. The cables are part of the rift, which has to be connected to a machine doing the rendering. The tracking takes place on a separate ARM board which has a kinect connected to it.
This combo is on the right of the screen, which is not visible because it’s 3-6m away from the user. The positional information of the head and hands is transmitted via UDP to the box doing the rendering, from which the cables run. We’re trying to make that mobile, most probably with an android device worn by the user. Mind that the user’s body is not fully visible because there are tables between him and the camera. Now for the MS SDK this qualifies as “seated” mode, and if that was used it would require 10 times more expensive hardware to run on and I doubt that the latency would be on par. The jitter could probably be improved with the tip I gave earlier. But unfortunately the range for the seated mode is up to 3m, so this setup is entirely infeasible with the body tracking feature of the MS SDK.
This is something that needed actual creative coding but it came through in hours and 50 lines of code. If you’re not sure you are more than welcome to try it for yourself. Just email me and we can set it up.
I assume that when you say “kinect” you mean the MS SDK for the kinect, because the kinect by itself cannot understand anything; it just sees a soup of voxels. The MS SDK contains a machine learning model that actually labels joints based on statistical models. This works with the things it was trained to recognize.
You can fool it very easily. I’m using a rule based approach with multiple blob tracking passes and empirically script the steps needed to find a body, find the extremities of the body and which are hands and which is head, which I assure you is quite robust for the things it tracks. It also has the advantage that if it encounters some pose that fools it, I just add more rules because I actually write the thing. You can’t feed the model of the MS SDK anything else besides what MS thinks you need. This approach is also way faster than the machine learning one and I mean way way faster.
The USB transfer has more overhead than the tracking itself. This is the most crucial and understated issue with NUI. It is input, it has to be low latency and leave as much headroom as possible for the actual application to perform rendering and whatever else it does.
Now, comparing the xboxone with the current or the next gen of mobiles is absurd.
The XboxOne CPU has way more memory bandwidth and processing power than your average mobile will have for many years from now. The current mobile gen is on the level the average GPUs were in 2005. Wishful marketing is nice but hard numbers are what define what’s feasible eventually. Have you ever tried running OpenCL and OpenGL on a mobile? It’s a very tight situation.
About the M$ bashing, sure I’m biased. I present reasonable arguments though, while your premises are completely off. You don’t know what you’re talking about and I’m not saying this as an insult.
It’s just obvious.
Rule based + empirical scripts vs. statistical models. Please send me the paper in which you describe your approach and your results.
Also for some reason you claim that 100% of the xbox one CPU/GPU is used for kinect (and yes, when I am speaking about kinect in a post that describes how to use the official kinect sdk, which is the M$ sdk, I mean the whole bundle: Kinect+Software). If you think that a modern mobile cpu/gpu isn’t able to provide the same horsepower as a fragment of the mediocre Xbox one CPU/GPU, please just visit the nVidia site.
Now, about the comment, feel free to say whatever you want. I had 3 arguments in my post:
1.) that the mobile devices are not as weak as you are presenting them (I am sorry but this is about numbers. Say whatever you want about me, but the numbers will prove who is wrong).
2.) that you are hijacking this post in order to present your work (just read your posts that have nothing to do with the original post and are describing your work).
3.) I’ve expressed my thoughts about the video that you posted and your comments. To be honest I didn’t understand that you used the actual kinect as an input for your own recognition algorithms.
You are right. I didn’t know what I was talking about: you were talking against the MS SDK, you were saying that you didn’t like kinect 2 (not their sdk, you were talking about the new kinect), and you were claiming that you are using a different approach that recognizes hands more easily. Also, in the video there was no kinect. So I cannot believe how it was possible to miss that you were using kinect and not simply recognizing movement via a camera (just search oculus rift hand tracking and you will know what I mean).
Sorry, you are absolutely right. This was my fault. It was obvious. You were using kinect (and by kinect I mean the actual device, not the sdk :p).
But as I said before, please show me your published papers with your results and then we can talk about it (it is very difficult to believe someone who uses an irrelevant blog post in order to present his work, and who claims that he is getting better results than the Cambridge research team).
In your post you wrote “which I assure you is quite robust for the things it tracks.” It would be very interesting to know what those things are, but if you are comparing the whole MS SDK with a limited framework, once again you are comparing apples and oranges.
I already invited you to come and see the results. I will certainly not waste time writing a paper. I’ve tried doing it for other stuff I’ve done and got them rejected because they were not “clear enough”.
I will not post what these things are because you’ll charge me with “hijacking this post in order to present your work”. And I admit, I don’t like writing papers. This approach can be perfectly described as one though, and you are most welcome to meet me so I can explain it and you can write a paper on it. I don’t mind sharing credit, I think that describing a process is important for spreading knowledge. This is a serious invitation.
“Also for some reason you claim that 100% of the xbox one CPU/GPU is used for kinect”
Where did I say that, are you imagining things?
“If you think that a modern mobile cpu/gpu isn’t able to provide the same horsepower as a fragment of the mediocre Xbox one CPU/GPU, please just visit nVidia site”
Oh yeah now we’re talking numbers.
Best on the mobile market right now: Tegra4 with a fill rate of 2.68 Gpix/s. Best on the market in 2004: GeForce 6800 Ultra Extreme, fill rate 6.4 Gpix/s. I was mistaken, the gap is actually a year more than I thought. Xbox One has a 12.8 Gpix/s fill rate. This one is actually weaker than I thought it was, the PS4 has almost twice that. This is why it needs to prioritize the kinect processing, which won’t be feasible in generic GPGPU environments, and it’s still 5 times the best we have today and more than 10 times the avg. You don’t want to see how the current high-end nvidias fare.
They will make your eyes bleed.
“I am sorry but this is about numbers. Say what ever you want about me, but the numbers will prove who is wrong”
Hmmm. No I will resist the temptation. I shall not troll thee.
“who claims that he is getting better results that the Cambridge research team”
I don’t remember mentioning cambridge. If you mean my reference to the EU project, these university guys were from germany. I will not disclose anything more, it would be unprofessional on my part.
“In your post you wrote “which I assure you is quite robust for the things it tracks.” It would be very interesting to know what are those things, but if you are comparing the whole MS SDK with a limited framework, once again you are comparing apples and oranges.”
I told you what this specific script tracks. Hands and head.
These were the requirements for the vr project which was basically a one night hack in the athenian hackerspace. So far that framework, which I believe “toolkit” would describe it better, has worked fine for every single assignment I’ve been given. Go check it out for yourself it’s at.
Mind though that I’m currently refactoring it so it can be used from other languages besides Lua. Do you have a preference?
““Also for some reason you claim that 100% of the xbox one CPU/GPU is used for kinect” Where did I say that, are you imagining things?”
Perhaps I am imagining that you’ve just compared the full xbox one GPU (12 Gpix/s fill rate) with Tegra (2.68 Gpix/s). According to various articles, only 10% of the xbox one GPU is reserved for Kinect (video/voice). So I think that roughly Tegra should be able to do the trick (also you should take a look at Tegra K1). This isn’t about horse power. It has to do with battery power 🙂
I didn’t say that you mentioned Cambridge. The Cambridge team is the one that developed the recognition algorithms for Kinect (along with M$).
“Mind though that I’m currently refactoring it so it can be used from other languages besides Lua.
Do you have a preference?” Not really. I have developed my own algorithms 🙂 (and yes, I am using my own recognition algorithms when I have to do something that Kinect is not made for -both device/SDK. But this isn’t the place to write about).
“So I think that roughly Tegra should be able to do the trick”
Yeah sure, if you completely saturate the GPU you MAY be able to get a decent and highly inconsistent framerate. You will just have at least a full frame of latency (I hope you understand why this will happen). Won’t you have any drawing to perform? Performance will be decent only when the mobile GPU itself needs 10% of available resources to process the frames. This won’t happen with the K1, which has less than half the power of the xboxone; maybe the next gen after that will come close.
So it’s at least 2-3 years before this capability will be generally available to mobiles. Is this not what I said in the beginning?
“This isn’t about horse power. It has to do with battery power”
Of course it has to do with power draw. How does this change the reality of the situation though? Mobiles are like that.
“I have developed my own algorithms”
That’s great to hear.
Maybe we can exchange notes. Is there somewhere I can take a peek?
Yannis, I really can’t understand your mentality. If you don’t support the Kinect SDK, then why do you post in a Kinect-related blog? If you have developed something better than the official Kinect SDK, then prove it, make benchmark tests or anything. It is quite unfair to blame a technology (used by millions of people) just because it is not yet available for mobile devices.
Warning: The fact that this blog has hundreds of thousands of views doesn’t mean that I have to host flames and digital battles, for any reason.
This post is a developer tutorial. You can use the comments section for asking questions or providing feedback.
If you want to say something that is not related to the contents of my blog posts, please use my contact form or email me directly.
Thank you.
Everything I’ve said is relevant to the kinect. Maybe you meant that this blog is about the MS SDK for the kinect. If that’s the case then ok, I don’t have anything further to add. People are using the first kinect with mobiles and won’t be able to use the new incarnation.
They should be aware of that before engaging with the device. Other than that there’s not much more to say.
I like that you pose challenges but it’s time to throw the ball back in the field. You had an article about implementing gestures the other day. I’ll benchmark that case and get back to you with results there.
Let me make my case clear:
– This blog is about whatever I like. It’s a personal website. I have posts about OpenNI with more than a million unique page views. Nowadays, my focus is on the official Kinect version 2 SDK.
– Kinect v2 is still in Developer Preview. Mobile will be supported similarly to version 1. Please do not spread false news to people on purpose.
– Feel free to benchmark whatever you like.
Once again, I do not like your attitude. It is too personal and I cannot understand the reason. If you have something to say to me, simply email me your complaints. There is no need to post irrelevant thoughts to my blog, just because you think your SDK is better than Microsoft’s.
Thank you.
“Kinect v2 is still in Developer Preview. Mobile will be supported similarly to version 1. Please do not spread false news to people, in purpose.”
I presented reasonable arguments as to why this will not be the case, along with numbers, as an engineer is obligated to do.
Do you have any inside information from MS that their hardware will behave differently in the future? If you do then present them else people can only assume that you are spreading opinionated false news based on absolutely nothing besides your wishful thinking. You’re welcome.
You need to edit the image pixel by pixel.
First, you need to create a new byte array for the new bitmap. The length of the new array is supposed to be the new desired width multiplied by the new desired height multiplied by 4 (bytes per pixel).
Then, you loop through the pixels array and specify which pixel values you need. You insert the values into the corresponding positions of the cropped array.
Finally, you simply call the BitmapSource.Create method to create the new bitmap using the cropped array. You can have a look at the CoordinateMapping example of the SDK to see how you can copy values between byte arrays.
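The cropping steps described above can be sketched in a language-neutral way as follows. This is a minimal Python illustration, not the SDK's or the author's code: `crop_pixels` and its parameters are hypothetical names, the buffer is assumed to be tightly packed with 4 bytes per pixel, and rows are copied as slices rather than one pixel at a time (the result is the same).

```python
BYTES_PER_PIXEL = 4  # 32-bit pixels, as in the reply above


def crop_pixels(pixels, src_width, src_height, left, top, new_width, new_height):
    """Copy a rectangle out of a tightly packed 4-bytes-per-pixel buffer.

    All names here are hypothetical; `pixels` is the source byte array and
    (left, top, new_width, new_height) is the rectangle to keep, in pixels.
    """
    if (left < 0 or top < 0
            or left + new_width > src_width
            or top + new_height > src_height):
        raise ValueError("crop rectangle falls outside the source image")

    # Step 1: the new array is width * height * 4 bytes long.
    cropped = bytearray(new_width * new_height * BYTES_PER_PIXEL)
    row_bytes = new_width * BYTES_PER_PIXEL

    # Step 2: copy the wanted pixels into the matching position, row by row.
    for row in range(new_height):
        src_start = ((top + row) * src_width + left) * BYTES_PER_PIXEL
        dst_start = row * row_bytes
        cropped[dst_start:dst_start + row_bytes] = pixels[src_start:src_start + row_bytes]

    # Step 3 (not shown): hand `cropped` to the bitmap constructor.
    return cropped
```

In the C# setting of the reply, the returned array would play the role of the cropped array passed to BitmapSource.Create, with a stride of `new_width * 4`.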
Thanks for writing up this tutorial for Kinect v2.
I have been capturing and dumping BodyFrames, BodyIndexFrames, DepthFrames, and InfraredFrames onto disk, from a MultiSourceFrameReader and a read loop in a thread. Now I want to capture and dump ColorFrames as well. Dumping the raw frame data seems to still yield acceptable frame rates.
However, I have no clue how to convert from the raw data to RGBA afterwards. If I use the ColorFrame’s CopyConvertedFrameDataToArray method, performance suffers a lot and the frame rate becomes unacceptable.
So it appears that the raw color image format is supposed to be Bgra? But why does it seem to only take 2 bytes per pixel, instead of 4 bytes? How do I convert that into 4-byte RGBA? Would you be able to enlighten me?
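One possible explanation for the 2 bytes per pixel is that the raw frames are YUY2 rather than Bgra: YUY2 packs two pixels into four bytes (Y0, U, Y1, V). Under that assumption, the offline conversion could look roughly like the sketch below. The function name is hypothetical, and the integer coefficients are the common BT.601 limited-range approximation, which may not match the SDK's own converter bit for bit.

```python
def _clip(value):
    """Clamp an intermediate result to the 0..255 byte range."""
    return 0 if value < 0 else 255 if value > 255 else value


def yuy2_to_rgba(raw, width, height):
    """Convert a packed YUY2 buffer (2 bytes per pixel) to RGBA (4 bytes per pixel).

    Assumes the raw frame really is YUY2 with an even width and no row padding.
    """
    if width % 2 != 0 or len(raw) != width * height * 2:
        raise ValueError("buffer size does not match a packed YUY2 frame")

    rgba = bytearray(width * height * 4)
    out = 0
    # Each 4-byte group (Y0, U, Y1, V) describes two horizontally adjacent pixels
    # that share one pair of chroma samples.
    for i in range(0, len(raw), 4):
        y0, u, y1, v = raw[i], raw[i + 1], raw[i + 2], raw[i + 3]
        d = u - 128
        e = v - 128
        for y in (y0, y1):
            c = 298 * (y - 16)
            rgba[out] = _clip((c + 409 * e + 128) >> 8)                 # R
            rgba[out + 1] = _clip((c - 100 * d - 208 * e + 128) >> 8)   # G
            rgba[out + 2] = _clip((c + 516 * d + 128) >> 8)             # B
            rgba[out + 3] = 255                                         # A (opaque)
            out += 4
    return rgba
```

Doing this after capture keeps the recording loop down to a plain copy of the raw buffer, which is the part that seemed to keep the frame rate acceptable.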
I am trying to play back the .xef files using kinect studio and use them as the kinect source for my application, which needs to read face expressions and body movements.
I am unable to read in any face expressions this way. I have created a FaceFrameSource and FaceFrameReader just like the example in the kinect sdk (of course the example too doesn’t work with recorded .xef files). So I am a bit stumped. There is no documentation whatsoever on these classes. Have you by any chance come across such a problem?
I can read infrared and body frames from the recorded .xef file, just not face (and also hdface).
Any insight will be much appreciated.
Regards,
Richard Macwan.
Hi;
Thank you again for those great tutorials.
I am a newbie at C# and thanks to your help and the Microsoft SDK, I was able to display the streams! (lol I know, very simple, but all this is new to me).
So now I want to save those streams. The only thing that I have is the screenshot button from the SDK, but unfortunately I have to keep clicking on it to save the frames.
How would you suggest I save those streams continuously?
I tried a while loop and all and got stuck in that loop. So any tips on how to save the streams continually?
So I just basically want a record button that I can click and it saves the frames one by one.
Thanks!
Ali.
Dear Vangos,
Thank you very much for all the information you are providing. I am very new to all this electronic world (I have made a project using the Arduino), so please forgive my limited understanding.
I have an air BB gun that fires a 6 millimeter diameter pellet, which goes through a chronometer (built with an Arduino and LCD display), displaying speed in meters/sec and feet/sec.
Max speed about 240 feet/sec.
Can the Kinect v2 record in Depth mode only (I’m interested only in x-y-z coordinates), and put this data into an array? The array will later (at the press of a button on the Arduino) be sent to Processing for 3D graphing. I’m interested in only two seconds of data (x-y-z).
Also, Arduino is coming out soon with a new ARDUINO TRE with a 1-GHz Sitara AM335x processor. Can I connect the Kinect v2 to this new device and perhaps get faster sampling? I’m interested in capturing the Depth data as fast as possible and storing it in an array.
I am presently studying Processing and reading articles on the new Kinect v2.
Thank you very much for your time and effort.
Sincerely,
Alexander.
Hello Vangos,
Thank you very much for presenting such a nice tutorial and I hope it will be very helpful.
I want to record a video stream of color/depth/skeleton joints simultaneously using Kinect v2. I don’t know much about WPF programming. So I started first by making the WPF project and making 3 buttons: color, depth, and joints. I am not exactly sure where I can add the code to initialize the sensor and the remaining part to record the video stream. Could you please explain some more basic steps for beginners?
I am waiting for a kind and prompt reply.
Thank you.
Along with the SDK, Microsoft provides Kinect Studio, a handy utility that lets you record Kinect frames and save them as .xef files.
If you need to implement this functionality by yourself, you’ll need to capture the bitmaps and pass them to a recording tool. Is a library with such a tool available.
There is also a beta implementation for WinRT in my project.
Hope this helps you.
Alternatively, you can store the raw frames in binary files (which would be faster in terms of performance) and read those files later.
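The raw-binary alternative mentioned above could be sketched like this. It is an illustration only: the record layout (an 8-byte timestamp and a 4-byte length before each payload) and the names `write_frame` and `read_frames` are assumptions made for this example, not a format used by Kinect Studio or the SDK.

```python
import struct

# Hypothetical record header: timestamp (int64) + payload length (uint32),
# little-endian. Each frame is stored as header followed by the raw bytes.
_HEADER = struct.Struct("<qI")


def write_frame(stream, timestamp, payload):
    """Append one raw frame to an open binary stream."""
    stream.write(_HEADER.pack(timestamp, len(payload)))
    stream.write(payload)


def read_frames(stream):
    """Yield (timestamp, payload) tuples until the stream ends.

    A truncated final record (for example after a crash) is silently dropped.
    """
    while True:
        header = stream.read(_HEADER.size)
        if len(header) < _HEADER.size:
            return
        timestamp, length = _HEADER.unpack(header)
        payload = stream.read(length)
        if len(payload) < length:
            return
        yield timestamp, payload
```

A recorder would open a file in `"wb"` mode once, call `write_frame` from the frame-arrived handler while recording is switched on, and close the file when it is switched off. Storing the timestamp alongside each frame keeps the original timing available when the file is read back.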
Hi Vangos Pterneas,
Every time I take skeletal point measurements of a person, they show variations. Even if the same person stands in front, the reading changes based on the distance from the camera, the clothes that they wear etc. Based on your experience, do you know of any mechanisms by which I can limit these variations and get consistent readings for the same person under different conditions? Are there any tools or ways to find the invariant joints/points? I am using Kinect V2 with the latest SDK.
Thanks,
Priya.
Hi Vangos;
First off, thank you so much for all those useful materials about the Kinect v2 sensor.
So I went through the above code and everything works fine, but now I would really like to save those “streams” as a series of images or a video file.
So for the color image I would like to have a series of png files or an avi video file, and for the depth images I would like to save all the depth information.
I am not sure what format it would be such that I can get all the depth data and not just an image.
So can you please point me in the right direction so that I can record the “streams”?
Thanks,
Ali.
Those are two consecutive frames?? Then maybe you are doing something wrong in the acquisition, but that’s not my territory. I don’t know anything about the kinect sdk and that’s why I’m here, because I want to acquire the depth frames with precise timestamps.
Here is an example that lets you acquire depth frames and export them directly to matlab, with all the depth range!
Maybe this can work for you!
But it’s a resource eater: it doesn’t finish an acquisition without a force close of the running program and, because of that, can’t manage a constant framerate. The program has to finish to report the timestamps, and because it needs to be force closed I can’t get the acquisition times.
Give it a try and let’s share our results!
Hi Vangos,
Really appreciate your exhaustive material on the kinect.
I had a small query, on which you might be able to throw some light.
For an application we are working on, we need to write the raw depth data acquired from kinect v2.0 to a text file. I have C# code that converts the live data acquired from the kinect v2.0 sensor to a .txt file.
But this process requires the kinect to be connected all the time.
So, I would like to know if there is a way to convert the pre-recorded raw data (.xef format) to .txt format.
Thanks for your time.
Hi PTERNEAS,
Thank you for your help. I have another question for you, because I want to map the point cloud onto the Face Basics WPF sample (Avatar). I have the two pieces of code separately and each one works, but when I combine them in the same code I get no result, because it seems there is a conflict between the point cloud and the avatar. Sometimes I get just the result of Face Basics, but it’s very slow.
So my question is: do you know the code to map the point cloud onto the avatar?
Thank you PTERNEAS.
30 June 2012
I simply had to thank you very much once more. I am not sure the things I would’ve done without those concepts documented by you over that industry. This has been a very distressing situation for me, nevertheless understanding the professional way you solved the issue made me leap with contentment. Now I’m thankful for this work and even pray you are aware of a powerful job you happen to be getting into training men and women through your site. I am sure you’ve never encountered all of us.
Hi Vangos,
Your blog was very useful to me.
I am trying to get the x,y,and z coordinates of a moving object using the kinect sensor and matlab. If you feel coding in matlab is tough any other software would also be highly appreciated. Would love to hear ways in which this can be done. It would also be wonderful if you can mail me the code for the same.
I am an ardent roboticist and I want to make a project where I can track a moving robot. Looking forward to your inputs. Thanks once more for your extensive blogs.
Hi Vangos,
Abishek here again.
I wanted to understand whether the depth data that the kinect sensor gives is measured from the kinect sensor to the pixel or is the real world z coordinate. Other than that, I also want to know how to convert the depth value (0-2047) into metric units. There are many formulas for this on the web, but I am not sure which one to trust. I also want to know the frame width and frame height of an image captured by the kinect. Can you also please let me know the correct formula to convert the pixel size into real world x,y coordinates?
1cm = 37.79 pixel
Thanks a lot,
Regards,
Abishek.
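On the last question, a fixed factor such as 1 cm = 37.79 pixels is a screen conversion (96 pixels per inch divided by 2.54) and does not carry over to a depth camera, where the real-world size of a pixel grows with distance. A common way to get x and y is the pinhole camera model, sketched below. It assumes the depth value is already in millimetres along the optical axis, and the intrinsics `fx`, `fy`, `cx`, `cy` are placeholders that would have to come from calibration or from the SDK's own coordinate mapping; none of these names or numbers come from the original post.

```python
def pixel_to_world(u, v, depth_mm, fx, fy, cx, cy):
    """Back-project a depth pixel to camera-space metres with the pinhole model.

    u, v     : pixel column and row in the depth image
    depth_mm : depth at that pixel in millimetres (assumed to be the z distance)
    fx, fy   : focal lengths in pixels (placeholder values, from calibration)
    cx, cy   : principal point in pixels (placeholder values, from calibration)

    Returns (x, y, z) in metres, with x to the right and y pointing down
    like image rows (a convention chosen for this sketch).
    """
    z = depth_mm / 1000.0
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return x, y, z
```

For example, with a hypothetical focal length of 365 pixels, a pixel 365 columns to the right of the principal point at 1 m depth maps to x = 1 m, and the same pixel at 2 m depth maps to x = 2 m, which is why a single pixels-per-centimetre constant cannot work.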
About
The Kinect for Windows (K4W) team releases samples to show how to build applications and experiences using K4W. Our “Basics” samples are designed to quickly show a particular feature. Other samples are more robust and can serve as a template for building an app. You will still want to download and install our SDK and Toolkit for the full developer experience, but the code here is easy to browse and you can submit feedback about the samples right here on the site.
This project uses Git. Get started using.
You can also download source code as a Zip file: This is the list of all samples included in the latest Developer Toolkit release. The table lists the name of the sample, in which languages it’s available and what technologies and additional SDKs are used. You can browse each sample’s source code by using the “Source Code” tab above.