In this article I want to introduce a Virtual Reality system that can be built with two Nintendo Wii Remotes (Wiimotes) and a little extra hardware that costs less than 20 USD. This makes VR affordable for everyone. The demo software and source code can also be downloaded from this website in the download section. Please see the videos for a short introduction to this installation. You can also read my Coding4Fun article.
A VR Desktop setup uses a monitor as the 3D output display and is sometimes referred to as fishtank Virtual Reality. I use a stereo monitor that needs a pair of polarization glasses to display stereo images. If you don't own a stereo monitor, you can still construct the VR Desktop by using the anaglyph, or red/green, stereo method. For that you just need a pair of red/green glasses, and you can use any monitor. Even though you lose color information, the stereo effect is still very good.
For interactivity the two Wiimotes come into play. In this setup two kinds of interaction are possible. The first is a kind of navigation support. Navigation inside a rendered 3D scene means moving the camera, as the camera represents the eyes of the user. For the desktop setup this means we have to track the user's eye or head position to move the 3D scene camera correctly – this is therefore also called head tracking. For the head tracking I mount one Wiimote at the top of the monitor, pointing at the user's face.
The second Wiimote is mounted above, pointing vertically down at the area in front of the monitor. It is used to interact with the 3D scene by manipulating the transformation of a 3D object. As the hand is used for this interaction, I call this hand tracking. The Virtual Reality Desktop setup I built looks like this:
The core of this VR setup is using the Nintendo Wiimote controller as a tracking camera. As one might already know, the Wiimote is equipped with an infrared camera that can recognize up to four infrared lights. The main idea is to build a device with four infrared LEDs that can be recognized by the Wiimote. I call this device an IR-LED beacon. Using the values of the four LEDs as recognized by the Wiimote, an algorithm reconstructs the original position and orientation of the IR-LED beacon. Recovering all three axis translations and all three rotation angles of the device is called six-degrees-of-freedom tracking – or 6DOF tracking for short.
Because this setup uses head tracking as well as hand tracking, two beacons have to be built. These are the ones I built:
As can be seen, the beacons just consist of four IR-LEDs, a battery holder and wiring. For easy construction and less wiring I use a stripboard. When choosing the IR-LEDs it is important to look for a very wide angle of radiation. Typical LEDs have only a small angle; try to find LEDs with angles of at least 65°. A good source might be digikey.com; in Germany you can get them from Conrad. Three of the four LEDs are aligned in a line at only slightly different heights. The fourth LED is mounted above the line at a greater height. This special arrangement of the lights is needed by the algorithm to assign the IR points recognized by the Wiimote to the original LEDs of the beacon. It is also important that the fourth LED does not have the same height as the others, so that the four LEDs are not coplanar. Please see the following picture for a schematic layout of the beacon. For the power supply I just use one AAA battery and connect all LEDs in parallel to the battery poles. For easy handling I use a battery holder, which is also available at electronic component supply stores.
After soldering everything together, the exact positions of the LEDs have to be measured in the correct order. It is very important to measure the 3D positions of the LED lights very accurately: if the data for these coordinates is not accurate, the tracking results will be poor. The values have to be in millimeters.
After positioning the Wiimotes as shown and constructing and measuring the beacons, you have to configure the software using two configuration files. You will find the files in the application's directory.
eyeDistance = "0.02"
switchLeftRight = "False"
fieldOfView = "60"
antiAlias = "False"
stereoMode = "lineInterlaced"
fullscreen = "False"
resolution = "1280,1024"
displayDevice = "Screen"
anaglyph = "True"
windowPosition = "0,0" />
If you own a line-interlaced 3D monitor from Zalman, you can change the anaglyph setting to False. If you have difficulties adjusting your eyes to the stereo image, you can try changing the value of eyeDistance. Reducing the value makes adjustment easier but might weaken the 3D effect. The other parameters should work as you would expect from their names.
In this file the configuration for the tracking is stored. It defines the Wiimotes and their positions as well as the IR-LED beacons. Furthermore, it defines filtering parameters. Here I will point out the places where you need to adjust the values for your setup.
First you will need to specify the exact position of your Wiimotes relative to the center of the screen. You need to measure this distance for each Wiimote in millimeters. The values will probably be similar to the default values:
translation="0,200,50" rotation="0,0,0" scale="0.001"
xAxis="x" yAxis="y" zAxis="z">
translation="0,350,350" rotation="0,0,0" scale="0.001"
xAxis="x" yAxis="z" zAxis="y">
The Wiimote with id=2 is the one mounted at the top of the monitor for head tracking. In the translation field, enter the distance from the front tip of the Wiimote to the center of the screen. In the default settings the Wiimote is 200mm above (y-value) and 50mm in front (z-value) of the center. Do the same for the second Wiimote in the following entry with id=1. You can see that here the y-axis is assigned to the z-axis and the z-axis is assigned to the y-axis. This is necessary because the Wiimote is mounted vertically. If the software recognizes the Wiimotes in the wrong order, you can swap the order of the definition blocks.
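The axis reassignment can be pictured as a simple component swap: each scene axis picks the named component of the tracked vector. A standalone sketch of the idea – the helper name and encoding are mine, and System.Numerics.Vector3 stands in for the XNA type:

```csharp
using System;
using System.Numerics;

class AxisMappingDemo
{
    // Map a tracked vector according to an axis assignment such as
    // xAxis="x", yAxis="z", zAxis="y": each output axis picks the named
    // component of the input vector.
    public static Vector3 MapAxes(Vector3 v, char xAxis, char yAxis, char zAxis)
    {
        float Pick(char axis) => axis switch
        {
            'x' => v.X,
            'y' => v.Y,
            'z' => v.Z,
            _ => throw new ArgumentException("axis must be x, y or z")
        };
        return new Vector3(Pick(xAxis), Pick(yAxis), Pick(zAxis));
    }

    static void Main()
    {
        // for the vertically mounted Wiimote, the camera's depth axis becomes the scene's up axis
        Vector3 m = MapAxes(new Vector3(1f, 2f, 3f), 'x', 'z', 'y');
        Console.WriteLine($"{m.X} {m.Y} {m.Z}");  // prints 1 3 2
    }
}
```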
The second part where you need to make changes is the definition of the IR-LED-Beacons:
<WiiMarkerBody id="0" name="WiiMote Head Beacon" nearClip="20" farClip="1500"
<point3d value="0, 4, 7"/>
<point3d value="40, 4.5, 10.5"/>
<point3d value="83.5, 5, 7.5"/>
<point3d value="38, 45, 18"/>
</WiiMarkerBody>
<WiiMarkerBody id="4" name="WiiMote Hand Beacon" nearClip="20" farClip="1500"
<point3d value="1, 0, 8.5"/>
<point3d value="29, 0, 11"/>
<point3d value="56, 0, 6"/>
<point3d value="29, 45, 21"/>
Here you need to change the values of the beacon points in millimeters according to your measurements. Please note that the correct order of the points is required: they have to be listed from LED1 to LED4 according to the schematic in Picture 4.
Furthermore you will find a description for the tracked device:
Here you assign which Wiimote (TrackedCamId) to use with which IR-LED beacon (MarkerBodyId). The only value you might want to adjust in these settings is LocalTranslation. The translation and rotation values that are calculated need a reference point on the beacon. By default the first LED is this reference. Usually you want a different reference point, such as the center of the beacon. In that case, measure the distance from the first LED to your preferred reference point in the same metric as the beacon values and write it into LocalTranslation.
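For example, to put the reference point at the center of the beacon, LocalTranslation is the centroid of the four LED positions minus the first LED. A standalone sketch, using the default head beacon points from above; the helper name is mine and System.Numerics.Vector3 stands in for the XNA type:

```csharp
using System;
using System.Numerics;

class LocalTranslationDemo
{
    // centroid of the LEDs relative to the first LED (the default reference point)
    public static Vector3 ComputeLocalTranslation(Vector3[] leds)
    {
        Vector3 centroid = Vector3.Zero;
        foreach (Vector3 led in leds) centroid += led;
        centroid /= leds.Length;
        return centroid - leds[0];
    }

    static void Main()
    {
        // default head beacon LED positions in millimeters (from the configuration above)
        Vector3[] leds =
        {
            new Vector3(0f, 4f, 7f),
            new Vector3(40f, 4.5f, 10.5f),
            new Vector3(83.5f, 5f, 7.5f),
            new Vector3(38f, 45f, 18f)
        };
        Vector3 t = ComputeLocalTranslation(leds);
        Console.WriteLine($"{t.X} {t.Y} {t.Z}");  // prints 40.375 10.625 3.75
    }
}
```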
There are a lot more parameters you can tweak inside this configuration file, but to make the VR-Desktop run this shouldn't be necessary.
Before running the software it is necessary to connect the Wiimotes to the computer. For that, the computer needs to be equipped with a compatible Bluetooth adapter. For a more detailed description of how to do that, please refer to Brian Peek's Wiimote Library article:
After entering your values in the configuration files and placing and connecting the two Wiimotes, you can run the binary installation by clicking on VRDesktopDemo in the start menu. If you want to run from the source code, you have to copy the OpenCV DLLs from (VRDesktopSrc)\ExtLibs\OpenCV\opencvlib to the binary destination directory of the compiled project, e.g. (VRDesktopSrc)\VRDesktopDemo\bin\x86\Release, before you can start the application.
Using the VRDesktop in your own XNA application is very easy. Here I will point out the relevant steps from the VRDesktopDemo application.
To start from scratch you will have to create a new XNA Windows Game Project. First include the references to the two libraries Tgex and Tvrx. Then open the created Game class.
At the top, the namespaces for the libraries have to be added:
Then the parent class has to be changed from Game to VRGame:
public class VRDesktop : VRGame
VRGame is part of the Tgex library and adds the support for the stereo display. It creates a stereo camera and the application window according to the settings file. You can use the class more or less like the original Game class. The main difference is that for drawing you must not override the Draw(GameTime time) function but the new DrawScene(GameTime time) function, because the Draw function of the VRGame class takes care of the stereo rendering.
For storing the transformation matrix of the hand tracker we define a variable:
Matrix modelTransform = Matrix.Identity;
And in this simple example we define a variable for the model:
In the Initialize function the initialization of the TrackerManager needs to be done:
// The tracker manager is a singleton but needs to be initialized once.
// initialize base class.
In the LoadContent function the model is loaded and the tracking is started:
// for this demo just load the coordinate cross
model = Content.Load<Model>("coordinate");
modelTransforms = new Matrix[model.Bones.Count];
// start tracking now
The Update function contains the main game logic. First we allow the user to exit the game and the tracker to stop properly:
// Allows the default game to exit on Xbox 360 and Windows
Before getting the latest tracking data we have to call an update on the TrackerManager:
To get the transformation data we have to call GetProxyTransform(indexNumber) on the manager. The proxies are defined in the tracking.xml file. In the example we call:
// tracking proxy with id 1 is the hand tracker
The m_camera is defined in the VRGame parent class. The camera class will also make the necessary camera transformation adjustments for screen-projected head tracking by creating an off-center perspective projection matrix.
Finally, in the DrawScene function the model mesh is drawn. Here we need to pass the model transformation matrix as well as the camera matrices to the effect:
// a model can have multiple meshes, so loop over them
foreach (ModelMesh mesh in model.Meshes)
{
    // this is where the mesh orientation is set, as well as our camera and projection
    foreach (BasicEffect effect in mesh.Effects)
    {
        effect.World = modelTransforms[mesh.ParentBone.Index] * modelTransform;
        effect.View = m_camera.ViewMatrix;
        effect.Projection = m_camera.ProjectionMatrix;
    }
    // draw the mesh, which will use the effects set above
    mesh.Draw();
}
That's all that has to be done to make use of the Wiimote Virtual Reality Desktop in your own application.
For the interested reader I will now explain in more detail how the Wiimote tracking actually works. However, I will not go into the mathematical details of the core algorithm but will provide you with the necessary references. I will focus on the Wiimote tracking part and will not go into the details of either the tracking library Tvrx or the game library Tgex.
Technically speaking, the Wiimote tracking is an optical marker-based tracking. It is optical because we use the Wiimote camera, and it is marker-based because we don't use the whole camera image. Instead the Wiimote returns the positions of up to four infrared LEDs, which represent our markers.
focal length in pixel = 1380
// assume a 1/4" CCD sensor (even though it probably is not)
pixel size in mm = 0.0035
chip resolution = 1024x768
// Wiimote center (approx.)
principal point = 512x384
The resolution of the values returned by the Wiimote is 1024x768. Obviously this is not the physical resolution, because cameras with this resolution would cost more than 1000 USD. The Wiimote has a PixArt Imaging Inc. (http://www.pixart.com.tw) sensor and probably has a resolution of 352x288 or 164x124. However, trying to guess the real values with the help of the PixArt sensor data sheets did not work out satisfactorily, so I decided to fix the pixel size and resolution at the above values and estimate the focal length. Even though the values are not correct, they only need to be correct relative to each other to make the pose estimation work. The principal point is the actual origin of the image plane. Ideally this value should be measured; here I just assume it is the middle of the sensor chip.
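These intrinsics are applied exactly as in the later code: a normalized IR reading is scaled to pixels, centered on the principal point and converted to millimeters. A standalone sketch of that mapping – the helper name is mine:

```csharp
using System;

class IntrinsicsDemo
{
    // assumed intrinsics from above
    const float PixelSizeMM = 0.0035f;
    const float ResX = 1024f, ResY = 768f;
    const float PrincipalX = 512f, PrincipalY = 384f;

    // Convert a normalized Wiimote IR reading (0..1 per axis) into millimeter
    // coordinates on the image plane, centered on the principal point.
    public static (float x, float y) ToImagePlaneMM(float normX, float normY)
    {
        float px = normX * ResX - PrincipalX;   // pixels relative to the principal point
        float py = normY * ResY - PrincipalY;
        return (px * PixelSizeMM, py * PixelSizeMM);
    }

    static void Main()
    {
        var p = ToImagePlaneMM(0.5f, 0.5f);
        Console.WriteLine($"{p.x} {p.y}");  // the sensor center maps to 0 0
    }
}
```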
The overall tracking algorithm can be divided into the following steps:
In the following I will describe the steps in more detail and with code examples.
The connection and data retrieval for the Wiimote is done in the WiiMoteTracker class. This class implements the interface IMarkerTracker, which defines the interface for an optical marker-based tracker. The Wiimote is initialized in the Initialize() function and connected when StartTracking() is called:
public void Initialize()
{
    // test static variable for first time call
    if (m_wiimoteCount == 0)
        m_wiimoteCollection.FindAllWiimotes();
    if (m_wiimoteCollection.Count <= m_wiimoteCount)
    {
        ErrorHandler.Report("Invalid WiimoteTracker count, only "
            + m_wiimoteCollection.Count.ToString() + " Wiimotes found");
        return;
    }
    wm = m_wiimoteCollection.ElementAt(m_wiimoteCount);
    // setup the event to handle insertion/removal of extensions
    wm.WiimoteExtensionChanged += wm_WiimoteExtensionChanged;
    // create filter for accelerator values
    AverageFilterDesc filterDesc = new AverageFilterDesc();
    filterDesc.numOfValues = 1000;
    for (int i = 0; i < 3; i++)
        m_acceleratorFilter[i] = new AverageFilter();
    // create filter for image points
    filterDesc = new AverageFilterDesc();
    filterDesc.numOfValues = 5;
    for (int i = 0; i < 8; i++)
        m_imagePointsFilter[i] = new AverageFilter();
}
public void StartTracking()
{
    m_isTracking = true;
    try
    {
        // connect to the Wiimote
        wm.Connect();
        // set the report type to return the IR sensor and accelerometer data (buttons always come back)
        wm.SetReportType(InputReport.IRAccel, true);
    }
    catch (Exception)
    {
        ErrorHandler.Report("Cannot connect to Wiimote");
        m_isTracking = false;
    }
}
To receive the Wiimote data, the wm_WiimoteChanged callback has been registered. This function is called whenever the Wiimote has updated values. Inside this function the infrared LED values are read and the assignment of the lights to the beacon LEDs is done.
For that, first a list of Vector2 image points is created:
// put the IR readings into a list, centered on the principal point
List<Vector2> irList = new List<Vector2>();
for (int i = 0; i < 4; i++)
    irList.Add(new Vector2(
        (float)(ws.IRState.IRSensors[i].Position.X * m_resolution.X) - m_principalPoint.X,
        (float)(ws.IRState.IRSensors[i].Position.Y * m_resolution.Y) - m_principalPoint.Y));
Then the values have to be assigned to the IR-LEDs by putting them into the right order. This is done by a simple geometric pattern recognition. The idea is to have a geometric pattern that is invariant to the projection from 3D to 2D. As can be seen in Picture 4, three LEDs of the LED beacon are arranged more or less in a line and the fourth LED is above the line. In the 2D image data of the Wiimote the three LEDs also form more or less a line. Therefore, the first step of the assignment algorithm is to find the three image points that come closest to forming a line. The line test is done in the following function:
void TestPoints(Vector2 lineStartPoint, Vector2 lineEndPoint,
    Vector2 onLinePoint, Vector2 freePoint)
{
    float lambda;
    float dist = onLinePoint.DistanceToLine(lineStartPoint, lineEndPoint, out lambda);
    // check if the projected point is between line start and end point
    if ((lambda > 0) && (lambda < 1))
    {
        // if the distance is the shortest so far, make this combination the result
        if (dist < m_pointLineDist)
        {
            m_pointLineDist = dist;
            m_lineStartPoint = lineStartPoint;
            m_lineEndPoint = lineEndPoint;
            m_onLinePoint = onLinePoint;
            m_freePoint = freePoint;
        }
    }
}
The function is called with the four image points as input. It assumes the first point to be the line start point and the second point to be the line end point. Then the distance of the third point to the line is calculated. This is done using a C# 3.0 extension method on Vector2:
public static float DistanceToLine(this Vector2 point,
    Vector2 startLinePoint, Vector2 endLinePoint, out float lambda)
{
    Vector2 rv = endLinePoint - startLinePoint;
    Vector2 p_ap = point - startLinePoint;
    float dot_rv = Vector2.Dot(rv, rv);
    lambda = Vector2.Dot(p_ap, rv / dot_rv);
    Vector2 distVec = point - (startLinePoint + lambda * rv);
    return distVec.Length();
}
The line distance test is a standard algorithm, as described at: http://mathenexus.zum.de/html/geometrie/abstaende/AbstandPG.htm
It returns the distance to the line and a lambda value, which defines the position of the projected point on the line. If the projection point lies outside the line start and end points, lambda will be below 0 or greater than 1.
If the line was valid, the distance of the third point is compared to the previously smallest distance; if it is smaller, this order of image points is saved as the best solution.
To find the right order of the image points, this function has to be called with all possible combinations of the four LED image points. In my code I do all the calls explicitly after initializing the minimum distance with the float maximum value:
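The explicit calls boil down to an exhaustive search over the point assignments. Here is a self-contained sketch of the same search; the sample points are illustrative, the names are mine, and System.Numerics.Vector2 stands in for the XNA type:

```csharp
using System;
using System.Numerics;

static class BeaconOrderingDemo
{
    // Distance of 'point' to the line through start -> end, plus the
    // projection parameter lambda (0..1 means between the endpoints).
    public static float DistanceToLine(Vector2 point, Vector2 start, Vector2 end, out float lambda)
    {
        Vector2 rv = end - start;
        lambda = Vector2.Dot(point - start, rv) / Vector2.Dot(rv, rv);
        return (point - (start + lambda * rv)).Length();
    }

    // Try every choice of "free" point and "on line" point; the remaining two
    // points act as line start and end (their direction is fixed later by the
    // clockwise test, so their order does not matter here).
    public static (int free, int onLine, float dist) FindLine(Vector2[] p)
    {
        float bestDist = float.MaxValue;
        int bestFree = -1, bestOnLine = -1;
        for (int free = 0; free < 4; free++)
            for (int onLine = 0; onLine < 4; onLine++)
            {
                if (onLine == free) continue;
                int[] rest = new int[2];
                int k = 0;
                for (int i = 0; i < 4; i++)
                    if (i != free && i != onLine) rest[k++] = i;

                float dist = DistanceToLine(p[onLine], p[rest[0]], p[rest[1]], out float lambda);
                if (lambda > 0 && lambda < 1 && dist < bestDist)
                {
                    bestDist = dist;
                    bestFree = free;
                    bestOnLine = onLine;
                }
            }
        return (bestFree, bestOnLine, bestDist);
    }

    static void Main()
    {
        // three points roughly on a horizontal line, one point far above it
        Vector2[] p = { new Vector2(0, 0), new Vector2(40, 1), new Vector2(80, 0), new Vector2(40, 45) };
        var r = FindLine(p);
        Console.WriteLine($"free={r.free} onLine={r.onLine} dist={r.dist}");  // prints free=3 onLine=1 dist=1
    }
}
```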
Now that we have the right order of the three points that form the line and the fourth point, it is still necessary to determine the right direction of the line. In our LED beacon the fourth LED is above the line. If the start and end points of the line were interchanged, the fourth LED would be below the line. Mathematically, we check whether the order of the points is clockwise or counterclockwise:
// the only remaining test is to check whether line start and end point are in the right order; if not, swap them
Vector2 tmp = m_lineEndPoint;
m_lineEndPoint = m_lineStartPoint;
m_lineStartPoint = tmp;
The algorithm for the clockwise check is taken from http://www.geocities.com/siliconvalley/2151/math2d.html
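The check can be sketched with the sign of the 2D cross product; the function name and the sign convention for the y-down image coordinate system are mine:

```csharp
using System;
using System.Numerics;

class ClockwiseDemo
{
    // Returns true if the triangle a -> b -> c winds clockwise in a y-down
    // (screen) coordinate system: the z-component of the 2D cross product of
    // (b - a) and (c - a) is positive in that case.
    public static bool IsClockwise(Vector2 a, Vector2 b, Vector2 c)
    {
        float cross = (b.X - a.X) * (c.Y - a.Y) - (b.Y - a.Y) * (c.X - a.X);
        return cross > 0;
    }

    static void Main()
    {
        // in y-down coordinates, going right and then down winds clockwise
        Console.WriteLine(IsClockwise(new Vector2(0, 0), new Vector2(1, 0), new Vector2(0, 1)));  // prints True
    }
}
```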
Then the correctly ordered points are slightly filtered with a simple average filter over the last 5 values and multiplied by the pixel size to change from pixel metric to millimeter metric. Finally the points are passed to the pose estimation class.
// now write the resulting order to the image points and convert from pixels to millimeters
for (int i = 0; i < 4; i++)
    m_imagePoints[i] *= m_pixelSize;
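The average filter mentioned here simply returns the mean of the last N input values. A minimal standalone sketch – the class name and API only loosely mirror the article's AverageFilter:

```csharp
using System;
using System.Collections.Generic;

// Sliding-window average: returns the mean of the last 'size' inputs.
class MovingAverage
{
    private readonly int m_size;
    private readonly Queue<float> m_values = new Queue<float>();
    private float m_sum;

    public MovingAverage(int size) { m_size = size; }

    public float Filter(float input)
    {
        m_values.Enqueue(input);
        m_sum += input;
        // drop the oldest value once the window is full
        if (m_values.Count > m_size)
            m_sum -= m_values.Dequeue();
        return m_sum / m_values.Count;
    }
}

class Program
{
    static void Main()
    {
        var f = new MovingAverage(5);
        float last = 0;
        // a noisy signal settles on its mean
        foreach (float v in new[] { 9f, 11f, 10f, 9f, 11f }) last = f.Filter(v);
        Console.WriteLine(last);  // prints 10
    }
}
```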
The pose estimation is done by the Posit class. This class implements the IPoseEstimate interface:
void InitializeCameraParameter(double focalLengthMM, bool flipImage, float scale,
    int assignAxis, int assignAxisSign);
void InitializeMarkerBody(Vector3[] markerPoints);
void UpdateImagePoints(Vector2[] imagePoints);
void GetTransform(out Vector3 position, out Vector3 rotation);
void StartEstimation();
void StopEstimation();
The pose estimation has to be initialized with the focal length of the tracking camera. The 3D positions of the real device – in our case the LEDs of the LED beacon – are passed in the InitializeMarkerBody function. The measured image points are passed with the UpdateImagePoints call, and the calculated result can be taken from the GetTransform function. Because the pose estimation runs asynchronously in its own thread, it has to be started and stopped with StartEstimation and StopEstimation. By using the interface it is easy to plug in different pose estimation algorithms.
As mentioned before, the pose estimation algorithm used here is the POSIT algorithm published by D. DeMenthon. I use the implementation from the OpenCV computer vision library. As this library is C code, it has to be wrapped for managed code. I use the freely available wrapper EmguCV. Before the pose estimation can be done, a pose estimation object has to be created. This is done when the 3D positions of the markers are passed:
m_numOfMarker = markerPoints.Length;
MCvPoint3D32f[] worldMarker = new MCvPoint3D32f[m_numOfMarker];
for (int i = 0; i < m_numOfMarker; i++)
{
    worldMarker[i].x = markerPoints[i].X;
    worldMarker[i].y = markerPoints[i].Y;
    worldMarker[i].z = markerPoints[i].Z;
}
m_positObject = CvInvoke.cvCreatePOSITObject(worldMarker, m_numOfMarker);
m_imagePoints = new MCvPoint2D32f[m_numOfMarker];
m_imagePointsBuffer = new Vector2[m_numOfMarker];
MCvPoint3D32f is a managed structure for the OpenCV CvPoint3D32f and similar to a Vector3. The CvInvoke class of the EmguCV wrapper is a collection of static functions to invoke the original OpenCV functions. Because the pose estimation algorithm was not included in the class, I had to insert the following functions:
/// Release pose estimation object
[DllImport(CvInvoke.CV_LIBRARY)]
public static extern void cvReleasePOSITObject(ref IntPtr positObject);
The object returned by the CvInvoke.cvCreatePOSITObject call is a simple IntPtr and is used later for the pose estimation function.
The pose estimation itself is done in its own thread in the PoseEstimate() function. First the new image points are fetched. If no update is available, we wait for new values. This is done with the Monitor.Wait and Monitor.Pulse mechanism:
// copy the image points from the buffer
for (int i = 0; i < m_numOfMarker; i++)
{
    m_imagePoints[i].x = m_imagePointsBuffer[i].X;
    m_imagePoints[i].y = m_imagePointsBuffer[i].Y;
}
After getting the new image values the cvPOSIT function is invoked:
Because the algorithm is iterative, the MCvTermCriteria defines when it should terminate. Here I defined that it should terminate either when 500 iteration steps have been reached or when the difference of the values from the former iteration is smaller than 0.00001. You can play around with these values to see how the tracking accuracy reacts. Besides the termination criteria, you have to pass the cvPOSIT function the IntPtr to the POSIT object, the image points and the camera focal length in millimeters. As a result you get a 9-float array for the rotation matrix and a 3-float array for the translation.
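Putting these pieces together, the invocation might look like the following sketch. This is an assumption on my part: the cvPOSIT wrapper is the author's own insertion into EmguCV, the field name m_focalLength is mine, and the parameter order mirrors the original OpenCV C function.

```csharp
// sketch only – assumes a cvPOSIT wrapper mirroring the original C signature
MCvTermCriteria criteria = new MCvTermCriteria(500, 0.00001);
float[] rotationMatrix = new float[9];     // 3x3 rotation, row major
float[] translationVector = new float[3];
CvInvoke.cvPOSIT(m_positObject, m_imagePoints, m_focalLength,
    criteria, rotationMatrix, translationVector);
```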
Because the rotation values are later filtered, they are converted to Euler angles in the EulerAngles function. Euler angles define the rotation by giving the rotation about each coordinate axis. Before storing the final values, some axis swapping and scaling is applied according to the settings in the tracker.xml.
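For illustration, here is one common way to extract Euler angles from a row-major 9-float rotation matrix as returned by cvPOSIT. This assumes a Z-Y-X rotation order; the article's EulerAngles function may use a different convention:

```csharp
using System;

class EulerDemo
{
    // Extract Euler angles (rotations about X, Y, Z, applied in Z-Y-X order)
    // from a 3x3 rotation matrix stored row-major in a 9-float array.
    public static (double x, double y, double z) EulerAngles(float[] r)
    {
        double y = Math.Asin(-r[6]);          // r20 = -sin(rotation about Y)
        double x = Math.Atan2(r[7], r[8]);    // r21 / r22
        double z = Math.Atan2(r[3], r[0]);    // r10 / r00
        return (x, y, z);
    }

    static void Main()
    {
        // a rotation of 90 degrees about the Z axis
        float[] r = { 0, -1, 0,
                      1,  0, 0,
                      0,  0, 1 };
        var (x, y, z) = EulerAngles(r);
        Console.WriteLine($"{x} {y} {z}");  // x ~ 0, y ~ 0, z ~ pi/2
    }
}
```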
Because the resolution of the Wiimote camera is not very high and optical tracking is always noisy to some degree, the transformation results have quite strong jitter, so the result values have to be filtered. As a side effect of strong filtering in tracking, the virtual object does not seem to follow the tracked object's movements directly and feels like it is swimming behind. A good compromise between jitter reduction and direct response is the Kalman filter. Kalman filters use a mathematical model to predict the change of the values and then use the measured data to correct the prediction. A good introduction to Kalman filters is the Siggraph 2001 course by Greg Welch – see References. However, determining the best parameters for the filter is difficult for non-mathematicians. A good reference on how to apply the filter in the tracking domain is the dissertation of Ronald Azuma, "Predictive Tracking for Augmented Reality". Please refer to that document if you want to learn the meaning of the parameters. For the Wiimote VR-Desktop the Kalman parameters are defined in the tracking.xml file:
0.005, 0, 1"
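To build some intuition for what such a filter does, here is a hand-rolled 1D sketch with a position-plus-velocity state and a position-only measurement. All names and numeric values are illustrative and do not correspond to the article's OpenCV configuration:

```csharp
using System;

class Kalman1D
{
    // State: [position, velocity]; measurement: position only.
    double x, v;                                     // state estimate
    double p00 = 500, p01 = 0, p10 = 0, p11 = 500;   // error covariance
    readonly double dt, q, r;                        // timestep, process noise, measurement noise

    public Kalman1D(double dt, double processNoise, double measurementNoise)
    { this.dt = dt; q = processNoise; r = measurementNoise; }

    public double Filter(double z)
    {
        // predict with the constant-velocity model x' = x + v*dt
        x += v * dt;
        p00 += dt * (p10 + p01) + dt * dt * p11 + q;
        p01 += dt * p11; p10 += dt * p11; p11 += q;

        // correct the prediction with the measurement z
        double k0 = p00 / (p00 + r);       // Kalman gain for position
        double k1 = p10 / (p00 + r);       // Kalman gain for velocity
        double innovation = z - x;
        x += k0 * innovation;
        v += k1 * innovation;
        p11 -= k1 * p01; p10 -= k1 * p00;
        p01 -= k0 * p01; p00 -= k0 * p00;
        return x;
    }
}

class Program
{
    static void Main()
    {
        var kf = new Kalman1D(dt: 1.0 / 60, processNoise: 0.005, measurementNoise: 0.1);
        double last = 0;
        // a constant position measured repeatedly: the estimate converges to it
        for (int i = 0; i < 100; i++) last = kf.Filter(10.0);
        Console.WriteLine(last);  // close to 10
    }
}
```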
The Kalman implementation used is again part of the OpenCV library, and the EmguCV wrapper comes with a complete wrapper for it. In my implementation there is an interface for data filters, IDataFilter. The KalmanFilter class implements this interface. Besides the Kalman filter there is also a simple AverageFilter in the library. The initialization of the Kalman filter looks like this:
m_kalman = new Kalman(2, 1, 0);
filterDesc = (KalmanFilterDesc)desc;
// set A - second parameter is frames per second
m_kalman.TransitionMatrix.Data.SetValue(filterDesc.A, 0, 0);
m_kalman.TransitionMatrix.Data.SetValue(filterDesc.A, 0, 1);
m_kalman.TransitionMatrix.Data.SetValue(filterDesc.A, 1, 0);
m_kalman.TransitionMatrix.Data.SetValue(filterDesc.A, 1, 1);
// set H
m_kalman.MeasurementMatrix.Data.SetValue(1.0f, 0, 0);
m_kalman.MeasurementMatrix.Data.SetValue(0.0f, 0, 1);
// set Q
CvInvoke.cvSetIdentity(m_kalman.ProcessNoiseCovariance.Ptr, new MCvScalar(1));
m_kalman.ProcessNoiseCovariance.Data.SetValue(filterDesc.process_noise_cov_1, 0, 0);
m_kalman.ProcessNoiseCovariance.Data.SetValue(filterDesc.process_noise_cov_2, 1, 0);
// set R
CvInvoke.cvSetIdentity(m_kalman.MeasurementNoiseCovariance.Ptr, new MCvScalar(1e-5));
m_kalman.MeasurementNoiseCovariance.Data.SetValue(filterDesc.measurement_noise_cov, 0, 0);
CvInvoke.cvSetIdentity(m_kalman.ErrorCovariancePost.Ptr, new MCvScalar(500));
m_kalman.ErrorCovariancePost.Data.SetValue(2, 0, 0);
After initialization, a float value can simply be filtered with the Filter function:
public float Filter(float inData)
{
    // predict the next state
    m_kalman.Predict();
    // Z measurement
    data.Data[0, 0] = inData;
    // correct the prediction with the measurement
    m_kalman.Correct(data);
    return m_kalman.CorrectedState[0, 0];
}
Because the resulting transformation of the pose estimation contains 3 float values for the translation and 3 float values for the rotation, altogether 6 separate filter instances are needed. In the Tvrx library the filtering is done inside the TrackedDevice class, which is a parent class for tracked devices and from which TrackedWiimote is derived:
m_rawTranslation.X = m_translationFilter.Filter(m_rawTranslation.X);
m_rawTranslation.Y = m_translationFilter.Filter(m_rawTranslation.Y);
m_rawTranslation.Z = m_translationFilter.Filter(m_rawTranslation.Z);
m_rawRotation.X = m_rotationFilter.Filter(m_rawRotation.X);
m_rawRotation.Y = m_rotationFilter.Filter(m_rawRotation.Y);
m_rawRotation.Z = m_rotationFilter.Filter(m_rawRotation.Z);
The TrackedDevice class is also where the tracking values are transformed from the camera world space coordinate system to the actual in-game virtual space coordinate system:
Matrix bodyTransformMatrix =
    Matrix.CreateFromYawPitchRoll(m_rawRotation.Y, m_rawRotation.X, m_rawRotation.Z);
Matrix result = m_TrackerWorldTransform * m_DeviceWorldTransform;
result = bodyTransformMatrix * result;
result = m_DeviceLocalTransform * result;
Vector3 scale;
result.Decompose(out scale, out m_Rotation, out m_Translation);
First, a transformation matrix – the Body-Transform – is composed from the Euler angles and the translation vector. From the tracking.xml configuration file we also have matrices for the Tracker-World-Transform, Device-World-Transform and Device-Local-Transform.
In addition to the Tracker-World-Transform translation from the tracker.xml, I calculate the rotation angles of the Wiimotes using the acceleration sensors. This makes it possible to rotate the Wiimotes around the x and z axes to better focus on the area you want to track and still automatically get correct tracking results.
Finally, to get the correct final transformation the matrices have to be multiplied in the correct order:
Device-Local-Transform * Body-Transform * Tracker-World-Transform * Device-World-Transform
Now the pose estimation transformation result is ready to be read by the TrackerManager.
As I have shown, it is possible to create a low-cost desktop Virtual Reality setup using two Wiimotes and anaglyph stereo glasses. But because of the low resolution of the Wiimote camera, the quality is not comparable to professional monocular tracking systems. However, the quality of the presented system could still be improved by correctly measuring the intrinsic parameters of the Wiimote camera. There are various known algorithms to measure these parameters from a set of sample images of an object with known regular geometry. For standard cameras, usually a checkerboard pattern is used. An algorithm for this purpose is also integrated in OpenCV, so applying it to the Wiimote shouldn't be too hard.
Another issue with intrinsic parameters that I have completely ignored so far is lens distortion. All camera lenses distort the image to a certain degree. By assuming a radial distortion, the algorithms that measure the intrinsic parameters also calculate distortion parameters. If those parameters were measured, the Wiimote image points could easily be undistorted and the pose estimation results would improve.
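As a sketch of the idea, a simple radial model x_d = x_u(1 + k1*r^2 + k2*r^4) can be inverted with a fixed-point iteration. The coefficients here are made up for illustration; real values would come from the calibration step mentioned above:

```csharp
using System;

class UndistortDemo
{
    // Undo radial lens distortion for a point given relative to the principal
    // point, by iteratively dividing out the distortion factor. Converges
    // quickly for the small distortions typical of camera lenses.
    public static (double x, double y) Undistort(double xd, double yd, double k1, double k2)
    {
        double xu = xd, yu = yd;
        for (int i = 0; i < 20; i++)
        {
            double r2 = xu * xu + yu * yu;
            double factor = 1 + k1 * r2 + k2 * r2 * r2;
            xu = xd / factor;
            yu = yd / factor;
        }
        return (xu, yu);
    }

    static void Main()
    {
        // with zero coefficients the point is unchanged
        var p = Undistort(0.3, 0.1, 0.0, 0.0);
        Console.WriteLine($"{p.x} {p.y}");  // prints 0.3 0.1
    }
}
```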