Wiimote Virtual Reality Desktop

In this article I want to introduce a Virtual Reality system that can be built with two Nintendo Wii Remotes (Wiimotes) and just a little extra hardware that costs less than 20 USD. This makes VR affordable for everyone. The demo software and source can be downloaded from the download section of this website. Please see the videos for a short introduction to this installation. You can also read my Coding4Fun article.

Virtual Reality Desktop

A VR Desktop setup uses a monitor as the 3D output display and is sometimes referred to as fishtank Virtual Reality. I use a stereo monitor that needs a pair of polarization glasses to display stereo images. If you don't own a stereo monitor you can still build the VR Desktop using the anaglyph or red/green stereo method. For that you just need a pair of red/green glasses, and any monitor will do. Even though you lose color information, the stereo effect is still very good.
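The anaglyph composition itself is just per-pixel channel mixing. The following minimal Python sketch (function names are mine, not part of the demo software) shows the idea for a single pixel:

```python
def to_gray(rgb):
    """Approximate luminance of an (r, g, b) pixel (ITU-R BT.601 weights)."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)

def red_green_anaglyph(left_pixel, right_pixel):
    """Combine one pixel of the left/right eye images: the left eye image
    goes into the red channel, the right eye image into the green channel.
    The colored glasses route each channel to the matching eye; color
    information is lost, only luminance survives."""
    return (to_gray(left_pixel), to_gray(right_pixel), 0)

# a pixel seen only by the right eye comes out pure green
print(red_green_anaglyph((0, 0, 0), (255, 255, 255)))  # (0, 255, 0)
```

A stereo renderer applies this mixing to every pixel of the two rendered eye images.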

For interactivity the two Wiimotes come into play. In this setup two kinds of interaction are possible. First we have a kind of navigation support. Navigation inside a rendered 3D scene means moving the camera, as the camera represents the eyes of the user. For the Desktop setup this means we have to track the user's eye or head position to move the 3D scene camera correctly – this is also called head tracking. For the head tracking I mount one Wiimote at the top of the monitor, pointing at the user's face.

The second Wiimote is mounted from the top, facing vertically down onto the area in front of the monitor. This one is used to interact with the 3D scene by manipulating the transformation of a 3D object. As you use your hand for this interaction, I call this hand tracking. The Virtual Reality Desktop setup I built looks like this:

Wiimote 6DOF Tracking

The core of this VR setup is using the Nintendo Wiimote controller as a tracking camera. As one might already know, the Wiimote is equipped with an infrared camera that is able to recognize up to four infrared lights. The main idea now is to build a device with four infrared LEDs that can be recognized by the Wiimote. I call this device an IR-LED beacon. Using the values of the 4 LEDs as recognized by the Wiimote, an algorithm reconstructs the original position and orientation of the IR-LED beacon. Getting all 3 translation axes and all 3 rotation angles of the device is called six degrees of freedom tracking – or 6DOF tracking for short.

IR-LED Beacon

Because this setup uses head tracking as well as tracking for hand manipulation, two beacons have to be built. These are the ones I built:

As can be seen, the beacons just consist of 4 IR-LEDs, a battery holder and wiring. For easy construction and less wiring I use a stripboard. When choosing the IR-LEDs it is important to look for a very wide angle of radiation. Typical LEDs have only a small angle; try to find LEDs with angles of at least 65°. A good source might be digikey.com; in Germany you can get them from Conrad. Three of the four LEDs are aligned in a line at only slightly different heights. The fourth LED is mounted above the line at a greater height. This special arrangement of the lights is needed by the algorithm to assign the IR points recognized by the Wiimote to the original LEDs of the beacon. It is also important that the fourth LED does not have the same height, so that the four LEDs are not coplanar. Please see the following picture for a schematic layout of the beacon. For the power supply I just use one AAA battery and connect all LEDs in parallel to the battery poles. For easy handling I use a battery holder, which is also available at electronic component supply stores.

After soldering everything together, the exact positions of the LEDs have to be measured in the correct order. It is very important to measure the 3D positions of the LED lights very accurately. If these coordinates are not accurate, the tracking results will be poor. The values have to be in millimeters.

For example, my LEDs have the following measurements:

<point3d value="1, 0, 8.5"/>

<point3d value="29, 0, 11"/>

<point3d value="56, 0, 6"/>

<point3d value="29, 45, 21"/>
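As noted above, the pose estimation needs the fourth LED to sit clearly out of the plane of the other three. A quick sanity check on your measured coordinates can be done with a small Python sketch (illustrative only, not part of the demo software):

```python
import math

def coplanarity_gap(p1, p2, p3, p4):
    """Distance of the fourth LED from the plane through the first three.
    Pose estimation degrades when the four points are (nearly) coplanar,
    so this value should be clearly non-zero."""
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])
    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
    n = cross(sub(p2, p1), sub(p3, p1))  # normal of the plane through p1, p2, p3
    return abs(dot(sub(p4, p1), n)) / math.sqrt(dot(n, n))

# the LED coordinates measured above (millimeters)
leds = [(1, 0, 8.5), (29, 0, 11), (56, 0, 6), (29, 45, 21)]
print(coplanarity_gap(*leds))  # 45.0 - the 4th LED sits well off the plane
```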

Configuring the Software

After positioning the Wiimotes as shown and constructing and measuring the beacons, you have to configure the software using two configuration files. You will find the files in the application's directory.




eyeDistance = "0.02"

switchLeftRight = "False"

fieldOfView = "60"

antiAlias = "False"

stereoMode = "lineInterlaced"

fullscreen = "False"

resolution = "1280,1024"

displayDevice = "Screen"

anaglyph = "True"

windowPosition = "0,0" />


If you own a line-interlaced 3D monitor from Zalman, you can change anaglyph to False. If you have difficulties adjusting your eyes to the stereo image, you can try changing the value of eyeDistance. Reducing the value makes it easier to adjust but might reduce the 3D effect. The other parameters should work as you would expect from their names.


In this file the configuration for the tracking is stored. It defines the Wiimotes and their positions as well as the IR-LED beacons. Furthermore it defines filtering parameters. Here I will point out the places where you need to adjust the values for your setup.

First you need to specify the exact position of your Wiimotes relative to the center of the screen. You need to measure this distance for each Wiimote in millimeters. The values will probably be similar to the defaults:


id="2" cameramodel="Wiimote"

translation="0,200,50" rotation="0,0,0" scale="0.001"

xAxis="x" yAxis="y" zAxis="z">



id="1" cameramodel="Wiimote"

translation="0,350,350" rotation="0,0,0" scale="0.001"

xAxis="x" yAxis="z" zAxis="y">


The Wiimote with id=2 is the one mounted at the top of the monitor for the head tracking. In the translation field enter the distance from the front tip of the Wiimote to the center of the screen. In the default settings the Wiimote is 200mm above (y-value) and 50mm in front of (z-value) the center. Do the same for the second Wiimote in the following entry with id=1. You can see that here the y-axis is assigned to the z-axis and the z-axis to the y-axis. This is necessary because the Wiimote is mounted vertically. If the software recognizes the Wiimotes in the wrong order, you can exchange the order of the definition blocks.

The second part where you need to make changes is the definition of the IR-LED-Beacons:


<WiiMarkerBody id="0" name="WiiMote Head Beacon" nearClip="20" farClip="1500"

translation="0,0,0" rotation="0,0,0">

<point3d value="0, 4, 7"/>

<point3d value="40, 4.5, 10.5"/>

<point3d value="83.5, 5, 7.5"/>

<point3d value="38, 45, 18"/>

</WiiMarkerBody>

<WiiMarkerBody id="4" name="WiiMote Hand Beacon" nearClip="20" farClip="1500"

translation="0,0,0" rotation="0,0,0">

<point3d value="1, 0, 8.5"/>

<point3d value="29, 0, 11"/>

<point3d value="56, 0, 6"/>

<point3d value="29, 45, 21"/>

</WiiMarkerBody>



Here you need to change the values of the beacon points in millimeters according to your measurements. Please note that the correct order of the points is necessary: they have to go from LED1 to LED4 according to the schematic in Picture 4.

Furthermore you will find a description for the tracked device:

<TrackedDevice id="4" type="WiiMote" rotation="True" translation="True">








<WorldRotation>0,0,0 </WorldRotation>




Here you assign which Wiimote (TrackedCamId) to use with which IR-LED beacon (MarkerBodyId). The only value you might want to adjust in these settings is the LocalTranslation. The calculated translation and rotation values need a reference point on the beacon. By default the first LED is this reference. Usually you want a different reference point, like the center of the beacon. Therefore measure the distance from the first LED to your preferred reference point, in the same units as the beacon values, and enter it as the LocalTranslation.
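For example, if you want the reference point in the middle of the beacon, the LocalTranslation is just the offset from LED1 to the centroid of the four LEDs. A small Python sketch of that calculation (illustrative only, using the hand beacon values from above):

```python
def local_translation_to_center(leds):
    """Offset from the first LED (the default reference point) to the
    centroid of the beacon, in the same units as the beacon coordinates.
    Entering this offset as LocalTranslation moves the tracked reference
    point from LED1 to the middle of the beacon."""
    n = len(leds)
    cx = sum(p[0] for p in leds) / n
    cy = sum(p[1] for p in leds) / n
    cz = sum(p[2] for p in leds) / n
    x0, y0, z0 = leds[0]
    return (cx - x0, cy - y0, cz - z0)

# hand beacon LEDs from the configuration above (millimeters)
leds = [(1, 0, 8.5), (29, 0, 11), (56, 0, 6), (29, 45, 21)]
print(local_translation_to_center(leds))  # (27.75, 11.25, 3.125)
```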

There are a lot more parameters you can tweak inside this configuration file, but to make the VR-Desktop run this shouldn't be necessary.

Run the Software

Before running the software it is necessary to connect the Wiimotes to the computer. For that the computer needs to be equipped with a compatible Bluetooth adapter. For a more detailed description of how to do that, please refer to Brian Peek's Wiimote Library article:


After entering your values in the configuration files and placing and connecting the two Wiimotes, you can run the binary installation by clicking on VRDesktopDemo in the start menu. If you want to run from the source code, you have to copy the OpenCV DLLs from (VRDesktopSrc)\ExtLibs\OpenCV\opencvlib to the binary destination directory of the compiled project, e.g. (VRDesktopSrc)\VRDesktopDemo\bin\x86\Release, before you can start the application.

Using the Library

Using the VRDesktop in your own XNA application is very easy. Here I will point out the relevant steps from the VRDesktopDemo application.

To start from scratch you will have to create a new XNA Windows Game Project. First include the references to the two libraries Tgex and Tvrx. Then open the created Game class.

At the top, the namespaces for the libraries have to be added:

using Tgex.Graphics;

using Tgex;

using Tvrx;

Then the parent class has to be changed from Game to VRGame:

public class VRDesktop : VRGame

VRGame is part of the Tgex library and adds support for the stereo display. It creates a stereo camera and the application window according to the settings file. You can use the class more or less like the original Game class. The main difference is that for drawing you must not override the Draw(GameTime time) function but the new DrawScene(GameTime time), because the Draw function of the VRGame class takes care of the stereo rendering.

For storing the transformation matrix of the hand tracker we define a variable:

Matrix modelTransform = Matrix.Identity;

And in this simple example we define a variable for the model:

Model model;

In the Initialize function the TrackerManager needs to be initialized:

protected override void Initialize()
{
    // The tracker manager is a singleton but needs to be initialized once.
    TrackerManager.Instance.Initialize();
    // initialize base class.
    base.Initialize();
}



In the LoadContent function the model is loaded and the tracking is started:

protected override void LoadContent()
{
    // for this demo just load the coordinate cross
    model = Content.Load<Model>("coordinate");
    modelTransforms = new Matrix[model.Bones.Count];
    // start tracking now
    TrackerManager.Instance.StartTracking();
}



The Update function contains the main game logic. First we allow the user to exit the game and stop the tracker properly:

// Allows the default game to exit on Xbox 360 and Windows
if ((GamePad.GetState(PlayerIndex.One).Buttons.Back == ButtonState.Pressed)
    || (Keyboard.GetState().IsKeyDown(Keys.Escape)))
{
    // stop the tracker properly before exiting
    TrackerManager.Instance.StopTracking();
    this.Exit();
}





Before getting the latest tracking data we have to call an update on the TrackerManager:

TrackerManager.Instance.Update();
To get the transformation data we call GetProxyTransform(indexNumber) on the manager. The proxies are defined in the tracking.xml file. In the example we call:

// tracking proxy with id 1 is the hand tracker

modelTransform = TrackerManager.Instance.GetProxyTransform(1);

// tracking proxy with id 0 is the head tracker - change eye position.

m_camera.EyePosition = TrackerManager.Instance.GetProxyTransform(0).Translation;

The m_camera is defined in the VRGame parent class. The camera class also makes the necessary camera adjustments for screen-projected head tracking by creating an off-center perspective projection matrix.
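For readers curious what this projection looks like geometrically: as the tracked eye moves, the view frustum has to stay pinned to the physical screen rectangle, which makes it asymmetric. A minimal Python sketch of the idea (my own simplification, not the Tgex code; the screen is assumed centered at the origin in the z=0 plane):

```python
def offcenter_frustum(eye, screen_w, screen_h, near):
    """Frustum bounds (left, right, bottom, top) at the near plane for a
    screen centered at the origin, viewed from `eye`. When the head moves
    sideways, the frustum becomes asymmetric so that the screen window
    stays fixed in virtual space (fishtank VR)."""
    ex, ey, ez = eye        # ez = distance of the eye in front of the screen
    scale = near / ez       # similar triangles: near plane vs screen plane
    left = (-screen_w / 2 - ex) * scale
    right = (screen_w / 2 - ex) * scale
    bottom = (-screen_h / 2 - ey) * scale
    top = (screen_h / 2 - ey) * scale
    return (left, right, bottom, top)

# centered eye: symmetric frustum
print(offcenter_frustum((0, 0, 500), 400, 300, 10))    # (-4.0, 4.0, -3.0, 3.0)
# eye moved 100mm to the right: the frustum shifts and becomes asymmetric
print(offcenter_frustum((100, 0, 500), 400, 300, 10))  # (-6.0, 2.0, -3.0, 3.0)
```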

Finally, in the DrawScene function the model mesh is drawn. Here we need to pass the model transformation matrix as well as the camera matrices to the effect:

//Draw the model, a model can have multiple meshes, so loop
foreach (ModelMesh mesh in model.Meshes)
{
    //This is where the mesh orientation is set, as well as our camera and projection
    foreach (BasicEffect effect in mesh.Effects)
    {
        effect.World = modelTransforms[mesh.ParentBone.Index]
            * Matrix.CreateScale(0.01f)
            * modelTransform;
        effect.View = m_camera.ViewMatrix;
        effect.Projection = m_camera.ProjectionMatrix;
    }
    //Draw the mesh, will use the effects set above.
    mesh.Draw();
}



That's all that has to be done to make use of the Wiimote Virtual Reality Desktop in your own application.

How does it work

For the interested reader I will now explain in more detail how the Wiimote tracking actually works. However, I will not go into the mathematical details of the core algorithm but provide you with the necessary references. I will focus on the Wiimote tracking part and will not go into the details of either the tracking library Tvrx or the game library Tgex.

Pose Estimation

Technically speaking, the Wiimote tracking is an optical marker-based tracking. It is optical because we use the Wiimote camera, and it is marker-based because we don't use the whole camera image. Instead the Wiimote returns the positions of up to four infrared LEDs, which represent our markers.

The four LED positions are given in 2D coordinates which represent the markers as seen by the camera. Technically this is a 2D projection of the real LEDs according to the focus of the camera lens. The projection plane is called the image plane, and the 2D coordinates are therefore called image points.

The main task now is to calculate the position and rotation of the IR-LED beacon in real space from the image points of the LEDs measured by the Wiimote. This task is called pose estimation and has been investigated by scientists for many years. One main application of pose estimation is computer vision for robotics. The pose estimation algorithm used for the Wiimote tracking was published back in 1995 and still performs well for our purpose. If you are interested in the exact details of how the pose estimation algorithm works, you can study the original paper; please see the references.

To perform the pose estimation calculation you need exact information about the image plane. Actually, the plane is nothing other than the image sensor of the camera. The picture is focused on the image sensor by the camera lens. For the calculation of the projection we need a mathematical model of the camera; in computer vision usually a pinhole camera model is used. The pinhole is the projection point origin. The image plane lies at a certain distance from this origin, and this distance is called the focal length.

Now, what we need for the calculation is the focal length and the size of the image plane, i.e. the camera sensor. These values are also called the intrinsic camera values. Unfortunately Nintendo does not publish these values for the Wiimote, so the values I use are assumptions:

focal length in pixel = 1380

// assume 1/4" CCD sensor (even though it probably is not)

pixel size in mm = 0.0035

chip resolution = 1024x768

// Wiimote center (approx.)

principal point = 512x384

The resolution of the values returned by the Wiimote is 1024x768. Obviously this is not the physical resolution, because cameras with this resolution would cost more than 1000 USD. The Wiimote has a PixArt Imaging Inc. (http://www.pixart.com.tw) sensor and probably has a resolution of 352x288 or 164x124. However, trying to guess the real values with the help of the PixArt sensor data sheets did not work out satisfactorily, so I decided to fix the pixel size and resolution at the above values and estimate the focal length. Even though the values are not correct, they only need to be correct relative to each other to make the pose estimation work. The principal point is the actual origin of the image plane. Ideally this value should be measured; here I just assume it is the middle of the sensor chip.
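With these assumed intrinsics, the pinhole projection of a camera-space point to Wiimote pixel coordinates is straightforward. A small Python sketch (using the assumed values from above, not official ones):

```python
# Assumed Wiimote intrinsics from the text (not official values)
FOCAL_LENGTH_PX = 1380
PRINCIPAL_POINT = (512, 384)

def project(point_mm):
    """Pinhole projection of a camera-space 3D point (millimeters, z = depth
    along the optical axis) onto the image plane, in pixel coordinates."""
    x, y, z = point_mm
    u = PRINCIPAL_POINT[0] + FOCAL_LENGTH_PX * x / z
    v = PRINCIPAL_POINT[1] + FOCAL_LENGTH_PX * y / z
    return (u, v)

# an LED 100mm to the right of the optical axis at 1000mm distance
print(project((100, 0, 1000)))  # (650.0, 384.0)
```

Incidentally, these assumptions imply a horizontal field of view of about 2·atan(512/1380), roughly 41°, which is in the range commonly reported for the Wiimote camera.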

Overview of the tracking algorithm

The overall tracking algorithm can be divided in the following steps:

    • Retrieve the image point values of the infrared LEDs from the Wiimote.

    • Assign the image points to the LED beacon lights.

    • Run the pose estimation to calculate the rotation and translation of the LED beacon.

    • Filter the resulting rotation and translation values.

    • Build a transformation matrix and transform the result according to the configuration file (Wiimote position and orientation and local LED beacon transformation).

In the following I will describe the steps in more detail and with code examples.

Retrieve image points

The connection and data retrieval for the Wiimote is done in the WiiMoteTracker class. This class implements the IMarkerTracker interface, which defines the interface for an optical marker-based tracker. The Wiimote is initialized in the Initialize() function and connected when StartTracking() is called:

public void Initialize()
{
    // test static variable for first time call
    if (m_wiimoteCount == 0)
    {
        // find all Wiimotes connected to the computer
        m_wiimoteCollection = new WiimoteCollection();
        m_wiimoteCollection.FindAllWiimotes();
    }
    if (m_wiimoteCollection.Count <= m_wiimoteCount)
    {
        ErrorHandler.Report("Invalid WiimoteTracker count, only "
            + m_wiimoteCollection.Count.ToString() + " Wiimotes found");
        return;
    }
    wm = m_wiimoteCollection.ElementAt(m_wiimoteCount);
    m_wiimoteId = m_wiimoteCount;
    m_wiimoteCount++;
    // setup the event to handle state changes
    wm.WiimoteChanged += wm_WiimoteChanged;
    // setup the event to handle insertion/removal of extensions
    wm.WiimoteExtensionChanged += wm_WiimoteExtensionChanged;
    // create filter for accelerator values
    AverageFilterDesc filterDesc = new AverageFilterDesc();
    filterDesc.numOfValues = 1000;
    for (int i = 0; i < 3; i++)
    {
        m_acceleratorFilter[i] = new AverageFilter();
        m_acceleratorFilter[i].SetFilterDesc(filterDesc);
    }
    // create filter for image points
    filterDesc = new AverageFilterDesc();
    filterDesc.numOfValues = 5;
    for (int i = 0; i < 8; i++)
    {
        m_imagePointsFilter[i] = new AverageFilter();
        m_imagePointsFilter[i].SetFilterDesc(filterDesc);
    }
}




public void StartTracking()
{
    if (!m_isTracking)
    {
        m_isTracking = true;
        // connect to the Wiimote
        try
        {
            wm.Connect();
            // set the report type to return the IR sensor and accelerometer data (buttons always come back)
            wm.SetReportType(InputReport.IRAccel, true);
        }
        catch (Exception)
        {
            ErrorHandler.Report("Cannot connect to Wiimote");
            m_isTracking = false;
        }
    }
}




To receive the Wiimote data, the wm_WiimoteChanged callback has been registered. This function is called whenever the Wiimote has updated values. Inside this function the infrared LED values are read and the LED beacon light assignment is done.

For that, first a list of Vector2 for the image points is created:

// put in list
List<Vector2> irList = new List<Vector2>();
for (int i = 0; i < 4; i++) irList.Add(
    new Vector2((float)(ws.IRState.IRSensors[i].Position.X * m_resolution.X) - m_principalPoint.X,
        (float)(ws.IRState.IRSensors[i].Position.Y * m_resolution.Y) - m_principalPoint.Y));

Assign image points

Then the values have to be assigned to the IR-LEDs by putting them into the right order. This is done by simple geometric pattern recognition. The idea is to have a geometric pattern that is invariant to the projection from 3D to 2D. As can be seen in picture 4, three LEDs of the LED beacon are arranged more or less in a line and the 4th LED is above the line. In the 2D image data of the Wiimote those three LEDs also form more or less a line. Therefore, the first step in the assignment algorithm is to find the three image points which come closest to forming a line. The line test is done in the following function:

void TestPoints(Vector2 lineStartPoint,
    Vector2 lineEndPoint,
    Vector2 onLinePoint,
    Vector2 freePoint)
{
    float lambda;
    float dist = onLinePoint.DistanceToLine(lineStartPoint, lineEndPoint, out lambda);
    // check if projected point is between line end and start point
    if ((lambda > 0) && (lambda < 1))
    {
        // if distance is short, make this combination the result
        if (dist < m_pointLineDist)
        {
            m_pointLineDist = dist;
            m_lineStartPoint = lineStartPoint;
            m_lineEndPoint = lineEndPoint;
            m_onLinePoint = onLinePoint;
            m_freePoint = freePoint;
        }
    }
}




The function is called with the four image points as input. It assumes the first point to be the line start point and the second point to be the line end point. Then the distance of the third point to the line is calculated. This is done using a C# 3.0 extension method on Vector2:

public static float DistanceToLine(this Vector2 point,
    Vector2 startLinePoint, Vector2 endLinePoint, out float lambda)
{
    Vector2 rv = endLinePoint - startLinePoint;
    Vector2 p_ap = point - startLinePoint;
    float dot_rv = Vector2.Dot(rv, rv);
    lambda = Vector2.Dot(p_ap, (rv / dot_rv));
    Vector2 distVec = point - (startLinePoint + lambda * rv);
    return distVec.Length();
}


The line distance test is a standard algorithm, as described at: http://mathenexus.zum.de/html/geometrie/abstaende/AbstandPG.htm

It returns the distance to the line and a lambda value, which defines the position of the projected point on the line. If the projection point lies outside the line start and end points, lambda will be below 0 or greater than 1.

If the line was valid, the distance of the third point is compared to the formerly smallest distance; if it is smaller, this order of image points is saved as the best solution.
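The same computation in a standalone Python sketch, with two example points illustrating the lambda semantics:

```python
def distance_to_line(point, start, end):
    """Distance of `point` to the infinite line through `start`/`end`,
    plus the normalized position `lam` of its projection on the line:
    0 at `start`, 1 at `end`, outside [0, 1] beyond the endpoints."""
    rx, ry = end[0] - start[0], end[1] - start[1]
    px, py = point[0] - start[0], point[1] - start[1]
    lam = (px * rx + py * ry) / (rx * rx + ry * ry)
    dx, dy = px - lam * rx, py - lam * ry
    return (dx * dx + dy * dy) ** 0.5, lam

# point above the middle of the segment (0,0)-(10,0)
print(distance_to_line((5, 3), (0, 0), (10, 0)))   # (3.0, 0.5)
# projection falls beyond the end point: lambda > 1
print(distance_to_line((15, 3), (0, 0), (10, 0)))  # (3.0, 1.5)
```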

To find the right order of the image points, this function has to be called with all possible combinations of the 4 LED image points. In my code I make all the calls explicitly after initializing the minimum distance with the float maximum value:

m_pointLineDist = float.MaxValue;

// write all test cases explicitly

// in the end the three point line should be found

TestPoints(irList[0], irList[1], irList[2], irList[3]);

TestPoints(irList[0], irList[1], irList[3], irList[2]);

TestPoints(irList[0], irList[2], irList[1], irList[3]);

TestPoints(irList[0], irList[2], irList[3], irList[1]);

TestPoints(irList[0], irList[3], irList[1], irList[2]);

TestPoints(irList[0], irList[3], irList[2], irList[1]);

TestPoints(irList[1], irList[2], irList[0], irList[3]);

TestPoints(irList[1], irList[2], irList[3], irList[0]);

TestPoints(irList[1], irList[3], irList[0], irList[2]);

TestPoints(irList[1], irList[3], irList[2], irList[0]);

TestPoints(irList[2], irList[3], irList[0], irList[1]);

TestPoints(irList[2], irList[3], irList[1], irList[0]);
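The twelve explicit calls enumerate every way of choosing an unordered pair of points as the line endpoints, with the remaining two points taking both remaining roles (6 pairs × 2 = 12). A short Python sketch generating the same cases:

```python
from itertools import combinations

def line_test_cases(points):
    """All (start, end, on_line, free) role assignments for four image
    points: 6 unordered endpoint pairs times 2 roles for the remaining
    two points = 12 cases, matching the explicit calls above."""
    cases = []
    idx = range(len(points))
    for a, b in combinations(idx, 2):          # unordered endpoint pair
        c, d = [i for i in idx if i not in (a, b)]
        cases.append((points[a], points[b], points[c], points[d]))
        cases.append((points[a], points[b], points[d], points[c]))
    return cases

cases = line_test_cases(["p0", "p1", "p2", "p3"])
print(len(cases))  # 12
```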

Now that we have the right order of the three points that form the line and the 4th point, it is still necessary to determine the right direction of the line. In our LED beacon the 4th LED is above the line. If the start and end points of the line were interchanged, the 4th LED would be below the line. Mathematically we check whether the order of the points is clockwise or counterclockwise:

// only remaining test is to check if line start and end point are in the right order
// check start and end line point with free point
// if clockwise direction then ok, if counterclockwise then exchange start and end points
Vector2 E1 = m_lineStartPoint - m_freePoint; // P1-P2
Vector2 E2 = m_lineEndPoint - m_freePoint; // P3-P2
bool clockwise;
if ((E1.X * E2.Y - E1.Y * E2.X) >= 0) clockwise = true;
else clockwise = false;
if (!clockwise)
{
    Vector2 tmp = m_lineEndPoint;
    m_lineEndPoint = m_lineStartPoint;
    m_lineStartPoint = tmp;
}


The algorithm for the clockwise check is taken from http://www.geocities.com/siliconvalley/2151/math2d.html
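The cross-product sign test in isolation looks like this (Python sketch; which winding corresponds to "above the line" depends on the direction of the image y-axis):

```python
def is_clockwise(free_point, start, end):
    """Sign of the 2D cross product of (start - free) and (end - free).
    The sign flips when start and end are exchanged, so it pins down the
    direction of the line relative to the 4th (free) point."""
    e1 = (start[0] - free_point[0], start[1] - free_point[1])
    e2 = (end[0] - free_point[0], end[1] - free_point[1])
    return e1[0] * e2[1] - e1[1] * e2[0] >= 0

start, end, free = (0, 0), (10, 0), (5, 5)
if not is_clockwise(free, start, end):
    start, end = end, start  # exchange so the winding is always the same
print(start, end)  # (0, 0) (10, 0)
```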

Then the correctly ordered points are slightly filtered with a simple average filter over the last 5 values and multiplied by the pixel size to change from pixel units to millimeters. Finally the points are passed to the pose estimation class.

// now write resulting order to image points
m_imagePoints[0].X = m_imagePointsFilter[0].Filter(m_lineStartPoint.X);
m_imagePoints[0].Y = m_imagePointsFilter[1].Filter(m_lineStartPoint.Y);
m_imagePoints[1].X = m_imagePointsFilter[2].Filter(m_onLinePoint.X);
m_imagePoints[1].Y = m_imagePointsFilter[3].Filter(m_onLinePoint.Y);
m_imagePoints[2].X = m_imagePointsFilter[4].Filter(m_lineEndPoint.X);
m_imagePoints[2].Y = m_imagePointsFilter[5].Filter(m_lineEndPoint.Y);
m_imagePoints[3].X = m_imagePointsFilter[6].Filter(m_freePoint.X);
m_imagePoints[3].Y = m_imagePointsFilter[7].Filter(m_freePoint.Y);
for (int i = 0; i < 4; i++)
{
    m_imagePoints[i] *= m_pixelSize;
}
// send points to estimation
m_poseEstimate.UpdateImagePoints(m_imagePoints);


Run pose estimation

The pose estimation is done by the class Posit. This class implements the IPoseEstimate interface:

public interface IPoseEstimate
{
    void InitializeCameraParameter(double focalLengthMM, bool flipImage, float scale,
        int[] assignAxis, int[] assignAxisSign);
    void InitializeMarkerBody(Vector3[] markerPoints);
    void UpdateImagePoints(Vector2[] imagePoints);
    void GetTransform(out Vector3 position, out Vector3 rotation);
    void StartEstimation();
    void StopEstimation();
}


The pose estimation has to be initialized with the focal length of the tracking camera. The 3D positions of the real device – in our case the LEDs of the LED beacon – are passed in the InitializeMarkerBody function. The measured image points are passed with the UpdateImagePoints call, and the calculated result can be read with the GetTransform function. Because the pose estimation itself runs asynchronously in its own thread, it has to be started and stopped with StartEstimation and StopEstimation. Using the interface makes it easy to plug in different pose estimation algorithms.

As mentioned before, the pose estimation algorithm used here is the POSIT algorithm published by D. DeMenthon. I use the implementation from the OpenCV computer vision library. As this library is C code, it has to be wrapped for managed code; I use the freely available wrapper EmguCV. Before the pose estimation can be done, a pose estimation object has to be created. This is done when the 3D positions of the markers are passed:

public void InitializeMarkerBody(Vector3[] markerPoints)
{
    m_numOfMarker = markerPoints.Length;
    MCvPoint3D32f[] worldMarker = new MCvPoint3D32f[m_numOfMarker];
    for (int i = 0; i < m_numOfMarker; i++)
    {
        worldMarker[i].x = markerPoints[i].X;
        worldMarker[i].y = markerPoints[i].Y;
        worldMarker[i].z = markerPoints[i].Z;
    }
    m_positObject = CvInvoke.cvCreatePOSITObject(
        worldMarker, m_numOfMarker);
    m_imagePoints = new MCvPoint2D32f[m_numOfMarker];
    m_imagePointsBuffer = new Vector2[m_numOfMarker];
}


MCvPoint3D32f is a managed structure for the OpenCV CvPoint3D32f and similar to a Vector3. The CvInvoke class of the EmguCV wrapper is a collection of static functions that invoke the original OpenCV functions. Because the pose estimation functions were not included in the class, I had to add the following:

/// <summary>
/// Create pose estimation object
/// </summary>
[DllImport(CV_LIBRARY)]
public static extern IntPtr cvCreatePOSITObject(
    MCvPoint3D32f[] points,
    int point_count);

/// <summary>
/// Do pose estimation
/// </summary>
[DllImport(CV_LIBRARY)]
public static extern void cvPOSIT(
    IntPtr posit_object,
    MCvPoint2D32f[] image_points,
    double focal_length,
    MCvTermCriteria criteria,
    float[] rotation_matrix,
    float[] translation_vector);

/// <summary>
/// Release pose estimation object
/// </summary>
[DllImport(CV_LIBRARY)]
public static extern void cvReleasePOSITObject(
    IntPtr posit_object);

The object returned by the CvInvoke.cvCreatePOSITObject call is a simple IntPtr and is used later in the pose estimation function.

The pose estimation itself is done in its own thread in the PoseEstimate() function. First the new image points are fetched. If no update is available, we wait for new values. This is done with the Monitor.Wait and Monitor.Pulse mechanism:

// copy image points
lock (m_imagePointsBuffer)
{
    if (!m_imagePointsUpdate)
    {
        // no new data yet - wait until the tracker pulses new values
        Monitor.Wait(m_imagePointsBuffer);
    }
    for (int i = 0; i < m_numOfMarker; i++)
    {
        m_imagePoints[i].x = m_imagePointsBuffer[i].X;
        m_imagePoints[i].y = m_imagePointsBuffer[i].Y;
    }
    m_imagePointsUpdate = false;
}


After getting the new image values, the cvPOSIT function is invoked:

MCvTermCriteria criteria;
criteria.type = CvEnum.TERMCRIT.CV_TERMCRIT_EPS | CvEnum.TERMCRIT.CV_TERMCRIT_ITER;
criteria.epsilon = 0.00001;
criteria.max_iter = 500;
float[] POSITRot = new float[9];
float[] POSITTrans = new float[3];
CvInvoke.cvPOSIT(m_positObject,
    m_imagePoints, m_focalLengthMM,
    criteria, POSITRot, POSITTrans);

Because the algorithm is iterative, the MCvTermCriteria defines when it should terminate. Here I defined that it should terminate either when 500 iteration steps have been reached or when the difference of the values from the former iteration is smaller than 0.00001. You can play around with these values to see how the tracking accuracy reacts. Besides the termination criteria, you have to pass the cvPOSIT function the IntPtr to the POSIT object, the image points and the camera focal length in millimeters. As a result you get a 9-float array for the rotation matrix and a 3-float array for the translation.

Because the rotation values are to be filtered later, they are converted to Euler angles in the EulerAngles function. Euler angles define the rotation by giving the rotation about each coordinate axis. Before storing the final values there is some axis swapping and scaling according to the settings in the tracker.xml.

Filter the estimation results

Because the resolution of the Wiimote camera is not very high and optical tracking is always noisy to some degree, the transformation results jitter quite strongly. To reduce the jitter, the result values have to be filtered. As a side effect of strong filtering in tracking, the virtual object no longer seems to follow the tracked object's movements directly and feels like it is swimming behind. A good compromise between jitter reduction and direct response is the use of Kalman filters. A Kalman filter uses a mathematical model to predict the change of the values and then uses the measured data to correct its prediction. A good introduction to Kalman filters is the Siggraph 2001 course from Greg Welch – see References. However, determining the best parameters for the filter is difficult for non-mathematicians. A good reference on how to apply the filter in the tracking domain is the dissertation of Ronald Azuma, "Predictive Tracking for Augmented Reality". Please refer to that document if you want to learn the meaning of the parameters. For the Wiimote VR-Desktop the Kalman parameters are defined in the tracking.xml file:

<Kalman class="KalmanFilter" id="4" A="1, 0.005, 0, 1" measurement_noise_cov="1.0"

process_noise_cov_1="0.0000001" process_noise_cov_2="0.0000001"/>

The Kalman implementation used is again part of the OpenCV library. The EmguCV wrapper comes with a complete wrapper for this functionality. In my implementation there is an interface for data filters, IDataFilter. The KalmanFilter class implements this interface. Besides the Kalman filter there is also a simple AverageFilter in the library. The initialization of the Kalman filter looks like this:

public void SetFilterDesc(DataFilterDesc desc)
{
    m_kalman = new Kalman(2, 1, 0);
    filterDesc = (KalmanFilterDesc)desc;
    // set A - second parameter is frames per second
    m_kalman.TransitionMatrix.Data.SetValue(filterDesc.A[0], 0, 0);
    m_kalman.TransitionMatrix.Data.SetValue(filterDesc.A[1], 0, 1);
    m_kalman.TransitionMatrix.Data.SetValue(filterDesc.A[2], 1, 0);
    m_kalman.TransitionMatrix.Data.SetValue(filterDesc.A[3], 1, 1);
    // set H
    m_kalman.MeasurementMatrix.Data.SetValue(1.0f, 0, 0);
    m_kalman.MeasurementMatrix.Data.SetValue(0.0f, 0, 1);
    // set Q
    CvInvoke.cvSetIdentity(m_kalman.ProcessNoiseCovariance.Ptr, new MCvScalar(1));
    m_kalman.ProcessNoiseCovariance.Data.SetValue(filterDesc.process_noise_cov_1, 0, 0);
    m_kalman.ProcessNoiseCovariance.Data.SetValue(filterDesc.process_noise_cov_2, 1, 0);
    // set R
    CvInvoke.cvSetIdentity(m_kalman.MeasurementNoiseCovariance.Ptr, new MCvScalar(1e-5));
    m_kalman.MeasurementNoiseCovariance.Data.SetValue(filterDesc.measurement_noise_cov, 0, 0);
    CvInvoke.cvSetIdentity(m_kalman.ErrorCovariancePost.Ptr, new MCvScalar(500));
    m_kalman.ErrorCovariancePost.Data.SetValue(2, 0, 0);
}


After initialization, a float value can simply be filtered in the Filter function:

public float Filter(float inData)
{
    // Z measurement
    data.Data[0, 0] = inData;
    // predict, then correct the prediction with the measurement
    m_kalman.Predict();
    m_kalman.Correct(data);
    return m_kalman.CorrectedState[0, 0];
}


Because the resulting transformation of the pose estimation consists of 3 float values for the translation and 3 float values for the rotation, altogether 6 separate instances of the filter are needed. In the Tvrx library the filtering is done inside the TrackedDevice class, which is the parent class for tracked devices and from which TrackedWiimote is derived:

public virtual void Filter()
{
    m_rawTranslation.X = m_translationFilter[0].Filter(m_rawTranslation.X);
    m_rawTranslation.Y = m_translationFilter[1].Filter(m_rawTranslation.Y);
    m_rawTranslation.Z = m_translationFilter[2].Filter(m_rawTranslation.Z);

    m_rawRotation.X = m_rotationFilter[0].Filter(m_rawRotation.X);
    m_rawRotation.Y = m_rotationFilter[1].Filter(m_rawRotation.Y);
    m_rawRotation.Z = m_rotationFilter[2].Filter(m_rawRotation.Z);
}


Final transformations

The TrackedDevice class is also where the tracking values are transformed from the camera's world space coordinate system into the actual in-game virtual space coordinate system:

public virtual void TransformToVirtualSpace()
{
    // compose the Body-Transform from the Euler angles and the translation
    Matrix bodyTransformMatrix =
        Matrix.CreateFromYawPitchRoll(m_rawRotation.Y, m_rawRotation.X, m_rawRotation.Z)
        * Matrix.CreateTranslation(m_rawTranslation);

    Matrix result = m_TrackerWorldTransform * m_DeviceWorldTransform;
    result = bodyTransformMatrix * result;
    result = m_DeviceLocalTransform * result;

    Vector3 scale;
    result.Decompose(out scale, out m_Rotation, out m_Translation);
}


First, a transformation matrix, the Body-Transform, is composed from the Euler angles and the translation vector. The tracking.xml configuration file additionally provides the matrices for the Tracker-World-Transform, Device-World-Transform and Device-Local-Transform.

In addition to the Tracker-World-Transform translation from the tracker.xml, I calculate the rotation angles of the Wiimotes from their acceleration sensors. This makes it possible to rotate the Wiimotes around the x and z axes to better frame the area you want to track while still automatically getting correct tracking results.
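While a Wiimote is held still, its accelerometer measures only gravity, so the tilt angles can be recovered from the measured vector. The following is a hypothetical sketch of that idea; the axis convention and the function name are my own assumptions, not taken from the Tvrx library:

```python
import math

# Hypothetical sketch: recovering pitch (rotation about x) and roll
# (rotation about z) from a 3-axis accelerometer at rest, where the
# measured vector is gravity. The axis convention (x right, y forward,
# z up) is an assumption, not necessarily the one used by Tvrx.
def tilt_from_accel(ax, ay, az):
    pitch = math.atan2(ay, math.sqrt(ax * ax + az * az))
    roll = math.atan2(-ax, az)
    return pitch, roll
```

An upright sensor (gravity along z) yields zero pitch and roll; tilting it shifts gravity into the x or y component, from which the angles follow.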

Finally, the matrices have to be multiplied in the right order to obtain the final transformation:

Device-Local-Transform * Body-Transform * Tracker-World-Transform * Device-World-Transform

Now the pose estimation transformation result is ready to be read by the TrackerManager.

Conclusion and Limitations

As I have shown, it is possible to create a low-cost desktop Virtual Reality setup using two Wiimotes and anaglyph stereo glasses. Because of the low resolution of the Wiimote camera, however, the quality is not comparable to professional monocular tracking systems. Still, the presented system could be improved by properly measuring the intrinsic parameters of the Wiimote camera. There are various known algorithms that estimate these parameters from a set of sample views of an object with known, regular geometry; for standard cameras a checkerboard pattern is usually used. An algorithm for this purpose is also integrated in OpenCV, so applying it to the Wiimote shouldn't be too hard.
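What such a calibration would recover is the intrinsic matrix of the pinhole camera model, which maps camera-space points to sensor coordinates. As a sketch of what these parameters mean (the focal length and principal point below are placeholders for illustration, not measured Wiimote values; the Wiimote reports points on a 1024x768 grid):

```python
import numpy as np

# Placeholder intrinsics for illustration only: fx, fy (focal lengths
# in pixels) on the diagonal and cx, cy (principal point) in the last
# column. Real values would come from a calibration procedure such as
# OpenCV's checkerboard routine.
K = np.array([[1300.0,    0.0, 512.0],
              [   0.0, 1300.0, 384.0],
              [   0.0,    0.0,   1.0]])

def project(p_cam):
    """Project a 3D point in camera space to sensor pixel coordinates."""
    p = K @ p_cam
    return p[:2] / p[2]
```

A point straight ahead of the camera projects onto the principal point; the more accurate these parameters are, the more accurate the pose estimation becomes.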

Another intrinsic-parameter issue that I have completely ignored so far is lens distortion. All camera lenses distort the image to some degree. By assuming a circular (radial) distortion, the algorithms that measure the intrinsic parameters also calculate distortion coefficients. If those parameters were measured, the Wiimote image points could easily be undistorted and the pose estimation results would improve.
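The radial model mentioned above displaces each image point along its radius by a polynomial in the squared distance from the image center. A sketch of the forward model (the coefficients k1 and k2 are what a calibration would estimate; the values in the usage note are made up):

```python
# Sketch of the radial lens distortion model: a point (x, y) in
# normalized image coordinates is scaled by a polynomial in r^2.
# k1 and k2 are the distortion coefficients calibration would provide.
def distort(x, y, k1, k2):
    r2 = x * x + y * y
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * factor, y * factor
```

Undistorting a measured point means inverting this mapping, which is typically done iteratively; with k1 = k2 = 0 the mapping is the identity, and the center of the image is never displaced.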


    • D. DeMenthon and L.S. Davis, "Model-Based Object Pose in 25 Lines of Code", International Journal of Computer Vision, 15, pp. 123-141, June 1995. This is the paper describing the pose estimation algorithm used here; you can find this brilliant paper on DeMenthon's homepage.

    • OpenCV: Open Source computer vision library. Includes implementation of the pose estimation algorithm and Kalman filter.

    • EmguCV: C# wrapper for OpenCV

    • WiimoteLib: Managed Library for Nintendo's Wiimote by Brian Peek

    • XNAnimation: Very nice library for animations in XNA. I didn't use it for the actual Wiimote tracking, but its animation demo application appears in the demonstration videos.

    • XNA: Game development library in C# that I use as base for my applications.

    • An Introduction to the Kalman Filter. Greg Welch and Gary Bishop. Siggraph 2001 Course 8.

    • Predictive Tracking for Augmented Reality. Ronald Tadao Azuma. Dissertation. University of North Carolina. February 1995.