# Adding Image Constraints to Inverse Kinematics for Human Motion Capture

- Antoni Jaume-i-Capó^{1} (corresponding author)
- Javier Varona^{1}
- Manuel González-Hidalgo^{1}
- Francisco J. Perales^{1}

**2010**:142354

https://doi.org/10.1155/2010/142354

© Antoni Jaume-i-Capó et al. 2010

**Received:** 15 May 2009

**Accepted:** 8 July 2009

**Published:** 1 September 2009

## Abstract

To study human motion in biomechanical applications, it is critical to obtain accurate 3D joint positions of the user's body. Computer vision and inverse kinematics are used to achieve this objective without markers or special devices attached to the body. The problem with these systems is that the inverse kinematics is "blind" to the projection of body segments into the images used by the computer vision algorithms. In this paper, we present how to add image constraints to inverse kinematics in order to estimate human motion. Specifically, we explain how to define a criterion that uses images to guide the posture reconstruction of the articulated chain. Tests with synthetic images show that the scheme performs well in an ideal situation. To test its potential in real situations, further experiments with task-specific image sequences are also presented. A quantitative study of different sequences shows that this approach improves the performance of inverse kinematics in this application.

## 1. Introduction

In biomechanical applications that aim to study human motion, a critical component is to accurately obtain the 3D joint positions of the user's body. The most common methods to obtain the joint positions require a laboratory environment and the attachment of markers to the body. Modern biomechanical and clinical applications require the accurate capture of normal and pathological human movement without the artifacts associated with standard marker-based motion capture techniques, such as soft tissue artifacts and the risk of artificial stimulus from taped-on or strapped-on markers [1]. Emerging techniques and research in computer vision are leading to the rapid development of the markerless approach to motion capture [2].

In computer vision, algorithms are designed to allow the system to analyze one or multiple image streams in order to recover human motion. However, the images are 2D, whereas the human body representation is 3D. This leads to ambiguities: a number of possible 3D configurations of the human body could explain a single image. In addition, these images can be noisy or incomplete (some joints or limbs are not visible). Therefore, we can only estimate the user's posture. Inverse kinematics approaches can solve for the body posture from the 3D positions of visible body parts, such as the face and hands, provided that these can be clearly located. For example, in the work of Zou et al. [3], the joint angles are estimated by inverse kinematics based on human skeleton constraints, and the coordinates of the pixels in the body segments in the scene are determined by forward kinematics. Finally, the human motion pose is reconstructed by histogram matching. The main drawback of their algorithm is that it does not handle displacement of the human body in the direction perpendicular to the image plane. In the case of multiple cameras, ambiguities appear to be less significant. For example, two cameras have been used to recover the user's posture in order to recognize the user's gestures for Human-Computer Interaction applications [4]. This work is also based on computer vision and inverse kinematics in order to recover human body posture.

In order to show the viability of this scheme of 3D human posture recognition, different experiments have been conducted. First, synthetic images are used to show how the scheme works in an ideal situation. This simple case shows that, in theory, the scheme performs correctly. Next, in order to test its potential in real situations, experiments with real sequences are presented. In these experiments, an annotated sequence and a known database of human motions that contains motion capture data are used to make a quantitative study of different sequences and to evaluate the performance of the presented approach.

This paper is organized as follows. In the next section, the current inverse kinematics approach is reviewed in order to introduce the image constraints in Section 3. Section 4 analyzes the obtained results in order to demonstrate the viability of this approach. Finally, conclusions are presented in the last section.

## 2. Inverse Kinematics

In order to capture human motion, the human body is usually modeled as an articulated chain, which consists of a set of rigid objects, called links, joined together by joints. To control the movement of an articulated chain it is common to use inverse kinematics (IK). IK is exploited to reconstruct an anatomically correct posture of the user (i.e., its joint state) considering the 3D locations of selected end-effectors which are used to constrain the posture.

For the moment, let us consider only a single frame of motion. Write the vector of joint angles as $\boldsymbol{\theta}$. Assume that we would like to meet a set of constraints on joint positions, $\mathbf{x}$, expressed as functions of the joints' degrees of freedom, $\mathbf{x} = f(\boldsymbol{\theta})$. The problem of inverse kinematics is to obtain a $\boldsymbol{\theta}$ such that $\boldsymbol{\theta} = f^{-1}(\mathbf{x})$. Closed-form solutions are available for at least some parameters when the limbs of the articulated chain are considered independently [5]. More often, one must treat this as a numerical root-finding problem based on the linearization of the set of constraints on joint positions, $\mathbf{x}$, considering small displacements about the current configuration, $\boldsymbol{\theta}$:

$$
\Delta\mathbf{x} = J(\boldsymbol{\theta})\,\Delta\boldsymbol{\theta}, \qquad J = \frac{\partial f}{\partial \boldsymbol{\theta}}.
$$

The resulting Jacobian matrix $J$ is inverted to map the desired constraint variation $\Delta\mathbf{x}$ to a corresponding posture variation $\Delta\boldsymbol{\theta}$. Using the pseudoinverse, denoted by $J^{+}$, the norm of the solution mapped by $J^{+}$ is minimal, that is, it is the smallest posture variation realizing the desired constraint variation:

$$
\Delta\boldsymbol{\theta} = J^{+}\,\Delta\mathbf{x}. \tag{3}
$$

Because the articulated chain is redundant, there is an infinite number of solutions. For the positioning and animation of articulated figures in computer graphics, the weighting strategy [6] is frequently employed. In the field of robotics, however, the strategy is to resolve this inverse kinematics redundancy by adding a secondary term (usually called the secondary task) to (3) in order to minimize a criterion $h(\boldsymbol{\theta})$. In this formulation, redundancy is resolved by moving the joints such that the end-effectors move in the desired way while the criterion $h$ is kept at a minimum. This was first exploited by Liégeois [7], who added a secondary task by projecting the negative gradient of $h$ into the null space of $J$, see (4),

$$
\Delta\boldsymbol{\theta} = J^{+}\,\Delta\mathbf{x} - \alpha\,\left(I - J^{+}J\right)\nabla h(\boldsymbol{\theta}), \tag{4}
$$

where $I$ is the identity matrix, and $\alpha$ is a positive gain factor which is configuration dependent. The definition of the secondary task by means of the criterion $h$ depends on the application. In the following section, a criterion based on images is defined in order to capture human motion.
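The update in (4), a pseudoinverse solution plus the negative gradient of a secondary criterion projected into the null space of the Jacobian, can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation evaluated in this paper: the function name `ik_step`, the fixed gain value, and the random toy Jacobian are all assumptions made for the example.

```python
import numpy as np


def ik_step(J, dx, grad_h, alpha=0.1):
    """One redundancy-resolving IK update (hypothetical helper).

    J      : (m, n) Jacobian of the constraints w.r.t. the joint angles
    dx     : (m,)   desired constraint variation
    grad_h : (n,)   gradient of the secondary criterion at the current pose
    alpha  : positive gain on the secondary task (fixed here for simplicity;
             the text describes it as configuration dependent)
    """
    J_pinv = np.linalg.pinv(J)                 # Moore-Penrose pseudoinverse
    N = np.eye(J.shape[1]) - J_pinv @ J        # projector onto the null space of J
    return J_pinv @ dx - alpha * (N @ grad_h)  # primary task + projected -grad h


# Toy data (assumed): 2 constraints and 4 joints, so 2 redundant degrees of freedom.
rng = np.random.default_rng(0)
J = rng.standard_normal((2, 4))
dx = np.array([0.05, -0.02])
grad_h = rng.standard_normal(4)

dtheta = ik_step(J, dx, grad_h)
```

Because the secondary term lies in the null space of the Jacobian, it changes the posture without altering the constraint variation achieved by the primary task, to first order.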

## 3. The Image-Based Constraint

As explained in the previous section, it is possible to constrain the solutions of inverse kinematics by adding a scalar criterion to be minimized. Next, we explain how to define this criterion by using the images in order to guide the posture reconstruction of the articulated chain for human motion capture applications.

The definition of the image-based constraint is inspired by work on visual servo control [8]. Specifically, Marchand and Courty define different secondary tasks for controlling a camera in virtual environments [9]. For motion capture purposes, it should be taken into account that the human structure is highly redundant and, therefore, a large solution space exists. One solution is to generalize (4) to include more tasks by using the priority strategy [10]. In this case, the solution guarantees that a task with a high priority is achieved as much as possible, while a low-priority constraint is optimized only on the reduced solution space that does not disturb any higher-priority task. However, for the sake of clarity, we only consider two tasks. The extension to more tasks is straightforward when the image constraint has low priority. In addition, it is possible to use the Extended Jacobian method [11] in order to give a high priority to the image constraint.
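To illustrate what a strict priority between two tasks means, the following sketch uses the standard two-level null-space formulation, in which the second task is solved only within the null space of the first. It is an illustration under that assumption, not the architecture of [10] nor the formulation used in this paper; the function name `two_task_step` and the toy Jacobians are hypothetical.

```python
import numpy as np


def two_task_step(J1, dx1, J2, dx2):
    """Joint update for two tasks with strict priority (task 1 over task 2)."""
    J1_pinv = np.linalg.pinv(J1)
    N1 = np.eye(J1.shape[1]) - J1_pinv @ J1   # null-space projector of task 1
    dtheta1 = J1_pinv @ dx1                   # minimum-norm solution of task 1
    # Task 2 is solved only inside the null space of task 1, after removing
    # the part of dx2 that the task-1 motion already produces.
    dtheta2 = np.linalg.pinv(J2 @ N1) @ (dx2 - J2 @ dtheta1)
    return dtheta1 + dtheta2


# Toy data (assumed): the first row of task 2 conflicts with task 1.
J1 = np.array([[1.0, 0.0, 0.0]])
dx1 = np.array([1.0])
J2 = np.array([[1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0]])
dx2 = np.array([2.0, 3.0])

dtheta = two_task_step(J1, dx1, J2, dx2)
```

In this toy case the high-priority task is met exactly, and only the component of the low-priority task that does not conflict with it is achieved. Near-singular projected Jacobians would need a damped or thresholded inverse, which is omitted here.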

In order to complete the definition of (5), let us define the projection function, which maps the articulated chain into the image. Given the coordinates of the *i*th joint in 3D space, and assuming that the calibration data needed to project 3D coordinates into the 2D images is known, we define the 2D image coordinates of the *i*th joint as its projection into the image. Assuming that the joints are ordered consecutively, the projection function is defined as follows:

## 4. Performance Evaluation

The proposed approach is evaluated using three different tests. The first test uses a virtual environment to show that the presented approach runs well in an ideal situation. The second test applies the proposed approach to a sequence of a user's motions to show that it performs well using real images. Finally, the third test compares the inverse kinematics approach with and without image constraints using the HumanEva dataset [14]. This dataset comprises four subjects performing six different types of actions recorded in seven calibrated video sequences from different viewpoints. Additionally, the video sequences are synchronized with their corresponding motion-captured 3D pose parameters.

In addition, the complete algorithm (with and without the image constraint) has been implemented in Visual C++ using the OpenCV libraries [15] and tested in a real-time interaction context on an Intel Core2 QUAD Q6600 under Windows Vista. Without the image constraint, we obtained a performance of 21 frames per second (with 15 steps of convergence). With the image constraint, we obtained a performance of 19 frames per second (with 15 steps of convergence). Therefore, its use in human-computer interaction applications is also possible. For non-real-time applications, the accuracy could be improved by adding more steps of convergence.
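As a rough worked conversion, the reported frame rates correspond to the following per-frame times:

$$
t_{\text{IK}} = \frac{1000\ \text{ms}}{21} \approx 47.6\ \text{ms}, \qquad
t_{\text{IK+image}} = \frac{1000\ \text{ms}}{19} \approx 52.6\ \text{ms}, \qquad
t_{\text{IK+image}} - t_{\text{IK}} \approx 5.0\ \text{ms}.
$$

The image constraint therefore adds about 10.5% to the time per frame ($21/19 \approx 1.105$). If that extra cost were spread evenly over the 15 convergence steps, which is an illustrative assumption rather than a reported measurement, it would amount to roughly 0.33 ms per step.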

### 4.1. Virtual Environment

First, we test the system in a virtual environment to show how the presented approach works in an ideal situation. We define an articulated chain in 2D space, composed of 4 segments, with a total of 4 rotational joints of 1 DOF each (i.e., one rotational joint per segment). To test the system, we generate an initial configuration and an objective configuration of the articulated chain. Next, we apply the inverse kinematics approach from the initial configuration, with and without the image constraint, to estimate the objective configuration of the articulated chain.
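A planar chain like the one described above can be set up as follows. This is a sketch of the setting without the image constraint only. The unit link lengths, the step gain, the two configurations, and all function names are assumptions made for the example and are not taken from the experiments.

```python
import numpy as np

LINK_LENGTHS = np.array([1.0, 1.0, 1.0, 1.0])  # assumed unit-length segments


def joint_positions(theta, lengths=LINK_LENGTHS):
    """2D positions of the chain joints, base at the origin, end-effector last."""
    angles = np.cumsum(theta)                       # absolute orientation of each link
    steps = np.stack([lengths * np.cos(angles),
                      lengths * np.sin(angles)], axis=1)
    return np.vstack([np.zeros(2), np.cumsum(steps, axis=0)])


def end_effector_jacobian(theta, lengths=LINK_LENGTHS):
    """Analytic 2 x n Jacobian of the end-effector position."""
    pts = joint_positions(theta, lengths)
    d = pts[-1] - pts[:-1]                          # vector from each joint to the tip
    return np.stack([-d[:, 1], d[:, 0]], axis=0)    # planar rotation: (-dy, dx)


def solve_ik(theta0, target, steps=15, gain=0.5):
    """Iterate pseudoinverse updates toward a target end-effector position."""
    theta = np.array(theta0, dtype=float)
    for _ in range(steps):
        dx = gain * (target - joint_positions(theta)[-1])
        theta += np.linalg.pinv(end_effector_jacobian(theta)) @ dx
    return theta


# Hypothetical initial and objective configurations (radians).
theta_init = np.array([0.3, 0.3, 0.3, 0.3])
theta_goal = np.array([0.5, 0.2, 0.4, 0.1])
target = joint_positions(theta_goal)[-1]

theta_est = solve_ik(theta_init, target, steps=30)
residual = np.linalg.norm(joint_positions(theta_est)[-1] - target)
```

The end-effector reaches the target, but because the chain is redundant the recovered joint angles need not coincide with the objective configuration. This is the ambiguity that a secondary criterion is meant to reduce.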

### 4.2. Using Real Images

In this test, we apply the inverse kinematics approach with the image constraint to a real stereoscopic sequence of human motions. The 3D joint positions of the sequence are manually annotated for a quantitative comparison. The sequence has 450 frames, corresponding to 15 seconds in real time. The main objective of this test is to show that the proposed approach performs well with real images. In addition, this experiment shows that, with this approach, it is possible to solve the problem using only one image.

In order to perform a quantitative evaluation the mean squared error is used. Formally, the error between an estimated 3D joint and the truly performed one from ground truth data is computed as

Comparison using the manually annotated sequence.

| Joint | PIK (mm) | IBIK, 1 view (mm) | IBIK, 2 views (mm) |
|---|---|---|---|
| Left elbow | 46.54 | 20.05 | 19.81 |
| Right elbow | 42.40 | 19.86 | 19.07 |
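One way to compute a per-joint error in millimetres, as reported in the table above, is sketched below. The text names the mean squared error. Since the results are given in millimetres, this sketch instead assumes that the per-frame error is the Euclidean distance between the estimated and the ground-truth 3D position, averaged over the frames. That choice, the function name `mean_joint_error`, and the synthetic trajectories are assumptions made for the example, not the data or the exact formula used in the experiments.

```python
import numpy as np


def mean_joint_error(estimated, ground_truth):
    """Mean Euclidean distance between estimated and reference 3D joint positions.

    Both arrays have shape (n_frames, 3) and are expressed in millimetres.
    """
    estimated = np.asarray(estimated, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    per_frame = np.linalg.norm(estimated - ground_truth, axis=1)
    return per_frame.mean()


# Synthetic data (not from the experiments): a static reference joint and an
# estimate that is offset by 20 mm along x in every frame.
truth = np.zeros((450, 3))
estimate = truth + np.array([20.0, 0.0, 0.0])

error_mm = mean_joint_error(estimate, truth)
```

Replacing `per_frame.mean()` with a squared or root-mean-square variant is a one-line change if a different convention is preferred.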

### 4.3. HumanEva Test

This test evaluates our system using two views of two sequences of real motions, *walking* and *box*, of subject 1 of the HumanEva dataset. These sequences have a total of 3050 frames per view. Because this database also contains the 3D positions of the joints obtained with markers, the objective of these experiments is to make a quantitative evaluation of our approach. We define an articulated chain in 3D space, with 2 rotational joints of 3 DOF each, in order to estimate the configurations of the arm (for the *box* sequence) and the leg (for the *walking* sequence).

Overall error of the estimation of the 3D positions of the internal joints for the two sequences (the elbow in the case of the arm and the knee in the case of the leg).

| | IK (mm) | IBIK (mm) |
|---|---|---|
| | 47.72 | 21.39 |
| | 40.69 | 16.35 |
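As a worked reading of these values, the relative error reduction obtained by adding the image constraint is, for the first and second rows respectively,

$$
\frac{47.72 - 21.39}{47.72} \approx 0.552, \qquad
\frac{40.69 - 16.35}{40.69} \approx 0.598,
$$

that is, reductions of roughly 55% and 60% with respect to inverse kinematics alone.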

The results for the *box* sequence are shown in Figures 10 and 11, where it is possible to see how the inverse kinematics approach loses the elbow position and how, by adding the image-based constraints, the elbow's estimation lies inside the silhouette. This can also be observed in the plot of Figure 12, which displays the per-frame elbow estimation error of the two approaches.

The same comparison for the *walking* sequence shows a performance similar to that of the previous sequence. In this case, adding the image-based constraint makes the recovered motion of the leg more natural, avoiding the artifacts that the inverse kinematics approach causes in the knee's estimation.

## 5. Conclusion

The invasiveness of the sensor system, the high dimension of the posture space, and the modeling approximations in the mechanical model of the human body are sources of error that accumulate and result in an approximate posture that may not be sufficient in biomechanical applications that study human motion precisely. Specifically, applications based on computer vision and inverse kinematics approaches present the problem that no information is available to locate the internal joints, which forces the IK approach to make a somewhat arbitrary decision about the optimal angle for these joints.

In this paper, we present how to add image constraints to the inverse kinematics formulation in order to solve this problem. We have proposed a criterion that guides the articulated chain toward the projection of the body into the image. In this way, impossible chain configurations are avoided. Experiments using synthetic images show that this approach performs correctly and that it solves difficult situations that occur when there are motions that do not involve the end-effectors. In addition, we have evaluated our approach using real images, including sequences from a known human motion database, in order to compute quantitative results. The computed error, about 2 centimeters, can be considered sufficiently small to permit its use in motion capture applications. Moreover, adding the image constraint makes the solution of the kinematic chain less dependent on the initial configuration.

As future work, we plan to generalize this approach to include more tasks by using the priority strategy. In this way, it would be possible to use more complex models of the human body in order to achieve better estimations.

## Declarations

### Acknowledgment

This work has been supported by the Spanish MEC under projects TIN2007-67993 and TIN2007-67896. Dr. J. Varona also acknowledges the support of a Ramon y Cajal Postdoctoral fellowship (cofunded by the European Social Fund) from the Spanish MEC.

## Authors’ Affiliations

## References

1. Mündermann L, Corazza S, Andriacchi TP: The evolution of methods for the capture of human movement leading to markerless motion capture for biomechanical applications. *Journal of NeuroEngineering and Rehabilitation* 2006, 3, article 6.
2. Moeslund TB, Hilton A, Krüger V: A survey of advances in vision-based human motion capture and analysis. *Computer Vision and Image Understanding* 2006, 104(2-3):90-126. doi:10.1016/j.cviu.2006.08.002
3. Zou B, Chen S, Shi C, Providence UM: Automatic reconstruction of 3D human motion pose from uncalibrated monocular video sequences based on markerless human motion tracking. *Pattern Recognition* 2009, 42(7):1559-1571. doi:10.1016/j.patcog.2008.12.024
4. Varona J, Jaume-i-Capó A, Gonzàlez J, Perales FJ: Toward natural interaction through visual recognition of body gestures in real-time. *Interacting with Computers* 2009, 21(1-2):3-10. doi:10.1016/j.intcom.2008.10.001
5. Tolani D, Goswami A, Badler NI: Real-time inverse kinematics techniques for anthropomorphic limbs. *Graphical Models* 2000, 62(5):353-388. doi:10.1006/gmod.2000.0528
6. Zhao J, Badler NI: Inverse kinematics positioning using nonlinear programming for highly articulated figures. *ACM Transactions on Graphics* 1994, 13(4):313-336. doi:10.1145/195826.195827
7. Liégeois A: Automatic supervisory control of the configuration and behavior of multibody mechanisms. *IEEE Transactions on Systems, Man and Cybernetics* 1977, 7(12):868-871.
8. Chaumette F, Hutchinson S: Visual servo control. II. Advanced approaches [Tutorial]. *IEEE Robotics and Automation Magazine* 2007, 14(1):109-118.
9. Marchand E, Courty N: Controlling a camera in a virtual environment. *Visual Computer* 2002, 18(1):1-19. doi:10.1007/s003710100122
10. Baerlocher P, Boulic R: An inverse kinematics architecture enforcing an arbitrary number of strict priority levels. *Visual Computer* 2004, 20(6):402-417.
11. Klein CA, Chu-Jenq C, Ahmed S: A new formulation of the extended Jacobian method and its use in mapping algorithmic singularities for kinematically redundant manipulators. *IEEE Transactions on Robotics and Automation* 1995, 11(1):50-55. doi:10.1109/70.345937
12. Horprasert T, Harwood D, Davis LS: A statistical approach for real-time robust background subtraction and shadow detection. *Proceedings of the 7th IEEE International Conference on Computer Vision, Frame Rate Workshop (ICCV '99), September 1999, Kerkyra, Greece* 1-19.
13. Bailey DG: An efficient euclidean distance transform. *Proceedings of the 10th International Workshop Combinatorial Image Analysis (IWCIA '04), December 2004, Auckland, New Zealand* 394-408.
14. Sigal L, Black MJ: *Humaneva: synchronized video and motion capture dataset for evaluation of articulated human motion.* Brown University, Providence, RI, USA; 2006.
15. Bradski GR, Pisarevsky V: Intel's computer vision library: applications in calibration, stereo, segmentation, tracking, gesture, face and object recognition. *Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '00), 2000, Hilton Head Island, SC, USA* 2:796-797.
16. Boulic R, Varona J, Unzueta L, Peinado M, Suescun A, Perales F: Evaluation of on-line analytic and numeric inverse kinematics approaches driven by partial vision input. *Virtual Reality* 2006, 10(1):48-61. doi:10.1007/s10055-006-0024-8

## Copyright

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.