# Visual sensor fusion for active security in robotic industrial environments

Sandra Robla^{1} (email author), Jose R. Llata^{1}, Carlos Torre-Ferrero^{1}, Esther G. Sarabia^{1}, Victor Becerra^{2} and Juan Perez-Oria^{1}

*EURASIP Journal on Advances in Signal Processing* **2014**:88

https://doi.org/10.1186/1687-6180-2014-88

© Robla et al.; licensee Springer. 2014

**Received: **16 December 2013

**Accepted: **23 May 2014

**Published: **12 June 2014

## Abstract

This work presents a method of information fusion involving data captured by both a standard charge-coupled device (CCD) camera and a time-of-flight (ToF) camera, to be used in the detection of the proximity between a manipulator robot and a human. Both cameras are assumed to be located above the work area of an industrial robot. The fusion of colour images and time-of-flight information makes it possible to determine the 3D localization of objects with respect to a world coordinate system while simultaneously providing their colour information. Considering that the ToF information given by the range camera contains inaccuracies, including distance error, border error, and pixel saturation, corrections of the ToF information are proposed and developed to improve the results. The proposed fusion method uses the calibration parameters of both cameras to reproject the 3D ToF points, expressed in a coordinate system common to both cameras and a robot arm, onto the 2D colour images. In addition, using the 3D information, motion detection in an industrial robot environment is achieved, and the fusion of information is applied to the previously detected foreground objects. This combination of information results in a matrix that links colour and 3D information, making it possible to characterise an object by its colour in addition to its 3D localisation. Further development of these methods will make it possible to identify objects and their positions in the real world and to use this information to prevent possible collisions between the robot and such objects.

## 1 Introduction

Since the 1960s, industrial robots have been used in the manufacturing industry, where they have substituted humans in various repetitive, dangerous, or hostile tasks. A consequence of the incorporation of robots in industry is the emergence of new accident risks for workers. The regulations that address, among many other aspects, these robot-related risks include the international standard ISO 10218, the American ANSI/RIA R15.06, the European EN 775, and national standards such as the Spanish UNE-EN 775. To prevent accidents, the selection of a security system must be based on the analysis of these risks. Traditionally, these security systems separate the robot workspace from the human one. One example of this requirement is reflected in the Spanish standard UNE-EN 775:1996 [1], which establishes that sensor systems have to be incorporated to prevent the entrance of humans into a hazardous area when the operating state of the robotic system implies danger to the human. According to traditional regulations, maintenance, repair, or programming personnel can only be inside the robot workspace if the industrial robot is not in automatic mode.

However, in recent years, due in part to the flexible design of products, the optimization of production methods, and the introduction of new technologies, the tasks performed by industrial robots are no longer restricted to the transfer of objects or other repetitive tasks. Instead, there is an increasing number of tasks in which humans and robots combine their skills in collaborative work.

To enable collaboration between human and robot, safety measures that establish a rigid separation between the human and robot workspaces have to be removed. Instead, the introduction of other types of security systems is required so that collisions can be avoided by detecting obstacles as well as their dynamic characteristics, and harm to the human can be mitigated in case of an unexpected impact. For this reason, research in this field is directed towards changing the way a human interacts with a robot, the trend being that both human and robot can share the same workspace at the same time. This change in the working relationship is reflected in the updates carried out from the year 2006 in the international standard ISO 10218 [2] and in guidelines for the implementation of these regulations, such as [3]. In these guidelines, new concepts are presented, such as collaborative robots, collaborative operations, and collaborative workspaces.

Taking into account that security is a fundamental aspect in the design of robotic manufacturing systems, the development of systems and security strategies that allow safe collaborative work between human and robot is essential. The aim of this paper is to contribute to the initial stage of the design of a system for collision prevention between a human and a robot manipulator sharing a workspace at the same time. A method is proposed for processing information acquired from two different types of vision sensors located above an industrial robot environment. The method, which is mainly focused on the information captured by a time-of-flight camera, allows the fusion of colour and 3D information as an initial step towards the development of an active security system for application in an industrial robotics environment. This information fusion generates a colour and 3D information matrix which allows simultaneously estimating the colour characteristics of an object and its three-dimensional position in a world coordinate frame. At a later stage, the use of this combination of information will make it possible to associate a security volume with each characterised object, in order to prevent possible collisions between the industrial robot and the human.

## 2 Related work on shared human robot workspaces

Security systems in industrial robotic environments can be classified as passive and active. Passive security systems are hazard warning elements which do not alter the robot behaviour; they include audible or visible signals, such as alarms or lights, and systems that prevent inadvertent access to a restricted area. Active security systems in industrial robotic environments can be defined as the methods used to prevent the intrusion of humans into the robot workspace when it is in automatic mode. The difference with respect to passive methods is that active methods can modify the robot behaviour. Historically, devices such as movement, proximity, force, acceleration, or light sensors have been used to detect human access to the robot workspace and to stop the execution of the robot task. However, as discussed previously, research in this field is moving towards allowing humans and robots to share workspaces.

### 2.1 Collision avoidance

A further way to enhance safety in shared human-robot workspaces is to implement collision avoidance systems. Robots have been provided with sensors capturing local information: ultrasonic sensors [4], capacitive sensors [5, 6], and laser scanner systems [7] have been tried for collision avoidance. However, the information provided by these sensors does not cover the whole scene, and so these systems can only make a limited contribution to safety in human-robot collaboration tasks [8]. Moreover, geometric representations of humans and robotic manipulators have been used to obtain a spatial representation in human-robot collaboration tasks; numerical algorithms are then used to compute the minimum distance between human and robot and to search for collision-free paths [9–12]. Methods have also been proposed involving the combination of different types of devices to help avoid collisions. This idea has been applied to a cell production line for component exchange between human and robot in [13], where the safety module uses commands from light curtain sensors, joint angle sensors, and a control panel to prevent collision with the human when exchanging an object. The discussion below concentrates on artificial vision systems, range systems, and their combination.

#### 2.1.1 Artificial vision systems

Artificial vision systems have also been used to prevent human-robot collisions. This information can be used on its own or in combination with information from other types of devices. In order to achieve safe human-robot collaboration, [14] describes a safety system made up of two modules. One module is based on a camera and computer vision techniques to obtain the human location. The other module, based on accelerometers and joint position information, is used to prevent an unexpected robot motion due to a failure of the robot hardware or software. Research work such as [15] investigates safety strategies for human-robot coexistence and cooperation, proposing the combination of visual information from two cameras with information from a force/torque sensor. In order to perform collision tests, other work has used visual information acquired by cameras [16, 17] to generate a 3D environment. Visual information has also been used to separate humans and other dynamic unknown objects from the background [18] or to alter the behaviour of the robot [19]. In [20–22], visual information has been used to develop safety strategies based on fuzzy logic, probabilistic methods, and the calculation of a warning index, respectively.

#### 2.1.2 Range systems

The depth map of a scene can be obtained by using depth sensors such as laser range finders and stereo camera systems. The results of using a laser time-of-flight (ToF) sensor are presented in [23] and [24], with the latter using several depth sensors in combination with presence sensors. Recently, a new type of camera has become available. These cameras, denominated range-imaging cameras, 3D ToF cameras, or PMD cameras, deliver a 3D point cloud, among other information, and are starting to be used in active security systems for robotic industrial environments, among other applications. An example is a single framework for human-robot cooperation whose purpose is to achieve a scene reconstruction of a robotic environment by markerless kinematic estimation: [8, 25] use the information delivered by a 3D ToF camera mounted at the top of a robotic cell. This information is employed to extract robust features from the scene, which are the inputs to a module that estimates risks and controls the robot. In [26], the fusion of 3D information obtained from several range-imaging cameras and the application of the visual hull technique are used to estimate the presence of obstacles within the area of interest. The configurations of a robot model and its future trajectory, along with information on the detected obstacles, are used to check for possible collisions.

#### 2.1.3 Combination of vision and range systems

This technique is based on the combination of 3D information from range cameras and 2D information from standard charge-coupled device (CCD) cameras. Although this technique is being used in other applications, such as hand following [27, 28] or mixed reality applications [29–31], not much work has been reported using it in the area of active security in robotic environments. In [32], an analysis of human safety in cooperation with a robot arm is performed. This analysis is based on information acquired by a 3D ToF camera and a 2D/3D Multicam, a monocular hybrid vision system which fuses range data from a PMD ToF sensor with 2D images from a conventional CMOS grey-scale sensor. The proposed method establishes that while the 3D ToF camera monitors the whole area, any motion in the shared zones is analysed using the 2D/3D information from the Multicam. In [33], a general approach is introduced for the surveillance of robotic environments using depth images from standard colour cameras or depth cameras. The fusion of data from CCD colour cameras or from ToF cameras is performed to obtain the object hull and its distance with respect to the known geometry of an industrial robot. The authors also present a comparison between distance information from colour and ToF cameras and a comparison between a single ToF camera and ToF information fusion. One of the conclusions of this work is that the fusion of information from several ToF cameras provides better resolution and less noise than the information obtained from a single camera. Finally, [34] describes a hybrid system based on a ToF camera and a stereo camera pair, proposed for application in human-robot collaboration tasks. Stereo information is used at unreliable ToF data points to generate a depth map which is fused with the depth map from the ToF camera. Colour features are not taken into account.

On the other hand, nearly a decade after ToF cameras entered the industrial market [35], a new type of 3D sensor (the RGB-D sensor), fitted with an RGB camera and a 3D depth sensor, was launched for non-commercial use [36]. The RGB-D sensor has several advantages over ToF cameras, such as higher resolution, lower price, and the availability of both depth and colour information. Hence, its study and application have been the objective of research works such as [37], which presents a review of Kinect-based computer vision algorithms and applications. Several topics are covered, including preprocessing tasks (with a review of Kinect recalibration techniques), object tracking and recognition, and human activity analysis. The same authors propose in [38] an adaptive learning methodology to extract spatio-temporal features, simultaneously fusing the RGB and depth information. In addition, a review of several solutions for carrying out information fusion of RGB-D data is presented, and a website for downloading a dataset of RGB and depth information for hand gesture recognition is introduced. Regarding active security systems in industrial robotic environments, the Kinect sensor is starting to be incorporated, as shown in [39], where a real-time collision avoidance approach based on this sensor is presented.

## 3 Method for the fusion of colour and 3D information

The presented method for the fusion of information acquired from a ToF camera and a colour camera has a different standpoint from the ones proposed in the consulted papers. In papers not related to active security in robotic industrial environments, such as [27], the spatial transformation is performed by establishing the ToF camera coordinate system as the reference coordinate system. Therefore, if the position of an object in a world coordinate system is to be known, an additional calibration must be carried out to establish the rotation matrix and translation vector that connect both coordinate systems. In the present paper, this aspect has been taken into account: a common coordinate system was defined for an industrial robot, a colour camera, and a ToF camera, in order to know, at the same time, the 3D location of an object in the robot arm workspace and its colour features. Among the papers focusing on mixed reality applications, [29] uses a setup that includes a CCD FireWire camera, a ToF camera, and a fisheye camera. After performing the calibration and establishing relative transformations between the different cameras, a background model is generated that allows segmenting the actor from the scene; its use eliminates the need for chroma keying and also supports the planning and alignment of virtual content. Paper [31] presents a survey of the basic measurement principles of ToF cameras, including, among other issues, camera calibration, range image preprocessing, and sensor fusion. Several studies on different combinations of high-resolution cameras and lower-resolution ToF cameras are mentioned.

Among the papers focused on active security, the most closely related to our work is [32]. Though a common world coordinate system for the cameras and the robot is also used there, the method seems to present certain differences, since a spatial transform function is identified in order to map the image coordinates of the 2D sensor to the corresponding coordinates of the PMD sensor. Moreover, saturated pixel errors do not seem to have been considered. The work presented here takes a different standpoint: the parameters obtained from the camera calibrations are used to transform the 3D point cloud given in the ToF camera coordinate system to the world coordinate system, and finally, the obtained internal and external parameters are used to reproject the corrected 3D points (distance error, saturated pixels, and jump edge effect) onto the colour images.

With the aim of allowing any researcher to implement the proposed method of information fusion exactly as it has been carried out in the present work, this paper gives a detailed mathematical description of the steps involved in the proposed method.

In what follows, it is assumed that a 3D ToF camera and a colour camera are fixed and placed over the workspace of a robot arm and that the fields of view of both cameras overlap. It is also assumed that external temperature conditions are constant and that the integration time parameter of the 3D ToF camera is automatically updated at each data acquisition. Image and 3D data from the scene are captured and processed as described in the next sub-sections. Assume that the ToF camera has a resolution *n*_{x}×*n*_{y} and that the CCD camera has a resolution ${\widehat{n}}_{x}\times {\widehat{n}}_{y}$.

In what follows, vectors and matrices are denoted by Roman bold characters (e.g. **x**). The *j*th element of a vector **x** is denoted as *x*_{j}, element (*i*,*k*) of a matrix **A** is denoted as *A*_{i,k}, a super-index in parentheses (*j*) denotes a node within a range of distances, and a sub-index within square brackets such as [*i*] denotes an element of a set.

### 3.1 Reduction of ToF range camera errors

The reduction of range camera errors is a fundamental step towards achieving an acceptable fusion of colour and 3D information. These errors cause the fused information to have issues that range from minor, such as border inaccuracy, to serious, such as the loss of information at saturated pixel coordinates.

#### 3.1.1 Distance error reduction

As it is well documented that ToF cameras suffer from a non-linear distance error, several experiments have been carried out in order to model and correct this distance error (or circular error) [35, 40–43]. With the purpose of decreasing the influence of this error on distance measurements, a procedure is described below to correct the ToF distance values based on a study of the behaviour of the camera. This study requires a ToF camera positioned parallel to the floor and a flat panel of light colour and low reflectance mounted on a robot arm. The panel is also parallel to the floor. The robot arm displaces the panel along a distance range so that ToF data can be captured at different distances.

- 1. *Image capture*. Since distance measurements are influenced by the camera internal temperature, a minimum time period is necessary to obtain stable measurements [43]. After the camera warms up, ToF information is captured at each of the *P* different nodes into which the distance range *D* is divided. Each captured data set is defined by an amplitude matrix **A** of dimensions *n*_{x}×*n*_{y} and by 3D information made up of three coordinate matrices **X**, **Y**, and **Z**, each of dimensions *n*_{x}×*n*_{y}. In order to generate a model of the distance error, a set $\mathcal{Z}_{\mathcal{T}}^{(j)}=\{\mathbf{Z}_{T[1]}^{(j)},\mathbf{Z}_{T[2]}^{(j)},\dots,\mathbf{Z}_{T[N]}^{(j)}\}$ of distance information in the *z* axis is formed by capturing *N* images at each node *j*, with *j*=1,…,*P*. Similarly, sets of distance information for training are defined for the *x* and *y* axes, denoted $\mathcal{X}_{\mathcal{T}}^{(j)}$ and $\mathcal{Y}_{\mathcal{T}}^{(j)}$, respectively. In order to validate the model so obtained, a set $\mathcal{Z}_{\mathcal{V}}^{(j)}=\{\mathbf{Z}_{V[1]}^{(j)},\mathbf{Z}_{V[2]}^{(j)},\dots,\mathbf{Z}_{V[M]}^{(j)}\}$ of distance information is also formed by capturing *M* additional images at each node *j*, with *j*=1,…,*P*. Similarly, sets of distance information for validation are defined for the *x* and *y* axes, denoted $\mathcal{X}_{\mathcal{V}}^{(j)}$ and $\mathcal{Y}_{\mathcal{V}}^{(j)}$, respectively. In this article, the sets of information $\mathcal{Z}_{\mathcal{T}}$ and $\mathcal{Z}_{\mathcal{V}}$ are also called *ToF distance images* and are defined as $\mathcal{Z}_{\mathcal{T}}=\{\mathcal{Z}_{\mathcal{T}}^{(1)},\dots,\mathcal{Z}_{\mathcal{T}}^{(P)}\}$ and $\mathcal{Z}_{\mathcal{V}}=\{\mathcal{Z}_{\mathcal{V}}^{(1)},\dots,\mathcal{Z}_{\mathcal{V}}^{(P)}\}$.

- 2. *Angle correction*. Correction angles are applied to the ToF information sets for each axis *x*, *y*, and *z*, with the aim of compensating for any 2D angular deviation between the (*x*,*y*) plane of the range camera and the plane defined by the floor. This 2D angular deviation is denoted by the angles *θ*_{x} and *θ*_{y}. This correction allows obtaining parameter values as if both camera and panel were perfectly parallel.

Given the *x*-axis distance image **X**_{T} of dimensions *n*_{x}×*n*_{y}, define its sub-matrix $\widehat{\mathbf{x}}$ of dimensions *n*_{1}×*n*_{2}, where *n*_{1}<int(*n*_{x}/2) and *n*_{2}<int(*n*_{y}/2), as the matrix formed such that its top left element ${\widehat{x}}_{1,1}$ corresponds to element ${X_{T}}_{i_{c},j_{c}}$. Index *i*_{c} is chosen as int(*n*_{x}/2), and index *j*_{c} is chosen as int(*n*_{y}/2). Similarly, sub-matrices $\widehat{\mathbf{y}}$ and $\widehat{\mathbf{z}}$ are defined for axes *y* and *z*, respectively. Define $\bar{\mathbf{x}}$, $\bar{\mathbf{y}}$, and $\bar{\mathbf{z}}$ as the column-wise vectorised forms of sub-matrices $\widehat{\mathbf{x}}$, $\widehat{\mathbf{y}}$, $\widehat{\mathbf{z}}$, each with dimension *n*×1, where *n*=*n*_{1}*n*_{2} is the number of pixels of the selected area. This central region is taken from each ToF distance image to estimate and correct the 2D angle inclination between the panel and the ToF camera. Hence, for each image region, the 3D points are modified using the rotation matrices

$\mathbf{R}_{x}=\left[\begin{array}{ccc}1&0&0\\ 0&\cos\theta_{x}&-\sin\theta_{x}\\ 0&\sin\theta_{x}&\cos\theta_{x}\end{array}\right],\qquad \mathbf{R}_{y}=\left[\begin{array}{ccc}\cos\theta_{y}&0&\sin\theta_{y}\\ 0&1&0\\ -\sin\theta_{y}&0&\cos\theta_{y}\end{array}\right]$ (1)

First, the rotation about the *x* axis is applied:

$\mathbf{G}=\mathbf{R}_{x}\left[\begin{array}{ccc}\bar{\mathbf{x}}&\bar{\mathbf{y}}&\bar{\mathbf{z}}\end{array}\right]^{T}$ (2)

where **G** has dimensions 3×*n*. The transformed image region for the *z* coordinate is obtained from the third row of **G**:

$\bar{z}_{k}^{\prime}=G_{3,k},\quad k=1,\dots,n$ (3)

and in this way, a vector $\bar{\mathbf{z}}^{\prime}$ of dimensions *n*×1 is defined. A rotation is then applied about the *y* axis such that

$\mathbf{H}=\mathbf{R}_{y}\left[\begin{array}{ccc}\bar{\mathbf{x}}&\bar{\mathbf{y}}&\bar{\mathbf{z}}^{\prime}\end{array}\right]^{T}$ (4)

and the transformed *z* coordinate is obtained from the third row of **H**:

$\bar{z}_{k}^{\prime\prime}=H_{3,k},\quad k=1,\dots,n$ (5)

with $\bar{\mathbf{z}}^{\prime\prime}$ of dimensions *n*×1. Since the above rotation causes a displacement of the 3D points along the *y* axis, the $\bar{\mathbf{y}}$ vector is used to represent the ToF information after angle correction. In this way, the 3D ToF vectors after angle correction are $\bar{\mathbf{x}}$, $\bar{\mathbf{y}}$, $\bar{\mathbf{z}}^{\prime\prime}$, each of dimensions *n*×1.
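The two rotations of the angle-correction step can be sketched as follows. This is a minimal NumPy illustration, assuming the deviation angles *θ*_{x} and *θ*_{y} have already been estimated; the function name and array layout are illustrative and not taken from the paper.

```python
import numpy as np

def angle_correct(x_bar, y_bar, z_bar, theta_x, theta_y):
    """Apply the angle correction to the vectorised central region.

    x_bar, y_bar, z_bar: (n,) arrays of 3D coordinates of the region.
    Returns the corrected triple (x_bar, y_bar, z''), as in the text,
    where the y displacement introduced by the second rotation is discarded.
    """
    # Rotation about the x axis: G = R_x [x; y; z], of dimensions 3 x n
    Rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(theta_x), -np.sin(theta_x)],
                   [0.0, np.sin(theta_x),  np.cos(theta_x)]])
    G = Rx @ np.vstack([x_bar, y_bar, z_bar])
    z_prime = G[2, :]                 # transformed z coordinate (third row of G)

    # Rotation about the y axis: H = R_y [x; y; z']
    Ry = np.array([[ np.cos(theta_y), 0.0, np.sin(theta_y)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(theta_y), 0.0, np.cos(theta_y)]])
    H = Ry @ np.vstack([x_bar, y_bar, z_prime])
    z_dprime = H[2, :]                # angle-corrected z (third row of H)

    # The original y_bar is kept, since the second rotation displaces y
    return x_bar, y_bar, z_dprime
```

With zero angles the distances pass through unchanged, which is a quick sanity check that the camera and panel were already parallel.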

- 3. If the pixel position is not considered, then:

  - (a) *Discrepancy curve calculation stage*. In order to test the effect of the angle correction on the distance error, the same procedure is applied using data before and after angle correction; however, the method is described here using data after angle correction. The selected area is used to calculate several parameters, including the mean distance value, the discrepancy distance value, and the mean squared error (MSE). Define a set of distances after angle correction $\bar{\mathcal{Z}}^{\prime\prime(j)}=\{\bar{\mathbf{z}}_{[1]}^{\prime\prime(j)},\bar{\mathbf{z}}_{[2]}^{\prime\prime(j)},\dots,\bar{\mathbf{z}}_{[N]}^{\prime\prime(j)}\}$ at each node *j*, with *j*=1,…,*P*. The mean ToF distance over the selected area in all ToF distance images at each node *j* is calculated by means of:

$\bar{Z}_{j}=\frac{1}{nN}\sum_{i=1}^{N}\sum_{k=1}^{n}\bar{z}_{[i]k}^{\prime\prime(j)}$ (6)

where the resulting $\bar{\mathbf{Z}}$ is a vector with dimensions *P*×1.

Define *L*_{j} as the distance value obtained by a laser distance meter at each node *j* (henceforth this value is treated as ground truth), and a vector **L**=[*L*_{1},…,*L*_{P}]^{T}, with dimensions *P*×1. Then, the discrepancy distance vector *δ*_{d} is calculated as the difference between the mean distances from the ToF camera after angle correction, $\bar{\mathbf{Z}}$, and the ground truth vector **L**:

$\delta_{d}=\bar{\mathbf{Z}}-\mathbf{L}$ (7)

In order to obtain a correction value for any ToF distance, a cubic spline *s* is calculated that passes through all the points $(\bar{Z}_{j},\delta_{d_{j}})$ and, in each interval $[\bar{Z}_{j},\bar{Z}_{j+1}]$, is expressed as a polynomial

$s^{(j)}(z)=a_{0}^{(j)}+a_{1}^{(j)}z+a_{2}^{(j)}z^{2}+a_{3}^{(j)}z^{3}$ (8)

with *j*=1,…,*P*−1. For each sub-interval, the coefficients $a_{0}^{(j)}$, $a_{1}^{(j)}$, $a_{2}^{(j)}$, $a_{3}^{(j)}$ are calculated so that the curve passes through the points $(\bar{Z}_{j},\delta_{d_{j}})$ and $(\bar{Z}_{j+1},\delta_{d_{j+1}})$ [44]. The resulting spline, henceforth called the *discrepancy curve*, allows estimating the discrepancy correction value for a given ToF distance.
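The discrepancy-curve idea can be sketched numerically. This is a minimal illustration using `scipy.interpolate.CubicSpline` as the spline fitter; the distance values below are made up for the example, not measurements from the paper.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical calibration data: mean angle-corrected ToF distances (m)
# at P = 7 nodes, and the corresponding laser ground-truth distances.
Z_bar = np.array([0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0])
L     = np.array([0.81, 1.02, 1.21, 1.43, 1.62, 1.79, 2.02])

# Discrepancy vector: mean ToF distance minus ground truth
delta_d = Z_bar - L

# Cubic spline through the (Z_bar, delta_d) points: the discrepancy curve
discrepancy_curve = CubicSpline(Z_bar, delta_d)

# Correcting a new angle-corrected distance vector: subtract the
# correction values read off the discrepancy curve
z_v = np.array([0.9, 1.5, 1.95])
C = discrepancy_curve(z_v)
z_corr = z_v - C
```

By construction the spline interpolates the calibration points exactly, so evaluating it at the node distances returns the measured discrepancies.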

  - (b) *Discrepancy correction*. In order to reduce the errors in the distance estimates obtained from the ToF information, the set of ToF distance images for validation $\mathcal{Z}_{\mathcal{V}}$ is used to validate the discrepancy curve. To this end, a validation ToF distance vector after angle correction $\bar{\mathbf{z}}_{v}^{\prime\prime}$ (dimension *n*×1) is defined and evaluated on the discrepancy curve to obtain the vector of correction values **C** (dimension *n*×1). Then, the corrected distance vector for a distance image after its angle correction is calculated as follows:

$\overset{\circ}{\mathbf{z}}=\bar{\mathbf{z}}_{v}^{\prime\prime}-\mathbf{C}$ (9)

At each node *j*, with *j*=1,…,*P*, the mean value after discrepancy correction of the *M* ToF distance images obtained at that node is calculated as follows:

$\overset{\circ}{Z}_{j}=\frac{1}{nM}\sum_{i=1}^{M}\sum_{k=1}^{n}\overset{\circ}{z}_{[i]k}^{(j)}$ (10)

with *j*=1,…,*P*, where the resulting $\overset{\circ}{\mathbf{Z}}$ is a vector with dimensions *P*×1.

Define for each node *j* a vector with the corresponding laser distance meter values $\mathbf{L}^{\prime(j)}=[L^{(j)},\dots,L^{(j)}]^{T}$ with dimension *n*×1 (treated here as ground truth). Then, the mean squared error at each pixel *k* for node *j* can be calculated as

$\mathrm{MSE}_{k}^{(j)}=\frac{1}{N^{\prime}}\sum_{i=1}^{N^{\prime}}\left(Z_{[i]k}^{(j)}-L_{k}^{\prime(j)}\right)^{2}$ (11)

where *N*^{′} is the number of ToF distance images used and **Z** is a vector of ToF distance values that can be substituted by the angle-corrected vector $\bar{\mathbf{z}}^{\prime\prime}$ of each distance image or by the discrepancy-corrected vector $\overset{\circ}{\mathbf{z}}$ of each distance image, each with dimension *n*×1, and with *j*=1,…,*P*. The set of $\mathrm{MSE}_{k}^{(j)}$ values for *k*=1,…,*n* gives an indication of the planar distribution of the distance error for a given node *j*. Then, for a given node *j*, it is possible to average the mean squared errors to obtain an indication of the error depending on the node position:

$\mathrm{MSE}^{(j)}=\frac{1}{n}\sum_{k=1}^{n}\mathrm{MSE}_{k}^{(j)}$ (12)
- 4. If the position of each pixel is taken into account, then:

  - (a) *Discrepancy curves calculation stage*. Using the *N* angle-corrected ToF distance images represented by $\bar{\mathbf{z}}^{\prime\prime}$, a discrepancy curve is calculated for each pixel at each distance node. At this stage, using the *N* images at each node *j*, the mean value of each pixel *k*, where *k*=1,…,*n*, is calculated as follows:

$\bar{V}_{k,j}=\frac{1}{N}\sum_{i=1}^{N}\bar{z}_{[i]k}^{\prime\prime(j)}$ (13)

where the resulting $\bar{\mathbf{V}}$, whose elements are the values $\bar{V}_{k,j}$, is a matrix with dimensions *n*×*P*.

Define a matrix $\mathbf{L}^{\prime\prime}$ of dimension *n*×*P*, obtained by replicating *n* times the laser distances vector **L**^{T} as follows:

$\mathbf{L}^{\prime\prime}=\left[\begin{array}{c}\mathbf{L}^{T}\\ \vdots\\ \mathbf{L}^{T}\end{array}\right]$ (14)

The discrepancy matrix *δ*_{v} over all the *j* nodes is calculated for each pixel *k*=1,…,*n* as the difference between the mean distances from the ToF camera after angle correction, $\bar{\mathbf{V}}$, and the ground truth distance matrix $\mathbf{L}^{\prime\prime}$ obtained using the laser distance meter:

$\delta_{v}=\bar{\mathbf{V}}-\mathbf{L}^{\prime\prime}$ (15)

with *δ*_{v} of dimension *n*×*P*. To obtain the *n* correction values to be applied to any new ToF distance image, a cubic spline is calculated to fit this discrepancy information along the distance range for each pixel. The cubic spline is modelled at each pixel *k* using Equation 8 and the data points $(\bar{V}_{k,j},\delta_{v_{k,j}})$.

  - (b) *Correction using a discrepancy curve at each pixel*. In order to reduce the errors in the ToF distance images, the set of ToF distance images $\mathcal{Z}_{\mathcal{V}}$ is used to validate the discrepancy curve of each pixel. To this end, each pixel *k* of the validation vector after angle correction $\bar{\mathbf{z}}_{v}^{\prime\prime}$ (dimension *n*×1) is evaluated on its discrepancy curve to obtain the vector of correction values **C**_{v} (dimension *n*×1). Then, the corrected distance vector $\overset{\circ}{\mathbf{v}}$ (dimension *n*×1) is obtained using the expression

$\overset{\circ}{\mathbf{v}}=\bar{\mathbf{z}}_{v}^{\prime\prime}-\mathbf{C}_{v}$ (16)

The mean value of the corrected distance at each pixel *k* for each node *j* is calculated as follows:

$\overset{\circ}{V}_{k,j}=\frac{1}{M}\sum_{i=1}^{M}\overset{\circ}{v}_{[i]k}^{(j)}$ (17)

with *k*=1,…,*n* and *j*=1,…,*P*, where the resulting $\overset{\circ}{\mathbf{V}}$ is a matrix with elements $\overset{\circ}{V}_{k,j}$ and dimensions *n*×*P* of mean ToF distance values at each pixel for each node.

The mean squared error is obtained by means of Equation 11, where **Z** is replaced by the corrected values $\stackrel{\circ}{\mathbf{v}}$.

A comparison of the MSE values for discrepancy corrected and non-corrected measurements gives a measure of improvement in accuracy due to the discrepancy correction. If no such improvement is detected, then it is recommended to revise the experimental conditions as this may indicate the existence of problems with the experiment.
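The per-pixel variant can be sketched as follows: one discrepancy spline is fitted for each pixel across the distance nodes, and a new vectorised image is corrected pixel by pixel. This is a minimal illustration using `scipy.interpolate.CubicSpline`; the function names and the assumption that each pixel's mean distances increase strictly with the node distance are illustrative.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def fit_pixel_curves(V_bar, L):
    """Fit one discrepancy spline per pixel.

    V_bar: (n, P) matrix of mean angle-corrected distances per pixel/node.
    L:     (P,) laser ground-truth distances, one per node.
    Returns a list of n splines mapping a ToF distance to its discrepancy.
    """
    delta_v = V_bar - L[np.newaxis, :]          # (n, P) discrepancy matrix
    return [CubicSpline(V_bar[k], delta_v[k]) for k in range(V_bar.shape[0])]

def correct_image(z_v, curves):
    """Subtract each pixel's estimated discrepancy from the
    vectorised angle-corrected image z_v of shape (n,)."""
    C_v = np.array([curves[k](z_v[k]) for k in range(len(curves))])
    return z_v - C_v
```

If every pixel of the simulated camera over-reads by a constant offset, the fitted curves recover that offset and the correction removes it.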

#### 3.1.2 Correcting the values of saturated pixels

Information from range cameras can be affected by pixel saturation, which is caused by an excessive reflectance of light from objects. Though its effect can be reduced by an automatic update of the integration time parameter of the ToF camera [31], in some circumstances, such as the presence of metal or reflective paints, this mechanism is not enough.

- 1.
*Looking for saturated pixels*. According to [45], pixel saturation occurs when the amplitude values are greater than a given threshold value *ζ*, which depends on the camera being employed. Hence, the amplitude image is searched for values greater than or equal to this threshold in order to generate a saturation binary mask **M** with ones at the positions of the saturated pixels and zeros elsewhere. To be able to perform the correction on pixels located at the edges of the image, the amplitude and 3D information matrices are augmented by replicating the rows and columns located at the edges of the matrix. Define *p* as the number of rows and columns of **A** to be replicated. Define the *p* upper rows of **A** as *B*_{ i,j }=*A*_{ i,j }, such that **B** is of dimension *p*×*n*_{ y }, where *i*=1,…,*p* and *j*=1,…,*n*_{ y }, and the *p* lower rows as ${B}_{i,j}^{\prime}={A}_{{n}_{x}-i+1,j}$, such that **B**^{′} is of dimension *p*×*n*_{ y }, where *i*=1,…,*p* and *j*=1,…,*n*_{ y }. Define the intermediate matrix $\widehat{\mathbf{A}}$ as follows:

$\widehat{\mathbf{A}}=\left[\begin{array}{c}\mathbf{B}\\ \mathbf{A}\\ {\mathbf{B}}^{\prime}\end{array}\right]$(18)

such that $\widehat{\mathbf{A}}$ is of dimension (2*p*+*n*_{ x })×*n*_{ y }. Then, define the left *p* columns of $\widehat{\mathbf{A}}$ as ${B}_{i,j}^{\mathrm{\prime \prime}}={\widehat{A}}_{i,j}$, such that ${\mathbf{B}}^{\mathrm{\prime \prime}}$ is of dimension (2*p*+*n*_{ x })×*p*, where *i*=1,…,2*p*+*n*_{ x } and *j*=1,…,*p*, and the *p* right columns as ${B}_{i,j}^{\mathrm{\prime \prime \prime}}={\widehat{A}}_{i,{n}_{y}-j+1}$, such that ${\mathbf{B}}^{\mathrm{\prime \prime \prime}}$ is of dimension (2*p*+*n*_{ x })×*p*, where *i*=1,…,2*p*+*n*_{ x } and *j*=1,…,*p*. Then, the augmented amplitude matrix $\stackrel{~}{\mathbf{A}}$ of dimensions (2*p*+*n*_{ x })×(2*p*+*n*_{ y }) is given by:

$\stackrel{~}{\mathbf{A}}=\left[\begin{array}{ccc}{\mathbf{B}}^{\mathrm{\prime \prime}}& \widehat{\mathbf{A}}& {\mathbf{B}}^{\mathrm{\prime \prime \prime}}\end{array}\right]$(19)

The saturation binary mask **M** is augmented in the same way, so that the augmented mask $\stackrel{~}{\mathbf{M}}$, of dimensions (2*p*+*n*_{ x })×(2*p*+*n*_{ y }), is defined by replicating its border rows and columns, where *i*=1,…,*n*_{ x }+2*p*, *j*=1,…,*n*_{ y }+2*p*.

- 2.
*Correction of saturated pixels*. In order to replace an incorrect value with the average of its neighbours, the saturation binary mask is used to find the coordinates of saturated values in the amplitude and 3D matrices and to calculate the mean value of the surrounding pixels. Saturated values are not taken into account in this calculation. Define a window mask ${\stackrel{\circ}{M}}_{i,j}={\stackrel{~}{M}}_{r-p+i-1,c-p+j-1}$, with *i*=1,…,2*p*+1 and *j*=1,…,2*p*+1, of dimensions (2*p*+1)×(2*p*+1), whose centre is each saturated pixel with position $(r,c)\in \mathcal{Q}$. In order to calculate a new pixel value to replace a saturated pixel value, define a window of amplitude values ${\stackrel{\circ}{A}}_{i,j}={\stackrel{~}{A}}_{r-p+i-1,c-p+j-1}$, with *i*=1,…,2*p*+1 and *j*=1,…,2*p*+1, of dimensions (2*p*+1)×(2*p*+1), whose centre corresponds to each saturated pixel at position (*p*+1,*p*+1). The new value ${\stackrel{~}{A}}_{r,c}$ for each saturated pixel $(r,c)\in \mathcal{Q}$ is calculated as

${\stackrel{~}{A}}_{r,c}=\frac{1}{(2p+1)(2p+1)-1}\left(\sum _{k=1}^{2p+1}\sum _{l=1}^{2p+1}{\stackrel{\circ}{A}}_{k,l}-{\stackrel{\circ}{A}}_{p+1,p+1}\right)$(22)

Defining (**X**,**Y**) as the initial ToF data and $\stackrel{\circ}{\mathbf{Z}}$ as the ToF distance data after discrepancy correction, a similar procedure is applied to correct the corresponding values of these matrices using the index set $\mathcal{Q}$ of amplitude-saturated positions, obtaining matrices $(\stackrel{~}{\mathbf{X}},\stackrel{~}{\mathbf{Y}},\stackrel{~}{\mathbf{Z}})$, since these values are affected by the amplitude saturation. Once saturated pixels are corrected, all matrices are resized to their initial dimensions by removing the rows and columns previously added, which results in matrices **X**^{′},**Y**^{′},**Z**^{′}, and **A**^{′}.
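A minimal numpy sketch of this correction, assuming border handling by edge replication (as in Equation 18 with *p*=1) and neighbour averaging as written in Equation 22, which excludes only the centre pixel from the mean; the function name is hypothetical:

```python
import numpy as np

def correct_saturated(amplitude, zeta, p=1):
    """Replace amplitude values at saturated pixels (amplitude >= zeta) by the
    mean of their (2p+1)x(2p+1) neighbourhood, excluding the centre pixel.
    Image borders are handled by replicating edge rows/columns."""
    mask = amplitude >= zeta                   # saturation binary mask M
    a_pad = np.pad(amplitude, p, mode='edge')  # augmented matrix (border replication)
    corrected = amplitude.astype(float)
    for r, c in zip(*np.nonzero(mask)):
        win = a_pad[r:r + 2 * p + 1, c:c + 2 * p + 1]  # window centred on (r, c)
        corrected[r, c] = (win.sum() - win[p, p]) / ((2 * p + 1) ** 2 - 1)  # Eq. 22
    return corrected, mask
```

The same loop, driven by the same mask, would be applied to the **X**, **Y**, and **Z** matrices, since their values at saturated positions are also unreliable.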

#### 3.1.3 Jump edge reduction

At the borders of objects, ToF pixels integrate light reflected from both the foreground and the background, producing an inaccurate distance measurement known as *jump edge*. This error produces spurious pixels which are inaccurate 3D measurements of the real scene. In order to reduce this effect, the use of a median filter followed by a jump edge filter based on a local neighbourhood is proposed in [46]. Other solutions, which implement non-local means filters or edge-directed re-sampling techniques, are enumerated in [31]. In the present work, the use of 2D techniques applied to 3D points is proposed to prevent border inaccuracy in the fused information. Traditionally, the technique of morphological gradient is used in grey scale images to emphasise transitions of grey levels [47, 48]. In this work, only distance values from the 3D data are used, generating a distance image. With the objective of finding pixels suffering from this effect, the morphological gradient is calculated using the following expression [48]:

$\mathbf{g}=(\mathbf{f}\oplus \mathbf{S})-(\mathbf{f}\otimes \mathbf{S})$(23)

where **g** is of dimension *n*_{
x
}×*n*_{
y
}, **f** is a ToF distance matrix of same dimension as **g**, **S** is a 3×3 generalised dilation or erosion mask, and ⊕ and ⊗ are dilation and erosion operations, respectively.

The gradient image **g** is transformed into a new distance image **G**, with values ranging from 0 to 255, by means of the following transformation:

${G}_{i,j}=\mathrm{round}\left(255\phantom{\rule{0.2em}{0ex}}\frac{{g}_{i,j}-\mathrm{min}(\mathbf{g})}{\mathrm{max}(\mathbf{g})-\mathrm{min}(\mathbf{g})}\right)$(24)

The histogram of **G** is calculated and then smoothed by means of a Butterworth filter. Finally, a threshold value *η* is defined by searching along the smoothed histogram for the first minimum to the right of the first maximum. A new distance matrix **f**^{′} is generated by forcing to zero the spurious pixels found and keeping the same distance values for the remaining pixels:

${f}_{i,j}^{\prime}=\left\{\begin{array}{ll}0,& {G}_{i,j}\ge \eta \\ {f}_{i,j},& \text{otherwise}\end{array}\right.$(25)

When performing the fusion of ToF and colour information, jump edge reduction is carried out after scaling up the ToF information, as discussed below.
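The automatic threshold search used here (histogram smoothing with a Butterworth filter, then the first minimum to the right of the first maximum) might be sketched as follows; the filter cutoff and the small-peak guard are assumptions not fixed by the text, and the function name is illustrative:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def auto_threshold(gray, cutoff=0.05):
    """Threshold for a uint8 image: smooth its 256-bin histogram with a
    low-pass Butterworth filter, locate the first significant local maximum,
    and return the index of the first local minimum to its right (eta)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    b, a = butter(2, cutoff)            # 2nd-order low-pass; cutoff is an assumption
    smooth = filtfilt(b, a, hist)       # zero-phase smoothing of the histogram
    floor = 0.1 * smooth.max()          # ignore negligible ripple peaks (assumption)
    i = 1                               # first local maximum above the floor
    while i < len(smooth) - 1 and not (
        smooth[i] >= smooth[i - 1] and smooth[i] > smooth[i + 1] and smooth[i] > floor
    ):
        i += 1
    j = i + 1                           # first local minimum to its right
    while j < len(smooth) - 1 and not (
        smooth[j] <= smooth[j - 1] and smooth[j] < smooth[j + 1]
    ):
        j += 1
    return j
```

For a bimodal histogram (background gradient values near zero, jump edge values high), the returned index falls in the valley between the two modes.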

### 3.2 Colour and 3D information fusion

Information fusion from a standard CCD camera and a ToF camera allows the simultaneous use of 3D and colour information. This can be achieved by means of the reprojection of 3D ToF points into a colour image. In an active security system, moving objects, such as robots and humans, have to be detected to prevent possible collisions between them. To obtain information about these objects and develop the algorithms that make it possible to avoid collisions, foreground detection is carried out in such a way that the fused information is obtained only for those pixels previously classified as foreground pixels. Foreground object detection in a scene is carried out using 2D techniques over 3D ToF points, and subsequently, the colour and 3D information from foreground objects is fused.

#### 3.2.1 3D information analysis for detecting foreground objects

Define ${\mathcal{Z}}^{\prime}$ as a set of *t* ToF distance images, after discrepancy and pixel saturation correction, captured in a time period, such that ${\mathcal{Z}}^{\prime}=\{{\mathbf{Z}}_{1}^{\prime},{\mathbf{Z}}_{2}^{\prime},\dots ,{\mathbf{Z}}_{t}^{\prime}\}$; then, the background reference image ${\mathbf{Z}}_{r}^{\prime}$ is calculated as the element-wise mean of the *t* images, where *n* is the number of pixels in each ToF distance image. With the aim of detecting pixels that show motion, the difference image ${\mathbf{Z}}_{d}^{\prime}$ between the reference ${\mathbf{Z}}_{r}^{\prime}$ and a current image ${\mathbf{Z}}_{c}^{\prime}$ is calculated as:

${\mathbf{Z}}_{d}^{\prime}=|{\mathbf{Z}}_{r}^{\prime}-{\mathbf{Z}}_{c}^{\prime}|$

where |·| indicates an element-wise absolute value operation.

Motion is detected at pixels where the difference image ${\mathbf{Z}}_{d}^{\prime}$ exceeds a threshold value *T*_{ h }, which results in a binary image ${\mathbf{Z}}_{b}^{\prime}$. In order to automatically determine *T*_{ h }, the distance matrix ${\mathbf{Z}}_{d}^{\prime}$ is processed as if it were 2D information by means of Equation 24, where **g** is replaced by ${\mathbf{Z}}_{d}^{\prime}$, resulting in a grey scale image **G**^{′}. Then, the calculation of the smoothed histogram of **G**^{′} and the search for the threshold value are carried out in a similar way as presented in the ‘Jump edge reduction’ section. The binarisation process to detect pixels that show motion is given by

${Z}_{b,(i,j)}^{\prime}=\left\{\begin{array}{ll}1,& {G}_{i,j}^{\prime}\ge \eta \\ 0,& \text{otherwise}\end{array}\right.$

Using the binary image to modify the distance matrix **Z**^{′}, setting the maximum distance value at the coordinates of 3D points whose coordinates in the binary image are considered as background (0 value) and leaving as real *Z* values those 3D points whose coordinates in the binary image are considered as foreground (1 value), a new ToF distance matrix ${\mathbf{Z}}^{\mathrm{\prime \prime}}$ is obtained:

${Z}_{i,j}^{\mathrm{\prime \prime}}=\left\{\begin{array}{ll}{Z}_{i,j}^{\prime},& {Z}_{b,(i,j)}^{\prime}=1\\ {z}_{\mathrm{max}},& {Z}_{b,(i,j)}^{\prime}=0\end{array}\right.$(28)

Figure 3 illustrates this method for the background and foreground 3D value assignment and selection.
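The whole foreground-detection step (reference image as a per-pixel mean, absolute difference, binarisation, and background/foreground value assignment) can be summarised in a few lines of numpy. This is a sketch with a hypothetical helper name; the threshold is passed in explicitly rather than derived from the smoothed histogram as in the text, and `z_max` stands for the maximum distance value assigned to background pixels:

```python
import numpy as np

def detect_foreground(frames, current, z_max, threshold):
    """frames: (t, rows, cols) stack of corrected ToF distance images.
    Returns the motion binary image and the modified distance matrix Z''
    in which background pixels are set to z_max."""
    reference = np.mean(frames, axis=0)        # background reference image
    diff = np.abs(reference - current)         # element-wise absolute difference
    binary = (diff > threshold).astype(np.uint8)
    z_pp = np.where(binary == 1, current, z_max)  # keep real Z only on foreground
    return binary, z_pp
```

Because the comparison works on distances rather than intensities, this simple reference-image scheme is largely insensitive to illumination changes, which matches the advantage claimed for ToF data in the Discussion.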

#### 3.2.2 Reprojection of 3D ToF information into a colour image

With the aim of giving additional colour information to the 3D foreground points previously detected, the reprojection of these points into a colour image is carried out. Using colour and amplitude images, both cameras are calibrated with respect to the world coordinate frame. Since both cameras can be represented by the pinhole camera model [42, 52], a tool such as the *Camera Calibration Toolbox for Matlab* [53] can be used to extract the internal and external parameters of both cameras. External parameters are used to transform 3D ToF information given in the camera coordinate system into the world coordinate system. On the other hand, internal and external parameters are used to reproject 3D information into colour images. Hence, based on camera calibration theory [48, 54, 55] and after the range camera error reduction, the reprojection process is applied over the corrected and transformed 3D points following the transformations described below.

A 3D point expressed in the world coordinate system, ${\mathbf{P}}_{w}={[{X}_{w},{Y}_{w},{Z}_{w}]}^{T}$, is transformed into its representation in the colour camera coordinate system, ${\mathbf{P}}_{c}={[{X}_{c},{Y}_{c},{Z}_{c}]}^{T}$, which is given by

${\mathbf{P}}_{c}=\mathbf{R}{\mathbf{P}}_{w}+\mathbf{T}$(30)

where the extrinsic parameters are expressed by the 3×3 rotation matrix **R** and by the 3×1 translation vector **T**.

Frequently, standard CCD colour cameras have a higher resolution than range cameras, so the reprojection of 3D points does not have a one-to-one equivalence. Hence, the ToF information is scaled up by bilinear interpolation. In addition, as only information of foreground 3D points will be extracted, the automatic thresholding process is applied to the 3D points **P**_{ c } in order to remove those points classified as background, which results in a new 3D point cloud ${\mathbf{P}}_{c}^{\prime}={[{X}_{c}^{\prime},{Y}_{c}^{\prime},{Z}_{c}^{\prime}]}^{T}$.

The projection of a 3D point ${\mathbf{P}}_{c}^{\prime}$ onto the image plane, with undistorted image coordinates (*x*_{ u },*y*_{ u }), is given by

${x}_{u}=f\frac{{X}_{c}^{\prime}}{{Z}_{c}^{\prime}},\phantom{\rule{1em}{0ex}}{y}_{u}=f\frac{{Y}_{c}^{\prime}}{{Z}_{c}^{\prime}}$(31)

where the intrinsic parameter *f* is the focal length in millimetres.

The relationships between the image coordinates with distortion (*x*_{ d },*y*_{ d }) and without distortion (*x*_{ u },*y*_{ u }), considering the radial *D*^{(r)} and tangential *D*^{(t)} distortions, are defined by the pinhole model as

${x}_{d}={x}_{u}+{D}_{x}^{(r)}+{D}_{x}^{(t)},\phantom{\rule{1em}{0ex}}{y}_{d}={y}_{u}+{D}_{y}^{(r)}+{D}_{y}^{(t)}$

$u={k}_{u}{x}_{d}+s{y}_{d}+{u}_{0},\phantom{\rule{1em}{0ex}}v={k}_{v}{y}_{d}+{v}_{0}$(32)

where the intrinsic parameters *k*_{ u } and *k*_{ v } are the number of pixels per millimetre (horizontally and vertically, respectively), *s* is the skew factor, whose value is usually zero in most cameras, and (*u*_{0},*v*_{0}) are the coordinates of the centre of projection.

After obtaining the image coordinates (*u*,*v*) of the 3D foreground points, these values are adjusted into pixel values by rounding them to the nearest integer. Furthermore, as the area captured by the two cameras (ToF and colour) is not exactly the same, pixels in non-common areas are eliminated. A diagram which illustrates the proposed method is shown in Figure 4.
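Ignoring lens distortion, the reprojection chain (world-to-camera transformation, perspective projection, pixel conversion, and rounding) can be sketched as below. The function and parameter names are illustrative, not the authors' implementation, and points are stored as columns of a 3×N array:

```python
import numpy as np

def reproject(P_w, R, T, f, k_u, k_v, u0, v0, s=0.0):
    """Reproject 3D world points (3 x N) into colour-image pixel coordinates:
    camera coordinates P_c = R @ P_w + T, perspective projection with focal
    length f, then pixel conversion with intrinsics (k_u, k_v, s, u0, v0).
    Lens distortion is omitted in this sketch."""
    P_c = R @ P_w + np.asarray(T).reshape(3, 1)   # world -> camera coordinates
    x_u = f * P_c[0] / P_c[2]                     # undistorted image coords (mm)
    y_u = f * P_c[1] / P_c[2]
    u = np.rint(k_u * x_u + s * y_u + u0).astype(int)  # round to nearest pixel
    v = np.rint(k_v * y_u + v0).astype(int)
    return u, v
```

In practice the distortion terms would be applied between the projection and the pixel conversion, and any (u, v) falling outside the common field of view of the two cameras would be discarded, as described above.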

## 4 Experiments

### 4.1 Experimental setting

In this article, a method for the fusion of colour and 3D information that is suitable for active security systems in industrial robotic environments is presented. To verify the proposed methods, a colour camera, AXIS 205, and a range camera, SR4000, have been located over the workspace of the FANUC ARC MATE 100iBe robot arm. The AXIS 205 network camera has a resolution of 640×480 pixels and a sensor size of 5.08×3.81 mm. The SR4000 range camera has a resolution of 176×144 pixels and a pixel size of 40×40 *μ*m. This camera has a modulation frequency of 29/30/31 MHz and a detection range from 0.1 to 5 m.

### 4.2 Camera calibration

This initial stage is intended to obtain extrinsic and intrinsic parameters from the standard CCD camera and ToF camera by means of a calibration process using a common reference frame.

### 4.3 Reduction of distance error

The deviation angles of the ToF camera with respect to the *x* and *y* coordinates have been estimated, and their effects have been corrected. Figure 5 shows the effect of the angle and displacement corrections in a 3D point cloud. Discrepancy curves which do not take into account pixel position have been calculated before and after the angular correction. These curves are shown in Figure 6. It can be seen that the distance error is a function of the measured distance, and the discrepancy values show a small improvement after the angle correction, as expected.

To take into account the effect of pixel position in the distance error, a discrepancy curve has been generated for each pixel. These curves are tested by using the set ${\mathcal{Z}}_{\mathcal{V}}$ as input, resulting in a correction value to be applied at each pixel. Figure 6 shows in green colour the discrepancy values after the correction with a cubic spline for each pixel.

### 4.4 Value correction of saturated pixels

### 4.5 3D analysis for detecting foreground objects and coordinate frame transformation

Figure 11a shows the three-dimensional information of the scene with its real *Z* values. Figure 11b shows the three-dimensional information resulting from subtracting the reference distance matrix from the real scene distance matrix, in which positive values indicate possible motion points. In order to take into account only foreground 3D points, an automatic thresholding process and the proposed method for the background and foreground 3D value assignment and selection have been applied using Equation 28. After that, the modified 3D points are shown in red in Figure 12, whereas the initial 3D points are shown in cyan. It can be observed that, in the modified 3D points, all background points have equal *Z* values. However, as the points of interest are the foreground points, the background points are not taken into account; therefore, the final scene 3D representation is not affected by those equal *Z* values. After the coordinate frame transformation using Equation 30 and another automatic thresholding process to remove points classified as background, the result achieved in this example is shown in Figure 13, where the detected foreground object is represented in the world coordinate system.

### 4.6 Resolution increase

As the standard CCD camera employed provides a colour image with a higher resolution (480×640) than the 3D ToF information (176×144) provided by the range camera, the reprojection of 3D points does not have a one-to-one equivalence. Thus, the ToF matrix dimensions have been scaled up using bilinear interpolation, and the result has been reprojected to the colour image using Equations 30 to 32.
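The bilinear scale-up can be done with SciPy's order-1 spline zoom; `upscale_tof` is a hypothetical helper, and the target shape would be the colour image resolution:

```python
import numpy as np
from scipy.ndimage import zoom

def upscale_tof(z, target_shape):
    """Scale a low-resolution ToF matrix up to the colour image resolution by
    bilinear interpolation (order-1 spline), so that each 3D point can be
    matched to a colour pixel."""
    fy = target_shape[0] / z.shape[0]
    fx = target_shape[1] / z.shape[1]
    return zoom(z, (fy, fx), order=1)
```

The same interpolation would be applied to the X, Y, and Z matrices so that the scaled-up 3D points stay mutually consistent.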

### 4.7 Jump edge reduction

The proposed jump edge reduction method has been applied to the scaled-up distance information, defining the structuring element **S** as follows

### 4.8 Reprojection of 3D foreground points into colour images

## 5 Discussion

The aim of this work is to achieve the fusion of colour and 3D ToF information in order to apply it in active security tasks for industrial robotic environments: given the coordinates of a 3D point, this fusion simultaneously provides its colour information and its 3D position in a world coordinate system common to both cameras and the robot arm.

After obtaining the intrinsic and extrinsic parameters through a calibration process, the proposed method of distance error reduction improves the distance measurement values, and the achieved effect on the scene can be observed after the application of the information fusion method. The correction curves obtained are consistent with curves reported by other authors, such as [41], and also with the use of cubic splines to approximate and correct the distance error [42]. This consistency holds despite some differences in experimental setup, such as a reduced range of measurement, different ToF camera models, target material, and camera configuration parameters.

As a second stage, saturation error correction must be performed, given that in industrial environments certain materials, such as metal or reflective paints, are often present and can produce saturated pixels in the range camera information. The results obtained show that this method works well, as it allows the correct visualisation of the amplitude image and, more importantly, corrects the values of saturated pixels in the 3D points. If these points had incorrect values, the reprojection stage would fail at these positions, as the 3D values would be reprojected as 0, and their 2D information would be lost.

In order to detect foreground objects, the reference image technique applied to 3D data, after error corrections, has been used and presented as a simple and fast method which yields acceptable results. The 3D points of foreground objects are correctly identified, and only a few false positives are detected, which can be removed easily using 2D image morphological operations. Traditionally, this technique is used in colour and grey scale images, but illumination variations result in false foreground detections. The advantage of using ToF information is that it behaves more stably under these illumination conditions. In addition, this technique has a short computational time, which is an important factor in developing a suitable strategy for active security in robotic industrial environments. Then, using the extrinsic parameters, the transformation of foreground 3D points from the camera to the world reference frame is carried out, and the information is scaled up by bilinear interpolation. The proposed method of jump edge reduction, applied to the resulting distance points, minimises the false positives and false negatives around object edges which arise in the pixel reprojection process as a consequence of spurious pixels that do not have correct 3D values. The achieved results can be considered acceptable, since most spurious pixels are removed without changing the object shape, and therefore a smoother 3D point reprojection over object edges in colour images is achieved.

Finally, the reprojection of the resulting 3D points into the colour image is performed. Nevertheless, as can be seen in Figure 16, this reprojection is not perfect: in spite of having applied the distance error reduction, the position of the pixels in the image has not been taken into account, and a single correction value is applied which is a function of the measured distance but not of the pixel position in the image.

## 6 Conclusions

This paper aims to contribute to the research area of active security systems in industrial robotic environments using ToF cameras.

Despite the fact that active security in robotic industrial environments is a well-studied topic, few previously published methods have dealt with this subject using the combination of ToF cameras and colour cameras. The paper describes the development of methods for the fusion of colour and 3D ToF information as an initial step in the design of a system for collision prevention between a human and a manipulator robot sharing a workspace. Furthermore, this work provides a detailed mathematical description of the steps involved in the proposed method, so that any researcher can implement it.

The presented method has a different standpoint from the methods previously proposed in the literature, since a common coordinate system is defined for the robot arm, the colour camera, and the ToF camera. The obtained calibration parameters are used to transform the 3D points from the ToF camera coordinate system into the defined common coordinate system, and these points are then reprojected into 2D colour images. This procedure has the advantage that it gives a single matrix made of colour and three-dimensional information; therefore, the 3D coordinates of objects inside the robot arm’s workspace are known at the same time as their colour information. In addition, the proposed method for jump edge error detection, which is based on the morphological gradient, allows the detection and reduction of the jump edge error at the points affected by it. Also, in order to obtain a suitable fusion of information, a method for the detection and correction of saturated pixels, based on neighbouring pixel information, has been proposed.

As future work, in order to improve the accuracy of fused information, a modification of the applied distance correction method is suggested. A preliminary study carried out with a small range of distances shows the influence of the pixel position in the distance measurements. Hence, a suggestion for future work is to modify the error correction so that it takes into account the position of the 3D point (measured distance and pixel location).

A possible application to prevent collisions between an industrial robot and a human would be to use colour information to characterise the detected foreground objects and to associate a security volume around each object.

## Declarations

### Acknowledgements

This work has been supported by the Ministry of Economy and Competitiveness of the Spanish Government (project DPI2012-36959).

## Authors’ Affiliations

## References

- UNE-EN 755:
*Robots manipuladores industriales. Seguridad. Ed. by AENOR*. Asociacion Española de Normalizacion y Certificacion, Madrid; 1996.Google Scholar - ISO 10218-1:
*Robots for industrial enviroments. Safety requirements. Part 1: Robots. Ed. by ISO*. International Organization for Standardization, Switzerland; 2006.Google Scholar - RIA TR R15.206-2008:
*Guidelines for implementing ANS/RIA/ISO 10218-1-2007. For industrial robots and robot systems. Safety requirements. Ed. by RIA*. Robotic Industries Association, USA; 2008.Google Scholar - Llata JR, Sarabia EG, Arce J, Oria JP: Fuzzy controller for obstacle avoidance in robotic manipulators using ultrasonic sensors. In
*Advanced Motion Control, 1998. AMC ‘98-Coimbra., 1998 5th International Workshop On*. IEEE; 1998:647-652.Google Scholar - Feddema JT, Novak JL: Whole arm obstacle avoidance for teleoperated robots.
*Robotics and Automation, 1994. Proceedings., 1994 IEEE International Conference On*1994, 3303-33094.View ArticleGoogle Scholar - Novak JL, Feddema IT: A capacitance-based proximity sensor for whole arm obstacle avoidance.
*Robotics and Automation, 1992. Proceedings., 1992 IEEE International Conference On*1992, 1307-13142.View ArticleGoogle Scholar - Yu Y, Gupta K: Sensor-based roadmaps for motion planning for articulated robots in unknown environments: some experiments with an eye-in-hand system. In
*Intelligent Robots and Systems, 1999. IROS ‘99. Proceedings 1999, IEEE/RSJ International Conference On*. Kyongju, Korea; 1999:1707-17143.Google Scholar - Puls S, Graf J, Wörn H:
*Cognitive Robotics in Industrial Environments. Human Machine Interaction - Getting Closer (InTech, 2012)*. . http://www.intechopen.com/books/human-machine-interaction-getting-closer/cognitive-robotics-in-industrial-environments - Martinez-Salvador B, del Pobil AP, Perez-Francisco M: A hierarchy of detail for fast collision detection. In
*Intelligent Robots and Systems, 2000. (IROS 2000). Proceedings 2000 IEEE/RSJ International Conference On*. Takamatsu, Japan; 2000:745-7501.Google Scholar - Balan L, Bone GM: Real-time 3d collision avoidance method for safe human and robot coexistence. In
*Intelligent Robots and Systems, 2006 IEEE/RSJ International Conference On*. Beijing, China; 2006:276-282.Google Scholar - Corrales JA, Torres F, Candelas FA:
*IEEE ICRA 2010 Workshop on Multimodal Human-Robot Interfaces.*Anchorage, Alaska; 2010.Google Scholar - Corrales JA, Candelas FA, Torres F: Safe human-robot interaction based on dynamic sphere-swept line bounding volumes.
*Robot. Comput. Integr. Manuf*2011, 27(1):177-185. 10.1016/j.rcim.2010.07.005View ArticleGoogle Scholar - Nakabo Y, Saito H, Ogure T, Jeong SH, Yamada Y: Development of a safety module for robots sharing workspace with humans. In
*Intelligent Robots and Systems. IROS 2009. IEEE/RSJ International Conference On*. St. Louis, MO, USA; 2009:5345-5349.View ArticleGoogle Scholar - Baerveldt A-J: Cooperation between man and robot: interface and safety. In
*Robot and Human Communication, 1992. Proceedings. IEEE International Workshop On*. Tokyo; 1992:183-187.View ArticleGoogle Scholar - Kuhn S, Gecks T, Henrich D: Velocity control for safe robot guidance based on fused vision and force/torque data. In
*2006 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems*. Heidelberg, Germany; 2006:485-492.View ArticleGoogle Scholar - Ebert DM, Henrich DD: Safe human-robot-cooperation: image-based collision detection for industrial robots. In
*Intelligent Robots and Systems, 2002. IEEE/RSJ International Conference On*. Lausanne, Switzerland; 2002:1826-18312.View ArticleGoogle Scholar - Gecks T, Henrich D: Simero: camera supervised workspace for service robots. In
*2nd Workshop on Advances in Service Robotics, Fraunhofer IPA*. Germany; 2004.Google Scholar - Gecks T, Henrich D: Multi-camera collision detection allowing for object occlusions. In
*37th International Symposium on Robotics (ISR 2006)/4th German Conference on Robotics (Robotik 2006) VDI/VDE-Gesellschaft Mess- und Automatisierungstechnik*. München, Germany; 2006.Google Scholar - Henrich D, Kuhn S: Modeling intuitive behavior for safe human/robot coexistence cooperation. In
*Robotics and Automation, 2006. ICRA 2006. Proceedings 2006 IEEE International Conference On*. Orlando, FL; 2006:3929-3934.View ArticleGoogle Scholar - Fevery B, Wyns B, Boullart L, Llata JR, Torre-Ferrero C: Industrial robot manipulator guarding using artificial vision. In
*Robot Vision*. Edited by: Ales U. In-Tech, Vukovar; 2010:429-454.Google Scholar - Bascetta L, Ferretti G, Rocco P, Ardo H, Bruyninckx H, Demeester E, Lello ED: Towards safe human-robot interaction in robotic cells: an approach based on visual tracking and intention estimation. In
*Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ International Conference on*. San Francisco, CA; 2011:2971-2978.View ArticleGoogle Scholar - Kulic D, Croft EA: Real-time safety for human robot interaction.
*Robot. Autonom. Syst*2006, 54(1):1-12. 10.1016/j.robot.2005.10.005View ArticleGoogle Scholar - Bascetta L, Magnani G, Rocco P, Migliorini R, Pelagatti M: Anti-collision systems for robotic applications based on laser time-of-flight sensors. In
*In Proceedings of the IEEE/ASME International Conference on Advanced Intelligent Mechatronics*. (AIM). Montreal, Canada; 2010:278-284.Google Scholar - Flacco F, De Luca A: Multiple depth/presence sensors: integration and optimal placement for human/robot coexistence. In
*Robotics and Automation (ICRA), 2010 IEEE International Conference On*. Anchorage, Alaska; 2010:3916-3923.View ArticleGoogle Scholar - Graf J, Czapiewski P, Woern H: Evaluating risk estimation methods and path planning for safe human- robot cooperation. In
*Proceedings for the joint conference of ISR 2010 (41st Internationel Symposium on Robotics) und ROBOTIK 2010 (6th German Conference on Robotics)*. VDE Verlag, Munich; 2010:1-7.Google Scholar - Fischer M, Henrich D: 3d collision detection for industrial robots and unknown obstacles using multiple depth images. In
*Depth Images, German Workshop on Robotics*. Technical University of Braunschweig. Braunschweig, Germany; 2009.Google Scholar - Van den Bergh M, Van Gool L: Combining rgb and tof cameras for real-time 3d hand gesture interaction. In
*Applications of Computer Vision (WACV), 2011 IEEE Workshop On*. Kona, Hawaii; 2011:66-72.View ArticleGoogle Scholar - Park S, Yu S, Kim J, Kim S, Lee S: 3d hand tracking using kalman filter in depth space.
*EURASIP J. Adv. Sig. Proc*2012, 36-2012.Google Scholar - Bartczak B, Schiller I, Beder C, Koch R: Integration of a time-of-flight camera into a mixed reality system for handling dynamic scenes, moving viewpoints and occlusions in real-time.
*In Proceedings of the International Symposium on 3D Data Processing, Visualization and Transmission Workshop. Georgia Institute of Technology, Atlanta, GA, USA*2008.Google Scholar - Kolb A, Barth E, Koch R: Tof-sensors: new dimensions for realism and interactivity. In
*Computer Vision and Pattern Recognition Workshops, 2008. CVPRW’08. IEEE Computer Society Conference On*. Anchorage, Alaska, USA; 2008:1-6.Google Scholar - Kolb A, Barth E, Koch R, Larsen R: Time-of-flight cameras in computer graphics.
*Comput. Graph. Forum*2010, 29(1):141-159. 10.1111/j.1467-8659.2009.01583.xView ArticleGoogle Scholar - Ghobadi S, Loepprich O, Lottner O, Hartmann K, Loffeld O, Weihs W: Analysis of the personnel safety in a man-machine-cooperation using 2d/3d images. In
*Proceedings of the EURON/IARP International Workshop on Robotics for Risky Interventions and Surveillance of the Environment*. Edited by: Cervera Y, Baudoin EMR, Pender J. Benicassim – Spain; 2008:59-59.Google Scholar - Fischer M, Henrich D: Surveillance of robots using multiple colour or depth cameras with distributed processing. In
*Distributed Smart Cameras, 2009. ICDSC 2009. Third ACM/IEEE International Conference On*. Como, Italy; 2009:1-8.View ArticleGoogle Scholar - Walter C, Vogel C, Elkmann N: A stationary sensor system to support manipulators for safe human-robot interaction. In
*Robotics (ISR), 41st International Symposium on and 2010 6th German Conference on Robotics (ROBOTIK)*. Curran Associates, Inc., Munich; 2010:1-6.Google Scholar - Lange R:
*3d time-of-flight distance measurement with custom solid-state image sensors in cmos/ccd-technology*. University of Siegen, Siegen, Germany; 2000.Google Scholar - Clayton S: Kinect for Windows SDK to Arrive Spring 2011. . Accessed 21 April 2014 http://blogs.technet.com/b/microsoft_blog/archive/2011/02/21/kinect-for-windows-sdk-to-arrive-spring-2011.aspx
- Han J, Shao L, Xu D, Shotton J: Enhanced computer vision with microsoft kinect sensor: a review.
*IEEE Trans. Cybern*2013, 43(5):1318-1334.View ArticleGoogle Scholar - Liu L, Shao L: Learning discriminative representations from rgb-d video data. In
*Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence*. AAAI Press/International Joint Conferences on Artificial Intelligence, Beijing; 2013:1493-1500.Google Scholar - Flacco F, Kröger T, De Luca A, Khatib O: A depth space approach to human-robot collision avoidance. In
*Robotics and Automation (ICRA), 2012 IEEE International Conference On*. St Paul, MN, USA; 2012:338-345.View ArticleGoogle Scholar - Fuchs S:
*Calibration and multipath mitigation for increased accuracy of time-of-flight camera measurements in robotic applications. PhD thesis*. Technische Universität Berlin; 2012.Google Scholar - Kahlmann T:
*Range imaging metrology: Investigation, calibration and development. PhD thesis, Institute of Geodesy and Photogrammetry*. ETH Zurich; 2007.Google Scholar - Lindner M, Kolb A: Lateral and depth calibration of pmd-distance sensors.
*Ger. Res*2006, 4292(4292/2006):524-533.Google Scholar - Chiabrando F, Chiabrando R, Piatti D, Rinaudo F: Sensors for 3d imaging: metric evaluation and calibration of a ccd/cmos time-of-flight camera.
*Sensors*2009, 9(12):10080-10096. 10.3390/s91210080View ArticleGoogle Scholar - Powell MJD:
*Approximation Theory and Methods*. Cambridge University Press, New York; 1981.Google Scholar - MESA Imaging AG, Zurich, Switzerland; 2008.Google Scholar
- May S: 3D Time-of-flight ranging for robotic perception in dynamic environments (Doctoral Dissertation). Dusseldorf VDI-Verl.
*Univ. Osnabrück, Germany* 2009.Google Scholar - Robla S, Llata JR, Torre C, Sarabia EG: An approach for tracking oil slicks by using active contours on satellite images.
*OCEANS 2009 - EUROPE*2009, 1-8.View ArticleGoogle Scholar - Davies ER:
*Machine Vision, Third Edition: Theory, Algorithms, Practicalities (Signal Processing and Its Applications), 3rd edn*. Elsevier, Morgan Kaufmann, University of London, UK; 2005.Google Scholar - Friedman N, Russell S: Image segmentation in video sequences: a probabilistic approach. In
*Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence*. UAI’97, Morgan Kaufmann, San Francisco, Providence; 1997:175-181.Google Scholar - Piccardi M: Background subtraction techniques: a review. In
*Systems, Man and Cybernetics, 2004 IEEE International Conference On*. The Hague, The Netherlands; 2004:3099-31044.Google Scholar - Cristani M, Farenzena M, Bloisi D, Murino V: Background subtraction for automated multisensor surveillance: a comprehensive review.
*EURASIP J. Adv. Sig. Proc*2010, 2010: 1-24.View ArticleGoogle Scholar - Fuchs S, May S: Calibration and registration for precise surface reconstruction with tof cameras. In
*International Journal of Intelligent Systems Technologies and Applications*. Inderscience Publishers; 2008:274-284.Google Scholar - Bouguet J-Y: Camera Calibration Toolbox for Matlab. 2000.http://www.vision.caltech.edu/bouguetj/calib_doc/ . Accessed 03 March 2010Google Scholar
- Hartley R, Zisserman A:
*Multiple View Geometry in Computer Vision*. Cambridge University Press, New York; 2003.Google Scholar - Heikkila J, Silven O: A four-step camera calibration procedure with implicit image correction.
*Computer Vision and Pattern Recognition, 1997. Proceedings., 1997 IEEE Computer Society Conference On*1997, 1106-1112.View ArticleGoogle Scholar

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.