Visual sensor fusion for active security in robotic industrial environments
EURASIP Journal on Advances in Signal Processing volume 2014, Article number: 88 (2014)
Abstract
This work presents a method of information fusion involving data captured by both a standard charge-coupled device (CCD) camera and a time-of-flight (ToF) camera, to be used in the detection of the proximity between a manipulator robot and a human. Both cameras are assumed to be located above the work area of an industrial robot. The fusion of colour images and time-of-flight information makes it possible to know the 3D localization of objects with respect to a world coordinate system together with their colour information. Considering that the ToF information given by the range camera contains inaccuracies, including distance error, border error, and pixel saturation, some corrections to the ToF information are proposed and developed to improve the results. The proposed fusion method uses the calibration parameters of both cameras to reproject 3D ToF points, expressed in a common coordinate system for both cameras and a robot arm, into 2D colour images. In addition, using the 3D information, motion detection in an industrial robot environment is achieved, and the fusion of information is applied to the foreground objects previously detected. This combination of information results in a matrix that links colour and 3D information, making it possible to characterise an object by its colour in addition to its 3D localisation. Further development of these methods will make it possible to identify objects and their position in the real world and to use this information to prevent possible collisions between the robot and such objects.
1 Introduction
Since the 1960s, industrial robots have been used in the manufacturing industry, where they have replaced humans in various repetitive, dangerous, and hostile tasks. A consequence of the incorporation of robots in industry is the emergence of new risks of accidents for workers. The standards that cover, among many other aspects, these robot-related risks include the international standard ISO 10218, the American ANSI/RIA R15.06, the European EN 775, and national standards such as the Spanish UNE-EN 755. To prevent accidents, the selection of a security system must be based on the analysis of these risks. Traditionally, these security systems separate the robot workspace from the human one. One example of this requirement was reflected in the Spanish standard UNE-EN 755:1996 [1], which establishes that sensor systems have to be incorporated to prevent the entrance of humans into a hazardous area whenever the operating state of the robotic system implies danger to the human. According to traditional standards, maintenance, repair, or programming personnel can only be inside the robot workspace if the industrial robot is not in automatic mode.
However, in recent years, due in part to the flexible design of products, the optimization of production methods, and the introduction of new technologies, the tasks performed by industrial robots are no longer restricted to the transfer of objects, or other repetitive tasks. Instead, there is an increasing number of tasks in which humans and robots combine their skills in collaborative work.
To enable collaboration between human and robot, safety measures that establish a rigid separation between human and robot workspaces have to be removed. Instead, other types of security systems are required so that collisions can be avoided by detecting obstacles as well as their dynamic characteristics, and harm to the human can be mitigated in case of an unexpected impact. For this reason, research in this field is directed towards changing the way a human interacts with a robot, with the trend being that both human and robot share the same workspace at the same time. This change in the working relationship is reflected in the updates made since 2006 to the international standard ISO 10218 [2] and in guidelines for the implementation of these regulations, such as [3]. In these guidelines, new concepts are presented, such as collaborative robots, collaborative operations, and spaces of collaborative work.
Taking into account that security is a fundamental aspect in the design of robotic manufacturing systems, the development of systems and security strategies that allow safe collaborative work between human and robot is essential. The aim of this paper is to contribute to the initial stage of the design of a system for collision prevention between a human and a robot manipulator sharing a workspace at the same time. A method is proposed for processing the information acquired from two different types of vision sensors located above an industrial robot environment. The method, which is mainly focused on information captured by a time-of-flight camera, allows the fusion of colour and 3D information as an initial step towards the development of an active security system for application in an industrial robotics environment. This information fusion generates a colour and 3D information matrix which makes it possible to estimate simultaneously the colour characteristics of an object and its three-dimensional position in a world coordinate frame. At a later step, this combination of information will make it possible to associate a security volume with each characterised object, in order to prevent possible collisions between the industrial robot and the human.
2 Related work on shared human-robot workspaces
A brief summary of the different types of security applied to industrial robotic environments is provided in order to give context to the work presented in this paper. Figure 1 presents a possible classification of these types of security, as well as the goals to achieve for each type, the systems and devices used, and the actions to apply on the robotic system.
Security systems in industrial robotic environments can be classified as passive and active. Passive security systems are hazard warning elements which do not alter the robot behaviour. These systems are audible or visible signals, such as alarms or lights, or systems that prevent inadvertent access to a restricted area. Active security systems in industrial robotic environments can be defined as the methods used to prevent the intrusion of humans into the robot workspace when the robot is in automatic mode. The difference from passive methods is that active methods can modify the robot behaviour. Historically, devices such as movement, proximity, force, acceleration, or light sensors have been used to detect human access to the robot workspace and to stop the execution of the robot task. However, as discussed previously, research in this field is moving towards allowing humans and robots to share workspaces.
2.1 Collision avoidance
A further way to enhance safety in shared human-robot workspaces is to implement collision avoidance systems. Robots have been provided with sensors capturing local information. Ultrasonic sensors [4], capacitive sensors [5, 6], and laser scanner systems [7] have been tried to avoid collisions. However, the information provided by these sensors does not cover the whole scene, and so these systems can only provide a limited contribution to enhancing safety in human-robot collaboration tasks [8]. Moreover, geometric representations of humans and robotic manipulators have been used to obtain a spatial representation in human-robot collaboration tasks. Numerical algorithms are then used to compute the minimum distance between human and robot and to search for collision-free paths [9–12]. Methods have also been proposed that combine different types of devices to help avoid collisions. This idea has been applied to a cell production line for component exchange between human and robot in [13], where the safety module uses commands from light curtain sensors, joint angle sensors, and a control panel to prevent a collision with the human when exchanging an object. The discussion below concentrates on artificial vision systems, range systems, and their combination.
2.1.1 Artificial vision systems
Artificial vision systems have also been used to prevent human-robot collisions. Visual information can be used on its own or in combination with information from other types of devices. In order to achieve safe human-robot collaboration, [14] describes a safety system made up of two modules. One module is based on a camera and computer vision techniques to obtain the human location. The other module, which is based on accelerometers and joint position information, is used to prevent an unexpected robot motion due to a failure of robot hardware or software. Research work such as [15] investigates safety strategies for human-robot coexistence and cooperation, proposing the combination of visual information from two cameras with information from a force/torque sensor. In order to perform collision tests, other work has used visual information acquired by cameras [16, 17] to generate a 3D environment. Visual information is also used to separate humans and other dynamic unknown objects from the background [18] or to alter the behaviour of the robot [19]. In [20–22], visual information has been used to develop safety strategies based on fuzzy logic, probabilistic methods, or the calculation of a warning index, respectively.
2.1.2 Range systems
The depth map of a scene can be obtained by using depth sensors such as laser range finders and stereo camera systems. The results of using a laser time-of-flight (ToF) sensor are presented in [23] and [24], with the latter using several depth sensors in combination with presence sensors. Recently, a new type of camera has become available. These cameras, known as range-imaging cameras, 3D ToF cameras, or PMD cameras, capture information providing a 3D point cloud, among other information. They are starting to be used in active security systems for robotic industrial environments, among other applications. An example is a single framework for human-robot cooperation whose purpose is to achieve a scene reconstruction of a robotic environment by markerless kinematic estimation. In this line, [8, 25] use the information delivered by a 3D ToF camera mounted at the top of a robotic cell. This information is employed to extract robust features from the scene, which are the inputs to a module that estimates risks and controls the robot. In [26], the fusion of 3D information obtained from several range-imaging cameras and the application of the visual hull technique are used to estimate the presence of obstacles within the area of interest. The configurations of a robot model and its future trajectory, along with information on the detected obstacles, are used to check for possible collisions.
2.1.3 Combination of vision and range systems
This technique is based on the combination of 3D information from range cameras and 2D information from standard charge-coupled device (CCD) cameras. Although this technique is being used in other applications, such as hand following [27, 28] or mixed reality applications [29–31], not much work has been reported using it in the area of active security in robotic environments. In [32], an analysis of human safety in cooperation with a robot arm is performed. This analysis is based on information acquired by a 3D ToF camera and a 2D/3D Multicam. The 2D/3D Multicam consists of a monocular hybrid vision system which fuses range data from a PMD ToF sensor with 2D images from a conventional CMOS grey scale sensor. The proposed method establishes that while the 3D ToF camera monitors the whole area, any motion in the shared zones is analysed using the 2D/3D information from the Multicam. In [33], a general approach is introduced for surveillance of robotic environments using depth images from standard colour cameras or depth cameras. The fusion of data from CCD colour cameras or from ToF cameras is performed to obtain the object hull and its distance with respect to the known geometry of an industrial robot. The authors also present a comparison between distance information from colour and ToF cameras and a comparison between a single ToF camera and ToF information fusion. One of the conclusions of this work is that the fusion of information from several ToF cameras provides better resolution and less noise than the information obtained from a single camera. Finally, [34] describes a hybrid system based on a ToF camera and a stereo camera pair which is proposed for human-robot collaboration tasks. Stereo information is used at unreliable ToF data points to generate a depth map which is fused with the depth map from the ToF camera. Colour features are not taken into account.
On the other hand, nearly a decade after ToF cameras entered the industrial market [35], a new type of 3D sensor (the RGB-D sensor), fitted with an RGB camera and a 3D depth sensor, was launched for non-commercial use [36]. The RGB-D sensor has several advantages over ToF cameras, such as higher resolution, lower price, and the availability of both depth and colour information. Hence, its study and application have been the objective of research work such as [37], which presents a review of Kinect-based computer vision algorithms and applications. Several topics are covered, such as preprocessing tasks, including a review of Kinect recalibration techniques, object tracking and recognition, and human activity analysis. These authors propose in [38] an adaptive learning methodology to extract spatiotemporal features, simultaneously fusing the RGB and depth information. In addition, a review of several solutions for the fusion of RGB-D data is presented, and a website for downloading a dataset of RGB and depth information for hand gesture recognition is introduced. Regarding active security systems in industrial robotic environments, the Kinect sensor is being incorporated, as shown in [39], where a real-time collision avoidance approach based on this sensor is presented.
3 Method for the fusion of colour and 3D information
The presented method for the fusion of information acquired from a ToF camera and a colour camera takes a different standpoint from the ones proposed in the consulted papers. In papers that are not related to active security in robotic industrial environments, such as [27], the spatial transformation is performed by establishing the ToF camera coordinate system as the reference coordinate system. Therefore, if an object position in a world coordinate system needs to be known, another calibration has to be done to establish the rotation matrix and translation vector that connect both coordinate systems. In the present paper, this aspect has been taken into account: a common coordinate system is defined for an industrial robot, a colour camera, and a ToF camera, in order to know at the same time the 3D location of an object in the robot arm workspace and its colour features. Among papers focusing on mixed reality applications, the setup used in [29] includes a CCD FireWire camera, a ToF camera, and a fisheye camera. After performing the calibration and establishing the relative transformations between the different cameras, a background model is generated, which eliminates the need for chroma keying, supports planning and alignment of virtual content, and allows the actor to be segmented from the scene. Paper [31] presents a survey of the basic measurement principles of ToF cameras including, among other issues, camera calibration, range image preprocessing, and sensor fusion. Several studies that investigate different combinations of high-resolution cameras and lower-resolution ToF cameras are mentioned.
Among the papers focused on active security, the one most closely related to our work is [32]. Though a common world coordinate system for cameras and robot is also used, the method seems to present certain differences, because a spatial transform function is identified in order to map the image coordinates of the 2D sensor to the corresponding coordinates of the PMD sensor. Moreover, saturated pixel errors do not seem to have been considered. The work presented here takes a different standpoint: the parameters obtained from the camera calibration are used to transform the 3D point cloud given in the ToF camera coordinate system to the world coordinate system, and finally, the obtained internal and external parameters are used to reproject the corrected 3D points (corrected for distance error, saturated pixels, and the jump edge effect) into colour images.
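As a rough illustration of the reprojection idea outlined above, the following Python sketch maps a 3D point from the ToF camera frame to the world frame and then projects it into the colour image. It assumes an ideal pinhole model without lens distortion; the function names (`project_tof_point`, `rigid_transform`), the argument layout, and the numeric values in the usage note are hypothetical and are not taken from the calibration procedure of this paper.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def rigid_transform(rotation, translation, point):
    """Apply p' = R p + t to a single 3D point."""
    rotated = mat_vec(rotation, point)
    return [rotated[i] + translation[i] for i in range(3)]


def project_tof_point(p_tof, tof_to_world, world_to_colour, intrinsics):
    """Return the world coordinates and colour-image pixel of a ToF point.

    tof_to_world and world_to_colour are (R, t) pairs (assumed to come
    from an extrinsic calibration); intrinsics is (fx, fy, cx, cy) of the
    colour camera. Lens distortion is ignored in this sketch.
    """
    r_tw, t_tw = tof_to_world
    r_wc, t_wc = world_to_colour
    fx, fy, cx, cy = intrinsics
    # ToF camera frame -> world frame shared with the robot.
    p_world = rigid_transform(r_tw, t_tw, p_tof)
    # World frame -> colour camera frame.
    p_cam = rigid_transform(r_wc, t_wc, p_world)
    # Ideal pinhole projection onto the colour image plane.
    u = fx * p_cam[0] / p_cam[2] + cx
    v = fy * p_cam[1] / p_cam[2] + cy
    return p_world, (u, v)
```

For instance, with identity rotations, zero translations, and hypothetical intrinsics `(500.0, 500.0, 320.0, 240.0)`, the point `[0.1, -0.2, 2.0]` lands near pixel (345, 190). Each projected pixel can then be paired with its world coordinates, which is the kind of colour and 3D association the fusion aims at.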
With the aim of allowing any researcher to implement the proposed information fusion method exactly as it has been carried out in the present work, this paper gives a detailed mathematical description of the steps involved.
In what follows, it is assumed that a 3D ToF camera and a colour camera are fixed and placed over the workspace of a robot arm and that the fields of view of both cameras overlap. It is also assumed that external temperature conditions are constant and that the integration time parameter of the 3D ToF camera is automatically updated at each data acquisition. Image and 3D data from the scene are captured and processed as described in the next subsections. Assume that the ToF camera has a resolution of $n_x \times n_y$ and that the CCD camera has a resolution of $\widehat{n}_x \times \widehat{n}_y$.
In what follows, vectors and matrices are denoted by Roman bold characters (e.g. $\mathbf{x}$). The $j$th element of a vector $\mathbf{x}$ is denoted as $x_j$, and element $(i,k)$ of a matrix $\mathbf{A}$ is denoted as $A_{i,k}$. A superindex in parentheses such as $(j)$ denotes a node within a range of distances, and a subindex within square brackets such as $[i]$ denotes an element of a set.
3.1 Reduction of ToF range camera errors
The reduction of range camera errors is a fundamental step to achieve an acceptable fusion of colour and 3D information. The existence of these errors causes the fused information to have issues that range from minor, such as border inaccuracy, to serious, such as the loss of information at saturated pixel coordinates.
3.1.1 Distance error reduction
It is well documented that ToF cameras suffer from a nonlinear distance error, and several experiments have been carried out in order to model and correct this distance error (or circular error) [35, 40–43]. With the purpose of decreasing the influence of this error on distance measurements, a procedure is described below to correct the ToF distance values based on a study of the behaviour of the camera. This study requires a ToF camera positioned parallel to the floor and a flat panel of light colour and low reflectance mounted on a robot arm. The panel is also parallel to the floor. The robot arm displaces the panel along a distance range so that ToF data can be captured at different distances.
The distance error analysis from the acquired data can be performed in two ways: a global analysis of all the pixels without taking pixel position into account and an analysis which takes into account the position of each pixel. The first analysis is easier to perform as it only requires a relatively small panel; it is assumed that there is no error due to pixel localization and only a reduced region of the 3D ToF data is analysed. The second analysis can be carried out to check the suitability of the assumption of negligible error due to pixel localization of the first analysis. The second analysis requires a larger panel, as the distance image captured by the camera has to be based only on the panel for different distances. Both methods are described in the steps below.

1.
Image capture. Since distance measurements are influenced by the internal temperature of the camera, a minimum warm-up period is necessary to obtain stable measurements [43]. After the camera warms up, ToF information is captured at each of the $P$ nodes into which the distance range $D$ is divided. Each capture is defined by an amplitude matrix $\mathbf{A}$ of dimensions $n_x \times n_y$ and by 3D information made up of three coordinate matrices $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{Z}$, each of dimensions $n_x \times n_y$. In order to generate a model of the distance error, a set $\mathcal{Z}_{\mathcal{T}}^{(j)} = \{\mathbf{Z}_{T[1]}^{(j)}, \mathbf{Z}_{T[2]}^{(j)}, \dots, \mathbf{Z}_{T[N]}^{(j)}\}$ of distance information in the $z$ axis is formed by capturing $N$ images at each node $j$, with $j = 1, \dots, P$. Similarly, sets of distance information for training are defined for the $x$ and $y$ axes, denoted as $\mathcal{X}_{\mathcal{T}}^{(j)}$ and $\mathcal{Y}_{\mathcal{T}}^{(j)}$, respectively. In order to validate the model so obtained, a set $\mathcal{Z}_{\mathcal{V}}^{(j)} = \{\mathbf{Z}_{V[1]}^{(j)}, \mathbf{Z}_{V[2]}^{(j)}, \dots, \mathbf{Z}_{V[M]}^{(j)}\}$ of distance information is also formed by capturing $M$ additional images at each node $j$, with $j = 1, \dots, P$. Similarly, sets of distance information for validation are defined for the $x$ and $y$ axes, denoted as $\mathcal{X}_{\mathcal{V}}^{(j)}$ and $\mathcal{Y}_{\mathcal{V}}^{(j)}$, respectively.
In this article, the sets of information $\mathcal{Z}_{\mathcal{T}}$ and $\mathcal{Z}_{\mathcal{V}}$ are also called ToF distance images and are defined as $\mathcal{Z}_{\mathcal{T}} = \{\mathcal{Z}_{\mathcal{T}}^{(1)}, \dots, \mathcal{Z}_{\mathcal{T}}^{(P)}\}$ and $\mathcal{Z}_{\mathcal{V}} = \{\mathcal{Z}_{\mathcal{V}}^{(1)}, \dots, \mathcal{Z}_{\mathcal{V}}^{(P)}\}$.

2.
Angle correction. Correction angles are applied to the ToF information sets for each axis $x$, $y$, and $z$, with the aim of compensating for any 2D angular deviation between the $(x,y)$ plane of the range camera and the plane defined by the floor. This 2D angular deviation is denoted by the angles $\theta_x$ and $\theta_y$. This correction yields parameter values as if both camera and panel were perfectly parallel.
Given an $x$ axis distance image $\mathbf{X}_T$ of dimensions $n_x \times n_y$, define its submatrix $\widehat{\mathbf{x}}$ of dimensions $n_1 \times n_2$, where $n_1 < \mathrm{int}(n_x/2)$ and $n_2 < \mathrm{int}(n_y/2)$, as a matrix formed such that its top left element $\widehat{x}_{1,1}$ corresponds to element $(\mathbf{X}_T)_{i_c,j_c}$. Index $i_c$ is chosen as $\mathrm{int}(n_x/2)$, and index $j_c$ is chosen as $\mathrm{int}(n_y/2)$. Similarly, submatrices $\widehat{\mathbf{y}}$ and $\widehat{\mathbf{z}}$ are defined for axes $y$ and $z$, respectively. Define $\bar{\mathbf{x}}$, $\bar{\mathbf{y}}$, and $\bar{\mathbf{z}}$ as the columnwise vectorised forms of submatrices $\widehat{\mathbf{x}}$, $\widehat{\mathbf{y}}$, and $\widehat{\mathbf{z}}$, each with dimension $n \times 1$, where $n = n_1 n_2$ is the number of pixels in the selected area. This central region is taken from each ToF distance image to estimate and correct the 2D angle inclination between the panel and the ToF camera. Hence, for each image region, 3D points are modified using the rotation matrices $\mathbf{R}_x$ and $\mathbf{R}_y$:
$$\mathbf{R}_x = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_x & -\sin\theta_x \\ 0 & \sin\theta_x & \cos\theta_x \end{bmatrix}, \qquad \mathbf{R}_y = \begin{bmatrix} \cos\theta_y & 0 & \sin\theta_y \\ 0 & 1 & 0 \\ -\sin\theta_y & 0 & \cos\theta_y \end{bmatrix} \tag{1}$$
such that
$$\mathbf{G} = \mathbf{R}_x \begin{bmatrix} \bar{\mathbf{x}} & \bar{\mathbf{y}} & \bar{\mathbf{z}} \end{bmatrix}^T \tag{2}$$
where $\mathbf{G}$ has dimensions $3 \times n$. The transformed image region for the $z$ coordinate is obtained from the third row of $\mathbf{G}$:
$$\bar{\mathbf{z}}' = \begin{bmatrix} G_{3,1} & G_{3,2} & \dots & G_{3,n} \end{bmatrix}^T \tag{3}$$
and in this way, a vector $\bar{\mathbf{z}}'$ of dimensions $n \times 1$ is defined.
A second rotation transformation is applied around the $y$ axis such that
$$\mathbf{H} = \mathbf{R}_y \begin{bmatrix} \bar{\mathbf{x}} & \bar{\mathbf{y}} & \bar{\mathbf{z}}' \end{bmatrix}^T \tag{4}$$
The transformed image region for the $z$ coordinate is obtained from the third row of $\mathbf{H}$:
$$\bar{\mathbf{z}}'' = \begin{bmatrix} H_{3,1} & H_{3,2} & \dots & H_{3,n} \end{bmatrix}^T \tag{5}$$
where $\bar{\mathbf{z}}''$ is of dimension $n \times 1$. Since the above rotation causes a displacement of the 3D points along the $y$ axis, the $\bar{\mathbf{y}}$ vector is used to represent ToF information after angle correction. In this way, the 3D ToF vectors after angle correction are $\bar{\mathbf{x}}$, $\bar{\mathbf{y}}$, and $\bar{\mathbf{z}}''$, each one of dimensions $n \times 1$.
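The angle correction step can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the rotation matrices are taken to be the standard right-handed rotations about the $x$ and $y$ axes, only the $z$ component is kept after each rotation while the original $x$ and $y$ values are retained, and the function names (`rotate_x`, `rotate_y`, `angle_correct`) are hypothetical.

```python
import math


def rotate_x(points, theta_x):
    """Rotate (x, y, z) points about the x axis by theta_x radians."""
    c, s = math.cos(theta_x), math.sin(theta_x)
    return [(x, c * y - s * z, s * y + c * z) for x, y, z in points]


def rotate_y(points, theta_y):
    """Rotate (x, y, z) points about the y axis by theta_y radians."""
    c, s = math.cos(theta_y), math.sin(theta_y)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in points]


def angle_correct(points, theta_x, theta_y):
    """Return (x, y, z'') for each point of the selected image region.

    The z value is taken from the rotated points, while the original
    x and y values are kept (an assumption of this sketch).
    """
    # First rotation (about x): keep only the transformed z, i.e. z'.
    z1 = [p[2] for p in rotate_x(points, theta_x)]
    after_x = [(x, y, zp) for (x, y, _), zp in zip(points, z1)]
    # Second rotation (about y): keep only the transformed z, i.e. z''.
    z2 = [p[2] for p in rotate_y(after_x, theta_y)]
    return [(x, y, zpp) for (x, y, _), zpp in zip(points, z2)]
```

As a sanity check, a flat panel at 2 m that has been tilted by 0.05 rad about the $x$ axis is mapped back to a constant $z$ of 2 m when `angle_correct` is called with `theta_x = 0.05` and `theta_y = 0.0`.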

3.
If the pixel position is not considered, then:

(a)
Discrepancy curve calculation stage. In order to test the effect of the angle correction on the distance error, the same procedure is applied using data before and after angle correction; the method is described here using data after angle correction. The selected area is used to calculate several parameters, including the mean distance value, the discrepancy distance value, and the mean squared error (MSE). Define a set of distances after angle correction $\bar{\mathcal{Z}}^{\prime\prime(j)} = \{\bar{\mathbf{z}}^{\prime\prime(j)}_{[1]}, \bar{\mathbf{z}}^{\prime\prime(j)}_{[2]}, \dots, \bar{\mathbf{z}}^{\prime\prime(j)}_{[N]}\}$ at each node $j$, with $j = 1, \dots, P$. The mean ToF distance over the selected area in all ToF distance images, $\bar{Z}_j$, at each node $j$, is calculated by means of:
$$\bar{Z}_j = \frac{1}{nN} \sum_{i=1}^{N} \sum_{k=1}^{n} \bar{z}^{\prime\prime(j)}_{[i]k} \tag{6}$$
where the resulting $\bar{\mathbf{Z}}$ is a vector with dimensions $P \times 1$.
Define $L_j$ as the distance value obtained by a laser distance meter at each node $j$ (henceforth treated as ground truth), and the vector $\mathbf{L} = [L_1, \dots, L_P]^T$, with dimensions $P \times 1$. Then, the discrepancy distance vector, $\boldsymbol{\delta}_d$, is calculated as the difference between the mean distance from the ToF camera after angle correction, $\bar{\mathbf{Z}}$, and the ground truth vector $\mathbf{L}$:
$$\boldsymbol{\delta}_d = \bar{\mathbf{Z}} - \mathbf{L} \tag{7}$$
In order to obtain correction values to be applied to new ToF distance images, a cubic spline is fitted to this discrepancy information for each distance. The cubic spline is modelled as a function $s$ that passes through all the points $(\bar{\mathbf{Z}}, \boldsymbol{\delta}_d)$ and that, on each interval $[\bar{Z}_j, \bar{Z}_{j+1}]$, is expressed as a polynomial
$$s(Z) = a_0 + a_1 Z + a_2 Z^2 + a_3 Z^3, \qquad Z \in [\bar{Z}_j, \bar{Z}_{j+1}] \tag{8}$$
where $j = 1, \dots, P-1$. For each subinterval, the coefficients $a_0, a_1, a_2, a_3$ are calculated so that the curve passes through the points $(\bar{Z}_j, \delta_{d_j})$ and $(\bar{Z}_{j+1}, \delta_{d_{j+1}})$ [44]. The resulting spline, henceforth called the discrepancy curve, makes it possible to estimate the discrepancy correction value for a given ToF distance.

(b)
Discrepancy correction. In order to reduce the errors in the distance estimates obtained from the ToF information, the set of ToF distance images for validation $\mathcal{Z}_{\mathcal{V}}$ is used to validate the discrepancy curve. To this end, a vector of validation ToF distances after angle correction $\bar{\mathbf{z}}''_v$ (dimension $n \times 1$) is defined and evaluated on the discrepancy curve to obtain the vector of correction values $\mathbf{C}$ (dimension $n \times 1$). Then, the corrected distance vector for a distance image after its angle correction is calculated as follows:
$$\mathring{\mathbf{z}} = \bar{\mathbf{z}}''_v - \mathbf{C} \tag{9}$$
Defining $\mathring{\mathcal{Z}}^{(j)} = \{\mathring{\mathbf{z}}^{(j)}_{[1]}, \mathring{\mathbf{z}}^{(j)}_{[2]}, \dots, \mathring{\mathbf{z}}^{(j)}_{[M]}\}$ as the set of distances after discrepancy correction for each node $j$, with $j = 1, \dots, P$, the mean value after discrepancy correction for the $M$ ToF distance images obtained at each node $j$ is calculated as follows:
$$\mathring{Z}_j = \frac{1}{nM} \sum_{i=1}^{M} \sum_{k=1}^{n} \mathring{z}^{(j)}_{[i]k} \tag{10}$$
with $j = 1, \dots, P$, and where the resulting $\mathring{\mathbf{Z}}$ is a vector with dimensions $P \times 1$.
In order to observe the effect that these corrections have on the 3D ToF points, the MSE can be calculated before and after the discrepancy correction. Defining for each node $j$ a vector with the corresponding laser distance meter values $\mathbf{L}'^{(j)} = [L^{(j)}, \dots, L^{(j)}]^T$ with dimension $n \times 1$ (treated here as ground truth), the mean squared error at each pixel $k$ and for node $j$ can be calculated as
$$\mathrm{MSE}_k^{(j)} = \frac{1}{N'} \sum_{i=1}^{N'} \left\| Z^{(j)}_{[i]k} - L'^{(j)}_k \right\|^2 \tag{11}$$
where $\|\cdot\|$ is the Euclidean norm, $N'$ is the number of ToF distance images used, and $\mathbf{Z}$ is a vector of ToF distance values that can be substituted by the angle corrected vector $\bar{\mathbf{z}}''$ of each distance image, or by the discrepancy corrected vector $\mathring{\mathbf{z}}$ of each distance image, each one with dimension $n \times 1$, and with $j = 1, \dots, P$. The set of $\mathrm{MSE}_k^{(j)}$ values for $k = 1, \dots, n$ gives an indication of the planar distribution of the distance error for a given node $j$. Then, for a given node $j$, it is possible to average the mean squared errors to obtain an indication of the error depending on the node position:
$$\mathrm{MSE}^{(j)} = \frac{1}{n} \sum_{k=1}^{n} \mathrm{MSE}_k^{(j)} \tag{12}$$
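The global analysis described above (discrepancy curve, correction, and MSE) can be sketched in plain Python as follows. The text only states that a cubic spline is used; the natural boundary conditions chosen here, the function names, and the requirement that the mean ToF distances be strictly increasing are assumptions of this sketch. Each spline segment is stored in powers of the offset from its left knot, which is simply one way of writing a cubic.

```python
from bisect import bisect_right


def fit_discrepancy_curve(mean_tof, laser):
    """Natural cubic spline through (mean ToF distance, ToF minus laser)."""
    xs = list(mean_tof)  # knots, assumed strictly increasing
    ys = [z - ref for z, ref in zip(mean_tof, laser)]  # discrepancy per node
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = (3 * (ys[i + 1] - ys[i]) / h[i]
                    - 3 * (ys[i] - ys[i - 1]) / h[i - 1])
    # Tridiagonal solve for the second-order coefficients (natural ends).
    diag, mu, zed = [1.0] * (n + 1), [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(1, n):
        diag[i] = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / diag[i]
        zed[i] = (alpha[i] - h[i - 1] * zed[i - 1]) / diag[i]
    b, c, d = [0.0] * n, [0.0] * (n + 1), [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = zed[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])

    def curve(dist):
        # Pick the segment containing dist; end segments extrapolate.
        j = min(max(bisect_right(xs, dist) - 1, 0), n - 1)
        t = dist - xs[j]
        return ys[j] + b[j] * t + c[j] * t * t + d[j] * t ** 3

    return curve


def correct_distances(z_values, curve):
    """Subtract the discrepancy predicted by the curve from each distance."""
    return [z - curve(z) for z in z_values]


def mse_per_pixel(images, laser_distance):
    """Per-pixel MSE over the images (lists of n distances) of one node."""
    return [sum((img[k] - laser_distance) ** 2 for img in images) / len(images)
            for k in range(len(images[0]))]


def mse_node(images, laser_distance):
    """Average of the per-pixel MSE values for one node."""
    per_pixel = mse_per_pixel(images, laser_distance)
    return sum(per_pixel) / len(per_pixel)
```

A typical use would fit the curve on the training means, apply `correct_distances` to each validation image, and compare `mse_node` before and after correction; a lower value after correction indicates that the discrepancy curve is helping.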

4.
If the position of each pixel is taken into account, then:

(a)
Discrepancy curves calculation stage. Using the $N$ angle corrected ToF distance images represented by $\bar{\mathbf{z}}''$, a discrepancy curve is calculated for each pixel along the distance nodes. At this stage, using $N$ images at each node $j$, the mean value of each pixel $k$, where $k = 1, \dots, n$, is calculated as follows:
$$\bar{V}_{k,j} = \frac{1}{N} \sum_{i=1}^{N} \bar{z}^{\prime\prime(j)}_{[i]k} \tag{13}$$
where the resulting $\bar{\mathbf{V}}$, whose elements are the values $\bar{V}_{k,j}$, is a matrix with dimensions $n \times P$.
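Define a new matrix $\mathbf{L}''$ of dimension $n \times P$ which is obtained by replicating $n$ times the laser distances vector $\mathbf{L}^T$ as follows:
$$\mathbf{L}'' = \begin{bmatrix} \mathbf{L}^T \\ \vdots \\ \mathbf{L}^T \end{bmatrix} = \begin{bmatrix} L_1 & \cdots & L_P \\ \vdots & & \vdots \\ L_1 & \cdots & L_P \end{bmatrix} \tag{14}$$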
Then, the discrepancy distances $\boldsymbol{\delta}_v$ for all the nodes $j$ are calculated for each pixel $k = 1, \dots, n$ as the difference between the mean distance from the ToF camera after angle correction, $\bar{\mathbf{V}}$, and the ground truth distances $\mathbf{L}''$ obtained using a laser distance meter:
$$\boldsymbol{\delta}_v = \bar{\mathbf{V}} - \mathbf{L}'' \tag{15}$$
with $\boldsymbol{\delta}_v$ of dimension $n \times P$.
In order to obtain $n$ correction values to be applied to any new ToF distance image, a cubic spline is calculated to fit this discrepancy information along the distance range for each pixel. The cubic spline is modelled at each pixel $k$ using Equation 8 and the data points $(\bar{\mathbf{V}}, \boldsymbol{\delta}_v)$.

(b)
Correction using a discrepancy curve at each pixel. In order to reduce the errors in the ToF distance images, the set of ToF distance images $\mathcal{Z}_{\mathcal{V}}$ is used to validate the discrepancy curve at each pixel. To this end, each pixel $k$ of the validation vector after angle correction $\bar{\mathbf{z}}''_v$ (dimension $n \times 1$) is evaluated on its discrepancy curve to obtain the vector of correction values $\mathbf{C}_v$ (dimension $n \times 1$). Then, the corrected distance vector $\mathring{\mathbf{v}}$ (dimension $n \times 1$) is obtained using the expression
$$\mathring{\mathbf{v}} = \bar{\mathbf{z}}''_v - \mathbf{C}_v \tag{16}$$
Define {\stackrel{\circ}{\mathcal{V}}}^{\left(j\right)}=\left\{{\stackrel{\circ}{\mathbf{v}}}_{\left[1\right]}^{\left(\phantom{\rule{0.3em}{0ex}}j\right)},{\stackrel{\circ}{\mathbf{v}}}_{\left[2\right]}^{\left(\phantom{\rule{0.3em}{0ex}}j\right)},\mathrm{...},{\stackrel{\circ}{\mathbf{v}}}_{\left[M\right]}^{\left(\phantom{\rule{0.3em}{0ex}}j\right)}\right\} as the set of distances after discrepancy correction, where the mean value at each pixel k for each j node is calculated as follows:
with k=1,…,n, j=1,…,P, and where the resulting \stackrel{\circ}{\mathbf{v}} is a matrix with elements {\stackrel{\circ}{V}}_{k,j} and dimensions n×P of mean ToF distances values at each pixel for each node.
The mean squared error is obtained by means of Equation 11, where Z is replaced by the corrected values \stackrel{\circ}{\mathbf{v}}.
A comparison of the MSE values for discrepancy corrected and noncorrected measurements gives a measure of improvement in accuracy due to the discrepancy correction. If no such improvement is detected, then it is recommended to revise the experimental conditions as this may indicate the existence of problems with the experiment.
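The two stages above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: images are flattened lists of n pixel distances, the function names are hypothetical, and the cubic spline of Equation 8 is replaced by piecewise-linear interpolation to keep the example self-contained. It also assumes that the mean distances of each pixel increase strictly from node to node.

```python
from bisect import bisect_right


def mean_per_pixel(images):
    """Eqs. 13 and 17: average a list of flattened images pixel by pixel."""
    return [sum(col) / len(images) for col in zip(*images)]


def discrepancy_curves(images_per_node, laser):
    """Return (v_bar, delta), both indexed [pixel k][node j] (Eq. 15)."""
    means = [mean_per_pixel(imgs) for imgs in images_per_node]  # [node][pixel]
    v_bar = [list(row) for row in zip(*means)]                  # [pixel][node]
    delta = [[v - l for v, l in zip(row, laser)] for row in v_bar]
    return v_bar, delta


def evaluate_curve(x, xs, ys):
    """Piecewise-linear stand-in for the cubic spline (assumption), clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


def correct_image(z, v_bar, delta):
    """Eq. 16: subtract from each pixel the correction read from its own curve."""
    return [z_k - evaluate_curve(z_k, v_bar[k], delta[k]) for k, z_k in enumerate(z)]


# Toy data: P = 3 nodes, N = 2 images per node, n = 2 pixels per image (metres).
laser = [1.0, 2.0, 3.0]
images_per_node = [
    [[1.10, 1.20], [1.10, 1.20]],
    [[2.05, 2.15], [2.15, 2.25]],
    [[3.10, 3.20], [3.10, 3.20]],
]
v_bar, delta = discrepancy_curves(images_per_node, laser)
corrected = correct_image([1.60, 2.70], v_bar, delta)
```

In this toy data the first pixel overestimates the distance by about 0.1 m and the second by about 0.2 m at every node, so the corrected values are close to 1.5 m and 2.5 m. In practice a cubic spline fit would replace `evaluate_curve`.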
3.1.2 Correcting the values of saturated pixels
Information from range cameras can be affected by pixel saturation, which is caused by an excessive reflection of light from objects. Though its effect can be reduced by automatically updating the integration time parameter of the ToF camera [31], in some circumstances, such as the presence of metal or reflective paints, this tool is not enough.
The saturation of range camera information affects the amplitude and distance values returned by the range camera. These values are very different from the remaining pixel values of the scene. The proposed strategy to detect saturated pixels is based on this fact, and an analysis of amplitude signal is made. The method has two stages.

1.
Looking for saturated pixels. According to [45], pixel saturation occurs when the amplitude values are greater than a given threshold value ζ, which depends on the camera being employed. Hence, the amplitude image is searched for values greater than or equal to this value in order to generate a saturation binary mask M with ones at the positions of the saturated pixels and zeros elsewhere. To be able to perform the correction on pixels located at the edges of the image, the amplitude and 3D information matrices are augmented by replicating rows and columns located at the edges of the matrix. Define p as the number of rows and columns of A to be replicated. Define the p upper rows of A as B_{i,j}=A_{i,j}, such that \mathbf{B} is of dimension p×n_{y}, where i=1,…,p and j=1,…,n_{y}, and the p lower rows as B'_{i,j}=A_{n_{x}-i+1,j}, such that \mathbf{B}' is of dimension p×n_{y}, where i=1,…,p and j=1,…,n_{y}. Define the intermediate matrix \widehat{\mathbf{A}} as follows:
\widehat{\mathbf{A}}=\left[\begin{array}{c}\mathbf{B}\\ \mathbf{A}\\ \mathbf{B}'\end{array}\right]\quad(18)
where \widehat{\mathbf{A}} is a matrix of dimension (2p+n_{x})×n_{y}. Then, define the left p columns of \widehat{\mathbf{A}} as B''_{i,j}=\widehat{A}_{i,j}, such that \mathbf{B}'' is of dimension (2p+n_{x})×p, where i=1,…,2p+n_{x} and j=1,…,p, and the p right columns as B'''_{i,j}=\widehat{A}_{i,n_{y}-j+1}, such that \mathbf{B}''' is of dimension (2p+n_{x})×p, where i=1,…,2p+n_{x} and j=1,…,p. Then, the augmented amplitude matrix \tilde{\mathbf{A}} of dimensions (2p+n_{x})×(2p+n_{y}) is given by:
\tilde{\mathbf{A}}=\left[\begin{array}{ccc}\mathbf{B}''& \widehat{\mathbf{A}}& \mathbf{B}'''\end{array}\right]\quad(19)
To represent saturated pixels in \tilde{\mathbf{A}}, the binary mask matrix \tilde{\mathbf{M}} of dimensions (2p+n_{x})×(2p+n_{y}) is defined by
\tilde{M}_{i,j}=\begin{cases}1 & \text{if } \tilde{A}_{i,j}\ge \zeta \\ 0 & \text{otherwise}\end{cases}\quad(20)
where i=1,…,n_{x}+2p, j=1,…,n_{y}+2p.
The set of index pairs indicating the positions of saturated pixels is defined as follows
\mathcal{Q}=\{(r,c)\in \mathcal{I}\times \mathcal{J}:\tilde{M}_{r,c}=1\}\quad(21)
where \mathcal{I}=[1,\dots,n_{x}+2p], \mathcal{J}=[1,\dots,n_{y}+2p].

2.
Correction of saturated pixels. In order to replace an incorrect value with the average of its neighbours, the saturation binary mask is used to find the coordinates of saturated values in the amplitude and 3D matrices and to calculate the mean value of the surrounding pixels. Saturated values are not taken into account in this calculation. Define a window mask \stackrel{\circ}{M}_{i,j}=\tilde{M}_{r-p+i-1,c-p+j-1}, with i=1,…,2p+1 and j=1,…,2p+1, of dimensions (2p+1)×(2p+1), whose centre is each saturated pixel with position (r,c)\in \mathcal{Q}. In order to calculate a new pixel value to replace a saturated pixel value, define a window of amplitude values \stackrel{\circ}{A}_{i,j}=\tilde{A}_{r-p+i-1,c-p+j-1}, with i=1,…,2p+1 and j=1,…,2p+1, of dimensions (2p+1)×(2p+1), whose centre, at window position (p+1,p+1), corresponds to each saturated pixel. The new value \tilde{A}_{r,c} for each saturated pixel (r,c)\in \mathcal{Q} is calculated as
\tilde{A}_{r,c}=\frac{1}{(2p+1)(2p+1)-1}\left(\sum_{k=1}^{2p+1}\sum_{l=1}^{2p+1}\stackrel{\circ}{A}_{k,l}-\stackrel{\circ}{A}_{p+1,p+1}\right)\quad(22)
With the aim of selecting and replacing values in the amplitude/3D information matrices, Figure 2 shows an example of the movement of a search window obtained from the binary saturation mask.
Define (X,Y) as the initial ToF data and \stackrel{\circ}{z} as the ToF distance data after discrepancy correction. As these values are also affected by the amplitude saturation, a similar procedure, based on the index set of saturated amplitude values, is applied to correct the corresponding values of these matrices, obtaining matrices (\tilde{\mathbf{X}},\tilde{\mathbf{Y}},\tilde{\mathbf{Z}}). Once saturated pixels are corrected, all matrices are resized to their initial dimensions by removing the rows and columns previously added, which results in matrices X^{′},Y^{′},Z^{′}, and A^{′}.
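The saturation correction can be sketched as follows. This is an illustrative outline with hypothetical function names, using plain nested lists. The border augmentation follows the definitions of B, B′, B″, and B‴ as stated, and the average skips every neighbour flagged as saturated, following the text; when the centre is the only saturated pixel in the window, this coincides with Equation 22. Only pixels of the original (non-augmented) area are corrected, which is an assumption of this sketch.

```python
def augment(a, p):
    """Add p rows/columns on each side as described for Eqs. 18 and 19."""
    n_x = len(a)
    top = [row[:] for row in a[:p]]                      # B: upper p rows
    bottom = [a[n_x - 1 - i][:] for i in range(p)]       # B': lower rows, last row first
    a_hat = top + [row[:] for row in a] + bottom
    # B'': left p columns; B''': right p columns, last column first
    return [row[:p] + row + row[::-1][:p] for row in a_hat]


def correct_saturated(a, zeta, p=1):
    """Replace each saturated value by the mean of its non-saturated neighbours."""
    n_x, n_y = len(a), len(a[0])
    a_t = augment(a, p)
    mask = [[1 if v >= zeta else 0 for v in row] for row in a_t]   # Eq. 20
    out = [row[:] for row in a]       # result already has the original size
    for r in range(p, p + n_x):
        for c in range(p, p + n_y):
            if not mask[r][c]:
                continue
            neighbours = [a_t[i][j]
                          for i in range(r - p, r + p + 1)
                          for j in range(c - p, c + p + 1)
                          if not mask[i][j]]
            if neighbours:            # leave the value untouched if all are saturated
                out[r - p][c - p] = sum(neighbours) / len(neighbours)
    return out


amplitude = [[10, 10, 10], [10, 30000, 10], [10, 10, 10]]
fixed = correct_saturated(amplitude, zeta=20000)
```

To correct the X, Y, and Z matrices in the same way, the mask would be computed once from the amplitude matrix and reused for each of them.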
3.1.3 Jump edge reduction
Another error that may affect the 3D data from a range camera is known as jump edge. This error produces spurious pixels which are inaccurate 3D measurements of the real scene. In order to reduce this effect, the use of a median filter followed by a jump edge filter based on a local neighbourhood is proposed in [46]. Other solutions which implement non-local means filters or edge-directed resampling techniques are listed in [31]. In the present work, the use of 2D techniques applied to 3D points is proposed to prevent border inaccuracy in fused information. Traditionally, the morphological gradient technique is used in grey scale images to emphasize transitions of grey levels [47, 48]. In this work, only distance values from 3D data are used, generating a distance image. With the objective of finding pixels suffering from this effect, the morphological gradient is calculated using the following expression [48]:
g=(f\oplus S)-(f\ominus S)\quad(23)
where g is of dimension n_{ x }×n_{ y }, f is a ToF distance matrix of the same dimension as g, S is a 3×3 generalised dilation or erosion mask, and ⊕ and ⊖ are the dilation and erosion operations, respectively.
A threshold value to discriminate undesirable pixels from the remaining ones is then sought. With this aim, the distance image g is transformed into a new distance image G with values ranging from 0 to 255, by means of the following transformation:
After that, the histogram of G is calculated and then smoothed by means of a Butterworth filter. Finally, a threshold value η is defined by searching along the smoothed histogram for the first minimum to the right of the first maximum. A new distance matrix f^{′} is generated by forcing to zero spurious pixels which are found and keeping the same distance values for the remaining pixels:
When performing the fusion of ToF and colour information, jump edge reduction is carried out after scaling up the ToF information, as discussed below.
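The jump edge reduction steps can be sketched as follows. Several details are assumptions of this example rather than the authors' choices: S is taken as a flat 3×3 mask, the rescaling to 0..255 is a min-max normalisation, the histogram is smoothed in a single forward pass, and a pixel is treated as spurious when its grey level is strictly greater than η. For the smoothing, a first-order low-pass Butterworth filter with normalized cutoff 0.5 (the setting reported in the experiments) designed by the bilinear transform reduces to the two-tap average y[k] = 0.5·(x[k] + x[k−1]), which is what the code uses.

```python
def morphological_gradient(f):
    """Dilation minus erosion with a flat 3x3 mask (borders use available neighbours)."""
    rows, cols = len(f), len(f[0])
    g = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            window = [f[a][b]
                      for a in range(max(i - 1, 0), min(i + 2, rows))
                      for b in range(max(j - 1, 0), min(j + 2, cols))]
            g[i][j] = max(window) - min(window)
    return g


def to_grey_levels(g):
    """Map g to integers in 0..255 (min-max scaling, an assumption)."""
    lo = min(min(row) for row in g)
    hi = max(max(row) for row in g)
    if hi == lo:
        return [[0] * len(row) for row in g]
    return [[round(255 * (v - lo) / (hi - lo)) for v in row] for row in g]


def smoothed_histogram(grey):
    """256-bin histogram followed by the two-tap average y[k] = 0.5 * (x[k] + x[k-1])."""
    hist = [0] * 256
    for row in grey:
        for v in row:
            hist[v] += 1
    return [0.5 * (hist[k] + (hist[k - 1] if k else 0)) for k in range(256)]


def first_minimum_after_first_maximum(h):
    k = 0
    while k + 1 < len(h) and h[k + 1] >= h[k]:   # climb to the first maximum
        k += 1
    while k + 1 < len(h) and h[k + 1] < h[k]:    # descend to the next minimum
        k += 1
    return k


def reduce_jump_edges(f):
    """Set to zero the distance of pixels whose gradient level exceeds the threshold."""
    grey = to_grey_levels(morphological_gradient(f))
    eta = first_minimum_after_first_maximum(smoothed_histogram(grey))
    return [[0.0 if gv > eta else fv for fv, gv in zip(frow, grow)]
            for frow, grow in zip(f, grey)]


# A vertical depth step between columns 3 and 4 (1 m to 3 m).
scene = [[1.0, 1.0, 1.0, 3.0, 3.0, 3.0] for _ in range(4)]
cleaned = reduce_jump_edges(scene)
```

Note that the 3×3 gradient marks pixels on both sides of a depth step, so valid border pixels are removed together with the spurious ones. This is consistent with the wide edges the morphological filter is reported to produce.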
3.2 Colour and 3D information fusion
Information fusion from a standard CCD camera and a ToF camera allows the simultaneous use of 3D and colour information. This can be achieved by means of the reprojection of 3D ToF points into a colour image. In an active security system, moving objects, such as robots and humans, have to be detected to prevent possible collisions between them. To obtain information about these objects and develop the algorithms that make it possible to avoid collisions, the foreground detection is carried out in such way that the fused information is obtained only through those pixels classified previously as foreground pixels. The foreground object detection in a scene is carried out using 2D techniques over 3D ToF points, and subsequently, colour and 3D information from foreground objects is fused.
3.2.1 3D information analysis for detecting foreground objects
Background subtraction methods for detecting moving objects have been proposed, analysed, and employed to locate object motion in a 2D image sequence [49–51]. In this work, for the purpose of motion detection in a 3D point cloud, and considering that the ToF camera is static and that illumination changes do not affect the acquired 3D points, the background subtraction technique has been considered suitable to be adapted and applied to three-dimensional information. Therefore, after performing distance and saturated pixel correction, a background subtraction method based on the reference image model is adapted to be used in a 3D point cloud. The goal is to discriminate the static part of the 3D scene from the moving objects, so an off-line background reference image \mathbf{B}_{\mathcal{T}} is calculated as the average image during a time period \mathcal{T}=1,\dots,t. Define a set of t ToF distance images, after discrepancy and pixel saturation correction, captured in a time period \mathcal{T}, such that \mathcal{Z}'=\{\mathbf{Z}'_{1},\mathbf{Z}'_{2},\dots,\mathbf{Z}'_{t}\}. Then, the background reference image is calculated as
where n is the number of pixels in each ToF distance image. With the aim of detecting pixels that show motion, the difference image \mathbf{Z}'_{d} between the reference and a current image \mathbf{Z}'_{c} is calculated as:
\mathbf{Z}'_{d}=\left|\mathbf{Z}'_{c}-\mathbf{B}_{\mathcal{T}}\right|\quad(27)
where |·| indicates an element-wise absolute value operation.
Foreground detection is performed in those pixels whose value in the difference image \mathbf{Z}'_{d} exceeds a threshold value T_{ h }, which results in a binary image \mathbf{Z}'_{b}. In order to determine T_{ h } automatically, the distance matrix \mathbf{Z}'_{d} is processed as if it were 2D information by means of Equation 24, where g is replaced by \mathbf{Z}'_{d}, resulting in a grey scale image G^{′}. Then, the calculation of the smoothed histogram of G^{′} and the search for the threshold value are carried out in a similar way to that presented in the ‘Jump edge reduction’ section. The binarisation process to detect pixels that show motion is given by
In the resulting binary image, isolated pixels are removed using morphological operations (dilation, hole filling, and erosion). This enhanced binary image is used as a mask over the 3D points of Z^{′}: the Z coordinate of the 3D points classified as background in the binary image (0 value) is set to the maximum value, whereas the 3D points classified as foreground (1 value) keep their real Z values. In this way, a new ToF distance matrix \mathbf{Z}'' is obtained. Figure 3 illustrates this method for the background and foreground 3D value assignment and selection.
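The reference image model applied to distance images can be sketched as follows. The function names are hypothetical, the threshold is passed in directly instead of being derived from the smoothed histogram, the morphological clean-up of the mask is omitted, and the "maximum value" assigned to background points is taken to be the largest distance in the current image, which is an assumption of this example.

```python
def reference_image(images):
    """Pixel-wise average of t background distance images."""
    t = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    return [[sum(img[i][j] for img in images) / t for j in range(cols)]
            for i in range(rows)]


def foreground_mask(reference, current, t_h):
    """1 where the absolute distance difference exceeds the threshold, else 0."""
    return [[1 if abs(c - b) > t_h else 0 for b, c in zip(b_row, c_row)]
            for b_row, c_row in zip(reference, current)]


def assign_background(z, mask):
    """Keep real Z for foreground pixels; set background pixels to the maximum Z."""
    z_max = max(max(row) for row in z)
    return [[v if m else z_max for v, m in zip(z_row, m_row)]
            for z_row, m_row in zip(z, mask)]


background = [
    [[2.0, 2.0], [2.0, 2.0]],
    [[2.2, 1.8], [2.0, 2.0]],
]
current = [[1.0, 1.9], [2.0, 2.05]]      # an object appears at pixel (0, 0)
reference = reference_image(background)
mask = foreground_mask(reference, current, t_h=0.5)
z_fg = assign_background(current, mask)
```

Because all background points end up with the same Z value, a second threshold on Z is enough to discard them later in the pipeline.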
3.2.2 Reprojection of 3D ToF information into a colour image
With the aim of giving additional colour information to the 3D foreground points previously detected, the reprojection of these points into a colour image was carried out. Using colour and amplitude images, both cameras are calibrated with respect to the world coordinate frame. Since both cameras can be represented by the pinhole camera model [42, 52], a tool such as the Camera Calibration Toolbox for Matlab [53] can be used to extract internal and external parameters for both cameras. External parameters are used to transform 3D ToF information given in the camera coordinate system into the world coordinate system. On the other hand, internal and external parameters are used to reproject 3D information into colour images. Hence, based on camera calibration theory [48, 54, 55] and after the range camera error reduction, the reprojection process is applied over the corrected and transformed 3D points following the transformations described below.
The transformation of ToF information after discrepancy and saturation corrections and foreground detection from world frame coordinates {\mathbf{P}}_{w}={[{X}^{\prime},{Y}^{\prime},{Z}^{\mathrm{\prime \prime}}]}^{T} to camera frame coordinates P_{ c }=[X_{ c },Y_{ c },Z_{ c }]^{T} is given by
where extrinsic parameters are expressed by the 3×3 rotation matrix R and by the 1×3 translation vector T.
Frequently, standard CCD colour cameras have a higher resolution than range cameras, so the reprojection of 3D points does not have a one-to-one equivalence. Hence, the ToF information is scaled up by bilinear interpolation. In addition to this, as only information of foreground 3D points will be extracted, the automatic thresholding process is applied to the 3D points P_{ c } in order to remove those points classified as background, which results in a new 3D point cloud \mathbf{P}'_{c}=[X'_{c},Y'_{c},Z'_{c}]^{T}.
Image coordinates are affected by tangential and radial distortions; therefore, the models of these systematic distortions are added to the pinhole model following the method proposed in [55]. The transformation between a three-dimensional coordinate frame and the image coordinate frame without distortion (x_{ u },y_{ u }) is given by
where the intrinsic parameter f is the focal length in millimetres.
The relation between image coordinates with distortion (x_{ d },y_{ d }) and without distortion (x_{ u },y_{ u }), considering the radial D^{(r)} and tangential D^{(t)} distortions, is defined by the pinhole model as
The transformation between distorted image coordinates to pixel coordinates is given by
where the intrinsic parameters k_{ u } and k_{ v } are the number of pixels per millimetre (horizontally and vertically, respectively), s is the skew factor whose value is usually zero in most cameras, and (u_{0},v_{0}) are the coordinates of the centre of projection.
After obtaining the pixel coordinates (u,v) of the 3D foreground points, these values are converted into pixel positions by rounding them to the nearest integer. Furthermore, as the area captured by both cameras (ToF and colour camera) is not exactly the same, pixels in non-common areas are eliminated. A diagram which illustrates the proposed method is shown in Figure 4.
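The chain of transformations described in this section can be sketched for a single point as follows. This is an illustration only: the distortion terms use a common radial and tangential formulation with hypothetical coefficient names (k1, p1, p2), and the sign conventions of the pixel mapping are assumptions that may differ from the model of [55] and from the values delivered by a particular calibration toolbox.

```python
import math


def reproject(point_w, R, T, f, k_u, k_v, u0, v0, s=0.0, k1=0.0, p1=0.0, p2=0.0):
    """Map a 3D world point to integer pixel coordinates (u, v).

    R is a 3x3 rotation (list of rows), T a translation with 3 entries,
    f the focal length in mm, k_u and k_v pixels per mm, (u0, v0) the centre
    of projection, and s the skew factor.
    """
    # World frame -> camera frame: P_c = R * P_w + T
    x_c, y_c, z_c = (sum(R[i][j] * point_w[j] for j in range(3)) + T[i]
                     for i in range(3))
    # Pinhole projection without distortion (image plane, mm)
    x_u = f * x_c / z_c
    y_u = f * y_c / z_c
    # Assumed radial (k1) and tangential (p1, p2) distortion terms
    r2 = x_u ** 2 + y_u ** 2
    x_d = x_u + x_u * k1 * r2 + 2 * p1 * x_u * y_u + p2 * (r2 + 2 * x_u ** 2)
    y_d = y_u + y_u * k1 * r2 + p1 * (r2 + 2 * y_u ** 2) + 2 * p2 * x_u * y_u
    # Distorted image coordinates -> pixel coordinates
    u = k_u * x_d + s * y_d + u0
    v = k_v * y_d + v0
    # Round to the nearest integer pixel (halves rounded up)
    return math.floor(u + 0.5), math.floor(v + 0.5)


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pixel = reproject((0.1, -0.05, 2.0), identity, [0.0, 0.0, 0.0],
                  f=8.0, k_u=100.0, k_v=100.0, u0=320.0, v0=240.0)
```

With the identity pose and no distortion, the point 0.1 m right of and 0.05 m above the optical axis at 2 m depth lands 40 pixels right of and 20 pixels above the centre of projection.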
4 Experiments
4.1 Experimental setting
In this article, a method for the fusion of colour and 3D information that is suitable for active security systems in industrial robotic environments is presented. To verify the proposed methods, a colour camera, AXIS 205, and a range camera, SR4000, have been located over the workspace of the robot arm FANUC ARC MATE 100iBe. The AXIS 205 Network Camera used has a resolution of 640×480 pixels and a pixel size of 5.08×3.81 mm. The SR4000 range camera has a resolution of 176×144 pixels and a pixel size of 40×40 μm. This camera has a modulation frequency of 29/20/31 MHz and a detection range from 0.1 to 5 m.
4.2 Camera calibration
This initial stage is intended to obtain extrinsic and intrinsic parameters from the standard CCD camera and ToF camera by means of a calibration process using a common reference frame.
4.3 Reduction of distance error
To correct for any misalignment between the range camera and the experimental panel employed, the angular deviations in x and y coordinates have been estimated and their effects have been corrected. Figure 5 shows the effect of the angle and displacement corrections in a 3D point cloud. Discrepancy curves which do not take into account pixel position have been calculated before and after angular correction. These curves are shown in Figure 6. It can be seen that the distance error is a function of the measured distance, and the discrepancy values show a small improvement after the angle correction, as expected.
To take into account the effect of pixel position in the distance error, a discrepancy curve has been generated for each pixel. These curves are tested by using the set {\mathcal{Z}}_{\mathcal{V}} as input, resulting in a correction value to be applied at each pixel. Figure 6 shows in green colour the discrepancy values after the correction with a cubic spline for each pixel.
The results indicate that the improvement achieved using a discrepancy curve at each pixel is almost imperceptible, which can be explained by the selected area being too small and centred in the middle of the image, so the influence of the pixel position is very low. To check the influence of pixel position on the distance error, a larger area has been selected from a reduced set of images taken from {\mathcal{Z}}_{\mathcal{V}}. Figure 7a shows the MSE before these corrections, while the MSE after discrepancy correction using the same discrepancy curve at each pixel is shown in Figure 7b. It can be seen that there is a reduction in the MSE over the selected area. However, as can be observed, the distance error is not just a function of the distance value but also depends on the location of the pixel. The results obtained using a different discrepancy curve at each pixel (Figure 7c) suggest that this kind of correction leads to better results. Hence, a discrepancy correction which takes into account pixel position and distance value is suggested for future work.
Since, as a result of using the larger area, the information along the whole distance range is incomplete, a discrepancy curve which does not take into account pixel position is used in the experiment. Then, the discrepancy curve calculated after angle correction is tested by using a ToF distance image from a real scene, where data from a human and a robot arm are captured and used as input, resulting in a correction value to be applied at each pixel. Figure 8a shows in red the discrepancy values selected for the discrepancy correction, together with the generated cubic spline shown in cyan. In order to verify the effect of applying the distance correction, the initial 3D point cloud and the results after applying the distance correction are shown in Figure 8b.
4.4 Value correction of saturated pixels
A real scene in which a human and a robot arm appear is used to illustrate the proposed methods of error corrections and the fusion of 3D and colour information. The value correction of saturated pixels in ToF information captured from this real scene has been carried out. This example of the effect of saturation is illustrated in Figure 9a which shows an amplitude image in which several saturated pixels are located on an area of the robot arm. These high values do not allow the correct visualization of the scene. Figure 9b shows the effect of saturated information over 3D data where saturation produces pixels with zero coordinate values. According to [45], pixel saturation occurs when the amplitude values are greater than 20,000, so this value has been used as threshold in Equation 20. After applying the proposed method to saturated pixels using this threshold value, Figure 10a shows the improvement achieved with this correction, allowing the view of the total scene. Figure 10b shows 3D points in which pixels with zero coordinate values have been corrected.
4.5 3D analysis for detecting foreground objects and coordinate frame transformation
To illustrate the detection of foreground objects using 3D ToF information, the background subtraction method based on the reference image model has been used. Figure 11a shows 3D information with a generated reference matrix with Z values. Figure 11b shows three-dimensional information resulting from subtracting the reference distance matrix from the real scene distance matrix, in which positive values indicate possible motion points. In order to take into account only foreground 3D points, an automatic thresholding process and the proposed method for the background and foreground 3D values assignment and selection have been applied using Equation 28. After that, the modified 3D points are shown in red in Figure 12, whereas the initial 3D points are shown in cyan. It can be observed that in the modified 3D points, all background points have equal Z values. However, as the points of interest are the foreground points, the background points are not taken into account; therefore, the final 3D representation of the scene is not affected by those equal Z values. After coordinate frame transformation using Equation 30, and another automatic thresholding process to remove points classified as background, the result achieved in this example is shown in Figure 13, where the foreground object detected is represented in the world coordinate system.
4.6 Resolution increase
As the standard CCD camera employed provides a colour image with a higher resolution (480×640) than the 3D ToF information (176×144) provided by the range camera, the reprojection of 3D points does not have a one-to-one equivalence. Therefore, the ToF matrices have been scaled up using bilinear interpolation and reprojected to the colour image using Equations 30 to 32.
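Bilinear interpolation of a ToF matrix can be sketched as follows. The function name is hypothetical and the sampling convention (corner pixels of input and output are aligned) is an assumption; library implementations may sample at pixel centres instead. Input and output are assumed to have at least two rows and two columns.

```python
def bilinear_resize(z, out_rows, out_cols):
    """Scale a matrix up (or down) by bilinear interpolation, corners aligned."""
    in_rows, in_cols = len(z), len(z[0])
    out = [[0.0] * out_cols for _ in range(out_rows)]
    for r in range(out_rows):
        y = r * (in_rows - 1) / (out_rows - 1)      # position in input rows
        y0 = min(int(y), in_rows - 2)
        dy = y - y0
        for c in range(out_cols):
            x = c * (in_cols - 1) / (out_cols - 1)  # position in input columns
            x0 = min(int(x), in_cols - 2)
            dx = x - x0
            top = z[y0][x0] * (1 - dx) + z[y0][x0 + 1] * dx
            bottom = z[y0 + 1][x0] * (1 - dx) + z[y0 + 1][x0 + 1] * dx
            out[r][c] = top * (1 - dy) + bottom * dy
    return out


upscaled = bilinear_resize([[0.0, 2.0], [4.0, 6.0]], 3, 3)
```

Interpolating across the boundary between an object and the background produces intermediate distances that belong to neither surface, which is one reason why jump edge reduction is applied after this step.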
4.7 Jump edge reduction
In order to compare a commonly used edge filter with the morphological filter used to detect the jump edge effect, ToF information from the scene, after interpolation of 3D points, has been processed. Figure 14a shows the results achieved using a Sobel filter on distance values from 3D information, and Figure 14b shows the results obtained using the morphological filter on distance values, obtained by using Equation 23 and establishing a dilation and erosion mask S as follows
It can be observed that the edges found by applying the Sobel filter are not continuous and are also narrower than the edges found by the morphological filter, so by using the latter, most of the spurious pixels can be detected and removed from the 3D ToF points.
In order to smooth the histogram of the grey scale distance image, a first-order low-pass Butterworth filter with a normalized cutoff frequency of 0.5 is used. As an example of the application of the proposed method for the jump edge reduction, Figure 15a shows the spurious pixels produced in the object contours by the jump edge effect and Figure 15b shows the 3D points after the reduction of spurious pixels by the proposed method. Although not all spurious pixels have been eliminated, the results show a significant improvement in the reduction of this effect, as most of them have been detected and eliminated.
4.8 Reprojection of 3D foreground points into colour images
In order to obtain a matrix that contains 3D and 2D information, using the calibration parameters of the cameras, the reprojection of 3D foreground points into a colour image has been carried out. Then, the reprojected points are adjusted into pixel values and those that are in the non-common area of both cameras are removed. A selection mask is generated by using the resulting pixels, and this mask is used to select the coincident coordinates of the colour pixels. This method makes it possible to achieve a colour segmentation based on 3D information and to have 2D and 3D information in a single matrix. Figure 16 shows foreground segmentation in the colour image based on foreground detection of 3D points in the world coordinate system.
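The construction of the fused matrix can be sketched as follows, assuming that each 3D foreground point already has integer pixel coordinates (u, v) in the colour image. The function name and the row layout (X, Y, Z, u, v, R, G, B) are illustrative choices and are not taken from the paper.

```python
def fuse(points_3d, pixels, colour_image):
    """Link each 3D point with the colour of the pixel it reprojects to.

    Points whose pixel falls outside the colour image (non-common area)
    are dropped. colour_image is indexed as colour_image[v][u].
    """
    height, width = len(colour_image), len(colour_image[0])
    fused = []
    for (x, y, z), (u, v) in zip(points_3d, pixels):
        if 0 <= u < width and 0 <= v < height:
            r, g, b = colour_image[v][u]
            fused.append((x, y, z, u, v, r, g, b))
    return fused


colour = [[(10, 20, 30), (40, 50, 60)],
          [(70, 80, 90), (100, 110, 120)]]
points = [(0.1, 0.2, 1.5), (0.3, 0.4, 1.6)]
pixels = [(1, 0), (5, 5)]            # the second point falls outside the image
fused = fuse(points, pixels, colour)
```

The (u, v) columns of the result play the role of the selection mask: they identify the colour pixels that belong to foreground objects.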
5 Discussion
The aim of this work is to achieve the fusion of colour and 3D ToF information in order to apply it in active security tasks for industrial robotic environments. Given the coordinates of a 3D point, this fusion provides, at the same time, its colour information and its 3D position in a world coordinate system common to both cameras and the robot arm.
After obtaining intrinsic and extrinsic parameters by a calibration process, the proposed method of distance error reduction improves the distance measurement values, and the achieved effect in the scene can be observed after the application of the information fusion method. The correction curves obtained are consistent with curves reported by other authors such as [41] and also consistent with the use of cubic splines to approximate and correct the distance error [42]. This consistency occurs despite some differences in experimental setup, such as a reduced range of measurement, different ToF camera models, target material, and camera configuration parameters.
As a second stage, saturation error correction must be performed given that in industrial environments, certain materials such as metal or reflecting paints are often present and can produce saturated pixels in the range camera information. The results obtained show that this method works well as it allows the correct visualization of the amplitude image, and more importantly, it corrects values of saturated pixels of 3D points. If these points have incorrect values, the reprojection stage would fail in these positions, as the 3D values would be reprojected as 0, and so its 2D information would be lost.
In order to detect foreground objects, the reference image technique applied to 3D data, after error corrections, has been used and presented as a simple and fast method which yields acceptable results. The 3D points of foreground objects are correctly identified and only a few false positives are detected, which can be removed easily using 2D image morphological operations. Traditionally, this technique is used in colour and grey scale images, but illumination variations result in false foreground detection. The advantage of using ToF information is that it has a more stable behaviour in these illumination conditions. In addition, this technique has a short computational time, which is an important factor in order to develop a suitable strategy for active security of robotic industrial environments. Then, using extrinsic parameters, the transformation of foreground 3D points from the camera to the world reference frame is carried out and scaled up by bilinear interpolation. The proposed method of jump edge reduction, applied to the resulting distance points, minimises false positives and false negatives around an object edge which arise in the pixel reprojection process as a consequence of the presence of spurious pixels that do not have correct 3D values. The achieved results can be considered acceptable since most spurious pixels are removed without changing the object shape, and therefore, a softer 3D point reprojection over object edges in colour images is achieved.
Finally, the reprojection of the resulting 3D points to the colour image is performed. Nevertheless, as can be seen in Figure 16, this reprojection is not perfect: although distance error reduction has been applied, the position of the pixels in the image has not been taken into account, and a single correction value is applied which is a function of the measurement distance but not of the pixel position in the image.
6 Conclusions
This paper aims to contribute to the research area of active security systems in industrial robotic environments using ToF cameras.
Despite the fact that active security in robotic industrial environments is a well-studied topic, few previously published methods have dealt with this subject using the combination of ToF cameras and colour cameras. The paper describes the development of methods for the fusion of colour and 3D ToF information as an initial step in the design of a system for collision prevention between a human and a manipulator robot sharing a workspace at the same time. Furthermore, this work provides a detailed mathematical description of the steps involved in the proposed method, so that any researcher can implement it.
The presented method has a different standpoint from the methods previously proposed in the literature, since a common coordinate system is defined for a robot arm, colour camera and ToF camera. The obtained calibration parameters are used to transform the 3D points from the ToF camera coordinate system into the defined common coordinate system, which are reprojected in 2D colour images. This procedure has the advantage that it gives a single matrix made of colour and three-dimensional information; therefore, 3D coordinates of objects inside the robot arm’s workspace are known at the same time as their colour information. In addition to this, the proposed method for jump edge error detection, which is based on morphological gradient, allows the detection and reduction of jump edge error at points which are affected by this error. Also, in order to obtain a suitable fusion of information, a method for detection and reduction of saturated pixels, which is based on neighbour pixels information, has been proposed.
As future work, in order to improve the accuracy of fused information, a modification of the applied distance correction method is suggested. A preliminary study carried out with a small range of distances shows the influence of the pixel position in the distance measurements. Hence, a suggestion for future work is to modify the error correction so that it takes into account the position of the 3D point (measured distance and pixel location).
A possible application to prevent collisions between an industrial robot and a human would be to use colour information to characterise the detected foreground objects and to associate a security volume around each object.
References
UNEEN 755: Robots manipuladores industriales. Seguridad. Ed. by AENOR. Asociacion Española de Normalizacion y Certificacion, Madrid; 1996.
ISO 102181: Robots for industrial enviroments. Safety requirements. Part 1: Robots. Ed. by ISO. International Organization for Standardization, Switzerland; 2006.
RIA TR R15.2062008: Guidelines for implementing ANS/RIA/ISO 1021812007. For industrial robots and robot systems. Safety requirements. Ed. by RIA. Robotic Industries Association, USA; 2008.
Llata JR, Sarabia EG, Arce J, Oria JP: Fuzzy controller for obstacle avoidance in robotic manipulators using ultrasonic sensors. In Advanced Motion Control, 1998. AMC ‘98Coimbra., 1998 5th International Workshop On. IEEE; 1998:647652.
Feddema JT, Novak JL: Whole arm obstacle avoidance for teleoperated robots. Robotics and Automation, 1994. Proceedings., 1994 IEEE International Conference On 1994, 330333094.
Novak JL, Feddema IT: A capacitancebased proximity sensor for whole arm obstacle avoidance. Robotics and Automation, 1992. Proceedings., 1992 IEEE International Conference On 1992, 130713142.
Yu Y, Gupta K: Sensorbased roadmaps for motion planning for articulated robots in unknown environments: some experiments with an eyeinhand system. In Intelligent Robots and Systems, 1999. IROS ‘99. Proceedings 1999, IEEE/RSJ International Conference On. Kyongju, Korea; 1999:170717143.
Puls S, Graf J, Wörn H: Cognitive Robotics in Industrial Environments. Human Machine Interaction  Getting Closer (InTech, 2012). . http://www.intechopen.com/books/humanmachineinteractiongettingcloser/cognitiveroboticsinindustrialenvironments
MartinezSalvador B, del Pobil AP, PerezFrancisco M: A hierarchy of detail for fast collision detection. In Intelligent Robots and Systems, 2000. (IROS 2000). Proceedings 2000 IEEE/RSJ International Conference On. Takamatsu, Japan; 2000:7457501.
Balan L, Bone GM: Realtime 3d collision avoidance method for safe human and robot coexistence. In Intelligent Robots and Systems, 2006 IEEE/RSJ International Conference On. Beijing, China; 2006:276282.
Corrales JA, Torres F, Candelas FA: IEEE ICRA 2010 Workshop on Multimodal HumanRobot Interfaces. Anchorage, Alaska; 2010.
Corrales JA, Candelas FA, Torres F: Safe humanrobot interaction based on dynamic sphereswept line bounding volumes. Robot. Comput. Integr. Manuf 2011, 27(1):177185. 10.1016/j.rcim.2010.07.005
Nakabo Y, Saito H, Ogure T, Jeong SH, Yamada Y: Development of a safety module for robots sharing workspace with humans. In Intelligent Robots and Systems. IROS 2009. IEEE/RSJ International Conference On. St. Louis, MO, USA; 2009:5345-5349.
Baerveldt AJ: Cooperation between man and robot: interface and safety. In Robot and Human Communication, 1992. Proceedings. IEEE International Workshop On. Tokyo; 1992:183-187.
Kuhn S, Gecks T, Henrich D: Velocity control for safe robot guidance based on fused vision and force/torque data. In 2006 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems. Heidelberg, Germany; 2006:485-492.
Ebert DM, Henrich DD: Safe human-robot-cooperation: image-based collision detection for industrial robots. In Intelligent Robots and Systems, 2002. IEEE/RSJ International Conference On. Lausanne, Switzerland; 2002:1826-1831, vol. 2.
Gecks T, Henrich D: Simero: camera supervised workspace for service robots. In 2nd Workshop on Advances in Service Robotics, Fraunhofer IPA. Germany; 2004.
Gecks T, Henrich D: Multi-camera collision detection allowing for object occlusions. In 37th International Symposium on Robotics (ISR 2006)/4th German Conference on Robotics (Robotik 2006) VDI/VDE-Gesellschaft Mess- und Automatisierungstechnik. München, Germany; 2006.
Henrich D, Kuhn S: Modeling intuitive behavior for safe human/robot coexistence cooperation. In Robotics and Automation, 2006. ICRA 2006. Proceedings 2006 IEEE International Conference On. Orlando, FL; 2006:3929-3934.
Fevery B, Wyns B, Boullart L, Llata JR, Torre-Ferrero C: Industrial robot manipulator guarding using artificial vision. In Robot Vision. Edited by: Ales U. InTech, Vukovar; 2010:429-454.
Bascetta L, Ferretti G, Rocco P, Ardo H, Bruyninckx H, Demeester E, Lello ED: Towards safe human-robot interaction in robotic cells: an approach based on visual tracking and intention estimation. In Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ International Conference on. San Francisco, CA; 2011:2971-2978.
Kulic D, Croft EA: Real-time safety for human-robot interaction. Robot. Autonom. Syst 2006, 54(1):1-12. 10.1016/j.robot.2005.10.005
Bascetta L, Magnani G, Rocco P, Migliorini R, Pelagatti M: Anti-collision systems for robotic applications based on laser time-of-flight sensors. In Proceedings of the IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM). Montreal, Canada; 2010:278-284.
Flacco F, De Luca A: Multiple depth/presence sensors: integration and optimal placement for human/robot coexistence. In Robotics and Automation (ICRA), 2010 IEEE International Conference On. Anchorage, Alaska; 2010:3916-3923.
Graf J, Czapiewski P, Woern H: Evaluating risk estimation methods and path planning for safe human robot cooperation. In Proceedings for the joint conference of ISR 2010 (41st International Symposium on Robotics) and ROBOTIK 2010 (6th German Conference on Robotics). VDE Verlag, Munich; 2010:1-7.
Fischer M, Henrich D: 3d collision detection for industrial robots and unknown obstacles using multiple depth images. In Depth Images, German Workshop on Robotics. Technical University of Braunschweig. Braunschweig, Germany; 2009.
Van den Bergh M, Van Gool L: Combining rgb and tof cameras for real-time 3d hand gesture interaction. In Applications of Computer Vision (WACV), 2011 IEEE Workshop On. Kona, Hawaii; 2011:66-72.
Park S, Yu S, Kim J, Kim S, Lee S: 3d hand tracking using kalman filter in depth space. EURASIP J. Adv. Sig. Proc 2012, 2012: 36.
Bartczak B, Schiller I, Beder C, Koch R: Integration of a time-of-flight camera into a mixed reality system for handling dynamic scenes, moving viewpoints and occlusions in real-time. In Proceedings of the International Symposium on 3D Data Processing, Visualization and Transmission Workshop. Georgia Institute of Technology, Atlanta, GA, USA; 2008.
Kolb A, Barth E, Koch R: ToF-sensors: new dimensions for realism and interactivity. In Computer Vision and Pattern Recognition Workshops, 2008. CVPRW'08. IEEE Computer Society Conference On. Anchorage, Alaska, USA; 2008:1-6.
Kolb A, Barth E, Koch R, Larsen R: Time-of-flight cameras in computer graphics. Comput. Graph. Forum 2010, 29(1):141-159. 10.1111/j.1467-8659.2009.01583.x
Ghobadi S, Loepprich O, Lottner O, Hartmann K, Loffeld O, Weihs W: Analysis of the personnel safety in a man-machine-cooperation using 2d/3d images. In Proceedings of the EURON/IARP International Workshop on Robotics for Risky Interventions and Surveillance of the Environment. Edited by: Cervera Y, Baudoin EMR, Pender J. Benicassim – Spain; 2008:5959.
Fischer M, Henrich D: Surveillance of robots using multiple colour or depth cameras with distributed processing. In Distributed Smart Cameras, 2009. ICDSC 2009. Third ACM/IEEE International Conference On. Como, Italy; 2009:1-8.
Walter C, Vogel C, Elkmann N: A stationary sensor system to support manipulators for safe human-robot interaction. In Robotics (ISR), 41st International Symposium on and 2010 6th German Conference on Robotics (ROBOTIK). Curran Associates, Inc., Munich; 2010:1-6.
Lange R: 3d time-of-flight distance measurement with custom solid-state image sensors in cmos/ccd-technology. University of Siegen, Siegen, Germany; 2000.
Clayton S: Kinect for Windows SDK to Arrive Spring 2011. http://blogs.technet.com/b/microsoft_blog/archive/2011/02/21/kinectforwindowssdktoarrivespring2011.aspx Accessed 21 April 2014
Han J, Shao L, Xu D, Shotton J: Enhanced computer vision with microsoft kinect sensor: a review. IEEE Trans. Cybern 2013, 43(5):1318-1334.
Liu L, Shao L: Learning discriminative representations from rgb-d video data. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence. AAAI Press/International Joint Conferences on Artificial Intelligence, Beijing; 2013:1493-1500.
Flacco F, Kroger T, De Luca A, Khatib O: A depth space approach to human-robot collision avoidance. In Robotics and Automation (ICRA), 2012 IEEE International Conference On. St Paul, MN, USA; 2012:338-345.
Fuchs S: Calibration and multipath mitigation for increased accuracy of time-of-flight camera measurements in robotic applications. PhD thesis. Technische Universität Berlin; 2012.
Kahlmann T: Range imaging metrology: Investigation, calibration and development. PhD thesis, Institute of Geodesy and Photogrammetry. ETH Zurich; 2007.
Lindner M, Kolb A: Lateral and depth calibration of pmd-distance sensors. Ger. Res 2006, 4292(4292/2006):524-533.
Chiabrando F, Chiabrando R, Piatti D, Rinaudo F: Sensors for 3d imaging: metric evaluation and calibration of a ccd/cmos time-of-flight camera. Sensors 2009, 9(12):10080-10096. 10.3390/s91210080
Powell MJD: Approximation Theory and Methods. Cambridge University Press, New York; 1981.
MESA Imaging AG, Zurich, Switzerland; 2008.
May S: 3D Time-of-flight ranging for robotic perception in dynamic environments (Doctoral Dissertation). Dusseldorf VDI-Verl. Univ. Osnabrück, Germany; 2009.
Robla S, Llata JR, Torre C, Sarabia EG: An approach for tracking oil slicks by using active contours on satellite images. OCEANS 2009 - EUROPE 2009, 1-8.
Davies ER: Machine Vision, Third Edition: Theory, Algorithms, Practicalities (Signal Processing and Its Applications), 3rd edn. Elsevier, Morgan Kaufmann, University of London, UK; 2005.
Friedman N, Russell S: Image segmentation in video sequences: a probabilistic approach. In Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence. UAI'97, Morgan Kaufmann, San Francisco, Providence; 1997:175-181.
Piccardi M: Background subtraction techniques: a review. In Systems, Man and Cybernetics, 2004 IEEE International Conference On. The Hague, The Netherlands; 2004:3099-3104, vol. 4.
Cristani M, Farenzena M, Bloisi D, Murino V: Background subtraction for automated multisensor surveillance: a comprehensive review. EURASIP J. Adv. Sig. Proc 2010, 2010: 1-24.
Fuchs S, May S: Calibration and registration for precise surface reconstruction with tof cameras. In International Journal of Intelligent Systems Technologies and Applications. Inderscience Publishers; 2008:274-284.
Bouguet JY: Camera Calibration Toolbox for Matlab. 2000. http://www.vision.caltech.edu/bouguetj/calib_doc/ Accessed 03 March 2010
Hartley R, Zisserman A: Multiple View Geometry in Computer Vision. Cambridge University Press, New York; 2003.
Heikkila J, Silven O: A four-step camera calibration procedure with implicit image correction. Computer Vision and Pattern Recognition, 1997. Proceedings., 1997 IEEE Computer Society Conference On 1997, 1106-1112.
Acknowledgements
This work has been supported by the Ministry of Economy and Competitiveness of the Spanish Government (project DPI2012-36959).
Additional information
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Robla, S., Llata, J.R., Torre-Ferrero, C. et al. Visual sensor fusion for active security in robotic industrial environments. EURASIP J. Adv. Signal Process. 2014, 88 (2014). https://doi.org/10.1186/1687-6180-2014-88
DOI: https://doi.org/10.1186/1687-6180-2014-88
Keywords
 Active security
 Industrial robot
 ToF and colour cameras
 Information fusion