Prioritized Multihypothesis Tracking by a Robot with Limited Sensing

To act intelligently in dynamic environments, mobile robots must estimate object positions using information obtained from a variety of sources. We formally describe the problem of estimating the state of objects where a robot can only task its sensors to view one object at a time. We contribute an object tracking method that generates and maintains multiple hypotheses consisting of probabilistic state estimates that are generated by the individual information sources. These different hypotheses can be generated by the robot’s own prediction model and by communicating robot team members. The multiple hypotheses are often spatially disjoint and cannot simultaneously be verified by the robot’s limited sensors. Instead, the robot must decide towards which hypothesis its sensors should be tasked by evaluating each hypothesis on its likelihood of containing the object. Our contributed algorithm prioritizes the different hypotheses, according to rankings set by the expected uncertainty in the object’s motion model, as well as the uncertainties in the sources of information used to track their positions. We describe the algorithm in detail and show extensive empirical results in simulation as well as experiments on actual robots that demonstrate the effectiveness of our approach.


Introduction
Robot perception processing consists of a mapping from sensory data to an estimate of the state of the elements of the environment that are of relevance to the task under execution. For example, a robot traversing a maze needs to estimate the area and position of open space and walls from its sensory data. Similarly, in a team of soccer robots, each robot has the potential to estimate the state of the environment based on its own sensing and on the information communicated by its teammates. The complexity of state estimation greatly increases with the task, the dynamics of the environment, and the sensing capabilities of the robots.
In our work, we consider that robots have limited sensing and operate in complex and dynamic environments executing tasks that rely on multiple elements. We investigate robot state estimation as a result of the integration of sensory information obtained from a variety of sources, namely, the robot's own sensors and actions, models and communicated information from teammate robots' sensors and actuators, as well as models of the dynamics of the environment.
Concretely, we investigate the problem when robots have limited and narrow perceptual scope, such that they are only capable of observing a single object (or a reduced set of objects) at a time with their sensors. Thus, the relative size of the robot's sensor scope is small compared to the environment, and while the state of a single object is being updated by the sensors, the evolving state of all other nonsensed objects must be predicted from communicated information or from models learned from observations or provided a priori.
In addition to the complexity of the problem, not all sources of information about a single object can or should be handled equally, as is done in the traditional approach of weighting estimates by their covariance. Empirical evidence has shown that some modalities are unreliable in certain circumstances and must then be ignored.
Additionally, nondeterministic effects of actuators can create several distinctly different potential outcomes, each of which must be tracked and reasoned about separately.
To address this challenge, we define a method for reasoning over a disjoint hypothesis space whereby high-level domain knowledge is used to impose a strict ordering on estimates created by different sources of information. By segmenting the sources of information used to reason about the state of environmental quantities into different classes, each with different state dynamics and expected effect of robot actions, a prioritized hierarchy of state estimates can be inferred. Additionally, when tracking multiple objects simultaneously, the evolving states of those objects must be considered carefully when deciding where to task the robot's sensors.
We describe a hybrid state estimation algorithm that attempts to reduce the complexity of the generated probability density functions over a quantity of interest by factoring the problem into a series of small estimation problems that are tied to the different sources of model world information possessed by the robot. A high-level policy is used to determine where to task the robot's sensors to best track the objects in the environment. Such policies for creating hierarchies can be defined a priori, or they could potentially be learned from data. Using this policy, the decision process that governs each individual robot's actions can easily select the most informative state estimate to use as its input. The priorities are set by the expected uncertainty in the object's motion model as well as the uncertainties in the sources of information used to track their positions. A robot's actions directly affect its perception of the environment as well as the environment itself, and the best estimate is often one that will allow the robot to obtain more information about its surroundings to further clarify its estimate of quantities of interest. This, in turn, provides more information to the robot that further updates the ordered hierarchy of possible estimates.
This paper describes an active state estimation algorithm, as applied to a real-time adversarial multirobot domain, which combines action policies determined from high-level domain knowledge with multimodal probabilistic state estimators. In this work, we assume that each of the objects that are detected and tracked has a unique sensor signature, so that the additional complexity of the data association problem can be avoided and we can instead focus on analyzing the multiple hypothesis reasoning algorithms. Thus, we contribute an algorithm to address the problem of tracking a single object with multiple hypotheses. We have successfully applied this approach to the RoboCup Four-Legged league, where a team of Sony AIBO robots autonomously plays soccer against another team of AIBO robots, as shown in Figure 1.

Related Work
We discuss some related work along the three main aspects of our work: (i) probabilistic state estimation; (ii) object tracking; and (iii) reasoning about multiple hypotheses from multiple sensing sources. Most probabilistic estimation techniques follow a Bayesian filtering approach [1] and have been successfully applied to robot state estimation (e.g., [2]). Object tracking using a Bayesian filter formalism relies on an a priori model of the object's motion that allows the algorithm to predict the object motion given noisy observations. One of the most widely used methods for state estimation is the Kalman filter [3], in which the system model is assumed linear and the noise is assumed Gaussian. When the linearity assumption becomes a limitation, the dimension of the state vector can be changed as the tracked object changes its perceived dynamics, such as with a variable state dimension (VSD) filter [4]. We also consider the object dynamics, but our approach changes the number of hypotheses, while the specific dimensions of those hypotheses' estimates do not change. Furthermore, we maintain multiple hypotheses independently as potential object locations.
An approach to reasoning about a complex motion model consists of maintaining multiple models. The interacting multiple model (IMM) filter [5] uses a weighted mixture of different process models. Our approach differs in that it maintains a disjoint set of hypotheses which are not merged or fused [6], but are prioritized and visited according to a specific policy. Similar approaches maintain separate estimations based on subjective sensing and other sources (e.g., sensing from robot teammates [7,8]).
A more general approach is the Switching Kalman filter model [9], which represents multiple independent system state dynamics models and switches between them (or linearly combines them) to best fit the observed (or predicted) nonlinear dynamics of the system being modeled. Our approach creates multiple independent belief states (or hypotheses) rather than a single state with multiple potential models.
A multiple hypothesis tracking (MHT) [10,11] approach uses multiple independent state estimators to estimate a multimodal probability density. This approach has been used successfully for challenging mobile robot localization problems [12], where nonparametric distributions are estimated through sampling techniques, such as particle filtering [13]. The number of particles used can be dynamically adjusted as computational resources become available or are needed elsewhere [14]. Approaches that factor in a joint state estimation have been used successfully for tracking an object with a mobile robot [15], where the actions of the robot change the process characteristics of the tracked object. Our approach extends the MHT paradigm by reasoning about the different hypotheses as a function of the source of information that generated them.
Finally, object tracking is a complex problem addressed by different approaches that capture connected dynamics of the multiple objects. We address the problem of object tracking from a different perspective, namely, in terms similar to that of sensor planning [16]. Sensor (or actuator) planning generally requires that a policy be determined over a state space which dictates the appropriate action to take based on the state of the world and the robot. In [17], reinforcement learning is used to find a policy that avoids the problems of a state space explosion as well as the problems associated with missing sensor information. Our problem is defined over a continuous state space (e.g., the space of tracked object poses) whereby the effects of various actions are difficult to quantize into a state space on which a policy could be learned. In [18], a dynamic programming algorithm is proposed by which a static policy state over the entire field is determined, which dictates when the robot should stop its body and localize itself. In our model, the actions of the objects being tracked in the world are highly dynamic and are unlikely to be captured in a single policy over the entire space of poses. In [19,20], mechanisms for attention control are proposed that use expected information gain and the cost to acquire the information as criteria to determine what the robot should do and when. A decision tree is learned to represent the policy of the robot. Our approach uses similar criteria to determine what the robot should do and when, though in this work we do not discuss mechanisms for how the knowledge is obtained (e.g., learned offline or hand-coded) but rather focus on the utility of using such concepts and applying them to probabilistic state estimators that operate over a continuous state space.
We consider that the robot has a narrow sensor scope incapable of capturing more than one object at a time. Our algorithm includes a policy for directing the sensor machinery toward multiple objects. Furthermore, we consider different types of objects with different motion models which are used to update the confidence on the state estimation of each individual object.

Challenges of Dynamic World Modeling
We are interested in problems associated with having a robot autonomously build and maintain accurate world models in dynamic environments where the states of many objects must be estimated simultaneously. A robot will be able to make use of multiple sources of information that can describe the motion of objects in the environment. In any environment of reasonable complexity, a robot is incapable of viewing the entire environment at a single time with its sensors. In the extreme case, the robot can only track a single object at a time with its sensors. Figure 2 illustrates the general class of world modeling issues addressed in this work. We are primarily concerned with the issues involved with object tracking rather than issues involved with the complementary field of map building, which is not part of our discussion. We consider the challenges of tracking multiple objects, where each object has multiple sources of sensor and model information that are available, as a combined problem. In this work, we do not address the additional complexity of the data association problem where multiple objects have identical or ambiguous sensor signatures. In order to keep track of the positions of all objects in the environment, the robot must continually retask its sensors to refresh the models with more accurate position data. Deciding which object to track next is dependent on the expected uncertainty in the motion model for that object as well as the availability and quality of the different sources of information that can provide estimates for the expected position of the object.
To formally describe the problem, we define the following concepts:
$A$: the set of all actions, $a(t) \in A$, including the null action, that the robot is capable of performing at time $t$;
$O$: the set of all objects in the environment, where $O_j$ is the $j$th object of which the robot must keep track. The set includes moving objects of which the robot must maintain an accurate estimate as well as stationary objects with which the robot must maintain periodic contact (such as landmarks for localization);
$X_{O_j}(t)$: the estimated position of object $O_j$ at time $t$;
$L_{O_j}(t)$: a sensor observation of object $O_j$ at time $t$, which can be null in the case that the robot does not perceive object $O_j$;
$M_{O_j}$: the motion model as a function over objects and robot actions. It defines the expected change in object position over time regardless of whether the robot has obtained a sensor observation:
$$M_{O_j} : X_{O_j}(t),\, a(t) \longrightarrow X_{O_j}(t+1); \tag{1}$$
$S^r_{O_j}$: the sensor model of robot $r$ as a function over objects and sensor observations. It defines the updated object position at the current time. Note that in the case of no observations of object $O_j$ at time $t$, the output position is identical to the input position;
$E^{r_\alpha}_{O_j}$: a sensor observation from an external source, such as another robot $r_\alpha$, where $r_\alpha \neq r$.
For the problems in which we are interested, we identify three different classes of objects that have distinct motion model dynamics.
(1) Static: objects that do not move on their own, such as goal markers and landmarks used for localization.
Even though these landmarks do not move, the robot's own position estimate with respect to these objects can be uncertain.
(2) Quasidynamic: objects which do not move on their own, but which move by being manipulated or pushed by a robot. The motion model for this kind of object encapsulates the actuation dynamics from manipulation when the robot manipulates it, but also must take into account that the object can move unexpectedly when another robot makes contact with it.
(3) Dynamic: objects, such as other robots, which can move under their own power and control. The internal state of these robots is unobservable and their motion can be difficult to predict.
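One way to make the three classes concrete is to attach a different amount of process noise to each. The following is a minimal sketch under assumptions that are not from the paper: positions are one-dimensional, each class adds a constant variance per unobserved time step, and the numeric values in `PROCESS_VARIANCE` are purely illustrative. The small nonzero value for static objects stands in for uncertainty in the robot's own position relative to the landmark.

```python
from enum import Enum


class MotionClass(Enum):
    STATIC = "static"
    QUASIDYNAMIC = "quasidynamic"
    DYNAMIC = "dynamic"


# Hypothetical per-step process variances (illustrative values only).
PROCESS_VARIANCE = {
    MotionClass.STATIC: 0.01,
    MotionClass.QUASIDYNAMIC: 0.1,
    MotionClass.DYNAMIC: 0.5,
}


def variance_after(initial_variance: float, motion_class: MotionClass,
                   unobserved_steps: int) -> float:
    """Variance of a 1D position estimate after a run of unobserved steps."""
    return initial_variance + unobserved_steps * PROCESS_VARIANCE[motion_class]


# Uncertainty of each class after four steps without an observation.
grown = {c.value: variance_after(1.0, c, 4) for c in MotionClass}
```

Under these assumed values, the estimate of a dynamic object degrades fastest while it is out of view, which is the property a sensor-tasking policy can exploit.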

Object Tracking
For any environment of interest, the robot's sensors will not have the capability to view all aspects of an environment at the same time. Thus, the robot must change the direction that its sensors are pointing in order to continually update its world model with new readings of the objects that it is tracking. In the most difficult case, the robot can only track a single object at a time and must predict the positions of the other objects with their motion models. The longer an object is not visible, the less accurate the robot's model will be due to noise and unmodeled dynamic changes in the object's motion. Deciding how and when to retask the robot's sensors depends highly on the objects being tracked as well as the environment in which they exist. Our solution is to define a policy over all objects that describes when the robot should point its camera from one object to the next. A formal description follows:
$A_L$: a subset of actions $A$ which cause the robot to change the angle of its sensors in order to gather a new observation of an object;
$\pi_{ob}$: a policy over the positions of a set of objects $\vec{X}_O$ that decides which object the robot should track next. $\pi_{ob}$ takes as input the vector of all estimated object positions $\vec{X}_O$ and computes the best action $a_j \in A_L$ (possibly NULL if no best action exists) that moves the robot's sensors to track an object $O_j$.
Two functions for π ob are considered in this work as follows.
(1) Naïve: takes no notion of object uncertainty into account, and cycles the robot's sensors among all objects equally.
(2) Greedy: selects the object with the greatest uncertainty to track. Expected uncertainty is derived from the motion models for the object.
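The two policies above can be sketched in a few lines. This is an illustrative sketch rather than the authors' implementation: it assumes each object's uncertainty is summarized by a single scalar variance, and the object names and values are hypothetical.

```python
from itertools import cycle
from typing import Dict, Iterator, List


def naive_policy(object_names: List[str]) -> Iterator[str]:
    """Round-robin: visit every object in turn, ignoring uncertainty."""
    return cycle(object_names)


def greedy_policy(variances: Dict[str, float]) -> str:
    """Return the object whose estimate currently has the largest variance."""
    return max(variances, key=variances.get)


# Hypothetical variances of three tracked objects at one time step.
variances = {"landmark": 0.2, "ball": 1.5, "opponent": 4.0}

visit_order = naive_policy(list(variances))
first_three = [next(visit_order) for _ in range(3)]
greedy_choice = greedy_policy(variances)
```

Here the naïve policy visits the objects in a fixed cycle, while the greedy policy would look at the hypothetical `opponent` first because its variance is largest.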
The rest of this work describes the instantiation of these concepts into a set of algorithms and analyzes their performance in simulation and on real robots.

Prioritized Multihypothesis Model Tracking
To effectively estimate the state of objects in the environment, sensor observations $S^r_{O_j}$ must be obtained which provide some update as to the position of the object. In the absence of good sensor readings, models $M_{O_j}$ of the expected motion of the objects must be used to predict the change in the object's state. In all but the most degenerate cases, such models will not be able to completely describe the motion of the object. Noise and unexpected changes in the dynamics of the object will cause the robot's estimate to rapidly diverge from the object's true position.
Multiple sources of information exist that a robot can use to search for an object that is not visible in its sensors. Each source represents a potential hypothesis on the location of the object. For example, nondeterministic effects of actuators can create several distinctly different potential outcomes, each of which should be tracked and reasoned about separately. Similarly, teammates may provide some information about the state of an object, but the quality of this information could be quite poor if the position estimate of the teammates is erroneous due to localization errors.
Our approach to the problem of object estimation, where multiple sources of information about the objects are available, is to define a policy over the set of objects which prioritizes when and how the robot should task its sensors based on the kinds of objects being tracked, information returned from sensors about the object, and a priori models of how the robot interacts with the object (such as with its actuators):
$h_i$: a hypothesis over the location of object $O_j$. Each hypothesis is defined as $h_i : P(X_{O_j})$, where $P(X_{O_j})$ is a probability distribution over $X_{O_j}$ consisting of the pose and uncertainty;
$H_{O_j}$: the set of hypotheses $h_i$ that represent the set of possible locations for object $O_j$;
$\pi^j_{mh}$: a policy for a particular object $O_j$ which describes a ranking of the different hypotheses $h_i$ that could exist at any given time for it;
$f_{mh}$: a function over the sources of information that can be used to predict object $O_j$'s location at time $t$. Here, $X_{O_j}(t)$ is the most highly ranked pose given the set of available hypotheses, and $E^{r_\alpha}_{O_j}, \ldots$ is the set of all available observations from teammates.
Algorithm 1 illustrates the hypothesis ranking function.
Information returned from the robot's sensors, model information from actuators, and teammate observations can be ranked as follows: (1) robot's own sensors, (2) successful actuation, (3) failed actuation, (4) teammate observations. In this case, the robot's own sensors, such as a camera, are trusted over all other sources of information. All other sources of information are not sensed directly but are instead obtained indirectly through models and teammate information. Actuation is assumed to be done blindly, where the contact with the object is invisible to the robot's cameras. Actuator success is assumed over failure. Finally, because of the possibility of poor self-localization, teammate information is used only when no other sources of information are available.
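The strict ordering described above can be expressed as a priority over hypothesis sources. The sketch below is illustrative and is not a transcription of Algorithm 1: the `Hypothesis` container, the one-dimensional `mean`/`variance` fields, and the tie-break on smaller variance are all assumptions introduced here.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class Source(IntEnum):
    """Lower value means higher priority, following the ranking in the text."""
    OWN_SENSOR = 1
    ACTUATION_SUCCESS = 2
    ACTUATION_FAILURE = 3
    TEAMMATE = 4


@dataclass
class Hypothesis:
    source: Source
    mean: float      # estimated 1D position (hypothetical units)
    variance: float  # uncertainty of the estimate


def best_hypothesis(hypotheses: List[Hypothesis]) -> Optional[Hypothesis]:
    """Pick the highest-priority hypothesis; ties go to the smaller variance
    (the tie-break is an assumption, not taken from the paper)."""
    if not hypotheses:
        return None
    return min(hypotheses, key=lambda h: (h.source, h.variance))


# The object is not currently visible, so no OWN_SENSOR hypothesis exists.
candidates = [
    Hypothesis(Source.TEAMMATE, mean=3.0, variance=0.1),
    Hypothesis(Source.ACTUATION_FAILURE, mean=0.5, variance=0.8),
    Hypothesis(Source.ACTUATION_SUCCESS, mean=2.0, variance=0.6),
]
chosen = best_hypothesis(candidates)
```

Note that the teammate hypothesis is not chosen even though it reports the smallest variance. The source ranking takes precedence over the reported covariance, which matches the point that a mislocalized teammate can still broadcast a tight estimate.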
Because all hypotheses are represented as probability distributions, their state can be estimated with appropriate probabilistic tracking algorithms, such as the Kalman filter, the particle filter, or other Bayesian filter-based approaches.
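As one concrete instance, a single hypothesis can be tracked with a scalar Kalman filter. The following is a minimal one-dimensional sketch under assumed noise values. The function names `predict` and `update` are hypothetical and play the roles of the motion model and sensor model respectively, including the convention that a missing observation leaves the estimate unchanged.

```python
from typing import Optional, Tuple


def predict(mean: float, var: float, process_var: float,
            control: float = 0.0) -> Tuple[float, float]:
    """Motion-model step: shift the mean by the control, inflate the variance."""
    return mean + control, var + process_var


def update(mean: float, var: float, observation: Optional[float],
           obs_var: float) -> Tuple[float, float]:
    """Sensor-model step; with no observation the estimate is unchanged."""
    if observation is None:
        return mean, var
    gain = var / (var + obs_var)
    return mean + gain * (observation - mean), (1.0 - gain) * var


# Three unobserved steps followed by one observation (assumed noise values).
mean, var = 0.0, 1.0
for _ in range(3):
    mean, var = predict(mean, var, process_var=0.5)
    mean, var = update(mean, var, None, obs_var=2.5)
mean, var = update(mean, var, 1.0, obs_var=2.5)
```

The variance grows from 1.0 to 2.5 over the three unobserved steps, and the single observation then halves it to 1.25 while moving the mean halfway toward the measurement.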

World Modeling in a Multiagent Dynamic Adversarial Domain
In the RoboCup Four-Legged league, two teams of four Sony AIBO robots autonomously play soccer against one another. While robots on the same team use 802.11b wireless Ethernet to communicate with each other, no additional offboard computation is allowed. A deployed team becomes a distributed sensor processing network. Several sources of information are available to each robot that allow it to build a model of its environment, including its sensors, kinematic models of its own body and actuators, as well as information communicated to it from its teammates. Additionally, each team possesses an a priori map of the field which gives the locations of the markers, goals, and field lines in a global reference frame. Details of how the robot visually segments the world using its camera to detect and track objects, how the robot localizes its position in the world based on visually identified landmarks, and some of the control algorithms that determine the robot's general behavior are described in [21,22]. Visual observations of fixed environmental features are used to localize the robot on the field using a particle filter localization method called sensor resetting localization (SRL) [23]. All sensing, world modeling, and behavior selection is performed at 30 Hz, which is the frame rate of the robot's camera. Our robots are programmed to operate as a team where each member has a different role that dictates its behavior [24]. For the experiments in this paper, the robot takes on the role of an "attacker" whose job is to head straight for the ball, intercept it, and then dribble/kick it up field toward the opposing goal. Please see our prior work for details on the behavior strategy.
In the RoboCup domain, knowing the location of the ball at all times is critical for successful play. The problem of knowing the ball's position is a challenging combination of active search and tracking. A number of specific factors serve to confound the modeling problem. Each of these factors contributes a quantity of error that introduces noise that must be contended with. Unfortunately, the full extent of some of the noise factors is extremely difficult to model. These factors include the following:
-Inaccurate Sensing: each robot is equipped with a color digital camera, located in the front of its head, that it uses to perceive the world. Because the robot is very low to the ground, its view can very easily and quickly be occluded by opponents and/or teammates. When the robot is actively tracking the ball, it is typically unable to localize as often, which contributes to pose uncertainty error.
-Interactions between the Robot and Target: the four-legged chassis of the AIBO gives it a wide variety of motions that it can use to manipulate objects such as the ball. However, due to slippage of the joints and variability of the initial starting positions of robot and ball, the effects of these actions can vary considerably. Specifically, the effect of an action can have a single successful mode and multiple independent failure modes, each of which has its own dynamic characteristics.
-Interactions between the Robot and Environment: the four-legged chassis is also a large source of odometric noise. The complex physics of how the robot's limbs strike the ground, coupled with the fact that the robot is typically jostled heavily during game play, means that the robot's estimate of its own position can very quickly become erroneous even if it had very recently correctly localized itself.
-Erroneous Information from Teammates: using their wireless Ethernet, the robots can share local observations made about the environment with their teammates. Because the robots do not have a centralized server, they do not have a method of synchronizing their internal clocks. The lack of accurate timestamps on observations makes fusion of the sensor data much more challenging. Because of the positional uncertainty, global positions of objects reported by teammates can very easily be erroneous if the robot's position or (more importantly) its heading are estimated badly. This source of information is very likely the most problematic, as a teammate can broadcast a very tight and accurate covariance estimate even though it has become very poorly localized due to an undetected collision with an opponent. These errors are highly nonlinear, as errors in robot orientation contribute greatly to errors in reported ball pose.
Attempting to reason effectively about each of these sources of error directly can be very challenging and difficult to do precisely. Because the robot's actions directly affect its position in the environment and the position of the ball, as well as the amount of information that the robot can obtain from its sensors, the algorithm for selecting the correct action to perform at a given time is extremely important.

Prioritized Multiple Hypothesis Object Tracking.
In our group's long experience with the RoboCup legged league, we have observed many effects of noise on our robots that are caused by such a dynamic environment. In particular, we have identified a number of places where more abstract knowledge about the high-level domain can be helpful when estimating the position of the ball on the field.
(1) Occlusions occur often enough that the ball can be within the robot's field of view yet hidden behind another robot. Persistence in searching for the ball in an area last believed to be its location is preferable to immediately giving up the search and looking elsewhere on the field. Thus, the actions performed by the robot are highly dependent upon the source of information used to generate the hypothesis being tracked.
(2) Ball estimates returned from teammates are never as accurate as the robot's own estimates. Our team uses a dual world model [8] where the robot's own perceptions build a model which is kept independent of the model built from its teammates' perceptions.
Our approach tags the contribution of each source of information to the state estimate. This allows additional information, such as the utility of the source of data on the estimate, to play a factor in the decision processes that the robot makes when solving its task. Concretely, the state vector to be estimated is segmented into a set of parallel and independent hypotheses, each of which represents a probability distribution over the state vector. These individual estimates are maintained in parallel and an external decision process chooses which ones to ignore and which one (or ones) to use.

RoboCup Hypothesis Selection Policy.
In order to incorporate domain-specific data into the estimation algorithm that can be used in hypothesis ranking and persistence, we define a specific policy for reasoning about the specific sources of information. In the RoboCup Four-Legged league, there are multiple sources of information that must be accounted for when tracking the ball. The individual sources of information are used to generate a disjoint hypothesis space which is filtered for the most relevant information. The different sources include the following:
-Vision: the robot's own camera is the most reliable source of information that allows the robot to compute the ball's position by itself.
-Game Manager: when the ball goes out of bounds, it is immediately replaced in a fixed location depending on the offending team and the quadrant of the field where the ball went out. When the game manager reports a throw-in, the robot can be sure that the ball has jumped to a new location, as the referee will have moved it.
-Actuation: kicks are performed blindly as the ball is usually under the robot's camera. A large library of predefined kicking motions is available to each robot on the team. The robot is typically unable to visually track the ball during a kick because the ball is under the chin and behind the camera when the kick is initiated. Models of the predicted position of the ball after the kick are learned empirically in the lab [25] and used by the estimator to reacquire the ball after a kick has been performed. Because kicks are not always successful due to noise in the interaction between the robot, ball, and the rest of the environment, we distinguish between two effects of actuation, namely, kick success and kick failure.
-Teammate: teammate ball information is typically the worst source because, while their tracked local information is accurate with respect to their local reference frame, the global position can be erroneous if they are mis-localized.
The different sources of information are ranked in order of expected quality and are used to guide the behaviors to search for the ball and track it when found. The hypothesis database encodes all of the relevant domain knowledge that is necessary for segmenting the hypothesis space into the relevant subsets so that the robot can use this information to act effectively. The specific policy defined for our AIBO team for deciding which source of information to use in the hypotheses returned from the estimator is summarized in Algorithm 2.

Empirical Evaluation
Estimating the state of quantities in an environment is typically done through the generation of a complex probability density function. In real-world problems of interest, the density functions are often highly multimodal and can rapidly diverge from the true estimate due to noise. Our hybrid approach to state estimation factors the complete probability density function into smaller subproblems based on the a priori policy function over the problem. We have applied this approach to the challenge of robot soccer in the RoboCup Four-Legged league by first making use of a robust probabilistic algorithm for solving the underlying state estimation problem of simultaneous self-localization and tracking of the ball. Our hypothesis selection algorithm then maintains multiple independent state estimators which are created, updated, or deleted as the robot interacts with its environment and gains information from its sensors and teammates.

1D Simulation Study.
To analyze the prioritized multihypothesis object tracking algorithm described in this paper in a statistically significant fashion, a simple one-dimensional version of the tracking problem is implemented in simulation. The simulation contains the following elements:
-A robot capable of self-locomotion, manipulation (pushing) of objects, and tracking different objects one at a time. The robot uses a Kalman filter for tracking the multiple objects.
-Several objects that exhibit stochastic motion models that can be classified as static, quasidynamic, and dynamic. The class of motion to which each object belongs is known to the robot. All object motion is described by a noisy linear dynamical system.
-One or more "teammate" robots that can provide their own observations of objects to the primary robot. These teammates do not manipulate any objects or affect the environment in any way. Because of localization errors, the reported object positions may be erroneous.
The task for the robot is to track the positions of all of the objects as closely as possible.

Object Tracking.
Before introducing the concept of tracking multiple hypotheses per object, the utility of the described approach for deciding when to track a specific object is evaluated. In this experiment, three different objects with increasingly dynamic motion models are simulated. The robot's task is to maintain a good estimate of each of the three even though it is able to observe only a single object at a time. The robot is not to manipulate any objects and no teammates are present to assist the robot in the tracking problem. As before, it is assumed that each object is uniquely identifiable from the robot's sensors so that there is no data association problem.
The performance of the naïve tracking policy is compared against the greedy tracking policy. In the naïve case, the robot gives equal time to tracking each object regardless of that object's motion model and associated uncertainty. Uncertainty in this work is the covariance associated with the error in the estimated position of the object. For as long as an object is unobserved, the covariance of its estimate will increase. In the greedy case, the robot tracks the object with the greatest uncertainty at the time.
The simulation is run for 500 trials of 500 timesteps each. The sums of the errors between estimated position and ground truth across all three objects are computed. The average error across all trials for the greedy policy (μ = 19.617, σ = 1.584) is less than the average error across all trials for the naïve policy (μ = 22.300, σ = 2.141). This result is statistically significant (one-tailed, two-sample t-test). Figure 3 illustrates an example run of the simulation with three objects being tracked using the greedy policy.
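As an illustration of the two policies compared above, the following is a minimal sketch of a one-dimensional simulation, not the simulator used for the reported results. The function names (`choose_target`, `kalman_update`, `simulate`), the random-walk motion model, and all noise variances are assumptions chosen for readability.

```python
import random

def choose_target(policy, t, variances):
    """Pick which object index the single sensor views at timestep t."""
    if policy == "naive":
        return t % len(variances)          # equal time for every object
    # Greedy: view the object whose estimate is currently most uncertain.
    return max(range(len(variances)), key=lambda i: variances[i])

def kalman_update(mean, var, z, sensor_var):
    """Scalar Kalman sensor update; returns the corrected (mean, var)."""
    gain = var / (var + sensor_var)
    return mean + gain * (z - mean), (1.0 - gain) * var

def simulate(policy, steps=500, seed=0,
             process_var=(0.01, 0.1, 1.0),   # assumed: static .. dynamic
             sensor_var=0.05):               # assumed sensor noise
    """Mean per-step sum of absolute errors over all objects."""
    rng = random.Random(seed)
    n = len(process_var)
    truth, est, var = [0.0] * n, [0.0] * n, [1.0] * n
    total = 0.0
    for t in range(steps):
        for i, q in enumerate(process_var):
            truth[i] += rng.gauss(0.0, q ** 0.5)   # object moves (random walk)
            var[i] += q                             # propagation grows uncertainty
        k = choose_target(policy, t, var)
        z = truth[k] + rng.gauss(0.0, sensor_var ** 0.5)
        est[k], var[k] = kalman_update(est[k], var[k], z, sensor_var)
        total += sum(abs(e - x) for e, x in zip(est, truth))
    return total / steps
```

Calling `simulate("naive")` and `simulate("greedy")` over many seeds gives error samples that could be compared with a two-sample t-test. Which policy ranks better in this sketch depends on the assumed noise variances, so it should not be expected to reproduce the reported means.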

Multihypothesis Object Tracking.
In this simulation, the robot's task is again to track three objects, but additionally, it must also move up to a quasistatic object and manipulate (push) it. After manipulation, the robot must reacquire sensor contact with that object. Several hypotheses are generated after each manipulation: one which reflects a successful manipulation of the object, a second which reflects a failed manipulation, and a third which is the teammate estimate. The simulation is set up such that the actuation succeeds 90% of the time and fails 10% of the time. The physical modeling of the actuation is also corrupted by random noise. The teammate's localization estimate is corrupted by random noise as well as an offset bias which is randomized between trials to reflect the uncertainty in a teammate's localization. Because of the localization error, the positions reported by the teammate are nearly always worse than the robot's own estimates.
Several different multihypothesis tracking policies (shown in Table 1), which describe the order in which the hypotheses are visited by the robot's sensors, are evaluated as part of this experiment. Once again, the naïve and greedy object tracking policies are evaluated as part of this experiment. Figure 4 illustrates a sample run of the robot chasing the object it must actuate (for clarity, the other two objects are not shown). The simulation is run for 10,000 trials of 500 timesteps each. The sums of the errors between the estimated position and ground truth for all three objects are reported in Table 2. As expected, the policy configuration that performs the best over all other policy configurations is the greedy object tracking policy with hypothesis policy 1 (see Table 1 for an explanation). These results are statistically significant over all other policy configurations (one-tailed, two-sample t-test).
The best hypothesis selection policy is the one which most closely matches the physics of the true environment. However, if the robot were to possess a damaged actuator which caused the actuation to fail more often, or if the teammate could be assured to be well localized (such as a stationary teammate), policies 2 or 3 (respectively) would most likely be the superior choices. Thus, the selection of the specific hypothesis policy must be done with care after the robot's performance in its chosen environment has been observed and carefully measured.

2D Simulation
Sensor noise is also simulated with models measured from our real-world robots. Figure 5 shows a typical view from the simulator.
To systematically evaluate the effectiveness of the use of a high-level hypothesis policy to factor a probabilistic state estimation problem into a more tractable form, several hundred robotic runs were performed with our simulation package. The underlying estimation algorithm used in this study was a Rao-Blackwellized particle filter (RBPF), similar to the algorithms reported in [15, 26], where the state of the ball as well as the motion model of the ball is stochastically sampled based on the expected activities of the robot and its sensor information.
In these experiments, a single robot kicks the ball between several different waypoints on the field. The robot's sensor readings as well as the actuation models are stochastically corrupted with noise. We compared the performance of the RBPF which estimated the full state of the ball and motion model against our hybrid approach, where each motion model is given its own independent state estimate (also using an RBPF for each) and the robot chooses which model to track based on its actions and expected performances therein.
The ground truth of the ball's position on the field was recorded and compared to the robot's current estimate. After several hundred experiments, we found that the hybrid policy estimator outperformed the single estimator in a statistically significant fashion (0.732 m error on average for the single RBPF versus 0.592 m on average for the policy selection algorithm). We note that the parameters for all of the RBPFs were kept the same for both experiments. Figure 6 illustrates an example estimate from both approaches. Note in Figure 6(b) how the density from the policy selection algorithm is focused mainly around the areas of the expected outcomes of the actions, whereas the density of the particles in Figure 6(a) is more spread out.
A series of simulation experiments were performed to evaluate the effectiveness of the policy selection algorithm in a variety of different environments. In this study, a single robot was required to find the soccer ball on the field and manipulate it with its kicking mechanism through a series of waypoints. As with the 1D simulation study, a number of different policies were evaluated for where the robot should task its sensors in order to find the ball when it was not in view of the robot's sensors. Table 3 lists the different policies.
In the simulation study, four different environmental cases (A through D) were studied, each combining a high or low probability of kick success with good or bad teammate localization. These cases reflect conditions that we have observed in real RoboCup soccer matches. The kick success is directly affected by the state of the environment, whereby the texture, friction, and damping of the soccer field affect how well the kicking action works. The quality of teammate localization is also heavily dependent on the state of the lighting in the environment.
Each trial of the simulation consisted of the robot approaching, grabbing, and kicking the ball such that it could manipulate it through a series of waypoints on the field continuously for 10 minutes. A stationary teammate robot tracked the ball and relayed its observations when the ball was in view. The results were evaluated on how well the kicking robot's state estimate matched the global ground truth of the world and by how much time the robot actually had the ball in its view. The results are summarized in Table 4 for the error in the robot's ball estimates and Table 5 for the amount of time that the robot had the ball in view of its sensors.
In general, the appropriate policies performed well in the environmental conditions where they were placed. We did not expect that the outcomes of the different policies would have the same ranking in performance for both the average error in the estimated ball position and the average time that the ball was visible in the robot's sensors. However, the results show that the rankings of the 3 policies are the same for both metrics.
In cases A and B, the teammate robot was unable to localize itself very well, and as a result, the policy that made use of that information first performed the most poorly in those cases. However, in cases C and D, the opposite was true. A challenge that must be faced by any team of robots is deciding when to trust the information returned by their teammates. Teammate information, particularly in the RoboCup environment where individual robots are crowded and jostled by opponents, can very easily be corrupted without the teammate being aware of it until it attempts to relocalize itself. When the teammate's position is corrupted with error, any information about tracked objects that is converted from the robot's egocentric coordinate system to a global coordinate system will also be corrupted. This is a very serious problem because in addition to translational error, any error in the orientation will generate a significant additional error in the global pose of the object.
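The effect of teammate pose error on a reported object position can be made concrete with a short sketch. The planar (range, bearing) observation model and the function names `ego_to_global` and `global_error` are assumptions made for illustration; they are not taken from the robots' code.

```python
import math

def ego_to_global(robot_x, robot_y, robot_theta, obj_range, obj_bearing):
    """Convert an egocentric (range, bearing) observation to global (x, y)."""
    angle = robot_theta + obj_bearing
    return (robot_x + obj_range * math.cos(angle),
            robot_y + obj_range * math.sin(angle))

def global_error(pose_error, obj_range, obj_bearing=0.0):
    """Distance between the object position computed from the true pose
    (taken to be the origin, heading 0) and from a corrupted pose estimate
    offset by pose_error = (dx, dy, dtheta)."""
    dx, dy, dtheta = pose_error
    true_xy = ego_to_global(0.0, 0.0, 0.0, obj_range, obj_bearing)
    bad_xy = ego_to_global(dx, dy, dtheta, obj_range, obj_bearing)
    return math.hypot(bad_xy[0] - true_xy[0], bad_xy[1] - true_xy[1])
```

Under this model a pure translational error carries over unchanged, while a pure heading error of `dtheta` displaces the reported object by `2 * obj_range * sin(dtheta / 2)`, so the orientation contribution grows with the distance to the object.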
The only results that were not statistically significant were the times that the ball was visible in case B for policies 1 and 3. Case B was probably the hardest for the robot because its kicking actions were the most likely to fail and the teammate's reported ball position was very error-prone. Thus, if the robot did not look to the kick failure hypothesis first, it would spend a lot of time chasing phantoms in either of those two cases.

Real-World Study.
We have implemented our hybrid policy selection algorithm on our AIBO RoboCup team, where the robots and algorithm have performed (and won) in competition. In the AIBO implementation, the underlying probabilistic state estimation algorithm for tracking the ball is a simplified multihypothesis tracker using an extended Kalman filter. The deciding factor for the choice of this estimator was the need for computational efficiency on a very limited CPU budget. Other algorithms, such as the computer vision and self-localization, require a large percentage of the available computation as well. Our hybrid hypothesis selection algorithm was implemented as described in the previous section. Each hypothesis estimate is allowed one or two Kalman filters (merging or splitting as needed). An example of the hypothesis selection policy, as implemented on our AIBO robots, is illustrated in Figure 7.
The Kalman filter [3] is a Bayesian filtering algorithm which estimates the state of a system by modeling the process and sensor noise with zero-mean Gaussian distributions. The Kalman filter estimates a quantity with a propagation step, whereby the predicted state of the system is computed according to a dynamics model, and a sensor update step, where a (noisy) sensor reading corrects the predicted state. In both steps, the state estimate and the uncertainty associated with the state are updated. However, a shortcoming of the basic Kalman filter algorithm is that it assumes that all of the noise models can be represented as white Gaussian noise. Additionally, the final state and uncertainty estimate are also represented as a single Gaussian distribution. Thus, our approach uses a variation on the multiple hypothesis tracker (MHT) [10] Kalman filter algorithm, where a multimodal probability density is estimated by a bank of Kalman filters.
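For reference, the propagation and sensor update steps described above can be written in the standard textbook form below. The propagation equations use the symbols of Algorithm 3; the measurement matrix $H$, sensor noise covariance $R$, sensor reading $z_{t+1}$, and gain $K$ are assumed notation that does not appear elsewhere in this paper.

```latex
\begin{align}
  % Propagation step
  X_{t+1/t} &= F X_{t/t} + B u_t \\
  P_{t+1/t} &= F P_{t/t} F^{T} + G Q_t G^{T} \\
  % Sensor update step
  K &= P_{t+1/t} H^{T} \left( H P_{t+1/t} H^{T} + R \right)^{-1} \\
  X_{t+1/t+1} &= X_{t+1/t} + K \left( z_{t+1} - H X_{t+1/t} \right) \\
  P_{t+1/t+1} &= \left( I - K H \right) P_{t+1/t}
\end{align}
```

Each filter in the bank carries its own pair $(X, P)$, so a multimodal density is represented by the collection of these single Gaussians.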
Interestingly enough, the deterministic approximation to the state estimation problem solved by the MHT paradigm can be considered analogous to approximate inference methods for performing stochastic inference in switching Kalman filter models via a Rao-Blackwellized particle filter [27]. At this time, it is not clear whether one approach is superior to the other. For efficiency purposes, the AIBOs use the Kalman filter to generate a probabilistic estimate for the position of the ball. Particle filters typically require greater computational power due to the large number of samples that must be maintained and updated. While powerful for a robot of its size, the AIBO's onboard computer, a 600 MHz MIPS processor, must handle a great deal of additional processing, such as vision, localization (already using a particle filter), and kinematics. Computational issues of the robot aside, we assert that our proposed hypothesis selection algorithm is independent of the particular representation used for the state estimates. Instead of Kalman filters for each estimate, independent sets of particle filters could be used to represent the different hypothesis classes. By keeping them disjoint, the robot can select the appropriate hypothesis to explore using the proposed algorithm.
Each element of the disjoint state estimate is represented using a bank of L Kalman filters. In this way, a multimodal estimate generated by multiple potentially conflicting or ambiguous sensor readings can be maintained until additional sensor information removes one or more hypotheses that are inconsistent with new sensor data. When new sensor data arrives, a gating function is used to determine which filter should be updated with the new information.
If no hypothesis matches the data, a new hypothesis will be initialized. All hypotheses have an uncertainty model which is represented as a covariance matrix P. As per the propagation algorithm, the uncertainty represented by the covariance matrix will grow continuously if there is no sensor data. A check is performed to determine whether the covariance of the estimate has grown too large to be practical.
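The gating, initialization, and pruning logic described above can be sketched for the scalar case as follows. This is a minimal illustration under assumed names (`Filter1D`, `propagate`, `sensor_update`) and assumed thresholds; the filters on the robots are multivariate and are not reproduced here.

```python
import math
from dataclasses import dataclass

@dataclass
class Filter1D:
    mean: float
    var: float

def mahalanobis(f, z, sensor_var):
    """Mahalanobis distance of reading z from filter f (scalar case)."""
    return abs(z - f.mean) / math.sqrt(f.var + sensor_var)

def propagate(bank, process_var, max_var):
    """Grow every filter's variance, then drop filters that have become
    too uncertain to be informative."""
    for f in bank:
        f.var += process_var
    return [f for f in bank if f.var <= max_var]

def sensor_update(bank, z, sensor_var, gate):
    """Apply reading z to the best-matching filter if it passes the gate;
    otherwise initialize a new hypothesis from the reading."""
    best = min(bank, key=lambda f: mahalanobis(f, z, sensor_var), default=None)
    if best is not None and mahalanobis(best, z, sensor_var) <= gate:
        k = best.var / (best.var + sensor_var)   # Kalman gain
        best.mean += k * (z - best.mean)
        best.var *= (1.0 - k)
    else:
        bank.append(Filter1D(mean=z, var=sensor_var))
    return bank
```

With two filters at 0 and 10, a reading of 9.5 is absorbed by the second filter, while a reading of 50 matches neither and starts a third hypothesis.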
In this case, a particular filter is no longer informative (essentially a uniform density distribution) and is removed from consideration. The sources of information that feed into this estimator can have distinctly different process models which describe how quickly the uncertainty grows in the model. Our approach makes use of this in order to exploit both positive and negative information returned from the sensors to adapt the process noise of the estimates. When the tracked object is observed by the sensors, the process noise is set to a model which best describes the dynamics of that object. Specific estimate process noise is based on the following states and is ranked from lowest (1) to highest (5) as follows:
(1) Visible. The estimate is being actively observed and tracked with the camera.
(2) Possession. The estimate is not seen, but the robot believes that the object is under its chin and can be manipulated.
(3) Not in camera view. The estimate is not within the expected field of view of the camera as the robot's sensors have been directed elsewhere.
(4) In camera view but occluded. The estimate is expected to be visible in the calculated camera view but currently is not. However, occluding objects (such as other robots) are also present in the image, so the object could still be present.
(5) In camera view but not visible. The estimate is expected to be visible in the calculated camera view but it is not. No additional occluding objects are present.
As the process noise increases, the uncertainty of the estimate increases, which decreases the likelihood that it will be selected as the next hypothesis for the robot to explore. Thus, when the sensors view an area where the tracked object is expected, but no readings are found, the process noise increases drastically to reflect the notion that the object has moved.
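The ranking of process noise by visibility state can be sketched as a lookup table. The numeric variances below are hypothetical placeholders; only their ordering follows the text, and the names `EstimateState` and `steps_until_pruned` are likewise assumptions.

```python
from enum import IntEnum

class EstimateState(IntEnum):
    VISIBLE = 1
    POSSESSION = 2
    NOT_IN_VIEW = 3
    IN_VIEW_OCCLUDED = 4
    IN_VIEW_NOT_VISIBLE = 5

# Hypothetical per-step process noise variances (ordering only is from the text).
PROCESS_NOISE = {
    EstimateState.VISIBLE: 0.01,
    EstimateState.POSSESSION: 0.02,
    EstimateState.NOT_IN_VIEW: 0.05,
    EstimateState.IN_VIEW_OCCLUDED: 0.2,
    EstimateState.IN_VIEW_NOT_VISIBLE: 0.5,
}

def steps_until_pruned(state, start_var, max_var):
    """Count propagation steps until the variance exceeds max_var,
    assuming no sensor update arrives in the meantime."""
    var, steps = start_var, 0
    while var <= max_var:
        var += PROCESS_NOISE[state]
        steps += 1
    return steps
```

With these placeholder values, an estimate that should be visible but is not expires after a couple of steps, whereas one that is simply outside the camera view persists much longer before it is pruned.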
The specific notation for these algorithms is described in Table 6. The propagation algorithm for our disjoint multiple hypothesis tracker is shown in Algorithm 3, and the sensor update algorithm is shown in Algorithm 4.
Directly evaluating this algorithm on real robots is much more difficult due to the challenge of obtaining the ground truth of the ball and the robot in the environment. However, we have conducted controlled experiments where we have measured the time that it takes for the robot to regain visual contact with the ball with the policy algorithm versus a straight estimator with a naïve search. The differences in the mean times for visually reacquiring the ball after losing track of it are statistically significant and on the order of several seconds. This time to reacquire the ball is even more significant when dealing with reported teammate estimates. Due to the difficulty of localizing the robot in the dynamic RoboCup environment, teammates can potentially broadcast very inaccurate information. In actual competition games where the teammate information was given higher priority, the robots tended to be more lost than in games when they used their own models first before listening to teammates.
Figure 8 illustrates how the multiple disjoint hypothesis tracking algorithm steps through the different hypothesis classes in an attempt to drive the robot towards the correct estimated ball position. In this example, two AIBOs are tracking two different balls on the field. The AIBO in the center is actively attempting to score a goal with its ball. The stationary AIBO in the upper right corner of the field tracks a ball that is occluded from the first AIBO. The moving AIBO continuously receives a global position estimate for the ball from the stationary one.
In Figure 8(a), an AIBO observes the ball and moves toward it in an attempt to kick it into the goal. In Figure 8(b), the robot performs a side kick that uses its head, and the current hypothesis changes to the class of kicks and is split into two cases. The first case is the success case, which models the kinematics of the kick and predicts the motion. The second case is a failure case, which models the situation where the AIBO failed to kick the ball. The kick success case initially has higher priority, and so the robot attempts to track its position. The kick success hypothesis estimates the ball's new position at each timestep by modeling the velocity of the ball after the kick, as shown in Figure 8(c). Because the robot missed the ball, the successful kick hypothesis is not valid, and when the robot aims its camera toward it, no ball is observed. This negative information greatly increases the process noise of the kick success model and the uncertainty grows quickly, as shown in Figure 8(d). The success hypothesis quickly expires and the robot turns its attention to the kick failure hypothesis. As shown in Figure 8(e), once again, because the ball is not there, the negative information causes the uncertainty of that hypothesis to grow. In Figure 8(f), the kick failure hypothesis also expires and the robot finally uses the teammate observations to direct its motion to the upper corner of the field.
We have conducted controlled experiments on the real robots where we have measured the time that it takes for the robot to regain visual contact with the ball with our proposed search policy algorithm versus a standard MHT-EKF state estimator coupled with a naïve search. The naïve search is considered to be a policy where the robot always tracks the estimated position of the ball assuming a successful kick. In both cases, after the tracked hypothesis expires due to exceeding an uncertainty threshold, the robot reverts to a more expensive generic ball search, which consists of spinning in place while scanning its camera to exhaustively search at different distances. In a real game situation, minimizing the time to find the ball is critical, as the longer the robots search for the ball, the greater the chance that the other team will find and control the ball.
In these experiments, similar to the simulation experiments, the robots were required to locate the ball and kick it to a specific position on the field. This emulates normal game behavior where the robots will attempt to move the ball up the field (and potentially near teammates). The time between when each kick was performed and when the ball was reacquired by the vision system was recorded. The time to relocate the ball in the AIBO's camera image is used as the performance metric, rather than the difference between the robot's estimated position and the real-world position, for several reasons. First, the soccer ball used by the Four-Legged league is a hollow plastic ball which, due to nonuniformities in its casting, will often roll in a very nonlinear fashion at low speeds. In our prior work with kick modeling, we observed that immediately after a successful kick, the ball will travel along a straight line until it slows down due to friction. At a certain speed, the ball will often curve away from the expected linear position. We found that if the ball is successfully kicked, it will travel far enough from the robot that even if its trajectory moves in a curve, the ball will still be visible in the robot's field of view. This is because when the robot aims its camera toward a specific hypothesis for analysis, the camera is cast along the entire area of the uncertainty in the estimated position. In contrast, if a kick fails, the ball will often roll to the side of the robot or sometimes behind it. In order to reacquire it, the robot will have to perform a search behavior which requires that it spins in place and casts its camera around the local area. Thus, rather than using the estimated position as a metric, we feel that a more practical metric is the amount of time required for the robot to actually reacquire the ball in its camera. Once this has been accomplished, the robot can move straight for the ball as well as tell its teammates where the ball can be found.
A set of ten experiments was performed where the times to find the ball after the first ten successful kicks as well as the first ten failed kicks were recorded. Kicks can fail due to misalignment of the ball to the robot's head and legs. In a game situation, this happens very frequently due to the robot being jostled by opponent robots. The results are shown in Table 7.
On an empty field, the kicks fail approximately 10% of the time. However, as mentioned previously, when the robot is jostled in a real game situation, the kick failure rate can exceed 50% and is often much higher in crowded situations. Thus, it is very important that the robots reason effectively about the potential outcomes of their actions at a high level in order to more rapidly reacquire the target.

EURASIP Journal on Advances in Signal Processing
This time to reacquire the ball can be even more significant when dealing with reported teammate estimates. Due to the difficulty of localizing the robot in the dynamic RoboCup environment, teammates can potentially broadcast very inaccurate information. In actual competition games where the teammate information was given higher priority, the robots tended to be more lost than in games when they used their own models first before listening to teammates. Because the robots have no sense of touch, if they are knocked off course due to a collision, they do not know that their localization estimate has become inaccurate until they attempt to relocalize themselves based on the nearby markers. However, as all robots are attempting to track the position of the ball as much as possible, teammates can potentially transmit very poor ball information for long periods of time.

Summary and Conclusions
In this paper, we describe an approach for multihypothesis state estimation in a dynamic environment where a robot must contend with uncertain sensor readings and incomplete models of objects. We formally describe the problem of tracking objects with multiple hypotheses based on models and other information sources. Specifically, we have addressed the problem of tracking an object (or objects) that is (are) uniquely identifiable by a robot's sensors. For any environment of reasonable complexity, a robot is incapable of simultaneously tracking all objects of interest with its sensors, as the scope of the robot's sensor field is simply too narrow (or otherwise limited) compared to the size of the environment.
Probabilistic state estimates are powerful mechanisms for representing the uncertainty in a robot's state estimate. However, due to the multimodal nature and potentially high dimensionality of these estimates, estimating the complete density function can be exceedingly challenging, particularly when the noise models are not known exactly. More importantly, in many applications, maintaining an accurate estimate of the density is not as important as choosing an action quickly in a dynamic environment. We believe that the fusion of a high-level policy-based approach with effective probabilistic state estimation algorithms will allow robots to maintain better estimates of their world by combining effective action selection with robust state estimation.
We describe a mechanism by which the robot can intelligently decide how best to aim its sensors to maintain an accurate estimate of the state of all objects. When an object is not in view, its position must be predicted from analyzing multiple sources of model information that can include models for the success or failure of actuation as well as external observations from teammates. We describe a formal policy mechanism by which the robot can select appropriately among multiple hypotheses based on domain information in order to augment a traditional state estimation algorithm to allow the robot to quickly reacquire the object. Deciding on the correct policy for the robot can be done a priori and can rapidly be changed if the situation warrants. Our approach was developed for and successfully applied to several real multirobot systems. We have validated it through an extensive empirical simulation study and have used it successfully in competition on our real robots. Our current work is to analyze how we can learn these policies in real time on the robots as they perform their tasks rather than having to rely on a priori-defined policies. Future work will relax the assumption that the objects are uniquely identifiable and address the data association problem in the context of this research.

Figure 1: Sony AIBO robots preparing to play robot soccer at a RoboCup competition.

Figure 2: The general world modeling problem in a dynamic environment includes requiring a robot to use a narrow-scope sensor to track the positions of multiple (static and dynamic) objects in an environment. Determining when and how to use additional sources of information, such as the effects of actuation and teammate sensor information, is a nontrivial task.

Figure 3: Tracking objects with the greedy policy. Example run of the one-dimensional simulation showing the positions of three objects being tracked with the greedy policy. Only 100 timesteps out of 500 are shown for clarity. For each object, "x" marks the target's estimated position. The most dynamic object (bottom in red) is tracked 52% of the time, the second most dynamic (middle in green) is tracked 33% of the time, and the least dynamic (top in blue) is tracked 15% of the time.

Figure 4: Example run of the one-dimensional simulation showing the robot chasing and actuating an object. After actuating the object, the robot maintains three separate hypotheses: one for actuation success, one for actuation failure, and one for the noisy external teammate observation. Actuation succeeds at times 21, 125, 300, and 341. Actuation fails at time 205. The object is moved by an external force at time 307 and the robot must use the teammate observation to relocalize it. Not shown are the other two objects that the robot is tracking.

Case A: high probability of kick success with bad teammate localization.
Case B: low probability of kick success with bad teammate localization.
Case C: high probability of kick success with good teammate localization.
Case D: low probability of kick success with good teammate localization.

Figure 6: Example illustration of the simulated world where the robot has just kicked the ball. (a) The vanilla RBPF estimator tracking the ball. (b) The hybrid bank of RBPF estimators tracking the ball. In (b), the three different hypotheses are represented as differently shaped particles (vision: circle, kick success: cross, kick failure: square).
Figure 7: An illustrative example of the prioritized multiple hypothesis algorithm for reasoning about possible locations for the ball.
(a) Case (1a): Robot tracks and approaches the ball for a kick.
(b) Case (1b): Robot performs an open-loop grab motion and kick which succeeds.
(c) Case (1c): The robot looks to the location of the kick success hypothesis for the ball.
(d) Case (2a): In a different kick, the open-loop grab and kick fails.
(e) Case (2b): The robot looks to the location of the kick success hypothesis for the ball but doesn't find it.
(f) Case (2c): The kick success hypothesis is pruned and the robot looks to the kick failure hypothesis.
(g) Case (3a): The robot attempts to kick, but the ball is stolen by external forces.
(h) Case (3b): A nearby teammate is shown the ball. Both the kick success and kick failure hypotheses are evaluated.
(i) Case (3c): All of the kick hypotheses expire and the robot tracks the teammate's reported hypothesis (the ball is hidden from the first robot's view).
Figure 8: A top-down view of the multihypothesis algorithm running on the robot which demonstrates how the algorithm directs the robot's actions. The field of view of the robot's camera is shown with white lines while the circles represent the uncertainty in the tracked object position.
(d) Success hypothesis rapidly grows in uncertainty.
(e) Success hypothesis is pruned. Tracking failure hypothesis.
(f) Failure hypothesis is pruned. Robot heads toward teammate information.

• Given:
  1. Set of hypotheses based on the game manager: $H_g$
  2. Set of hypotheses based on the robot's own sensors: $H_r$
  3. Set of hypotheses based on teammate sensors: $H_t$
• Select game manager, self, or teammate information
  - If $H_g$ not empty
    * Select game-hypothesis
  - else if $H_r$ not empty and $H_t$ not empty
    * If vision data actively supports a hypothesis in $H_r$ then select self-hypothesis
    * else If time since ball viewed < threshold then select self-hypothesis
    * else select teammate-hypothesis
  - else if $H_r$ not empty then select self-hypothesis
  - else if $H_t$ not empty then select teammate-hypothesis
  - else return
• Track hypothesis classes
  - If game-hypothesis
    * Track hypothesis $H_g$ created by game manager
  - else if self-hypothesis
    * If ball is actively in view of the camera, filter self estimates with source vision and actively track the most likely one
    * else If ball is in possession, track possession estimates
    * else If ball was kicked, track the kick estimates starting with the Kick Success estimate and switching to the Kick Failure estimate when the former is pruned
    * else Track any estimates based on older vision information
  - else If teammate-hypothesis
    * Rank the teammate estimates based on current self-role
    * Track best ranked estimate based on teammate role and position
Algorithm 2: Hypothesis selection policy for ball tracking in AIBO Robot Soccer.
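The selection logic of Algorithm 2 can be sketched as two small functions. The function names, argument names, and string labels are assumptions for illustration; the ranking of teammate estimates by role is omitted because its details are not given here.

```python
def select_source(h_game, h_self, h_team, vision_supports_self,
                  time_since_ball_seen, threshold):
    """Choose between game manager, self, and teammate hypotheses.
    h_game, h_self, h_team are (possibly empty) collections of hypotheses."""
    if h_game:
        return "game"
    if h_self and h_team:
        if vision_supports_self or time_since_ball_seen < threshold:
            return "self"
        return "teammate"
    if h_self:
        return "self"
    if h_team:
        return "teammate"
    return None  # nothing to track

def select_self_estimate(ball_in_view, in_possession, kicked, success_pruned):
    """Choose which self-generated estimate to track, in priority order."""
    if ball_in_view:
        return "vision"
    if in_possession:
        return "possession"
    if kicked:
        # Kick Success is tracked first; fall back once it has been pruned.
        return "kick_failure" if success_pruned else "kick_success"
    return "older_vision"
```

For example, `select_source([], ["kick"], ["mate"], False, 5.0, 2.0)` returns `"teammate"`, because the robot's own hypotheses are neither supported by current vision nor recent enough.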

Table 1: Six multihypothesis tracking policies tested in simulation. Policies 4-6 only ever track a single hypothesis.

Table 2: Mean error and standard deviation for all 12 cases evaluated over 10,000 trials in simulation. The policy combinations are sorted with the least error on top and the largest error on bottom. The first column represents the hypothesis selection policy, as described in Table 1, and the second column represents the target tracking policy.

Table 3: The multihypothesis tracking policies tested in 2D simulation.

Table 4: Results of the 2D soccer simulation experiment showing a list of the policies ordered from best (top) to worst (bottom) based on the estimator error for the four different experimental cases. Every result is statistically significant (based on t-test).

Table 5: Results of the 2D soccer simulation experiment showing a list of the policies ordered from best (top) to worst (bottom) based on the time the ball is in view for the four different experimental cases. Time is measured in frames of video where the ball is visible (frame rate is 33 Hz). Every result is statistically significant (based on t-test), except for the times of policies 1 and 3 in case B.

Table 6: Description of terms used in the multiple hypothesis Kalman filter algorithm.
$X_{t/t}$: Estimated state at time $t$ with accumulated sensor readings from time $t$.
$X_{t+1/t}$: Estimated state at time $t+1$ with accumulated sensor readings from time $t$. This occurs when the system dynamics are propagated but no sensor reading has yet been obtained at time $t+1$.
$P_{t+1/t}$: Covariance matrix at time $t+1$ with accumulated sensor readings from time $t$. See the definition of $X_{t+1/t}$ above.

• Given:
  1. A list of Kalman filters $L$
  2. Sensor poses and expected fields of view
• Propagate
  - For each filter $l$ (with state estimate $X_{t/t}$ and covariance matrix $P_{t/t}$) do
    * Propagate the state and covariance matrix from time $t$ to time $t+1$:
      $X_{t+1/t} = F X_{t/t} + B u_t$
      $P_{t+1/t} = F P_{t/t} F^T + G Q_t G^T$
    * Update the process noise matrix $Q$ of the filter based on the expected readings of the sensor
    * If likelihood of $P_{t+1/t}$ is less than a threshold, delete filter $l$ from the list
Algorithm 3: Multiple hypothesis state estimation propagation algorithm.

• Sensor update
  - Select the Kalman filter $l_i$ with the smallest Mahalanobis distance $M_i$
  - If $M_i \le$ thresh, apply the sensor estimate to that filter
  - else Initialize a new filter based on the sensor reading and add it to the list
Algorithm 4: Sensor update algorithm.