Long-term target tracking combined with re-detection
EURASIP Journal on Advances in Signal Processing volume 2021, Article number: 2 (2021)
Abstract
Long-term visual tracking faces more challenges and is closer to realistic applications than short-term tracking. However, the performance of most existing methods is limited in long-term tracking tasks. In this work, we present a reliable yet simple long-term tracking method, which extends the state-of-the-art learning adaptive discriminative correlation filters (LADCF) tracking algorithm with a re-detection component based on the support vector machine (SVM) model. The LADCF tracking algorithm localizes the target in each frame, and the re-detector efficiently re-detects the target in the whole image when tracking fails. We further introduce a robust confidence degree evaluation criterion that combines the maximum response criterion and the average peak-to-correlation energy (APCE) to judge the confidence level of the predicted target. When the confidence degree is generally high, the SVM is updated accordingly. If the confidence drops sharply, the SVM re-detects the target. We perform extensive experiments on the OTB2015 and UAV123 datasets. The experimental results demonstrate the effectiveness of our algorithm in long-term tracking.
Introduction
Visual object tracking is a hot research topic in computer vision and has been widely applied in various fields. However, many challenges remain unresolved, especially target disappearance, partial occlusion, and background clutter, and designing a general and powerful tracking algorithm is still a tough task.
A typical visual tracking scenario is to track an unknown object in subsequent image frames, given the initial state of the target in the first frame of the video. In the past few decades, visual object tracking technology has made significant progress [1,2,3,4,5,6,7,8,9,10]. These methods are very effective for short-term tracking tasks, in which the tracked object is almost always in the field of view. However, in realistic applications, the requirement is not only to track correctly, but also to track over a longer period of time [11]. During such a period, the target may be absent, in which case the tracking output is wrong and the training samples are annotated incorrectly, which creates a risk of model drift. Therefore, it is important for long-term trackers to determine whether the target is absent and to have the capability of re-detection.
Like short-term tracking, the long-term tracking task requires the tracker to maintain high accuracy under disappearance and occlusion, and in particular to capture the target object stably throughout a long video [12]. Long-term tracking therefore presents challenges in two respects. The first issue is how to determine the confidence degree of the tracking results. In [13], the maximum response value of the target is used to determine the confidence of the tracking result: when the maximum peak value of the response map is lower than a threshold, the result is judged to be unreliable. However, the response map may fluctuate drastically when the object is occluded or has disappeared, so judging confidence by the maximum response value alone is unreliable. The average peak-to-correlation energy (APCE) criterion in [14] indicates the degree of fluctuation of the response map, and it is used in [14] to decide when to update the tracker. However, if the target is undergoing fast motion, the value of APCE will be low even if the tracking is correct. The second issue is how to relocate out-of-view targets, which remains unresolved. The tracking-learning-detection (TLD) [15] algorithm exploits an ensemble of weak classifiers for global re-detection of out-of-view targets, but it can fail to classify the target object because of the huge number of scanning windows. The long-term correlation tracking (LCT) [13] algorithm proposes a random fern re-detection model to detect the out-of-view target. The method in [16] learns a spatial-temporal filter in a lower-dimensional discriminative manifold to alleviate the influence of boundary effects, but it still cannot solve the target disappearance problem.
This paper proposes a tracking algorithm that combines the learning adaptive discriminative correlation filter tracker with a re-detector. The proposed method aims to perform robust re-detection and relocate the target when tracking fails. Our main contributions can be summarized as follows:

i) We propose a stable long-term tracking strategy for targets that may disappear or deform heavily during long-term tracking. Under the adopted confidence strategy, the learning adaptive discriminative correlation filters (LADCF) tracker localizes the target online, and the support vector machine (SVM) is updated when the confidence degree is generally high. In contrast, if the response map fluctuates heavily, the SVM is used as a re-detector to relocate the target.

ii) We use not only the maximum response but also the APCE criterion in the re-detection component. Fusing the two criteria makes it possible to determine the state of the tracker accurately and improves the accuracy of the tracking system.

iii) We evaluate the proposed tracking algorithm on the OTB2015 [17] and UAV123 [18] datasets. The experimental results show that the proposed algorithm tracks more stably and accurately under occlusion, background clutter, and similar conditions during long-term tracking.
The structure of the rest of the paper is as follows: Section 2 overviews the related work. Section 3 presents the proposed method. Section 4 reports the experimental results and experimental analysis. Section 5 concludes the paper.
Related works
Correlation filter
Correlation filters have shown outstanding results for target tracking [17, 19]. These methods exploit the circular correlation of the filter in the frequency domain to locate the target object. Bolme et al. [4] propose the pioneering MOSSE tracker, which uses only gray image features to train the filter. The circulant structure of tracking-by-detection with kernels (CSK) tracker [20] employs illumination intensity features and applies DCFs in a kernel space. The kernelized correlation filters (KCF) tracker [6] further improves CSK by using multi-channel histogram of oriented gradient (HOG) features. Danelljan et al. [5] exploit the color attributes of the target object and learn an adaptive correlation filter. The work in [21] proposes a patch-based visual tracker that divides the object and the candidate area evenly into several small blocks and uses the average score of all blocks to determine the optimal candidate, which greatly improves performance under occlusion. The work in [22] proposes an online representative sample selection method to construct an effective observation module that can handle occasional large appearance changes or severe occlusion.
The estimation of the target scale is another important aspect of a strong tracker. It not only improves performance but also provides computational efficiency. The discriminative scale space tracking (DSST) tracker [23] performs translation estimation and scale estimation separately, using a scale pyramid to respond to scale changes. Li and Zhu [24] present an effective scale-adaptive scheme, which defines a scale pool and resizes the samples of each scale to the size of the initial sample by bilinear interpolation.
The formulation of DCFs exploits circular correlation, which makes learning efficient through the fast Fourier transform (FFT). However, it induces circular boundary effects, which have a drastic negative impact on tracking performance. Danelljan et al. [25] suggest reducing these boundary effects by introducing a spatial regularization component. Nevertheless, the regularization makes the model optimization more costly. Galoogahi et al. [26] propose to pre-multiply a fixed masking matrix containing the target regions to address this deficiency of DCFs. They then apply the alternating direction method of multipliers (ADMM) [27] to solve the constrained optimization problem in real time. The context-aware correlation filter tracking (CACF) [28] algorithm selects background references around the target by considering global information and adds a background penalty to the closed-form solution of the filter. The discriminative correlation filter with channel and spatial reliability (CSRDCF) [29] method distinguishes the foreground from the background by segmenting the colors in the search area. The learning adaptive discriminative correlation filters (LADCF) [16] approach adds adaptive spatial feature selection and temporal consistency constraints to alleviate the spatial boundary effects and temporal filter degradation that exist in the DCF method.
Long-term tracking
Kalal et al. [15] propose the tracking-learning-detection (TLD) algorithm, which decomposes the tracking task into tracking, learning, and detection. Tracking and detection facilitate each other: the short-term tracker provides training examples for the detector, while the detector is implemented as a cascade to reduce computational complexity. Inspired by the TLD framework, Ma et al. [13] propose a long-term correlation filter tracker that uses KCF as the baseline algorithm and a random fern classifier as the detector. The fully correlational long-term tracker (FCLT) [30] trains several correlation filters on different time scales as a detector and exploits the correlation response to link the short-term tracker and the long-term detector.
Methods
In this section, we describe our tracker. In Section 3.1, we introduce the main tracking framework of our algorithm, which is shown in Fig. 1. In Section 3.2, we introduce the tracker based on LADCF correlation filtering. In Section 3.3, we introduce the composite evaluation criteria of the confidence degree and the SVM-based re-detector.
The main framework of the algorithm
The proposed algorithm combines the DCF tracker and the re-detector for long-term tracking. First, the baseline correlation filter tracker is adopted to estimate the translation in the tracking stage. Second, the maximum response value and the APCE criterion are used to judge the confidence level of the target. Finally, when the confidence is higher than the threshold, the baseline tracker tracks the target alone. When the confidence level drops sharply, this indicates tracking failure; in that case, we do not update the model and instead use the SVM model to re-detect the target object in the current frame. The structure of the algorithm is shown in Fig. 1.
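The per-frame decision logic of this framework can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Action` and `decide`, the running mean `apce_mean`, the factor `apce_ratio`, and the exact rule for fusing the maximum response with the APCE are hypothetical. Only the idea of one threshold for re-detection and one for updating comes from the text; the default values mirror the thresholds reported in the experimental setup, and applying them to the peak response is itself an assumption.

```python
from enum import Enum


class Action(Enum):
    UPDATE_SVM = "update_svm"  # confident result: also train the re-detector
    TRACK_ONLY = "track_only"  # acceptable result: keep tracking, no SVM update
    REDETECT = "redetect"      # tracking failure: freeze the models, run the SVM


def decide(f_max, apce, apce_mean, t_redetect=0.13, t_update=0.20, apce_ratio=0.5):
    """Map the two confidence cues of one frame to an action.

    f_max     -- peak value of the correlation response map
    apce      -- APCE of the current response map
    apce_mean -- running mean of the APCE over past confident frames
    The fusion rule and apce_ratio are hypothetical choices for this sketch.
    """
    apce_ok = apce >= apce_ratio * apce_mean
    if f_max < t_redetect and not apce_ok:
        return Action.REDETECT
    if f_max >= t_update and apce_ok:
        return Action.UPDATE_SVM
    return Action.TRACK_ONLY
```

Requiring both cues to agree before re-detecting or updating reflects the motivation given in the text: a low peak alone, or a low APCE alone (for example under fast motion), is not treated as a failure.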
The tracking framework is summarized as follows:

(1) Position and scale detection: We utilize DSST to predict the target position and scale. The t-th frame is denoted I_{t}, and the filter model is θ_{model}. When a new frame I_{t} appears, we extract multiple scale search windows \( \left[{I}_t^{\mathrm{patch}}\left\{s\right\}\right] \) from it, s = 1, 2, …, S, with S denoting the number of scales. For each scale s, the search window patch is centered on the previous target center position p_{t − 1} and has a size of a^{N}n × a^{N}n pixels, where a is the scale factor and \( N=\left\lfloor \frac{2s-S-1}{2}\right\rfloor \). The basic search window size is n × n, which is determined by the target size ω × h and the padding parameter ϱ as \( n=\left(1+\upvarrho \right)\sqrt{\omega \times h} \). Bilinear interpolation is then applied to resize each patch to n × n, and multi-channel features are extracted for each scale search window as \( \chi \left(s\right)\in {\mathbb{R}}^{D^2\times L} \). Given the filter template, the response score can be calculated efficiently in the frequency domain as [16]:
\[ \hat{f}(s)=\sum_{i=1}^{L}{\hat{x}}_i(s)\odot {\hat{\theta}}_{\mathrm{model},i} \tag{1} \]
where \( \hat{\cdot} \) denotes the discrete Fourier transform (DFT), \( {x}_i(s) \) is the i-th feature channel of \( \chi (s) \), and ⊙ denotes element-wise multiplication. After the inverse DFT (IDFT) is applied at each scale, the location of the maximum value of \( f\in {\mathbb{R}}^{D^2\times S} \) gives the relative position and scale.

(2) Updating: We adopt the same updating strategy as the traditional DCF method:
\[ {\theta}_{\mathrm{model}}=\left(1-\alpha \right){\theta}_{\mathrm{model}}+\alpha \theta \tag{2} \]
where α is the updating rate. More specifically, since θ_{model} is not available in the learning stage for the first frame, we use a predefined mask in which only the target region is activated to optimize θ, as in BACF [26]. We then initialize θ_{model} = θ after the learning stage of the first frame.
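The two steps above can be illustrated with a short sketch that computes the search window size at each scale and applies the linear-interpolation model update. It is a simplified illustration, not the authors' code: the function names, the flat-list representation of the filter, and the default number of scales (`num_scales=5`) are assumptions, while the padding and scale factor defaults follow the values reported in the experimental setup.

```python
import math


def search_window_sizes(target_w, target_h, padding=4.0, scale_factor=1.01, num_scales=5):
    """Return the side length (in pixels) of the search window at each scale.

    Base size: n = (1 + padding) * sqrt(w * h).
    Scale s (1-based) uses the exponent N = floor((2s - S - 1) / 2).
    """
    n = (1.0 + padding) * math.sqrt(target_w * target_h)
    sizes = []
    for s in range(1, num_scales + 1):
        exponent = math.floor((2 * s - num_scales - 1) / 2)
        sizes.append((scale_factor ** exponent) * n)
    return sizes


def update_model(theta_model, theta, alpha):
    """Linear-interpolation update: (1 - alpha) * theta_model + alpha * theta."""
    return [(1.0 - alpha) * m + alpha * t for m, t in zip(theta_model, theta)]
```

With five scales the exponents run from −2 to 2, so the middle window has exactly the base size n and the others are slightly smaller or larger.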
Correlation filter tracker
In this paper, we set LADCF [16] as the baseline algorithm of our tracking approach.
LADCF is a DCF-based tracking method that uses adaptive spatial feature selection and temporal consistency constraints to reduce the impact of spatial boundary effects and temporal filter degradation. The feature selection process selects several specific elements of the filter to retain distinguishable and descriptive information, forming a low-dimensional and compact feature representation. Considering an n × n image patch \( x\in {\mathbb{R}}^{n^2} \) as a base sample for the DCF design, the circulant matrix for this sample is generated by collecting its full cyclic shifts, \( {X}^T={\left[{x}_1,{x}_2,\dots, {x}_{n^2}\right]}^T\in {\mathbb{R}}^{n^2\times {n}^2} \), with the corresponding Gaussian-shaped regression labels \( y=\left[{y}_1,{y}_2,\dots, {y}_{n^2}\right] \). The spatial feature selection embedded in the learning stage can be expressed as:
\[ \theta =\arg \min_{\theta, \phi }{\left\Vert \theta \circledast x-y\right\Vert}_2^2+{\lambda}_1{\left\Vert \phi \right\Vert}_0\quad \mathrm{s.t.}\quad \theta ={\theta}_{\phi }=\operatorname{diag}\left(\phi \right)\theta \tag{3} \]
where θ denotes the target model in the form of DCF, and ⊛ denotes the circular convolution operator. The indicator vector ϕ can potentially be expressed by θ, with ‖ϕ‖_{0} = ‖θ‖_{0}, and diag(ϕ) is the diagonal matrix generated from the indicator vector of selected features ϕ. The ℓ_{0}-norm is non-convex, and the ℓ_{1}-norm is widely used to approximate sparsity [24], so the temporal consistency preserving spatial feature selection model is constructed by ℓ_{1}-norm relaxation [16]:
\[ \theta =\arg \min_{\theta }{\left\Vert \theta \circledast x-y\right\Vert}_2^2+{\lambda}_1{\left\Vert \theta \right\Vert}_1+{\lambda}_2{\left\Vert \theta -{\theta}_{\mathrm{model}}\right\Vert}_1 \tag{4} \]
where λ_{1} and λ_{2} are tuning parameters with λ_{1} ≪ λ_{2}, and θ_{model} denotes the model parameters estimated from the previous frame.
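The ℓ_{2}-norm relaxation is adopted to further simplify the expression:
\[ \theta =\arg \min_{\theta }{\left\Vert \theta \circledast x-y\right\Vert}_2^2+{\lambda}_1{\left\Vert \theta \right\Vert}_1+{\lambda}_2{\left\Vert \theta -{\theta}_{\mathrm{model}}\right\Vert}_2^2 \tag{5} \]
where the lasso regularization controlled by λ_{1} selects the spatial features. In the above formula, the filter template model θ_{model} is used to increase smoothness between consecutive frames and thus promote temporal consistency. In this way, the temporal consistency of spatial feature selection is preserved, and the diversity of the static and dynamic appearance is extracted and retained.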
Since the multi-channel features share the same spatial layout [16], the multi-channel input is represented as Χ = {x_{1}, x_{2}, …, x_{L}}, and the corresponding filter is represented as θ = {θ_{1}, θ_{2}, …, θ_{L}}. The minimization objective can then be extended to multi-channel features with structured sparsity [16]:
where \( {\theta}_i^j \) is the j-th element of the i-th channel feature vector \( {\theta}_i\in {\mathbb{R}}^{D^2} \), and ⊙ denotes the element-wise multiplication operator. The structured spatial feature selection term calculates the ℓ_{2}-norm across channels at each spatial location and then applies the ℓ_{1}-norm over locations to achieve joint sparsity.
Subsequently, to optimize the above formula with ADMM [27], we introduce relaxation variables to construct objectives based on convex optimization [31]. We can then obtain the global optimal solution of the model through ADMM and form the augmented Lagrangian [16]:
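To make the structured sparsity term concrete, the sketch below computes the ℓ_{2}-then-ℓ_{1} norm of a multi-channel filter and applies group soft-thresholding, the standard proximal operator for this kind of penalty. It is an illustration only: the function names and the nested-list layout are assumptions, and the shrinkage step is shown as a generic technique rather than as the exact update used in [16].

```python
import math


def l21_norm(theta):
    """Structured sparsity term: l2 norm across channels at each spatial
    location, then the l1 norm (plain sum) over all locations.

    theta -- list of L channels, each a list of D*D filter coefficients
    """
    num_locations = len(theta[0])
    return sum(
        math.sqrt(sum(channel[j] ** 2 for channel in theta))
        for j in range(num_locations)
    )


def group_soft_threshold(theta, tau):
    """Proximal operator of tau * l21_norm: shrink every spatial location
    jointly across channels, zeroing locations whose norm is at most tau."""
    num_locations = len(theta[0])
    out = [list(channel) for channel in theta]
    for j in range(num_locations):
        norm = math.sqrt(sum(channel[j] ** 2 for channel in theta))
        scale = max(0.0, 1.0 - tau / norm) if norm > 0.0 else 0.0
        for channel in out:
            channel[j] *= scale
    return out
```

Because the shrinkage factor is shared by all channels at a location, a location is either kept in every channel or removed from every channel, which is what joint sparsity means here.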
where \( \mathcal{H}=\left\{{\eta}_1,{\eta}_2,\dots, {\eta}_L\right\} \) are the Lagrange multipliers, and μ > 0 is the corresponding penalty parameter controlling the convergence rate [16, 32]. As \( \mathcal{L} \) is convex, ADMM is exploited iteratively to optimize the following subproblems with guaranteed convergence:
Re-detector
Confidence criterion
Most existing trackers do not consider whether the detection is accurate or not. In fact, once the target is detected incorrectly in the current frame, severely occluded, or completely missing, this may cause the tracking failure in subsequent frames.
We introduce a measure to determine the confidence degree of the target objects, which is the first step in the re-detection model. The peak value and the fluctuation of the response map reveal the confidence of the tracking results. The ideal response map has only one sharp peak, while all other regions are smooth. Otherwise, the response map fluctuates intensely. If we continue to use uncertain samples to track the target in subsequent frames, the tracking model will be corrupted. Thus, we fuse two confidence degree evaluation criteria. The first one is the maximum response value F_{max} of the current frame.
The second one is the APCE measure, which is defined as:
\[ \mathrm{APCE}=\frac{{\left|{F}_{\mathrm{max}}-{F}_{\mathrm{min}}\right|}^2}{\operatorname{mean}\left(\sum_{w,h}{\left({F}_{w,h}-{F}_{\mathrm{min}}\right)}^2\right)} \tag{9} \]
where F_{max} and F_{min} are the maximum response and the minimum response of the current frame, respectively, and F_{w, h} is the element value in the w-th row and the h-th column of the response matrix.
If the target is moving slowly and is easily distinguishable, the APCE value is generally high. However, if the target is undergoing fast motion with significant deformations, the value of APCE will be low even if the tracking is correct.
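The APCE can be computed directly from a response map, as the following sketch shows. The function name `apce`, the list-of-rows representation, and the handling of a perfectly flat map are assumptions made for illustration; the formula in the docstring is the standard definition from [14].

```python
def apce(response):
    """Average peak-to-correlation energy of a 2-D response map.

    response -- list of rows, each a list of floats
    APCE = |F_max - F_min|^2 / mean((F_wh - F_min)^2)
    """
    values = [v for row in response for v in row]
    f_max, f_min = max(values), min(values)
    energy = sum((v - f_min) ** 2 for v in values) / len(values)
    if energy == 0.0:
        return 0.0  # flat map: there is no peak at all
    return (f_max - f_min) ** 2 / energy


single_peak = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
multi_peak = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
print(apce(single_peak), apce(multi_peak))
```

The single-peak map scores about 9, while the map with five equal peaks scores about 1.8, even though both have the same maximum response. This is why the peak value alone cannot separate the two cases.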
Target re-detection
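In this section, we describe the re-detection mechanism used in the case of tracking failure. In the re-detection module, when the confidence level is lower than the threshold, the SVM [33] is used for re-detection. Consider a sample set (x_{1}, y_{1}), (x_{2}, y_{2}), …, (x_{n}, y_{n}) containing positive and negative samples, where x_{i} ∈ R^{d}, d is the dimension of the samples, and y_{i} ∈ {+1, −1} is the sample label. The SVM separates the positive and negative samples by the best classification hyperplane, which is defined as [33]:
\[ {\omega}^Tx+b=0 \tag{10} \]
where ω represents the weight vector, and b denotes the bias term. In the linearly separable case, for a given dataset T and classification hyperplane, the following conditions are used for the classification judgment:
\[ \begin{cases}{\omega}^T{x}_i+b\ge +1, & {y}_i=+1\\ {\omega}^T{x}_i+b\le -1, & {y}_i=-1\end{cases} \tag{11} \]
Combining the two conditions, we can abbreviate them as:
\[ {y}_i\left({\omega}^T{x}_i+b\right)\ge 1 \tag{12} \]
The distance from each support vector to the hyperplane can be written as:
\[ d=\frac{\left|{\omega}^Tx+b\right|}{\left\Vert \omega \right\Vert }=\frac{1}{\left\Vert \omega \right\Vert } \tag{13} \]
The problem of finding the maximum-margin hyperplane of the SVM model can be expressed as the following constrained optimization problem:
\[ \min_{\omega, b}\frac{1}{2}{\left\Vert \omega \right\Vert}^2\quad \mathrm{s.t.}\quad {y}_i\left({\omega}^T{x}_i+b\right)\ge 1,\quad i=1,2,\dots, n \tag{14} \]
Next, we introduce the Lagrangian function to solve the above problem [33]:
\[ L\left(\omega, b,c\right)=\frac{1}{2}{\left\Vert \omega \right\Vert}^2-\sum_{i=1}^n{c}_i\left[{y}_i\left({\omega}^T{x}_i+b\right)-1\right] \tag{15} \]
where c_{i} ≥ 0 are the Lagrange multipliers. At the solution of the optimization problem, the partial derivatives of L(ω, b, c) with respect to ω and b are 0, which gives \( \omega =\sum_{i=1}^n{c}_i{y}_i{x}_i \) and \( \sum_{i=1}^n{c}_i{y}_i=0 \). The corresponding decision function is expressed as:
\[ f(x)=\sum_{i=1}^n{c}_i{y}_i{x}_i^Tx+b \tag{16} \]
A new sample is then passed to the decision function, and the sign of f(x) gives its class.
In the linearly inseparable case, we use a kernel function to map the samples to a high-dimensional space, so that the inner product \( {x}_i^Tx \) in the decision function is replaced by \( K\left({x}_i,x\right) \). In this work, we use the Gaussian kernel function:
\[ K\left({x}_i,{x}_j\right)=\exp \left(-\frac{{\left\Vert {x}_i-{x}_j\right\Vert}^2}{2{\sigma}^2}\right) \tag{17} \]
where σ is the kernel bandwidth.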
During re-detection, an exhaustive search is performed on the current frame using a sliding window, and HOG features are extracted from each image patch as the input vector x of the decision function. The score f(x) is calculated by formula (16), and we select the sample area with the largest f(x). When this response value is greater than the threshold, the area is taken as the new location of the tracked target.
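The search described above can be sketched as scoring every candidate window with a kernel SVM decision function and keeping the best one if it clears the threshold. This is a minimal sketch under assumptions: the function names, the `(position, feature_vector)` candidate format, and the parameter `sigma` are hypothetical, the decision function is the standard form f(x) = Σ c_i y_i K(x_i, x) + b, and the sliding-window generation and HOG extraction are not shown.

```python
import math


def gaussian_kernel(a, b, sigma=1.0):
    """K(a, b) = exp(-||a - b||^2 / (2 * sigma^2))."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def decision_value(x, support_vectors, labels, coeffs, bias, sigma=1.0):
    """f(x) = sum_i c_i * y_i * K(x_i, x) + b."""
    return bias + sum(
        c * y * gaussian_kernel(sv, x, sigma)
        for sv, y, c in zip(support_vectors, labels, coeffs)
    )


def redetect(candidates, support_vectors, labels, coeffs, bias, threshold, sigma=1.0):
    """Score every candidate window and return (position, score) of the best
    one, or None if even the best score does not exceed the threshold.

    candidates -- list of (position, feature_vector) pairs, for example
                  sliding-window patches with their HOG features
    """
    best = None
    for position, feature in candidates:
        score = decision_value(feature, support_vectors, labels, coeffs, bias, sigma)
        if best is None or score > best[1]:
            best = (position, score)
    if best is not None and best[1] > threshold:
        return best
    return None
```

Returning `None` when no window clears the threshold lets the caller keep the models frozen and retry on the next frame, which matches the case where the target is still out of view.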
The training process of SVM is as follows [33]. By the confidence level, we determine the quality of the sample. Then, samples with high confidence are used as the positive samples, and samples with low confidence are used as the negative samples. The HOG features from positive and negative samples are extracted to obtain the feature vectors. The feature vectors are represented as (x_{i}, y_{i}), i = 1, 2, …, n, where n denotes the number of training samples, x_{i} represents the HOG feature vector, and y_{i} represents the attribute of the extracted sample. If the training sample is positive, then y_{i} = 1, and if the sample is negative, then y_{i} = − 1. For the binary classification problem of our samples, the loss function is defined as formula (18).
When the value of loss is negative, the parameters of SVM are updated as follows.
where c_{j} is the Lagrangian coefficient, x is the feature vector extracted from the sample, and y is the label corresponding to the sample.
Experimental results and discussion
In this section, we evaluate the proposed algorithm on the OTB2015 [17] and UAV123 [18] benchmarks, with comparisons to other detection-based tracking algorithms and classical correlation filtering tracking algorithms. Section 4.1 introduces the experimental platform and parameter settings. Section 4.2 introduces the experimental datasets and the evaluation criteria. Section 4.3 presents the quantitative evaluation of the results, and Section 4.4 presents the qualitative evaluation.
Experimental setups
The experimental software environment is MATLAB R2016a, and the hardware environment is an Intel Core i5-4200M processor with 4 GB of memory, running the Windows 8 operating system.
The regularization parameters λ_{1} and λ_{2} are set to 1 and 15, respectively; the initial penalty parameter is μ = 1; the maximum penalty parameter is μ_{max} = 20; the maximum number of iterations is K = 2; the padding parameter is ϱ = 4; the scale factor is a = 1.01; the threshold for re-detection is set to tr = 0.13; and the update threshold is set to tu = 0.20.
Experimental datasets and evaluation criteria
The OTB2015 dataset has a total of 100 video sequences covering 11 challenges, namely, illumination variation (IV), scale variation (SV), occlusion (OCC), deformation (DEF), motion blur (MB), fast motion (FM), in-plane rotation (IPR), out-of-plane rotation (OPR), out-of-view (OV), background clutter (BC), and low resolution (LR). The UAV123 dataset consists of 143 challenging sequences, including 123 short-term sequences and 20 long-term sequences. For both datasets, we adopt the distance precision and the overlap precision under one-pass evaluation (OPE) as the evaluation criteria. The overlap precision is defined as the percentage of frames whose overlap ratio exceeds 0.5. The distance precision is the percentage of frames whose location error is within 20 pixels.
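The two criteria can be computed per sequence as in the sketch below. It is illustrative only: the `(x, y, width, height)` box format and the function names are assumptions, and the location error is taken to be the Euclidean distance between box centers, which is the usual convention for these benchmarks.

```python
import math


def iou(box_a, box_b):
    """Overlap ratio of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0.0 else 0.0


def center_error(box_a, box_b):
    """Euclidean distance between the two box centers, in pixels."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    return math.hypot((ax + aw / 2.0) - (bx + bw / 2.0),
                      (ay + ah / 2.0) - (by + bh / 2.0))


def distance_precision(predicted, ground_truth, max_error=20.0):
    """Fraction of frames whose center error is within max_error pixels."""
    hits = sum(center_error(p, g) <= max_error for p, g in zip(predicted, ground_truth))
    return hits / len(predicted)


def overlap_precision(predicted, ground_truth, min_overlap=0.5):
    """Fraction of frames whose overlap ratio exceeds min_overlap."""
    hits = sum(iou(p, g) > min_overlap for p, g in zip(predicted, ground_truth))
    return hits / len(predicted)
```

Both functions return a fraction in [0, 1]; multiplying by 100 gives the percentages reported in the precision and success results.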
Quantitative evaluation
In this paper, we compare our algorithm with 6 state-of-the-art trackers on the OTB2015 dataset: 2 tracking-by-detection algorithms, namely LCT [13] and large margin object tracking with circulant feature maps (LMCF) [14], and 4 mainstream correlation filtering tracking algorithms, namely CSK [20], KCF [6], DSST [23], and background-aware correlation filters (BACF) [26]. Figure 2 shows the OPE success rate and precision plots of these algorithms. It can be seen from Fig. 2 that the proposed algorithm significantly outperforms the other algorithms. The precision and success rate of our method are 81.4% and 59.9%, respectively. Through experiments, we found that short-term trackers learn wrong information when the target is occluded or disappears. The template is then polluted by the wrong information, and the tracker is unable to track the target correctly in subsequent frames. As a result, compared with the BACF algorithm, our method improves the precision and success rate by 14.8% and 7.8%, respectively. LCT exploits the random fern algorithm to re-detect targets, which is slow to operate. Compared with the tracking-by-detection LCT algorithm, the proposed algorithm improves the precision and success rate by 8% and 9.3%, respectively. Compared with the LMCF algorithm with multi-peak detection, our method increases the precision and success rate by 11.2% and 11.1%, respectively.
To further verify the superiority of our method, we analyze the tracking performance through an attribute-based comparison in Table 1, which shows the area under the curve (AUC) scores of the success plots for the 11 different attributes.
As shown in Table 1, the proposed algorithm achieves the best performance on all 11 attributes. In the case of OCC, the score of our algorithm is 10.1% higher than that of the LMCF algorithm (tracking-by-detection style) and 12% higher than that of the BACF algorithm (short-term correlation filtering style). For FM sequences, our algorithm is 4.6% higher than the second-ranked BACF algorithm and 5.1% higher than the LCT algorithm, which uses random fern re-detection. Under these conditions, the target model may be contaminated, which makes target tracking difficult, whereas our model addresses this problem through accurate re-detection via the SVM. In the case of OPR, LCT achieves a score of 48.5%, and our tracker provides a gain of 8.7%. This is because the baseline algorithm applied in this paper mitigates the influence of boundary effects to a certain extent and can achieve higher accuracy when the target rotates. In the case of OV, the score of our algorithm is 50.7%, which is 3.9% higher than that of the BACF algorithm. The reason is that our template stops updating when the target goes out of view, and the SVM is used to detect the target again. When the target reappears in the field of view, our model is not contaminated and can continue tracking the target correctly.
Furthermore, we present the OPE success rate and precision plots on UAV123 in Fig. 3.
As shown in Fig. 3, our method outperforms the other algorithms on the UAV123 dataset. Specifically, our method achieves scores of 65.2% and 46.1%, which are better than those of LCT by 13.1% and 13.4%. At the same time, the proposed method is 16.1% and 10.5% higher than BACF, because the proposed re-detection approach re-detects low-confidence targets and thereby improves tracking accuracy.
Qualitative evaluation
We selected 7 representative benchmark sequences from OTB2015 to demonstrate the effectiveness of our algorithm. The visual evaluation results are shown in Fig. 4. As can be seen from Fig. 4, in the “Jogger” sequence, the target is occluded at the 70th frame and reappears in the field of view at the 84th frame. Thanks to the re-detection mechanism, our tracker tracks the target correctly, whereas the short-term correlation filter tracking algorithms learn erroneous information during occlusion, which leads to tracking errors in subsequent frames. In the “Soccer” and “Matrix” sequences, algorithms such as LCT and BACF lose the target because of background clutter. In contrast, the proposed algorithm handles such situations successfully. In the “Car4” sequence, which involves scale changes, the scale-based DSST algorithm and the proposed algorithm both show better performance. In the “Shaking” sequence, the proposed algorithm loses its target in the 17th frame due to issues such as illumination changes and a similar background. However, owing to the re-detection mechanism, the proposed algorithm relocates the target at the 18th frame and keeps tracking correctly. In the “Bolt” sequence, our algorithm follows the target very closely even when the target moves rapidly. In the “Dog” sequence, when the target is deformed, our algorithm tracks the target accurately, while the BACF and LMCF algorithms show a certain offset. It can be seen from the above description that our algorithm achieves higher accuracy in these 7 sequences.
Furthermore, we compare our method with the baseline tracker using 7 representative benchmark sequences of OTB2015 in Fig. 5. The first three rows are short-term sequences, none of which exceeds 1000 frames, and the last four rows are long-term sequences, all of which exceed 1000 frames.
As shown in Fig. 5, in the experiments on the short-term sequences, the LADCF tracker drifts when the target objects undergo heavy occlusion (Soccer) and does not re-detect targets in the case of tracking failure. Moreover, the LADCF tracker fails to handle background clutter and deformation (Ironman, Bird1), since a tracking component without a re-detection mechanism is less effective at discriminating targets from the cluttered background. In contrast, our method tracks the object correctly on these challenging sequences because the trained detector effectively re-detects the target objects.
In the Sylvester and Lemming sequences, the LADCF algorithm tracks incorrectly because of the rotations encountered in these sequences, while our method is more robust to these conditions. In the Liquor sequence, the LADCF tracking algorithm performs similarly to our algorithm before the target is occluded, but once the target is occluded, the LADCF method fails to locate it. In the Rubik sequence, since the target object undergoes deformation and color variation at the 854th frame, the LADCF tracker fails to track correctly, whereas our method tracks successfully thanks to re-detection. In our method, if the tracking fails, we perform the re-detection procedure and re-initialize the tracker at the re-detected location. Thus, our method can keep tracking the target correctly.
Overall, our method performs well in estimating the positions of the target objects, which can be attributed to three reasons. Firstly, the combined confidence criterion of our method can correctly identify the target even in very low-confidence cases. Secondly, our re-detection component effectively re-detects the target objects in the case of tracking failure. Thirdly, our baseline tracker achieves adaptive discriminative learning on a low-dimensional manifold and improves the tracking effect.
Conclusions
This paper proposes a long-term target tracking algorithm whose two main components are a state-of-the-art LADCF short-term tracker, which estimates the target translation, and a re-detector, which re-detects the target objects in the case of tracking failure. In addition, the algorithm introduces a robust confidence criterion to evaluate the confidence value of the predicted target. When the confidence value is lower than the specified threshold, the SVM model is used to re-detect the target objects and the template is not updated. The algorithm is suitable for long-term tracking because it can detect the target accurately in real time and update the template with high reliability. Extensive experimental results show that the proposed algorithm achieves better performance than the other tracking algorithms.
Abbreviations
LADCF: Learning adaptive discriminative correlation filters
APCE: Average peak-to-correlation energy
SVM: Support vector machine
OPE: One-pass evaluation
References
D. Comaniciu, P. Meer, Mean shift: a robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 24(5), 603–619 (2002)
M.S. Arulampalam, S. Maskell, N. Gordon, T. Clapp, A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans. Signal Process. 50(2), 174–188 (2002)
H. Yang, J. Wang, Y. Miao, Y. Yang, Z. Zhao, Z. Wang, Q. Sun, D.O. Wu, Combining spatio-temporal context and Kalman filtering for visual tracking. Mathematics 7(11), 1–13 (2019)
D.S. Bolme, J.R. Beveridge, B.A. Draper, Y.M. Lui, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Visual object tracking using adaptive correlation filters, (IEEE, San Francisco, 2010), pp. 2544–2550
M. Danelljan, F. Shahbaz Khan, M. Felsberg, J. Van de Weijer, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Adaptive color attributes for real-time visual tracking (2014), pp. 1090–1097
J.F. Henriques, R. Caseiro, P. Martins, J. Batista, High-speed tracking with kernelized correlation filters. IEEE Trans. Pattern Anal. Mach. Intell. 37(3), 583–596 (2015)
H. Nam, B. Han, in 2016 IEEE Conference on Computer Vision and Pattern Recognition. Learning multi-domain convolutional neural networks for visual tracking (2016), pp. 4293–4302
L. Bertinetto, J. Valmadre, J.F. Henriques, et al., in European Conference on Computer Vision Workshop. Fully-convolutional Siamese networks for object tracking, vol 9914 (2016), pp. 850–865
E. Gundogdu, A.A. Alatan, Good features to correlate for visual tracking. IEEE Trans. Image Process. 27(5), 2526–2540 (2018)
M. Asadi, C.S. Regazzoni, Tracking using continuous shape model learning in the presence of occlusion. EURASIP J. Adv. Signal Process. 2008, 250780 (2008)
T. Li, S. Zhao, Q. Meng, et al., A stable long-term object tracking method with re-detection strategy. Pattern Recognit. Lett. 127, 119–127 (2018)
B. Yan, H. Zhao, D. Wang, H. Lu, X. Yang, in IEEE/CVF International Conference on Computer Vision. ‘Skimming-perusal’ tracking: a framework for real-time and robust long-term tracking (2019), pp. 2385–2393
C. Ma, X. Yang, C. Zhang, M.H. Yang, in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Long-term correlation tracking (2015), pp. 5388–5396
M. Wang, Y. Liu, Z. Huang, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Large margin object tracking with circulant feature maps (2017), pp. 4800–4808
Z. Kalal, K. Mikolajczyk, J. Matas, Tracking-learning-detection. IEEE Trans. Pattern Anal. Mach. Intell. 34(7), 1409–1422 (2012)
T. Xu et al., Learning adaptive discriminative correlation filters via temporal consistency preserving spatial feature selection for robust visual object tracking. IEEE Trans. Image Process. 28(11), 5596–5609 (2019)
Y. Wu, J. Lim, M.H. Yang, Object tracking benchmark. IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1834–1848 (2015)
M. Mueller, N. Smith, B. Ghanem, in European Conference on Computer Vision. A benchmark and simulator for UAV tracking, (Springer, Amsterdam, 2016), pp. 445–461
Y. Wu, J. Lim, M.H. Yang, in Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition. Online object tracking: a benchmark (2013), pp. 2411–2418
J.F. Henriques, R. Caseiro, P. Martins, J. Batista, in European Conference on Computer Vision. Exploiting the circulant structure of tracking-by-detection with kernels (2012), pp. 702–715
W. Ou, D. Yuan, D. Li, et al., Patch-based visual tracking with online representative sample selection. J. Electron. Imaging 26(3), 033006 (2017)
W. Ou, D. Yuan, Q. Liu, et al., Object tracking based on online representative sample selection via non-negative least square. Multimed. Tools Appl. 77(9), 10569–10587 (2018)
M. Danelljan, G. Häger, F.S. Khan, M. Felsberg, in Proceedings of the British Machine Vision Conference. Accurate scale estimation for robust visual tracking, (BMVA Press, Nottingham, 2014), pp. 1–5
Y. Li, J. Zhu, in European Conference on Computer Vision Workshop. A scale adaptive kernel correlation filter tracker with feature integration, (Springer, Zurich, 2014), pp. 254–265
M. Danelljan, G. Häger, F. Shahbaz Khan, M. Felsberg, in Proceedings of the IEEE International Conference on Computer Vision. Learning spatially regularized correlation filters for visual tracking (2015), pp. 4310–4318
H. Kiani Galoogahi, A. Fagg, S. Lucey, in Proceedings of the IEEE International Conference on Computer Vision. Learning background-aware correlation filters for visual tracking (2017), pp. 1135–1143
S. Boyd, N. Parikh, E. Chu, et al., Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. Now Publishers Inc 3(1), 1–122 (2011)
M. Mueller, N. Smith, B. Ghanem, in IEEE Conference on Computer Vision & Pattern Recognition. Context-aware correlation filter tracking (2017), pp. 1387–1395
A. Lukezic, T. Vojir, L.C. Zajc, J. Matas, M. Kristan, in IEEE Conference on Computer Vision and Pattern Recognition. Discriminative correlation filter with channel and spatial reliability (2017), pp. 4847–4856
A. Lukežič, L. Čehovin Zajc, T. Vojíř, J. Matas, M. Kristan, in Asian Conference on Computer Vision. FCLT - a fully-correlational long-term tracker (2017)
R. Jenatton, J. Mairal, et al., Structured sparsity through convex optimization. Stat. Sci. 27(4), 450–468 (2012)
D.P. Bertsekas, Constrained optimization and Lagrange multiplier methods, (Academic, Pittsburgh, 1982)
T. Joachims, in Advances in kernel methods - support vector learning. Chapter 11, ed. by B. Scholkopf, C. Burges, A. Smola. Making large-scale SVM learning practical (MIT Press, Cambridge, 1999), pp. 169–184
Acknowledgements
Thanks to the anonymous reviewers and editors for their hard work.
Funding
This work was supported in part by the China Postdoctoral Science Special Foundation Funded Project (2015T80717) and the Natural Science Foundation of Shandong Province (ZR2020MF086).
Author information
Contributions
ZZ and DOW proposed the original idea of the full text. JZ and JW designed the experiment. JW and NX performed the experiment. JW and HY wrote the manuscript under the guidance of ZZ. CW, JZ, and JW revised the manuscript. All authors read and approved this submission.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wang, J., Yang, H., Xu, N. et al. Longterm target tracking combined with redetection. EURASIP J. Adv. Signal Process. 2021, 2 (2021). https://doi.org/10.1186/s13634020007133
DOI: https://doi.org/10.1186/s13634020007133
Keywords
 Learning adaptive discriminative correlation filters
 Long-term tracking
 Re-detection