Object tracking method based on edge detection and morphology


scene and needs. Traditional edge detection operators such as the Sobel operator cannot extract image edges accurately, although they do smooth and suppress noise.
The Canny operator is a multistage optimization operator that filters noise and detects the edges of an image. Before processing, it smooths the image with a Gaussian filter to remove noise. It then computes the magnitude and direction of the image gradient from finite differences of the first-order partial derivatives and extracts the edge information from them. The operator also performs non-maximum suppression, which keeps only the pixels whose gradient magnitude is a local maximum along the gradient direction and thereby reduces noise and blurred regions in the edge detection result. Finally, the Canny operator uses two thresholds to connect the edges. However, the conventional Canny algorithm uses a 2 × 2 convolution kernel that covers only the horizontal and vertical directions, so its extraction of edge information is incomplete. This paper therefore adopts a 3 × 3 convolution kernel with additional gradient components in the diagonal directions.
Mathematical morphology combines lattice theory and topology. It is a nonlinear filtering technique widely used in image processing, including image edge detection and image segmentation. Edge detection based on mathematical morphology can therefore reduce the effect of noise while preserving the original image content.
In moving target tracking, the commonly used methods are Camshift-based algorithms [16][17][18][19], algorithms based on the Kalman filter [20,1], and algorithms based on particle filtering [21][22][23][24]. The Camshift algorithm is an improved algorithm derived from the MeanShift algorithm. It was first proposed by Gary R. Bradski et al. and applied to face tracking. However, the algorithm uses only color information for tracking and is prone to tracking errors when the background color is similar to the target. The particle filtering algorithm can deal effectively with target tracking in complex environments. It approximates the posterior probability density of the system with a set of particles to achieve target tracking, but its complexity is high because a large number of samples is needed. The Kalman filter algorithm can track the target or assist the tracking process by constantly updating the state of the target, which improves the quality of target tracking based on appearance features, reduces the object boundary tracking error, and narrows the candidate tracking region.
Therefore, to improve the detection and tracking of moving targets such as people, this paper improves the gradient magnitude calculation of the traditional Canny operator and fuses it with a morphological edge detection algorithm to recognize the edge of the target accurately and obtain the target's position. On this basis, an improved kernel correlation filtering algorithm is introduced for effective tracking of pedestrians [25,19,26,27]. The whole process of the algorithm is shown in Fig. 1.

Canny edge detection principle
Canny edge detection is a multi-step algorithm whose evaluation criteria have been progressively refined in theory. Its main steps are Gaussian filtering, calculation of the gradient magnitude and direction, non-maximum suppression, and double-threshold processing. Gaussian filtering reduces noise, the gradient magnitude and direction extract the edge information in the image, non-maximum suppression retains the detailed information of the edge, and double thresholding distinguishes edges from noise. The advantage of the Canny edge detection operator is that it can detect various types of edges, including strong and weak edges, and can locate them accurately. It is therefore widely used for image processing and analysis in computer vision. It detects image edges in the following steps.
1. Smooth the image and remove noise using Gaussian filtering.
2. Differentiate the Gaussian-smoothed image to obtain the magnitude and direction of the gradient along the x and y dimensions.
3. Apply non-maximum suppression to the gradient magnitude, retaining only the local maxima of the gradient.
4. Detect and connect edges with a double-thresholding algorithm.
This edge detection operator has good localization performance and effectively suppresses multiple responses to a single edge as well as false edges, so it can extract the edge information of images with complex backgrounds. The steps of the algorithm are shown in Fig. 2.
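The double-thresholding step can be illustrated with a minimal sketch. It assumes the gradient magnitude has already been computed and thinned by non-maximum suppression; the function name `hysteresis_threshold`, the 8-connectivity rule, and the `>=` comparisons are illustrative choices, not details taken from this paper.

```python
from collections import deque

import numpy as np


def hysteresis_threshold(magnitude, low, high):
    """Keep strong edges plus weak edges that are 8-connected to a strong edge."""
    strong = magnitude >= high
    candidate = magnitude >= low          # strong and weak pixels together
    edges = strong.copy()
    queue = deque(zip(*np.nonzero(strong)))
    rows, cols = magnitude.shape
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                inside = 0 <= rr < rows and 0 <= cc < cols
                if inside and candidate[rr, cc] and not edges[rr, cc]:
                    edges[rr, cc] = True  # weak pixel reached from a strong one
                    queue.append((rr, cc))
    return edges
```

A weak pixel that is not connected to any strong pixel is treated as noise and discarded, which is how the two thresholds separate edges from noise.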

Improved Canny edge detection operator
Because the traditional Canny edge detection operator uses a 2 × 2 convolution kernel, it cannot extract edge information effectively. In this paper, while retaining the convolution template of the Sobel operator, we replace the 2 × 2 differential template with a 3 × 3 convolution kernel and add differential templates for the first-order partial derivatives in the 45° and 135° directions. The template representation of the eight-neighborhood of the center pixel in the gradient template is given as:
Here, the convolution kernels for the x direction, y direction, 45° direction, and 135° direction are denoted S_x, S_y, S_r, and S_l, respectively. Convolving the image with these kernels yields the gradient components g_x, g_y, g_r, and g_l in the respective directions. Finally, the gradient magnitude and gradient direction angle of the image are:
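A four-direction gradient of this kind can be sketched as follows. The kernel values are the standard Sobel templates and their usual diagonal variants, and the root-sum-of-squares magnitude and `arctan2` direction are assumptions made for illustration; they may differ from the templates and formulas used in this paper.

```python
import numpy as np

# Assumed kernels: standard Sobel templates plus their diagonal variants.
S_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
S_Y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)
S_R = np.array([[0, 1, 2], [-1, 0, 1], [-2, -1, 0]], dtype=float)   # 45 degrees
S_L = np.array([[-2, -1, 0], [-1, 0, 1], [0, 1, 2]], dtype=float)   # 135 degrees


def correlate3x3(image, kernel):
    """Cross-correlate a 2-D image with a 3x3 kernel using edge replication."""
    padded = np.pad(image.astype(float), 1, mode="edge")
    rows, cols = image.shape
    out = np.zeros((rows, cols))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + rows, dx:dx + cols]
    return out


def four_direction_gradient(image):
    """Return gradient magnitude and direction from four directional components."""
    g_x = correlate3x3(image, S_X)
    g_y = correlate3x3(image, S_Y)
    g_r = correlate3x3(image, S_R)
    g_l = correlate3x3(image, S_L)
    # Assumed combination: root of the sum of squared components.
    magnitude = np.sqrt(g_x ** 2 + g_y ** 2 + g_r ** 2 + g_l ** 2)
    direction = np.arctan2(g_y, g_x)
    return magnitude, direction
```

On a vertical step edge the diagonal components are nonzero as well, so the edge response is stronger than with the x and y components alone.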

Improved Canny edge detection with morphological fusion
Multiscale morphological edge detection methods use structural features at different scales to detect edges in images, which allows them to capture edges of different sizes and shapes. Specifically, the method erodes and dilates the image with a set of structuring elements of varying sizes, and the edge image is obtained by taking the difference of the processed images. The advantage of this method is that it removes noise and spurious details, which improves the detection accuracy of the edges. Therefore, to obtain a better filtering effect on the image, this paper fuses multiscale morphology with the improved Canny edge detection operator.
To increase the noise immunity of the improved algorithm, the image is processed using three structuring elements of different sizes. This suppresses noise and yields a more complete edge profile. The structuring elements chosen for this paper are as follows:

Fig. 2 Traditional Canny algorithm flow
First, the outermost edge of the image is extracted using the noise-resistant dilation operator, and then, on this basis, the inner boundary of the image is extracted with the corresponding noise-resistant operator. An edge detection formula for resisting noise is:
In the formula, f is the input image, y_d represents the outer edge contour of the image, y_e represents the inner edge contour of the image, ⊕ is the dilation operation, ⊖ is the erosion operation, ∘ is the opening operation, and • is the closing operation. Finally, the inner and outer edges, together with the edges bounded between them, are weighted and fused to obtain an enhanced noise-resistant morphological edge detection algorithm.
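The idea of combining outer and inner edges at several scales can be sketched as follows. This is a simplified illustration, not the noise-resistant formula of this paper: it assumes flat square structuring elements of sizes 3, 5, and 7, equal fusion weights, and plain dilation and erosion without the opening and closing steps.

```python
import numpy as np


def dilate(image, size):
    """Grey-scale dilation with a flat size x size square structuring element."""
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    rows, cols = image.shape
    windows = [padded[dy:dy + rows, dx:dx + cols]
               for dy in range(size) for dx in range(size)]
    return np.max(windows, axis=0)


def erode(image, size):
    """Grey-scale erosion with a flat size x size square structuring element."""
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    rows, cols = image.shape
    windows = [padded[dy:dy + rows, dx:dx + cols]
               for dy in range(size) for dx in range(size)]
    return np.min(windows, axis=0)


def multiscale_edges(image, sizes=(3, 5, 7), weights=None):
    """Weighted sum of outer and inner morphological edges over several scales."""
    image = image.astype(float)
    if weights is None:
        weights = [1.0 / len(sizes)] * len(sizes)   # assumed equal weights
    fused = np.zeros_like(image)
    for size, weight in zip(sizes, weights):
        outer = dilate(image, size) - image   # outer edge contour
        inner = image - erode(image, size)    # inner edge contour
        fused += weight * (outer + inner)
    return fused
```

Small structuring elements respond to fine edges while larger ones bridge noisy gaps, so the weighted sum trades detail against noise suppression.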
In order to obtain a well-defined image, this paper fuses multiscale morphology with the improved Canny detection operator, and the specific flowchart is shown in Fig. 3.
In the figure, the input image is a color image and the output image is the edge detection result obtained by fusing the Canny edge detection operator with multiscale morphology. The two algorithms are weighted and fused through the double-threshold connection described in the literature. The steps are as follows:
1. Apply the improved Canny operator and the morphological edge detection operator to the image to obtain the edge images B_1 and B_2.
2. Subtract the two images to obtain the difference map B_d between B_1 and B_2.
3. Find the pixels in the difference map whose gray value is not 0, and mark the edge points adjacent to them in the neighborhood as 1.
4. Count the number of 1's around the center pixel. If the number is four, this point is defined as a weak edge point of the image. This yields the detail edge map b.
5. Fuse the edge images B_1 and B_2 with the edge map b to obtain a clearer edge map.

Improved kernel correlation filter tracking algorithm based on Kalman filtering
First, we briefly introduce the process of target detection. The main steps of detecting moving objects are the difference operation, binarization, morphological processing, and edge extraction, after which the moving objects are captured. Since morphological processing and edge extraction have been introduced earlier in the article, they are not repeated here. The concepts of the difference operation and binarization are introduced below.

Differential Operations
The difference operation takes the difference between the pixels of the current frame and the background pixels to obtain an initial estimate of the moving object. Its mathematical formula is as follows:

Binarization
By setting a threshold, the foreground obtained by subtracting the background is converted into a binary image.
Diff(x, y, frame) denotes the gray value at pixel position (x, y) in the given frame, T denotes the segmentation threshold, and F(x, y, frame) denotes the binary image. On this basis, the threshold T is used to segment the grayscale image and separate the target from the background. A threshold that is too high will fail to segment the moving objects completely, while a threshold that is too low will generate a lot of noise. The threshold therefore needs to be selected carefully to minimize background interference and noise. Finally, the detection results are input into the tracker.
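The difference operation and binarization can be sketched together as follows. The sketch assumes an absolute difference against a fixed background image, a strict `>` comparison, and a 0/1 output; the function name `detect_foreground` and these conventions are illustrative and may differ from the exact formulas of this paper.

```python
import numpy as np


def detect_foreground(frame, background, threshold):
    """Binarize the absolute difference between a frame and the background."""
    # Cast to a signed type first so that uint8 subtraction cannot wrap around.
    diff = np.abs(frame.astype(np.int32) - background.astype(np.int32))  # Diff
    return (diff > threshold).astype(np.uint8)                           # F
```

Raising `threshold` removes more noise but also erases low-contrast parts of the moving object, which is the trade-off described above.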

Moving object tracking
The traditional kernel correlation filtering (KCF) algorithm is a discriminative target tracking algorithm [28][29][30]. Its core idea is to collect positive and negative samples in the target area through a cyclic matrix and to train the target model with ridge regression. Because cyclic matrices are diagonalized in Fourier space, the matrix operations reduce to element-wise products, which lowers the computational cost, improves the tracking speed, and enables the algorithm to satisfy real-time requirements. However, the traditional KCF algorithm often requires the target tracking box to be drawn manually, its accuracy is not high, and occlusion or overlap of the target easily leads to tracking failure. The Kalman filter algorithm, in contrast, tracks the target by constantly updating the target state [31][32][33] and can be used for target tracking or to assist the tracking process. Kalman filtering realizes target tracking by fusing the prediction of the target state with the measured values, which can improve the quality of target tracking based on appearance features, reduce the object boundary tracking error, and narrow the range of candidate tracking areas.

Kalman filter principle
Kalman filtering is a state estimation algorithm whose basic principle is to fuse predictions and observations of the system state. In target tracking, Kalman filtering can be used to estimate state quantities such as the position, velocity, and acceleration of the target. The state transfer equation for Kalman filtering is:
This equation estimates the current state from the state and control variables of the previous moment; ω_k is process noise obeying a Gaussian distribution, and A_k is the state transfer matrix. The target observation equation is:
Here, v_k is the observation noise, also known as measurement noise, which obeys a Gaussian distribution. The state update phase of Kalman filtering computes the state estimate and the state covariance matrix from the predicted state and the observation residual. Kalman filter gain calculation:

State variable update:
Error covariance update:
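The predict and update cycle can be sketched with the textbook Kalman filter equations. The sketch assumes a constant-velocity model with state [x, y, vx, vy], position-only measurements, and no control input; the class name and the matrices `H`, `Q`, and `R` with their default values are hypothetical choices for illustration, not parameters reported in this paper.

```python
import numpy as np


class ConstantVelocityKalman:
    """Kalman filter with state [x, y, vx, vy] and position-only measurements."""

    def __init__(self, x0, y0, dt=1.0, process_var=1e-2, measurement_var=1.0):
        self.x = np.array([x0, y0, 0.0, 0.0])          # state estimate
        self.P = np.eye(4)                             # state covariance
        self.A = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)  # state transfer matrix
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)  # observation matrix
        self.Q = process_var * np.eye(4)               # process noise covariance
        self.R = measurement_var * np.eye(2)           # measurement noise covariance

    def predict(self):
        """Propagate the state and covariance one frame ahead."""
        self.x = self.A @ self.x
        self.P = self.A @ self.P @ self.A.T + self.Q
        return self.x[:2]

    def update(self, z):
        """Correct the prediction with a measured position z = (x, y)."""
        residual = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)       # Kalman gain
        self.x = self.x + K @ residual                 # state variable update
        self.P = (np.eye(4) - K @ self.H) @ self.P     # error covariance update
        return self.x[:2]
```

When no measurement is available, for example during occlusion, `predict` can be called on its own for successive frames so that the estimated position keeps moving with the last estimated velocity.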

Principle of kernel correlation filtering
Taking a one-dimensional vector x = (x_1, x_2, ..., x_n) as the base sample, training samples are generated from it using a cyclic matrix:
The ridge regression expression for this algorithm is:
Expressing w as a linear combination of the samples gives:
For the regression function, the purpose of training is to use x_i to minimize the error with respect to the regression target y_i, which amounts to training the classifier. By the properties of the cyclic matrix, taking the Fourier transform simplifies the equation to:
Here, ⊙ denotes the element-wise product and ∧ denotes the Fourier transform; x is the base sample, z is the training sample, k^{zx} is the kernel correlation between the base sample and the training sample, and α is the vector of coefficients.
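Training and detection in the Fourier domain can be sketched on a one-dimensional signal. The sketch assumes a linear kernel, a Gaussian regression target that wraps around cyclically, and a regularization value `lam`; in this sketch `z` is a candidate sample from a new frame. The function names and parameter values are hypothetical, and the kernel used in this paper may differ.

```python
import numpy as np


def gaussian_labels(n, sigma=2.0):
    """Regression target y: a Gaussian peak at shift 0 that wraps around."""
    shifts = np.arange(n)
    distance = np.minimum(shifts, n - shifts)
    return np.exp(-0.5 * (distance / sigma) ** 2)


def train(x, y, lam=1e-4):
    """Solve ridge regression over all cyclic shifts of x in the Fourier domain."""
    x_hat = np.fft.fft(x)
    k_xx_hat = np.conj(x_hat) * x_hat         # linear-kernel autocorrelation
    return np.fft.fft(y) / (k_xx_hat + lam)   # alpha in the Fourier domain


def detect(alpha_hat, x, z):
    """Return the filter response for every cyclic shift of the candidate z."""
    k_xz_hat = np.conj(np.fft.fft(x)) * np.fft.fft(z)
    return np.real(np.fft.ifft(k_xz_hat * alpha_hat))
```

For example, if `z = np.roll(x, 5)`, the index of the largest value in `detect(train(x, gaussian_labels(len(x))), x, z)` recovers the shift of 5 samples. Every operation is an element-wise product or division of Fourier coefficients, so no matrix inversion is needed.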

Kalman filter-based kernel correlation filter tracking
This paper proposes a kernel correlation filter tracking algorithm extended by a Kalman filter. First, the moving target is detected by the improved target edge detection algorithm described in the previous section, and the KCF tracking frame is initialized at the same time. Then, once the target is occluded or lost, the position of the target is predicted using the Kalman filter, so that the tracking algorithm achieves stable tracking of the target. The steps of the improved tracking algorithm are shown below.
1. Read the video image sequence.
2. Extract the specific position of the moving target with the target edge detection algorithm and mark it with a rectangular box.
3. Calculate the initial position of the target according to the KCF algorithm.
4. Initialize the Kalman filter and predict the position of the target in the next frame.

Experimental environment
To verify the superiority of the improved algorithm in this paper, the simulation experiment uses the following platform configuration. Hardware environment: Intel(R) Core(TM) i7-4200H CPU @ 3.4 GHz with 16 GB of RAM. Software environment: Windows 11 operating system with Visual Studio 2019 and MATLAB 2021b.

Evaluation indicators for target detection
To confirm the effectiveness of the edge detection algorithm proposed in the previous section, the edge detection result is evaluated by calculating the mean square error (MSE) [34][35][36][37] and the peak signal-to-noise ratio (PSNR) [38] of the image processed by the edge detection algorithm. The MSE is calculated as follows:
$$\mathrm{MSE} = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left[ I(i, j) - K(i, j) \right]^2$$
Here, I(i, j) is the pixel value of the original image at (i, j), K(i, j) is the pixel value of the processed image at (i, j), and M and N are the numbers of rows and columns of the images. MSE measures the difference between the processed image and the original image; the smaller the MSE, the better the detection result.
PSNR measures the similarity between the processed image and the original image and is calculated as follows:
$$\mathrm{PSNR} = 10 \log_{10} \left( \frac{\mathrm{MAX}_I^2}{\mathrm{MSE}} \right)$$
Here, MAX_I is the maximum possible pixel value, usually 255. The edge detection result can be evaluated by calculating the MSE and PSNR of the images processed by different algorithms. The smaller the MSE and the larger the PSNR, the closer the processed image is to the original image and the better the processing effect. Table 1 compares the MSE and PSNR of the edge images produced by several edge detection algorithms on a noise-free image. The data in the table show that the edge detection algorithm in this paper better preserves the detail information of the image: its PSNR is higher and its MSE is lower than those of the other algorithms, which indicates that the image is smoother after processing by the algorithm in this paper and that the results are better.
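Both metrics can be computed in a few lines using their standard definitions. This is a minimal sketch that assumes two grayscale images of the same shape and 8-bit pixel values by default; the function names are illustrative.

```python
import numpy as np


def mse(original, processed):
    """Mean square error between two images of the same shape."""
    diff = original.astype(np.float64) - processed.astype(np.float64)
    return float(np.mean(diff ** 2))


def psnr(original, processed, max_value=255.0):
    """Peak signal-to-noise ratio in decibels; infinite for identical images."""
    error = mse(original, processed)
    if error == 0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / error))
```

Because PSNR is a decreasing function of MSE for a fixed maximum pixel value, the two metrics always rank a set of algorithms in the same order on a given image.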

Targeted tracking evaluation indicators
The metrics used to evaluate the algorithm in this paper and the other tracking algorithms in this experiment are the average distance precision (DP) [39] and the average overlap precision (OP) [40]. Distance precision is the percentage of frames whose center location error is below a given threshold (usually 20 pixels) out of the total number of video frames, which reflects the robustness of the algorithm. Overlap precision, reported as the success rate, is the percentage of frames whose overlap with the ground truth exceeds a given threshold out of the total number of video frames.
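The two tracking metrics can be sketched as follows. The sketch assumes boxes given as (x, y, width, height) with (x, y) the top-left corner, overlap measured as intersection over union, and a default overlap threshold of 0.5; these conventions and the function names are assumptions for illustration.

```python
import math


def center_error(box_a, box_b):
    """Euclidean distance between the centers of two (x, y, w, h) boxes."""
    ax, ay = box_a[0] + box_a[2] / 2.0, box_a[1] + box_a[3] / 2.0
    bx, by = box_b[0] + box_b[2] / 2.0, box_b[1] + box_b[3] / 2.0
    return math.hypot(ax - bx, ay - by)


def overlap(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes."""
    left = max(box_a[0], box_b[0])
    top = max(box_a[1], box_b[1])
    right = min(box_a[0] + box_a[2], box_b[0] + box_b[2])
    bottom = min(box_a[1] + box_a[3], box_b[1] + box_b[3])
    inter = max(0.0, right - left) * max(0.0, bottom - top)
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


def distance_precision(predicted, ground_truth, threshold=20.0):
    """Fraction of frames whose center error is below the threshold."""
    hits = sum(center_error(p, g) < threshold
               for p, g in zip(predicted, ground_truth))
    return hits / len(ground_truth)


def overlap_precision(predicted, ground_truth, threshold=0.5):
    """Fraction of frames whose overlap exceeds the threshold."""
    hits = sum(overlap(p, g) > threshold
               for p, g in zip(predicted, ground_truth))
    return hits / len(ground_truth)
```

A tracker can score well on distance precision while scoring poorly on overlap precision when it follows the target center but misjudges the box extent, which is why both metrics are reported.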

Qualitative analysis
In this paper, a color image is first used as a test image on which the edge detection algorithms are compared. The comparison results are shown in Fig. 4.
The figure shows the color image and the edge images produced by the different algorithms. Although all of the algorithms can extract the edge information, the result of the algorithm proposed in this paper is clearer and more accurate.
Next, a section of surveillance video is randomly selected as a test video. The difference operation is applied between the current frame and the video background, and the resulting image is then binarized. The simulation results are shown in Fig. 5.
The simulation results for the original video and the video after differential binarization in Fig. 5 show that a large amount of noise remains after differential binarization. Most of this noise, as well as the motion interference, can be removed by morphological processing, which improves the processing effect of the whole system.
The simulation effect of edge extraction is shown in Fig. 6. Through the above steps, the detection of the moving human body is finally realized, and the detection results are shown in Fig. 7.
The simulation results above show that the algorithm detects essentially every moving individual in the video; in the figure, the red part is the actual moving human body. The tracking algorithm of this paper is then applied to the Soccer video sequence in the standard test set OTB2013, as shown in Fig. 8. These simulation results show that a particular moving figure can be successfully tracked using this algorithm.
Fig. 9 shows the tracking failures of other trackers. The red target box is produced by the algorithm in this paper, the yellow target box by the Discriminative Scale Space Tracker (DSST) algorithm, and the blue target box by the original KCF algorithm. The figures show that the tracker in this paper tracks the people accurately, while the other trackers experience varying degrees of tracking box drift.

Quantitative analysis
In this paper, several typical algorithms are selected for comparison: the original KCF algorithm, the Spatially Regularized Discriminative Correlation Filters (SRDCF) algorithm, the Sum of Template And Pixel-wise LEarners (STAPLE) algorithm, and the Discriminative Scale Space Tracker (DSST) algorithm. All algorithms are tested on the OTB2013 dataset. Fig. 10 shows the accuracy and success plots obtained for the dataset video sequences. The experimental results show that the accuracy of the improved algorithm in this paper (Ours) is 83.3%, which is slightly better than that of the algorithm before improvement, and the success rate is 64.4%, which is also better than that of the algorithm before improvement.

Ablation experiment
To verify that the image edge detection model and the target position prediction model added in this paper improve the performance of the tracker, the KCF tracker is used as the baseline for the ablation experiments. As shown in Table 2, when the target detection module is added, both DP (%) and OP (%) improve. The target tracking algorithm in this paper incorporates both models, and its tracking performance is improved compared with the original KCF tracking algorithm.

Conclusion
The paper proposes a target detection and tracking method applicable to intelligent surveillance scenarios.This method can significantly improve the accuracy and success rate of target tracking in intelligent surveillance scenarios.By improving the Canny edge detection operator while integrating the relevant algorithms of mathematical morphology, the clarity of image edge detection is effectively improved, and finally the extended kernel correlation filtering algorithm is applied to video surveillance and tracking in the standard test set OTB2013.The experimental results show that the method can accurately detect moving targets in the field of video surveillance and perform real-time tracking.

Fig. 1 Process diagram of the target tracking algorithm including detection and prediction modules

Fig. 4 Detection results of different edge detection algorithms for portraits

Fig. 8 Motion body tracking simulation effect

Fig. 9 Tracking effect of different trackers

Fig. 10 Plot of success and accuracy rates of different algorithms

Table 1 Results after processing by different edge detection algorithms

Table 2 Ablation studies