2.1 Multisensor data fusion
Human beings are themselves a complex multisensor information fusion system, performing information fusion all the time. Multisensor information fusion uses multiple sensors to obtain relevant information and applies data preprocessing, correlation, filtering, integration, and other operations to form a framework that supports decision making, thereby achieving identification, tracking, and situation assessment [4]. In summary, a multisensor data fusion system includes the following three parts:

1.
Sensor. Sensors are the cornerstone of a multisensor data fusion system; without sensors, no data can be obtained. Multiple sensors can obtain more comprehensive and reliable data.

2.
Data. Data are the processing object of the multisensor data fusion system and the carrier of fusion. The quality of the data determines the upper limit of the fusion system's performance; the fusion algorithm can only approach this upper limit.

3.
Fusion. Fusion is the core of a multisensor data fusion system. When the quality of the information cannot be improved, the task of fusion is to mine the available information to the greatest extent and make decisions based on the data.
Data fusion performs multilevel processing on multisource data, and each level of processing abstracts the original data to a certain extent. It mainly includes data detection, calibration, correlation, and estimation [5]. Data fusion can be divided into three levels according to the degree of abstraction of data processing in the fusion system:

1.
Pixel-level fusion. This is currently the most widely used fusion method. It operates directly on the original image data, processing and integrating the pixels one by one to achieve image fusion. Because the data processed at the pixel level are raw data without any conversion, the information expressed is the most accurate. However, because every pixel of the original data is processed, the data volume is large and the fusion efficiency is low.

2.
Feature-level fusion. This method fuses images based on their feature information. An algorithm extracts features such as edges and contours from each image, and these features are then fused. This reduces the amount of data in the fusion process and improves fusion efficiency, but it places higher demands on feature extraction, and the quality of feature extraction directly affects the fusion result [6, 7].

3.
Decision-level fusion. This type of method is the most complicated. Expert knowledge related to the image content must be prepared before image processing, and the image is then adjusted and processed in a targeted way. This fusion method is relatively abstract and demands more expert knowledge, but because it is strongly targeted, the fusion result is also closer to ideal [8, 9].
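As a toy illustration of the pixel-level case above, the sketch below fuses two registered grayscale images by a per-pixel weighted average. This is a minimal example under the assumption that the images are already aligned and of equal size; practical pixel-level fusion methods (e.g., multiscale transforms) are considerably more involved.

```python
import numpy as np

def pixel_level_fuse(img_a, img_b, weight=0.5):
    """Fuse two registered grayscale images pixel by pixel
    with a simple weighted average (illustrative only)."""
    a = img_a.astype(np.float64)
    b = img_b.astype(np.float64)
    fused = weight * a + (1.0 - weight) * b
    return np.clip(fused, 0, 255).astype(np.uint8)

# Two toy 2x2 "images" standing in for registered sensor outputs.
img_a = np.array([[0, 100], [200, 255]], dtype=np.uint8)
img_b = np.array([[100, 100], [100, 100]], dtype=np.uint8)
print(pixel_level_fuse(img_a, img_b))  # element-wise average of the two arrays
```

Because every pixel is visited, the cost grows linearly with image size, which reflects the efficiency drawback noted above.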
Shannon entropy is defined as the average amount of information after excluding redundancy, and Shannon defined information as "something that can remove uncertainty." Many modern scholars have proposed the opposite formulation of Shannon's definition, namely that information is the increase of certainty. According to information theory, the multidimensional information created by fusing multiple one-dimensional pieces of information is more informative than any single one-dimensional piece, which is the theoretical ground for multisensor data fusion. Below we give the proof from the perspective of Shannon entropy [10].
Suppose the Shannon entropy H(X) of the random variable X is a function of the probability distribution P_{1}, P_{2}, …, P_{n}. According to the definition of Shannon entropy:
$$H(X) = - \sum\limits_{j = 1}^{n} {P_{j} \log P_{j} }$$
(1)
where \(0 \le P_{j} \le 1\), it is easy to see that:
$$H(P_{1} ,P_{2} , \ldots ,P_{n} ) \ge 0$$
(2)
The equality in formula (2) holds if and only if every term on the right side of formula (1) is 0, subject to the normalization constraint:
$$\sum\limits_{j = 1}^{n} {P_{j} } = 1$$
(3)
Combining formulas (2) and (3), we can see that formula (2) holds with equality when \(P_{j} = 1\) for some j and \(P_{k} = 0\) for all \(k \ne j\).
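Formulas (1)-(3) can be checked numerically. The sketch below computes the entropy of a discrete distribution (using log base 2 and the convention \(0 \log 0 = 0\)) and confirms that it is nonnegative and vanishes exactly in the degenerate case:

```python
import math

def shannon_entropy(probs):
    """H = -sum_j P_j * log2(P_j), skipping zero-probability terms
    (the convention 0*log 0 = 0); clamped at 0 to absorb -0.0."""
    return max(0.0, -sum(p * math.log2(p) for p in probs if p > 0))

print(shannon_entropy([0.5, 0.5]))  # 1.0 bit: maximal for two outcomes
print(shannon_entropy([1.0, 0.0]))  # 0.0: the degenerate case P_j = 1, P_k = 0
```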
Assuming that the Shannon entropies of random variables X and Y are H(X) and H(Y), respectively, and that their joint Shannon entropy is H(XY), the additivity of Shannon entropy gives:
$$H(XY) = H(X) + H(Y|X)$$
(4)
Suppose the random variable Y takes m values and the conditional transition probability is \(P_{jk} = P(Y = y_{k} |X = x_{j} )\). Combining formulas (1) and (4), the Shannon entropy of the two-dimensional random variable can be written as:
$$H(P_{1} P_{11} ,P_{1} P_{12} , \ldots ,P_{1} P_{1m} ,P_{2} P_{21} ,P_{2} P_{22} , \ldots ,P_{2} P_{2m} , \ldots ,P_{n} P_{n1} ,P_{n} P_{n2} , \ldots ,P_{n} P_{nm} ) = H(P_{1} ,P_{2} , \ldots ,P_{n} ) + \sum\limits_{j = 1}^{n} {P_{j} H(P_{j1} ,P_{j2} , \ldots ,P_{jm} )}$$
(5)
From the nonnegativity of Shannon entropy and \(0 \le P_{j} \le 1\), formula (6) follows:
$$H(P_{1} P_{11} ,P_{1} P_{12} , \ldots ,P_{1} P_{1m} ,P_{2} P_{21} ,P_{2} P_{22} , \ldots ,P_{2} P_{2m} , \ldots ,P_{n} P_{n1} ,P_{n} P_{n2} , \ldots ,P_{n} P_{nm} ) \ge H(P_{1} ,P_{2} , \ldots ,P_{n} )$$
(6)
Generalizing to the scenario of n random variables \(X_{1} ,X_{2} , \ldots ,X_{n}\), the additivity of Shannon entropy yields formula (7):
$$H(X_{1} X_{2} \ldots X_{n} ) = H(X_{1} ) + H(X_{2} |X_{1} ) + \cdots + H(X_{n} |X_{1} X_{2} \ldots X_{n - 1} )$$
(7)
Shannon entropy describes the uncertainty of a system or variable rather than the amount of information the system contains; however, when a random variable takes a specific value, the information gained equals the Shannon entropy [11]. The larger the Shannon entropy, the more information the random variable conveys when it takes a specific value. Combining formulas (6) and (7), it can be inferred that the multidimensional information fused from multiple single-dimensional pieces of information contains more information about a specific target than any single-dimensional piece alone [12].
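The chain rule in formulas (4)-(5) and the inequality in formula (6) can likewise be verified on a small example. The probability values below are arbitrary illustrative numbers, not data from the text:

```python
import math

def H(probs):
    """Shannon entropy in bits, skipping zero-probability terms."""
    return max(0.0, -sum(p * math.log2(p) for p in probs if p > 0))

# Toy example: X has marginal distribution (P_1, P_2), and
# P_cond[j][k] is the transition probability P(Y = y_k | X = x_j).
P = [0.3, 0.7]
P_cond = [[0.9, 0.1], [0.2, 0.8]]
# Joint distribution entries P_j * P_jk, as on the left side of formula (5).
joint = [P[j] * P_cond[j][k] for j in range(2) for k in range(2)]

H_X = H(P)
H_XY = H(joint)
H_Y_given_X = sum(P[j] * H(P_cond[j]) for j in range(2))

# Chain rule, formulas (4)/(5): H(XY) = H(X) + H(Y|X).
assert abs(H_XY - (H_X + H_Y_given_X)) < 1e-12
# Inequality, formula (6): the joint variable is at least as uncertain as X.
assert H_XY >= H_X
```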
The functional model diagram of multisensor data fusion is shown in Fig. 1.
In fact, multisensor information fusion is a functional simulation of the human brain's integrated processing of complex problems. Compared with a single sensor, multisensor information fusion technology can enhance the survivability, reliability, and robustness of the whole system, improve the credibility and accuracy of the data, extend temporal and spatial coverage, and increase real-time performance and information utilization when addressing problems of detection, tracking, and object identification.
2.2 Image processing
Image processing refers to the use of a computer to process the image to be recognized so that it meets the subsequent needs of the recognition process. It is mainly divided into two steps: image preprocessing and image segmentation [13]. Image preprocessing mainly includes image restoration and image transformation. Its main purpose is to remove interference and noise from the image, enhance the useful information in the image, and improve the detectability of the target object. At the same time, because of the real-time requirements of image processing, the image must be re-encoded and compressed to reduce the complexity and improve the computational efficiency of subsequent algorithms. Existing image segmentation methods mainly include edge-based segmentation, threshold-based segmentation, and region-based segmentation [14].
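Of the segmentation families just listed, threshold-based segmentation is the simplest to sketch: every pixel brighter than a chosen threshold is labeled as foreground. The fixed threshold below is an arbitrary illustrative choice; practical methods select it adaptively (e.g., Otsu's method).

```python
import numpy as np

def threshold_segment(gray, t=128):
    """Threshold-based segmentation: pixels brighter than t
    become foreground (1); all others become background (0)."""
    return (gray > t).astype(np.uint8)

# Toy 2x2 grayscale image standing in for a preprocessed input.
gray = np.array([[10, 200],
                 [130, 90]], dtype=np.uint8)
print(threshold_segment(gray))  # binary foreground/background mask
```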
2.3 Salient area detection
Salient area detection aims to find the most salient target area in a picture. When observing a picture, a specific target object often attracts our attention immediately. When processing image scene information, saliency detection can identify the areas to be processed first, so that computing resources are allocated rationally and the amount of calculation is reduced. Detecting the salient areas of an image therefore has high application value. Generally speaking, salient area detection methods are divided into two types: top-down and bottom-up. The top-down approach is task-driven and requires high-level information for supervised training and learning. It involves complex cross-disciplinary issues, because it may require combining neurology, physiology, and other related subject areas. The bottom-up method is data-driven; it mainly uses low-level information such as color contrast and spatial layout to obtain the salient target area, and it is simple and fast. Relevant studies in recent years have shown that this type of saliency detection method works well and has been widely used in image segmentation, target recognition, visual tracking, and other fields [15].

1.
Bottom-up salient area detection
Bottom-up, data-driven salient area detection is independent of human cognition; the saliency value is calculated by extracting low-level features of the image, such as color, orientation, brightness, or texture. Bottom-up methods can be further divided into those based on local contrast and those based on global contrast.

2.
Top-down saliency area detection
Top-down salient area detection is a task-driven computing model. For example, if you want to find a person, you will pay attention to human-shaped objects when looking at the image; if you want to find a dog, you will ignore people in the image and pay attention to the dog. Therefore, top-down salient area detection generally looks for a specific kind of object [16]. Most top-down methods require training on a large amount of image data, which is computationally intensive, and they yield different results for different tasks, so they are not universal.
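The bottom-up, global-contrast idea described in item 1 above can be sketched in a few lines: score each pixel by how far its color lies from the image's mean color, then normalize. This is a deliberately simplified stand-in for published global-contrast methods, which use richer statistics.

```python
import numpy as np

def global_contrast_saliency(img):
    """Bottom-up global-contrast sketch: a pixel's saliency is its
    Euclidean distance from the image's mean color, scaled to [0, 1]."""
    mean = img.reshape(-1, img.shape[-1]).mean(axis=0)
    dist = np.linalg.norm(img.astype(np.float64) - mean, axis=-1)
    return dist / dist.max() if dist.max() > 0 else dist

# Toy 2x2 RGB image: one red pixel among gray pixels stands out.
img = np.array([[[128, 128, 128], [128, 128, 128]],
                [[255,   0,   0], [128, 128, 128]]], dtype=np.uint8)
sal = global_contrast_saliency(img)
# The red pixel receives the maximal saliency score of 1.0.
```

No training data or task description is needed here, which is exactly the data-driven property that distinguishes bottom-up methods from the top-down ones in item 2.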