
Decision-level fusion detection method of visible and infrared images under low light conditions

Abstract

To address the poor performance of object detection on visible images under low light conditions, this paper studies a decision-level fusion detection method for visible and infrared images. Taking YOLOX as the deep-learning-based object detection network, a decision-level fusion detection algorithm for visible and infrared images based on light sensing is proposed. Experiments are carried out on the LLVIP dataset, a visible-infrared paired dataset for low-light vision. Comparative analysis shows that the decision-level fusion algorithm based on Soft-NMS and light sensing obtains the best AP of 69.0%, which is 11.4% higher than detection on visible images alone and 1.1% higher than detection on infrared images alone. The experimental results show that the decision-level fusion algorithm based on Soft-NMS and light sensing can effectively fuse the complementary information of visible and infrared images and improve object detection under low light conditions.

1 Introduction

With the development of internet information technology, especially deep learning, artificial intelligence has entered a new stage of development. Object detection is currently one of the research hotspots in the fields of artificial intelligence and deep learning. It is a computer vision technique that recognizes and locates objects in images or videos. Compared with image classification, object detection accurately determines the category and position of every object contained in an image, representing a deeper level of image understanding [1]. Object detection supports many vision tasks, such as instance segmentation, pose estimation, object tracking and action recognition, and these tasks meet the application requirements of many real scenes. Object detection has therefore become a research hotspot in computer vision in recent years. Traditional object detection relied on handcrafted features and shallow network architectures, with limited performance and slow progress. After deep learning methods were introduced into object detection, the field developed rapidly, and both detection speed and detection accuracy improved greatly, even surpassing human vision in some specific scenes. Deep learning allows computational models composed of multiple processing layers to learn data representations at multiple levels of abstraction, and can discover complex structures in datasets using the back-propagation algorithm, greatly advancing visual object detection and recognition [2].

Object detection algorithms based on deep learning can be divided into two-stage and one-stage algorithms according to whether a region proposal is generated in advance. A two-stage algorithm generates region proposals in the first stage and then performs classification and regression in the second stage; representative algorithms are the R-CNN series [3,4,5,6]. A one-stage algorithm treats object detection as a regression task over the whole image; representative algorithms are the YOLO series [7,8,9,10,11] and the SSD series [12,13,14]. According to whether anchors are defined, deep-learning-based object detection algorithms can also be divided into anchor-based and anchor-free algorithms. An anchor is a predefined detection box of a given size and aspect ratio. Anchor-based algorithms first define anchors according to the scene and data, and then classify and regress the content within the anchors. Anchor-free algorithms predict the bounding box of an object directly, without predefined anchors. Compared with anchor-based algorithms, anchor-free algorithms have clear advantages in reducing network parameters and improving model generalization and detection speed. Considering the requirements for detection accuracy and speed in practical applications, one-stage anchor-free algorithms are the main development direction of current object detection, with examples including CenterNet [15], CornerNet [16], FCOS [17] and YOLOX [11].

The above object detection algorithms are all based on visible images. In some special scenes, such as low light conditions, detection on visible images performs poorly, whereas detection on infrared images performs better because it is less affected by lighting. Visible and infrared images are collected by image sensors operating in different wavelength bands and have complementary strengths. Visible images capture the reflection characteristics of objects and have clear advantages in rendering detail such as color and texture, but they are easily affected by low light, uneven illumination, shadow and occlusion. Infrared images capture the thermal radiation characteristics of objects, mainly presenting their temperature information, which avoids the influence of light and occlusion, but they suffer from unclear detail, low contrast and poor imaging quality [18]. Fusing visible and infrared images provides complementary dual-band information and improves object detection performance. The fusion of visible and infrared images is called multi-spectral fusion or multi-modal fusion, and object detection based on this fusion is also called multi-spectral or multi-modal object detection.

There are three ways to fuse visible and infrared images: pixel-level fusion, feature-level fusion and decision-level fusion, as shown in Fig. 1. Pixel-level fusion registers and fuses the source images at the pixel level at the front end of the detection network before feature extraction and object detection. It is usually computationally intensive and prone to noise interference and redundant information, making it unsuitable for real-time applications. Feature-level fusion extracts features from the different source images, fuses the features, and then performs object detection on the fused features. It reduces the amount of computation and retains most of the information, but still loses some detail. Decision-level fusion fuses the detection results of the different source images at the back end of the detection network to make a globally optimal decision. A decision-level fusion algorithm can efficiently combine the feature information of different source images, with low complexity, strong fault tolerance, and good real-time performance and adaptability [19].

Fig. 1 Three ways to fuse visible and infrared images

According to the stage at which fusion takes place, visible and infrared image fusion can also be divided into four forms: early fusion, middle fusion, late fusion and score fusion. Early fusion is pixel-level fusion, middle and late fusion are feature-level fusion, and score fusion is decision-level fusion. The four forms are shown in Fig. 2 [20].

Fig. 2 Four forms of visible and infrared images fusion

To address the low detection performance on visible images under low light conditions, we use a decision-level fusion method to perform dual-band fusion detection on visible and infrared images, and adopt a light sensing strategy to optimize the decision-level fusion and improve object detection performance.

Some scholars have carried out relevant research. Tang et al. [21] proposed a decision-level fusion detection method for infrared and visible light based on deep learning; their experiments showed that the dual-band fusion detection method had better detection performance and stronger robustness than single-band detection. They further applied it to object tracking, establishing a decision-level fusion tracking model for infrared and visible light based on deep learning, and their comparative experiments between single-band and dual-band fusion tracking achieved higher precision and robustness [22]. Bai et al. [23] used YOLOv3 as the object detection network and proposed a visible and infrared image object detection algorithm based on decision-level fusion, in which the detection results of visible and infrared images were fused with weights to achieve fast decision-level fusion detection. Zhang et al. [24] proposed a complementary and precise vehicle fusion detection approach in RGB-T images, which combines the detection results of visible and infrared images through a decision-level fusion strategy, and applied it to vehicle detection in traffic monitoring. In general, although these studies use different object detection networks and datasets, decision-level fusion of visible and infrared images improves detection to some extent in all of them. In this study, YOLOX is taken as the object detection network, and the decision-level fusion detection of visible and infrared images is studied on the basis of existing research.

2 YOLOX object detection network

In the YOLO series, the original YOLO algorithm is anchor-free, but its precision was lower than that of contemporaneous anchor-based one-stage detectors, so the later YOLOv2, YOLOv3, YOLOv4 and YOLOv5 are all anchor-based. With the development of anchor-free detection technology, YOLOX returned to an anchor-free design and improved performance considerably, surpassing the other YOLO algorithms. Because YOLOX performs well in both detection accuracy and detection speed, we use it as the object detection network.

YOLOX not only changes the detector to an anchor-free design, but also integrates several recent advanced detection techniques, including the Mosaic and MixUp data augmentation strategies, a decoupled head, multi-positives, and the SimOTA label assignment strategy. According to model size, YOLOX provides YOLOX-Nano, YOLOX-Tiny, YOLOX-S, YOLOX-M, YOLOX-L and YOLOX-X. All of these models achieve a good balance between detection precision and detection speed, and the overall performance is excellent. YOLOX-Nano uses only 0.91 M parameters and 1.08 GFLOPs to obtain 25.3% AP on COCO, 1.8% AP higher than NanoDet. YOLOX-S achieves 39.6% AP on COCO with 9.0 M parameters and 26.8 GFLOPs. Compared with YOLOv3, the most representative model of the YOLO series, YOLOX-S reduces the number of parameters by 85.71% and the amount of computation by 82.96%, while its AP exceeds YOLOv3 by 1.1%. The number of parameters of YOLOX-L is roughly the same as that of YOLOv5-L, and YOLOX-L achieves 50.0% AP on COCO, exceeding YOLOv5-L by 1.8% AP. YOLOX-L also won first place in the CVPR 2021 Streaming Perception Challenge [11]. We use YOLOX-S as the baseline model. The YOLOX-S network structure is shown in Fig. 3.

Fig. 3 YOLOX-S network structure

The YOLOX network structure is divided into four parts: input, backbone, neck and head. On the input side, YOLOX uses the Mosaic and MixUp data augmentation strategies to preprocess the input images. The backbone uses the CSPDarknet structure, composed of CBS convolution modules and CSP residual modules, to enhance the feature extraction ability of the network. A CBS module consists of a standard convolution layer, a batch normalization layer and a SiLU activation function. A CSP residual module consists of three CBS modules and several Bottleneck residual modules connected by residual connections, and features are extracted by stacking residual structures. The Bottleneck residual module uses the residual structure of ResNet to build a deeper network, so that deeper features can be extracted while avoiding vanishing or exploding gradients. The neck uses the PAFPN structure for multi-scale feature fusion. PAFPN improves on the FPN feature pyramid: compared with a standard FPN, it adds a bottom-up down-sampling path to further fuse features of different scales and improve the semantic representation ability of the network. The head uses three decoupled heads that split the outputs at three different scales into class probability, position and confidence after feature fusion, and finally merges them into the final prediction results.
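To make the CBS building block concrete, the following is a minimal PyTorch-style sketch of a Conv-BatchNorm-SiLU module; the class name, default kernel size and the usage example are illustrative assumptions rather than the exact YOLOX implementation.

```python
import torch
import torch.nn as nn

class CBS(nn.Module):
    """Conv + BatchNorm + SiLU block, the basic unit of the CSPDarknet backbone."""
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1):
        super().__init__()
        padding = kernel_size // 2  # "same" padding for odd kernel sizes
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size,
                              stride=stride, padding=padding, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.SiLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

# Example: a stride-2 CBS block halves the spatial resolution of a 640 x 640 input
x = torch.randn(1, 3, 640, 640)
y = CBS(3, 32, kernel_size=3, stride=2)(x)   # -> (1, 32, 320, 320)
```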

3 Decision-level fusion of visible and infrared images

3.1 Decision-level fusion model

The decision-level fusion detection of visible and infrared images takes the visible and infrared images as inputs and detects objects in each of them with a YOLOX network. At the back end of the network, the visible image detection results and the infrared image detection results are fused at the decision level to output the fusion detection results. To improve the fusion effect, the input visible and infrared images should be collected from the same natural scene and registered. The decision-level fusion network model is shown in Fig. 4.

Fig. 4 Decision-level fusion network model

3.2 Decision-level fusion algorithm

Visible and infrared images of the same natural scene and object have different imaging effects because of their different imaging principles and wavelength bands; under low light conditions in particular, they complement each other. The basic idea of the decision-level fusion algorithm is, for each pair of visible and infrared images, to combine the candidate boxes detected in the visible image with those detected in the infrared image and perform an IoU-based non-maximum suppression operation, so that each object retains the candidate box with the highest confidence as its final detection box [21]. The non-maximum suppression operation can use either the normal NMS algorithm or the Soft-NMS algorithm [25]. The steps of the visible and infrared image decision-level fusion algorithm are as follows:

  (1) A pair of visible and infrared images of the same natural scene, after registration, is detected separately by the YOLOX network. For each object, \(m\) visible image candidate boxes \((B_{i}^{visible} ,C_{i}^{visible} )_{i = 1,2, \cdots ,m}\) and \(n\) infrared image candidate boxes \((B_{j}^{infrared} ,C_{j}^{infrared} )_{j = 1,2, \cdots ,n}\) are obtained. \(B_{i}^{visible}\) and \(B_{j}^{infrared}\) each consist of the four coordinate values of the candidate box, namely the top-left and bottom-right corner coordinates. \(C_{i}^{visible}\) and \(C_{j}^{infrared}\) are the confidence values of the candidate boxes;

  (2) Combine the visible image candidate boxes \((B_{i}^{visible} ,C_{i}^{visible} )_{i = 1,2, \cdots ,m}\) and the infrared image candidate boxes \((B_{j}^{infrared} ,C_{j}^{infrared} )_{j = 1,2, \cdots ,n}\) obtained in step (1) and perform the non-maximum suppression operation; for each object, retain the candidate box with the highest confidence as the final object detection box;

  (3) Repeat steps (1) and (2) for each pair of visible and infrared images to obtain the final fusion detection results.

The decision-level fusion algorithm of visible and infrared images is described in detail as Algorithm 1.
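To illustrate Algorithm 1, the sketch below merges the visible and infrared candidate boxes of one image pair and applies a greedy NMS or, optionally, Gaussian Soft-NMS. The function names, thresholds and the single-class simplification are our own assumptions, not the authors' code.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def fuse_detections(vis_boxes, vis_scores, ir_boxes, ir_scores,
                    iou_thr=0.5, soft=False, sigma=0.5, score_thr=0.05):
    """Merge visible and infrared candidate boxes, then run (Soft-)NMS."""
    boxes = np.vstack([vis_boxes, ir_boxes]).astype(float)
    scores = np.concatenate([vis_scores, ir_scores]).astype(float)
    keep_boxes, keep_scores = [], []
    while len(boxes) > 0:
        i = int(np.argmax(scores))                  # highest-confidence candidate
        keep_boxes.append(boxes[i]); keep_scores.append(scores[i])
        boxes = np.delete(boxes, i, axis=0)
        scores = np.delete(scores, i)
        if len(boxes) == 0:
            break
        overlaps = iou(keep_boxes[-1], boxes)
        if soft:    # Soft-NMS: decay overlapping scores with a Gaussian penalty
            scores = scores * np.exp(-(overlaps ** 2) / sigma)
        else:       # normal NMS: discard boxes overlapping the kept one
            scores = np.where(overlaps > iou_thr, 0.0, scores)
        keep = scores > score_thr
        boxes, scores = boxes[keep], scores[keep]
    return np.array(keep_boxes), np.array(keep_scores)
```

With soft=False the function corresponds to the normal-NMS variant of Algorithm 1, and with soft=True to the Soft-NMS variant compared in Section 4.4.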

3.3 Decision-level fusion algorithm based on light sensing

The above algorithm has a defect. Under low light conditions, the visible image is strongly affected by the light intensity, and its candidate boxes suffer from severe missed and false detections. When these candidate boxes are merged with those generated from the infrared image for non-maximum suppression, correct candidate boxes are easily suppressed. To address this problem, an improved decision-level fusion algorithm based on light sensing is proposed. Its basic idea is to adaptively suppress some visible-light candidate boxes according to the light intensity of the candidate box region, thereby improving the decision-level fusion of visible and infrared images. Borrowing the idea of the Soft-NMS algorithm, the confidence of a visible-light candidate box is softly reduced through a Gaussian function: the lower the light intensity, the lower the confidence of the visible-light candidate box. Denoting the light intensity of the candidate box region by roi_light, the light sensing adaptive suppression function is defined as follows:

$$F(roi\_light) = e^{{ - \frac{{(1 - roi\_light/255)^{2} }}{\sigma }}}$$
(1)

The corresponding function curve is shown in Fig. 5.

Fig. 5 The light sensing adaptive suppression function curve

The steps of decision-level fusion algorithm based on light sensing are as follows:

  (1) A pair of visible and infrared images of the same natural scene, after registration, is detected separately by the YOLOX network. For each object, \(m\) visible image candidate boxes \((B_{i}^{visible} ,C_{i}^{visible} )_{i = 1,2, \cdots ,m}\) and \(n\) infrared image candidate boxes \((B_{j}^{infrared} ,C_{j}^{infrared} )_{j = 1,2, \cdots ,n}\) are obtained. \(B_{i}^{visible}\) and \(B_{j}^{infrared}\) each consist of the four coordinate values of the candidate box, namely the top-left and bottom-right corner coordinates. \(C_{i}^{visible}\) and \(C_{j}^{infrared}\) are the confidence values of the candidate boxes;

  (2) Apply the light sensing adaptive suppression operation of formula (1) to the \(m\) visible image candidate boxes \((B_{i}^{visible} ,C_{i}^{visible} )_{i = 1,2, \cdots ,m}\) obtained in step (1);

  (3) Combine the suppressed visible image candidate boxes \((B_{i}^{visible} ,C_{i}^{visible} )_{i = 1,2, \cdots ,m}\) and the infrared image candidate boxes \((B_{j}^{infrared} ,C_{j}^{infrared} )_{j = 1,2, \cdots ,n}\) and perform the non-maximum suppression operation; for each object, retain the candidate box with the highest confidence as the final object detection box;

  (4) Repeat steps (1), (2) and (3) for each pair of visible and infrared images to obtain the final fusion detection results.

The decision-level fusion algorithm of visible and infrared images based on light sensing is described in detail as Algorithm 2.
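A minimal sketch of the light sensing step in Algorithm 2 is given below. It assumes roi_light is measured as the mean gray value of the candidate box region in the visible image and that the confidence is rescaled multiplicatively by formula (1), which is our reading of the Soft-NMS-style suppression described above; it reuses the fuse_detections sketch from Algorithm 1.

```python
import numpy as np

def light_suppression(vis_boxes, vis_scores, vis_gray, sigma=0.5):
    """Rescale visible-image confidences with formula (1).

    vis_gray is the visible image converted to a grayscale array in [0, 255];
    roi_light is taken here as the mean gray value inside each candidate box.
    """
    new_scores = vis_scores.astype(float).copy()
    h, w = vis_gray.shape
    for k, (x1, y1, x2, y2) in enumerate(vis_boxes.astype(int)):
        x1, y1 = max(x1, 0), max(y1, 0)
        x2, y2 = min(x2, w), min(y2, h)
        roi = vis_gray[y1:y2, x1:x2]
        roi_light = roi.mean() if roi.size > 0 else 0.0
        factor = np.exp(-((1.0 - roi_light / 255.0) ** 2) / sigma)  # formula (1)
        new_scores[k] *= factor            # darker region -> stronger suppression
    return new_scores

# Usage: suppress the visible scores first, then fuse as in Algorithm 1
# vis_scores = light_suppression(vis_boxes, vis_scores, vis_gray, sigma=0.5)
# boxes, scores = fuse_detections(vis_boxes, vis_scores, ir_boxes, ir_scores, soft=True)
```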

4 Results and discussion

4.1 LLVIP dataset

The experimental dataset is the LLVIP dataset from Beijing University of Posts and Telecommunications [26]. LLVIP is a visible and infrared pedestrian detection dataset collected under low light conditions. It contains 15,488 image pairs, i.e., 30,976 images in total, and is a high-quality visible-infrared paired dataset with an image resolution of 1280 × 1024. All images were collected in dark scenes at night, and each pair of visible and infrared images is strictly aligned in time and space. Pedestrians are labeled in both the visible and the infrared images, and a reverse mapping method is used to label pedestrian objects that are difficult for human eyes to recognize in the visible images under low light. The experiment splits the training and test sets at a ratio of 4:1, and 20% of the training set is used for validation. Since the visible and infrared images are used as input images separately, the number of images in the training set is 12,391 and the number of images in the test set is 3,097.

4.2 Experimental environment

The experiments are trained and tested on the Ubuntu 18.04 operating system, with an Intel Xeon E5-2600 V4 CPU and two NVIDIA TITAN Xp 12 GB GPUs. PyTorch is used as the deep learning framework, and the GPUs are used for computation. The YOLOX-S object detection network is adopted, with an input image size of 640 × 640. During training, the batch size is set to 16, the SGD optimizer is selected, the initial learning rate is set to 0.01, and the network is trained for 300 epochs. The \(\sigma\) value of the light sensing adaptive suppression function is set to 0.5.
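For reference, a minimal sketch of the training configuration described above is given below; the momentum and weight decay values are common defaults rather than settings reported in the paper, and the plain torch.optim.SGD call stands in for the actual YOLOX training scripts.

```python
import torch

def build_optimizer(model):
    """SGD optimizer with the initial learning rate used in the experiments.

    Momentum and weight decay are common defaults, not values taken from the paper.
    """
    return torch.optim.SGD(model.parameters(), lr=0.01,
                           momentum=0.9, weight_decay=5e-4)

# Training hyper-parameters from Section 4.2
INPUT_SIZE = (640, 640)   # input image size
BATCH_SIZE = 16           # training batch size
MAX_EPOCHS = 300          # number of training epochs
SIGMA = 0.5               # sigma of the light sensing suppression function
```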

4.3 Qualitative analysis of experimental results

In order to verify the effectiveness of the decision-level fusion detection method, experiments are carried out on the LLVIP dataset. The decision-level fusion detection results are compared with the results of detecting the visible and infrared images separately with the YOLOX-S network, as shown in Fig. 6.

Fig. 6 Comparison of detection results. a, b, c and d are four scenarios under low light conditions, respectively

It can be observed in Fig. 6 that the overall performance of YOLOX object detection is good and most objects are correctly detected. However, under different lighting conditions, both visible and infrared object detection suffer from missed detections to some extent. In the low-light scenes, such as (a) and (b) in Fig. 6, visible object detection misses objects more severely, while infrared object detection performs better. In the scenes with better lighting, such as (c) and (d) in Fig. 6, some infrared objects are missed and visible detection performs better. Decision-level fusion can therefore integrate the results of visible and infrared object detection, reduce the missed detection rate, and improve the overall detection performance.

4.4 Quantitative analysis of experimental results

Average Precision (AP) is used as the evaluation index to quantitatively evaluate the performance of object detection. The AP is calculated on the test set, and various algorithms are compared and analyzed to further verify the detection performance. The comparison of experimental results is shown in Table 1. The optimal detection results are set in bold font in the table.

Table 1 Comparison of experimental results

It can be observed in Table 1 that under low light conditions, whichever detection network is used, the detection performance on infrared images is better than on visible images. This is especially true for small objects: the small-object AP of visible images based on YOLOX is only 1.6%, compared with 54.9% for infrared images. YOLOX outperforms YOLOv5 and YOLOv3 on both visible and infrared images: on visible images, YOLOX is 4.9% higher than YOLOv5 and 11% higher than YOLOv3; on infrared images, it is 0.9% higher than YOLOv5 and 9.7% higher than YOLOv3. This indicates the strong performance of the YOLOX network model. The decision-level fusion algorithm is evaluated with two non-maximum suppression algorithms, normal NMS and Soft-NMS, and the light sensing method is added in an ablation experiment. The results show that when only normal NMS or Soft-NMS is used for decision-level fusion of the dual-band images, the AP is not improved and is even lower than infrared image detection alone; for decision-level fusion based on Soft-NMS in particular, the final detection performance degrades severely. This indicates that the complementary information of visible and infrared images is not effectively utilized and even introduces adverse interference. With the addition of the light sensing method (roi_light), the performance of both normal NMS and Soft-NMS improves to varying degrees, surpassing both visible image detection and infrared image detection. The decision-level fusion algorithm based on Soft-NMS and light sensing obtains the best AP value, showing that the light sensing method can effectively improve the decision-level fusion of visible and infrared images.
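For reference, the AP values in Table 1 follow the usual precision-recall definition of average precision. The sketch below shows a simplified single-class computation at one IoU threshold (greedy matching against the ground truth, all-point interpolation), reusing the iou() helper from the earlier fusion sketch; it is an illustration only, not the exact evaluation protocol behind the reported numbers.

```python
import numpy as np

def average_precision(pred_boxes, pred_scores, gt_boxes, iou_thr=0.5):
    """Single-class AP at one IoU threshold via all-point interpolation.

    pred_boxes: (N, 4), pred_scores: (N,), gt_boxes: (M, 4), all [x1, y1, x2, y2].
    Uses the iou() helper defined in the earlier fusion sketch.
    """
    order = np.argsort(-pred_scores)               # rank predictions by confidence
    matched = np.zeros(len(gt_boxes), dtype=bool)
    tp = np.zeros(len(order))
    for rank, idx in enumerate(order):
        if len(gt_boxes) == 0:
            break
        ious = iou(pred_boxes[idx], gt_boxes)
        best = int(np.argmax(ious))
        if ious[best] >= iou_thr and not matched[best]:
            tp[rank] = 1.0                          # true positive
            matched[best] = True                    # each ground truth matched once
    cum_tp = np.cumsum(tp)
    recall = cum_tp / max(len(gt_boxes), 1)
    precision = cum_tp / (np.arange(len(order)) + 1)
    # make precision monotonically non-increasing, then integrate over recall
    for k in range(len(precision) - 2, -1, -1):
        precision[k] = max(precision[k], precision[k + 1])
    r = np.concatenate(([0.0], recall))
    p = np.concatenate(([1.0], precision))
    return float(np.sum((r[1:] - r[:-1]) * p[1:]))
```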

4.5 Computational complexity

Introducing infrared images into object detection increases the computational complexity. According to the decision-level fusion network model in Fig. 4, decision-level fusion detection uses both visible and infrared images as inputs and runs a YOLOX network on each, so the computational complexity of the fusion detection is roughly doubled. In addition, the non-maximum suppression and light-intensity calculation over candidate boxes in the fusion algorithm add some computational cost. Computational complexity affects the running speed and running time of the detector, but its influence on detection speed is manageable, because more powerful computing hardware such as GPUs can be used. In some application scenarios such as intelligent transportation, detection performance is crucial for driving safety. Therefore, although introducing infrared images increases the computational complexity, it is worthwhile for the higher detection performance obtained. Detection performance and computational complexity need to be balanced according to the application requirements.

5 Conclusion

This paper presents an experimental study of the decision-level fusion detection method for visible and infrared images under low light conditions. Taking YOLOX as the object detection network, experiments are carried out on the LLVIP dataset for visible and infrared image object detection under low light. The results show that the YOLOX detection network outperforms YOLOv3 and YOLOv5. Based on the decision-level fusion idea of visible and infrared images in Reference [21], the decision-level fusion algorithm based on normal NMS is reproduced with YOLOX as the detection network, and Soft-NMS is used to improve it; however, the performance of the Soft-NMS-based fusion algorithm is lower than that of normal NMS. An improved decision-level fusion algorithm based on light sensing is therefore proposed: by calculating the light intensity of the candidate box region, visible-light candidate boxes with low light intensity are adaptively suppressed, improving the decision-level fusion of visible and infrared images. The experimental results demonstrate that the decision-level fusion algorithm based on Soft-NMS and light sensing obtains the best AP value, and that the light-sensing-based fusion algorithm effectively utilizes the complementary information of visible and infrared images to improve the final detection results. The results also show that none of the decision-level fusion algorithms improves the detection of small objects, and the algorithm needs further improvement in future research to enhance the fusion detection performance on small objects.

Availability of data and materials

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Abbreviations

AP: Average precision

SGD: Stochastic gradient descent

FPN: Feature pyramid network

PAFPN: Path aggregation feature pyramid network

References

  1. P.F. Felzenszwalb, R.B. Girshick, D. McAllester et al., Object detection with discriminatively trained part based models. IEEE Trans. Pattern Anal. Mach. Intell. 32(9), 1627–1645 (2010)

  2. Y. LeCun, Y. Bengio, G. Hinton, Deep learning. Nature 521(7553), 436–444 (2015)

  3. R. Girshick, J. Donahue, T. Darrell et al., Rich feature hierarchies for accurate object detection and semantic segmentation. in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, (2014), pp. 580–587

  4. K. He, X. Zhang, S. Ren et al., Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1904–1916 (2015)

  5. R.B. Girshick, Fast R-CNN. in Proceedings of the IEEE International Conference on Computer Vision, (2015), pp. 1440–1448

  6. S. Ren, K. He, R. Girshick et al., Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017)

  7. J. Redmon, S. Divvala, R.B. Girshick et al., You only look once: unified, real-time object detection. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2016), pp. 779–788

  8. J. Redmon, A. Farhadi, YOLO9000: better, faster, stronger. in Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition, (2017), pp. 6517–6525

  9. J. Redmon, A. Farhadi, YOLOv3: an incremental improvement (2018). Preprint http://arxiv.org/abs/1804.02767

  10. A. Bochkovskiy, C. Wang, H. Liao, YOLOv4: optimal speed and accuracy of object detection (2020). Preprint http://arxiv.org/abs/2004.10934

  11. Z. Ge, S. Liu, F. Wang et al., YOLOX: exceeding YOLO series in 2021. (2021). Preprint http://arxiv.org/abs/2107.08430

  12. W. Liu, D. Anguelov, D. Erhan et al., SSD: single shot multibox detector. in Proceedings of the European Conference on Computer Vision, (2016), pp. 21–37.

  13. C. Fu, W. Liu, A. Ranga et al., DSSD: deconvolutional single shot detector (2017). Preprint http://arxiv.org/abs/1701.06659

  14. Z. Li, F. Zhou, FSSD: feature fusion single shot multibox detector (2018). Preprint http://arxiv.org/abs/1712.00960

  15. X. Zhou, D. Wang, P. Krähenbühl, Objects as points (2019). Preprint http://arxiv.org/abs/1904.07850

  16. H. Law, J. Deng, CornerNet: detecting objects as paired keypoints (2019). Preprint http://arxiv.org/abs/1808.01244

  17. Z. Tian, C. Shen, H. Chen et al., FCOS: fully convolutional one-stage object detection. in Proceedings of the IEEE/CVF International Conference on Computer Vision, (2019), pp. 9626–9635

  18. E. Fendri, R.R. Boukhriss, M. Hammami, Fusion of thermal infrared and visible spectra for robust moving object detection. Pattern Anal. Appl. 20(10), 1–20 (2017)

  19. T. Meng, X. Jing, Z. Yan et al., A survey on machine learning for data fusion. Inf. Fusion 57, 115–129 (2020)

  20. Q. Fang, Z. Wang, Cross-modality attentive feature fusion for object detection in multispectral remote sensing imagery. Pattern Recogn. 130, 108786 (2022)

  21. C. Tang, Y. Ling, H. Yang et al., Decision-level fusion detection for infrared and visible spectra based on deep learning. Infrared Laser Eng 48(6), 456–470 (2019)

  22. C. Tang, Y. Ling, H. Yang et al., Decision-level fusion tracking for infrared and visible spectra based on deep learning. Laser Optoelectron. Progress 56(7), 217–224 (2019)

  23. Y. Bai, Z. Hou, X. Liu et al., An object detection algorithm based on decision-level fusion of visible light image and infrared image. J Air Force Eng Univ 21(6), 53–59 (2020)

  24. X. Zhang, X. Lu, L. Peng, A complementary and precise vehicle detection approach in RGB-T images via semi-supervised transfer learning and decision-level fusion. Int. J. Remote Sens. 43(1), 196–214 (2022)

  25. N. Bodla, B. Singh, R. Chellappa et al., Improving object detection with one line of code (2017). Preprint http://arxiv.org/abs/1704.04503

  26. X. Jia, C. Zhu, M. Li et al., LLVIP: a visible-infrared paired dataset for low-light vision (2021). Preprint http://arxiv.org/abs/2108.10831

Acknowledgements

The authors acknowledge the Postgraduate Research & Practice Innovation Program of Jiangsu Province (Grant: KYCX19_2055).

Author information

Contributions

ZH carried out the study of decision-level fusion detection of visible and infrared images under low light conditions and drafted the manuscript. YJ participated in the experiment and the analysis of the results and helped to draft the manuscript. GQ conceived of the study, and participated in its design and coordination. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Zuhui Hu or Guoqing Wu.

Ethics declarations

Competing interests

The authors have no competing interests to declare.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Hu, Z., Jing, Y. & Wu, G. Decision-level fusion detection method of visible and infrared images under low light conditions. EURASIP J. Adv. Signal Process. 2023, 38 (2023). https://doi.org/10.1186/s13634-023-01002-5

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s13634-023-01002-5

Keywords