### 2.1 Image blurring: synthetic and real

A blurred image can be synthesized by frame-by-frame blurring, i.e., by averaging consecutive sharp frames. For spatially varying blur there is no reliable camera response function (CRF) estimation technique [16], so the CRF is approximated by a known gamma curve, as shown in Eq. (1).

$$\begin{aligned} g(I_{S[i]}) = I_{S'[i]} = I_{S[i]}^{\frac{1}{\gamma }}. \end{aligned}$$

(1)

In the above formula, \(\gamma\) is the gamma parameter, generally taken to be 2.2; the latent sharp image \(I_{S[i]}\) can be obtained from the observed sharp image \(I_{S'[i]}\) by inverting *g*, i.e., \(I_{S[i]} = I_{S'[i]}^{\gamma }\).

The simulated blurred image can then be obtained by the following equation:

$$\begin{aligned} I_{B} \simeq g\left( \frac{1}{M}\sum _{t=1}^{M}I_{S[t]}\right) \end{aligned}$$

(2)

where *M* is the number of sharp frames.
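Equations (1) and (2) can be sketched in a few lines of numpy. Frame values are assumed normalized to [0, 1]; `GAMMA` and `synth_blur` are illustrative names, not from the paper.

```python
import numpy as np

GAMMA = 2.2  # standard gamma value, as in Eq. (1)

def synth_blur(observed_frames, gamma=GAMMA):
    """Eqs. (1)-(2): invert the approximate CRF to get latent sharp frames,
    average them, then re-apply the CRF to obtain the simulated blur."""
    latent = [f ** gamma for f in observed_frames]  # I_S[t] = g^{-1}(I_S'[t])
    mean_latent = np.mean(latent, axis=0)           # (1/M) * sum over M frames
    return mean_latent ** (1.0 / gamma)             # I_B ~ g(mean)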

A real-world image is captured over a continuous exposure, so true blur is the integral of the latent sharp signal over the exposure time *T*. This can be expressed as:

$$\begin{aligned} I_{B} = g\left( {\frac{1}{T}\int _{t=0}^{T}I_{S(t)}dt}\right). \end{aligned}$$

(3)

Before Jaesung Rim et al. proposed RealBlur, a real-world blur dataset suitable for deep learning, deep learning methods for image deblurring were mostly trained and tested on synthetic blur datasets, because non-professional capture devices can only save the blurred image, not the corresponding sharp one. RealBlur is the first publicly available real blur dataset usable for deep learning; it was captured with a professional rig of multiple cameras and optical devices that records blurred and sharp image pairs simultaneously.

### 2.2 Image deblurring

Image deblurring is the task of recovering the corresponding sharp image from a given blurred image.

Non-blind deblurring restores an image given a known blur kernel, while blind deblurring estimates both the original image *X* and the blur kernel *Z* from a given degraded image *Y*.

The blind deblurring process can be expressed as:

$$\begin{aligned} \{{\hat{X}},{\hat{Z}}\} = \mathop {\arg \min }\limits _{X,Z} \Vert Z \oplus X - Y \Vert ^{2}_{2} + \phi (X) + \theta (Z) \end{aligned}$$

(4)

where \(\oplus\) denotes convolution, and \(\phi (X)\) and \(\theta (Z)\) are regularization terms on the expected sharp image and on the blur kernel, respectively.
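A toy numerical instance of Eq. (4) can be written as alternating gradient descent. Simple Tikhonov (L2) terms stand in for \(\phi (X)\) and \(\theta (Z)\) (far weaker than the priors cited in the text), convolution is taken as circular so it diagonalizes under the FFT, and `blind_deblur` and all parameter values are illustrative, not from any cited method.

```python
import numpy as np

def blind_deblur(Y, ksize=3, iters=100, lam_x=1e-3, lam_z=1e-3):
    """Alternating gradient descent on Eq. (4) with L2 stand-ins for
    phi(X) and theta(Z); circular convolution Z (+) X via the 2-D FFT."""
    X = Y.copy()                                  # init sharp estimate with Y
    Z = np.full((ksize, ksize), 1.0 / ksize**2)   # init kernel as uniform PSF
    losses = []
    for _ in range(iters):
        Kf = np.fft.fft2(Z, s=Y.shape)
        Xf = np.fft.fft2(X)
        R = np.real(np.fft.ifft2(Kf * Xf)) - Y    # residual  Z (+) X - Y
        losses.append(0.5 * np.sum(R**2)
                      + 0.5 * lam_x * np.sum(X**2) + 0.5 * lam_z * np.sum(Z**2))
        # X-step: grad of 0.5||Z(+)X - Y||^2 w.r.t. X is conj(K)*R in Fourier domain
        gX = np.real(np.fft.ifft2(np.conj(Kf) * np.fft.fft2(R))) + lam_x * X
        X = X - gX / ((np.abs(Kf)**2).max() + lam_x)   # safe 1/L step size
        # Z-step with the updated X; gradient restricted to the kernel support
        Xf = np.fft.fft2(X)
        R = np.real(np.fft.ifft2(Kf * Xf)) - Y
        gZ = np.real(np.fft.ifft2(np.conj(Xf) * np.fft.fft2(R)))[:ksize, :ksize]
        Z = Z - (gZ + lam_z * Z) / ((np.abs(Xf)**2).max() + lam_z)
    Z = np.clip(Z, 0.0, None)                     # project: a PSF is non-negative
    return X, Z / Z.sum(), losses                 # ...and sums to one
```

Because each alternating step uses a step size of one over the subproblem's Lipschitz constant, the objective decreases monotonically; real blind methods replace the L2 terms with much stronger priors.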

Since the blur kernel is rarely known in real applications, blind deblurring is the more widely studied setting. Blind methods can be divided into traditional optimization-based methods and deep learning-based methods, the latter dominated by convolutional neural networks (CNNs); most deep learning methods do not estimate the blur kernel at all.

Most traditional optimization-based methods guide a maximum a posteriori (MAP) estimation process with an assumed prior [17]. For example, Pan et al. [18] proposed an image deblurring method based on a dark channel prior, and Xu et al. [19] proposed an L0 gradient prior that preserves sharp edge information.

With the rapid development of deep learning in recent years and the strong performance of CNNs across computer vision, a large number of CNN-based deblurring methods have appeared. They can be roughly divided into methods that use a CNN to estimate the blur kernel and end-to-end methods that skip kernel estimation. In the first group, Sun et al. [20] used a CNN to estimate the probability distribution of the blur kernel, and Gong et al. [21] proposed a fully convolutional network that estimates the blur at pixel level. Most CNN-based methods, however, belong to the second group and realize deblurring directly. Nah et al. [10] proposed a multi-scale CNN that exploits feature correlations between images of different sizes, such as \(64 \times 64\), \(128 \times 128\) and \(256 \times 256\), to achieve more refined deblurring. Tao et al. [22] also adopted a multi-scale structure and added LSTM (long short-term memory) units, yielding a multi-scale recurrent network. However, because coarse-to-fine multi-scale networks spend considerable time on deconvolution operations, Zhang et al. [23] proposed a multi-patch structure that cuts the input image into multiple patches and stacks multiple encoders and decoders to achieve better deblurring performance. Image deblurring has also been applied in IoV: Zhou et al. [24] proposed filter-DeblurGAN, which judges whether an image is blurred and can be applied directly to the vehicle logo detection (VLD) task, so that motion-blurred vehicle logo images can be detected directly.

### 2.3 GAN

GAN (generative adversarial network) is a deep learning model and one of the most promising approaches to unsupervised learning on complex distributions in recent years. In the early formulation, the generator and discriminator were not required to be neural networks; only the corresponding generative and adversarial functions were required. GAN belongs to the family of generative models (GM) and can be used in supervised, semi-supervised and unsupervised learning. Its applications to images fall into two groups: generating images from random noise or text, and image-to-image translation; image restoration can be regarded as translation from a low-quality image to a clear one.

The structure of GAN is shown in Fig. 1. GAN contains two competing networks, a generator and a discriminator; the adversarial idea can be traced back to the Nash equilibrium of game theory. The generator tries to produce samples as close as possible to the target distribution in order to fool the discriminator, while the discriminator tries to distinguish real samples from those produced by the generator. The adversarial objective can be described as follows:

$$\begin{aligned} \min _{G}\max _{D}V(D,G) = E_{x \sim P_{data}(x)}[\log D(x)] + E_{z \sim P_{z}(z)}[\log (1 - D(G(z)))]. \end{aligned}$$

(5)

Here *x* is a real sample drawn from the data distribution \(P_{data}(x)\), and \(E_{x \sim P_{data}(x)}\) is the expectation over real (clear) images; *z* is a noise vector drawn from \(P_{z}(z)\). \(D( \cdot )\) denotes the output of *D* and \(G( \cdot )\) the output of *G*. *G* aims to minimize \(V(D,G)\) while *D* aims to maximize it.
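As a concrete reading of Eq. (5), the two players' losses can be computed from the discriminator's outputs on a real batch and a generated batch (a minimal sketch; `gan_losses` is an illustrative helper, and the epsilon guard is for numerical safety only):

```python
import numpy as np

def gan_losses(d_real, d_fake, eps=1e-8):
    """Eq. (5): D maximizes E[log D(x)] + E[log(1 - D(G(z)))], i.e. it
    minimizes the negation; G minimizes E[log(1 - D(G(z)))].
    d_real, d_fake: D's outputs in (0, 1) on real and generated batches."""
    d_loss = -(np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps)))
    g_loss = np.mean(np.log(1.0 - d_fake + eps))
    return d_loss, g_loss
```

In practice the non-saturating generator loss \(-E[\log D(G(z))]\) is usually minimized instead, since \(\log (1 - D(G(z)))\) gives vanishing gradients early in training.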

Since its invention, GAN has been a focus of deep learning research, and many variants exist. Wasserstein GAN (WGAN) [25] improves GAN in terms of loss function and training strategy: it adopts the Earth-Mover (EM) distance *W*(*q*, *p*), informally defined as the minimum cost of transporting mass to transform the distribution *q* into the distribution *p*, and proposes a constraint strategy that helps the discriminator avoid mode collapse. The value function of WGAN is as follows:

$$\begin{aligned} \min _{G}\max _{D \in {\mathcal {D}}} {\mathbb {E}}_{x \sim {\mathbb {P}}_{r}}[D(x)] - {\mathbb {E}}_{{\tilde{x}} \sim {\mathbb {P}}_{g}}[D({\tilde{x}})] \end{aligned}$$

(6)

where \({\mathcal {D}}\) is the set of 1-Lipschitz functions and \({\mathbb {P}}_{g}\) is the model distribution implicitly defined by \({\tilde{x}} = G(z),\ z \sim p(z)\). By clipping the discriminator weights to a compact interval \([-c, c]\), WGAN makes the discriminator satisfy the 1-Lipschitz constraint; together with the EM distance this further improves the performance of GAN. WGAN-GP [26] improves on WGAN, effectively preventing the vanishing gradients, exploding gradients and weight-restriction difficulties that can occur in WGAN. Its authors found that WGAN's weight-clipping strategy pushes weights toward the two extreme values, so they proposed a gradient penalty to enforce the 1-Lipschitz constraint instead. The value function of the discriminator of WGAN-GP is as follows:

$$\begin{aligned} L = {\mathbb {E}}_{{\tilde{x}} \sim {\mathbb {P}}_{g}}[D({\tilde{x}})] - {\mathbb {E}}_{x \sim {\mathbb {P}}_{r}}[D(x)] + \lambda {\mathbb {E}}_{{\hat{x}} \sim {\mathbb {P}}_{{\hat{x}}}}[(\Vert \nabla _{{\hat{x}}}D({\hat{x}}) \Vert _2 - 1)^{2}] \end{aligned}$$

(7)

where the first two terms are the same as in WGAN without the 1-Lipschitz constraint, and the last term of *L*, the gradient penalty, enforces that constraint. \({\mathbb {P}}_{{\hat{x}}}\) samples uniformly along straight lines between pairs of points drawn from the data distribution \({\mathbb {P}}_{r}\) and the generator distribution \({\mathbb {P}}_{g}\). \(\lambda\) is set to 10 in the original WGAN-GP.
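Eq. (7) can be made concrete for a linear critic, an assumption chosen here only so that the input gradient has a closed form (a deep critic would need automatic differentiation for the penalty); `wgan_gp_critic_loss` is an illustrative name.

```python
import numpy as np

def wgan_gp_critic_loss(w, b, x_real, x_fake, lam=10.0, seed=0):
    """Eq. (7) for a linear critic D(x) = x.w + b, whose input gradient is
    exactly w everywhere, so the gradient penalty is computable directly."""
    rng = np.random.default_rng(seed)
    D = lambda x: x @ w + b
    eps = rng.random((x_real.shape[0], 1))        # uniform mixing coefficients
    x_hat = eps * x_real + (1.0 - eps) * x_fake   # points on straight lines; a
                                                  # nonlinear critic would need
                                                  # its gradient evaluated here
    grad_norm = np.linalg.norm(w)                 # ||grad D|| is w's norm here
    gp = lam * (grad_norm - 1.0) ** 2             # gradient penalty term
    return np.mean(D(x_fake)) - np.mean(D(x_real)) + gp
```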

### 2.4 Transfer learning

In this work, our transfer learning component focuses on transferring from image deblurring to image de-raining. We found that a model pre-trained on a large amount of image deblurring data achieves good results on the similar task of image de-raining after fine-tuning with only a small amount of de-raining data.
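The pre-train/fine-tune idea can be illustrated on a toy least-squares model (an assumption for brevity; the actual networks here are deep CNNs): start from source-task weights and take a few gradient steps on the small target-task set, rather than training from a random initialization.

```python
import numpy as np

def fine_tune(w_pretrained, X_new, y_new, lr=0.1, steps=200):
    """Continue gradient descent on the target-task MSE from pre-trained
    weights; with few target samples, the good starting point does most
    of the work (toy stand-in for fine-tuning a full network)."""
    w = w_pretrained.copy()
    for _ in range(steps):
        grad = X_new.T @ (X_new @ w - y_new) / len(y_new)  # MSE gradient
        w -= lr * grad
    return w
```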

### 2.5 Image de-raining

In simple and broad terms, the rainy image model can be defined as:

$$\begin{aligned} Y = X + W \end{aligned}$$

(8)

where *Y* is the rainy image, *X* the original clear image, and *W* the rain-streak component. The goal of single-image de-raining is thus to recover the clear image *X* from a given rainy image *Y*. Like image deblurring, de-raining is a low-level vision task, and earlier methods likewise relied on image priors, for example sparse coding-based methods [27], Gaussian mixture model (GMM)-based methods [28] and PatchRank prior methods [29]. Single-image de-raining has also been applied in IoV: Sun et al. [24] proposed a convolutional neural network that takes rainy images as input and directly recovers clear images even under the atmospheric veiling effect caused by distant rain-streak accumulation.
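The additive model of Eq. (8) also suggests a simple way to synthesize training pairs: build *W* by smearing sparse random seeds into vertical streaks. This is a toy recipe with illustrative parameters (`add_rain`, `density`, `length`, `intensity`), not the data pipeline of any cited method.

```python
import numpy as np

def add_rain(X, density=0.05, length=8, intensity=0.6, seed=0):
    """Eq. (8): Y = X + W, with W built from sparse random seed pixels
    smeared downward to imitate rain streaks; values stay in [0, 1]."""
    rng = np.random.default_rng(seed)
    seeds = (rng.random(X.shape) < density).astype(float)  # sparse seed points
    streak = np.zeros_like(seeds)
    for d in range(length):                     # smear each seed d rows down
        streak += np.roll(seeds, d, axis=0)
    W = intensity * np.clip(streak, 0.0, 1.0)   # rain-streak component W
    return np.clip(X + W, 0.0, 1.0), W          # rainy image Y and W
```

Such synthetic pairs (*Y*, *X*) can then supervise a de-raining network exactly as blurred/sharp pairs supervise a deblurring network.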