A video is a sequence of images and therefore carries both spatial and temporal information. Accordingly, our ASTC video noise filter adapts between temporal and spatial filtering. This section details the spatial filter, the temporal filter, and the adaptive fusion strategy.
5.1. The Spatial Connective Trilateral Filter
As mentioned in Section 4, NCV is a good statistic for impulse noise detection, whereas the bilateral filter [2] suppresses Gaussian noise well. We therefore incorporate NCV into the bilateral filter to form a trilateral filter that removes mixed noise.
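For a pixel $p_{xy}$ with neighborhood window $W_{xy}$, its new intensity $I'_{xy}$ after bilateral filtering is computed as
$$I'_{xy} = \frac{\sum_{p_{ij} \in W_{xy}} w(p_{xy}, p_{ij})\, I_{ij}}{\sum_{p_{ij} \in W_{xy}} w(p_{xy}, p_{ij})}, \tag{9}$$
$$w(p_{xy}, p_{ij}) = w_S(p_{xy}, p_{ij})\, w_R(p_{xy}, p_{ij}), \tag{10}$$
where $w_S(p_{xy}, p_{ij}) = \exp\left(-\frac{(x-i)^2 + (y-j)^2}{2\sigma_S^2}\right)$ and $w_R(p_{xy}, p_{ij}) = \exp\left(-\frac{(I_{xy} - I_{ij})^2}{2\sigma_R^2}\right)$ represent spatial and radiometric weights, respectively [2]. In our experiments, $\sigma_S$ and $\sigma_R$ are fixed to 2 and 30, respectively. The formula is based on the assumption that pixels that lie nearer and have more similar intensities should receive larger weights.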
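As an illustration of the bilateral weighting described above, the following minimal sketch filters a single pixel of a grayscale image stored as a list of rows. The function name `bilateral_pixel` and the `radius` parameter are hypothetical; the Gaussian forms of the two weights follow the standard bilateral filter, and the defaults 2 and 30 are the values quoted in the text.

```python
import math

def bilateral_pixel(img, x, y, radius=2, sigma_s=2.0, sigma_r=30.0):
    """Return the bilateral-filtered intensity at column x, row y.

    img is a list of rows of numbers; radius is an assumed window half-size.
    """
    height, width = len(img), len(img[0])
    num = 0.0
    den = 0.0
    for j in range(max(0, y - radius), min(height, y + radius + 1)):
        for i in range(max(0, x - radius), min(width, x + radius + 1)):
            # Spatial weight: nearer pixels count more.
            w_s = math.exp(-((x - i) ** 2 + (y - j) ** 2) / (2 * sigma_s ** 2))
            # Radiometric weight: pixels of similar intensity count more.
            w_r = math.exp(-((img[y][x] - img[j][i]) ** 2) / (2 * sigma_r ** 2))
            num += w_s * w_r * img[j][i]
            den += w_s * w_r
    # den >= 1 because the centre pixel always has weight 1.
    return num / den
```

On a sharp step edge, pixels on the far side of the edge receive a negligible radiometric weight, so the edge is left almost untouched while flat regions are averaged.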
For noisy images, signal pixels should intuitively receive larger weights than noise pixels. Thus, similar to the above, we introduce a third weighting function $w_I$ to measure the probability of a pixel being a signal pixel:
$$w_I(p_{ij}) = \exp\left(-\frac{\mathrm{INCV}(p_{ij})^2}{2\sigma_I^2}\right), \tag{11}$$
where $\sigma_I$ is a parameter to penalize large INCVs and is fixed to 0.3 in our experiments. Thus, we can integrate $w_I$ into (10) to form a better weighting function. Yet direct integration fails on impulse noise pixels, because neighboring signal pixels will have a lower $w_R$ than other impulse pixels of similar intensity. As a result, the impulse pixels remain impulse pixels. To solve this problem, Garnett et al. [2] brought forward a switch function J to determine the weight of the radiometric component in the presence of impulse noise. Similarly, our switch is defined as
$$J(p_{xy}, p_{ij}) = 1 - \exp\left(-\frac{\left(\left(\mathrm{INCV}(p_{xy}) + \mathrm{INCV}(p_{ij})\right)/2\right)^2}{2\sigma_J^2}\right), \tag{12}$$
where $\sigma_J$ is a scale parameter.
The switch J tends to its maximum of 1 when $p_{xy}$ or $p_{ij}$ has a large INCV, that is, a high probability of being a noise pixel; J tends to its minimum of 0 when both $p_{xy}$ and $p_{ij}$ have small INCVs, that is, a high probability of being signal pixels. Thus, we introduce the switch J into (10) to control the weights of $w_R$ and $w_I$ as
$$w(p_{xy}, p_{ij}) = w_S(p_{xy}, p_{ij})\, w_R(p_{xy}, p_{ij})^{1-J(p_{xy}, p_{ij})}\, w_I(p_{ij})^{J(p_{xy}, p_{ij})}. \tag{13}$$
According to the new weighting function, for impulse noise pixels, $w_R$ is almost "shut off" by the switch J, while $w_S$ and $w_I$ work to remove the large outliers; for other pixels, $w_I$ is almost "shut off" by the switch J, and only $w_S$ and $w_R$ work to smooth small-amplitude noise without blurring edges. Consequently, we build the spatial connective trilateral (SCT) filter by merging (9) and (13).
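The behavior of the switched weight can be checked numerically with a small sketch. It assumes Gaussian forms for all three weights and a switch in the style of Garnett et al. [2] driven by the mean INCV of the two pixels. The function names and the value `sigma_j=0.1` are hypothetical, and the INCV values are taken as given inputs (their computation belongs to Section 4).

```python
import math

def switch_j(incv_p, incv_q, sigma_j=0.1):
    """Switch in [0, 1]: near 1 if either pixel looks impulsive, near 0 if both look clean."""
    mean_incv = 0.5 * (incv_p + incv_q)
    return 1.0 - math.exp(-(mean_incv ** 2) / (2 * sigma_j ** 2))

def trilateral_weight(dist2, diff, incv_p, incv_q,
                      sigma_s=2.0, sigma_r=30.0, sigma_i=0.3, sigma_j=0.1):
    """Weight of neighbour q for centre p.

    dist2 is the squared spatial distance, diff the intensity difference.
    """
    w_s = math.exp(-dist2 / (2 * sigma_s ** 2))         # spatial weight
    w_r = math.exp(-(diff ** 2) / (2 * sigma_r ** 2))   # radiometric weight
    w_i = math.exp(-(incv_q ** 2) / (2 * sigma_i ** 2)) # impulsive weight of the neighbour
    j = switch_j(incv_p, incv_q, sigma_j)
    # J near 1 suppresses the radiometric term; J near 0 suppresses the impulsive term.
    return w_s * (w_r ** (1.0 - j)) * (w_i ** j)
```

With a clean centre and a clean neighbour the result is the plain bilateral weight. With an impulsive centre and a clean neighbour, the large intensity difference no longer penalizes the neighbour, so the outlier can be pulled back toward its surroundings.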
Figure 6 shows the outputs of the ROAD and SCT filters for the "Neon Light" image corrupted by mixed noise. The ROAD filter combines a rank-order statistic for impulse detection with the bilateral filter. It smooths the mixed noise well (PSNR = 23.35) but blurs many fine features, such as the tiny lights in Figure 6(b). In contrast, our SCT filter preserves more fine features and produces a more visually pleasing output with PSNR = 24.13, as shown in Figure 6(c).
5.2. Trilateral Filtering in Time
For videos, temporal filtering is more important than spatial filtering [10], but irregular camera and object motions often degrade its performance, so robust motion compensation is necessary. Optical flow is a classical approach to this problem; however, it depends on robust gradient estimation and fails for noisy, underexposed, or overexposed images. Therefore, we pre-enhance the frames with the SCT filter and our adaptive piecewise mapping function, which is detailed in Section 6. We then use the cvCalcOpticalFlowLK() function of the Intel Open Source Computer Vision Library (OpenCV) to compute dense optical flow for robust motion estimation. Motions that are too small or too large are discarded, and half-wave rectification and Gaussian smoothing are applied to suppress noise in the optical flow field [29].
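The flow clean-up step can be sketched as follows. This is only an illustration under stated assumptions: the magnitude bounds `min_mag` and `max_mag` are hypothetical (the text gives no values), "discarded" is interpreted as setting the vector to zero, and half-wave rectification is read as splitting each component into its non-negative positive and negative parts. The subsequent Gaussian smoothing of each channel is omitted.

```python
import math

def prune_and_rectify(flow, min_mag=0.5, max_mag=20.0):
    """Clean a list of (u, v) flow vectors.

    Vectors whose magnitude lies outside [min_mag, max_mag] are zeroed.
    Each vector is then split into four non-negative channels
    (u_plus, u_minus, v_plus, v_minus).
    """
    channels = []
    for u, v in flow:
        magnitude = math.hypot(u, v)
        if magnitude < min_mag or magnitude > max_mag:
            u, v = 0.0, 0.0  # treat implausible motion as no motion
        channels.append((max(u, 0.0), max(-u, 0.0), max(v, 0.0), max(-v, 0.0)))
    return channels
```

Keeping the positive and negative parts in separate channels means that later smoothing cannot cancel opposing motions against each other.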
After motion compensation, we apply an approach similar to the SCT filter in the temporal direction. In the temporal connective trilateral (TCT) filter, we define the neighborhood window of a pixel $p_{xyt}$ as $W_{xyt}$, which is a $(2m+1)$-length window in the temporal direction with $p_{xyt}$ in the middle. In our experiments, m is fixed to 10. Note that the pixels in the window may have different horizontal and vertical coordinates in different frames, but they all lie on the same tracking path generated by the optical flow. Thus, the TCT filter is computed as
$$I'_{xyt} = \frac{\sum_{p_{ijk} \in W_{xyt}} w(p_{xyt}, p_{ijk})\, I_{ijk}}{\sum_{p_{ijk} \in W_{xyt}} w(p_{xyt}, p_{ijk})},$$
where $w(p_{xyt}, p_{ijk}) = w_S(p_{xyt}, p_{ijk})\, w_R(p_{xyt}, p_{ijk})^{1-J}\, w_I(p_{ijk})^{J}$, and $w_I$ and J are defined the same as (11) and (12), respectively.
The TCT filter differentiates impulse noise pixels from motional pixels well, smoothing the former while leaving the latter almost untouched. For impulse noise pixels, the switch function J in the TCT filter "shuts off" the radiometric component, and the spatial weight is used to smooth them; for motional pixels, J "shuts off" the impulsive component and the TCT filter reverts to a bilateral filter, which treats the motional pixels as "temporal edges" and leaves them unchanged.
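The two cases described above can be reproduced with a minimal sketch that filters one pixel along its tracking path. It assumes the same Gaussian weights and Garnett-style switch as in the spatial case, with the frame distance playing the role of the spatial distance. The function name and the values `sigma_t=5.0` and `sigma_j=0.1` are hypothetical, and the INCV of every sample on the path is taken as a given input.

```python
import math

def tct_pixel(track, incv, sigma_t=5.0, sigma_r=30.0, sigma_i=0.3, sigma_j=0.1):
    """Filter the middle sample of a motion-compensated track.

    track holds 2m+1 intensities along the tracking path;
    incv holds the matching INCV values.
    """
    centre = len(track) // 2
    num = 0.0
    den = 0.0
    for k, (value, noise) in enumerate(zip(track, incv)):
        w_s = math.exp(-((k - centre) ** 2) / (2 * sigma_t ** 2))           # frame distance
        w_r = math.exp(-((track[centre] - value) ** 2) / (2 * sigma_r ** 2))  # intensity similarity
        w_i = math.exp(-(noise ** 2) / (2 * sigma_i ** 2))                  # signal probability
        mean_incv = 0.5 * (incv[centre] + noise)
        j = 1.0 - math.exp(-(mean_incv ** 2) / (2 * sigma_j ** 2))
        weight = w_s * (w_r ** (1.0 - j)) * (w_i ** j)
        num += weight * value
        den += weight
    # If every sample is judged to be noise, fall back to the unfiltered value.
    return num / den if den > 0.0 else track[centre]
```

For a track of constant intensity 100 with a single impulse of 255 in the middle, the output returns to roughly 100. For a track that steps from 100 to 200 with all INCVs at zero, the output stays close to 200, so the temporal edge survives.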
5.3. Implementing ASTC
Although the TCT filter is based on robust motion estimation, complex motions often leave too few similar pixels in the temporal direction for smoothing. In that case the TCT filter cannot achieve the desired smoothing and has to fall back on the spatial direction. A threshold is therefore necessary to determine whether a sufficient number of similar temporal pixels have been gathered; this threshold can then be used as a switch between the temporal and spatial filters (as in [21]) or as a parameter that adjusts the relative importance of the two filters (as in our ASTC). If the threshold is too high, severely noisy videos never provide enough valuable temporal pixels and the temporal filter becomes useless; if the threshold is too low, the output is always based on unreliable temporal pixels, no matter how noisy the video is. Accordingly, we introduce an adaptive threshold like that of [21], but one that further considers local noise levels.
The threshold is determined by two quantities. The first represents the local noise level and is computed in a spatial neighborhood window; it reaches its maximum of 1 in good frames and decreases as the noise level increases. The second is the gain factor of the current pixel and equals the tone mapping scale in our adaptive piecewise mapping function, which is detailed in Section 6. Thus, the larger the mapping scale and the less noise there is, the larger the threshold becomes; the smaller the mapping scale and the more noise there is, the smaller the threshold becomes. These characteristics ensure that the threshold works well for different kinds of videos.
Since the temporal filter outperforms the spatial filter when gathering enough temporal information, we propose the following criteria for the fusion of temporal filter and spatial filter.
(1) If a sufficient number of temporal pixels are gathered, only the temporal filter is used.
(2) On the other hand, even if temporal pixels are insufficient, the temporal filter should still dominate the spatial one in the fused spatio-temporal filter.
Based on these two criteria, we propose our adaptive spatio-temporal connective (ASTC) filter, which adaptively fuses the spatial connective trilateral filter and the temporal connective trilateral filter. The fusion is governed by the sum of pixel weights in the temporal direction. If this sum meets the threshold (i.e., sufficient temporal pixels), the ASTC filter reduces to the temporal connective trilateral filter; if it falls below the threshold (i.e., insufficient temporal pixels), the ASTC filter first uses the temporal connective trilateral filter to gather pixels in the temporal direction and then uses the spatial connective trilateral filter to gather the remaining number of pixels in the spatial direction.
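One fusion rule consistent with this description is sketched below. It is an assumption, not necessarily the exact weighting used by the ASTC filter: the temporal weight sum is compared with the threshold, and any shortfall is filled with the spatially filtered value. The function and parameter names are hypothetical.

```python
def astc_fuse(temporal_weighted_sum, temporal_weight, spatial_value, threshold):
    """Fuse temporal and spatial estimates for one pixel.

    temporal_weighted_sum: sum of weight * intensity along the tracking path
    temporal_weight:       sum of the weights along the tracking path
    spatial_value:         output of the spatial filter at this pixel
    threshold:             adaptive threshold for this pixel (must be positive)
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    if temporal_weight >= threshold:
        # Enough temporal support: use the temporal estimate alone.
        return temporal_weighted_sum / temporal_weight
    # Not enough temporal support: the spatial estimate fills the remainder.
    shortfall = threshold - temporal_weight
    return (temporal_weighted_sum + shortfall * spatial_value) / threshold
```

For example, with a threshold of 10, a temporal weight of 2 whose samples average 100, and a spatial estimate of 50, the fused value is (200 + 8 × 50) / 10 = 60. In this sketch the temporal samples are always used first, and the spatial filter contributes only the missing weight.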