An innovative document image binarization approach driven by the non-local p-Laplacian
EURASIP Journal on Advances in Signal Processing volume 2022, Article number: 50 (2022)
Abstract
Text image binarization is a fairly tedious task and a significant problem in document image analysis. This process, as a necessary pretreatment for noisy images with stains, non-uniform background, or degraded text characters, can improve the quality of the image and facilitate the subsequent image processing steps. A theoretically well-motivated non-local method for document image binarization is proposed in this paper. This approach enhances degraded images by estimating and then removing the undesirable background. Extensive experiments conducted on degraded document images demonstrate the effectiveness of the proposed non-local algorithm.
Introduction
Image binarization is a very active research area in document image processing. It basically consists of separating the handwritten and printed text from the non-uniform background. This task becomes extremely difficult when dealing with ancient historical documents, which may suffer from various degradations, such as non-uniform illumination, ink intensity variation, large dark stains, or ink bleed-through. Accordingly, correcting these deterioration effects on document images can significantly enhance the performance of subsequent treatments.
Document image binarization techniques have long been studied. They are broadly categorized as global methods and local (adaptive) methods [1, 2]. Global approaches seek a single threshold that separates the text from the background based on the gray-level distribution. The popular nonparametric Otsu algorithm [3] is very simple and fast, but it cannot successfully handle images with low contrast, uneven illumination, heavy noise, or bleed-through. To overcome those limitations, local threshold methods compute a separate threshold for each sub-image. These approaches yield better performance on complex text images by reducing the influence of degradations adaptively. Bernsen [4] first proposed a local method based on image contrast. In this technique, the mean value of the maximum and minimum intensities within a local window centered at the considered pixel is used to compute the threshold value. Niblack’s method [5] uses the local mean M and the local standard deviation S to deduce the threshold \(T=M+k\,S\), where k is a user-defined parameter. This approach was extended by Sauvola and Pietikäinen [6] and by Gatos et al. [7], who proposed modified versions of Niblack’s algorithm to tackle the background noise problem. Singh et al. [8] proposed an improved algorithm based on four steps: local contrast analysis, contrast stretching, thresholding, and noise removal. Lu et al. [9] proposed a local thresholding method that focuses on the differences in image grayscale contrast in different areas. In general, local methods give accurate results but remain computationally expensive because they need to compute several thresholds for the same image.
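As an illustration of the local thresholding idea, the following minimal sketch computes Niblack's threshold \(T=M+k\,S\) over a square window centered at each pixel. The window size, the value \(k=-0.2\), the reflective border handling, and the function names are assumptions chosen for the example; they are not taken from [5].

```python
import numpy as np

def niblack_threshold(gray, window=3, k=-0.2):
    """Per-pixel Niblack threshold T = M + k*S over a square window (odd size)."""
    pad = window // 2
    # Reflect the image at its borders so every pixel has a full window (assumption).
    padded = np.pad(gray.astype(float), pad, mode="reflect")
    # One window x window patch per pixel of the input image.
    patches = np.lib.stride_tricks.sliding_window_view(padded, (window, window))
    mean = patches.mean(axis=(-2, -1))   # local mean M
    std = patches.std(axis=(-2, -1))     # local standard deviation S
    return mean + k * std

def binarize(gray, threshold):
    """Return 1 where the pixel is brighter than its threshold, 0 otherwise."""
    return (gray > threshold).astype(np.uint8)
```

Because the threshold is recomputed for every pixel, the cost grows with the window area, which reflects the computational expense of local methods noted above.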
For decades, partial differential equations (PDEs) have been attractive tools in image processing, ranging from denoising, smoothing, and image inpainting to shape extraction and remote sensing, both from a theoretical and an experimental point of view. For example, Wang and He [10] proposed an evolution equation-based binarization method for document images produced by cameras, where the evolution is controlled by a global force and a local force, both of which have opposite signs inside and outside the object of interest in the original image. Jacobs and Momoniat’s method [11] is based on the dynamic process of diffusion, coupled with a nonlinear Fitzhugh–Nagumo-type source term that exhibits binarizing properties. The authors later extended this model by selecting the thresholding parameter in the source term based on local information, making the process locally adaptive [12]. Recently, Yagoubi et al. [13] proposed a historical document compression scheme combined with a novel automatic enhancement scheme, both based on PDE analysis. Along the same lines, Guo et al. [14] proposed a novel edge-preserving equation by incorporating an adaptive source term with binarization properties into a nonlinear diffusion model; the reported results show remarkable performance compared to six benchmark binarization algorithms and four PDE-based binarization methods. The work in [15] is among the first to apply the level set framework to degraded document image binarization; it uses the image edges that can usually be detected around the text stroke boundary.
With the purpose of achieving better binarization results, the use of non-local operators may provide a solid foundation. One motivation for exploiting this kind of operator dates back to [16], which introduced a very general framework for image and signal processing based on so-called non-local PDEs. Non-local models involve integral equations and fractional derivatives allowing non-local interactions; that is to say, the interaction may occur even when the closures of two domains have an empty intersection. Such models are effective in modeling material singularities and are widely considered in a variety of applications, including image processing [17,18,19], phase transitions [20, 21], machine learning [22], and obstacle problems [23]. In addition, the authors of [24] offer a rigorous mathematical analysis of non-local models by describing the analogy between the classical and non-local frameworks.
This paper presents a new non-local document image binarization algorithm based on the non-local p-Laplacian operator. First and foremost, our main concern is separating the text from the background. One of the main features of the proposed image decomposition is that it is motivated by human perception and can function very well under changing lighting conditions, since it is inspired by Retinex models [25, 26]. To recover the degraded text by estimating the smooth component of the degraded document image, we suggest considering a non-local p-Laplacian-type diffusion equation. The non-local p-Laplacian operator is motivated by the need to represent anomalous diffusion in order to estimate a complex background; moreover, the exponent “p” controls the degree of smoothing so that different degradations can be treated. By taking full advantage of the characteristics of non-local operators, the proposed approach succeeds in preserving text textures, small details, and complex structures.
The rest of this paper is arranged as follows. Section 2 describes the proposed non-local approach. In Section 3, we present a very simple algorithm to solve the proposed non-local model and analyze the obtained results. A summary is presented in Section 4.
Proposed approach
Text image decomposition
To extract textual information contained in an image, it is necessary to eliminate the unwanted background. Here, taking inspiration from many Retinex models that consider that the observed image is the product of the varying illumination and the reflectance, we derive a two-component decomposition model for document images.
In a text document that does not include images, graphs, or tables, the text emanates from the foreground, while the remaining parts belong to the background. Our fundamental assumption is that the background noise, bleed-through, and non-uniform illumination affect the text multiplicatively. We propose decomposing a given document image “I” into two main parts: the foreground text “T” and the background component “B” as follows:
$$I = T \cdot B. \qquad (1)$$
We logarithmically transform (1) to obtain
$$\log I = \log T + \log B.$$
This model allows the text to be deduced by subtraction, provided that one can estimate the background, which collects all the degradations of the considered image.
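The subtraction in the logarithmic domain can be sketched as follows, assuming a background estimate is already available. The function name is hypothetical, and the shift by one inside the logarithm is an assumption made here to avoid \(\log 0\) on black pixels.

```python
import numpy as np

def recover_text(image, background):
    """Recover the text component from I = T * B by subtracting logarithms."""
    log_i = np.log1p(image.astype(float))        # log(I + 1)
    log_b = np.log1p(background.astype(float))   # log(B + 1)
    # log T = log I - log B, then return to the intensity domain.
    return np.exp(log_i - log_b)
```

With this convention, pixels where the image matches the estimated background map to values close to 1, while text pixels, which are darker than the background, map to values below 1.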
Background estimation
The classical PDE-based methods and the frequency domain filters restore images using local information, which means that they fail to preserve fine structure, details, and texture. To overcome this drawback, the non-local (NL) means filter [27] takes advantage of the high degree of redundancy of any natural image. The NL-means algorithm estimates the value at x as an average of the values of all the pixels whose Gaussian neighborhood looks like the neighborhood of x:
$$NL[u](x) = \frac{1}{C(x)} \int _{\Omega } e^{-\frac{(G_a * |u(x+\cdot )-u(y+\cdot )|^2)(0)}{h^2}}\, u(y)\, dy,$$
where u is the original image defined in a bounded domain \(\Omega \subset {\mathbb {R}}^2\), and denoted by u(x) for a pixel \(x = (x_1, x_2) \in {\mathbb {R}}^2\), \(G_a\) is a Gaussian kernel of standard deviation a, h acts as a filtering parameter, and C(x) is the normalizing factor.
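To make the averaging concrete, here is a minimal sketch of NL-means on a one-dimensional signal. It is a simplification under stated assumptions: patches are compared with uniform weights rather than the Gaussian kernel \(G_a\), borders are handled by reflection, and the function name and default parameters are hypothetical.

```python
import numpy as np

def nl_means_1d(u, half_patch=1, h=0.5):
    """Non-local means on a 1-D signal with uniform (not Gaussian) patch weights."""
    u = np.asarray(u, dtype=float)
    padded = np.pad(u, half_patch, mode="reflect")
    # Row i holds the patch centered at sample i.
    patches = np.lib.stride_tricks.sliding_window_view(padded, 2 * half_patch + 1)
    # Mean squared patch distance between every pair of samples.
    dist2 = ((patches[:, None, :] - patches[None, :, :]) ** 2).mean(axis=-1)
    weights = np.exp(-dist2 / h**2)                  # h is the filtering parameter
    weights /= weights.sum(axis=1, keepdims=True)    # division by C(x)
    return weights @ u
```

Each output sample is a convex combination of all input samples, with the largest weights given to samples whose surrounding patch resembles the patch around the sample being estimated.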
By replacing local operators in variational models with the non-local ones, the authors of [16] generalized the non-local means filter into a variational formulation and proposed a non-local functional with a total variation (TV) regularization:
where \(\lambda >0\) controls the trade-off between data fidelity and regularization, w is the weighting function, and f is the noisy input image or signal. The corresponding non-local TV flow is given by:
where
This non-local TV model can apply to denoising, sharpening, deblurring, and inpainting problems.
Motivated by the huge interest in the current literature, exploiting non-local methods, we introduce the following non-local p-Laplacian equation with homogeneous Neumann boundary conditions to estimate the document background:
where \(\Omega \subset {\mathbb {R}}^2\) is a bounded domain, \(\textstyle {J:{\mathbb {R}}^2\rightarrow {\mathbb {R}}}\) is a nonnegative continuous radial function with compact support and \(\textstyle {J(0)>0}\) and \(\textstyle {1< p < +\infty }\).
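For orientation, the Neumann problem for the non-local p-Laplacian is commonly written in the form studied in [28], shown here with the symbols J, \(\Omega\), and p defined above. The initial datum \(u_0\) and the exact layout are assumptions based on that standard formulation.

```latex
\begin{aligned}
u_t(x,t) &= \int_{\Omega} J(x-y)\,\lvert u(y,t)-u(x,t)\rvert^{p-2}\,\bigl(u(y,t)-u(x,t)\bigr)\,dy,
  && x\in\Omega,\ t>0,\\
u(x,0) &= u_0(x), && x\in\Omega.
\end{aligned}
```

In this formulation the integral is taken over \(\Omega\) only, so that no diffusion takes place across the boundary; this is the non-local counterpart of a homogeneous Neumann condition.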
In summary, the main benefit of introducing the non-local p-Laplacian operator is a richer range of diffusion behavior, controlled by the parameter p, which propagates information from a clean neighborhood to a degraded one. This generalizes the classical non-local TV and other non-local filters.
The differences between our proposed method and the existing PDE-based methods for binarization of degraded document images lie in two aspects. Firstly, unlike traditional methods that use differential operators, we present the first attempt to incorporate non-local operators for image binarization. Secondly, the existing methods are based on evolution PDEs with a source term exhibiting binarization properties. Such a source term contributes considerably to the binarization task, but in this work, we prefer to shed light on the robustness of non-local operators.
The basic concept of the proposed method is to recover the complex background of a document image so as to obtain a binarized image offering readable and clear text. The proposed approach is strongly inspired by the intuition that, for document images, the background is redundant and smooth, while the foreground contains text and sharp edges.
The non-local p-Laplacian equation (3) does not rely on the gradient to determine the diffusion direction; instead, the diffusion of the density at a point x and time t depends on the values of u in a neighborhood of x. This property avoids an exaggerated smoothing effect and explains the ability of the model to preserve the textures and details of the text.
Solutions to (3) will be understood in the following sense:
Definition 1
A solution to (3) u in [0, T] is a function
that satisfies
and
where \(L^1(\Omega )\) is the Lebesgue space of measurable functions and
In [28], the authors discussed many non-local evolution models with different boundary conditions and proved convergence, under rescaling, to local problems. We state their main result (Theorem 6.2, page 124) which guarantees the existence of a global and unique solution to the proposed non-local binarization problem:
Theorem 1
Suppose \(p>1\) and let \(u_0\in L^p(\Omega )\). Then, for any \(T>0\), there exists a unique solution to (3).
The mathematical properties and the convergence of solutions to the non-local p-Laplacian equation were established in [28] where the authors studied a non-local analog of the p-Laplacian evolution equation for \(1< p < \infty\) with Dirichlet or Neumann boundary conditions. One of the main tools for the proof (in [28, pp. 125–127]) is nonlinear semigroup theory.
Experiments and discussion
The principal steps of the proposed discrete algorithm to compute approximated solutions to the problem (3) are as follows:
Let h and \(\triangle t\) be space and time steps. Let \(u_{i}^n=u(i_1,i_2, n\triangle t )\) with \(n\ge 0\).
Equation (3) can be implemented via a simple explicit finite difference scheme
In all numerical experiments, we choose the following nonnegative real-valued kernel function:
where \(\parallel . \parallel\) denotes the Euclidean distance and \({\mathcal {N}}_i = \{j : \parallel i-j\parallel < d\}\) denotes the neighbors set of the pixel \(i=(i_1,i_2)\).
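A minimal sketch of one explicit time step is given below. It assumes the discrete operator is the sum over \({\mathcal {N}}_i\) of \(J(i-j)\,|u_j-u_i|^{p-2}(u_j-u_i)\) and uses a truncated Gaussian of the pixel distance as the kernel. Both the kernel and the function names are assumptions made for illustration, not the kernel used in the experiments.

```python
import numpy as np

def nonlocal_p_laplacian(b, p=2.0, d=2.0, h=2.0):
    """Discrete non-local p-Laplacian over neighbors with ||i - j|| < d."""
    rows, cols = b.shape
    out = np.zeros((rows, cols), dtype=float)
    r = int(np.ceil(d))
    for di in range(-r, r + 1):
        for dj in range(-r, r + 1):
            dist2 = di * di + dj * dj
            if dist2 == 0 or dist2 >= d * d:
                continue
            # Pixels i whose neighbor i + (di, dj) lies inside the image;
            # neighbors outside the domain are ignored (Neumann-type behavior).
            i0, i1 = max(0, -di), rows - max(0, di)
            j0, j1 = max(0, -dj), cols - max(0, dj)
            if i1 <= i0 or j1 <= j0:
                continue
            diff = b[i0 + di:i1 + di, j0 + dj:j1 + dj] - b[i0:i1, j0:j1]
            weight = np.exp(-dist2 / h**2)   # assumed kernel J(i - j)
            # |diff|^(p-2) * diff, written so that diff == 0 is safe for 1 < p < 2.
            out[i0:i1, j0:j1] += weight * np.sign(diff) * np.abs(diff) ** (p - 1.0)
    return out

def explicit_step(b, dt=0.3, p=2.0, d=2.0, h=2.0):
    """One explicit update B^{n+1} = B^n + dt * (non-local p-Laplacian of B^n)."""
    return b + dt * nonlocal_p_laplacian(b, p=p, d=d, h=h)
```

Because the assumed kernel is symmetric, the contributions of each pixel pair cancel, so the update preserves the mean intensity of the image. The stable range of `dt` depends on the kernel mass and on p, and is not analyzed in this sketch.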
| Algorithm for solving (3) |
|---|
| Input: The acquired image u, iteration number \(N=10\) and convergence parameter \(\epsilon\). |
| Step 1: Initialize \(B^{0}=\log (u+1)\), choose \(dt>0\), \(h>0\) and \(p>1\), and set \(n=0\). |
| Step 2: Normalize \(B^{0}\) into [0, 1]. |
| Step 3: \(B^{n+1}_i=B^{n}_i+dt\,\triangle _{NL}^p (B^{n}_i)\) for \(n= 1, \ldots ,N\); if \(\parallel B^{n+1}-B^{n}\parallel \le \epsilon\) or \(n> N\), go to Step 4. |
| Step 4: Compute \(U=\exp (B^{0}-B)\). |
| Output: The binarized image \(U_B\). |
The proposed model does not include any preprocessing step to enhance the quality of the degraded image. The color-to-grayscale conversion of the input image in Step 2 uses the simplest and most widely used approach: the mean of the RGB channels. The binary output image \(U_B\) is obtained using the following projection:
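The grayscale conversion and a thresholding projection of this kind can be sketched as follows. The threshold `tau`, its default value, and the function names are hypothetical and serve only to illustrate the two operations.

```python
import numpy as np

def to_gray(rgb):
    """Grayscale conversion as the plain mean of the R, G, and B channels."""
    return rgb.astype(float).mean(axis=2)

def project_to_binary(u, tau=0.9):
    """Map U to {0, 1}: 1 where U >= tau (background), 0 elsewhere (text).

    tau is a hypothetical threshold chosen for illustration.
    """
    return (np.asarray(u) >= tau).astype(np.uint8)
```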
Image binarization outcomes
We now summarize the tests performed and the results obtained to demonstrate the efficiency of our binarization approach.
Throughout the experimental section, we set \(dt = 0.3\), \(h = 80\), a patch size of \(15 \times 15,\) while the exponent parameter \(p\in ]1, 11]\) is adjusted for each image.
Indeed, we obtained satisfactory results for \(60 \le h \le 110\). If the value of h is too small, the weight function becomes less important and consequently the background is not well estimated; if h is too large, the estimated text becomes darker and wider than desired. Concerning the choice of the parameter p, we tried to use a bilevel optimization framework to determine this parameter automatically, but we encountered a problem with the regularity with respect to p as well as major flaws in the convergence results. As an alternative, we tried some learning techniques based on a CNN configuration, which will be the subject of future work.
We experimentally evaluate our proposed model using the DIBCO testing dataset. DIBCO 2009, DIBCO 2011, and DIBCO 2013 contain both handwritten and machine-printed images, while DIBCO 2010, DIBCO 2012, DIBCO 2014, and DIBCO 2016 include only handwritten images. This dataset consists of 86 document images containing diverse representative degradations that commonly appear, such as non-uniform illumination, stains, bleed-through, noise, and smudges. In Figs. 1, 2, 3, 4, 5, 6 and 7, we show some examples of degraded images and our binarized results. The binarization performance on machine-printed images and handwritten images is comparable even in the presence of pale characters and bleed-through background.
It is worth mentioning that traditional binarization methods fail to handle images with large ink stains. In Fig. 8, we test three representative degraded images from DIBCO 2013 (“PR04.bmp,” “PR05.bmp,” and “PR08.bmp”). A visual analysis of obtained results shows that the proposed method can detect and remove all smudges.
The results presented in this subsection support the efficiency of our PDE for binarizing ancient and severely degraded document images.
Comparison with State of the Art
We compare the binarization visual quality of the proposed method with eleven widely used state-of-the-art image binarization approaches. For evaluation, the considered comparison criteria include a set of measures that are suitable for evaluation in the context of document analysis. These evaluation measurements are:
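(1) F-Measure (FM):
$${\text {FM}} = \frac{2 \times {\text {Recall}} \times {\text {Precision}}}{{\text {Recall}} + {\text {Precision}}},$$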
where \({\text{Recall}} = \frac{{\text {TP}}}{{\mathrm {TP}}+{\text {FN}}}\), \({\text {Precision}} = \frac{{\text {TP}}}{{\mathrm {TP}}+{\text {FP}}}\) , and TP, FP, FN are true positive, false positive, and false negative values, respectively.
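The F-measure can be computed from two binary images as in the following sketch. It assumes that text pixels are encoded as 0 and background pixels as 1 in both the output and the ground truth; this encoding and the function name are assumptions made for the example.

```python
import numpy as np

def f_measure(pred, truth):
    """F-measure of a binary image against ground truth (text pixels are 0)."""
    pred_text = np.asarray(pred) == 0
    truth_text = np.asarray(truth) == 0
    tp = np.sum(pred_text & truth_text)     # text correctly detected
    fp = np.sum(pred_text & ~truth_text)    # background marked as text
    fn = np.sum(~pred_text & truth_text)    # text that was missed
    if tp == 0:
        return 0.0
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return float(2 * recall * precision / (recall + precision))
```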
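(2) Pseudo-F-Measure (Fps):
$${\text {Fps}} = \frac{2 \times {\text {psRecall}} \times {\text {psPrecision}}}{{\text {psRecall}} + {\text {psPrecision}}},$$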
where psRecall and psPrecision are the pseudo-recall and the pseudo-precision [29].
(3) Peak Signal-to-Noise Ratio (PSNR):
where \(I_{BN}\) is the binary image output and \(I_{GT}\) is the hand-annotated ground truth binary result.
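A common way to evaluate PSNR for binary images is \(10\log _{10}(C^2/{\text {MSE}})\), where MSE is the mean squared difference between \(I_{BN}\) and \(I_{GT}\) and C is the gap between foreground and background values. The sketch below follows that convention with \(C=1\) for images in \(\{0,1\}\); the convention, the default value of `c`, and the function name are assumptions.

```python
import math
import numpy as np

def psnr(binary, truth, c=1.0):
    """PSNR in dB between a binary output and its ground truth."""
    diff = np.asarray(binary, dtype=float) - np.asarray(truth, dtype=float)
    mse = float(np.mean(diff ** 2))
    if mse == 0.0:
        return math.inf   # identical images
    return 10.0 * math.log10(c * c / mse)
```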
(4) Distance Reciprocal Distortion (DRD):
$${\text {DRD}} = \frac{\sum _{k} {\text {DRD}}_k}{{\text {NUBN}}},$$
where \({\text {DRD}}_k\) is the reciprocal distortion distance of the kth modified pixel and NUBN is defined as the number of non-uniform blocks of \(8 \times 8\) sizes in the ground truth image [30]. Good results minimize the DRD and maximize the first three metrics.
Figure 9 illustrates how Otsu, Niblack, and Bernsen’s methods cannot deal with the dark background, while the other methods ignore some faint characters. Figure 10 illustrates the efficacy of our method on an image with contrast variation.
In Fig. 11, we select a handwritten image with bleed-through from DIBCO 2013. The first four results in Fig. 11 show some problems with the extreme bleed-through due to the edge confusion between background and foreground text, while our binarization result and the result of [14] better identify the text and background areas.
The last example is a handwritten document image with uneven illumination, taken from DIBCO 2012. Once more, our method is robust to this type of degradation and continues to perform well in the binarization process.
According to Figs. 9, 10, 11 and 12, it can clearly be seen that the global threshold method cannot effectively handle documents with degradations such as bleed-through, uneven background, and low contrast. Local threshold methods can remedy the limitations of the global method but remain computationally expensive because they need to compute several thresholds for the same image. In contrast, the proposed non-local method adapts flexibly to degradations related to image contrast variation, non-uniform illumination, or ink bleed-through. Moreover, the simple explicit finite difference scheme for solving our non-local evolution equation numerically allows for a very fast binarization process compared with many traditional PDE-based methods; we obtain the desired binarization after only one or two iterations.
Tables 1, 2 and 3 report the performance of the selected methods on the DIBCO testing dataset, where bold font marks the best values. All considered comparison criteria indicate that the proposed method outperforms the others.
Conclusions
The proposed binarization method is a nonlinear p-Laplacian diffusion process that can elegantly model the non-local nature of document image degradations. Fundamentally, background variations are estimated from the input image to obtain a binary outcome. Extensive experiments indicate that our method is highly effective.
As future work, we want to study how to automatically adjust the exponent “p.” For instance, it would be interesting to select this parameter adaptively instead of making a global choice, in order to allow a careful treatment of smudges, fading characters, or any non-uniform degradation.
Availability of data and materials
We have shown the accuracy of our proposed approach on seven public datasets from the popular DIBCO competition.
References
R. Keefer, N. Bourbakis, A survey on document image processing methods useful for assistive technology for the blind. Int. J. Image Graph. 15(01), 1550005 (2015)
A. Shrivastava, D.K. Srivastava, A review on pixel-based binarization of gray images, in Proceedings of the International Congress on Information and Communication Technology, pp. 357–364 (2016). Springer
N. Otsu, A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9(1), 62–66 (1979)
J. Bernsen, Dynamic thresholding of gray-level images, in Proceedings of Eighth International Conference on Pattern Recognition, Paris (1986)
W. Niblack et al., An Introduction to Digital Image Processing, vol. 34 (Prentice-Hall, Englewood Cliffs, 1986)
J. Sauvola, M. Pietikäinen, Adaptive document image binarization. Pattern Recogn. 33(2), 225–236 (2000)
B. Gatos, I. Pratikakis, S.J. Perantonis, Adaptive degraded document image binarization. Pattern Recogn. 39(3), 317–327 (2006)
B.M. Singh, R. Sharma, D. Ghosh, A. Mittal, Adaptive binarization of severely degraded and non-uniformly illuminated documents. Int. J. Doc. Anal. Recognit. (IJDAR) 17(4), 393–412 (2014)
D. Lu, X. Huang, L. Sui, Binarization of degraded document images based on contrast enhancement. Int. J. Doc. Anal. Recognit. (IJDAR) 21(1–2), 123–135 (2018)
Y. Wang, C. He, Binarization method based on evolution equation for document images produced by cameras. J. Electron. Imaging 21(2), 023030 (2012)
B. Jacobs, E. Momoniat, A novel approach to text binarization via a diffusion-based model. Appl. Math. Comput. 225, 446–460 (2013)
B. Jacobs, E. Momoniat, A locally adaptive, diffusion based text binarization technique. Appl. Math. Comput. 269, 464–472 (2015)
M.R. Yagoubi, A. Serir, A. Beghdadi, A collaborative enhancement-compression approach for historical document images based on PDE-analysis. Digital Signal Process. 67, 61–75 (2017)
J. Guo, C. He, X. Zhang, Nonlinear edge-preserving diffusion with adaptive source for document images binarization. Appl. Math. Comput. 351, 8–22 (2019)
D. Rivest-Hénault, R.F. Moghaddam, M. Cheriet, A local linear level set method for the binarization of degraded historical document images. Int. J. Doc. Anal. Recognit. (IJDAR) 15(2), 101–124 (2012)
G. Gilboa, S. Osher, Nonlocal operators with applications to image processing. Multiscale Model. Simul. 7(3), 1005–1028 (2008)
S. Kindermann, S. Osher, P.W. Jones, Deblurring and denoising of images by nonlocal functionals. Multiscale Model. Simul. 4(4), 1091–1115 (2005)
G. Gilboa, S. Osher, Nonlocal linear image regularization and supervised segmentation. Multiscale Model. Simul. 6(2), 595–630 (2007)
A. Buades, B. Coll, J.-M. Morel, Image denoising methods. A new nonlocal principle. SIAM Rev. 52(1), 113–147 (2010)
G. Alberti, G. Bellettini, A nonlocal anisotropic model for phase transitions. Math. Ann. 310(3), 527–560 (1998)
P.W. Bates, A. Chmaj, An integrodifferential model for phase transitions: stationary solutions in higher space dimensions. J. Stat. Phys. 95(5–6), 1119–1139 (1999)
L. Rosasco, M. Belkin, E.D. Vito, On learning with integral operators. J. Mach. Learn. Res. 11(Feb), 905–934 (2010)
Q. Guan, M. Gunzburger, Analysis and approximation of a nonlocal obstacle problem. J. Comput. Appl. Math. 313, 102–118 (2017)
Q. Du, M. Gunzburger, R.B. Lehoucq, K. Zhou, Analysis and approximation of nonlocal diffusion problems with volume constraints. SIAM Rev. 54(4), 667–696 (2012)
E.H. Land, J.J. McCann, Lightness and retinex theory. J. Opt. Soc. Am. 61(1), 1–11 (1971)
E.H. Land, The retinex theory of color vision. Sci. Am. 237(6), 108–129 (1977)
A. Buades, B. Coll, J.-M. Morel, A review of image denoising algorithms, with a new one. Multiscale Model. Simul. 4(2), 490–530 (2005)
F. Andreu, J. Mazón, J. Rossi, J. Toledo, Nonlocal Diffusion Problems (2010)
K. Ntirogiannis, B. Gatos, I. Pratikakis, Performance evaluation methodology for historical document image binarization. IEEE Trans. Image Process. 22(2), 595–609 (2012)
H. Lu, A.C. Kot, Y.Q. Shi, Distance-reciprocal distortion measure for binary document images. IEEE Signal Process. Lett. 11(2), 228–231 (2004)
Acknowledgements
Not applicable.
Funding
Not applicable.
Author information
Contributions
All authors contributed equally to the production of this paper. All authors read and approved the final manuscript.
Fatim Zahra Ait Bella
received the Master’s degree in Mathematics and Computer Science Applied to Engineering Sciences and the PhD degree from Cadi Ayyad University, Morocco, in 2017 and 2019, respectively. Her current research interests are in image restoration. She is a member of the applied mathematics and computer science Laboratory at the Faculty of Sciences and Technics, Cadi Ayyad University, Morocco.
Mohammed El Rhabi
is a full Professor of Applied Mathematics and the head of the “Applied Mathematics-Computer and Data Sciences Unit” (MID@S unit) at Ecole Centrale Casablanca. He received the PhD degree from the Pierre and Marie University (Paris 6, Sorbonne University), France, in 2002. His research interests include applied mathematics and data processing.
Abdelilah Hakim
received his PhD degree from the Paris XI Orsay University in 1989. He is currently a Full Professor and is the Head of the Applied Mathematics and Computer Science Laboratory at the Faculty of Sciences and Technics, Cadi Ayyad University, Morocco. His major research interests include functional analysis, computer vision, and machine intelligence.
Amine Laghrib
is an Associate Professor of Mathematics at the Sultan Moulay Slimane University. He received the PhD degree from Cadi Ayyad University, Morocco, in 2015. His research is mainly focused on inverse problem, image denoising, and image super-resolution.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Ait Bella, F.Z., El Rhabi, M., Hakim, A. et al. An innovative document image binarization approach driven by the non-local p-Laplacian. EURASIP J. Adv. Signal Process. 2022, 50 (2022). https://doi.org/10.1186/s13634-022-00883-2
DOI: https://doi.org/10.1186/s13634-022-00883-2
Keywords
- Binarization
- Document image
- Nonlinear diffusion
- Non-local p-Laplacian