Text localization using standard deviation analysis of structure elements and support vector machines
 Konstantinos Zagoris^{1},
 Savvas A Chatzichristofis^{1} and
 Nikos Papamarkos^{1}
https://doi.org/10.1186/1687-6180-2011-47
© Zagoris et al; licensee Springer. 2011
Received: 28 January 2011
Accepted: 25 August 2011
Published: 25 August 2011
Abstract
A text localization technique is required to successfully exploit document images such as technical articles and letters. The proposed method detects and extracts text areas from document images. Initially, a connected components analysis technique detects blocks of foreground objects. Then, a descriptor that consists of a set of suitable document structure elements is extracted from the blocks. This is achieved by incorporating an algorithm called Standard Deviation Analysis of Structure Elements (SDASE), which maximizes the separability between the blocks. Another feature of the SDASE is that its length adapts according to the requirements of the application. Finally, the descriptor of each block is used as input to a trained support vector machine that classifies the block as text or not. The proposed technique is also capable of adjusting to the text structure of the documents. Experimental results on benchmarking databases demonstrate the effectiveness of the proposed method.
1 Introduction
The present electronic age produces vast quantities of digital document images such as technical articles, business letters and faxes. For systems such as optical character recognition, word spotting [1, 2] and document retrieval to exploit them effectively, the text they contain must first be located by a detection technique. The research community has addressed this problem with a variety of approaches. Top-down techniques employ recursive algorithms to segment the whole page into small regions. The subdivision is based on a homogeneity criterion: the splitting procedure stops when the criterion is met, and the blocks obtained at this stage constitute the final segmentation result [3]. The advantage of these methods is their high detection speed, as they contain no time-consuming operations, but they cannot handle documents with very complex layouts well. Some examples of top-down algorithms are reported in [4–7].
Bottom-up techniques first identify primary elements (e.g., characters) and afterwards merge them into larger regions (text blocks). The procedure can be iterated, giving rise to a growing process which joins unconnected adjacent components into higher-order components (such as words, lines and document zones). Strouthopoulos et al. [8] proposed such a technique to automatically detect and extract text in mixed-type color documents using a combination of an adaptive color reduction technique and a page layout analysis approach. Jain et al. [9] presented a geometric layout analysis of technical journal pages using connected component extraction to efficiently implement page segmentation and region identification. Jiang et al. used a spatial color-quantized map, an edge map calculated by Sobel operators and morphology operators in order to merge bounding boxes and obtain candidate text regions. In [10], a bottom-up technique first identifies marks using a suitable contour-following technique. A principal component analyzer is employed afterward to determine the principal axes of each mark, and a nearest-neighbor technique is used for finding the shortest distances between marks. A feature vector is formed based on mark dimensions and the distances between them, which is then fed into a self-organizing feature map (SOFM) to divide the marks into homogeneous clusters. A set of fuzzy rules is formed using all cluster weights and variances. Finally, a fuzzy classification scheme identifies each mark as a character or a non-character. Recently, Aghajari et al. [11] proposed an approach to automatically localize horizontal text appearing in color and complex images. First, an edge-detection method using a wavelet transform is used to find text in an image. Afterward, the image is binarized, and a filter is applied for removing dispersed pixels and non-text areas. Finally, a new projection profile is applied for estimating text regions.
In [12, 13], the respective authors treated text detection as a classification problem. Li et al. [12] used support vector machines (SVM) to obtain text regions based on features extracted by stroke filter calculation on stroke maps. Chen et al. [13] compared the SVM-based method with multilayer perceptrons (MLP) on text verification over four independent features, namely, the distance map feature, the grayscale spatial derivative feature, the constant gradient variance feature and the DCT coefficient feature. They found that SVM gave better detection results than MLP. Bottom-up techniques can correctly segment complex layouts but take considerably more time to complete than top-down methods.
Hybrid algorithms can be regarded as a mix of the previous approaches, configuring a procedure which involves both splitting and merging phases. In [14], the authors proposed the adaptation of the Scale Invariant Feature Transform (SIFT) [15] approach in the context of text character localization in graphical documents. This method uses a combination of bottom-up and top-down approaches to separate and locate text characters: knowledge is extracted by a bottom-up approach and then used in a top-down approach. Other hybrid algorithms are reported in [16–18].
Along with research on text localization in still images, several algorithms have been proposed for text localization in videos. Video images often have complex backgrounds with strong edge or texture clutter, and it is very difficult to detect the graphic or scene text with high accuracy [19]. In [20], the authors proposed a new localization and recognition method for scoreboard text in sport videos. The method first matches the SIFT points between two frames extracted from a video clip using a modified matching technique, and then localizes the scoreboard by computing a robust estimate of the matched point cloud in a two-stage non-scoreboard filter process based on some domain rules. Some other methods for text localization in videos are reported in [19, 21, 22].
This article proposes a new bottom-up method which detects and extracts homogeneous text in document images, regardless of font type and size, by using connected components analysis for the object detection, document structure elements (DSE) to construct a descriptor and SVM to tag the appropriate objects as text. The proposed technique can adapt to the peculiarities of each document image database since the features are adjustable. It also allows the text localization speed to be increased or decreased by changing the block descriptor length. A preliminary version of this work has been presented in [23].
Next, a descriptor that consists of a set of structural features (determined by a procedure called standard deviation analysis of structure elements, SDASE) is extracted from the merged blocks and used as input to a trained SVM. Finally, the output of the SVM defines the block as text or not.
The rest of the article is organized as follows: Section 2 describes the block detection method, while Section 3 explains the creation of the block descriptor using a novel algorithm called SDASE. Section 4 presents the SVM and the algorithm used to train it. Section 5 contains the evaluation and the experimental results of the text-extraction technique, and finally, the conclusions are drawn in Section 6.
2 Block detection using connected components labeling and filtering
The primary aim of the block detection method is to detect and extract all the objects of a document. This is accomplished using the connected components labeling and filtering technique.

Step 1: Very large and very small CCs, if any, are discarded to speed up the feature extraction process. A CC is rejected when it satisfies one of the following conditions:
$\mathrm{CC}_{h}>\frac{D_{h}}{4}\quad\text{or}\quad\mathrm{CC}_{h}\le 2$ (1)
where CC_{h} is the height of the CC and D_{h} is the document height.

Step 2: Create a CC height histogram as Figure 3a depicts.

Step 3: Apply a mean 3 × 1 filter to smooth the histogram (Figure 3b).

Step 4: Find the peaks H(p) of the histogram.

Step 5: Compute the average of the peak values:
$A=\frac{\sum H(p)}{N_{p}}$ (2)
where N_{p} is the total number of peaks.

Step 6: Define the character height CC_{ch} as the largest height at which a peak exceeds the average A. For example, in Figure 3b, CC_{ch} is equal to 27:
$\mathrm{CC}_{ch}=\max\{p\},\quad\forall p\in\{H(p)>A\}$ (3)

Step 7: Expand the left and right sides of the blocks by $\frac{\mathrm{cc}_{h}}{2}$, as Figure 2d illustrates.

Step 8: In [26], it has been shown that the height of a word can reach twice the mean character size owing to the presence of ascenders and descenders. Hence, in the worst-case scenario, where CC_{ch} corresponds to the height of a character without ascenders or descenders, it is safe to merge the overlapping CCs that satisfy the following conditions in order to model the lines of text (Figure 3e):
$\mathrm{CC}_{h1}\ge\frac{\mathrm{CC}_{h2}}{5}\quad\text{and}\quad\mathrm{CC}_{h1}\le 5\times\mathrm{CC}_{h2}$ (4)
where CC_{h1} and CC_{h2} are the heights of the two overlapping components.
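Steps 1 to 6 and the merge test of Equation 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the input is assumed to be a non-empty list of integer CC heights, and a peak is taken to be a local maximum of the smoothed histogram (the left edge of a plateau), since the text does not fix the peak definition. The fallback used when no peak exceeds the average is also an assumption.

```python
import numpy as np

def character_height(cc_heights, doc_height):
    """Estimate CC_ch from connected-component heights (Steps 1-6)."""
    h = np.asarray(cc_heights, dtype=int)
    # Step 1: reject CCs with CC_h > D_h / 4 or CC_h <= 2 (Equation 1).
    h = h[(h <= doc_height / 4) & (h > 2)]
    # Step 2: height histogram, one bin per integer height.
    hist = np.bincount(h).astype(float)
    # Step 3: 3 x 1 mean filter.
    smooth = np.convolve(hist, np.ones(3) / 3, mode="same")
    # Step 4: peaks, assumed here to be local maxima of the smoothed histogram.
    padded = np.concatenate(([0.0], smooth, [0.0]))
    is_peak = (padded[1:-1] > padded[:-2]) & (padded[1:-1] >= padded[2:])
    peaks = np.flatnonzero(is_peak)
    # Step 5: average of the peak values (Equation 2).
    average = smooth[peaks].mean()
    # Step 6: largest height whose peak exceeds the average (Equation 3).
    strong = peaks[smooth[peaks] > average]
    if strong.size == 0:  # assumption: fall back to all peaks
        strong = peaks
    return int(strong.max())

def heights_compatible(h1, h2):
    """Height condition of Equation 4 for two overlapping CCs."""
    return h2 / 5 <= h1 <= 5 * h2
```

For example, heights clustered around 10 and 27 pixels, plus a few outliers, yield 27 as the character height, in line with the example given for Figure 3b.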
3 Block description using SDASE
The next step involves the feature extraction stage of the blocks. The extracted features construct a descriptor of each block that maximizes the separability between the blocks. The spatial features are constructed by the number of the suitable DSE contained in each block.
for n = 1, 2, ..., (C − 2)(K − 2), where L_{j}, L_{v} ∈ [1, 510]. Note that DSEs 0 and 511 are removed because they correspond to pure background and pure document objects, respectively.
where X(L) is a vector of 510 elements.
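As a rough sketch of how such a 510-element vector could be obtained, the code below slides a 3 × 3 window over a binary block, maps each window to an integer code in [0, 511] and counts the codes. Several details are assumptions and not taken from the text: the function name `dse_histogram` is hypothetical, the bit order inside the window is row-major, foreground pixels are non-zero, and the vector holds raw counts (any normalization applied by the defining equation is not reproduced).

```python
import numpy as np

def dse_histogram(block):
    """Count the 3x3 binary patterns (DSEs) in a block, dropping codes 0 and 511."""
    b = (np.asarray(block) > 0).astype(np.int64)  # 1 = foreground (assumed)
    rows, cols = b.shape                          # block must be at least 3 x 3
    # Assumed bit weights: row-major order inside the 3x3 window.
    weights = 2 ** np.arange(9).reshape(3, 3)
    codes = np.zeros((rows - 2, cols - 2), dtype=np.int64)
    for dy in range(3):
        for dx in range(3):
            codes += weights[dy, dx] * b[dy:dy + rows - 2, dx:dx + cols - 2]
    # There are (rows - 2) * (cols - 2) windows in total.
    counts = np.bincount(codes.ravel(), minlength=512)
    # Codes 0 (pure background) and 511 (pure foreground) are removed.
    return counts[1:511]
```

A 3 × 3 block with a single foreground pixel in its centre contains exactly one window, whose code under the assumed bit order is 16.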
Next, a feature-reduction algorithm, which we call SDASE, is applied to reduce the number of features.
1. For each DSE L_{n}, find the standard deviation (SD) SDXT(L_{n}) of X(L_{n}) over the T blocks.
2. Repeat the same for the P blocks: for each DSE L_{n}, find the SD SDXP(L_{n}) of X(L_{n}).
3. Normalize SDXT(L_{n}) and SDXP(L_{n}):
$SDXT'(L_{n})=\frac{SDXT(L_{n})}{510}$ (8)
$SDXP'(L_{n})=\frac{SDXP(L_{n})}{510}$ (9)
4. Then define the vector O(L_{n}) as
$O(L_{n})=\left|SDXT'(L_{n})-SDXP'(L_{n})\right|$ (10)
5. The first element/bin of the block descriptor corresponds to the L_{n} DSE that has the maximum value of O(L_{n}). The second element/bin corresponds to the L_{n} DSE that has the second largest value of O(L_{n}), and so on.
The aim of the SDASE is to find those DSEs that have maximum SD for the text blocks and minimum SD for the non-text blocks, or the opposite. Hence, it ranks the DSEs by their ability to discriminate text blocks from non-text blocks. The length of the descriptor can also be reduced from the 510 initial DSEs to any number. We propose a descriptor length of around 128, as the evaluation suggests.
Note that the descriptor can adapt to the demands of each set of document images. Also, if computational power is limited, the descriptor length can be decreased.
Section 5 presents experiments evaluating the effect of the descriptor length on both the speed and the success rate of the proposed method. A training dataset is required to determine the optimal DSEs. This does not cause a problem, because such a dataset is already required for the training of the SVMs.
Therefore, the final block descriptor is a 128-element vector (or a vector of any other chosen length) that holds the values X(L_{n}) (Equation 7) of the 128 selected L_{n} DSEs for the block. The SVM is trained using this descriptor as input.
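The ranking performed by SDASE can be illustrated with the short sketch below. It assumes two matrices whose rows are the X(L) vectors of the training blocks of each class, and it reads Equation 10 as the absolute difference of the two normalized standard deviations, which is an interpretation consistent with the stated aim. The function names `sdase_order` and `describe` are hypothetical.

```python
import numpy as np

def sdase_order(x_text, x_nontext):
    """Return DSE indices sorted by decreasing O(L_n)."""
    # Steps 1-3: per-DSE standard deviation in each class, divided by 510
    # (Equations 8 and 9).
    sd_t = np.std(np.asarray(x_text, dtype=float), axis=0) / 510.0
    sd_p = np.std(np.asarray(x_nontext, dtype=float), axis=0) / 510.0
    # Step 4: O(L_n), read here as an absolute difference (assumption).
    o = np.abs(sd_t - sd_p)
    # Step 5: largest O first; a stable sort keeps ties in index order.
    return np.argsort(-o, kind="stable")

def describe(x, order, length=128):
    """Build the block descriptor from the `length` best-ranked DSEs."""
    return np.asarray(x)[order[:length]]
```

The ordering is computed once on the training set; each block descriptor is then obtained by plain indexing, so shortening the descriptor only changes `length`.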
4 Block classification by SVM
The SVMs, introduced in 1992 [27, 28], are based on statistical learning theory and have been applied to a wide variety of classification problems.
The most common kernels
| Kernel | Function |
| --- | --- |
| Polynomial | $\left(x^{T}\cdot x'+1\right)^{p}$ |
| Radial basis function (Gaussian) | $\exp\left\{-\gamma\lVert x-x'\rVert^{2}\right\}$ |
| Sigmoid | $\tanh\left(k\,x^{T}\cdot x'-\delta\right)$ |
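For illustration, the three kernels can be written directly with NumPy. The sketch assumes the standard Gaussian form with a squared Euclidean norm for the radial basis function; the function names and default parameter values are arbitrary choices made for this example.

```python
import numpy as np

def polynomial_kernel(x, y, p=2):
    """(x^T . y + 1)^p"""
    return (np.dot(x, y) + 1.0) ** p

def rbf_kernel(x, y, gamma=1.0):
    """exp(-gamma * ||x - y||^2), squared norm assumed."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-gamma * np.dot(d, d))

def sigmoid_kernel(x, y, k=1.0, delta=0.0):
    """tanh(k * x^T . y - delta)"""
    return np.tanh(k * np.dot(x, y) - delta)
```

Only the radial basis function kernel is used in the rest of this section, so γ is the only kernel parameter that has to be tuned.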
The constant C > 0 defines the trade-off between the training error and the margin. The training data x_{i} for which a_{i} > 0 are called support vectors.
So, if f(x) > 0, then the data x are classified to class 1; otherwise, they are classified to class 0.
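The decision rule can be sketched as follows, assuming the standard kernel SVM decision function f(x) = Σ a_i y_i K(x_i, x) + b with labels y_i ∈ {−1, +1} and a radial basis function kernel. The function name `classify`, the bias symbol `b` and the label encoding are assumptions made for this illustration.

```python
import numpy as np

def classify(x, support_vectors, alphas, labels, b, gamma):
    """Return 1 if f(x) > 0, otherwise 0."""
    sv = np.asarray(support_vectors, dtype=float)
    diffs = sv - np.asarray(x, dtype=float)
    # RBF kernel between x and every support vector.
    k = np.exp(-gamma * np.sum(diffs ** 2, axis=1))
    # Assumed standard form: f(x) = sum_i a_i * y_i * K(x_i, x) + b.
    f = np.dot(np.asarray(alphas) * np.asarray(labels), k) + b
    return 1 if f > 0 else 0
```

Only the support vectors enter the sum, which is why the classification cost grows with their number and with the descriptor length.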
4.1 Parameter selection
One of the difficulties of the SVMs is the tuning of their parameters. In our case, there are two parameters: the C from the maximum margin classifier and the γ from the radial basis function kernel. The aim is to find the values of C and γ with which the classifier most accurately predicts unknown data. Very often, this is achieved through a cross-validation procedure using a grid search over the two parameters. In this study, the parameter estimation algorithm (PEA) [29], originally used to detect the parameters of binarization methods, is employed to detect the best SVM parameters. The stages of the algorithm are as follows:

Stage 1: Set the initial range of the SVM parameter values. Consider the range [c_{ s }, c_{ e } ] for C and the range [γ_{ s }, γ_{ e } ] for γ. In this study, c_{ s } = 0, c_{ e } = 300, γ_{ s } = 0, and γ_{ e } = 40.

Stage 2: Set the number of steps executed in each iteration for each parameter. In this study, s_{ c } = 10 (C parameter) and s_{ γ } = 10 (γ parameter).

Stage 3: Calculate the lengths of each step according to the following equations:
$L_{c}=\frac{c_{e}-c_{s}}{s_{c}-1}$ (15)
$L_{\gamma}=\frac{\gamma_{e}-\gamma_{s}}{s_{\gamma}-1}$ (16)

Stage 4: Calculate all the values of parameters C and γ for each step according to the following equations:
$C(k)=c_{s}+k\cdot L_{c},\quad\forall k\in[0,\ s_{c}-1]$ (17)
$\gamma(k)=\gamma_{s}+k\cdot L_{\gamma},\quad\forall k\in[0,\ s_{\gamma}-1]$ (18)

Stage 5: Find the two pairs of parameter values that give the best and second-best results under cross-validation. Let (C_{1}, γ_{1}) and (C_{2}, γ_{2}) be those two pairs, respectively.

Stage 6: Redefine the ranges for the two parameters used during the next iteration according to the following equations:
$\left[c_{s}',\ c_{e}'\right]=\begin{cases}\left[C_{1},\ C_{2}\right] & \text{if } C_{1}<C_{2}\\ \left[\frac{c_{s}+C_{1}}{2},\ \frac{c_{e}+C_{2}}{2}\right] & \text{if } C_{1}=C_{2}\\ \left[C_{2},\ C_{1}\right] & \text{if } C_{1}>C_{2}\end{cases}$
$\left[\gamma_{s}',\ \gamma_{e}'\right]=\begin{cases}\left[\gamma_{1},\ \gamma_{2}\right] & \text{if } \gamma_{1}<\gamma_{2}\\ \left[\frac{\gamma_{s}+\gamma_{1}}{2},\ \frac{\gamma_{e}+\gamma_{2}}{2}\right] & \text{if } \gamma_{1}=\gamma_{2}\\ \left[\gamma_{2},\ \gamma_{1}\right] & \text{if } \gamma_{1}>\gamma_{2}\end{cases}$

Stage 7: Redefine the steps for the new ranges used in the next iteration according to the following equations:
$s_{c}'=\begin{cases}s_{c}-1 & \text{if } c_{e}-c_{s}\le s_{c}\ \text{and}\ s_{c}\ge 5\\ s_{c} & \text{otherwise}\end{cases}$ (19)
$s_{\gamma}'=\begin{cases}s_{\gamma}-1 & \text{if } \gamma_{e}-\gamma_{s}\le s_{\gamma}\ \text{and}\ s_{\gamma}\ge 5\\ s_{\gamma} & \text{otherwise}\end{cases}$ (20)

Stage 8: If $s_{c}'\ge 5$ or $s_{\gamma}'\ge 5$, go to Stage 3 and repeat all the stages with the new ranges and steps. If $s_{c}'<5$ and $s_{\gamma}'<5$, terminate the procedure; the best parameter values are those calculated at Stage 6 of the last iteration.
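The eight stages can be sketched as a short refinement loop. This is an illustration under assumptions, not the implementation used in the study: `score` is a hypothetical callable standing in for the cross-validation result of a (C, γ) pair, the range tested in Stage 7 is taken to be the one just searched, the loop stops once both step counts fall below 5, the function returns the best pair of the last iteration, and `max_iter` is a safety cap that is not part of the stages. A real scoring function would also have to handle the initial values C = 0 and γ = 0, which SVM libraries typically reject.

```python
from itertools import product

def refine(lo, hi, v1, v2):
    """Stage 6: new range from the best (v1) and second-best (v2) values."""
    if v1 < v2:
        return v1, v2
    if v1 > v2:
        return v2, v1
    return (lo + v1) / 2, (hi + v2) / 2

def pea(score, c_range=(0.0, 300.0), g_range=(0.0, 40.0),
        s_c=10, s_g=10, max_iter=50):
    """Iterative grid refinement over (C, gamma); returns the best pair found."""
    (c_s, c_e), (g_s, g_e) = c_range, g_range       # Stages 1 and 2
    best = None
    for _ in range(max_iter):                       # cap added for safety
        # Stage 3: step lengths (Equations 15 and 16).
        l_c = (c_e - c_s) / (s_c - 1)
        l_g = (g_e - g_s) / (s_g - 1)
        # Stage 4: candidate values (Equations 17 and 18).
        cs = [c_s + k * l_c for k in range(s_c)]
        gs = [g_s + k * l_g for k in range(s_g)]
        # Stage 5: best and second-best pairs according to `score`.
        ranked = sorted(product(cs, gs), key=lambda p: score(*p), reverse=True)
        (c1, g1), (c2, g2) = ranked[0], ranked[1]
        best = (c1, g1)
        width_c, width_g = c_e - c_s, g_e - g_s     # range just searched
        # Stage 6: shrink the ranges.
        c_s, c_e = refine(c_s, c_e, c1, c2)
        g_s, g_e = refine(g_s, g_e, g1, g2)
        # Stage 7: reduce the step counts (Equations 19 and 20).
        if width_c <= s_c and s_c >= 5:
            s_c -= 1
        if width_g <= s_g and s_g >= 5:
            s_g -= 1
        # Stage 8: stop once both step counts are below 5.
        if s_c < 5 and s_g < 5:
            break
    return best
```

Each iteration evaluates at most s_c × s_γ pairs on a progressively narrower range, which is the property the comparison with a full grid search in Section 5 relies on.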
The training parameters of the SVM from the PEA
| Database | C | γ |
| --- | --- | --- |
| MediaTeam document database II | 244.444 | 86.5 |
| MediaTeam document database II with artificial noise | 284.211 | 92.21 |
5 Evaluation
The SVM parameter values, the cross-validation results and the calculation time for each procedure
| Procedure | C | γ | Cross-validation result | Calculation time (h) |
| --- | --- | --- | --- | --- |
| Grid search | 177.962 | 84.861 | 98.973 | 23.71 |
| PEA | 244.444 | 86.5 | 99.9997 | 3.85 |
The application makes use of the Document Image Database from the University of Oulu [31, 32], which includes 233 types of documents. Those images contain a mixture of text and pictures.
The previous two experiments show the ability of the proposed SDASE algorithm to adjust to the peculiarities of the database. The experiments on the noisy database make this especially clear. Finally, the proposed text-extraction method scores better than other similar text-extraction techniques.
6 Conclusions
In this article, a bottom-up text localization technique is proposed that detects and extracts homogeneous text from document images. A connected components analysis (CCA) technique is applied to detect the objects of the document. Then, a powerful and adaptive descriptor is constructed from the DSEs contained in each object, based on the SDASE algorithm. Finally, a trained SVM classifies the objects as text or non-text.
In order to evaluate the proposed technique, we use the Document Image Database from the University of Oulu. First, we relate the descriptor length to the success rate of the proposed method and conclude that 128 elements are enough to detect the text blocks satisfactorily. Moreover, the descriptor length can be increased or decreased according to the computational constraints. In addition, we report the run time of the proposed method with regard to the descriptor length. Then, we add noise to the Document Image Database and calculate the new block descriptor so as to demonstrate its flexibility. The results are very close to those for the original document images. Finally, we assessed the SDASE text-extraction algorithm against other text-extraction techniques, and it performed better.
References
1. Frinken V, Fischer A, Bunke H: A Novel Word Spotting Algorithm Using Bidirectional Long Short-Term Memory Neural Networks. In Artificial Neural Networks in Pattern Recognition: 4th IAPR TC3 Workshop, ANNPR 2010, Cairo, Egypt, April 11-13, 2010, Proceedings. Volume 5998. Springer; 2010:185.
2. Zagoris K, Papamarkos N, Chamzas C: Web Document Image Retrieval System Based on Word Spotting. ICIP 2006, 477-480.
3. Gorecki P, Caponetti L, Castiello C: Fuzzy Techniques for Text Localisation in Images. In Computational Intelligence in Multimedia Processing: Recent Advances, of Studies in Computational Intelligence. Edited by: Hassanien AE, Abraham A, Kacprzyk J. Springer Berlin/Heidelberg; 2008, 96: 233-270. [http://dx.doi.org/10.1007/978354076827210]
4. Matrakas MD, Bortolozzi F: Segmentation and Validation of Commercial Documents Logical Structure. ITCC 2000, 242-246.
5. Jun K: Neural network-based text localization in color images. Pattern Recognition Letters 2001, 22: 1503-1515. 10.1016/S01678655(01)000964
6. Ingold R, Armangil D: A Top-Down Document Analysis Method for Logical Structure Recognition. First International Conference Document Analysis and Recognition 1991.
7. Ha J, Haralick R, Phillips I: Document Page Decomposition by the Bounding-Box Projection Technique. Third International Conference Document Analysis and Recognition 1995.
8. Strouthopoulos C, Papamarkos N, Atsalakis A: Text extraction in complex color documents. Pattern Recognition 2002, 35(8): 1743-1758. 10.1016/S00313203(01)001674
9. Jain AK, Yu B: Document Representation and Its Application to Page Decomposition. IEEE Trans Pattern Anal Mach Intell 1998, 20(3): 294-308. 10.1109/34.667886
10. Nikolaidis A, Strouthopoulos C: Robust text extraction in mixed-type binary documents. MMSP 2008, 393-398.
11. Aghajari G, Shanbehzadeh J, Sarrafzadeh A: A Text Localization Algorithm in Color Image via New Projection Profile. Proceedings of the International MultiConference of Engineers and Computer Scientists 2010.
12. Li X, Wang W, Jiang S, Huang Q, Gao W: Fast and effective text detection. ICIP 2008, 969-972.
13. Chen D, Odobez JM, Bourlard H: Text detection, recognition in images and video frames. Pattern Recognition 2004, 37(3): 595-608. 10.1016/j.patcog.2003.06.001
14. Roy PP, Pal U, Lladós J: Touching Text Character Localization in Graphical Documents Using SIFT. GREC 2009, 199-211.
15. Lowe DG: Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision 2004, 60(2): 91-110.
16. Jung C, Liu Q, Kim J: Accurate text localization in images based on SVM output scores. Image Vision Comput 2009, 27(9): 1295-1301. 10.1016/j.imavis.2008.11.012
17. Badekas E, Nikolaou NA, Papamarkos N: Text Localization and Binarization in Complex Color Documents. In MLDM Posters. Edited by: Perner P. IBaI publishing; 2007:115.
18. Emmanouilidis C, Batsalas C, Papamarkos N: Development and Evaluation of Text Localization Techniques Based on Structural Texture Features and Neural Classifiers. ICDAR, IEEE Computer Society 2009, 1270-1274.
19. Jung C, Liu Q, Kim J: A stroke filter and its application to text localization. Pattern Recognition Letters 2009, 30(2): 114-122. 10.1016/j.patrec.2008.05.014
20. Guo J, Gurrin C, Lao S, Foley C, Smeaton AF: Localization and Recognition of the Scoreboard in Sports Video Based on SIFT Point Matching. MMM (2) 2011, 337-347.
21. Su YM, Hsieh CH: A Novel Model-based Segmentation Approach to Extract Caption Contents on Sports Videos. ICME 2006, 1829-1832.
22. Hsieh CH, Huang CP, Hung MH: Detection and Recognition of Scoreboard for Baseball Videos. ICIC (1) 2008, 337-346.
23. Zagoris K, Papamarkos N: Text Extraction Using Document Structure Features And Support Vector Machines. Proceedings of the 11th IASTED International Conference on Computer Graphics and Imaging 2010.
24. Otsu N: A threshold selection method from gray-level histograms. IEEE Trans Sys, Man, Cyber 1979, 9: 62-66.
25. Suzuki K, Horiba I, Sugie N: Linear-time connected-component labeling based on sequential local operations. Computer Vision and Image Understanding 2003, 89: 1-23. 10.1016/S10773142(02)000309
26. Kavallieratou E, Fakotakis N, Kokkinakis G: An Offline Unconstrained Handwriting Recognition System. International Journal of Document Analysis and Recognition 2002, 4: 226-242. 10.1007/s100320200079
27. Boser BE, Guyon I, Vapnik V: A Training Algorithm for Optimal Margin Classifiers. COLT 1992, 144-152.
28. Cortes C, Vapnik V: Support vector networks. Machine Learning 1995, 20: 273-297.
29. Badekas E, Papamarkos N: Automatic Evaluation of Document Binarization Results. In Progress in Pattern Recognition, Image Analysis and Applications, of Lecture Notes in Computer Science. Edited by: Sanfeliu A, Cortés M. Springer Berlin/Heidelberg; 2005, 3773: 1005-1014. [http://dx.doi.org/10.1007/11578079103] 10.1007/11578079_103
30. Chang CC, Lin CJ: LIBSVM: a library for support vector machines. Tech. rep., Taiwan University, Department of Computer Science and Information Engineering; 2010.
31. Sauvola J, Kauniskangas H: MediaTeam Document Database II, a CD-ROM collection of document images. Tech. rep., University of Oulu, Finland; 1999.
32. Sauvola JJ, Haapakoski S, Kauniskangas H, Seppänen T, Pietikäinen M, Doermann DS: A distributed management system for testing document image analysis algorithms. ICDAR 1997, 989-995.
33. Wang J, Neskovic P, Cooper L: Training Data Selection for Support Vector Machines. In Advances in Natural Computation, Volume of Lecture Notes in Computer Science. Edited by: Wang L, Chen K, Ong Y. Springer Berlin/Heidelberg; 2005, 3610: 421421. [http://dx.doi.org/10.1007/1153908771]
34. Foody G, Mathur A: Toward intelligent training of supervised image classifications: directing training data acquisition for SVM classification. Remote Sensing of Environment 2004, 93(1-2): 107-117. 10.1016/j.rse.2004.06.017
35. Strouthopoulos C, Papamarkos N: Text identification for document image analysis using a neural network. Image Vision Comput 1998, 16(12-13): 879-896. 10.1016/S02628856(98)000559
36. Nagy G, Set S: Hierarchical representation of optically scanned documents. Proc 7th Int Conference on Pattern Recognition 1984.
37. Lin M, Tapamo J, Ndovie B: A texture-based method for document segmentation and classification. South African Computer Journal 2006, 36: 49-56.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.