From d8ad70d7f38cd25bddc9d4c7ef7dcaf52999b792 Mon Sep 17 00:00:00 2001 From: lluis Date: Thu, 31 Jul 2014 16:47:45 +0200 Subject: [PATCH 1/2] updates documentation for the text module --- modules/text/doc/erfilter.rst | 34 ++++++++++++++------ modules/text/doc/ocr.rst | 59 +++++++++++++++++++++++++++++++++++ modules/text/doc/text.rst | 9 ++++-- 3 files changed, 89 insertions(+), 13 deletions(-) create mode 100644 modules/text/doc/ocr.rst diff --git a/modules/text/doc/erfilter.rst b/modules/text/doc/erfilter.rst index 85d6bcc7f..685249c99 100644 --- a/modules/text/doc/erfilter.rst +++ b/modules/text/doc/erfilter.rst @@ -21,16 +21,20 @@ In the second stage, the ERs that passed the first stage are classified into cha This ER filtering process is done in different single-channel projections of the input image in order to increase the character localization recall. -After the ER filtering is done on each input channel, character candidates must be grouped in high-level text blocks (i.e. words, text lines, paragraphs, ...). The grouping algorithm used in this implementation has been proposed by Lluis Gomez and Dimosthenis Karatzas in [Gomez13] and basically consist in finding meaningful groups of regions using a perceptual organization based clustering analisys (see :ocv:func:`erGrouping`). +After the ER filtering is done on each input channel, character candidates must be grouped in high-level text blocks (i.e. words, text lines, paragraphs, ...). The opencv_text module implements two different grouping algorithms: the Exhaustive Search algorithm proposed in [Neumann11] for grouping horizontally aligned text, and the method proposed by Lluis Gomez and Dimosthenis Karatzas in [Gomez13][Gomez14] for grouping arbitrary oriented text (see :ocv:func:`erGrouping`). 
-To see the text detector at work, have a look at the textdetection demo: https://github.com/Itseez/opencv/blob/master/samples/cpp/textdetection.cpp +To see the text detector at work, have a look at the textdetection demo: https://github.com/Itseez/opencv_contrib/blob/master/modules/text/samples/textdetection.cpp .. [Neumann12] Neumann L., Matas J.: Real-Time Scene Text Localization and Recognition, CVPR 2012. The paper is available online at http://cmp.felk.cvut.cz/~neumalu1/neumann-cvpr2012.pdf +.. [Neumann11] Neumann L., Matas J.: Text Localization in Real-world Images using Efficiently Pruned Exhaustive Search, ICDAR 2011. The paper is available online at http://cmp.felk.cvut.cz/~neumalu1/icdar2011_article.pdf + .. [Gomez13] Gomez L. and Karatzas D.: Multi-script Text Extraction from Natural Scenes, ICDAR 2013. The paper is available online at http://158.109.8.37/files/GoK2013.pdf +.. [Gomez14] Gomez L. and Karatzas D.: A Fast Hierarchical Method for Multi-script and Arbitrary Oriented Scene Text Extraction, arXiv:1407.7504 [cs.CV]. The paper is available online at http://arxiv.org/abs/1407.7504 + ERStat ------ @@ -198,14 +202,24 @@ erGrouping ---------- Find groups of Extremal Regions that are organized as text blocks. -.. ocv:function:: void erGrouping( InputArrayOfArrays src, std::vector<std::vector<ERStat> > &regions, const std::string& filename, float minProbablity, std::vector<Rect> &groups) +.. ocv:function:: void erGrouping(InputArray img, InputArrayOfArrays channels, std::vector<std::vector<ERStat> > &regions, std::vector<std::vector<Vec2i> > &groups, std::vector<Rect> &groups_rects, int method = ERGROUPING_ORIENTATION_HORIZ, const std::string& filename = std::string(), float minProbablity = 0.5) - :param src: Vector of sinle channel images CV_8UC1 from wich the regions were extracted - :param regions: Vector of ER's retreived from the ERFilter algorithm from each channel - :param filename: The XML or YAML file with the classifier model (e.g.
trained_classifier_erGrouping.xml) - :param minProbability: The minimum probability for accepting a group - :param groups: The output of the algorithm are stored in this parameter as list of rectangles. + :param img: Original RGB or Greyscale image from which the regions were extracted. + :param channels: Vector of single channel images CV_8UC1 from which the regions were extracted. + :param regions: Vector of ERs retrieved from the ERFilter algorithm from each channel. + :param groups: The output of the algorithm is stored in this parameter as a set of lists of indexes to the provided regions. + :param groups_rects: The output of the algorithm is stored in this parameter as a list of rectangles. + :param method: Grouping method (see the details below). Can be one of ``ERGROUPING_ORIENTATION_HORIZ``, ``ERGROUPING_ORIENTATION_ANY``. + :param filename: The XML or YAML file with the classifier model (e.g. samples/trained_classifier_erGrouping.xml). Only used when the grouping method is ``ERGROUPING_ORIENTATION_ANY``. + :param minProbability: The minimum probability for accepting a group. Only used when the grouping method is ``ERGROUPING_ORIENTATION_ANY``. -This function implements the grouping algorithm described in [Gomez13]. Notice that this implementation constrains the results to horizontally-aligned text and latin script (since ERFilter classifiers are trained only for latin script detection). -The algorithm combines two different clustering techniques in a single parameter-free procedure to detect groups of regions organized as text. The maximally meaningful groups are fist detected in several feature spaces, where each feature space is a combination of proximity information (x,y coordinates) and a similarity measure (intensity, color, size, gradient magnitude, etc.), thus providing a set of hypotheses of text groups. Evidence Accumulation framework is used to combine all these hypotheses to get the final estimate.
Each of the resulting groups are finally validated using a classifier in order to assess if they form a valid horizontally-aligned text block. +This function implements two different grouping algorithms: + + * **ERGROUPING_ORIENTATION_HORIZ** + + Exhaustive Search algorithm proposed in [Neumann11] for grouping horizontally aligned text. The algorithm models a verification function for all the possible ER sequences. The verification function for ER pairs consists of a set of threshold-based pairwise rules which compare measurements of two regions (height ratio, centroid angle, and region distance). The verification function for ER triplets creates a word text line estimate using Least Median-Squares fitting for a given triplet and then verifies that the estimate is valid (based on thresholds created during training). Verification functions for sequences larger than 3 are approximated by verifying that the text line parameters of all (sub)sequences of length 3 are consistent. + + * **ERGROUPING_ORIENTATION_ANY** + + Text grouping method proposed in [Gomez13][Gomez14] for grouping arbitrary oriented text. Regions are agglomerated by Single Linkage Clustering in a weighted feature space that combines proximity (x,y coordinates) and similarity measures (color, size, gradient magnitude, stroke width, etc.). SLC provides a dendrogram where each node represents a text group hypothesis. Then the algorithm finds the branches corresponding to text groups by traversing this dendrogram with a stopping rule that combines the output of a rotation invariant text group classifier and a probabilistic measure for hierarchical clustering validity assessment. diff --git a/modules/text/doc/ocr.rst b/modules/text/doc/ocr.rst new file mode 100644 index 000000000..34c23e896 --- /dev/null +++ b/modules/text/doc/ocr.rst @@ -0,0 +1,59 @@ +Scene Text Recognition +====================== + +.. highlight:: cpp + +OCRTesseract +------------ +..
ocv:class:: OCRTesseract + +OCRTesseract class provides an interface with the tesseract-ocr API (v3.02.02) in C++. Notice that it is compiled only when tesseract-ocr is correctly installed. :: + + class CV_EXPORTS OCRTesseract + { + private: + tesseract::TessBaseAPI tess; + + public: + //! Default constructor + OCRTesseract(const char* datapath=NULL, const char* language=NULL, const char* char_whitelist=NULL, + tesseract::OcrEngineMode oem=tesseract::OEM_DEFAULT, tesseract::PageSegMode psmode=tesseract::PSM_AUTO); + + ~OCRTesseract(); + + /*! + the key method. Takes image on input and returns recognized text in the output_text parameter + optionally provides also the Rects for individual text elements (e.g. words) and a list of + ranked recognition alternatives. + */ + void run(Mat& image, string& output_text, vector<Rect>* component_rects=NULL, + vector<string>* component_texts=NULL, vector<float>* component_confidences=NULL, + int component_level=0); + }; + +To see the OCRTesseract combined with scene text detection, have a look at the end_to_end_recognition demo: https://github.com/Itseez/opencv_contrib/blob/master/modules/text/samples/end_to_end_recognition.cpp + +OCRTesseract::OCRTesseract +-------------------------- +Constructor. + +.. ocv:function:: void OCRTesseract::OCRTesseract(const char* datapath=NULL, const char* language=NULL, const char* char_whitelist=NULL, tesseract::OcrEngineMode oem=tesseract::OEM_DEFAULT, tesseract::PageSegMode psmode=tesseract::PSM_AUTO); + + :param datapath: the name of the parent directory of tessdata ended with "/", or NULL to use the system's default directory. + :param language: an ISO 639-3 code or NULL will default to "eng". + :param char_whitelist: specifies the list of characters used for recognition. NULL defaults to "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ". + :param oem: tesseract-ocr offers different OCR Engine Modes (OEM); by default tesseract::OEM_DEFAULT is used.
See the tesseract-ocr API documentation for other possible values. + :param psmode: tesseract-ocr offers different Page Segmentation Modes (PSM); by default tesseract::PSM_AUTO (fully automatic layout analysis) is used. See the tesseract-ocr API documentation for other possible values. + +OCRTesseract::run +----------------- +Recognize text using the tesseract-ocr API. Takes image on input and returns recognized text in the output_text parameter. Optionally provides also the Rects for individual text elements found (e.g. words), and the list of those text elements with their confidence values. + +.. ocv:function:: void OCRTesseract::run(Mat& image, string& output_text, vector<Rect>* component_rects=NULL, vector<string>* component_texts=NULL, vector<float>* component_confidences=NULL, int component_level=0); + + :param image: Input image ``CV_8UC1`` or ``CV_8UC3`` + :param output_text: Output text of the tesseract-ocr. + :param component_rects: If provided the method will output a list of Rects for the individual text elements found (e.g. words or text lines). + :param component_texts: If provided the method will output a list of text strings for the recognition of individual text elements found (e.g. words or text lines). + :param component_confidences: If provided the method will output a list of confidence values for the recognition of individual text elements found (e.g. words or text lines). + :param component_level: ``OCR_LEVEL_WORD`` (by default), or ``OCR_LEVEL_TEXT_LINE``. diff --git a/modules/text/doc/text.rst b/modules/text/doc/text.rst index a72474381..8e319f92d 100644 --- a/modules/text/doc/text.rst +++ b/modules/text/doc/text.rst @@ -1,10 +1,13 @@ -*************************** -objdetect. Object Detection -*************************** +****************************************** +text. Scene Text Detection and Recognition +****************************************** ..
highlight:: cpp +The opencv_text module provides different algorithms for text detection and recognition in natural scene images. + .. toctree:: :maxdepth: 2 erfilter + ocr From bcf38c3fbf8f6b421374ec2fd0e936ede06ddc36 Mon Sep 17 00:00:00 2001 From: lluis Date: Thu, 31 Jul 2014 16:56:48 +0200 Subject: [PATCH 2/2] fix docs warnings --- modules/text/doc/ocr.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/modules/text/doc/ocr.rst b/modules/text/doc/ocr.rst index 34c23e896..8dd9e3e8f 100644 --- a/modules/text/doc/ocr.rst +++ b/modules/text/doc/ocr.rst @@ -37,7 +37,7 @@ OCRTesseract::OCRTesseract -------------------------- Constructor. -.. ocv:function:: void OCRTesseract::OCRTesseract(const char* datapath=NULL, const char* language=NULL, const char* char_whitelist=NULL, tesseract::OcrEngineMode oem=tesseract::OEM_DEFAULT, tesseract::PageSegMode psmode=tesseract::PSM_AUTO); +.. ocv:function:: void OCRTesseract::OCRTesseract(const char* datapath=NULL, const char* language=NULL, const char* char_whitelist=NULL, tesseract::OcrEngineMode oem=tesseract::OEM_DEFAULT, tesseract::PageSegMode psmode=tesseract::PSM_AUTO) :param datapath: the name of the parent directory of tessdata ended with "/", or NULL to use the system's default directory. :param language: an ISO 639-3 code or NULL will default to "eng". @@ -49,7 +49,7 @@ OCRTesseract::run ----------------- Recognize text using the tesseract-ocr API. Takes image on input and returns recognized text in the output_text parameter. Optionally provides also the Rects for individual text elements found (e.g. words), and the list of those text elements with their confidence values. -.. ocv:function:: void OCRTesseract::run(Mat& image, string& output_text, vector<Rect>* component_rects=NULL, vector<string>* component_texts=NULL, vector<float>* component_confidences=NULL, int component_level=0); +..
ocv:function:: void OCRTesseract::run(Mat& image, string& output_text, vector<Rect>* component_rects=NULL, vector<string>* component_texts=NULL, vector<float>* component_confidences=NULL, int component_level=0) :param image: Input image ``CV_8UC1`` or ``CV_8UC3`` :param output_text: Output text of the tesseract-ocr.
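
Reviewer note on the ``ERGROUPING_ORIENTATION_HORIZ`` description in erfilter.rst: the threshold-based pairwise rules (height ratio, centroid angle, region distance) may be easier to follow with a small illustration. The following is only a sketch under stated assumptions, not the code in the opencv_text module and not the trained rules of [Neumann11]. The ``RegionBox`` type, the function name ``isValidPair``, the way each measurement is computed, and all three threshold values are hypothetical and chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical axis-aligned bounding box of one extremal region.
// This is an illustration type, not cv::Rect or ERStat.
struct RegionBox {
    double x, y;           // top-left corner
    double width, height;  // box size, both must be positive
};

// Hypothetical thresholds, picked for illustration only. The values used by
// [Neumann11] are obtained during training and are not reproduced here.
const double kMinHeightRatio   = 0.4;   // shorter height / taller height
const double kMaxCentroidAngle = 0.35;  // radians away from the horizontal
const double kMaxGapRatio      = 2.0;   // horizontal gap / larger width

// Returns true when the two regions pass all three pairwise rules.
bool isValidPair(const RegionBox& a, const RegionBox& b) {
    assert(a.width > 0 && a.height > 0 && b.width > 0 && b.height > 0);

    // Rule 1: height ratio. Characters of one word have similar heights.
    const double heightRatio =
        std::min(a.height, b.height) / std::max(a.height, b.height);
    if (heightRatio < kMinHeightRatio) return false;

    // Rule 2: centroid angle. The line joining both centroids must be
    // close to horizontal.
    const double dx = (b.x + b.width / 2) - (a.x + a.width / 2);
    const double dy = (b.y + b.height / 2) - (a.y + a.height / 2);
    const double angle = std::atan2(std::fabs(dy), std::fabs(dx));
    if (angle > kMaxCentroidAngle) return false;

    // Rule 3: region distance. The horizontal gap between the boxes,
    // relative to the larger width, must be small. Overlapping boxes
    // have a gap of zero.
    const double gap = std::max(
        0.0, std::max(a.x, b.x) - std::min(a.x + a.width, b.x + b.width));
    if (gap / std::max(a.width, b.width) > kMaxGapRatio) return false;

    return true;
}
```

In this sketch two neighbouring boxes of similar height on the same baseline pass, while a much shorter box, a box on another text line, or a distant box is rejected. The triplet and longer-sequence checks described in the same paragraph build on pairs that pass such a test.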