Merge pull request #46 from lluisgomez/master

updates documentation for the text module
2025-10-23 18:09:25 +08:00 · 2014-08-01 12:50:24 +04:00
parent ff61080000 bcf38c3fbf
commit a001f2d68b
3 changed files with 89 additions and 13 deletions
--- a/modules/text/doc/erfilter.rst
+++ b/modules/text/doc/erfilter.rst
@@ -21,16 +21,20 @@ In the second stage, the ERs that passed the first stage are classified into cha

 This ER filtering process is done in different single-channel projections of the input image in order to increase the character localization recall.

-After the ER filtering is done on each input channel, character candidates must be grouped in high-level text blocks (i.e. words, text lines, paragraphs, ...). The grouping algorithm used in this implementation has been proposed by Lluis Gomez and Dimosthenis Karatzas in [Gomez13] and basically consist in finding meaningful groups of regions using a perceptual organization based clustering analisys (see :ocv:func:`erGrouping`).
+After the ER filtering is done on each input channel, character candidates must be grouped in high-level text blocks (i.e. words, text lines, paragraphs, ...). The opencv_text module implements two different grouping algorithms: the Exhaustive Search algorithm proposed in [Neumann11] for grouping horizontally aligned text, and the method proposed by Lluis Gomez and Dimosthenis Karatzas in [Gomez13][Gomez14] for grouping arbitrarily oriented text (see :ocv:func:`erGrouping`).


-To see the text detector at work, have a look at the textdetection demo: https://github.com/Itseez/opencv/blob/master/samples/cpp/textdetection.cpp
+To see the text detector at work, have a look at the textdetection demo: https://github.com/Itseez/opencv_contrib/blob/master/modules/text/samples/textdetection.cpp


 .. [Neumann12] Neumann L., Matas J.: Real-Time Scene Text Localization and Recognition, CVPR 2012. The paper is available online at http://cmp.felk.cvut.cz/~neumalu1/neumann-cvpr2012.pdf

+.. [Neumann11] Neumann L., Matas J.: Text Localization in Real-world Images using Efficiently Pruned Exhaustive Search, ICDAR 2011. The paper is available online at http://cmp.felk.cvut.cz/~neumalu1/icdar2011_article.pdf
+
 .. [Gomez13] Gomez L. and Karatzas D.: Multi-script Text Extraction from Natural Scenes, ICDAR 2013. The paper is available online at http://158.109.8.37/files/GoK2013.pdf

+.. [Gomez14] Gomez L. and Karatzas D.: A Fast Hierarchical Method for Multi-script and Arbitrary Oriented Scene Text Extraction, arXiv:1407.7504 [cs.CV]. The paper is available online at http://arxiv.org/abs/1407.7504
+

 ERStat
 ------
@@ -198,14 +202,24 @@ erGrouping
 ----------
 Find groups of Extremal Regions that are organized as text blocks.

-.. ocv:function:: void erGrouping( InputArrayOfArrays src, std::vector<std::vector<ERStat> > &regions, const std::string& filename, float minProbablity, std::vector<Rect > &groups)
+.. ocv:function:: void erGrouping(InputArray img, InputArrayOfArrays channels, std::vector<std::vector<ERStat> > &regions, std::vector<std::vector<Vec2i> > &groups, std::vector<Rect> &groups_rects, int method = ERGROUPING_ORIENTATION_HORIZ, const std::string& filename = std::string(), float minProbablity = 0.5)

-    :param src: Vector of sinle channel images CV_8UC1 from wich the regions were extracted
-    :param regions: Vector of ER's retreived from the ERFilter algorithm from each channel
-    :param filename: The XML or YAML file with the classifier model (e.g. trained_classifier_erGrouping.xml)
-    :param minProbability: The minimum probability for accepting a group
-    :param groups: The output of the algorithm are stored in this parameter as list of rectangles.
+    :param img: Original RGB or grayscale image from which the regions were extracted.
+    :param channels: Vector of single-channel CV_8UC1 images from which the regions were extracted.
+    :param regions: Vector of ERs retrieved by the ERFilter algorithm from each channel.
+    :param groups: The output of the algorithm is stored in this parameter as a set of lists of indexes into the provided regions.
+    :param groups_rects: The output of the algorithm is also stored in this parameter as a list of rectangles.
+    :param method: Grouping method (see the details below). Can be one of ``ERGROUPING_ORIENTATION_HORIZ``, ``ERGROUPING_ORIENTATION_ANY``.
+    :param filename: The XML or YAML file with the classifier model (e.g. samples/trained_classifier_erGrouping.xml). Used only when the grouping method is ``ERGROUPING_ORIENTATION_ANY``.
+    :param minProbability: The minimum probability for accepting a group. Used only when the grouping method is ``ERGROUPING_ORIENTATION_ANY``.

-This function implements the grouping algorithm described in [Gomez13]. Notice that this implementation constrains the results to horizontally-aligned text and latin script (since ERFilter classifiers are trained only for latin script detection).

-The algorithm combines two different clustering techniques in a single parameter-free procedure to detect groups of regions organized as text. The maximally meaningful groups are fist detected in several feature spaces, where each feature space is a combination of proximity information (x,y coordinates) and a similarity measure (intensity, color, size, gradient magnitude, etc.), thus providing a set of hypotheses of text groups. Evidence Accumulation framework is used to combine all these hypotheses to get the final estimate. Each of the resulting groups are finally validated using a classifier in order to assess if they form a valid horizontally-aligned text block.
+This function implements two different grouping algorithms:
+
+    * **ERGROUPING_ORIENTATION_HORIZ**
+      
+    Exhaustive Search algorithm proposed in [Neumann11] for grouping horizontally aligned text. The algorithm models a verification function for all the possible ER sequences. The verification function for ER pairs consists of a set of threshold-based pairwise rules which compare measurements of two regions (height ratio, centroid angle, and region distance). The verification function for ER triplets creates a word text line estimate using Least Median of Squares fitting for a given triplet and then verifies that the estimate is valid (based on thresholds created during training). Verification functions for sequences longer than 3 are approximated by verifying that the text line parameters of all (sub)sequences of length 3 are consistent.
+
+    * **ERGROUPING_ORIENTATION_ANY**
+      
+    Text grouping method proposed in [Gomez13][Gomez14] for grouping arbitrarily oriented text. Regions are agglomerated by Single Linkage Clustering (SLC) in a weighted feature space that combines proximity (x,y coordinates) and similarity measures (color, size, gradient magnitude, stroke width, etc.). SLC provides a dendrogram where each node represents a text group hypothesis. The algorithm then finds the branches corresponding to text groups by traversing this dendrogram with a stopping rule that combines the output of a rotation-invariant text group classifier and a probabilistic measure for hierarchical clustering validity assessment.
--- a/modules/text/doc/ocr.rst
+++ b/modules/text/doc/ocr.rst
@@ -0,0 +1,59 @@
+Scene Text Recognition
+======================
+
+.. highlight:: cpp
+
+OCRTesseract
+------------
+.. ocv:class:: OCRTesseract
+
+OCRTesseract class provides an interface with the tesseract-ocr API (v3.02.02) in C++. Notice that it is compiled only when tesseract-ocr is correctly installed. ::
+
+    class CV_EXPORTS OCRTesseract
+    {
+    private:
+        tesseract::TessBaseAPI tess;
+    
+    public:
+        //! Default constructor
+        OCRTesseract(const char* datapath=NULL, const char* language=NULL, const char* char_whitelist=NULL,
+                     tesseract::OcrEngineMode oem=tesseract::OEM_DEFAULT, tesseract::PageSegMode psmode=tesseract::PSM_AUTO);
+    
+        ~OCRTesseract();
+    
+        /*!
+        the key method. Takes image on input and returns recognized text in the output_text parameter
+        optionally provides also the Rects for individual text elements (e.g. words) and a list of 
+        ranked recognition alternatives.
+        */
+        void run(Mat& image, string& output_text, vector<Rect>* component_rects=NULL,
+                 vector<string>* component_texts=NULL, vector<float>* component_confidences=NULL,
+                 int component_level=0);
+    };
+
+To see the OCRTesseract combined with scene text detection, have a look at the end_to_end_recognition demo: https://github.com/Itseez/opencv_contrib/blob/master/modules/text/samples/end_to_end_recognition.cpp
+
+OCRTesseract::OCRTesseract
+--------------------------
+Constructor.
+
+.. ocv:function:: OCRTesseract::OCRTesseract(const char* datapath=NULL, const char* language=NULL, const char* char_whitelist=NULL, tesseract::OcrEngineMode oem=tesseract::OEM_DEFAULT, tesseract::PageSegMode psmode=tesseract::PSM_AUTO)
+
+    :param datapath: the name of the parent directory of tessdata, ending with "/", or NULL to use the system's default directory.
+    :param language: an ISO 639-3 language code; NULL defaults to "eng".
+    :param char_whitelist: specifies the list of characters used for recognition. NULL defaults to "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ".
+    :param oem: tesseract-ocr offers different OCR Engine Modes (OEM); by default tesseract::OEM_DEFAULT is used. See the tesseract-ocr API documentation for other possible values.
+    :param psmode: tesseract-ocr offers different Page Segmentation Modes (PSM); by default tesseract::PSM_AUTO (fully automatic layout analysis) is used. See the tesseract-ocr API documentation for other possible values.
+
+OCRTesseract::run
+-----------------
+Recognize text using the tesseract-ocr API. Takes an image as input and returns the recognized text in the output_text parameter. Optionally also provides the Rects for the individual text elements found (e.g. words), and the list of those text elements with their confidence values.
+
+.. ocv:function:: void OCRTesseract::run(Mat& image, string& output_text, vector<Rect>* component_rects=NULL, vector<string>* component_texts=NULL, vector<float>* component_confidences=NULL, int component_level=0)
+
+    :param image: Input image, ``CV_8UC1`` or ``CV_8UC3``.
+    :param output_text: Output text of the tesseract-ocr.
+    :param component_rects: If provided, the method will output a list of Rects for the individual text elements found (e.g. words or text lines).
+    :param component_texts: If provided, the method will output a list of text strings for the recognition of individual text elements found (e.g. words or text lines).
+    :param component_confidences: If provided, the method will output a list of confidence values for the recognition of individual text elements found (e.g. words or text lines).
+    :param component_level: ``OCR_LEVEL_WORD`` (by default), or ``OCR_LEVEL_TEXT_LINE``.
--- a/modules/text/doc/text.rst
+++ b/modules/text/doc/text.rst
@@ -1,10 +1,13 @@
-***************************
-objdetect. Object Detection
-***************************
+******************************************
+text. Scene Text Detection and Recognition
+******************************************

 .. highlight:: cpp

+The opencv_text module provides different algorithms for text detection and recognition in natural scene images.
+
 .. toctree::
    :maxdepth: 2

    erfilter
+    ocr