From d8ad70d7f38cd25bddc9d4c7ef7dcaf52999b792 Mon Sep 17 00:00:00 2001
From: lluis <lgomez@cvc.uab.es>
Date: Thu, 31 Jul 2014 16:47:45 +0200
Subject: [PATCH 1/2] updates documentation for the text module

---
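Note for reviewers: the pairwise verification rule that the new erGrouping
text describes for ERGROUPING_ORIENTATION_HORIZ (height ratio, centroid angle,
region distance) can be sketched as below. This is only an illustration and
not the module's implementation: the Region struct, the PairThresholds struct,
the isValidPair helper, the distance normalization, and every threshold value
are hypothetical. Per [Neumann11], the real thresholds are created during
training.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical, simplified region descriptor (the real ERStat holds far more).
struct Region {
    float cx, cy;          // centroid coordinates
    float width, height;   // bounding-box size
};

// Illustrative thresholds only; assumed here, not taken from the module.
struct PairThresholds {
    float maxHeightRatio;    // taller height / shorter height
    float maxCentroidAngle;  // radians away from the horizontal axis
    float maxNormDistance;   // centroid distance / larger region width
};

// Returns true when two regions pass all three threshold-based pairwise rules.
bool isValidPair(const Region& a, const Region& b, const PairThresholds& t)
{
    assert(a.height > 0.f && b.height > 0.f && a.width > 0.f && b.width > 0.f);

    // Rule 1: characters on the same line have comparable heights.
    const float heightRatio =
        std::max(a.height, b.height) / std::min(a.height, b.height);

    // Rule 2: the line joining the centroids is close to horizontal.
    const float dx = b.cx - a.cx;
    const float dy = b.cy - a.cy;
    const float angle = std::atan2(std::fabs(dy), std::fabs(dx));

    // Rule 3: the regions are close, relative to their size (an assumed normalization).
    const float dist = std::sqrt(dx * dx + dy * dy) / std::max(a.width, b.width);

    return heightRatio <= t.maxHeightRatio
        && angle <= t.maxCentroidAngle
        && dist <= t.maxNormDistance;
}
```

With thresholds such as {2.0, 0.35, 3.0}, two similarly sized regions sitting
side by side pass, while a region placed well above or below its neighbour, or
one three times taller, is rejected.
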
 modules/text/doc/erfilter.rst | 34 ++++++++++++++------
 modules/text/doc/ocr.rst      | 59 +++++++++++++++++++++++++++++++++++
 modules/text/doc/text.rst     |  9 ++++--
 3 files changed, 89 insertions(+), 13 deletions(-)
 create mode 100644 modules/text/doc/ocr.rst
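
Note for reviewers: the Single Linkage Clustering step that the new
ERGROUPING_ORIENTATION_ANY description relies on can be illustrated with the
naive sketch below. It is an assumption-laden toy, not the code in the module:
the Merge struct, weightedDistance, singleLinkage, the feature layout and the
weights are all hypothetical, and the stopping rule (classifier plus validity
measure) is not modelled. Each returned Merge corresponds to one dendrogram
node, i.e. one text group hypothesis.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// One dendrogram node: clusters `a` and `b` were merged at `distance`.
struct Merge { int a; int b; double distance; };

// Weighted Euclidean distance between two feature vectors of equal length.
double weightedDistance(const std::vector<double>& p,
                        const std::vector<double>& q,
                        const std::vector<double>& w)
{
    assert(p.size() == q.size() && p.size() == w.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < p.size(); ++i) {
        const double d = p[i] - q[i];
        sum += w[i] * d * d;
    }
    return std::sqrt(sum);
}

// Naive single linkage: repeatedly merge the two clusters whose closest
// members are nearest. Leaves have ids 0..n-1; the k-th merge gets id n+k.
std::vector<Merge> singleLinkage(const std::vector<std::vector<double> >& pts,
                                 const std::vector<double>& w)
{
    const int n = static_cast<int>(pts.size());
    std::vector<std::vector<int> > members;  // point indexes per live cluster
    std::vector<int> ids;                    // dendrogram id per live cluster
    for (int i = 0; i < n; ++i) {
        members.push_back(std::vector<int>(1, i));
        ids.push_back(i);
    }

    std::vector<Merge> dendrogram;
    while (members.size() > 1) {
        std::size_t bi = 0, bj = 1;
        double best = std::numeric_limits<double>::infinity();
        for (std::size_t i = 0; i < members.size(); ++i)
            for (std::size_t j = i + 1; j < members.size(); ++j)
                for (std::size_t u = 0; u < members[i].size(); ++u)
                    for (std::size_t v = 0; v < members[j].size(); ++v) {
                        const double d = weightedDistance(
                            pts[members[i][u]], pts[members[j][v]], w);
                        if (d < best) { best = d; bi = i; bj = j; }
                    }

        const Merge m = { ids[bi], ids[bj], best };
        dendrogram.push_back(m);

        // Fold cluster bj into bi and give the result a fresh id.
        members[bi].insert(members[bi].end(),
                           members[bj].begin(), members[bj].end());
        ids[bi] = n + static_cast<int>(dendrogram.size()) - 1;
        members.erase(members.begin() + bj);
        ids.erase(ids.begin() + bj);
    }
    return dendrogram;
}
```

Feeding it one feature vector per region (for example x, y and a size term,
with weights chosen by the caller) yields n-1 merges ordered from the tightest
group to the loosest.
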
diff --git a/modules/text/doc/erfilter.rst b/modules/text/doc/erfilter.rst
index 85d6bcc7f..685249c99 100644
--- a/modules/text/doc/erfilter.rst
+++ b/modules/text/doc/erfilter.rst
@@ -21,16 +21,20 @@ In the second stage, the ERs that passed the first stage are classified into cha
 
 This ER filtering process is done in different single-channel projections of the input image in order to increase the character localization recall.
 
-After the ER filtering is done on each input channel, character candidates must be grouped in high-level text blocks (i.e. words, text lines, paragraphs, ...). The grouping algorithm used in this implementation has been proposed by Lluis Gomez and Dimosthenis Karatzas in [Gomez13] and basically consist in finding meaningful groups of regions using a perceptual organization based clustering analisys (see :ocv:func:`erGrouping`).
+After the ER filtering is done on each input channel, character candidates must be grouped into high-level text blocks (e.g. words, text lines, paragraphs, ...). The opencv_text module implements two different grouping algorithms: the Exhaustive Search algorithm proposed in [Neumann11] for grouping horizontally aligned text, and the method proposed by Lluis Gomez and Dimosthenis Karatzas in [Gomez13][Gomez14] for grouping arbitrarily oriented text (see :ocv:func:`erGrouping`).
 
 
-To see the text detector at work, have a look at the textdetection demo: https://github.com/Itseez/opencv/blob/master/samples/cpp/textdetection.cpp
+To see the text detector at work, have a look at the textdetection demo: https://github.com/Itseez/opencv_contrib/blob/master/modules/text/samples/textdetection.cpp
 
 
 .. [Neumann12] Neumann L., Matas J.: Real-Time Scene Text Localization and Recognition, CVPR 2012. The paper is available online at http://cmp.felk.cvut.cz/~neumalu1/neumann-cvpr2012.pdf
 
+.. [Neumann11] Neumann L., Matas J.: Text Localization in Real-world Images using Efficiently Pruned Exhaustive Search, ICDAR 2011. The paper is available online at http://cmp.felk.cvut.cz/~neumalu1/icdar2011_article.pdf
+
 .. [Gomez13] Gomez L. and Karatzas D.: Multi-script Text Extraction from Natural Scenes, ICDAR 2013. The paper is available online at http://158.109.8.37/files/GoK2013.pdf
 
+.. [Gomez14] Gomez L. and Karatzas D.: A Fast Hierarchical Method for Multi-script and Arbitrary Oriented Scene Text Extraction, arXiv:1407.7504 [cs.CV]. The paper is available online at http://arxiv.org/abs/1407.7504
+
 
 ERStat
 ------
@@ -198,14 +202,24 @@ erGrouping
 ----------
 Find groups of Extremal Regions that are organized as text blocks.
 
-.. ocv:function:: void erGrouping( InputArrayOfArrays src, std::vector<std::vector<ERStat> > &regions, const std::string& filename, float minProbablity, std::vector<Rect > &groups)
+.. ocv:function:: void erGrouping(InputArray img, InputArrayOfArrays channels, std::vector<std::vector<ERStat> > &regions, std::vector<std::vector<Vec2i> > &groups, std::vector<Rect> &groups_rects, int method = ERGROUPING_ORIENTATION_HORIZ, const std::string& filename = std::string(), float minProbablity = 0.5)
 
-    :param src: Vector of sinle channel images CV_8UC1 from wich the regions were extracted
-    :param regions: Vector of ER's retreived from the ERFilter algorithm from each channel
-    :param filename: The XML or YAML file with the classifier model (e.g. trained_classifier_erGrouping.xml)
-    :param minProbability: The minimum probability for accepting a group
-    :param groups: The output of the algorithm are stored in this parameter as list of rectangles.
+    :param img: Original RGB or greyscale image from which the regions were extracted.
+    :param channels: Vector of single-channel CV_8UC1 images from which the regions were extracted.
+    :param regions: Vector of ERs retrieved by the ERFilter algorithm from each channel.
+    :param groups: Output of the algorithm, stored as a set of lists of indexes into the provided regions.
+    :param groups_rects: Output of the algorithm, stored as a list of rectangles.
+    :param method: Grouping method (see the details below). Can be one of ``ERGROUPING_ORIENTATION_HORIZ``, ``ERGROUPING_ORIENTATION_ANY``.
+    :param filename: The XML or YAML file with the classifier model (e.g. samples/trained_classifier_erGrouping.xml). Only used when the grouping method is ``ERGROUPING_ORIENTATION_ANY``.
+    :param minProbability: The minimum probability for accepting a group. Only used when the grouping method is ``ERGROUPING_ORIENTATION_ANY``.
 
-This function implements the grouping algorithm described in [Gomez13]. Notice that this implementation constrains the results to horizontally-aligned text and latin script (since ERFilter classifiers are trained only for latin script detection).
 
-The algorithm combines two different clustering techniques in a single parameter-free procedure to detect groups of regions organized as text. The maximally meaningful groups are fist detected in several feature spaces, where each feature space is a combination of proximity information (x,y coordinates) and a similarity measure (intensity, color, size, gradient magnitude, etc.), thus providing a set of hypotheses of text groups. Evidence Accumulation framework is used to combine all these hypotheses to get the final estimate. Each of the resulting groups are finally validated using a classifier in order to assess if they form a valid horizontally-aligned text block.
+This function implements two different grouping algorithms:
+
+    * **ERGROUPING_ORIENTATION_HORIZ**
+      
+    Exhaustive Search algorithm proposed in [Neumann11] for grouping horizontally aligned text. The algorithm models a verification function for all the possible ER sequences. The verification function for ER pairs consists of a set of threshold-based pairwise rules which compare measurements of two regions (height ratio, centroid angle, and region distance). The verification function for ER triplets creates a word text line estimate using Least Median of Squares fitting for a given triplet and then verifies that the estimate is valid (based on thresholds created during training). Verification functions for sequences longer than 3 are approximated by verifying that the text line parameters of all (sub)sequences of length 3 are consistent.
+
+    * **ERGROUPING_ORIENTATION_ANY**
+      
+    Text grouping method proposed in [Gomez13][Gomez14] for grouping arbitrarily oriented text. Regions are agglomerated by Single Linkage Clustering (SLC) in a weighted feature space that combines proximity (x,y coordinates) and similarity measures (color, size, gradient magnitude, stroke width, etc.). SLC provides a dendrogram where each node represents a text group hypothesis. The algorithm then finds the branches corresponding to text groups by traversing this dendrogram with a stopping rule that combines the output of a rotation-invariant text group classifier and a probabilistic measure for hierarchical clustering validity assessment.
diff --git a/modules/text/doc/ocr.rst b/modules/text/doc/ocr.rst
new file mode 100644
index 000000000..34c23e896
--- /dev/null
+++ b/modules/text/doc/ocr.rst
@@ -0,0 +1,59 @@
+Scene Text Recognition
+======================
+
+.. highlight:: cpp
+
+OCRTesseract
+------------
+.. ocv:class:: OCRTesseract
+
+The OCRTesseract class provides an interface to the tesseract-ocr API (v3.02.02) in C++. Note that it is compiled only when tesseract-ocr is correctly installed. ::
+
+    class CV_EXPORTS OCRTesseract
+    {
+    private:
+        tesseract::TessBaseAPI tess;
+    
+    public:
+        //! Default constructor
+        OCRTesseract(const char* datapath=NULL, const char* language=NULL, const char* char_whitelist=NULL,
+                     tesseract::OcrEngineMode oem=tesseract::OEM_DEFAULT, tesseract::PageSegMode psmode=tesseract::PSM_AUTO);
+    
+        ~OCRTesseract();
+    
+        /*!
+        the key method. Takes image on input and returns recognized text in the output_text parameter
+        optionally provides also the Rects for individual text elements (e.g. words) and a list of 
+        ranked recognition alternatives.
+        */
+        void run(Mat& image, string& output_text, vector<Rect>* component_rects=NULL,
+                 vector<string>* component_texts=NULL, vector<float>* component_confidences=NULL,
+                 int component_level=0);
+    };
+
+To see the OCRTesseract combined with scene text detection, have a look at the end_to_end_recognition demo: https://github.com/Itseez/opencv_contrib/blob/master/modules/text/samples/end_to_end_recognition.cpp
+
+OCRTesseract::OCRTesseract
+--------------------------
+Constructor.
+
+.. ocv:function:: void OCRTesseract::OCRTesseract(const char* datapath=NULL, const char* language=NULL, const char* char_whitelist=NULL, tesseract::OcrEngineMode oem=tesseract::OEM_DEFAULT, tesseract::PageSegMode psmode=tesseract::PSM_AUTO);
+
+    :param datapath: the name of the parent directory of tessdata ended with "/", or NULL to use the system's default directory.
+    :param language: an ISO 639-3 code or NULL will default to "eng".
+    :param char_whitelist: specifies the list of characters used for recognition. NULL defaults to "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ".
+    :param oem: tesseract-ocr offers different OCR Engine Modes (OEM); by default tesseract::OEM_DEFAULT is used. See the tesseract-ocr API documentation for other possible values.
+    :param psmode: tesseract-ocr offers different Page Segmentation Modes (PSM); by default tesseract::PSM_AUTO (fully automatic layout analysis) is used. See the tesseract-ocr API documentation for other possible values.
+
+OCRTesseract::run
+-----------------
+Recognize text using the tesseract-ocr API. Takes image on input and returns recognized text in the output_text parameter. Optionally provides also the Rects for individual text elements found (e.g. words), and the list of those text elements with their confidence values.
+
+.. ocv:function:: void OCRTesseract::run(Mat& image, string& output_text, vector<Rect>* component_rects=NULL, vector<string>* component_texts=NULL, vector<float>* component_confidences=NULL, int component_level=0);
+
+    :param image: Input image ``CV_8UC1`` or ``CV_8UC3``
+    :param output_text: Output text of the tesseract-ocr.
+    :param component_rects: If provided, the method will output a list of Rects for the individual text elements found (e.g. words or text lines).
+    :param component_texts: If provided, the method will output a list of text strings for the recognition of individual text elements found (e.g. words or text lines).
+    :param component_confidences: If provided, the method will output a list of confidence values for the recognition of individual text elements found (e.g. words or text lines).
+    :param component_level: ``OCR_LEVEL_WORD`` (by default), or ``OCR_LEVEL_TEXT_LINE``.
diff --git a/modules/text/doc/text.rst b/modules/text/doc/text.rst
index a72474381..8e319f92d 100644
--- a/modules/text/doc/text.rst
+++ b/modules/text/doc/text.rst
@@ -1,10 +1,13 @@
-***************************
-objdetect. Object Detection
-***************************
+******************************************
+text. Scene Text Detection and Recognition
+******************************************
 
 .. highlight:: cpp
 
+The opencv_text module provides different algorithms for text detection and recognition in natural scene images.
+
 .. toctree::
     :maxdepth: 2
 
     erfilter
+    ocr

From bcf38c3fbf8f6b421374ec2fd0e936ede06ddc36 Mon Sep 17 00:00:00 2001
From: lluis <lgomez@cvc.uab.es>
Date: Thu, 31 Jul 2014 16:56:48 +0200
Subject: [PATCH 2/2] fix docs warnings

---
 modules/text/doc/ocr.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/modules/text/doc/ocr.rst b/modules/text/doc/ocr.rst
index 34c23e896..8dd9e3e8f 100644
--- a/modules/text/doc/ocr.rst
+++ b/modules/text/doc/ocr.rst
@@ -37,7 +37,7 @@ OCRTesseract::OCRTesseract
 --------------------------
 Constructor.
 
-.. ocv:function:: void OCRTesseract::OCRTesseract(const char* datapath=NULL, const char* language=NULL, const char* char_whitelist=NULL, tesseract::OcrEngineMode oem=tesseract::OEM_DEFAULT, tesseract::PageSegMode psmode=tesseract::PSM_AUTO);
+.. ocv:function:: OCRTesseract::OCRTesseract(const char* datapath=NULL, const char* language=NULL, const char* char_whitelist=NULL, tesseract::OcrEngineMode oem=tesseract::OEM_DEFAULT, tesseract::PageSegMode psmode=tesseract::PSM_AUTO)
 
     :param datapath: the name of the parent directory of tessdata ended with "/", or NULL to use the system's default directory.
     :param language: an ISO 639-3 code or NULL will default to "eng".
@@ -49,7 +49,7 @@ OCRTesseract::run
 -----------------
 Recognize text using the tesseract-ocr API. Takes image on input and returns recognized text in the output_text parameter. Optionally provides also the Rects for individual text elements found (e.g. words), and the list of those text elements with their confidence values.
 
-.. ocv:function:: void OCRTesseract::run(Mat& image, string& output_text, vector<Rect>* component_rects=NULL, vector<string>* component_texts=NULL, vector<float>* component_confidences=NULL, int component_level=0);
+.. ocv:function:: void OCRTesseract::run(Mat& image, string& output_text, vector<Rect>* component_rects=NULL, vector<string>* component_texts=NULL, vector<float>* component_confidences=NULL, int component_level=0)
 
     :param image: Input image ``CV_8UC1`` or ``CV_8UC3``
     :param output_text: Output text of the tesseract-ocr.