mirror of
https://github.com/opencv/opencv_contrib.git
synced 2025-10-23 18:09:25 +08:00
Merge pull request #46 from lluisgomez/master
updates documentation for the text module
@@ -21,16 +21,20 @@ In the second stage, the ERs that passed the first stage are classified into cha

This ER filtering process is done in different single-channel projections of the input image in order to increase the character localization recall.

After the ER filtering is done on each input channel, character candidates must be grouped into high-level text blocks (e.g. words, text lines, paragraphs, ...). The grouping algorithm used in this implementation was proposed by Lluis Gomez and Dimosthenis Karatzas in [Gomez13] and basically consists of finding meaningful groups of regions using a perceptual organization based clustering analysis (see :ocv:func:`erGrouping`).
After the ER filtering is done on each input channel, character candidates must be grouped into high-level text blocks (e.g. words, text lines, paragraphs, ...). The opencv_text module implements two different grouping algorithms: the Exhaustive Search algorithm proposed in [Neumann11] for grouping horizontally aligned text, and the method proposed by Lluis Gomez and Dimosthenis Karatzas in [Gomez13][Gomez14] for grouping arbitrarily oriented text (see :ocv:func:`erGrouping`).

To see the text detector at work, have a look at the textdetection demo: https://github.com/Itseez/opencv/blob/master/samples/cpp/textdetection.cpp
To see the text detector at work, have a look at the textdetection demo: https://github.com/Itseez/opencv_contrib/blob/master/modules/text/samples/textdetection.cpp

.. [Neumann12] Neumann L., Matas J.: Real-Time Scene Text Localization and Recognition, CVPR 2012. The paper is available online at http://cmp.felk.cvut.cz/~neumalu1/neumann-cvpr2012.pdf

.. [Neumann11] Neumann L., Matas J.: Text Localization in Real-world Images using Efficiently Pruned Exhaustive Search, ICDAR 2011. The paper is available online at http://cmp.felk.cvut.cz/~neumalu1/icdar2011_article.pdf

.. [Gomez13] Gomez L., Karatzas D.: Multi-script Text Extraction from Natural Scenes, ICDAR 2013. The paper is available online at http://158.109.8.37/files/GoK2013.pdf

.. [Gomez14] Gomez L., Karatzas D.: A Fast Hierarchical Method for Multi-script and Arbitrary Oriented Scene Text Extraction, arXiv:1407.7504 [cs.CV]. The paper is available online at http://arxiv.org/abs/1407.7504

ERStat
------
@@ -198,14 +202,24 @@ erGrouping
----------
Find groups of Extremal Regions that are organized as text blocks.

.. ocv:function:: void erGrouping( InputArrayOfArrays src, std::vector<std::vector<ERStat> > &regions, const std::string& filename, float minProbablity, std::vector<Rect > &groups)
.. ocv:function:: void erGrouping(InputArray img, InputArrayOfArrays channels, std::vector<std::vector<ERStat> > &regions, std::vector<std::vector<Vec2i> > &groups, std::vector<Rect> &groups_rects, int method = ERGROUPING_ORIENTATION_HORIZ, const std::string& filename = std::string(), float minProbablity = 0.5)

    :param src: Vector of single channel images CV_8UC1 from which the regions were extracted.
    :param regions: Vector of ERs retrieved from the ERFilter algorithm from each channel.
    :param filename: The XML or YAML file with the classifier model (e.g. trained_classifier_erGrouping.xml).
    :param minProbability: The minimum probability for accepting a group.
    :param groups: The output of the algorithm is stored in this parameter as a list of rectangles.
    :param img: Original RGB or Greyscale image from which the regions were extracted.
    :param channels: Vector of single channel images CV_8UC1 from which the regions were extracted.
    :param regions: Vector of ERs retrieved from the ERFilter algorithm from each channel.
    :param groups: The output of the algorithm is stored in this parameter as a set of lists of indexes to the provided regions.
    :param groups_rects: The output of the algorithm is stored in this parameter as a list of rectangles.
    :param method: Grouping method (see the details below). Can be one of ``ERGROUPING_ORIENTATION_HORIZ``, ``ERGROUPING_ORIENTATION_ANY``.
    :param filename: The XML or YAML file with the classifier model (e.g. samples/trained_classifier_erGrouping.xml). Used only when the grouping method is ``ERGROUPING_ORIENTATION_ANY``.
    :param minProbability: The minimum probability for accepting a group. Used only when the grouping method is ``ERGROUPING_ORIENTATION_ANY``.

This function implements the grouping algorithm described in [Gomez13]. Note that this implementation constrains the results to horizontally-aligned text and Latin script (since ERFilter classifiers are trained only for Latin script detection).

The algorithm combines two different clustering techniques in a single parameter-free procedure to detect groups of regions organized as text. The maximally meaningful groups are first detected in several feature spaces, where each feature space is a combination of proximity information (x,y coordinates) and a similarity measure (intensity, color, size, gradient magnitude, etc.), thus providing a set of hypotheses of text groups. An Evidence Accumulation framework is used to combine all these hypotheses into the final estimate. Each of the resulting groups is finally validated using a classifier in order to assess whether it forms a valid horizontally-aligned text block.
This function implements two different grouping algorithms:

* **ERGROUPING_ORIENTATION_HORIZ**

  Exhaustive Search algorithm proposed in [Neumann11] for grouping horizontally aligned text. The algorithm models a verification function for all the possible ER sequences. The verification function for ER pairs consists of a set of threshold-based pairwise rules which compare measurements of two regions (height ratio, centroid angle, and region distance). The verification function for ER triplets creates a word text line estimate using Least Median of Squares fitting for a given triplet and then verifies that the estimate is valid (based on thresholds created during training). Verification functions for sequences longer than 3 are approximated by verifying that the text line parameters of all (sub)sequences of length 3 are consistent.

* **ERGROUPING_ORIENTATION_ANY**

  Text grouping method proposed in [Gomez13][Gomez14] for grouping arbitrarily oriented text. Regions are agglomerated by Single Linkage Clustering (SLC) in a weighted feature space that combines proximity (x,y coordinates) and similarity measures (color, size, gradient magnitude, stroke width, etc.). SLC provides a dendrogram where each node represents a text group hypothesis. The algorithm then finds the branches corresponding to text groups by traversing this dendrogram with a stopping rule that combines the output of a rotation-invariant text group classifier and a probabilistic measure for hierarchical clustering validity assessment.
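The agglomeration step of ``ERGROUPING_ORIENTATION_ANY`` can be illustrated with a minimal, self-contained single linkage clustering sketch. It is not the opencv_text implementation: the ``Region`` features, the unit weights and the weighted Euclidean distance are assumptions chosen for illustration, and the stopping rule (text group classifier plus cluster validity measure) is omitted. Every recorded ``Merge`` is one node of the dendrogram, i.e. one text group hypothesis that a stopping rule would then evaluate.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <numeric>
#include <vector>

// Hypothetical region features; not the ERStat fields used by opencv_text.
struct Region {
    double x, y;        // centroid coordinates (proximity)
    double intensity;   // mean intensity (similarity)
    double size;        // e.g. bounding box height (similarity)
};

// One dendrogram node: ids < n are single regions, id n + k is the k-th merge.
struct Merge {
    int left, right;
    double distance;    // single-linkage distance at which the two clusters join
};

// Weighted Euclidean distance in a combined proximity/similarity space.
// The weights are placeholders, not values taken from [Gomez14].
double featureDistance(const Region& a, const Region& b,
                       double wPos = 1.0, double wInt = 1.0, double wSize = 1.0) {
    assert(wPos >= 0.0 && wInt >= 0.0 && wSize >= 0.0);
    const double dx = a.x - b.x, dy = a.y - b.y;
    const double di = a.intensity - b.intensity, ds = a.size - b.size;
    return std::sqrt(wPos * (dx * dx + dy * dy) + wInt * di * di + wSize * ds * ds);
}

// Naive single linkage clustering: at each step join the two clusters whose
// closest members are nearest, and record the join as a dendrogram node.
std::vector<Merge> singleLinkage(const std::vector<Region>& regions) {
    const int n = static_cast<int>(regions.size());
    std::vector<int> label(n);                // current cluster id of each region
    std::iota(label.begin(), label.end(), 0);
    std::vector<Merge> dendrogram;
    for (int nextId = n; nextId < 2 * n - 1; ++nextId) {
        double best = std::numeric_limits<double>::infinity();
        int ca = -1, cb = -1;
        for (int i = 0; i < n; ++i)
            for (int j = i + 1; j < n; ++j) {
                if (label[i] == label[j]) continue;   // already in the same cluster
                const double d = featureDistance(regions[i], regions[j]);
                if (d < best) { best = d; ca = label[i]; cb = label[j]; }
            }
        for (int& l : label)                  // both clusters become the new node
            if (l == ca || l == cb) l = nextId;
        dendrogram.push_back({std::min(ca, cb), std::max(ca, cb), best});
    }
    return dendrogram;
}
```

For example, three regions at x = 0, 1 and 3 and a fourth at x = 50 (all other features equal) merge at distances 1 and 2 first and only join the distant region at distance 47, so a stopping rule would see the first three as a candidate group before that last merge. The naive search is O(n^3) and is only meant to show the linkage criterion.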
59
modules/text/doc/ocr.rst
Normal file
@@ -0,0 +1,59 @@
Scene Text Recognition
======================

.. highlight:: cpp

OCRTesseract
------------
.. ocv:class:: OCRTesseract

OCRTesseract class provides an interface to the tesseract-ocr API (v3.02.02) in C++. Note that it is compiled only when tesseract-ocr is correctly installed. ::

    class CV_EXPORTS OCRTesseract
    {
    private:
        tesseract::TessBaseAPI tess;

    public:
        //! Default constructor
        OCRTesseract(const char* datapath=NULL, const char* language=NULL, const char* char_whitelist=NULL,
                     tesseract::OcrEngineMode oem=tesseract::OEM_DEFAULT, tesseract::PageSegMode psmode=tesseract::PSM_AUTO);

        ~OCRTesseract();

        /*!
        the key method. Takes an image as input and returns the recognized text in the output_text parameter.
        Optionally also provides the Rects for individual text elements (e.g. words) and a list of
        ranked recognition alternatives.
        */
        void run(Mat& image, string& output_text, vector<Rect>* component_rects=NULL,
                 vector<string>* component_texts=NULL, vector<float>* component_confidences=NULL,
                 int component_level=0);
    };

To see the OCRTesseract combined with scene text detection, have a look at the end_to_end_recognition demo: https://github.com/Itseez/opencv_contrib/blob/master/modules/text/samples/end_to_end_recognition.cpp

OCRTesseract::OCRTesseract
--------------------------
Constructor.

.. ocv:function:: OCRTesseract::OCRTesseract(const char* datapath=NULL, const char* language=NULL, const char* char_whitelist=NULL, tesseract::OcrEngineMode oem=tesseract::OEM_DEFAULT, tesseract::PageSegMode psmode=tesseract::PSM_AUTO)

    :param datapath: the name of the parent directory of tessdata, ending with "/", or NULL to use the system's default directory.
    :param language: an ISO 639-3 code, or NULL to default to "eng".
    :param char_whitelist: specifies the list of characters used for recognition. NULL defaults to "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ".
    :param oem: tesseract-ocr offers different OCR Engine Modes (OEM); by default tesseract::OEM_DEFAULT is used. See the tesseract-ocr API documentation for other possible values.
    :param psmode: tesseract-ocr offers different Page Segmentation Modes (PSM); by default tesseract::PSM_AUTO (fully automatic layout analysis) is used. See the tesseract-ocr API documentation for other possible values.

OCRTesseract::run
-----------------
Recognize text using the tesseract-ocr API. Takes an image as input and returns the recognized text in the output_text parameter. Optionally also provides the Rects for individual text elements found (e.g. words), and the list of those text elements with their confidence values.

.. ocv:function:: void OCRTesseract::run(Mat& image, string& output_text, vector<Rect>* component_rects=NULL, vector<string>* component_texts=NULL, vector<float>* component_confidences=NULL, int component_level=0)

    :param image: Input image ``CV_8UC1`` or ``CV_8UC3``.
    :param output_text: Output text of the tesseract-ocr.
    :param component_rects: If provided, the method will output a list of Rects for the individual text elements found (e.g. words or text lines).
    :param component_texts: If provided, the method will output a list of text strings for the recognition of individual text elements found (e.g. words or text lines).
    :param component_confidences: If provided, the method will output a list of confidence values for the recognition of individual text elements found (e.g. words or text lines).
    :param component_level: ``OCR_LEVEL_WORD`` (by default), or ``OCR_LEVEL_TEXT_LINE``.
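Each optional output of ``run`` describes the individual text elements found, so the sketch below treats ``component_rects``, ``component_texts`` and ``component_confidences`` as parallel vectors (entry *i* of each referring to the same element); that pairing is an assumption here. The ``Rect`` struct is a stand-in for ``cv::Rect`` so the snippet compiles without OpenCV, and ``Component``, ``filterByConfidence`` and the threshold are hypothetical helpers, not part of the opencv_text API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for cv::Rect so the sketch compiles without OpenCV.
struct Rect {
    int x, y, width, height;
};

// Hypothetical record joining the three per-element outputs of run().
struct Component {
    Rect rect;
    std::string text;
    float confidence;
};

// Zips the parallel outputs and keeps the elements whose confidence is at
// least minConfidence (the threshold is an assumption, not a module default).
std::vector<Component> filterByConfidence(const std::vector<Rect>& rects,
                                          const std::vector<std::string>& texts,
                                          const std::vector<float>& confidences,
                                          float minConfidence) {
    // Precondition: one rect, one text and one confidence per text element.
    assert(rects.size() == texts.size() && texts.size() == confidences.size());
    std::vector<Component> kept;
    for (std::size_t i = 0; i < rects.size(); ++i)
        if (confidences[i] >= minConfidence)
            kept.push_back({rects[i], texts[i], confidences[i]});
    return kept;
}
```

In practice the three vectors would be the ones passed by pointer to ``OCRTesseract::run``, and the threshold would have to be chosen against the confidence scale that tesseract-ocr actually reports.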
@@ -1,10 +1,13 @@
***************************
objdetect. Object Detection
***************************
******************************************
text. Scene Text Detection and Recognition
******************************************

.. highlight:: cpp

The opencv_text module provides different algorithms for text detection and recognition in natural scene images.

.. toctree::
    :maxdepth: 2

    erfilter
    ocr