mirror of https://github.com/opencv/opencv_contrib.git synced 2025-10-17 15:26:00 +08:00

Doxygen documentation for all modules

This commit is contained in:
Maksim Shabunin
2014-11-20 18:03:57 +03:00
parent 525c4d5ecd
commit a20c5c8dd9
179 changed files with 6621 additions and 1179 deletions

View File

@@ -41,4 +41,328 @@
#include "surface_matching/ppf_match_3d.hpp"
#include "surface_matching/icp.hpp"
/** @defgroup surface_matching Surface Matching
Introduction to Surface Matching
--------------------------------
Cameras and similar devices capable of sensing 3D structure are becoming more
common. Thus, using depth and intensity information for matching 3D objects (or parts) is of
crucial importance for computer vision. Applications range from industrial control to guiding
everyday actions for visually impaired people. The task of recognition and pose estimation in range
images is to identify and localize a queried 3D free-form object by matching it to the acquired
database.
From an industrial perspective, enabling robots to automatically locate and pick up randomly placed
and oriented objects from a bin is an important challenge in factory automation, replacing tedious
and heavy manual labor. A system should be able to recognize and locate objects with a predefined
shape and estimate the position with the precision necessary for a gripping robot to pick it up.
This is where vision guided robotics takes the stage. Similar tools are also capable of guiding
robots (and even people) through unstructured environments, leading to automated navigation. These
properties make 3D matching from point clouds a ubiquitous necessity. Within this context, I will
now describe the OpenCV implementation of a 3D object recognition and pose estimation algorithm
using 3D features.
Surface Matching Algorithm Through 3D Features
----------------------------------------------
The approach taken here to 3D matching is heavily based on
@cite drost2010, which is one of the first and main practical methods presented in this area. The
approach is composed of extracting 3D feature points randomly from depth images or generic point
clouds, indexing them and later querying them efficiently at runtime. Only the 3D structure is
considered, and a trivial hash table is used for feature queries.
While I am fully aware that the structure of a CAD model could be exploited to achieve smarter
point sampling, I leave that aside for now in order to keep the methods general (typically, such
algorithms do not need training on a CAD model, and a point cloud is sufficient). Below is the
outline of the entire algorithm:
![Outline of the Algorithm](surface_matching/pics/outline.jpg)
As explained, the algorithm relies on the extraction and indexing of point pair features, which are
defined as follows:
\f[\bf{{F}}(\bf{{m1}}, \bf{{m2}}) = (||\bf{{d}}||_2, <(\bf{{n1}},\bf{{d}}), <(\bf{{n2}},\bf{{d}}), <(\bf{{n1}},\bf{{n2}}))\f]
where \f$\bf{{m1}}\f$ and \f$\bf{{m2}}\f$ are two selected points on the model (or scene),
\f$\bf{{d}}\f$ is the difference vector, \f$\bf{{n1}}\f$ and \f$\bf{{n2}}\f$ are the normals at \f$\bf{{m1}}\f$ and
\f$\bf{m2}\f$. During the training stage, this vector is quantized and indexed. In the test stage, the same
features are extracted from the scene and compared to the database. With a few tricks like
separation of the rotational components, the pose estimation part can also be made efficient (check
the reference for more details). A Hough-like voting and clustering is employed to estimate the
object pose. To cluster the poses, the raw pose hypotheses are sorted in decreasing order of the
number of votes. From the highest vote, a new cluster is created. If the next pose hypothesis is
close to one of the existing clusters, the hypothesis is added to the cluster and the cluster center
is updated as the average of the pose hypotheses within the cluster. If the next hypothesis is not
close to any of the clusters, it creates a new cluster. The proximity testing is done with fixed
thresholds in translation and rotation. Distance computation and averaging for translation are
performed in the 3D Euclidean space, while those for rotation are performed using quaternion
representation. After clustering, the clusters are sorted in decreasing order of the total number of
votes which determines confidence of the estimated poses.
This pose is further refined using \f$ICP\f$ in order to obtain the final pose.
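The clustering procedure described above can be sketched as follows. This is only a minimal
illustration and not the module's code: the `Pose` and `Cluster` structs, the function names and
the two thresholds are hypothetical, and the rotation average is approximated by a sign-aligned,
normalized sum of quaternions.
~~~{cpp}
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical pose hypothesis: translation, unit quaternion (w, x, y, z), vote count.
struct Pose { std::array<double, 3> t; std::array<double, 4> q; int votes; };
struct Cluster { Pose center; std::vector<Pose> members; int totalVotes; };

static bool moreVotes(const Pose& a, const Pose& b) { return a.votes > b.votes; }
static bool moreTotalVotes(const Cluster& a, const Cluster& b) { return a.totalVotes > b.totalVotes; }

// Euclidean distance between the translations of two poses.
static double translationDistance(const Pose& a, const Pose& b) {
    double s = 0.0;
    for (int i = 0; i < 3; i++) s += (a.t[i] - b.t[i]) * (a.t[i] - b.t[i]);
    return std::sqrt(s);
}

// Rotation angle (radians) between the unit quaternions of two poses.
static double rotationDistance(const Pose& a, const Pose& b) {
    double d = 0.0;
    for (int i = 0; i < 4; i++) d += a.q[i] * b.q[i];
    return 2.0 * std::acos(std::min(1.0, std::fabs(d)));
}

// Recompute the cluster center as the average of its members.
static void updateCenter(Cluster& c) {
    Pose avg = Pose();
    const Pose ref = c.members[0];
    for (std::size_t m = 0; m < c.members.size(); m++) {
        const Pose& p = c.members[m];
        double d = 0.0;
        for (int i = 0; i < 4; i++) d += p.q[i] * ref.q[i];
        const double sign = (d < 0.0) ? -1.0 : 1.0; // q and -q encode the same rotation
        for (int i = 0; i < 3; i++) avg.t[i] += p.t[i] / c.members.size();
        for (int i = 0; i < 4; i++) avg.q[i] += sign * p.q[i];
    }
    double n = 0.0;
    for (int i = 0; i < 4; i++) n += avg.q[i] * avg.q[i];
    n = std::sqrt(n);
    assert(n > 0.0);
    for (int i = 0; i < 4; i++) avg.q[i] /= n;
    avg.votes = c.totalVotes;
    c.center = avg;
}

// Greedy clustering: visit hypotheses by decreasing votes, join the first close cluster or open a new one.
static std::vector<Cluster> clusterPoses(std::vector<Pose> poses, double maxDist, double maxAngle) {
    std::sort(poses.begin(), poses.end(), moreVotes);
    std::vector<Cluster> clusters;
    for (std::size_t i = 0; i < poses.size(); i++) {
        bool assigned = false;
        for (std::size_t j = 0; j < clusters.size() && !assigned; j++) {
            if (translationDistance(poses[i], clusters[j].center) < maxDist &&
                rotationDistance(poses[i], clusters[j].center) < maxAngle) {
                clusters[j].members.push_back(poses[i]);
                clusters[j].totalVotes += poses[i].votes;
                updateCenter(clusters[j]);
                assigned = true;
            }
        }
        if (!assigned) {
            Cluster c;
            c.center = poses[i];
            c.members.push_back(poses[i]);
            c.totalVotes = poses[i].votes;
            clusters.push_back(c);
        }
    }
    std::sort(clusters.begin(), clusters.end(), moreTotalVotes); // most confident cluster first
    return clusters;
}
~~~
The first element of the returned vector is the cluster with the highest total number of votes,
whose center would then be handed to the refinement step.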
PPF presented above depends largely on robust computation of angles between 3D vectors. Even though
not reported in the paper, the naive way of doing this (\f$\theta = cos^{-1}({\bf{a}}\cdot{\bf{b}})\f$)
remains numerically unstable. A better way is to use the two-argument inverse tangent:
\f[<(\bf{n1},\bf{n2})=tan^{-1}(||{\bf{n1} \wedge \bf{n2}}||_2, \bf{n1} \cdot \bf{n2})\f]
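As a small illustration of the formula above, the angle can be computed with `std::atan2` from the
norm of the cross product and the dot product. The type and function names below are hypothetical
and chosen only for this sketch.
~~~{cpp}
#include <array>
#include <cassert>
#include <cmath>

typedef std::array<double, 3> Vec3;

static double dot3(const Vec3& a, const Vec3& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

static Vec3 cross3(const Vec3& a, const Vec3& b) {
    return Vec3{{a[1] * b[2] - a[2] * b[1],
                 a[2] * b[0] - a[0] * b[2],
                 a[0] * b[1] - a[1] * b[0]}};
}

static double norm3(const Vec3& a) { return std::sqrt(dot3(a, a)); }

// Angle in [0, pi] between a and b. The vectors do not need to be unit length,
// and the result stays well conditioned for nearly parallel vectors.
static double angleBetween(const Vec3& a, const Vec3& b) {
    assert(norm3(a) > 0.0 && norm3(b) > 0.0); // the angle is undefined for zero vectors
    return std::atan2(norm3(cross3(a, b)), dot3(a, b));
}
~~~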
Rough Computation of Object Pose Given PPF
------------------------------------------
Let me summarize the following notation:
- \f$p^i_m\f$: \f$i^{th}\f$ point of the model (\f$p^j_m\f$ accordingly)
- \f$n^i_m\f$: Normal of the \f$i^{th}\f$ point of the model (\f$n^j_m\f$ accordingly)
- \f$p^i_s\f$: \f$i^{th}\f$ point of the scene (\f$p^j_s\f$ accordingly)
- \f$n^i_s\f$: Normal of the \f$i^{th}\f$ point of the scene (\f$n^j_s\f$ accordingly)
- \f$T_{m\rightarrow g}\f$: The transformation required to translate \f$p^i_m\f$ to the origin and rotate
its normal \f$n^i_m\f$ onto the \f$x\f$-axis.
- \f$R_{m\rightarrow g}\f$: Rotational component of \f$T_{m\rightarrow g}\f$.
- \f$t_{m\rightarrow g}\f$: Translational component of \f$T_{m\rightarrow g}\f$.
- \f$(p^i_m)^{'}\f$: \f$i^{th}\f$ point of the model transformed by \f$T_{m\rightarrow g}\f$. (\f$(p^j_m)^{'}\f$
accordingly).
- \f${\bf{R_{m\rightarrow g}}}\f$: Axis angle representation of rotation \f$R_{m\rightarrow g}\f$.
- \f$\theta_{m\rightarrow g}\f$: The angular component of the axis angle representation
\f${\bf{R_{m\rightarrow g}}}\f$.
The transformation in a point pair feature is computed by first finding the transformation
\f$T_{m\rightarrow g}\f$ from the first point, and applying the same transformation to the second one.
Transforming each point, together with the normal, to the ground plane leaves us with an angle to
find out, during a comparison with a new point pair.
We could now simply start writing
\f[(p^i_m)^{'} = T_{m\rightarrow g} p^i_m\f]
where
\f[T_{m\rightarrow g} = \left[\begin{array}{cc} R_{m\rightarrow g} & t_{m\rightarrow g} \\ {\bf{0}}^T & 1 \end{array}\right]\f]
Note that this is nothing but a stacked transformation acting on homogeneous coordinates. The translational component
\f$t_{m\rightarrow g}\f$ reads
\f[t_{m\rightarrow g} = -R_{m\rightarrow g}p^i_m\f]
and the rotational being
\f[\theta_{m\rightarrow g} = \cos^{-1}(n^i_m \cdot {\bf{x}})\\
{\bf{R_{m\rightarrow g}}} = n^i_m \wedge {\bf{x}}\f]
in axis angle format. Note that bold refers to the vector form. After this transformation, the
feature vectors of the model are registered onto the ground plane X and the angle with respect to
\f$x=0\f$ is called \f$\alpha_m\f$. Similarly, for the scene, it is called \f$\alpha_s\f$.
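A minimal sketch of this alignment is given below, assuming unit normals. It builds the rotation
from the axis and angle above with Rodrigues' formula, moves the reference point to the origin and
measures the angle of the second point about the x-axis. The helper names and the sign convention
chosen for the angle (`atan2(z, y)`) are assumptions of this sketch, not necessarily those of the
implementation.
~~~{cpp}
#include <array>
#include <cassert>
#include <cmath>

typedef std::array<double, 3> Vec3;
typedef std::array<Vec3, 3> Mat3; // row-major 3x3 matrix

static Vec3 mul(const Mat3& R, const Vec3& v) {
    Vec3 r;
    for (int i = 0; i < 3; i++)
        r[i] = R[i][0] * v[0] + R[i][1] * v[1] + R[i][2] * v[2];
    return r;
}

// Rotation that takes the unit normal n onto the x-axis (Rodrigues' formula).
static Mat3 rotationToXAxis(const Vec3& n) {
    assert(std::fabs(n[0] * n[0] + n[1] * n[1] + n[2] * n[2] - 1.0) < 1e-6);
    const double c = n[0];                                 // cos(theta) = n . x
    const double s = std::sqrt(n[1] * n[1] + n[2] * n[2]); // sin(theta) = |n ^ x|
    Mat3 R = {{ {{1.0, 0.0, 0.0}}, {{0.0, 1.0, 0.0}}, {{0.0, 0.0, 1.0}} }};
    if (s < 1e-12) {
        // n is parallel to x: identity, or a half turn about z when n = -x
        if (c < 0.0) { R[0][0] = -1.0; R[1][1] = -1.0; }
        return R;
    }
    const Vec3 k = {{0.0, n[2] / s, -n[1] / s}};           // unit axis n ^ x
    const Mat3 K = {{ {{0.0, -k[2], k[1]}}, {{k[2], 0.0, -k[0]}}, {{-k[1], k[0], 0.0}} }};
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++) {
            double kk = 0.0;                               // entry (i, j) of K * K
            for (int l = 0; l < 3; l++) kk += K[i][l] * K[l][j];
            R[i][j] += s * K[i][j] + (1.0 - c) * kk;
        }
    return R;
}

// Angle of the second point about the x-axis once (p1, n1) is moved to the ground frame.
static double computeAlpha(const Vec3& p1, const Vec3& n1, const Vec3& p2) {
    const Mat3 R = rotationToXAxis(n1);
    const Vec3 Rp1 = mul(R, p1); // the translation is t = -R * p1
    const Vec3 Rp2 = mul(R, p2);
    const Vec3 q = {{Rp2[0] - Rp1[0], Rp2[1] - Rp1[1], Rp2[2] - Rp1[2]}};
    return std::atan2(q[2], q[1]);
}
~~~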
### Hough-like Voting Scheme
As shown in the outline, PPF (point pair features) are extracted from the model, quantized, stored
in the hashtable and indexed during the training stage. At runtime, a similar
operation is performed on the input scene, with the exception that this time a similarity lookup over
the hashtable is performed instead of an insertion. This lookup also allows us to compute a
transformation to the ground plane for the scene pairs. After this point, computing the rotational
component of the pose reduces to computation of the difference \f$\alpha=\alpha_m-\alpha_s\f$. This
component carries the cue about the object pose. A Hough-like voting scheme is performed over the
local model coordinate vector and \f$\alpha\f$. The poses with the highest number of votes for every
scene point let us recover the object pose.
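The voting for a single scene reference point can be pictured as a 2D accumulator over the model
reference point index and the quantized angle. The sketch below is only illustrative: the `Vote`
and `Peak` structs, the function name and the number of angle bins are hypothetical.
~~~{cpp}
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vote { int modelRef; double alpha; };          // alpha = alpha_m - alpha_s, in radians
struct Peak { int modelRef; int alphaBin; int votes; };

// Accumulate votes over (model reference point, angle bin) and return the strongest cell.
static Peak accumulateVotes(const std::vector<Vote>& votes, int numModelPoints, int numAngleBins) {
    assert(numModelPoints > 0 && numAngleBins > 0);
    const double twoPi = 2.0 * std::acos(-1.0);
    std::vector<int> acc((std::size_t)numModelPoints * numAngleBins, 0);
    Peak best = {-1, -1, 0};
    for (std::size_t v = 0; v < votes.size(); v++) {
        const int m = votes[v].modelRef;
        assert(m >= 0 && m < numModelPoints);
        double a = std::fmod(votes[v].alpha, twoPi);  // wrap the angle into [0, 2*pi)
        if (a < 0.0) a += twoPi;
        int bin = (int)(a / twoPi * numAngleBins);
        if (bin >= numAngleBins) bin = numAngleBins - 1; // guard against rounding at 2*pi
        int& cell = acc[(std::size_t)m * numAngleBins + bin];
        ++cell;
        if (cell > best.votes) {
            best.modelRef = m;
            best.alphaBin = bin;
            best.votes = cell;
        }
    }
    return best;
}
~~~
The winning cell gives the matching model reference point and the rotation angle, which together
with the two ground plane transformations determine a pose hypothesis.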
### Source Code for PPF Matching
~~~{cpp}
// pc is the loaded point cloud of the model
// (Nx6) and pcTest is a loaded point cloud of
// the scene (Mx6)
ppf_match_3d::PPF3DDetector detector(0.03, 0.05);
detector.trainModel(pc);
vector<Pose3DPtr> results;
detector.match(pcTest, results, 1.0/10.0, 0.05);
cout << "Poses: " << endl;
// print the poses
for (size_t i=0; i<results.size(); i++)
{
    Pose3DPtr pose = results[i];
    cout << "Pose Result " << i << endl;
    pose->printPose();
}
~~~
Pose Registration via ICP
-------------------------
The matching process terminates with the attainment of the pose. However, due to multiple
matching points, erroneous hypotheses, pose averaging, etc., such a pose is very open to noise and
is often far from perfect. Although the visual results obtained at that stage are
pleasing, the quantitative evaluation shows \f$\sim 10\f$ degrees of variation (error), which is an
acceptable level of matching. Often, however, the requirement is set well beyond this margin and it is
desired to refine the computed pose.
Furthermore, in typical RGBD scenes and point clouds, the 3D structure can capture less than half
of the model due to the visibility in the scene. Therefore, a robust pose refinement algorithm
that can register occluded and partially visible shapes quickly and correctly is not an unrealistic
wish.
At this point, a trivial option would be to use the well-known iterative closest point (ICP) algorithm.
However, the basic ICP suffers from slow convergence, bad registration, outlier
sensitivity and failure to register partial shapes. Thus, it is definitely not suited to the
problem. For this reason, many variants have been proposed. Different variants contribute to
different stages of the pose estimation process.
ICP is composed of \f$6\f$ stages, and the improvements I propose for each stage are summarized below.
### Sampling
To improve convergence speed and computation time, it is common to use fewer points than the model
actually has. However, sampling the correct points to register is an issue in itself. The naive way
would be to sample uniformly and hope to get a reasonable subset. Smarter approaches try to identify
the critical points, which are found to contribute highly to the registration process. Gelfand et
al. exploit the covariance matrix in order to constrain the eigenspace, so that a set of points
which affect both translation and rotation are used. This is a clever way of subsampling, which I
optionally use in the implementation.
### Correspondence Search
As the name implies, this step is the assignment of the points in the data and the model in
a closest point fashion. Correct assignments will lead to a correct pose, whereas wrong assignments
strongly degrade the result. In general, KD-trees are used in the nearest neighbor search to
increase the speed. However, this does not guarantee optimality and often causes wrong points
to be matched. Luckily, the assignments are corrected over the iterations.
To overcome some of the limitations, Picky ICP @cite pickyicp and BC-ICP (ICP using bi-unique
correspondences) are two well-known methods. Picky ICP first finds the correspondences in the
old-fashioned way and then among the resulting corresponding pairs, if more than one scene point
\f$p_i\f$ is assigned to the same model point \f$m_j\f$, it selects \f$p_i\f$ that corresponds to the minimum
distance. BC-ICP on the other hand, allows multiple correspondences first and then resolves the
assignments by establishing bi-unique correspondences. It also defines a novel no-correspondence
outlier, which intrinsically eases the process of identifying outliers.
For reference, both methods are used. Because P-ICP is a bit faster, with a not-so-significant
performance drawback, it is the method of choice for the refinement of correspondences.
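The Picky ICP selection rule described above can be sketched as a single pass over the raw
nearest neighbor assignments. The `Match` struct and the function name are hypothetical and only
serve this illustration.
~~~{cpp}
#include <cassert>
#include <cstddef>
#include <vector>

// One raw correspondence: scene point index, its nearest model point index, their distance.
struct Match { int sceneIdx; int modelIdx; double dist; };

// When several scene points share a model point, keep only the closest scene point.
static std::vector<Match> pickyFilter(const std::vector<Match>& matches, int numModelPoints) {
    std::vector<int> best(numModelPoints, -1); // index into matches of the best candidate
    for (std::size_t i = 0; i < matches.size(); i++) {
        const int m = matches[i].modelIdx;
        assert(m >= 0 && m < numModelPoints);
        if (best[m] < 0 || matches[i].dist < matches[best[m]].dist)
            best[m] = (int)i;
    }
    std::vector<Match> kept;
    for (int m = 0; m < numModelPoints; m++)
        if (best[m] >= 0) kept.push_back(matches[best[m]]);
    return kept;
}
~~~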
### Weighting of Pairs
In my implementation, I currently do not use a weighting scheme. But the common approaches involve
*normal compatibility* (\f$w_i=n^1_i\cdot n^2_j\f$) or assigning lower weights to point pairs with
greater distances (\f$w=1-\frac{||dist(m_i,s_i)||_2}{dist_{max}}\f$).
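For completeness, the two weighting formulas above translate directly into code. This is only a
sketch with hypothetical function names, since no weighting is used in the implementation.
~~~{cpp}
#include <array>
#include <cassert>

typedef std::array<double, 3> Vec3;

// Normal compatibility: dot product of the two unit normals of a pair.
static double normalCompatibilityWeight(const Vec3& n1, const Vec3& n2) {
    return n1[0] * n2[0] + n1[1] * n2[1] + n1[2] * n2[2];
}

// Distance based weight: 1 for coincident points, 0 for the farthest accepted pair.
static double distanceWeight(double dist, double distMax) {
    assert(distMax > 0.0 && dist >= 0.0 && dist <= distMax);
    return 1.0 - dist / distMax;
}
~~~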
### Rejection of Pairs
The rejections are done using a dynamic thresholding based on a robust estimate of the standard
deviation. In other words, in each iteration, I compute the MAD (median absolute deviation)
estimate of the standard deviation, denoted
\f$mad_i\f$. I reject the pairs with distances \f$d_i>\tau mad_i\f$. Here \f$\tau\f$ is the threshold of
rejection and is set to \f$3\f$ by default. The rejection is applied prior to the Picky refinement
explained in the correspondence search stage.
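A minimal sketch of such a MAD based rejection follows. The function names are hypothetical, and
the factor 1.4826 is the usual constant that makes the MAD a consistent estimate of the standard
deviation for Gaussian data. The threshold follows the rule stated in the text.
~~~{cpp}
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

static double median(std::vector<double> v) {
    assert(!v.empty());
    std::sort(v.begin(), v.end());
    const std::size_t n = v.size();
    return (n % 2) ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}

// Robust estimate of the standard deviation from the median absolute deviation.
static double madSigma(const std::vector<double>& d) {
    const double med = median(d);
    std::vector<double> dev(d.size());
    for (std::size_t i = 0; i < d.size(); i++) dev[i] = std::fabs(d[i] - med);
    return 1.4826 * median(dev);
}

// keep[i] is false for pairs whose distance exceeds tau times the robust sigma.
static std::vector<bool> keepMask(const std::vector<double>& d, double tau) {
    const double threshold = tau * madSigma(d);
    std::vector<bool> keep(d.size());
    for (std::size_t i = 0; i < d.size(); i++) keep[i] = (d[i] <= threshold);
    return keep;
}
~~~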
### Error Metric
A linearization of the point-to-plane error metric, as in @cite koklimlow, is used. This
both speeds up the registration process and improves convergence.
### Minimization
Even though many non-linear optimizers (such as Levenberg-Marquardt) have been proposed, due to the
linearization in the previous step, pose estimation reduces to solving a linear system of equations.
This is exactly what I do, using cv::solve with the DECOMP_SVD option.
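To make the linear system concrete, the sketch below assembles it under the usual small-angle
linearization of the point-to-plane error: for a source point s, a destination point d and the
destination normal n, the row of A is (s ^ n, n) and the entry of b is n . (d - s), with the
unknowns ordered as three small rotation angles followed by the translation. The struct and
function names are hypothetical, and the solver itself is deliberately left out, so that the
stacked A and b can be passed to any least squares routine.
~~~{cpp}
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::array<double, 3> Vec3;
typedef std::array<double, 6> Row6;

struct LinearSystem {
    std::vector<Row6> A;   // one row per correspondence
    std::vector<double> b; // one right hand side entry per correspondence
};

// Unknowns are x = (alpha, beta, gamma, tx, ty, tz); the system reads A x = b in the least squares sense.
static LinearSystem buildPointToPlaneSystem(const std::vector<Vec3>& src,
                                            const std::vector<Vec3>& dst,
                                            const std::vector<Vec3>& dstNormals) {
    assert(src.size() == dst.size() && dst.size() == dstNormals.size());
    LinearSystem sys;
    for (std::size_t i = 0; i < src.size(); i++) {
        const Vec3& s = src[i];
        const Vec3& d = dst[i];
        const Vec3& n = dstNormals[i];
        const Row6 row = {{s[1] * n[2] - s[2] * n[1],   // (s ^ n)_x
                           s[2] * n[0] - s[0] * n[2],   // (s ^ n)_y
                           s[0] * n[1] - s[1] * n[0],   // (s ^ n)_z
                           n[0], n[1], n[2]}};
        sys.A.push_back(row);
        sys.b.push_back(n[0] * (d[0] - s[0]) + n[1] * (d[1] - s[1]) + n[2] * (d[2] - s[2]));
    }
    return sys;
}
~~~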
### ICP Algorithm
Having described the steps above, here I summarize the layout of the ICP algorithm.
#### Efficient ICP Through Point Cloud Pyramids
While the variants proposed so far deal well with some outliers and bad initializations, they
require a significant number of iterations. A multi-resolution scheme can help reduce the number
of iterations by allowing the registration to start from a coarse level and propagate to the lower
and finer levels. Such an approach improves both the registration quality and the runtime.
The search is done through multiple levels, in a hierarchical fashion. The registration starts with
a very coarse set of samples of the model. Iteratively, the points are densified and sought. After
each iteration, the previously estimated pose is used as an initial pose and refined with the ICP.
#### Visual Results
##### Results on Synthetic Data
In all of the results, the pose is initialized by PPF and the rest is left as:
\f$[\theta_x, \theta_y, \theta_z, t_x, t_y, t_z]=[0]\f$
### Source Code for Pose Refinement Using ICP
~~~{cpp}
ICP icp(200, 0.001f, 2.5f, 8);
// Using the previously declared pc and pcTest
// This will perform registration for every pose
// contained in results
icp.registerModelToScene(pc, pcTest, results);
// results now contains the refined poses
~~~
Results
-------
This section is dedicated to the results of surface matching (point-pair-feature matching and a
following ICP refinement):
![Several matches of a single frog model using ppf + icp](surface_matching/pics/gsoc_forg_matches.jpg)
Matches of different models for the Mian dataset are presented below:
![Matches of different models for Mian dataset](surface_matching/pics/snapshot27.jpg)
You can check out the video on [YouTube here](http://www.youtube.com/watch?v=uFnqLFznuZU).
A Complete Sample
-----------------
### Parameter Tuning
The surface matching module treats its parameters relative to the model diameter (diameter of the
axis parallel bounding box) whenever it can. This makes the parameters independent of the model size.
This is why both the model and the scene cloud are subsampled such that all points have a minimum
distance of \f$RelativeSamplingStep*DimensionRange\f$, where \f$DimensionRange\f$ is the distance along
a given dimension. All three dimensions are sampled in a similar manner. For example, if
\f$RelativeSamplingStep\f$ is set to 0.05 and the diameter of the model is 1m (1000mm), the points sampled
from the object's surface will be approximately 50 mm apart. From another point of view, if
\f$RelativeSamplingStep\f$ is set to 0.05, at most \f$20 \times 20 \times 20 = 8000\f$ model points are
generated (depending on how the model fills in the volume). Consequently, this results in at most
\f$8000 \times 8000\f$ pairs. In practice, because the models are not uniformly distributed over a
rectangular prism, far fewer points are to be expected. Decreasing this value results in more model
points and thus a more accurate representation. However, note that the number of point pair features
to be computed then increases quadratically, as the complexity is \f$O(N^2)\f$. This is especially a
concern for 32 bit systems, where large models can easily overshoot the available memory. Typically,
values in the range of 0.025 - 0.05 seem adequate for most applications, where the default value is 0.03.
(Note that this parameter differs from the one presented in @cite drost2010. In
@cite drost2010 a uniform cuboid is used for quantization and the model diameter is used as the
reference for sampling. In my implementation, the cuboid is a rectangular prism, and each dimension
is quantized independently. I do not take reference from the diameter but along the individual dimensions.)
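The arithmetic behind these bounds can be written down in a few lines. The function names below
are hypothetical and the numbers are upper bounds only, as the text explains.
~~~{cpp}
#include <cassert>
#include <cmath>

// Upper bound on the number of samples along one dimension of the bounding box.
static long long maxSamplesPerDimension(double relativeSamplingStep) {
    assert(relativeSamplingStep > 0.0 && relativeSamplingStep <= 1.0);
    // The small epsilon keeps a quotient such as 1/0.05 from being floored to 19 by rounding.
    return (long long)std::floor(1.0 / relativeSamplingStep + 1e-9);
}

// Upper bound on the number of sampled model points (a fully filled box).
static long long maxModelPoints(double relativeSamplingStep) {
    const long long n = maxSamplesPerDimension(relativeSamplingStep);
    return n * n * n;
}

// Upper bound on the number of point pairs, which grows quadratically with the point count.
static long long maxPointPairs(double relativeSamplingStep) {
    const long long n = maxModelPoints(relativeSamplingStep);
    return n * n;
}

// Approximate spacing of the samples along a dimension of the given extent.
static double sampleSpacing(double relativeSamplingStep, double dimensionRange) {
    return relativeSamplingStep * dimensionRange;
}
~~~
With a step of 0.05 this reproduces the figures quoted above: 8000 points, 8000 x 8000 pairs, and a
spacing of about 50 mm for a 1000 mm extent.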
It would be very wise to remove the outliers from the model and prepare an ideal model initially. This
is because the outliers directly affect the relative computations and degrade the matching
accuracy.
During the runtime stage, the scene is again sampled by \f$RelativeSamplingStep\f$, as described above.
However, this time only a portion of the scene points are used as reference. This portion is
controlled by the parameter \f$RelativeSceneSampleStep\f$, where
\f$SceneSampleStep = (int)(1.0/RelativeSceneSampleStep)\f$. In other words, if
\f$RelativeSceneSampleStep = 1.0/5.0\f$, the subsampled scene will once again be uniformly sampled to
1/5 of the number of points. The maximum value of this parameter is 1; increasing it
increases the stability but decreases the speed. Again, because of the initial scene-independent
relative sampling, fine tuning this parameter is not a big concern. This would only be an issue when
the model shape occupies a volume uniformly, or when the model shape is condensed in a tiny place
within the quantization volume (e.g. the octree representation would have too many empty cells).
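As an illustration of the formula for \f$SceneSampleStep\f$, the sketch below picks every
SceneSampleStep-th point of the already subsampled scene as a reference point. The function name
is hypothetical, and taking the points at a regular stride is an assumption of this sketch.
~~~{cpp}
#include <cassert>
#include <cstddef>
#include <vector>

// Indices of the scene points used as reference points during matching.
static std::vector<std::size_t> referencePointIndices(std::size_t numScenePoints,
                                                      double relativeSceneSampleStep) {
    assert(relativeSceneSampleStep > 0.0 && relativeSceneSampleStep <= 1.0);
    const int sceneSampleStep = (int)(1.0 / relativeSceneSampleStep); // at least 1
    std::vector<std::size_t> indices;
    for (std::size_t i = 0; i < numScenePoints; i += (std::size_t)sceneSampleStep)
        indices.push_back(i);
    return indices;
}
~~~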
\f$RelativeDistanceStep\f$ acts as a step of discretization over the hash table. The point pair features
are quantized to be mapped to the buckets of the hashtable. This discretization involves a
multiplication and a cast to integer. Adjusting RelativeDistanceStep in theory controls the
collision rate. Note that more collisions in the hashtable result in less accurate estimations.
Increasing this parameter increases the effect of quantization and starts to assign dissimilar point
pairs to the same bins. Reducing it, however, weakens the ability to group similar pairs.
Generally, because during the sampling stage, the training model points are selected uniformly with
a distance controlled by RelativeSamplingStep, RelativeDistanceStep is expected to equate to this
value. Yet again, values in the range of 0.025-0.05 are sensible. This time however, when the model
is dense, it is not advised to decrease this value. For noisy scenes, the value can be increased to
improve the robustness of the matching against noisy points.
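The quantization itself can be sketched as below. The `PPF` and `Key` structs, the function name
and the separate `angleStep` argument are hypothetical, and `distanceStep` is assumed to be the
absolute step, that is RelativeDistanceStep scaled by the model size.
~~~{cpp}
#include <cassert>

struct PPF { double dist, a1, a2, a3; }; // pair distance and the three angles (radians)
struct Key { int d, a1, a2, a3; };       // quantized feature used to index the hash table

// Quantize a feature: multiply by the inverse step and cast to integer.
static Key quantize(const PPF& f, double distanceStep, double angleStep) {
    assert(distanceStep > 0.0 && angleStep > 0.0);
    const double invDist = 1.0 / distanceStep;
    const double invAngle = 1.0 / angleStep;
    const Key k = {(int)(f.dist * invDist), (int)(f.a1 * invAngle),
                   (int)(f.a2 * invAngle), (int)(f.a3 * invAngle)};
    return k;
}
~~~
Two pairs whose features fall into the same cell in all four components receive the same key, so
a larger step merges more pairs into one bucket and a smaller step separates them.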
*/
#endif

View File

@@ -35,15 +35,12 @@
// and on any theory of liability, whether in contract, strict liability,
// or tort (including negligence or otherwise) arising in any way out of
// the use of this software, even if advised of the possibility of such damage.
//
// Author: Tolga Birdal <tbirdal AT gmail.com>
/**
* @file icp.hpp
* @file
*
* @brief Implementation of ICP (Iterative Closest Point) Algorithm
* @author Tolga Birdal
* @author Tolga Birdal <tbirdal AT gmail.com>
*/
#ifndef __OPENCV_SURFACE_MATCHING_ICP_HPP__
@@ -58,8 +55,11 @@ namespace cv
{
namespace ppf_match_3d
{
//! @addtogroup surface_matching
//! @{
/**
* @class ICP
* @brief This class implements a very efficient and robust variant of the iterative closest point (ICP) algorithm.
* The task is to register a 3D model (or point cloud) against a set of noisy target data. The variants are put together
* by myself after certain tests. The task is to be able to match partial, noisy point clouds in cluttered scenes, quickly.
@@ -161,6 +161,8 @@ private:
};
//! @}
} // namespace ppf_match_3d
} // namespace cv

View File

@@ -35,8 +35,10 @@
// and on any theory of liability, whether in contract, strict liability,
// or tort (including negligence or otherwise) arising in any way out of
// the use of this software, even if advised of the possibility of such damage.
//
// Author: Tolga Birdal <tbirdal AT gmail.com>
/** @file
@author Tolga Birdal <tbirdal AT gmail.com>
*/
#ifndef __OPENCV_SURFACE_MATCHING_POSE3D_HPP__
#define __OPENCV_SURFACE_MATCHING_POSE3D_HPP__
@@ -50,6 +52,9 @@ namespace cv
namespace ppf_match_3d
{
//! @addtogroup surface_matching
//! @{
class Pose3D;
typedef Ptr<Pose3D> Pose3DPtr;
@@ -57,7 +62,6 @@ class PoseCluster3D;
typedef Ptr<PoseCluster3D> PoseCluster3DPtr;
/**
* @class Pose3D
* @brief Class, allowing the storage of a pose. The data structure stores both
* the quaternions and the matrix forms. It supports IO functionality together with
* various helper methods to work with poses
@@ -127,7 +131,6 @@ public:
};
/**
* @class PoseCluster3D
* @brief When multiple poses (see Pose3D) are grouped together (contribute to the same transformation)
* pose clusters occur. This class is a general container for such groups of poses. It is possible to store,
* load and perform IO on these poses.
@@ -176,6 +179,7 @@ public:
int id;
};
//! @}
} // namespace ppf_match_3d
} // namespace cv

View File

@@ -36,7 +36,10 @@
// or tort (including negligence or otherwise) arising in any way out of
// the use of this software, even if advised of the possibility of such damage.
//
// Author: Tolga Birdal <tbirdal AT gmail.com>
/** @file
@author Tolga Birdal <tbirdal AT gmail.com>
*/
#ifndef __OPENCV_SURFACE_MATCHING_HELPERS_HPP__
#define __OPENCV_SURFACE_MATCHING_HELPERS_HPP__
@@ -48,6 +51,9 @@ namespace cv
namespace ppf_match_3d
{
//! @addtogroup surface_matching
//! @{
/**
* @brief Load a PLY file
* @param [in] fileName The PLY model to read
@@ -140,6 +146,9 @@ CV_EXPORTS Mat addNoisePC(Mat pc, double scale);
* @return Returns 0 on success
*/
CV_EXPORTS int computeNormalsPC3d(const Mat& PC, Mat& PCNormals, const int NumNeighbors, const bool FlipViewpoint, const double viewpoint[3]);
//! @}
} // namespace ppf_match_3d
} // namespace cv

View File

@@ -50,7 +50,10 @@
Model Globally, Match Locally: Efficient and Robust 3D Object Recognition
IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, California (USA), June 2010.
***/
// Author: Tolga Birdal <tbirdal AT gmail.com>
/** @file
@author Tolga Birdal <tbirdal AT gmail.com>
*/
#ifndef __OPENCV_SURFACE_MATCHING_PPF_MATCH_3D_HPP__
@@ -67,8 +70,10 @@ namespace cv
namespace ppf_match_3d
{
//! @addtogroup surface_matching
//! @{
/**
* @struct THash
* @brief Struct, holding a node in the hashtable
*/
typedef struct THash
@@ -78,17 +83,16 @@ typedef struct THash
} THash;
/**
* @class PPF3DDetector
* @brief Class, allowing the load and matching 3D models.
* Typical Use:
*
* @code
* // Train a model
* ppf_match_3d::PPF3DDetector detector(0.05, 0.05);
* ppf_match_3d::PPF3DDetector detector(0.05, 0.05);
* detector.trainModel(pc);
* // Search the model in a given scene
* vector<Pose3DPtr> results;
* vector<Pose3DPtr> results;
* detector.match(pcTest, results, 1.0/5.0,0.05);
*
* @endcode
*/
class CV_EXPORTS PPF3DDetector
{
@@ -167,6 +171,8 @@ private:
bool trained;
};
//! @}
} // namespace ppf_match_3d
} // namespace cv

View File

@@ -36,7 +36,10 @@
// or tort (including negligence or otherwise) arising in any way out of
// the use of this software, even if advised of the possibility of such damage.
//
// Author: Tolga Birdal <tbirdal AT gmail.com>
/** @file
@author Tolga Birdal <tbirdal AT gmail.com>
*/
#ifndef __OPENCV_SURFACE_MATCHING_T_HASH_INT_HPP__
#define __OPENCV_SURFACE_MATCHING_T_HASH_INT_HPP__
@@ -49,6 +52,9 @@ namespace cv
namespace ppf_match_3d
{
//! @addtogroup surface_matching
//! @{
typedef unsigned int KeyType;
typedef struct hashnode_i
@@ -66,10 +72,12 @@ typedef struct HSHTBL_i
} hashtable_int;
/** @brief Round up to the next highest power of 2
from http://www-graphics.stanford.edu/~seander/bithacks.html
*/
inline static unsigned int next_power_of_two(unsigned int value)
{
/* Round up to the next highest power of 2 */
/* from http://www-graphics.stanford.edu/~seander/bithacks.html */
--value;
value |= value >> 1;
@@ -95,6 +103,8 @@ hashtable_int *hashtableRead(FILE* f);
int hashtableWrite(const hashtable_int * hashtbl, const size_t dataSize, FILE* f);
void hashtablePrint(hashtable_int *hashtbl);
//! @}
} // namespace ppf_match_3d
} // namespace cv