Mirror of https://github.com/PCRE2Project/pcre2.git (synced 2025-10-22 16:08:34 +08:00)
Documentation update for new PCRE2_EXTRA caseless and ASCII options
@@ -118,21 +118,22 @@ and \B, because they are defined in terms of \w and \W. If you want
 to test for a wider sense of, say, "digit", you can use explicit Unicode
 property tests such as \p{Nd}. Alternatively, if you set the PCRE2_UCP option,
 the way that the character escapes work is changed so that Unicode properties
-are used to determine which characters match. There are more details in the
-section on
+are used to determine which characters match, though there are some options
+that suppress this for individual escapes. For details see the section on
 <a href="pcre2pattern.html#genericchartypes">generic character types</a>
 in the
 <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
 documentation.
 </P>
 <P>
-Similarly, characters that match the POSIX named character classes are all
-low-valued characters, unless the PCRE2_UCP option is set.
+Like the escapes, characters that match the POSIX named character classes are
+all low-valued characters unless the PCRE2_UCP option is set, but there is an
+option to override this.
 </P>
 <P>
-However, the special horizontal and vertical white space matching escapes (\h,
-\H, \v, and \V) do match all the appropriate Unicode characters, whether or
-not PCRE2_UCP is set.
+In contrast to the character escapes and character classes, the special
+horizontal and vertical white space escapes (\h, \H, \v, and \V) do match
+all the appropriate Unicode characters, whether or not PCRE2_UCP is set.
 </P>
 <br><b>
 UNICODE CASE-EQUIVALENCE
@@ -145,6 +146,14 @@ lookup is used for speed. A few Unicode characters such as Greek sigma have
 more than two code points that are case-equivalent, and these are treated
 specially. Setting PCRE2_UCP without PCRE2_UTF allows Unicode-style case
 processing for non-UTF character encodings such as UCS-2.
+</P>
+<P>
+There are two ASCII characters (S and K) that, in addition to their ASCII lower
+case equivalents, have a non-ASCII one as well (long S and Kelvin sign).
+Recognition of these non-ASCII characters as case-equivalent to their ASCII
+counterparts can be disabled by setting the PCRE2_EXTRA_CASELESS_RESTRICT
+option. When this is set, all characters in a case equivalence must either be
+ASCII or non-ASCII; there can be no mixing.
 <a name="scriptruns"></a></P>
 <br><b>
 SCRIPT RUNS
@@ -501,7 +510,7 @@ Cambridge, England.
 REVISION
 </b><br>
 <P>
-Last updated: 20 January 2023
+Last updated: 04 February 2023
 <br>
 Copyright © 1997-2023 University of Cambridge.
 <br>
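
Note on the second hunk: the no-mixing rule that the new text attributes to PCRE2_EXTRA_CASELESS_RESTRICT can be illustrated outside PCRE2. The Python sketch below is not PCRE2 code and does not call its API. It assumes that Python's str.casefold() is a reasonable stand-in for Unicode case-equivalence, and the names case_equivalents() and restrict are hypothetical, introduced only for this illustration.

```python
import sys


def case_equivalents(ch, restrict=False):
    """Return the set of characters that case-fold to the same value as ch.

    With restrict=True, drop every member that is on the other side of the
    ASCII / non-ASCII boundary from ch, so that a group never mixes the two.
    This mimics the rule described in the documentation text above; it is an
    illustration, not PCRE2's implementation.
    """
    folded = ch.casefold()
    # Brute-force scan of every code point; slow but self-contained.
    group = {
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if chr(cp).casefold() == folded
    }
    if restrict:
        ch_is_ascii = ord(ch) < 128
        group = {c for c in group if (ord(c) < 128) == ch_is_ascii}
    return group


KELVIN_SIGN = "\u212a"  # K
LONG_S = "\u017f"       # ſ

if __name__ == "__main__":
    for ch in ("k", "s", KELVIN_SIGN, LONG_S):
        plain = sorted(case_equivalents(ch))
        limited = sorted(case_equivalents(ch, restrict=True))
        print(f"U+{ord(ch):04X}: unrestricted={plain} restricted={limited}")
```

Under these assumptions, the unrestricted groups for "k" and "s" each contain three characters (the two ASCII cases plus the Kelvin sign or long S), while the restricted groups keep only the ASCII pair, and the Kelvin sign and long S are left equivalent only to themselves.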