Documentation update for new PCRE2_EXTRA caseless and ASCII options

Philip Hazel
2023-02-04 17:19:56 +00:00
parent 9c905ce0c1
commit 6bf8045997
18 changed files with 2797 additions and 2538 deletions


@@ -118,21 +118,22 @@ and \B, because they are defined in terms of \w and \W. If you want
to test for a wider sense of, say, "digit", you can use explicit Unicode
property tests such as \p{Nd}. Alternatively, if you set the PCRE2_UCP option,
the way that the character escapes work is changed so that Unicode properties
-are used to determine which characters match. There are more details in the
-section on
+are used to determine which characters match, though there are some options
+that suppress this for individual escapes. For details see the section on
<a href="pcre2pattern.html#genericchartypes">generic character types</a>
in the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
documentation.
</P>
<P>
-Similarly, characters that match the POSIX named character classes are all
-low-valued characters, unless the PCRE2_UCP option is set.
+Like the escapes, characters that match the POSIX named character classes are
+all low-valued characters unless the PCRE2_UCP option is set, but there is an
+option to override this.
</P>
<P>
-However, the special horizontal and vertical white space matching escapes (\h,
-\H, \v, and \V) do match all the appropriate Unicode characters, whether or
-not PCRE2_UCP is set.
+In contrast to the character escapes and character classes, the special
+horizontal and vertical white space escapes (\h, \H, \v, and \V) do match
+all the appropriate Unicode characters, whether or not PCRE2_UCP is set.
</P>
<br><b>
UNICODE CASE-EQUIVALENCE
@@ -145,6 +146,14 @@ lookup is used for speed. A few Unicode characters such as Greek sigma have
more than two code points that are case-equivalent, and these are treated
specially. Setting PCRE2_UCP without PCRE2_UTF allows Unicode-style case
processing for non-UTF character encodings such as UCS-2.
+</P>
+<P>
+There are two ASCII characters (S and K) that, in addition to their ASCII lower
+case equivalents, have a non-ASCII one as well (long S and Kelvin sign).
+Recognition of these non-ASCII characters as case-equivalent to their ASCII
+counterparts can be disabled by setting the PCRE2_EXTRA_CASELESS_RESTRICT
+option. When this is set, all characters in a case equivalence must either be
+ASCII or non-ASCII; there can be no mixing.
<a name="scriptruns"></a></P>
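The mixed ASCII/non-ASCII case equivalences described in the added paragraph can be seen in a minimal sketch. It uses Python's standard re module rather than PCRE2, so it is only an analogue: the helper caseless_match and its ascii_only parameter are hypothetical names chosen for illustration, and re.ASCII is a coarser switch than PCRE2_EXTRA_CASELESS_RESTRICT, because it turns off case folding for non-ASCII characters altogether instead of only forbidding mixed equivalences.

```python
import re

KELVIN = "\u212a"   # KELVIN SIGN, case-equivalent to ASCII K/k
LONG_S = "\u017f"   # LATIN SMALL LETTER LONG S, case-equivalent to ASCII S/s


def caseless_match(pattern, text, ascii_only=False):
    """Return True if the whole of text matches pattern caselessly.

    ascii_only is a hypothetical switch for this sketch: it adds re.ASCII,
    which limits case folding to the ASCII letters.
    """
    flags = re.IGNORECASE | (re.ASCII if ascii_only else 0)
    return re.fullmatch(pattern, text, flags) is not None


# Unicode case mapping ties the two non-ASCII characters to ASCII letters.
unicode_pairs = (KELVIN.lower(), LONG_S.upper())

# Unrestricted caseless matching mixes ASCII and non-ASCII characters.
mixed = (caseless_match("k", KELVIN), caseless_match("s", LONG_S))

# With ASCII-only folding, the ASCII letters no longer match them.
restricted = (
    caseless_match("k", KELVIN, ascii_only=True),
    caseless_match("s", LONG_S, ascii_only=True),
)
```

In this sketch, the values in mixed correspond to the default behaviour the paragraph describes, and the values in restricted correspond to what a pattern author would want when the Kelvin sign or long S must not be accepted in place of K or S.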
<br><b>
SCRIPT RUNS
@@ -501,7 +510,7 @@ Cambridge, England.
REVISION
</b><br>
<P>
-Last updated: 20 January 2023
+Last updated: 04 February 2023
<br>
Copyright &copy; 1997-2023 University of Cambridge.
<br>