Update HTML and plain text documentation

Philip Hazel
2023-08-11 19:09:17 +01:00
parent 5974a84364
commit 8314be909f
8 changed files with 989 additions and 972 deletions

@@ -271,7 +271,7 @@ library. They are also documented in the pcre2build man page.
performance in the 8-bit and 16-bit libraries. In the 32-bit library, the performance in the 8-bit and 16-bit libraries. In the 32-bit library, the
link size setting is ignored, as 4-byte offsets are always used. link size setting is ignored, as 4-byte offsets are always used.
, Lookbehind assertions in which one or more branches can match a variable . Lookbehind assertions in which one or more branches can match a variable
number of characters are supported only if there is a maximum matching length number of characters are supported only if there is a maximum matching length
for each top-level branch. There is a limit to this maximum that defaults to for each top-level branch. There is a limit to this maximum that defaults to
255 characters. You can alter this default by a setting such as 255 characters. You can alter this default by a setting such as
@@ -933,4 +933,4 @@ The distribution should contain the files listed below.
Philip Hazel Philip Hazel
Email local part: Philip.Hazel Email local part: Philip.Hazel
Email domain: gmail.com Email domain: gmail.com
Last updated: 09 August 2023 Last updated: 11 August 2023

@@ -16,10 +16,10 @@ please consult the man page, in case the conversion went wrong.
DIFFERENCES BETWEEN PCRE2 AND PERL DIFFERENCES BETWEEN PCRE2 AND PERL
</b><br> </b><br>
<P> <P>
This document describes some of the differences in the ways that PCRE2 and Perl This document describes some of the known differences in the ways that PCRE2
handle regular expressions. The differences described here are with respect to and Perl handle regular expressions. The differences described here are with
Perl version 5.34.0, but as both Perl and PCRE2 are continually changing, the respect to Perl version 5.34.0, but as both Perl and PCRE2 are continually
information may at times be out of date. changing, the information may at times be out of date.
</P> </P>
<P> <P>
1. When PCRE2_DOTALL (equivalent to Perl's /s qualifier) is not set, the 1. When PCRE2_DOTALL (equivalent to Perl's /s qualifier) is not set, the
@@ -173,59 +173,48 @@ of which (such as named parentheses) were in PCRE2 for some time before. This
list is with respect to Perl 5.34: list is with respect to Perl 5.34:
<br> <br>
<br> <br>
(a) Although lookbehind assertions in PCRE2 must match fixed length strings, (a) If PCRE2_DOLLAR_ENDONLY is set and PCRE2_MULTILINE is not set, the $
each alternative toplevel branch of a lookbehind assertion can match a
different length of string. Perl used to require them all to have the same
length, but the latest version has some variable length support.
<br>
<br>
(b) From PCRE2 10.23, backreferences to groups of fixed length are supported
in lookbehinds, provided that there is no possibility of referencing a
non-unique number or name. Perl does not support backreferences in lookbehinds.
<br>
<br>
(c) If PCRE2_DOLLAR_ENDONLY is set and PCRE2_MULTILINE is not set, the $
meta-character matches only at the very end of the string. meta-character matches only at the very end of the string.
<br> <br>
<br> <br>
(d) A backslash followed by a letter with no special meaning is faulted. (Perl (b) A backslash followed by a letter with no special meaning is faulted. (Perl
can be made to issue a warning.) can be made to issue a warning.)
<br> <br>
<br> <br>
(e) If PCRE2_UNGREEDY is set, the greediness of the repetition quantifiers is (c) If PCRE2_UNGREEDY is set, the greediness of the repetition quantifiers is
inverted, that is, by default they are not greedy, but if followed by a inverted, that is, by default they are not greedy, but if followed by a
question mark they are. question mark they are.
<br> <br>
<br> <br>
(f) PCRE2_ANCHORED can be used at matching time to force a pattern to be tried (d) PCRE2_ANCHORED can be used at matching time to force a pattern to be tried
only at the first matching position in the subject string. only at the first matching position in the subject string.
<br> <br>
<br> <br>
(g) The PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY and PCRE2_NOTEMPTY_ATSTART (e) The PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY and PCRE2_NOTEMPTY_ATSTART
options have no Perl equivalents. options have no Perl equivalents.
<br> <br>
<br> <br>
(h) The \R escape sequence can be restricted to match only CR, LF, or CRLF (f) The \R escape sequence can be restricted to match only CR, LF, or CRLF
by the PCRE2_BSR_ANYCRLF option. by the PCRE2_BSR_ANYCRLF option.
<br> <br>
<br> <br>
(i) The callout facility is PCRE2-specific. Perl supports codeblocks and (g) The callout facility is PCRE2-specific. Perl supports codeblocks and
variable interpolation, but not general hooks on every match. variable interpolation, but not general hooks on every match.
<br> <br>
<br> <br>
(j) The partial matching facility is PCRE2-specific. (h) The partial matching facility is PCRE2-specific.
<br> <br>
<br> <br>
(k) The alternative matching function (<b>pcre2_dfa_match()</b> matches in a (i) The alternative matching function (<b>pcre2_dfa_match()</b> matches in a
different way and is not Perl-compatible. different way and is not Perl-compatible.
<br> <br>
<br> <br>
(l) PCRE2 recognizes some special sequences such as (*CR) or (*NO_JIT) at (j) PCRE2 recognizes some special sequences such as (*CR) or (*NO_JIT) at
the start of a pattern. These set overall options that cannot be changed within the start of a pattern. These set overall options that cannot be changed within
the pattern. the pattern.
<br> <br>
<br> <br>
(m) PCRE2 supports non-atomic positive lookaround assertions. This is an (k) PCRE2 supports non-atomic positive lookaround assertions. This is an
extension to the lookaround facilities. The default, Perl-compatible extension to the lookaround facilities. The default, Perl-compatible
lookarounds are atomic. lookarounds are atomic.
</P> </P>
@@ -252,7 +241,7 @@ Cambridge, England.
REVISION REVISION
</b><br> </b><br>
<P> <P>
Last updated: 03 February 2023 Last updated: 11 August 2023
<br> <br>
Copyright &copy; 1997-2023 University of Cambridge. Copyright &copy; 1997-2023 University of Cambridge.
<br> <br>
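The PCRE2_DOLLAR_ENDONLY entry in the list above is easy to misread. The sketch below models, in plain C, where a non-multiline $ may match with and without that option. `dollar_matches_at()` is a hypothetical helper, not part of the PCRE2 API, and it assumes the newline convention is LF.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model (not part of the PCRE2 API) of where $ may match when
 * PCRE2_MULTILINE is not set, assuming the newline convention is LF.
 *   endonly == false: $ matches at the very end of the subject, or just
 *                     before a newline that is the last character.
 *   endonly == true:  $ matches only at the very end (PCRE2_DOLLAR_ENDONLY).
 */
bool dollar_matches_at(const char *subject, size_t length, size_t pos,
                       bool endonly)
{
    if (pos == length)
        return true;                  /* very end of the subject */
    if (endonly)
        return false;                 /* DOLLAR_ENDONLY: nowhere else */
    /* Default behaviour: also just before a final newline. */
    return length > 0 && pos == length - 1 && subject[pos] == '\n';
}
```

For the subject "abc\n", position 3 (before the newline) satisfies $ by default but not with PCRE2_DOLLAR_ENDONLY; position 4 satisfies it in both cases.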

@@ -47,7 +47,12 @@ and unset offsets.
All values in repeating quantifiers must be less than 65536. All values in repeating quantifiers must be less than 65536.
</P> </P>
<P> <P>
The maximum length of a lookbehind assertion is 65535 characters. There are two different limits that apply to branches of lookbehind assertions.
If every branch in such an assertion matches a fixed number of characters,
the maximum length of any branch is 65535 characters. If any branch matches a
variable number of characters, then the maximum matching length for every
branch is limited. The default limit is set at compile time, defaulting to 255,
but can be changed by the calling program.
</P> </P>
<P> <P>
There is no limit to the number of parenthesized groups, but there can be no There is no limit to the number of parenthesized groups, but there can be no
@@ -91,9 +96,9 @@ Cambridge, England.
REVISION REVISION
</b><br> </b><br>
<P> <P>
Last updated: 26 July 2022 Last updated: August 2023
<br> <br>
Copyright &copy; 1997-2022 University of Cambridge. Copyright &copy; 1997-2023 University of Cambridge.
<br> <br>
<p> <p>
Return to the <a href="index.html">PCRE2 index page</a>. Return to the <a href="index.html">PCRE2 index page</a>.
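The revised pcre2limits text distinguishes two limits on lookbehind branches. The sketch below encodes that rule as a standalone check over a hypothetical description of each top-level branch (its minimum and maximum length in characters). The names `lb_branch`, `BRANCH_UNBOUNDED` and `lookbehind_allowed()` are illustrative only; PCRE2 performs the equivalent checks itself when a pattern is compiled, and a calling program changes the variable limit with `pcre2_set_max_varlookbehind()`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel for a branch with no maximum, e.g. one using \d*. */
#define BRANCH_UNBOUNDED UINT32_MAX
/* Per-branch limit when every branch matches a fixed number of characters. */
#define FIXED_BRANCH_LIMIT 65535u

/* Hypothetical description of one top-level lookbehind branch. */
typedef struct {
    uint32_t min_len;   /* minimum number of characters matched */
    uint32_t max_len;   /* maximum number, or BRANCH_UNBOUNDED */
} lb_branch;

/*
 * Hypothetical checker (not the PCRE2 implementation) for the documented
 * rule: if every branch is fixed length, each may be up to 65535 characters;
 * if any branch is variable, every branch's maximum must not exceed
 * max_varlookbehind (default 255). Unbounded branches are never accepted.
 */
bool lookbehind_allowed(const lb_branch *branches, size_t n,
                        uint32_t max_varlookbehind)
{
    bool any_variable = false;

    for (size_t i = 0; i < n; i++) {
        if (branches[i].max_len == BRANCH_UNBOUNDED)
            return false;             /* unlimited repetition not supported */
        if (branches[i].min_len != branches[i].max_len)
            any_variable = true;      /* this branch has a variable length */
    }

    uint32_t limit = any_variable ? max_varlookbehind : FIXED_BRANCH_LIMIT;
    for (size_t i = 0; i < n; i++)
        if (branches[i].max_len > limit)
            return false;
    return true;
}
```

For example, (?<=colou?r) has one variable branch with a maximum of 6 characters, so it is accepted under the default limit of 255, whereas a branch such as a{1,300} would need the limit raised to at least 300. Note that once any branch is variable, a fixed 300-character branch alongside it is also rejected at the default limit.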

@@ -2452,10 +2452,10 @@ If every top-level alternative matches a fixed length, for example
<pre> <pre>
(?&#60;=colour|color) (?&#60;=colour|color)
</pre> </pre>
there is no restriction on the lengths, which do not have to be the same, as there is a limit of 65535 characters to the lengths, which do not have to be
this example demonstrates. This is the only kind of lookbehind supported by the same, as this example demonstrates. This is the only kind of lookbehind
PCRE2 versions earlier than 10.43 and by the alternative matching function supported by PCRE2 versions earlier than 10.43 and by the alternative matching
<b>pcre2_dfa_match()</b>. function <b>pcre2_dfa_match()</b>.
</P> </P>
<P> <P>
In PCRE2 10.43 and later, <b>pcre2_match()</b> supports lookbehind assertions in In PCRE2 10.43 and later, <b>pcre2_match()</b> supports lookbehind assertions in
@@ -2464,9 +2464,9 @@ for example
<pre> <pre>
(?&#60;=colou?r) (?&#60;=colou?r)
</pre> </pre>
The maximum matching length for the whole lookbehind is limited to a value set The maximum matching length for any branch of the lookbehind is limited to a
by the calling program (default 255 characters). Unlimited repetition (for value set by the calling program (default 255 characters). Unlimited repetition
example \d*) is not supported. In some cases, the escape sequence \K (for example \d*) is not supported. In some cases, the escape sequence \K
<a href="#resetmatchstart">(see above)</a> <a href="#resetmatchstart">(see above)</a>
can be used instead of a lookbehind assertion at the start of a pattern to get can be used instead of a lookbehind assertion at the start of a pattern to get
round the length limit restriction. round the length limit restriction.
@@ -3781,7 +3781,7 @@ Cambridge, England.
</P> </P>
<br><a name="SEC32" href="#TOC1">REVISION</a><br> <br><a name="SEC32" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 09 August 2023 Last updated: 11 August 2023
<br> <br>
Copyright &copy; 1997-2023 University of Cambridge. Copyright &copy; 1997-2023 University of Cambridge.
<br> <br>
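The pcre2pattern hunk notes that the fixed-length branches of (?&lt;=colour|color) need not have the same length. One way to see why is that each branch can be checked by stepping back exactly its own length from the current position and comparing. The helper below is a hypothetical illustration of that idea for literal branches only; it is not how `pcre2_match()` is implemented internally.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of a fixed-length lookbehind with literal alternatives,
 * such as (?<=colour|color). Each branch has its own fixed length, so it can
 * be tested by stepping back that many characters from pos and comparing.
 * The lookbehind succeeds if any branch ends exactly at pos.
 */
bool lookbehind_alternatives_match(const char *subject, size_t pos,
                                   const char *const *branches, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(branches[i]);
        /* Skip branches longer than the text before pos. */
        if (len <= pos && memcmp(subject + pos - len, branches[i], len) == 0)
            return true;
    }
    return false;
}
```

With the branches "colour" and "color", the check succeeds just after either spelling, even though the two branches differ in length.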

@@ -463,8 +463,10 @@ setting with a similar syntax.
</pre> </pre>
Each top-level branch of a lookbehind must have a limit for the number of Each top-level branch of a lookbehind must have a limit for the number of
characters it matches. If any branch can match a variable number of characters, characters it matches. If any branch can match a variable number of characters,
the maximum for the entire lookbehind is limited. If every branch matches a the maximum for each branch is limited to a value set by the caller of
fixed number of characters, there is no constraint. <b>pcre2_compile()</b> or defaulted. The default is set when PCRE2 is built
(ultimate default 255). If every branch matches a fixed number of characters,
the limit for each branch is 65535 characters.
</P> </P>
<br><a name="SEC22" href="#TOC1">NON-ATOMIC LOOKAROUND ASSERTIONS</a><br> <br><a name="SEC22" href="#TOC1">NON-ATOMIC LOOKAROUND ASSERTIONS</a><br>
<P> <P>
@@ -602,7 +604,7 @@ Cambridge, England.
</P> </P>
<br><a name="SEC31" href="#TOC1">REVISION</a><br> <br><a name="SEC31" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 09 August 2023 Last updated: 11 August 2023
<br> <br>
Copyright &copy; 1997-2023 University of Cambridge. Copyright &copy; 1997-2023 University of Cambridge.
<br> <br>

@@ -696,11 +696,12 @@ heavily used in the test files.
jitfast use JIT fast path jitfast use JIT fast path
jitverify verify JIT use jitverify verify JIT use
locale=&#60;name&#62; use this locale locale=&#60;name&#62; use this locale
max_pattern_length=&#60;n&#62; set the maximum pattern length max_pattern_length=&#60;n&#62; set maximum pattern length
max_varlookbehind=&#60;n&#62; set maximum variable lookbehind length
memory show memory used memory show memory used
newline=&#60;type&#62; set newline type newline=&#60;type&#62; set newline type
null_context compile with a NULL context null_context compile with a NULL context
null_pattern pass pattern as NULL null_pattern pass pattern as NULL
parens_nest_limit=&#60;n&#62; set maximum parentheses depth parens_nest_limit=&#60;n&#62; set maximum parentheses depth
posix use the POSIX API posix use the POSIX API
posix_nosub use the POSIX API with REG_NOSUB posix_nosub use the POSIX API with REG_NOSUB
@@ -848,6 +849,17 @@ If <b>hex</b> or <b>use_length</b> is used with the POSIX wrapper API (see
below), the REG_PEND extension is used to pass the pattern's length. below), the REG_PEND extension is used to pass the pattern's length.
</P> </P>
<br><b> <br><b>
Specifying a maximum for variable lookbehinds
</b><br>
<P>
Variable lookbehind assertions are supported only if, for each one, there is a
maximum length (in characters) that it can match. There is a limit on this,
whose default can be set at build time, with an ultimate default of 255. The
<b>max_varlookbehind</b> modifier uses the <b>pcre2_set_max_varlookbehind()</b>
function to change the limit. Lookbehinds whose branches each match a fixed
length are limited to 65535 characters per branch.
</P>
<br><b>
Specifying wide characters in 16-bit and 32-bit modes Specifying wide characters in 16-bit and 32-bit modes
</b><br> </b><br>
<P> <P>
@@ -2181,7 +2193,7 @@ Cambridge, England.
</P> </P>
<br><a name="SEC21" href="#TOC1">REVISION</a><br> <br><a name="SEC21" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 17 July 2023 Last updated: 11 August 2023
<br> <br>
Copyright &copy; 1997-2023 University of Cambridge. Copyright &copy; 1997-2023 University of Cambridge.
<br> <br>

File diff suppressed because it is too large

File diff suppressed because it is too large