Update HTML and plain text documentation

Philip Hazel
2023-08-11 19:09:17 +01:00
parent 5974a84364
commit 8314be909f
8 changed files with 989 additions and 972 deletions

@@ -271,7 +271,7 @@ library. They are also documented in the pcre2build man page.
performance in the 8-bit and 16-bit libraries. In the 32-bit library, the
link size setting is ignored, as 4-byte offsets are always used.
, Lookbehind assertions in which one or more branches can match a variable
. Lookbehind assertions in which one or more branches can match a variable
number of characters are supported only if there is a maximum matching length
for each top-level branch. There is a limit to this maximum that defaults to
255 characters. You can alter this default by a setting such as
@@ -933,4 +933,4 @@ The distribution should contain the files listed below.
Philip Hazel
Email local part: Philip.Hazel
Email domain: gmail.com
Last updated: 09 August 2023
Last updated: 11 August 2023

@@ -16,10 +16,10 @@ please consult the man page, in case the conversion went wrong.
DIFFERENCES BETWEEN PCRE2 AND PERL
</b><br>
<P>
This document describes some of the differences in the ways that PCRE2 and Perl
handle regular expressions. The differences described here are with respect to
Perl version 5.34.0, but as both Perl and PCRE2 are continually changing, the
information may at times be out of date.
This document describes some of the known differences in the ways that PCRE2
and Perl handle regular expressions. The differences described here are with
respect to Perl version 5.34.0, but as both Perl and PCRE2 are continually
changing, the information may at times be out of date.
</P>
<P>
1. When PCRE2_DOTALL (equivalent to Perl's /s qualifier) is not set, the
@@ -173,59 +173,48 @@ of which (such as named parentheses) were in PCRE2 for some time before. This
list is with respect to Perl 5.34:
<br>
<br>
(a) Although lookbehind assertions in PCRE2 must match fixed length strings,
each alternative toplevel branch of a lookbehind assertion can match a
different length of string. Perl used to require them all to have the same
length, but the latest version has some variable length support.
<br>
<br>
(b) From PCRE2 10.23, backreferences to groups of fixed length are supported
in lookbehinds, provided that there is no possibility of referencing a
non-unique number or name. Perl does not support backreferences in lookbehinds.
<br>
<br>
(c) If PCRE2_DOLLAR_ENDONLY is set and PCRE2_MULTILINE is not set, the $
(a) If PCRE2_DOLLAR_ENDONLY is set and PCRE2_MULTILINE is not set, the $
meta-character matches only at the very end of the string.
<br>
<br>
(d) A backslash followed by a letter with no special meaning is faulted. (Perl
(b) A backslash followed by a letter with no special meaning is faulted. (Perl
can be made to issue a warning.)
<br>
<br>
(e) If PCRE2_UNGREEDY is set, the greediness of the repetition quantifiers is
(c) If PCRE2_UNGREEDY is set, the greediness of the repetition quantifiers is
inverted, that is, by default they are not greedy, but if followed by a
question mark they are.
<br>
<br>
(f) PCRE2_ANCHORED can be used at matching time to force a pattern to be tried
(d) PCRE2_ANCHORED can be used at matching time to force a pattern to be tried
only at the first matching position in the subject string.
<br>
<br>
(g) The PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY and PCRE2_NOTEMPTY_ATSTART
(e) The PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY and PCRE2_NOTEMPTY_ATSTART
options have no Perl equivalents.
<br>
<br>
(h) The \R escape sequence can be restricted to match only CR, LF, or CRLF
(f) The \R escape sequence can be restricted to match only CR, LF, or CRLF
by the PCRE2_BSR_ANYCRLF option.
<br>
<br>
(i) The callout facility is PCRE2-specific. Perl supports codeblocks and
(g) The callout facility is PCRE2-specific. Perl supports codeblocks and
variable interpolation, but not general hooks on every match.
<br>
<br>
(j) The partial matching facility is PCRE2-specific.
(h) The partial matching facility is PCRE2-specific.
<br>
<br>
(k) The alternative matching function (<b>pcre2_dfa_match()</b> matches in a
(i) The alternative matching function (<b>pcre2_dfa_match()</b> matches in a
different way and is not Perl-compatible.
<br>
<br>
(l) PCRE2 recognizes some special sequences such as (*CR) or (*NO_JIT) at
(j) PCRE2 recognizes some special sequences such as (*CR) or (*NO_JIT) at
the start of a pattern. These set overall options that cannot be changed within
the pattern.
<br>
<br>
(m) PCRE2 supports non-atomic positive lookaround assertions. This is an
(k) PCRE2 supports non-atomic positive lookaround assertions. This is an
extension to the lookaround facilities. The default, Perl-compatible
lookarounds are atomic.
</P>
@@ -252,7 +241,7 @@ Cambridge, England.
REVISION
</b><br>
<P>
Last updated: 03 February 2023
Last updated: 11 August 2023
<br>
Copyright &copy; 1997-2023 University of Cambridge.
<br>

@@ -47,7 +47,12 @@ and unset offsets.
All values in repeating quantifiers must be less than 65536.
</P>
<P>
The maximum length of a lookbehind assertion is 65535 characters.
There are two different limits that apply to branches of lookbehind assertions.
If every branch in such an assertion matches a fixed number of characters,
the maximum length of any branch is 65535 characters. If any branch matches a
variable number of characters, then the maximum matching length for every
branch is limited. The default limit is set at compile time, defaulting to 255,
but can be changed by the calling program.
</P>
<P>
There is no limit to the number of parenthesized groups, but there can be no
@@ -91,9 +96,9 @@ Cambridge, England.
REVISION
</b><br>
<P>
Last updated: 26 July 2022
Last updated: August 2023
<br>
Copyright &copy; 1997-2022 University of Cambridge.
Copyright &copy; 1997-2023 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
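
Note on the hunk above: the two lookbehind limits it describes can be sketched as a small decision rule. This is only a toy model of the wording in the documentation, not PCRE2's implementation. The function name `lookbehind_allowed` and the `(min_len, max_len)` tuple representation of a branch are assumptions made for illustration.

```python
# Toy model of the two documented lookbehind limits (not PCRE2 code).
FIXED_BRANCH_LIMIT = 65535       # per-branch limit when every branch is fixed length
DEFAULT_MAX_VARLOOKBEHIND = 255  # documented ultimate default for the variable case


def lookbehind_allowed(branches, max_varlookbehind=DEFAULT_MAX_VARLOOKBEHIND):
    """Return True if a lookbehind with these top-level branches fits the limits.

    branches is a list of (min_len, max_len) pairs in characters, one pair per
    top-level branch. A max_len of None stands for an unbounded branch.
    """
    # Unlimited repetition (for example \d*) has no maximum, so it is rejected.
    if any(hi is None for _, hi in branches):
        return False

    # If every branch matches a fixed number of characters, the larger limit
    # applies. If any branch is variable, the smaller limit applies to all.
    all_fixed = all(lo == hi for lo, hi in branches)
    limit = FIXED_BRANCH_LIMIT if all_fixed else max_varlookbehind
    return all(hi <= limit for _, hi in branches)


if __name__ == "__main__":
    print(lookbehind_allowed([(6, 6), (5, 5)]))  # like (?<=colour|color)
    print(lookbehind_allowed([(5, 6)]))          # like (?<=colou?r)
    print(lookbehind_allowed([(1, 300)]))        # variable, above the default
```

In this model a caller-supplied `max_varlookbehind` plays the role of the value that the documentation says the calling program can set. It only affects lookbehinds that have at least one variable-length branch.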

@@ -2452,10 +2452,10 @@ If every top-level alternative matches a fixed length, for example
<pre>
(?&#60;=colour|color)
</pre>
there is no restriction on the lengths, which do not have to be the same, as
this example demonstrates. This is the only kind of lookbehind supported by
PCRE2 versions earlier than 10.43 and by the alternative matching function
<b>pcre2_dfa_match()</b>.
there is a limit of 65535 characters to the lengths, which do not have to be
the same, as this example demonstrates. This is the only kind of lookbehind
supported by PCRE2 versions earlier than 10.43 and by the alternative matching
function <b>pcre2_dfa_match()</b>.
</P>
<P>
In PCRE2 10.43 and later, <b>pcre2_match()</b> supports lookbehind assertions in
@@ -2464,9 +2464,9 @@ for example
<pre>
(?&#60;=colou?r)
</pre>
The maximum matching length for the whole lookbehind is limited to a value set
by the calling program (default 255 characters). Unlimited repetition (for
example \d*) is not supported. In some cases, the escape sequence \K
The maximum matching length for any branch of the lookbehind is limited to a
value set by the calling program (default 255 characters). Unlimited repetition
(for example \d*) is not supported. In some cases, the escape sequence \K
<a href="#resetmatchstart">(see above)</a>
can be used instead of a lookbehind assertion at the start of a pattern to get
round the length limit restriction.
@@ -3781,7 +3781,7 @@ Cambridge, England.
</P>
<br><a name="SEC32" href="#TOC1">REVISION</a><br>
<P>
Last updated: 09 August 2023
Last updated: 11 August 2023
<br>
Copyright &copy; 1997-2023 University of Cambridge.
<br>

@@ -463,8 +463,10 @@ setting with a similar syntax.
</pre>
Each top-level branch of a lookbehind must have a limit for the number of
characters it matches. If any branch can match a variable number of characters,
the maximum for the entire lookbehind is limited. If every branch matches a
fixed number of characters, there is no constraint.
the maximum for each branch is limited to a value set by the caller of
<b>pcre2_compile()</b> or defaulted. The default is set when PCRE2 is built
(ultimate default 255). If every branch matches a fixed number of characters,
the limit for each branch is 65535 characters.
</P>
<br><a name="SEC22" href="#TOC1">NON-ATOMIC LOOKAROUND ASSERTIONS</a><br>
<P>
@@ -602,7 +604,7 @@ Cambridge, England.
</P>
<br><a name="SEC31" href="#TOC1">REVISION</a><br>
<P>
Last updated: 09 August 2023
Last updated: 11 August 2023
<br>
Copyright &copy; 1997-2023 University of Cambridge.
<br>

@@ -696,7 +696,8 @@ heavily used in the test files.
jitfast use JIT fast path
jitverify verify JIT use
locale=&#60;name&#62; use this locale
max_pattern_length=&#60;n&#62; set the maximum pattern length
max_pattern_length=&#60;n&#62; set maximum pattern length
max_varlookbehind=&#60;n&#62; set maximum variable lookbehind length
memory show memory used
newline=&#60;type&#62; set newline type
null_context compile with a NULL context
@@ -848,6 +849,17 @@ If <b>hex</b> or <b>use_length</b> is used with the POSIX wrapper API (see
below), the REG_PEND extension is used to pass the pattern's length.
</P>
<br><b>
Specifying a maximum for variable lookbehinds
</b><br>
<P>
Variable lookbehind assertions are supported only if, for each one, there is a
maximum length (in characters) that it can match. There is a limit on this,
whose default can be set at build time, with an ultimate default of 255. The
<b>max_varlookbehind</b> modifier uses the <b>pcre2_set_max_varlookbehind()</b>
function to change the limit. Lookbehinds whose branches each match a fixed
length are limited to 65535 characters per branch.
</P>
<br><b>
Specifying wide characters in 16-bit and 32-bit modes
</b><br>
<P>
@@ -2181,7 +2193,7 @@ Cambridge, England.
</P>
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
<P>
Last updated: 17 July 2023
Last updated: 11 August 2023
<br>
Copyright &copy; 1997-2023 University of Cambridge.
<br>

@@ -5033,10 +5033,11 @@ NAME
DIFFERENCES BETWEEN PCRE2 AND PERL
This document describes some of the differences in the ways that PCRE2
and Perl handle regular expressions. The differences described here are
with respect to Perl version 5.34.0, but as both Perl and PCRE2 are
continually changing, the information may at times be out of date.
This document describes some of the known differences in the ways that
PCRE2 and Perl handle regular expressions. The differences described
here are with respect to Perl version 5.34.0, but as both Perl and
PCRE2 are continually changing, the information may at times be out of
date.
1. When PCRE2_DOTALL (equivalent to Perl's /s qualifier) is not set,
the behaviour of the '.' metacharacter differs from Perl. In PCRE2, '.'
@@ -5174,49 +5175,38 @@ DIFFERENCES BETWEEN PCRE2 AND PERL
versions of Perl, some of which (such as named parentheses) were in
PCRE2 for some time before. This list is with respect to Perl 5.34:
(a) Although lookbehind assertions in PCRE2 must match fixed length
strings, each alternative toplevel branch of a lookbehind assertion can
match a different length of string. Perl used to require them all to
have the same length, but the latest version has some variable length
support.
(b) From PCRE2 10.23, backreferences to groups of fixed length are sup-
ported in lookbehinds, provided that there is no possibility of refer-
encing a non-unique number or name. Perl does not support backrefer-
ences in lookbehinds.
(c) If PCRE2_DOLLAR_ENDONLY is set and PCRE2_MULTILINE is not set, the
(a) If PCRE2_DOLLAR_ENDONLY is set and PCRE2_MULTILINE is not set, the
$ meta-character matches only at the very end of the string.
(d) A backslash followed by a letter with no special meaning is
(b) A backslash followed by a letter with no special meaning is
faulted. (Perl can be made to issue a warning.)
(e) If PCRE2_UNGREEDY is set, the greediness of the repetition quanti-
(c) If PCRE2_UNGREEDY is set, the greediness of the repetition quanti-
fiers is inverted, that is, by default they are not greedy, but if fol-
lowed by a question mark they are.
(f) PCRE2_ANCHORED can be used at matching time to force a pattern to
(d) PCRE2_ANCHORED can be used at matching time to force a pattern to
be tried only at the first matching position in the subject string.
(g) The PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY and
(e) The PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY and
PCRE2_NOTEMPTY_ATSTART options have no Perl equivalents.
(h) The \R escape sequence can be restricted to match only CR, LF, or
(f) The \R escape sequence can be restricted to match only CR, LF, or
CRLF by the PCRE2_BSR_ANYCRLF option.
(i) The callout facility is PCRE2-specific. Perl supports codeblocks
(g) The callout facility is PCRE2-specific. Perl supports codeblocks
and variable interpolation, but not general hooks on every match.
(j) The partial matching facility is PCRE2-specific.
(h) The partial matching facility is PCRE2-specific.
(k) The alternative matching function (pcre2_dfa_match() matches in a
(i) The alternative matching function (pcre2_dfa_match() matches in a
different way and is not Perl-compatible.
(l) PCRE2 recognizes some special sequences such as (*CR) or (*NO_JIT)
(j) PCRE2 recognizes some special sequences such as (*CR) or (*NO_JIT)
at the start of a pattern. These set overall options that cannot be
changed within the pattern.
(m) PCRE2 supports non-atomic positive lookaround assertions. This is
(k) PCRE2 supports non-atomic positive lookaround assertions. This is
an extension to the lookaround facilities. The default, Perl-compatible
lookarounds are atomic.
@@ -5237,11 +5227,11 @@ AUTHOR
REVISION
Last updated: 03 February 2023
Last updated: 11 August 2023
Copyright (c) 1997-2023 University of Cambridge.
PCRE2 10.43 03 February 2023 PCRE2COMPAT(3)
PCRE2 10.43 11 August 2023 PCRE2COMPAT(3)
------------------------------------------------------------------------------
@@ -5726,7 +5716,13 @@ SIZE AND OTHER LIMITATIONS
All values in repeating quantifiers must be less than 65536.
The maximum length of a lookbehind assertion is 65535 characters.
There are two different limits that apply to branches of lookbehind as-
sertions. If every branch in such an assertion matches a fixed number
of characters, the maximum length of any branch is 65535 characters. If
any branch matches a variable number of characters, then the maximum
matching length for every branch is limited. The default limit is set
at compile time, defaulting to 255, but can be changed by the calling
program.
There is no limit to the number of parenthesized groups, but there can
be no more than 65535 capture groups, and there is a limit to the depth
@@ -5760,11 +5756,11 @@ AUTHOR
REVISION
Last updated: 26 July 2022
Copyright (c) 1997-2022 University of Cambridge.
Last updated: August 2023
Copyright (c) 1997-2023 University of Cambridge.
PCRE2 10.41 26 July 2022 PCRE2LIMITS(3)
PCRE2 10.43 1 August 2023 PCRE2LIMITS(3)
------------------------------------------------------------------------------
@@ -8625,10 +8621,10 @@ ASSERTIONS
(?<=colour|color)
there is no restriction on the lengths, which do not have to be the
same, as this example demonstrates. This is the only kind of lookbehind
supported by PCRE2 versions earlier than 10.43 and by the alternative
matching function pcre2_dfa_match().
there is a limit of 65535 characters to the lengths, which do not have
to be the same, as this example demonstrates. This is the only kind of
lookbehind supported by PCRE2 versions earlier than 10.43 and by the
alternative matching function pcre2_dfa_match().
In PCRE2 10.43 and later, pcre2_match() supports lookbehind assertions
in which one or more top-level alternatives can match more than one
@@ -8636,12 +8632,12 @@ ASSERTIONS
(?<=colou?r)
The maximum matching length for the whole lookbehind is limited to a
value set by the calling program (default 255 characters). Unlimited
repetition (for example \d*) is not supported. In some cases, the es-
cape sequence \K (see above) can be used instead of a lookbehind asser-
tion at the start of a pattern to get round the length limit restric-
tion.
The maximum matching length for any branch of the lookbehind is limited
to a value set by the calling program (default 255 characters). Unlim-
ited repetition (for example \d*) is not supported. In some cases, the
escape sequence \K (see above) can be used instead of a lookbehind as-
sertion at the start of a pattern to get round the length limit re-
striction.
In UTF-8 and UTF-16 modes, PCRE2 does not allow the \C escape (which
matches a single code unit even in a UTF mode) to appear in lookbehind
@@ -9872,11 +9868,11 @@ AUTHOR
REVISION
Last updated: 09 August 2023
Last updated: 11 August 2023
Copyright (c) 1997-2023 University of Cambridge.
PCRE2 10.43 09 August 2023 PCRE2PATTERN(3)
PCRE2 10.43 11 August 2023 PCRE2PATTERN(3)
------------------------------------------------------------------------------
@@ -11177,8 +11173,10 @@ LOOKAHEAD AND LOOKBEHIND ASSERTIONS
Each top-level branch of a lookbehind must have a limit for the number
of characters it matches. If any branch can match a variable number of
characters, the maximum for the entire lookbehind is limited. If every
branch matches a fixed number of characters, there is no constraint.
characters, the maximum for each branch is limited to a value set by
the caller of pcre2_compile() or defaulted. The default is set when
PCRE2 is built (ultimate default 255). If every branch matches a fixed
number of characters, the limit for each branch is 65535 characters.
NON-ATOMIC LOOKAROUND ASSERTIONS
@@ -11315,11 +11313,11 @@ AUTHOR
REVISION
Last updated: 09 August 2023
Last updated: 11 August 2023
Copyright (c) 1997-2023 University of Cambridge.
PCRE2 10.43 09 August 2023 PCRE2SYNTAX(3)
PCRE2 10.43 11 August 2023 PCRE2SYNTAX(3)
------------------------------------------------------------------------------

@@ -629,7 +629,8 @@ PATTERN MODIFIERS
jitfast use JIT fast path
jitverify verify JIT use
locale=<name> use this locale
max_pattern_length=<n> set the maximum pattern length
max_pattern_length=<n> set maximum pattern length
max_varlookbehind=<n> set maximum variable lookbehind length
memory show memory used
newline=<type> set newline type
null_context compile with a NULL context
@@ -766,6 +767,16 @@ PATTERN MODIFIERS
POSIX wrapper API" below), the REG_PEND extension is used to pass the
pattern's length.
Specifying a maximum for variable lookbehinds
Variable lookbehind assertions are supported only if, for each one,
there is a maximum length (in characters) that it can match. There is a
limit on this, whose default can be set at build time, with an ultimate
default of 255. The max_varlookbehind modifier uses the
pcre2_set_max_varlookbehind() function to change the limit. Lookbehinds
whose branches each match a fixed length are limited to 65535 charac-
ters per branch.
Specifying wide characters in 16-bit and 32-bit modes
In 16-bit and 32-bit modes, all input is automatically treated as UTF-8
@@ -1985,8 +1996,8 @@ AUTHOR
REVISION
Last updated: 17 July 2023
Last updated: 11 August 2023
Copyright (c) 1997-2023 University of Cambridge.
PCRE 10.43 17 July 2023 PCRE2TEST(1)
PCRE 10.43 11 August PCRE2TEST(1)