Mirror of https://github.com/PCRE2Project/pcre2.git (synced 2025-10-20 12:55:08 +08:00)
Update HTML and plain text documentation
@@ -271,7 +271,7 @@ library. They are also documented in the pcre2build man page.
|
|||||||
performance in the 8-bit and 16-bit libraries. In the 32-bit library, the
|
performance in the 8-bit and 16-bit libraries. In the 32-bit library, the
|
||||||
link size setting is ignored, as 4-byte offsets are always used.
|
link size setting is ignored, as 4-byte offsets are always used.
|
||||||
|
|
||||||
, Lookbehind assertions in which one or more branches can match a variable
|
. Lookbehind assertions in which one or more branches can match a variable
|
||||||
number of characters are supported only if there is a maximum matching length
|
number of characters are supported only if there is a maximum matching length
|
||||||
for each top-level branch. There is a limit to this maximum that defaults to
|
for each top-level branch. There is a limit to this maximum that defaults to
|
||||||
255 characters. You can alter this default by a setting such as
|
255 characters. You can alter this default by a setting such as
|
||||||
@@ -933,4 +933,4 @@ The distribution should contain the files listed below.
|
|||||||
Philip Hazel
|
Philip Hazel
|
||||||
Email local part: Philip.Hazel
|
Email local part: Philip.Hazel
|
||||||
Email domain: gmail.com
|
Email domain: gmail.com
|
||||||
Last updated: 09 August 2023
|
Last updated: 11 August 2023
|
||||||
|
@@ -16,10 +16,10 @@ please consult the man page, in case the conversion went wrong.
|
|||||||
DIFFERENCES BETWEEN PCRE2 AND PERL
|
DIFFERENCES BETWEEN PCRE2 AND PERL
|
||||||
</b><br>
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
This document describes some of the differences in the ways that PCRE2 and Perl
|
This document describes some of the known differences in the ways that PCRE2
|
||||||
handle regular expressions. The differences described here are with respect to
|
and Perl handle regular expressions. The differences described here are with
|
||||||
Perl version 5.34.0, but as both Perl and PCRE2 are continually changing, the
|
respect to Perl version 5.34.0, but as both Perl and PCRE2 are continually
|
||||||
information may at times be out of date.
|
changing, the information may at times be out of date.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
1. When PCRE2_DOTALL (equivalent to Perl's /s qualifier) is not set, the
|
1. When PCRE2_DOTALL (equivalent to Perl's /s qualifier) is not set, the
|
||||||
@@ -173,59 +173,48 @@ of which (such as named parentheses) were in PCRE2 for some time before. This
|
|||||||
list is with respect to Perl 5.34:
|
list is with respect to Perl 5.34:
|
||||||
<br>
|
<br>
|
||||||
<br>
|
<br>
|
||||||
(a) Although lookbehind assertions in PCRE2 must match fixed length strings,
|
(a) If PCRE2_DOLLAR_ENDONLY is set and PCRE2_MULTILINE is not set, the $
|
||||||
each alternative toplevel branch of a lookbehind assertion can match a
|
|
||||||
different length of string. Perl used to require them all to have the same
|
|
||||||
length, but the latest version has some variable length support.
|
|
||||||
<br>
|
|
||||||
<br>
|
|
||||||
(b) From PCRE2 10.23, backreferences to groups of fixed length are supported
|
|
||||||
in lookbehinds, provided that there is no possibility of referencing a
|
|
||||||
non-unique number or name. Perl does not support backreferences in lookbehinds.
|
|
||||||
<br>
|
|
||||||
<br>
|
|
||||||
(c) If PCRE2_DOLLAR_ENDONLY is set and PCRE2_MULTILINE is not set, the $
|
|
||||||
meta-character matches only at the very end of the string.
|
meta-character matches only at the very end of the string.
|
||||||
<br>
|
<br>
|
||||||
<br>
|
<br>
|
||||||
(d) A backslash followed by a letter with no special meaning is faulted. (Perl
|
(b) A backslash followed by a letter with no special meaning is faulted. (Perl
|
||||||
can be made to issue a warning.)
|
can be made to issue a warning.)
|
||||||
<br>
|
<br>
|
||||||
<br>
|
<br>
|
||||||
(e) If PCRE2_UNGREEDY is set, the greediness of the repetition quantifiers is
|
(c) If PCRE2_UNGREEDY is set, the greediness of the repetition quantifiers is
|
||||||
inverted, that is, by default they are not greedy, but if followed by a
|
inverted, that is, by default they are not greedy, but if followed by a
|
||||||
question mark they are.
|
question mark they are.
|
||||||
<br>
|
<br>
|
||||||
<br>
|
<br>
|
||||||
(f) PCRE2_ANCHORED can be used at matching time to force a pattern to be tried
|
(d) PCRE2_ANCHORED can be used at matching time to force a pattern to be tried
|
||||||
only at the first matching position in the subject string.
|
only at the first matching position in the subject string.
|
||||||
<br>
|
<br>
|
||||||
<br>
|
<br>
|
||||||
(g) The PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY and PCRE2_NOTEMPTY_ATSTART
|
(e) The PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY and PCRE2_NOTEMPTY_ATSTART
|
||||||
options have no Perl equivalents.
|
options have no Perl equivalents.
|
||||||
<br>
|
<br>
|
||||||
<br>
|
<br>
|
||||||
(h) The \R escape sequence can be restricted to match only CR, LF, or CRLF
|
(f) The \R escape sequence can be restricted to match only CR, LF, or CRLF
|
||||||
by the PCRE2_BSR_ANYCRLF option.
|
by the PCRE2_BSR_ANYCRLF option.
|
||||||
<br>
|
<br>
|
||||||
<br>
|
<br>
|
||||||
(i) The callout facility is PCRE2-specific. Perl supports codeblocks and
|
(g) The callout facility is PCRE2-specific. Perl supports codeblocks and
|
||||||
variable interpolation, but not general hooks on every match.
|
variable interpolation, but not general hooks on every match.
|
||||||
<br>
|
<br>
|
||||||
<br>
|
<br>
|
||||||
(j) The partial matching facility is PCRE2-specific.
|
(h) The partial matching facility is PCRE2-specific.
|
||||||
<br>
|
<br>
|
||||||
<br>
|
<br>
|
||||||
(k) The alternative matching function (<b>pcre2_dfa_match()</b> matches in a
|
(i) The alternative matching function (<b>pcre2_dfa_match()</b> matches in a
|
||||||
different way and is not Perl-compatible.
|
different way and is not Perl-compatible.
|
||||||
<br>
|
<br>
|
||||||
<br>
|
<br>
|
||||||
(l) PCRE2 recognizes some special sequences such as (*CR) or (*NO_JIT) at
|
(j) PCRE2 recognizes some special sequences such as (*CR) or (*NO_JIT) at
|
||||||
the start of a pattern. These set overall options that cannot be changed within
|
the start of a pattern. These set overall options that cannot be changed within
|
||||||
the pattern.
|
the pattern.
|
||||||
<br>
|
<br>
|
||||||
<br>
|
<br>
|
||||||
(m) PCRE2 supports non-atomic positive lookaround assertions. This is an
|
(k) PCRE2 supports non-atomic positive lookaround assertions. This is an
|
||||||
extension to the lookaround facilities. The default, Perl-compatible
|
extension to the lookaround facilities. The default, Perl-compatible
|
||||||
lookarounds are atomic.
|
lookarounds are atomic.
|
||||||
</P>
|
</P>
|
||||||
@@ -252,7 +241,7 @@ Cambridge, England.
|
|||||||
REVISION
|
REVISION
|
||||||
</b><br>
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 03 February 2023
|
Last updated: 11 August 2023
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2023 University of Cambridge.
|
Copyright © 1997-2023 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
@@ -47,7 +47,12 @@ and unset offsets.
|
|||||||
All values in repeating quantifiers must be less than 65536.
|
All values in repeating quantifiers must be less than 65536.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The maximum length of a lookbehind assertion is 65535 characters.
|
There are two different limits that apply to branches of lookbehind assertions.
|
||||||
|
If every branch in such an assertion matches a fixed number of characters,
|
||||||
|
the maximum length of any branch is 65535 characters. If any branch matches a
|
||||||
|
variable number of characters, then the maximum matching length for every
|
||||||
|
branch is limited. The default limit is set at compile time, defaulting to 255,
|
||||||
|
but can be changed by the calling program.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
There is no limit to the number of parenthesized groups, but there can be no
|
There is no limit to the number of parenthesized groups, but there can be no
|
||||||
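The two lookbehind limits described in the hunk above can be modelled with a short C sketch. This is only an illustration of the documented rule, not PCRE2's actual implementation: the names `lb_branch`, `lb_result`, and `lb_check` are hypothetical, and the sketch assumes the caller already knows the shortest and longest match length of each top-level branch.

```c
#include <stddef.h>
#include <stdint.h>

/* Limits quoted in the documentation text above. */
#define LB_FIXED_LIMIT     65535u /* per branch, when every branch is fixed length */
#define LB_DEFAULT_MAX_VAR 255u   /* default per-branch limit for variable lookbehinds */

/* Hypothetical description of one top-level lookbehind branch:
   its shortest and longest possible match, in characters. */
typedef struct {
    uint32_t min_len;
    uint32_t max_len;
} lb_branch;

typedef enum { LB_OK, LB_FIXED_TOO_LONG, LB_VARIABLE_TOO_LONG } lb_result;

/* Apply the documented rule: if any branch is variable length, every
   branch must fit within max_var; otherwise every branch must fit
   within the fixed-length limit of 65535 characters. */
lb_result lb_check(const lb_branch *branches, size_t count, uint32_t max_var)
{
    int any_variable = 0;

    for (size_t i = 0; i < count; i++) {
        if (branches[i].min_len != branches[i].max_len)
            any_variable = 1;
    }

    for (size_t i = 0; i < count; i++) {
        if (any_variable) {
            if (branches[i].max_len > max_var)
                return LB_VARIABLE_TOO_LONG;
        } else if (branches[i].max_len > LB_FIXED_LIMIT) {
            return LB_FIXED_TOO_LONG;
        }
    }
    return LB_OK;
}
```

Under this model, `(?<=colour|color)` is two fixed branches of 6 and 5 characters, so only the 65535 limit applies. `(?<=colou?r)` is one branch matching 5 to 6 characters, so it is checked against the variable limit, which the calling program can change from its default of 255.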
@@ -91,9 +96,9 @@ Cambridge, England.
|
|||||||
REVISION
|
REVISION
|
||||||
</b><br>
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 26 July 2022
|
Last updated: August 2023
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2022 University of Cambridge.
|
Copyright © 1997-2023 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
<p>
|
<p>
|
||||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
@@ -2452,10 +2452,10 @@ If every top-level alternative matches a fixed length, for example
|
|||||||
<pre>
|
<pre>
|
||||||
(?<=colour|color)
|
(?<=colour|color)
|
||||||
</pre>
|
</pre>
|
||||||
there is no restriction on the lengths, which do not have to be the same, as
|
there is a limit of 65535 characters to the lengths, which do not have to be
|
||||||
this example demonstrates. This is the only kind of lookbehind supported by
|
the same, as this example demonstrates. This is the only kind of lookbehind
|
||||||
PCRE2 versions earlier than 10.43 and by the alternative matching function
|
supported by PCRE2 versions earlier than 10.43 and by the alternative matching
|
||||||
<b>pcre2_dfa_match()</b>.
|
function <b>pcre2_dfa_match()</b>.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
In PCRE2 10.43 and later, <b>pcre2_match()</b> supports lookbehind assertions in
|
In PCRE2 10.43 and later, <b>pcre2_match()</b> supports lookbehind assertions in
|
||||||
@@ -2464,9 +2464,9 @@ for example
|
|||||||
<pre>
|
<pre>
|
||||||
(?<=colou?r)
|
(?<=colou?r)
|
||||||
</pre>
|
</pre>
|
||||||
The maximum matching length for the whole lookbehind is limited to a value set
|
The maximum matching length for any branch of the lookbehind is limited to a
|
||||||
by the calling program (default 255 characters). Unlimited repetition (for
|
value set by the calling program (default 255 characters). Unlimited repetition
|
||||||
example \d*) is not supported. In some cases, the escape sequence \K
|
(for example \d*) is not supported. In some cases, the escape sequence \K
|
||||||
<a href="#resetmatchstart">(see above)</a>
|
<a href="#resetmatchstart">(see above)</a>
|
||||||
can be used instead of a lookbehind assertion at the start of a pattern to get
|
can be used instead of a lookbehind assertion at the start of a pattern to get
|
||||||
round the length limit restriction.
|
round the length limit restriction.
|
||||||
@@ -3781,7 +3781,7 @@ Cambridge, England.
|
|||||||
</P>
|
</P>
|
||||||
<br><a name="SEC32" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC32" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 09 August 2023
|
Last updated: 11 August 2023
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2023 University of Cambridge.
|
Copyright © 1997-2023 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
@@ -463,8 +463,10 @@ setting with a similar syntax.
|
|||||||
</pre>
|
</pre>
|
||||||
Each top-level branch of a lookbehind must have a limit for the number of
|
Each top-level branch of a lookbehind must have a limit for the number of
|
||||||
characters it matches. If any branch can match a variable number of characters,
|
characters it matches. If any branch can match a variable number of characters,
|
||||||
the maximum for the entire lookbehind is limited. If every branch matches a
|
the maximum for each branch is limited to a value set by the caller of
|
||||||
fixed number of characters, there is no constraint.
|
<b>pcre2_compile()</b> or defaulted. The default is set when PCRE2 is built
|
||||||
|
(ultimate default 255). If every branch matches a fixed number of characters,
|
||||||
|
the limit for each branch is 65535 characters.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC22" href="#TOC1">NON-ATOMIC LOOKAROUND ASSERTIONS</a><br>
|
<br><a name="SEC22" href="#TOC1">NON-ATOMIC LOOKAROUND ASSERTIONS</a><br>
|
||||||
<P>
|
<P>
|
||||||
@@ -602,7 +604,7 @@ Cambridge, England.
|
|||||||
</P>
|
</P>
|
||||||
<br><a name="SEC31" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC31" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 09 August 2023
|
Last updated: 11 August 2023
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2023 University of Cambridge.
|
Copyright © 1997-2023 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
@@ -696,11 +696,12 @@ heavily used in the test files.
|
|||||||
jitfast use JIT fast path
|
jitfast use JIT fast path
|
||||||
jitverify verify JIT use
|
jitverify verify JIT use
|
||||||
locale=<name> use this locale
|
locale=<name> use this locale
|
||||||
max_pattern_length=<n> set the maximum pattern length
|
max_pattern_length=<n> set maximum pattern length
|
||||||
|
max_varlookbehind=<n> set maximum variable lookbehind length
|
||||||
memory show memory used
|
memory show memory used
|
||||||
newline=<type> set newline type
|
newline=<type> set newline type
|
||||||
null_context compile with a NULL context
|
null_context compile with a NULL context
|
||||||
null_pattern pass pattern as NULL
|
null_pattern pass pattern as NULL
|
||||||
parens_nest_limit=<n> set maximum parentheses depth
|
parens_nest_limit=<n> set maximum parentheses depth
|
||||||
posix use the POSIX API
|
posix use the POSIX API
|
||||||
posix_nosub use the POSIX API with REG_NOSUB
|
posix_nosub use the POSIX API with REG_NOSUB
|
||||||
@@ -848,6 +849,17 @@ If <b>hex</b> or <b>use_length</b> is used with the POSIX wrapper API (see
|
|||||||
below), the REG_PEND extension is used to pass the pattern's length.
|
below), the REG_PEND extension is used to pass the pattern's length.
|
||||||
</P>
|
</P>
|
||||||
<br><b>
|
<br><b>
|
||||||
|
Specifying a maximum for variable lookbehinds
|
||||||
|
</b><br>
|
||||||
|
<P>
|
||||||
|
Variable lookbehind assertions are supported only if, for each one, there is a
|
||||||
|
maximum length (in characters) that it can match. There is a limit on this,
|
||||||
|
whose default can be set at build time, with an ultimate default of 255. The
|
||||||
|
<b>max_varlookbehind</b> modifier uses the <b>pcre2_set_max_varlookbehind()</b>
|
||||||
|
function to change the limit. Lookbehinds whose branches each match a fixed
|
||||||
|
length are limited to 65535 characters per branch.
|
||||||
|
</P>
|
||||||
|
<br><b>
|
||||||
Specifying wide characters in 16-bit and 32-bit modes
|
Specifying wide characters in 16-bit and 32-bit modes
|
||||||
</b><br>
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
@@ -2181,7 +2193,7 @@ Cambridge, England.
|
|||||||
</P>
|
</P>
|
||||||
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 17 July 2023
|
Last updated: 11 August 2023
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2023 University of Cambridge.
|
Copyright © 1997-2023 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
1080
doc/pcre2.txt
File diff suppressed because it is too large