mirror of https://github.com/PCRE2Project/pcre2.git (synced 2025-10-20 12:55:08 +08:00)
Update HTML and plain text documentation
@@ -271,7 +271,7 @@ library. They are also documented in the pcre2build man page.
 performance in the 8-bit and 16-bit libraries. In the 32-bit library, the
 link size setting is ignored, as 4-byte offsets are always used.
 
-, Lookbehind assertions in which one or more branches can match a variable
+. Lookbehind assertions in which one or more branches can match a variable
 number of characters are supported only if there is a maximum matching length
 for each top-level branch. There is a limit to this maximum that defaults to
 255 characters. You can alter this default by a setting such as
@@ -933,4 +933,4 @@ The distribution should contain the files listed below.
 Philip Hazel
 Email local part: Philip.Hazel
 Email domain: gmail.com
-Last updated: 09 August 2023
+Last updated: 11 August 2023
|
@@ -16,10 +16,10 @@ please consult the man page, in case the conversion went wrong.
 DIFFERENCES BETWEEN PCRE2 AND PERL
 </b><br>
 <P>
-This document describes some of the differences in the ways that PCRE2 and Perl
-handle regular expressions. The differences described here are with respect to
-Perl version 5.34.0, but as both Perl and PCRE2 are continually changing, the
-information may at times be out of date.
+This document describes some of the known differences in the ways that PCRE2
+and Perl handle regular expressions. The differences described here are with
+respect to Perl version 5.34.0, but as both Perl and PCRE2 are continually
+changing, the information may at times be out of date.
 </P>
 <P>
 1. When PCRE2_DOTALL (equivalent to Perl's /s qualifier) is not set, the
@@ -173,59 +173,48 @@ of which (such as named parentheses) were in PCRE2 for some time before. This
 list is with respect to Perl 5.34:
 <br>
 <br>
-(a) Although lookbehind assertions in PCRE2 must match fixed length strings,
-each alternative toplevel branch of a lookbehind assertion can match a
-different length of string. Perl used to require them all to have the same
-length, but the latest version has some variable length support.
-<br>
-<br>
-(b) From PCRE2 10.23, backreferences to groups of fixed length are supported
-in lookbehinds, provided that there is no possibility of referencing a
-non-unique number or name. Perl does not support backreferences in lookbehinds.
-<br>
-<br>
-(c) If PCRE2_DOLLAR_ENDONLY is set and PCRE2_MULTILINE is not set, the $
+(a) If PCRE2_DOLLAR_ENDONLY is set and PCRE2_MULTILINE is not set, the $
 meta-character matches only at the very end of the string.
 <br>
 <br>
-(d) A backslash followed by a letter with no special meaning is faulted. (Perl
+(b) A backslash followed by a letter with no special meaning is faulted. (Perl
 can be made to issue a warning.)
 <br>
 <br>
-(e) If PCRE2_UNGREEDY is set, the greediness of the repetition quantifiers is
+(c) If PCRE2_UNGREEDY is set, the greediness of the repetition quantifiers is
 inverted, that is, by default they are not greedy, but if followed by a
 question mark they are.
 <br>
 <br>
-(f) PCRE2_ANCHORED can be used at matching time to force a pattern to be tried
+(d) PCRE2_ANCHORED can be used at matching time to force a pattern to be tried
 only at the first matching position in the subject string.
 <br>
 <br>
-(g) The PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY and PCRE2_NOTEMPTY_ATSTART
+(e) The PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY and PCRE2_NOTEMPTY_ATSTART
 options have no Perl equivalents.
 <br>
 <br>
-(h) The \R escape sequence can be restricted to match only CR, LF, or CRLF
+(f) The \R escape sequence can be restricted to match only CR, LF, or CRLF
 by the PCRE2_BSR_ANYCRLF option.
 <br>
 <br>
-(i) The callout facility is PCRE2-specific. Perl supports codeblocks and
+(g) The callout facility is PCRE2-specific. Perl supports codeblocks and
 variable interpolation, but not general hooks on every match.
 <br>
 <br>
-(j) The partial matching facility is PCRE2-specific.
+(h) The partial matching facility is PCRE2-specific.
 <br>
 <br>
-(k) The alternative matching function (<b>pcre2_dfa_match()</b> matches in a
+(i) The alternative matching function (<b>pcre2_dfa_match()</b> matches in a
 different way and is not Perl-compatible.
 <br>
 <br>
-(l) PCRE2 recognizes some special sequences such as (*CR) or (*NO_JIT) at
+(j) PCRE2 recognizes some special sequences such as (*CR) or (*NO_JIT) at
 the start of a pattern. These set overall options that cannot be changed within
 the pattern.
 <br>
 <br>
-(m) PCRE2 supports non-atomic positive lookaround assertions. This is an
+(k) PCRE2 supports non-atomic positive lookaround assertions. This is an
 extension to the lookaround facilities. The default, Perl-compatible
 lookarounds are atomic.
 </P>
@@ -252,7 +241,7 @@ Cambridge, England.
 REVISION
 </b><br>
 <P>
-Last updated: 03 February 2023
+Last updated: 11 August 2023
 <br>
 Copyright © 1997-2023 University of Cambridge.
 <br>
|
@@ -47,7 +47,12 @@ and unset offsets.
 All values in repeating quantifiers must be less than 65536.
 </P>
 <P>
-The maximum length of a lookbehind assertion is 65535 characters.
+There are two different limits that apply to branches of lookbehind assertions.
+If every branch in such an assertion matches a fixed number of characters,
+the maximum length of any branch is 65535 characters. If any branch matches a
+variable number of characters, then the maximum matching length for every
+branch is limited. The default limit is set at compile time, defaulting to 255,
+but can be changed by the calling program.
 </P>
 <P>
 There is no limit to the number of parenthesized groups, but there can be no
@@ -91,9 +96,9 @@ Cambridge, England.
 REVISION
 </b><br>
 <P>
-Last updated: 26 July 2022
+Last updated: August 2023
 <br>
-Copyright © 1997-2022 University of Cambridge.
+Copyright © 1997-2023 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
|
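The two limits introduced by the `@@ -47,7 +47,12 @@` hunk above can be restated as a small model for review purposes. The sketch below is illustrative Python, not PCRE2 code: the function name `check_lookbehind` and the representation of each top-level branch as a `(min_len, max_len)` pair are assumptions made for this example. Only the values 65535 and 255 come from the documentation text, and the rule that a single variable branch subjects every branch to the smaller limit follows the wording of that text.

```python
FIXED_BRANCH_LIMIT = 65535   # limit when every branch has a fixed length
DEFAULT_VAR_LIMIT = 255      # documented default when any branch is variable


def check_lookbehind(branches, max_varlookbehind=DEFAULT_VAR_LIMIT):
    """Return True if the branch lengths fit the documented limits.

    branches: list of (min_len, max_len) pairs in characters, one per
    top-level branch; max_len is None for an unbounded repeat.
    Hypothetical model only, not the PCRE2 implementation.
    """
    # A branch with no maximum matching length is never accepted.
    if any(max_len is None for _, max_len in branches):
        return False
    all_fixed = all(min_len == max_len for min_len, max_len in branches)
    # Fixed-only assertions use the large limit; otherwise every branch,
    # fixed or not, is held to the variable-lookbehind limit.
    limit = FIXED_BRANCH_LIMIT if all_fixed else max_varlookbehind
    return all(max_len <= limit for _, max_len in branches)


# (?<=colour|color): two fixed-length branches of 6 and 5 characters
print(check_lookbehind([(6, 6), (5, 5)]))
# (?<=colou?r): one branch matching 5 or 6 characters
print(check_lookbehind([(5, 6)]))
```

Under this model a 300-character fixed branch is accepted on its own but rejected once a sibling branch is variable, unless the caller raises `max_varlookbehind`.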
@@ -2452,10 +2452,10 @@ If every top-level alternative matches a fixed length, for example
 <pre>
 (?<=colour|color)
 </pre>
-there is no restriction on the lengths, which do not have to be the same, as
-this example demonstrates. This is the only kind of lookbehind supported by
-PCRE2 versions earlier than 10.43 and by the alternative matching function
-<b>pcre2_dfa_match()</b>.
+there is a limit of 65535 characters to the lengths, which do not have to be
+the same, as this example demonstrates. This is the only kind of lookbehind
+supported by PCRE2 versions earlier than 10.43 and by the alternative matching
+function <b>pcre2_dfa_match()</b>.
 </P>
 <P>
 In PCRE2 10.43 and later, <b>pcre2_match()</b> supports lookbehind assertions in
@@ -2464,9 +2464,9 @@ for example
 <pre>
 (?<=colou?r)
 </pre>
-The maximum matching length for the whole lookbehind is limited to a value set
-by the calling program (default 255 characters). Unlimited repetition (for
-example \d*) is not supported. In some cases, the escape sequence \K
+The maximum matching length for any branch of the lookbehind is limited to a
+value set by the calling program (default 255 characters). Unlimited repetition
+(for example \d*) is not supported. In some cases, the escape sequence \K
 <a href="#resetmatchstart">(see above)</a>
 can be used instead of a lookbehind assertion at the start of a pattern to get
 round the length limit restriction.
@@ -3781,7 +3781,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC32" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 09 August 2023
+Last updated: 11 August 2023
 <br>
 Copyright © 1997-2023 University of Cambridge.
 <br>
|
@@ -463,8 +463,10 @@ setting with a similar syntax.
 </pre>
 Each top-level branch of a lookbehind must have a limit for the number of
 characters it matches. If any branch can match a variable number of characters,
-the maximum for the entire lookbehind is limited. If every branch matches a
-fixed number of characters, there is no constraint.
+the maximum for each branch is limited to a value set by the caller of
+<b>pcre2_compile()</b> or defaulted. The default is set when PCRE2 is built
+(ultimate default 255). If every branch matches a fixed number of characters,
+the limit for each branch is 65535 characters.
 </P>
 <br><a name="SEC22" href="#TOC1">NON-ATOMIC LOOKAROUND ASSERTIONS</a><br>
 <P>
@@ -602,7 +604,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC31" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 09 August 2023
+Last updated: 11 August 2023
 <br>
 Copyright © 1997-2023 University of Cambridge.
 <br>
|
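The `@@ -463,8 +463,10 @@` hunk above states an order of precedence for the variable-lookbehind limit: a value set by the caller of `pcre2_compile()`, otherwise the default chosen when PCRE2 is built, otherwise the ultimate default of 255. A minimal Python sketch of that fallback chain follows. The function and parameter names are hypothetical, and this says nothing about how the library actually stores the setting.

```python
ULTIMATE_DEFAULT = 255  # "ultimate default" quoted in the documentation text


def effective_var_limit(caller_value=None, build_default=None):
    """Pick the variable-lookbehind limit in the documented order.

    caller_value:  limit requested by the calling program, or None
    build_default: default chosen when the library was built, or None
    Both parameters are assumptions made for this illustration.
    """
    if caller_value is not None:
        return caller_value      # the caller's setting wins
    if build_default is not None:
        return build_default     # then the build-time default
    return ULTIMATE_DEFAULT      # finally the documented fallback


print(effective_var_limit())                      # nothing configured
print(effective_var_limit(build_default=100))     # build-time default only
print(effective_var_limit(caller_value=50, build_default=100))
```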
@@ -696,11 +696,12 @@ heavily used in the test files.
 jitfast use JIT fast path
 jitverify verify JIT use
 locale=<name> use this locale
-max_pattern_length=<n> set the maximum pattern length
+max_pattern_length=<n> set maximum pattern length
+max_varlookbehind=<n> set maximum variable lookbehind length
 memory show memory used
 newline=<type> set newline type
 null_context compile with a NULL context
-null_pattern pass pattern as NULL
+null_pattern pass pattern as NULL
 parens_nest_limit=<n> set maximum parentheses depth
 posix use the POSIX API
 posix_nosub use the POSIX API with REG_NOSUB
@@ -848,6 +849,17 @@ If <b>hex</b> or <b>use_length</b> is used with the POSIX wrapper API (see
 below), the REG_PEND extension is used to pass the pattern's length.
 </P>
 <br><b>
+Specifying a maximum for variable lookbehinds
+</b><br>
+<P>
+Variable lookbehind assertions are supported only if, for each one, there is a
+maximum length (in characters) that it can match. There is a limit on this,
+whose default can be set at build time, with an ultimate default of 255. The
+<b>max_varlookbehind</b> modifier uses the <b>pcre2_set_max_varlookbehind()</b>
+function to change the limit. Lookbehinds whose branches each match a fixed
+length are limited to 65535 characters per branch.
+</P>
+<br><b>
 Specifying wide characters in 16-bit and 32-bit modes
 </b><br>
 <P>
@@ -2181,7 +2193,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC21" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 17 July 2023
+Last updated: 11 August 2023
 <br>
 Copyright © 1997-2023 University of Cambridge.
 <br>
|
1080
doc/pcre2.txt
File diff suppressed because it is too large