Documentation for added interpretation in replacement strings (PR #483)

Philip Hazel
2024-09-20 15:00:29 +01:00
parent 86d9ac3ef5
commit b463821c45
7 changed files with 255 additions and 113 deletions

@@ -93,6 +93,8 @@ Perl.
18. Merged PR473, which implements Python-style backrefs in substitutions.
19. Merged PR483, which adds \g<n> and $<name> to replacement strings.
Version 10.44 07-June-2024
--------------------------

@@ -3680,14 +3680,18 @@ character (backslash is treated as literal). The following forms are always
recognized:
<pre>
$$ insert a dollar character
$&#60;n&#62; or ${&#60;n&#62;} insert the contents of group &#60;n&#62;
$n or ${n} insert the contents of group <i>n</i>
$*MARK or ${*MARK} insert a control verb name
</pre>
Either a group number or a group name can be given for &#60;n&#62;. Curly brackets are
required only if the following character would be interpreted as part of the
number or name. The number may be zero to include the entire matched string.
For example, if the pattern a(b)c is matched with "=abc=" and the replacement
string "+$1$0$1+", the result is "=+babcb+=".
Either a group number or a group name can be given for <i>n</i>, for example $2 or
$NAME. Curly brackets are required only if the following character would be
interpreted as part of the number or name. The number may be zero to include
the entire matched string. For example, if the pattern a(b)c is matched with
"=abc=" and the replacement string "+$1$0$1+", the result is "=+babcb+=".
</P>
<P>
The JavaScript form $&#60;name&#62;, where the angle brackets are part of the syntax,
is also recognized for group names, but not for group numbers or *MARK.
</P>
<P>
$*MARK inserts the name from the last encountered backtracking control verb on
@@ -3757,6 +3761,11 @@ in a pattern, which in Perl has some ambiguities. Details are given in the
page.
</P>
<P>
The Python form \g&#60;n&#62;, where the angle brackets are part of the syntax and <i>n</i>
is either a group name or number, is recognized as an alternative way of
inserting the contents of a group, for example \g&#60;3&#62;.
</P>
<P>
There are also four escape sequences for forcing the case of inserted letters.
Case forcing applies to all inserted characters, including those from capture
groups and letters within \Q...\E quoted sequences. The insertion mechanism
@@ -3794,16 +3803,16 @@ The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
flexibility to capture group substitution. The syntax is similar to that used
by Bash:
<pre>
${&#60;n&#62;:-&#60;string&#62;}
${&#60;n&#62;:+&#60;string1&#62;:&#60;string2&#62;}
${n:-string}
${n:+string1:string2}
</pre>
As before, &#60;n&#62; may be a group number or a name. The first form specifies a
default value. If group &#60;n&#62; is set, its value is inserted; if not, &#60;string&#62; is
As before, <i>n</i> may be a group number or a name. The first form specifies a
default value. If group <i>n</i> is set, its value is inserted; if not, the string is
expanded and the result inserted. The second form specifies strings that are
expanded and inserted when group &#60;n&#62; is set or unset, respectively. The first
expanded and inserted when group <i>n</i> is set or unset, respectively. The first
form is just a convenient shorthand for
<pre>
${&#60;n&#62;:+${&#60;n&#62;}:&#60;string&#62;}
${n:+${n}:string}
</pre>
Backslash can be used to escape colons and closing curly brackets in the
replacement strings. A change of the case forcing state within a replacement
@@ -4205,7 +4214,7 @@ Cambridge, England.
</P>
<br><a name="SEC43" href="#TOC1">REVISION</a><br>
<P>
Last updated: 17 September 2024
Last updated: 20 September 2024
<br>
Copyright &copy; 1997-2024 University of Cambridge.
<br>
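To make the $n expansion concrete, here is a minimal C sketch that replays the documented example. It is a hypothetical illustration, not PCRE2's implementation (pcre2_substitute() does this work itself). substitute_once(), append() and UNSET_OFFSET (a stand-in for PCRE2_UNSET) are invented for the example. The match offsets are written by hand in place of the ovector that pcre2_get_ovector_pointer() would return, and only $$, $n and ${n} with group numbers are handled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for PCRE2_UNSET, which marks an unset group. */
#define UNSET_OFFSET SIZE_MAX

/* Append n bytes, always keeping room for the terminating NUL. */
static int append(char *out, size_t cap, size_t *len, const char *s, size_t n)
{
    if (*len + n >= cap) return -1;
    memcpy(out + *len, s, n);
    *len += n;
    out[*len] = '\0';
    return 0;
}

/* Replace the match ovector[0]..ovector[1] in subject with repl, expanding
   $$, $n and ${n}. ovector holds ngroups start/end pairs, group 0 first.
   Returns the output length, or -1 for bad syntax, an unknown or unset
   group, or a full buffer. Group names and $*MARK are not handled. */
long substitute_once(const char *subject, const size_t *ovector, int ngroups,
                     const char *repl, char *out, size_t cap)
{
    size_t len = 0;
    assert(out != NULL && cap > 0);
    out[0] = '\0';
    if (append(out, cap, &len, subject, ovector[0]) < 0) return -1;

    for (const char *p = repl; *p != '\0'; ) {
        if (*p != '$') {                       /* ordinary character */
            if (append(out, cap, &len, p++, 1) < 0) return -1;
            continue;
        }
        if (*++p == '$') {                     /* $$ inserts one dollar */
            if (append(out, cap, &len, p++, 1) < 0) return -1;
            continue;
        }
        int braced = (*p == '{'), n = 0;
        if (braced) p++;
        if (*p < '0' || *p > '9') return -1;   /* numbers only here */
        while (*p >= '0' && *p <= '9') {
            n = n * 10 + (*p++ - '0');
            if (n > 65535) return -1;
        }
        if (braced && *p++ != '}') return -1;
        if (n >= ngroups || ovector[2 * n] == UNSET_OFFSET) return -1;
        if (append(out, cap, &len, subject + ovector[2 * n],
                   ovector[2 * n + 1] - ovector[2 * n]) < 0) return -1;
    }
    const char *rest = subject + ovector[1];  /* text after the match */
    if (append(out, cap, &len, rest, strlen(rest)) < 0) return -1;
    return (long)len;
}
```

With subject "=abc=", offsets {1, 4, 2, 3} (group 0 is "abc", group 1 is "b") and replacement "+$1$0$1+", the call returns 9 and leaves "=+babcb+=" in the buffer, matching the result given above. Unknown and unset groups are treated as errors here; PCRE2 offers options such as PCRE2_SUBSTITUTE_UNSET_EMPTY to relax this.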

@@ -43,16 +43,21 @@ please consult the man page, in case the conversion went wrong.
<li><a name="TOC28" href="#SEC28">CONDITIONAL PATTERNS</a>
<li><a name="TOC29" href="#SEC29">BACKTRACKING CONTROL</a>
<li><a name="TOC30" href="#SEC30">CALLOUTS</a>
<li><a name="TOC31" href="#SEC31">SEE ALSO</a>
<li><a name="TOC32" href="#SEC32">AUTHOR</a>
<li><a name="TOC33" href="#SEC33">REVISION</a>
<li><a name="TOC31" href="#SEC31">REPLACEMENT STRINGS</a>
<li><a name="TOC32" href="#SEC32">SEE ALSO</a>
<li><a name="TOC33" href="#SEC33">AUTHOR</a>
<li><a name="TOC34" href="#SEC34">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY</a><br>
<P>
The full syntax and semantics of the regular expressions that are supported by
PCRE2 are described in the
The full syntax and semantics of the regular expression patterns that are
supported by PCRE2 are described in the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
documentation. This document contains a quick-reference summary of the syntax.
documentation. This document contains a quick-reference summary of the pattern
syntax followed by the syntax of replacement strings in the substitution function.
The full description of the latter is in the
<a href="pcre2api.html"><b>pcre2api</b></a>
documentation.
</P>
<br><a name="SEC2" href="#TOC1">QUOTING</a><br>
<P>
@@ -634,12 +639,46 @@ The allowed string delimiters are ` ' " ^ % # $ (which are the same for the
start and the end), and the starting delimiter { matched with the ending
delimiter }. To encode the ending delimiter within the string, double it.
</P>
<br><a name="SEC31" href="#TOC1">SEE ALSO</a><br>
<br><a name="SEC31" href="#TOC1">REPLACEMENT STRINGS</a><br>
<P>
If the PCRE2_SUBSTITUTE_LITERAL option is set, a replacement string for
<b>pcre2_substitute()</b> is not interpreted. Otherwise, by default, the only
special character is the dollar character in one of the following forms:
<pre>
$$ insert a dollar character
$n or ${n} insert the contents of group <i>n</i> (name or number)
$&#60;name&#62; insert the contents of a named group
$*MARK or ${*MARK} insert a control verb name
</pre>
If PCRE2_SUBSTITUTE_EXTENDED is set, there is additional interpretation:
</P>
<P>
1. Backslash is an escape character, and the forms described in "ESCAPED
CHARACTERS" above are recognized. Also:
<pre>
\Q...\E can be used to suppress interpretation
\l force the next character to lower case
\u force the next character to upper case
\L force subsequent characters to lower case
\U force subsequent characters to upper case
\u\L force next character to upper case, then all lower
\l\U force next character to lower case, then all upper
\E end \L or \U case forcing
</pre>
2. Capture substitution supports the following additional forms:
<pre>
${n:-string} default for unset group
${n:+string1:string2} values for set/unset group
</pre>
The substitution strings themselves are expanded. Backslash can be used to
escape colons and closing curly brackets.
</P>
<br><a name="SEC32" href="#TOC1">SEE ALSO</a><br>
<P>
<b>pcre2pattern</b>(3), <b>pcre2api</b>(3), <b>pcre2callout</b>(3),
<b>pcre2matching</b>(3), <b>pcre2</b>(3).
</P>
<br><a name="SEC32" href="#TOC1">AUTHOR</a><br>
<br><a name="SEC33" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
@@ -648,9 +687,9 @@ Retired from University Computing Service
Cambridge, England.
<br>
</P>
<br><a name="SEC33" href="#TOC1">REVISION</a><br>
<br><a name="SEC34" href="#TOC1">REVISION</a><br>
<P>
Last updated: 17 September 2024
Last updated: 20 September 2024
<br>
Copyright &copy; 1997-2024 University of Cambridge.
<br>
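The case-forcing escapes listed above amount to a small state machine: a lasting mode set by \U, \L and \E, plus a one-shot override set by \u or \l that takes priority for one character. The C sketch below is a simplified, hypothetical model rather than PCRE2's code. force_case() is invented for the example; it processes literal text only (no inserted groups, no \Q...\E), is ASCII-only, copies any other escaped character unchanged, and lets \u or \l apply to the next character even when it is not a letter.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

enum case_mode { CASE_NONE, CASE_UPPER, CASE_LOWER };

/* Hypothetical model: \U and \L set a lasting mode, \E clears it, and
   \u or \l set the case of the next character only, overriding the
   lasting mode (so \u\L gives "upper first, then lower"). Any other
   escaped character is copied as-is. Returns 0, or -1 on a trailing
   backslash or a full buffer. */
int force_case(const char *in, char *out, size_t cap)
{
    enum case_mode mode = CASE_NONE, once = CASE_NONE;
    size_t len = 0;
    assert(out != NULL && cap > 0);

    for (const char *p = in; *p != '\0'; p++) {
        int c = (unsigned char)*p;
        if (c == '\\') {
            c = (unsigned char)*++p;
            switch (c) {
            case '\0': return -1;              /* nothing to escape */
            case 'U': mode = CASE_UPPER; continue;
            case 'L': mode = CASE_LOWER; continue;
            case 'E': mode = CASE_NONE;  continue;
            case 'u': once = CASE_UPPER; continue;
            case 'l': once = CASE_LOWER; continue;
            default:  break;                   /* copy escaped char */
            }
        }
        enum case_mode m = (once != CASE_NONE) ? once : mode;
        once = CASE_NONE;                      /* \u or \l is used up */
        if (m == CASE_UPPER) c = toupper(c);
        else if (m == CASE_LOWER) c = tolower(c);
        if (len + 1 >= cap) return -1;         /* keep room for NUL */
        out[len++] = (char)c;
    }
    out[len] = '\0';
    return 0;
}
```

For example, the input \u\LmACdONALD (written "\\u\\LmACdONALD" as a C literal) produces "Macdonald", the \u\L behaviour in the table, and \l\UHello produces "hELLO".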

@@ -3550,15 +3550,19 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
eral). The following forms are always recognized:
$$ insert a dollar character
$<n> or ${<n>} insert the contents of group <n>
$n or ${n} insert the contents of group n
$*MARK or ${*MARK} insert a control verb name
Either a group number or a group name can be given for <n>. Curly
brackets are required only if the following character would be inter-
preted as part of the number or name. The number may be zero to include
the entire matched string. For example, if the pattern a(b)c is
matched with "=abc=" and the replacement string "+$1$0$1+", the result
is "=+babcb+=".
Either a group number or a group name can be given for n, for example
$2 or $NAME. Curly brackets are required only if the following charac-
ter would be interpreted as part of the number or name. The number may
be zero to include the entire matched string. For example, if the pat-
tern a(b)c is matched with "=abc=" and the replacement string
"+$1$0$1+", the result is "=+babcb+=".
The JavaScript form $<name>, where the angle brackets are part of the
syntax, is also recognized for group names, but not for group numbers
or *MARK.
$*MARK inserts the name from the last encountered backtracking control
verb on the matching path that has a name. (*MARK) must always include
@@ -3622,6 +3626,10 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
same as in a pattern, which in Perl has some ambiguities. Details are
given in the pcre2pattern page.
The Python form \g<n>, where the angle brackets are part of the syntax
and n is either a group name or number, is recognized as an alternative
way of inserting the contents of a group, for example \g<3>.
There are also four escape sequences for forcing the case of inserted
letters. Case forcing applies to all inserted characters, including
those from capture groups and letters within \Q...\E quoted sequences.
@@ -3657,17 +3665,16 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
flexibility to capture group substitution. The syntax is similar to
that used by Bash:
${<n>:-<string>}
${<n>:+<string1>:<string2>}
${n:-string}
${n:+string1:string2}
As before, <n> may be a group number or a name. The first form speci-
fies a default value. If group <n> is set, its value is inserted; if
not, <string> is expanded and the result inserted. The second form
specifies strings that are expanded and inserted when group <n> is set
or unset, respectively. The first form is just a convenient shorthand
for
As before, n may be a group number or a name. The first form specifies
a default value. If group n is set, its value is inserted; if not, the
string is expanded and the result inserted. The second form specifies
strings that are expanded and inserted when group n is set or unset,
respectively. The first form is just a convenient shorthand for
${<n>:+${<n>}:<string>}
${n:+${n}:string}
Backslash can be used to escape colons and closing curly brackets in
the replacement strings. A change of the case forcing state within a
@@ -4035,11 +4042,11 @@ AUTHOR
REVISION
Last updated: 17 September 2024
Last updated: 20 September 2024
Copyright (c) 1997-2024 University of Cambridge.
PCRE2 10.45 17 September 2024 PCRE2API(3)
PCRE2 10.45 20 September 2024 PCRE2API(3)
------------------------------------------------------------------------------
@@ -11069,9 +11076,11 @@ NAME
PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY
The full syntax and semantics of the regular expressions that are sup-
ported by PCRE2 are described in the pcre2pattern documentation. This
document contains a quick-reference summary of the syntax.
The full syntax and semantics of the regular expression patterns that
are supported by PCRE2 are described in the pcre2pattern documentation.
This document contains a quick-reference summary of the pattern syntax
followed by the syntax of replacement strings in the substitution function.
The full description of the latter is in the pcre2api documentation.
QUOTING
@@ -11645,6 +11654,42 @@ CALLOUTS
double it.
REPLACEMENT STRINGS
If the PCRE2_SUBSTITUTE_LITERAL option is set, a replacement string for
pcre2_substitute() is not interpreted. Otherwise, by default, the only
special character is the dollar character in one of the following
forms:
$$ insert a dollar character
$n or ${n} insert the contents of group n (name or number)
$<name> insert the contents of a named group
$*MARK or ${*MARK} insert a control verb name
If PCRE2_SUBSTITUTE_EXTENDED is set, there is additional interpreta-
tion:
1. Backslash is an escape character, and the forms described in "ES-
CAPED CHARACTERS" above are recognized. Also:
\Q...\E can be used to suppress interpretation
\l force the next character to lower case
\u force the next character to upper case
\L force subsequent characters to lower case
\U force subsequent characters to upper case
\u\L force next character to upper case, then all lower
\l\U force next character to lower case, then all upper
\E end \L or \U case forcing
2. Capture substitution supports the following additional forms:
${n:-string} default for unset group
${n:+string1:string2} values for set/unset group
The substitution strings themselves are expanded. Backslash can be used
to escape colons and closing curly brackets.
SEE ALSO
pcre2pattern(3), pcre2api(3), pcre2callout(3), pcre2matching(3),
@@ -11660,11 +11705,11 @@ AUTHOR
REVISION
Last updated: 17 September 2024
Last updated: 20 September 2024
Copyright (c) 1997-2024 University of Cambridge.
PCRE2 10.45 17 September 2024 PCRE2SYNTAX(3)
PCRE2 10.45 20 September 2024 PCRE2SYNTAX(3)
------------------------------------------------------------------------------
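The conditional forms ${n:-string} and ${n:+string1:string2} can be modelled by a recursive expander that parses both branches but copies only the one selected by whether group n is set. The C sketch below is hypothetical and not PCRE2's code. substitute_ext(), expand() and UNSET_OFFSET (a stand-in for PCRE2_UNSET) are invented for the example; it supports only group numbers, $$, $n, ${n}, backslash-escaped literals and the two conditional forms, and takes hand-written offsets. Inserting an unset group with a plain $n is treated as an error, mirroring PCRE2's default.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define UNSET_OFFSET SIZE_MAX   /* hypothetical stand-in for PCRE2_UNSET */

struct ctx {
    const char *subject;
    const size_t *ovector;      /* start/end pairs, group 0 first */
    int ngroups;
    char *out;
    size_t cap, len;
    int overflow;
};

/* Append n bytes if emit is set; a full buffer is recorded, not written. */
static void put(struct ctx *c, const char *s, size_t n, int emit)
{
    if (!emit || c->overflow) return;
    if (c->len + n >= c->cap) { c->overflow = 1; return; }
    memcpy(c->out + c->len, s, n);
    c->len += n;
    c->out[c->len] = '\0';
}

/* Append the text of group n, which the caller knows is set. */
static void put_group(struct ctx *c, int n, int emit)
{
    const size_t *ov = c->ovector + 2 * n;
    put(c, c->subject + ov[0], ov[1] - ov[0], emit);
}

/* Expand from p up to NUL or an unescaped character in stops, returning a
   pointer to that character, or NULL on bad syntax or when a plain $n or
   ${n} would insert an unset group. With emit == 0 the text is parsed but
   not copied (the branch not taken). */
static const char *expand(struct ctx *c, const char *p, const char *stops,
                          int emit)
{
    while (*p != '\0' && strchr(stops, *p) == NULL) {
        if (*p == '\\') {                      /* e.g. \: or \} */
            if (*++p == '\0') return NULL;
            put(c, p++, 1, emit);
            continue;
        }
        if (*p != '$') { put(c, p++, 1, emit); continue; }
        if (*++p == '$') { put(c, p++, 1, emit); continue; }  /* $$ */

        int braced = (*p == '{'), n = 0;
        p += braced;
        if (*p < '0' || *p > '9') return NULL; /* numbers only here */
        while (*p >= '0' && *p <= '9') {
            n = n * 10 + (*p++ - '0');
            if (n > 65535) return NULL;
        }
        int set = n < c->ngroups && c->ovector[2 * n] != UNSET_OFFSET;

        if (braced && p[0] == ':' && (p[1] == '-' || p[1] == '+')) {
            if (p[1] == '+') {                 /* ${n:+string1:string2} */
                p = expand(c, p + 2, ":}", emit && set);
                if (p == NULL || *p != ':') return NULL;
                p = expand(c, p + 1, "}", emit && !set);
            } else {                           /* ${n:-string} */
                if (set) put_group(c, n, emit);
                p = expand(c, p + 2, "}", emit && !set);
            }
            if (p == NULL || *p != '}') return NULL;
            p++;
        } else {                               /* plain $n or ${n} */
            if (braced && *p++ != '}') return NULL;
            if (!set && emit) return NULL;
            if (set) put_group(c, n, emit);
        }
    }
    return p;
}

/* Replace the match ovector[0]..ovector[1] in subject with the expanded
   replacement. Returns the output length, or -1 on any error. */
long substitute_ext(const char *subject, const size_t *ovector, int ngroups,
                    const char *repl, char *out, size_t cap)
{
    struct ctx c = { subject, ovector, ngroups, out, cap, 0, 0 };
    assert(out != NULL && cap > 0);
    out[0] = '\0';
    put(&c, subject, ovector[0], 1);
    const char *end = expand(&c, repl, "", 1);
    put(&c, subject + ovector[1], strlen(subject + ovector[1]), 1);
    return (end == NULL || c.overflow) ? -1 : (long)c.len;
}
```

With hand-written offsets for (a)?b matched in "xby" (group 1 unset), the replacement ${1:-none} gives "xnoney"; matched in "xaby" it gives "xay". Writing ${1:+${1}:none} gives the same two results, which is the shorthand equivalence stated above.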

@@ -1,4 +1,4 @@
.TH PCRE2API 3 "17 September 2024" "PCRE2 10.45"
.TH PCRE2API 3 "20 September 2024" "PCRE2 10.45"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.sp
@@ -3684,14 +3684,17 @@ character (backslash is treated as literal). The following forms are always
recognized:
.sp
$$ insert a dollar character
$<n> or ${<n>} insert the contents of group <n>
$n or ${n} insert the contents of group \fIn\fP
$*MARK or ${*MARK} insert a control verb name
.sp
Either a group number or a group name can be given for <n>. Curly brackets are
required only if the following character would be interpreted as part of the
number or name. The number may be zero to include the entire matched string.
For example, if the pattern a(b)c is matched with "=abc=" and the replacement
string "+$1$0$1+", the result is "=+babcb+=".
Either a group number or a group name can be given for \fIn\fP, for example $2 or
$NAME. Curly brackets are required only if the following character would be
interpreted as part of the number or name. The number may be zero to include
the entire matched string. For example, if the pattern a(b)c is matched with
"=abc=" and the replacement string "+$1$0$1+", the result is "=+babcb+=".
.P
The JavaScript form $<name>, where the angle brackets are part of the syntax,
is also recognized for group names, but not for group numbers or *MARK.
.P
$*MARK inserts the name from the last encountered backtracking control verb on
the matching path that has a name. (*MARK) must always include a name, but the
@@ -3755,6 +3758,10 @@ in a pattern, which in Perl has some ambiguities. Details are given in the
.\"
page.
.P
The Python form \eg<n>, where the angle brackets are part of the syntax and \fIn\fP
is either a group name or number, is recognized as an alternative way of
inserting the contents of a group, for example \eg<3>.
.P
There are also four escape sequences for forcing the case of inserted letters.
Case forcing applies to all inserted characters, including those from capture
groups and letters within \eQ...\eE quoted sequences. The insertion mechanism
@@ -3788,16 +3795,16 @@ The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
flexibility to capture group substitution. The syntax is similar to that used
by Bash:
.sp
${<n>:-<string>}
${<n>:+<string1>:<string2>}
${n:-string}
${n:+string1:string2}
.sp
As before, <n> may be a group number or a name. The first form specifies a
default value. If group <n> is set, its value is inserted; if not, <string> is
As before, \fIn\fP may be a group number or a name. The first form specifies a
default value. If group \fIn\fP is set, its value is inserted; if not, the string is
expanded and the result inserted. The second form specifies strings that are
expanded and inserted when group <n> is set or unset, respectively. The first
expanded and inserted when group \fIn\fP is set or unset, respectively. The first
form is just a convenient shorthand for
.sp
${<n>:+${<n>}:<string>}
${n:+${n}:string}
.sp
Backslash can be used to escape colons and closing curly brackets in the
replacement strings. A change of the case forcing state within a replacement
@@ -4208,6 +4215,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 17 September 2024
Last updated: 20 September 2024
Copyright (c) 1997-2024 University of Cambridge.
.fi
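The two angle-bracket forms documented above differ in what they accept: $<name> takes a group name only, while \g<n> takes a name or a number. The C sketch below shows one way to recognize them; it is a hypothetical illustration, not PCRE2's parser. The name table, the ASCII-only name rule (letters, digits and underscore, not starting with a digit) and the angle_ref() function are assumptions made for the example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical name table; PCRE2 builds its own from the compiled pattern. */
struct group_name { const char *name; int number; };

static const struct group_name names[] = { { "year", 1 }, { "month", 2 } };

static int is_digit(char ch) { return ch >= '0' && ch <= '9'; }

static int is_name_char(char ch)
{
    return ch == '_' || is_digit(ch) || (ch >= 'a' && ch <= 'z') ||
           (ch >= 'A' && ch <= 'Z');
}

/* Recognize a Python-style \g<n> (n a name or number) or a JavaScript-style
   $<name> (name only) at p. On success return the group number and set
   *end just past the closing '>'; otherwise return -1. */
int angle_ref(const char *p, const char **end)
{
    int python;
    assert(p != NULL && end != NULL);
    if (p[0] == '\\' && p[1] == 'g' && p[2] == '<') { python = 1; p += 3; }
    else if (p[0] == '$' && p[1] == '<')            { python = 0; p += 2; }
    else return -1;

    if (is_digit(*p)) {
        int n = 0;
        if (!python) return -1;                /* $<2> is not accepted */
        while (is_digit(*p)) {
            n = n * 10 + (*p++ - '0');
            if (n > 65535) return -1;
        }
        if (*p != '>') return -1;
        *end = p + 1;
        return n;
    }
    const char *start = p;                     /* name cannot start with a digit */
    while (is_name_char(*p)) p++;
    size_t len = (size_t)(p - start);
    if (len == 0 || *p != '>') return -1;
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        if (strlen(names[i].name) == len &&
            strncmp(names[i].name, start, len) == 0) {
            *end = p + 1;
            return names[i].number;
        }
    return -1;                                 /* unknown group name */
}
```

With this table, \g<3> resolves to group 3, \g<month> and $<month> both resolve to group 2, and $<2> is rejected because the JavaScript form does not take numbers.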

@@ -1,4 +1,4 @@
.TH PCRE2DEMO 3 "17 September 2024" "PCRE2 10.44"
.TH PCRE2DEMO 3 "20 September 2024" "PCRE2 10.44"
.\"AUTOMATICALLY GENERATED BY PrepareRelease - do not EDIT!
.SH NAME
PCRE2DEMO - A demonstration C program for PCRE2

@@ -1,16 +1,21 @@
.TH PCRE2SYNTAX 3 "17 September 2024" "PCRE2 10.45"
.TH PCRE2SYNTAX 3 "20 September 2024" "PCRE2 10.45"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
.rs
.sp
The full syntax and semantics of the regular expressions that are supported by
PCRE2 are described in the
The full syntax and semantics of the regular expression patterns that are
supported by PCRE2 are described in the
.\" HREF
\fBpcre2pattern\fP
.\"
documentation. This document contains a quick-reference summary of the syntax.
.
documentation. This document contains a quick-reference summary of the pattern
syntax followed by the syntax of replacement strings in the substitution function.
The full description of the latter is in the
.\" HREF
\fBpcre2api\fP
.\"
documentation.
.
.SH "QUOTING"
.rs
@@ -618,6 +623,41 @@ start and the end), and the starting delimiter { matched with the ending
delimiter }. To encode the ending delimiter within the string, double it.
.
.
.SH "REPLACEMENT STRINGS"
.rs
.sp
If the PCRE2_SUBSTITUTE_LITERAL option is set, a replacement string for
\fBpcre2_substitute()\fP is not interpreted. Otherwise, by default, the only
special character is the dollar character in one of the following forms:
.sp
$$ insert a dollar character
$n or ${n} insert the contents of group \fIn\fP (name or number)
$<name> insert the contents of a named group
$*MARK or ${*MARK} insert a control verb name
.sp
If PCRE2_SUBSTITUTE_EXTENDED is set, there is additional interpretation:
.P
1. Backslash is an escape character, and the forms described in "ESCAPED
CHARACTERS" above are recognized. Also:
.sp
\eQ...\eE can be used to suppress interpretation
\el force the next character to lower case
\eu force the next character to upper case
\eL force subsequent characters to lower case
\eU force subsequent characters to upper case
\eu\eL force next character to upper case, then all lower
\el\eU force next character to lower case, then all upper
\eE end \eL or \eU case forcing
.sp
2. Capture substitution supports the following additional forms:
.sp
${n:-string} default for unset group
${n:+string1:string2} values for set/unset group
.sp
The substitution strings themselves are expanded. Backslash can be used to
escape colons and closing curly brackets.
.
.
.SH "SEE ALSO"
.rs
.sp
@@ -639,6 +679,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 17 September 2024
Last updated: 20 September 2024
Copyright (c) 1997-2024 University of Cambridge.
.fi