Documentation for added interpretation in replacement strings (PR #483)

Philip Hazel
2024-09-20 15:00:29 +01:00
parent 86d9ac3ef5
commit b463821c45
7 changed files with 255 additions and 113 deletions


@@ -93,6 +93,8 @@ Perl.
18. Merged PR473, which implements Python-style backrefs in substitutions. 18. Merged PR473, which implements Python-style backrefs in substitutions.
19. Merged PR483, which adds \g<n> and $<name> to replacement strings.
Version 10.44 07-June-2024 Version 10.44 07-June-2024
-------------------------- --------------------------


@@ -3680,14 +3680,18 @@ character (backslash is treated as literal). The following forms are always
recognized: recognized:
<pre> <pre>
$$ insert a dollar character $$ insert a dollar character
$&#60;n&#62; or ${&#60;n&#62;} insert the contents of group &#60;n&#62; $n or ${n} insert the contents of group <i>n</i>
$*MARK or ${*MARK} insert a control verb name $*MARK or ${*MARK} insert a control verb name
</pre> </pre>
Either a group number or a group name can be given for &#60;n&#62;. Curly brackets are Either a group number or a group name can be given for <i>n</i>, for example $2 or
required only if the following character would be interpreted as part of the $NAME. Curly brackets are required only if the following character would be
number or name. The number may be zero to include the entire matched string. interpreted as part of the number or name. The number may be zero to include
For example, if the pattern a(b)c is matched with "=abc=" and the replacement the entire matched string. For example, if the pattern a(b)c is matched with
string "+$1$0$1+", the result is "=+babcb+=". "=abc=" and the replacement string "+$1$0$1+", the result is "=+babcb+=".
</P>
<P>
The JavaScript form $&#60;name&#62;, where the angle brackets are part of the syntax,
is also recognized for group names, but not for group numbers or *MARK.
</P> </P>
<P> <P>
$*MARK inserts the name from the last encountered backtracking control verb on $*MARK inserts the name from the last encountered backtracking control verb on
@@ -3757,6 +3761,11 @@ in a pattern, which in Perl has some ambiguities. Details are given in the
page. page.
</P> </P>
<P> <P>
The Python form \g&#60;n&#62;, where the angle brackets are part of the syntax and <i>n</i>
is either a group name or number, is recognized as an alternative way of
inserting the contents of a group, for example \g&#60;3&#62;.
</P>
<P>
There are also four escape sequences for forcing the case of inserted letters. There are also four escape sequences for forcing the case of inserted letters.
Case forcing applies to all inserted characters, including those from capture Case forcing applies to all inserted characters, including those from capture
groups and letters within \Q...\E quoted sequences. The insertion mechanism groups and letters within \Q...\E quoted sequences. The insertion mechanism
@@ -3794,16 +3803,16 @@ The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
flexibility to capture group substitution. The syntax is similar to that used flexibility to capture group substitution. The syntax is similar to that used
by Bash: by Bash:
<pre> <pre>
${&#60;n&#62;:-&#60;string&#62;} ${n:-string}
${&#60;n&#62;:+&#60;string1&#62;:&#60;string2&#62;} ${n:+string1:string2}
</pre> </pre>
As before, &#60;n&#62; may be a group number or a name. The first form specifies a As before, <i>n</i> may be a group number or a name. The first form specifies a
default value. If group &#60;n&#62; is set, its value is inserted; if not, &#60;string&#62; is default value. If group <i>n</i> is set, its value is inserted; if not, the string is
expanded and the result inserted. The second form specifies strings that are expanded and the result inserted. The second form specifies strings that are
expanded and inserted when group &#60;n&#62; is set or unset, respectively. The first expanded and inserted when group <i>n</i> is set or unset, respectively. The first
form is just a convenient shorthand for form is just a convenient shorthand for
<pre> <pre>
${&#60;n&#62;:+${&#60;n&#62;}:&#60;string&#62;} ${n:+${n}:string}
</pre> </pre>
Backslash can be used to escape colons and closing curly brackets in the Backslash can be used to escape colons and closing curly brackets in the
replacement strings. A change of the case forcing state within a replacement replacement strings. A change of the case forcing state within a replacement
@@ -4205,7 +4214,7 @@ Cambridge, England.
</P> </P>
<br><a name="SEC43" href="#TOC1">REVISION</a><br> <br><a name="SEC43" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 17 September 2024 Last updated: 20 September 2024
<br> <br>
Copyright &copy; 1997-2024 University of Cambridge. Copyright &copy; 1997-2024 University of Cambridge.
<br> <br>
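
As a rough illustration of the always-recognized forms documented in the hunks above ($$, $n, ${n}, $<name>) and of the Python-style \g<n> form, here is a small Python sketch. It is a toy emulation built on Python's own `re` module, not PCRE2 code: `expand` and `substitute` are hypothetical helpers, the `extended` flag merely stands in for PCRE2_SUBSTITUTE_EXTENDED, unset groups are treated as empty, and $*MARK is not handled.

```python
import re

# Toy emulation only: these helpers are hypothetical and are not part of PCRE2.
_DOLLAR = (
    r"\$(?:(?P<dollar>\$)"              # $$      -> literal dollar
    r"|\{(?P<braced>\w+)\}"             # ${n}    -> group by number or name
    r"|<(?P<angle>[A-Za-z_]\w*)>"       # $<name> -> group by name only
    r"|(?P<bare>\d+|[A-Za-z_]\w*))"     # $n      -> group by number or name
)
_PYTHON = r"\\g<(?P<py>\w+)>"           # \g<n>   -> only when `extended` is set


def expand(replacement, match, extended=False):
    """Expand group references in `replacement` using a Python match object."""
    pattern = _DOLLAR + ("|" + _PYTHON if extended else "")

    def sub(tok):
        if tok.group("dollar"):
            return "$"
        # Exactly one of the remaining named groups holds the reference.
        ref = next(v for v in tok.groupdict().values() if v is not None)
        key = int(ref) if ref.isdigit() else ref
        # Simplification: an unset group is inserted as an empty string.
        return match.group(key) or ""

    return re.sub(pattern, sub, replacement)


def substitute(pattern, subject, replacement, extended=False):
    """Replace every match of `pattern` in `subject` with the expanded text."""
    return re.sub(pattern, lambda m: expand(replacement, m, extended), subject)


# The worked example from the documentation text:
print(substitute(r"a(b)c", "=abc=", "+$1$0$1+"))   # =+babcb+=
```

The bare form is greedy about digits and name characters, which is why the braces in ${1} are needed when the next character could be read as part of the number or name.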


@@ -43,16 +43,21 @@ please consult the man page, in case the conversion went wrong.
<li><a name="TOC28" href="#SEC28">CONDITIONAL PATTERNS</a> <li><a name="TOC28" href="#SEC28">CONDITIONAL PATTERNS</a>
<li><a name="TOC29" href="#SEC29">BACKTRACKING CONTROL</a> <li><a name="TOC29" href="#SEC29">BACKTRACKING CONTROL</a>
<li><a name="TOC30" href="#SEC30">CALLOUTS</a> <li><a name="TOC30" href="#SEC30">CALLOUTS</a>
<li><a name="TOC31" href="#SEC31">SEE ALSO</a> <li><a name="TOC31" href="#SEC31">REPLACEMENT STRINGS</a>
<li><a name="TOC32" href="#SEC32">AUTHOR</a> <li><a name="TOC32" href="#SEC32">SEE ALSO</a>
<li><a name="TOC33" href="#SEC33">REVISION</a> <li><a name="TOC33" href="#SEC33">AUTHOR</a>
<li><a name="TOC34" href="#SEC34">REVISION</a>
</ul> </ul>
<br><a name="SEC1" href="#TOC1">PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY</a><br> <br><a name="SEC1" href="#TOC1">PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY</a><br>
<P> <P>
The full syntax and semantics of the regular expressions that are supported by The full syntax and semantics of the regular expression patterns that are
PCRE2 are described in the supported by PCRE2 are described in the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a> <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
documentation. This document contains a quick-reference summary of the syntax. documentation. This document contains a quick-reference summary of the pattern
syntax followed by the syntax of replacement strings in the substitution function.
The full description of the latter is in the
<a href="pcre2api.html"><b>pcre2api</b></a>
documentation.
</P> </P>
<br><a name="SEC2" href="#TOC1">QUOTING</a><br> <br><a name="SEC2" href="#TOC1">QUOTING</a><br>
<P> <P>
@@ -634,12 +639,46 @@ The allowed string delimiters are ` ' " ^ % # $ (which are the same for the
start and the end), and the starting delimiter { matched with the ending start and the end), and the starting delimiter { matched with the ending
delimiter }. To encode the ending delimiter within the string, double it. delimiter }. To encode the ending delimiter within the string, double it.
</P> </P>
<br><a name="SEC31" href="#TOC1">SEE ALSO</a><br> <br><a name="SEC31" href="#TOC1">REPLACEMENT STRINGS</a><br>
<P>
If the PCRE2_SUBSTITUTE_LITERAL option is set, a replacement string for
<b>pcre2_substitute()</b> is not interpreted. Otherwise, by default, the only
special character is the dollar character in one of the following forms:
<pre>
$$ insert a dollar character
$n or ${n} insert the contents of group <i>n</i> (name or number)
$&#60;name&#62; insert the contents of named group
$*MARK or ${*MARK} insert a control verb name
</pre>
If PCRE2_SUBSTITUTE_EXTENDED is set, there is additional interpretation:
</P>
<P>
1. Backslash is an escape character, and the forms described in "ESCAPED
CHARACTERS" above are recognized. Also:
<pre>
\Q...\E can be used to suppress interpretation
\l force the next character to lower case
\u force the next character to upper case
\L force subsequent characters to lower case
\U force subsequent characters to upper case
\u\L force next character to upper case, then all lower
\l\U force next character to lower case, then all upper
\E end \L or \U case forcing
</pre>
2. Capture substitution supports the following additional forms:
<pre>
${n:-string} default for unset group
${n:+string1:string2} values for set/unset group
</pre>
The substitution strings themselves are expanded. Backslash can be used to
escape colons and closing curly brackets.
</P>
<br><a name="SEC32" href="#TOC1">SEE ALSO</a><br>
<P> <P>
<b>pcre2pattern</b>(3), <b>pcre2api</b>(3), <b>pcre2callout</b>(3), <b>pcre2pattern</b>(3), <b>pcre2api</b>(3), <b>pcre2callout</b>(3),
<b>pcre2matching</b>(3), <b>pcre2</b>(3). <b>pcre2matching</b>(3), <b>pcre2</b>(3).
</P> </P>
<br><a name="SEC32" href="#TOC1">AUTHOR</a><br> <br><a name="SEC33" href="#TOC1">AUTHOR</a><br>
<P> <P>
Philip Hazel Philip Hazel
<br> <br>
@@ -648,9 +687,9 @@ Retired from University Computing Service
Cambridge, England. Cambridge, England.
<br> <br>
</P> </P>
<br><a name="SEC33" href="#TOC1">REVISION</a><br> <br><a name="SEC34" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 17 September 2024 Last updated: 20 September 2024
<br> <br>
Copyright &copy; 1997-2024 University of Cambridge. Copyright &copy; 1997-2024 University of Cambridge.
<br> <br>
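
The Bash-like forms ${n:-string} and ${n:+string1:string2} summarized above can be sketched in Python as follows. This is a minimal emulation under stated assumptions, not PCRE2's implementation: `expand_extended` and `substitute_extended` are hypothetical names, only the braced forms are handled, and a backslash simply makes the next character literal (the real extended mode gives backslash many more meanings).

```python
import re

# Toy emulation only: these helpers are hypothetical and are not part of PCRE2.
_NAME = re.compile(r"\w+")


def _expand(s, i, match, stops):
    """Expand s from index i until an unescaped character in `stops` or the end."""
    out = []
    while i < len(s) and s[i] not in stops:
        if s[i] == "\\" and i + 1 < len(s):
            out.append(s[i + 1])            # backslash escapes ':' and '}'
            i += 2
        elif s.startswith("${", i):
            text, i = _braced(s, i + 2, match)
            out.append(text)
        else:
            out.append(s[i])
            i += 1
    return "".join(out), i


def _braced(s, i, match):
    """Handle the text after '${'; return (inserted text, index after '}')."""
    m = _NAME.match(s, i)
    ref, i = m.group(), m.end()
    value = match.group(int(ref) if ref.isdigit() else ref)  # None if unset
    if s[i] == "}":                          # plain ${n}
        return value or "", i + 1
    op, i = s[i:i + 2], i + 2
    if op == ":-":                           # ${n:-string}
        default, i = _expand(s, i, match, "}")
        return (value if value is not None else default), i + 1
    if op == ":+":                           # ${n:+string1:string2}
        if_set, i = _expand(s, i, match, ":}")
        if_unset = ""
        if s[i] == ":":
            if_unset, i = _expand(s, i + 1, match, "}")
        return (if_set if value is not None else if_unset), i + 1
    raise ValueError("unsupported form after ${" + ref)


def expand_extended(replacement, match):
    return _expand(replacement, 0, match, "")[0]


def substitute_extended(pattern, subject, replacement):
    return re.sub(pattern, lambda m: expand_extended(replacement, m), subject)


# Group 2 is set for "10px" and unset for "20":
print(substitute_extended(r"(\d+)(px)?", "10px 20", "${1}${2:-em}"))  # 10px 20em
```

Because the strings inside the braces are expanded recursively, the shorthand equivalence stated in the documentation holds in this sketch too: ${2:-em} and ${2:+${2}:em} produce the same output.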


@@ -3550,15 +3550,19 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
eral). The following forms are always recognized: eral). The following forms are always recognized:
$$ insert a dollar character $$ insert a dollar character
$<n> or ${<n>} insert the contents of group <n> $n or ${n} insert the contents of group n
$*MARK or ${*MARK} insert a control verb name $*MARK or ${*MARK} insert a control verb name
Either a group number or a group name can be given for <n>. Curly Either a group number or a group name can be given for n, for example
brackets are required only if the following character would be inter- $2 or $NAME. Curly brackets are required only if the following charac-
preted as part of the number or name. The number may be zero to include ter would be interpreted as part of the number or name. The number may
the entire matched string. For example, if the pattern a(b)c is be zero to include the entire matched string. For example, if the pat-
matched with "=abc=" and the replacement string "+$1$0$1+", the result tern a(b)c is matched with "=abc=" and the replacement string
is "=+babcb+=". "+$1$0$1+", the result is "=+babcb+=".
The JavaScript form $<name>, where the angle brackets are part of the
syntax, is also recognized for group names, but not for group numbers
or *MARK.
$*MARK inserts the name from the last encountered backtracking control $*MARK inserts the name from the last encountered backtracking control
verb on the matching path that has a name. (*MARK) must always include verb on the matching path that has a name. (*MARK) must always include
@@ -3622,6 +3626,10 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
same as in a pattern, which in Perl has some ambiguities. Details are same as in a pattern, which in Perl has some ambiguities. Details are
given in the pcre2pattern page. given in the pcre2pattern page.
The Python form \g<n>, where the angle brackets are part of the syntax
and n is either a group name or number, is recognized as an alternative
way of inserting the contents of a group, for example \g<3>.
There are also four escape sequences for forcing the case of inserted There are also four escape sequences for forcing the case of inserted
letters. Case forcing applies to all inserted characters, including letters. Case forcing applies to all inserted characters, including
those from capture groups and letters within \Q...\E quoted sequences. those from capture groups and letters within \Q...\E quoted sequences.
@@ -3657,17 +3665,16 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
flexibility to capture group substitution. The syntax is similar to flexibility to capture group substitution. The syntax is similar to
that used by Bash: that used by Bash:
${<n>:-<string>} ${n:-string}
${<n>:+<string1>:<string2>} ${n:+string1:string2}
As before, <n> may be a group number or a name. The first form speci- As before, n may be a group number or a name. The first form specifies
fies a default value. If group <n> is set, its value is inserted; if a default value. If group n is set, its value is inserted; if not, the
not, <string> is expanded and the result inserted. The second form string is expanded and the result inserted. The second form specifies
specifies strings that are expanded and inserted when group <n> is set strings that are expanded and inserted when group n is set or unset,
or unset, respectively. The first form is just a convenient shorthand respectively. The first form is just a convenient shorthand for
for
${<n>:+${<n>}:<string>} ${n:+${n}:string}
Backslash can be used to escape colons and closing curly brackets in Backslash can be used to escape colons and closing curly brackets in
the replacement strings. A change of the case forcing state within a the replacement strings. A change of the case forcing state within a
@@ -4035,11 +4042,11 @@ AUTHOR
REVISION REVISION
Last updated: 17 September 2024 Last updated: 20 September 2024
Copyright (c) 1997-2024 University of Cambridge. Copyright (c) 1997-2024 University of Cambridge.
PCRE2 10.45 17 September 2024 PCRE2API(3) PCRE2 10.45 20 September 2024 PCRE2API(3)
------------------------------------------------------------------------------ ------------------------------------------------------------------------------
@@ -11069,9 +11076,11 @@ NAME
PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY
The full syntax and semantics of the regular expressions that are sup- The full syntax and semantics of the regular expression patterns that
ported by PCRE2 are described in the pcre2pattern documentation. This are supported by PCRE2 are described in the pcre2pattern documentation.
document contains a quick-reference summary of the syntax. This document contains a quick-reference summary of the pattern syntax
followed by the syntax of replacement strings in the substitution function.
The full description of the latter is in the pcre2api documentation.
QUOTING QUOTING
@@ -11645,6 +11654,42 @@ CALLOUTS
double it. double it.
REPLACEMENT STRINGS
If the PCRE2_SUBSTITUTE_LITERAL option is set, a replacement string for
pcre2_substitute() is not interpreted. Otherwise, by default, the only
special character is the dollar character in one of the following
forms:
$$ insert a dollar character
$n or ${n} insert the contents of group n (name or number)
$<name> insert the contents of named group
$*MARK or ${*MARK} insert a control verb name
If PCRE2_SUBSTITUTE_EXTENDED is set, there is additional interpreta-
tion:
1. Backslash is an escape character, and the forms described in "ES-
CAPED CHARACTERS" above are recognized. Also:
\Q...\E can be used to suppress interpretation
\l force the next character to lower case
\u force the next character to upper case
\L force subsequent characters to lower case
\U force subsequent characters to upper case
\u\L force next character to upper case, then all lower
\l\U force next character to lower case, then all upper
\E end \L or \U case forcing
2. Capture substitution supports the following additional forms:
${n:-string} default for unset group
${n:+string1:string2} values for set/unset group
The substitution strings themselves are expanded. Backslash can be used
to escape colons and closing curly brackets.
SEE ALSO SEE ALSO
pcre2pattern(3), pcre2api(3), pcre2callout(3), pcre2matching(3), pcre2pattern(3), pcre2api(3), pcre2callout(3), pcre2matching(3),
@@ -11660,11 +11705,11 @@ AUTHOR
REVISION REVISION
Last updated: 17 September 2024 Last updated: 20 September 2024
Copyright (c) 1997-2024 University of Cambridge. Copyright (c) 1997-2024 University of Cambridge.
PCRE2 10.45 17 September 2024 PCRE2SYNTAX(3) PCRE2 10.45 20 September 2024 PCRE2SYNTAX(3)
------------------------------------------------------------------------------ ------------------------------------------------------------------------------


@@ -1,4 +1,4 @@
.TH PCRE2API 3 "17 September 2024" "PCRE2 10.45" .TH PCRE2API 3 "20 September 2024" "PCRE2 10.45"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.sp .sp
@@ -3684,14 +3684,17 @@ character (backslash is treated as literal). The following forms are always
recognized: recognized:
.sp .sp
$$ insert a dollar character $$ insert a dollar character
$<n> or ${<n>} insert the contents of group <n> $n or ${n} insert the contents of group \fIn\fP
$*MARK or ${*MARK} insert a control verb name $*MARK or ${*MARK} insert a control verb name
.sp .sp
Either a group number or a group name can be given for <n>. Curly brackets are Either a group number or a group name can be given for \fIn\fP, for example $2 or
required only if the following character would be interpreted as part of the $NAME. Curly brackets are required only if the following character would be
number or name. The number may be zero to include the entire matched string. interpreted as part of the number or name. The number may be zero to include
For example, if the pattern a(b)c is matched with "=abc=" and the replacement the entire matched string. For example, if the pattern a(b)c is matched with
string "+$1$0$1+", the result is "=+babcb+=". "=abc=" and the replacement string "+$1$0$1+", the result is "=+babcb+=".
.P
The JavaScript form $<name>, where the angle brackets are part of the syntax,
is also recognized for group names, but not for group numbers or *MARK.
.P .P
$*MARK inserts the name from the last encountered backtracking control verb on $*MARK inserts the name from the last encountered backtracking control verb on
the matching path that has a name. (*MARK) must always include a name, but the the matching path that has a name. (*MARK) must always include a name, but the
@@ -3755,6 +3758,10 @@ in a pattern, which in Perl has some ambiguities. Details are given in the
.\" .\"
page. page.
.P .P
The Python form \eg<n>, where the angle brackets are part of the syntax and \fIn\fP
is either a group name or number, is recognized as an alternative way of
inserting the contents of a group, for example \eg<3>.
.P
There are also four escape sequences for forcing the case of inserted letters. There are also four escape sequences for forcing the case of inserted letters.
Case forcing applies to all inserted characters, including those from capture Case forcing applies to all inserted characters, including those from capture
groups and letters within \eQ...\eE quoted sequences. The insertion mechanism groups and letters within \eQ...\eE quoted sequences. The insertion mechanism
@@ -3788,16 +3795,16 @@ The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
flexibility to capture group substitution. The syntax is similar to that used flexibility to capture group substitution. The syntax is similar to that used
by Bash: by Bash:
.sp .sp
${<n>:-<string>} ${n:-string}
${<n>:+<string1>:<string2>} ${n:+string1:string2}
.sp .sp
As before, <n> may be a group number or a name. The first form specifies a As before, \fIn\fP may be a group number or a name. The first form specifies a
default value. If group <n> is set, its value is inserted; if not, <string> is default value. If group \fIn\fP is set, its value is inserted; if not, the string is
expanded and the result inserted. The second form specifies strings that are expanded and the result inserted. The second form specifies strings that are
expanded and inserted when group <n> is set or unset, respectively. The first expanded and inserted when group \fIn\fP is set or unset, respectively. The first
form is just a convenient shorthand for form is just a convenient shorthand for
.sp .sp
${<n>:+${<n>}:<string>} ${n:+${n}:string}
.sp .sp
Backslash can be used to escape colons and closing curly brackets in the Backslash can be used to escape colons and closing curly brackets in the
replacement strings. A change of the case forcing state within a replacement replacement strings. A change of the case forcing state within a replacement
@@ -4208,6 +4215,6 @@ Cambridge, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 17 September 2024 Last updated: 20 September 2024
Copyright (c) 1997-2024 University of Cambridge. Copyright (c) 1997-2024 University of Cambridge.
.fi .fi


@@ -1,4 +1,4 @@
.TH PCRE2DEMO 3 "17 September 2024" "PCRE2 10.44" .TH PCRE2DEMO 3 "20 September 2024" "PCRE2 10.44"
.\"AUTOMATICALLY GENERATED BY PrepareRelease - do not EDIT! .\"AUTOMATICALLY GENERATED BY PrepareRelease - do not EDIT!
.SH NAME .SH NAME
PCRE2DEMO - A demonstration C program for PCRE2 PCRE2DEMO - A demonstration C program for PCRE2


@@ -1,16 +1,21 @@
.TH PCRE2SYNTAX 3 "17 September 2024" "PCRE2 10.45" .TH PCRE2SYNTAX 3 "20 September 2024" "PCRE2 10.45"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY" .SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
.rs .rs
.sp .sp
The full syntax and semantics of the regular expressions that are supported by The full syntax and semantics of the regular expression patterns that are
PCRE2 are described in the supported by PCRE2 are described in the
.\" HREF .\" HREF
\fBpcre2pattern\fP \fBpcre2pattern\fP
.\" .\"
documentation. This document contains a quick-reference summary of the syntax. documentation. This document contains a quick-reference summary of the pattern
. syntax followed by the syntax of replacement strings in the substitution function.
The full description of the latter is in the
.\" HREF
\fBpcre2api\fP
.\"
documentation.
. .
.SH "QUOTING" .SH "QUOTING"
.rs .rs
@@ -618,6 +623,41 @@ start and the end), and the starting delimiter { matched with the ending
delimiter }. To encode the ending delimiter within the string, double it. delimiter }. To encode the ending delimiter within the string, double it.
. .
. .
.SH "REPLACEMENT STRINGS"
.rs
.sp
If the PCRE2_SUBSTITUTE_LITERAL option is set, a replacement string for
\fBpcre2_substitute()\fP is not interpreted. Otherwise, by default, the only
special character is the dollar character in one of the following forms:
.sp
$$ insert a dollar character
$n or ${n} insert the contents of group \fIn\fP (name or number)
$<name> insert the contents of named group
$*MARK or ${*MARK} insert a control verb name
.sp
If PCRE2_SUBSTITUTE_EXTENDED is set, there is additional interpretation:
.P
1. Backslash is an escape character, and the forms described in "ESCAPED
CHARACTERS" above are recognized. Also:
.sp
\eQ...\eE can be used to suppress interpretation
\el force the next character to lower case
\eu force the next character to upper case
\eL force subsequent characters to lower case
\eU force subsequent characters to upper case
\eu\eL force next character to upper case, then all lower
\el\eU force next character to lower case, then all upper
\eE end \eL or \eU case forcing
.sp
2. Capture substitution supports the following additional forms:
.sp
${n:-string} default for unset group
${n:+string1:string2} values for set/unset group
.sp
The substitution strings themselves are expanded. Backslash can be used to
escape colons and closing curly brackets.
.
.
.SH "SEE ALSO" .SH "SEE ALSO"
.rs .rs
.sp .sp
@@ -639,6 +679,6 @@ Cambridge, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 17 September 2024 Last updated: 20 September 2024
Copyright (c) 1997-2024 University of Cambridge. Copyright (c) 1997-2024 University of Cambridge.
.fi .fi
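
The case-forcing escapes listed in the REPLACEMENT STRINGS summary (\l, \u, \L, \U, \E and the \u\L and \l\U combinations) can be pictured as a small state machine. The Python sketch below is an assumption-laden illustration rather than PCRE2's code: `force_case` is a hypothetical helper that works on literal text only, whereas the documentation says real case forcing also applies to characters inserted from capture groups.

```python
# Toy emulation only: force_case is hypothetical and is not part of PCRE2.
def force_case(text):
    """Apply \\l, \\u, \\L, \\U and \\E case forcing to literal text."""
    out = []
    mode = None   # persistent forcing set by \L or \U, cleared by \E
    once = None   # one-shot forcing set by \l or \u, used for one character
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] in "luLUE":
            code = text[i + 1]
            if code == "l":
                once = str.lower
            elif code == "u":
                once = str.upper
            elif code == "L":
                mode = str.lower
            elif code == "U":
                mode = str.upper
            else:                 # \E ends \L or \U forcing
                mode = None
            i += 2
            continue
        ch = text[i]
        if once:                  # one-shot forcing takes priority
            ch, once = once(ch), None
        elif mode:
            ch = mode(ch)
        out.append(ch)
        i += 1
    return "".join(out)


print(force_case(r"\u\Lhello WORLD\E Done"))   # Hello world Done
```

Keeping the one-shot and persistent states separate is what lets \u\L upper-case the first character and lower-case the rest, as the table describes.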