312 Commits

Author SHA1 Message Date
Nicholas Wilson
5447a395dc Automatic update of doc files #noupdate 2025-02-04 13:03:43 +00:00
Nicholas Wilson
23b4df750b Completely redo the substitute-case-callout work (#638)
Fixes #564

The previous API was not extensible to handle multi-character case rules. It required a fair bit of reworking in order to accommodate this. I had to delay the casing transformations to be done later, by buffering up the string to transform, and then allowing the callback to do an in-place transformation on the entire input to be transformed.
2024-12-26 23:46:21 +00:00
Nicholas Wilson
af03ceaf97 Update ChangeLog and NEWS for 10.45 (#643) 2024-12-26 15:12:15 +00:00
Nicholas Wilson
09c07ac7ab Small improvement to combination of substitution callout + overflow (#637)
I reckon that callers are assuming that when you use the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option, it will calculate the entire memory requirement in one go. Just two calls should be sufficient (rather than needing to loop with a gradually-increasing buffer size).

However, with a substitution callout this is not true. If you call once with PCRE2_SUBSTITUTE_OVERFLOW_LENGTH, the buffer length returned might still not be sufficient for the second call to succeed.

This is because the callout might not be called the first time, but the second time it will be called and can affect control flow, by requiring even more buffer to be used. This occurs even if the callout is completely stateless, idempotent and well-behaved.

This fix ensures that when we skip a callout (due to overflow), we still request enough buffer size for either option that the callout might return.
2024-12-19 10:46:03 +00:00
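
The two-call pattern discussed in the commit above (ask for the required length, allocate, call again) can be sketched as follows. This is a minimal illustration, not PCRE2 code: `render_into` is a hypothetical stand-in for any API that reports the needed buffer size on overflow, and `render_alloc` is a hypothetical caller. The caller loops a bounded number of times rather than assuming that exactly two calls always suffice, which is the situation the commit describes for substitution callouts.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a length-reporting API (NOT a PCRE2 function).
 * On success: copies text plus its NUL into out, sets *out_len to the number
 * of characters written (excluding the NUL), and returns 0.
 * On overflow: writes nothing, sets *out_len to the required buffer size
 * (including the NUL), and returns -1. */
static int render_into(const char *text, char *out, size_t *out_len)
{
    size_t needed = strlen(text) + 1;
    if (out == NULL || *out_len < needed) {
        *out_len = needed;
        return -1;
    }
    memcpy(out, text, needed);
    *out_len = needed - 1;
    return 0;
}

/* Hypothetical caller: query the size, grow the buffer, and retry.
 * The bounded loop tolerates a second shortfall instead of assuming that
 * the first reported length is always enough.
 * Returns a malloc'd string that the caller must free, or NULL on failure. */
static char *render_alloc(const char *text)
{
    char *buf = NULL;
    size_t len = 0;

    for (int attempt = 0; attempt < 4; attempt++) {
        if (render_into(text, buf, &len) == 0)
            return buf;

        /* len now holds the size the callee asked for. */
        char *grown = realloc(buf, len);
        if (grown == NULL)
            break;
        buf = grown;
    }

    free(buf);
    return NULL;
}
```

With a well-behaved callee the loop finishes on the second call; the extra iterations only matter when the first length estimate turns out to be too small.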
Nicholas Wilson
f15bdd334d Update all man page dates #noupdate (#634) 2024-12-18 14:12:58 +00:00
Nicholas Wilson
f0819ca7c5 Update references to maintainers in the README (#633) 2024-12-18 13:54:08 +00:00
Nicholas Wilson
ac528f2d26 Details on new maintainership (#603)
* Add details on new maintainership
* Remove checked-in autoconf outputs
* Sync & cleanup files with Detrail
* Add CI job for ensuring PrepareRelease is run
* Add Ubuntu-20.04 autoconf runner
* Make CMake installed files match autoconf
* Update acknowledgements
2024-12-11 09:53:59 +00:00
Nicholas Wilson
aee5e9a97e Fix null-dereference bug in pcre2_substitute (#618)
Avoid one crash introduced with recent changes to substitute code as well as clarify what the expected offset value should be when overflowing the provided buffer.

---------

Co-authored-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
2024-12-10 14:27:06 +00:00
Nicholas Wilson
0f22e67e7c Auto-format and minimal cleanup to CMake (#592)
I haven't tackled any controversial steps in this PR - simply tidying the formatting.

I have used the `gersemi` tool, which simply "does its thing". I have additionally renamed a few variables to match standard casing conventions (but I am aware that some lowercased variables are used, for example in package-config files, and have left those alone).
2024-12-07 19:31:05 +00:00
Philip Hazel
55fda7f384 Update EBCDIC documentation; in pcre2pattern move it all into a separate section. 2024-11-27 17:28:11 +00:00
Philip Hazel
adab4b69d8 Expand documentation and error messages for extended character classes 2024-11-26 16:00:01 +00:00
Zoltan Herczeg
1cb968d116 Move character matching code into pcre2_jit_char_inc.h (#569)
Useful for eclass jit implementation.
2024-11-26 12:27:39 +01:00
Nicholas Wilson
e0d4eee05e Implement Perl extended character classes (#553)
Fixes #536
2024-11-15 15:55:10 +01:00
Nicholas Wilson
fc38d9e784 Implement ALT_EXTENDED_CLASS flag (#523)
* Move some existing character class code into pcre2_compile_class.c
* Add a new flag PCRE2_ALT_EXTENDED_CLASS to change the behaviour of
  parsing [...] character classes, to emit new META codes, and new
  OP_ECLASS codes for nested character classes with operators
* Document the behaviour relative to the UTS#18 standard
* No JIT support; it falls back to the interpreter. DFA is supported.
2024-10-30 11:33:29 +01:00
Nicholas Wilson
b72cc97186 Add support for Turkish I casefolding (#521)
New flag: PCRE2_EXTRA_TURKISH_CASING, and pre-pattern flag
(*TURKISH_CASING).

Also added a pre-pattern flag (*CASELESS_RESTRICT) for this existing
flag.
2024-10-14 17:00:06 +01:00
Carlo Marcelo Arenas Belón
0d087cce82 pcre2grep: add $& as an alias for $0 (#519)
Perl does not use $0 anymore to refer to the text of the matched subject
and `pcre2_substitute()` was recently updated to also provide that value
using the variable Perl prefers: `$&`.

In a similar context, either as part of the formatted output from a match
or during the processing of a callback, teach pcre2grep to also populate
$&.

While at it, update the ChangeLog with recent changes.
2024-10-09 09:08:27 +01:00
Philip Hazel
60fd745ebc Minor documentation updates 2024-10-04 17:21:33 +01:00
Nicholas Wilson
9503e68b7c Add substitute case callout function (#512)
* Add substitute case callout function

* Fix foolish misunderstanding

* Fix trivial build error

* Fix non-Unicode tests
2024-10-04 16:57:58 +01:00
Nicholas Wilson
32f03ad588 Add option to disable callouts (#499)
* Add option to disable callouts

* Fix pcre2grep issue, and docs

* Add pcre2test docs
2024-10-02 12:00:02 +01:00
Philip Hazel
012ab39bd8 Correct substitution documentation 2024-09-24 09:23:40 +01:00
Zoltan Herczeg
d3095b4761 Improve character classes (#474)
Co-authored-by: Zoltan Herczeg <hzmester@freemail.hu>
2024-09-21 14:26:49 +01:00
Alex Dowad
b868b411e2 Add new API function pcre2_set_optimization() for controlling enabled optimizations (#471)
It is anticipated that over time, more and more optimizations will be
added to PCRE2, and we want to be able to switch optimizations off/on,
both for testing purposes and to be able to work around bugs in a
released library version.

The number of free bits left in the compile options word is very small.
Hence, we will start putting all optimization enable/disable flags in
a separate word. To switch these off/on, the new API function
pcre2_set_optimization() will be used.

The values which can be passed to pcre2_set_optimization() are
different from the internal flag bit values. The values accepted by
pcre2_set_optimization() are contiguous integers, so there is no
danger of ever running out of them. This means in the future, the
internal representation can be changed at any time without breaking
backwards compatibility. Further, the 'directives' passed to
pcre2_set_optimization() are not restricted to control a single,
specific optimization. As an example, passing PCRE2_OPTIMIZATION_FULL
will turn on all optimizations supported by whatever version of
PCRE2 the client program happens to be linked with.

Co-authored-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Co-authored-by: Zoltan Herczeg <hzmester@freemail.hu>
2024-09-21 14:14:32 +01:00
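
The design described in the commit above (contiguous public directive values, kept separate from the internal flag bits) can be sketched as follows. This is a minimal illustration under assumed names: the `OPT_*` directives, the `FLAG_*` bits and `apply_directive` are all hypothetical and do not reproduce the PCRE2 implementation or its actual constants.

```c
#include <stdint.h>

/* Hypothetical public directives: plain contiguous integers, so new ones
 * can always be appended without running out of bits. */
enum opt_directive {
    OPT_NONE  = 0,  /* turn every optimization off */
    OPT_FULL  = 1,  /* turn on everything this build supports */
    OPT_A_ON  = 2,
    OPT_A_OFF = 3,
    OPT_B_ON  = 4,
    OPT_B_OFF = 5
};

/* Hypothetical internal representation: one bit per optimization.
 * Callers never see these values, so they can change between releases. */
#define FLAG_A   0x01u
#define FLAG_B   0x02u
#define FLAG_ALL (FLAG_A | FLAG_B)

/* Translate one public directive into a change on the internal flag word.
 * Returns 0 on success, or -1 for an unknown directive (flags untouched). */
static int apply_directive(uint32_t *flags, int directive)
{
    switch (directive) {
    case OPT_NONE:  *flags = 0;        return 0;
    case OPT_FULL:  *flags = FLAG_ALL; return 0;
    case OPT_A_ON:  *flags |= FLAG_A;  return 0;
    case OPT_A_OFF: *flags &= ~FLAG_A; return 0;
    case OPT_B_ON:  *flags |= FLAG_B;  return 0;
    case OPT_B_OFF: *flags &= ~FLAG_B; return 0;
    default:        return -1;
    }
}
```

Because a directive is translated rather than stored, one value such as `OPT_FULL` can affect several internal bits at once, and its meaning follows whatever the linked library version supports.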
Philip Hazel
f964982eec Add documentation for PCRE2_EXTRA_BS0 and PCRE2_EXTRA_PYTHON_OCTAL 2024-09-21 10:17:10 +01:00
Philip Hazel
b463821c45 Documentation for added interpretation in replacement strings (PR #483) 2024-09-20 15:00:29 +01:00
Philip Hazel
d8b7f31671 Documentation for substitutions processing changes 2024-09-17 16:55:08 +01:00
Philip Hazel
b6d05541ae Document substitute title-casing feature 2024-09-16 17:56:54 +01:00
Philip Hazel
6412606942 Update documentation for scan substring patterns - now supports a list of groups 2024-09-04 12:35:14 +01:00
Alex Dowad
64137d23e9 Add missing 'expand' modifier to list in pcre2test manpage (#458) 2024-09-04 11:48:53 +01:00
Philip Hazel
3808655ed4 Documentation update re (*THEN) 2024-09-01 17:01:38 +01:00
Philip Hazel
7a0eda1f66 Update documentation for scan_substring; also some code trailing space tidies 2024-08-30 17:31:55 +01:00
Philip Hazel
cedb1fb546 Update documentation of \Q...\E 2024-08-12 17:48:59 +01:00
Philip Hazel
e4ccef3034 Document no support for Unicode special casing rules 2024-08-04 17:32:46 +01:00
Philip Hazel
75b1025ae4 Tidy up Unicode class description parsing for \p and \P, including one bug fix. 2024-07-29 16:53:57 +01:00
Philip Hazel
4249b67c7f Document JIT allocation test feature and add to pcre2test 2024-07-24 14:53:21 +01:00
Philip Hazel
6d82f0cd3d Alter case-independent matching of \p{Lu} etc. to match Perl 2024-07-23 15:54:29 +01:00
Carlo Marcelo Arenas Belón
c63d7c992e pcre2grep: add --posix-pattern-file for compatibility with other grep (#428)
Historically, pcre2grep has done minor processing of the patterns that
were read through the `-f` option.

The end result is that for some patterns there are different results
depending if they were provided through `-e`, `-f` or as a parameter
in the command line.

Add a flag that could be provided to skip that processing so that the
same pattern file used with other grep implementations could be used
directly for the same result.
2024-06-18 15:45:13 +01:00
Philip Hazel
6ae58beca0 Final file tidies for 10.44 2024-06-07 15:09:00 +01:00
Philip Hazel
067c2f1f58 Fix bug in \X matching too many characters. Fixes issue #410. 2024-06-04 17:14:47 +01:00
Philip Hazel
05aafb2e30 Implement pcre2_set_max_pattern_compiled_length() and set this limit in the fuzzer 2024-04-24 09:32:25 +01:00
Philip Hazel
cbff6bbb1b Install OpenVMS support files 2024-04-16 12:11:06 +01:00
Philip Hazel
ced3b0f06f Increase name length to 128 2024-03-11 15:50:52 +00:00
Philip Hazel
04ca5be6c1 Remove ARM v5 from supported architecture list in pcre2jit documentation 2024-02-21 16:23:28 +00:00
Philip Hazel
3864abdb71 File tidies for 10.43 2024-02-16 17:12:25 +00:00
Philip Hazel
7d59ddebb1 Implement PCRE2_DISABLE_RECURSELOOP_CHECK 2024-01-27 15:54:07 +00:00
Philip Hazel
d71e89b6ea Check documentation for double-word typos 2024-01-19 16:48:53 +00:00
Thomas Voss
68852219e6 Fix various typos in documentation (#372)
Most of these typos were found with the following command:

    find doc -type f -name '*.3' -exec aspell -c {} \;
2024-01-19 16:24:58 +00:00
Philip Hazel
7b649dce27 For the pcre2demo man page, put the source code in its own section rather than everything under NAME 2024-01-06 14:44:11 +00:00
Philip Hazel
aadef0c3b4 File tidies for 10.43-RC1 release 2023-12-28 16:34:04 +00:00
Philip Hazel
c9e03ce866 Minor doc update 2023-12-08 09:34:43 +00:00
Philip Hazel
014c82d7bc Fix data type anomaly in pcre2_substring_list_free() prototype 2023-12-02 17:09:31 +00:00