Commit Graph

332 Commits

Author SHA1 Message Date
Nicholas Wilson
e62c0e0916 Re-apply "Use standard CMake constructs to export the targets. (#260)" (#739)
Additionally, I have attempted to clean up some CMake issues to make the
package's build interface cleaner, in particular, avoiding polluting the
parent directory's include path with our config.h file (if PCRE2 is being
included as a subdirectory).

This re-adds changes from Theodore's commit:
    def175f4a9
and partially reverts changes from Carlo's commit:
    92d56a1f7c

---------

Co-authored-by: Theodore Tsirpanis <teo@tsirpanis.gr>
2025-04-08 17:37:19 +01:00
Nicholas Wilson
a73417315a Add documentation for subroutine return values (#738) 2025-03-28 14:53:39 +00:00
Nicholas Wilson
eb3bd3cf14 New pcre2_next_match() API to simplify pcre2demo, test, and substitute (#733)
* The primary purpose of pcre2_next_match() is to make it much easier for
  PCRE2 clients to iterate over matches, without needing an advanced knowledge
  of regular expressions.
* Secondly, we can simplify our own code by merging the three duplicate
  implementations of the /g global match behaviour: pcre2demo, pcre2_substitute,
  and pcre2test.
* Thirdly, as I look closely at the issue, I can improve the documentation.
* Fourthly, I would like to actually simplify the logic, removing a complex loop
  which makes several match attempts, swallows duplicate matches, and more.
  We can have identical behaviour with a simple retry using
  PCRE2_NOTEMPTY_ATSTART.
2025-03-24 13:29:52 +00:00
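The "simple retry" described in the last bullet can be sketched roughly as below. This is not the pcre2_next_match() API itself: `toy_match` is a hypothetical stand-in for a regex engine hard-wired to the pattern `a*`, and its `notempty_atstart` argument only imitates the effect of PCRE2_NOTEMPTY_ATSTART (an empty match at the starting offset is rejected, while empty matches further along are still allowed).

```c
#include <assert.h>
#include <stddef.h>

typedef struct { size_t start, end; } span;

/* Hypothetical toy engine for the pattern a*: finds the leftmost match at or
 * after `from`. When notempty_atstart is non-zero, an empty match located
 * exactly at `from` is rejected and the search moves on. Returns 1 on match. */
static int toy_match(const char *s, size_t len, size_t from,
                     int notempty_atstart, span *out)
{
    for (size_t pos = from; pos <= len; pos++) {
        size_t end = pos;
        while (end < len && s[end] == 'a') end++;
        if (end == pos && notempty_atstart && pos == from) continue;
        out->start = pos;
        out->end = end;
        return 1;
    }
    return 0;
}

/* Collect up to `max` matches in /g style. After an empty match, the next
 * attempt starts at the same offset but forbids another empty match there. */
static size_t find_all(const char *s, size_t len, span *out, size_t max)
{
    assert(s != NULL && out != NULL);
    size_t n = 0, from = 0;
    int prev_empty = 0;
    while (n < max && from <= len) {
        span m;
        if (!toy_match(s, len, from, prev_empty, &m)) break;
        out[n++] = m;
        prev_empty = (m.start == m.end);
        from = m.end;
    }
    return n;
}
```

For the subject "baac" this loop reports four matches: an empty one at offset 0, "aa" at offsets 1 to 3, and empty ones at offsets 3 and 4.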
Nicholas Wilson
f63b5d2658 Add a little additional documentation on how to emulate pcre2_substitute's loop (#735)
We won't implement more advanced/alternative global replacement strategies, but we can at least write a few sentences explaining how to do it in application code.
2025-03-24 10:08:12 +00:00
Nicholas Wilson
990d53f192 Add linker scripts with symbol versioning (#721)
Both the Autoconf and CMake build systems are updated to detect linker support for symbol versioning.

Currently, Linux, Solaris, and FreeBSD are tested and working. Windows (COFF) and macOS (Mach-O) have no symbol versioning.

There is an Autoconf/CMake flag to opt out of the versioning behaviour.
2025-03-18 08:55:38 +00:00
Nicholas Wilson
b3ecb621bd Remove the old WORKSPACE.bazel file (#732) 2025-03-17 20:24:33 +00:00
github-actions[bot]
773486b4b5 Sync autogenerated files #noupdate 2025-02-28 22:29:19 +00:00
Nicholas Wilson
b79ee1dea5 Rename files which are #included (#708)
We have four files which have .c extensions, but which are actually #included rather than treated as their own compilation unit.

This goes against conventions - Autotools, CMake, and Bazel all assume that the .h/.c distinction indicates which files are compilation units.

pcre2_jit_match.c -> _inc.h
pcre2_jit_misc.c -> _inc.h
pcre2_printint.c -> _inc.h
pcre2_ucptables.c -> _inc.h
2025-02-27 06:57:44 +00:00
github-actions[bot]
3e68381dae Sync autogenerated files #noupdate 2025-02-26 22:29:20 +00:00
Nicholas Wilson
500c68b986 Add testing for malloc() failures (#697)
An additional testing argument, `-malloc`, is added to pcre2test and to RunTest.

The ManyConfig tests run this now in CI.

We exercise each malloc failure in the core code by counting how many mallocs are done, then repeating compilation and matching with a failure on each successive malloc.
2025-02-23 09:51:32 +00:00
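The count-then-fail strategy from the commit above might look something like the following. It is only a sketch under stated assumptions: `alloc_state`, `counting_malloc`, and `operation_under_test` are hypothetical names, and the stand-in operation simply makes three allocations where the real test would compile and match a pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator state: fail the allocation whose index is fail_at. */
typedef struct {
    long count;    /* allocations requested so far */
    long fail_at;  /* index of the allocation to fail, or -1 for none */
} alloc_state;

static void *counting_malloc(size_t size, void *data)
{
    alloc_state *st = data;
    assert(st != NULL);
    long idx = st->count++;
    if (idx == st->fail_at) return NULL;   /* simulated out-of-memory */
    return malloc(size);
}

/* Stand-in for "compile and match": needs three allocations and cleans up
 * on failure. Returns 0 on success, -1 if an allocation failed. */
static int operation_under_test(alloc_state *st)
{
    void *blocks[3] = { NULL, NULL, NULL };
    int rc = 0;
    for (int i = 0; i < 3; i++) {
        blocks[i] = counting_malloc(16, st);
        if (blocks[i] == NULL) { rc = -1; break; }
    }
    for (int i = 0; i < 3; i++) free(blocks[i]);
    return rc;
}

/* First run counts the allocations; then the operation is repeated once per
 * allocation, failing each one in turn. Returns how many of those runs
 * reported the failure cleanly, or -1 if the counting run itself failed. */
static long exercise_malloc_failures(void)
{
    alloc_state st = { 0, -1 };
    if (operation_under_test(&st) != 0) return -1;
    long total = st.count;
    long clean_failures = 0;
    for (long n = 0; n < total; n++) {
        alloc_state failing = { 0, n };
        if (operation_under_test(&failing) == -1) clean_failures++;
    }
    return clean_failures;
}
```

Running such a loop under a leak checker or sanitizer is what makes it useful: every failure path gets executed once, so a missing cleanup shows up immediately.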
Nicholas Wilson
fb3b380abb Another batch of very small typos & issues (#707) 2025-02-22 12:31:53 +00:00
Nicholas Wilson
ce42cfac5c Fix two typos in pcre2api, plus some other minor issues (#703) 2025-02-19 19:15:38 +00:00
Joshua Rogers
fc04890d63 Fix documentation pcre2test.1 (#701)
The error -47 corresponds to PCRE2_ERROR_MATCHLIMIT not PCRE2_ERROR_NOMEMORY.
2025-02-18 18:07:49 +00:00
Zoltan Herczeg
861a8aae41 Improve named group handling (#700)
Add a simple hash code for group names to improve search speed.
Ignore duplicates when group names are searched.
Improve finding of duplicates (they have the same name pointer).
Improve creating name table (duplicates are handled in one step).
Create a new file for name management.
2025-02-18 18:04:14 +01:00
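As a rough illustration of the first point above, a lookup can compare a cheap precomputed hash before falling back to a full string comparison. The struct layout, the hash function, and the names `name_entry`, `name_hash`, and `find_group` are all assumptions made for this sketch; the commit message does not say which hash PCRE2 uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name-table entry with a precomputed hash of the name. */
typedef struct {
    const char *name;
    size_t length;
    uint16_t hash;
    int group_number;
} name_entry;

/* A simple multiplicative hash folded to 16 bits (illustrative only). */
static uint16_t name_hash(const char *name, size_t length)
{
    assert(name != NULL);
    uint32_t h = 0;
    for (size_t i = 0; i < length; i++)
        h = h * 31u + (unsigned char)name[i];
    return (uint16_t)(h ^ (h >> 16));
}

/* Returns the group number for `name`, or -1 if it is not in the table.
 * The hash and length checks reject most entries without calling memcmp. */
static int find_group(const name_entry *table, size_t count,
                      const char *name, size_t length)
{
    uint16_t h = name_hash(name, length);
    for (size_t i = 0; i < count; i++) {
        if (table[i].hash != h || table[i].length != length) continue;
        if (memcmp(table[i].name, name, length) == 0)
            return table[i].group_number;
    }
    return -1;
}
```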
Nicholas Wilson
db3b532aa0 Improve RunTest to continue after a test failure (#696)
This makes it easier to see all the failures at once, rather than
having to fix one at a time. The output is now grouped into
directories.
2025-02-15 11:50:25 +00:00
Nicholas Wilson
0d0ac3aa0f Update EBCDIC support to support testing on normal ASCII systems (#656)
The pcre2test utility needs quite a few changes to accommodate this.
It is simpler to add a new mode to it, than to make it fully
EBCDIC-native. On an ASCII system, pcre2test performs ASCII I/O, but
translates the input when passing it to the fully-EBCDIC-supporting
library.
2025-02-12 22:31:00 +00:00
Lucas Trzesniewski
b52de60d67 Fix typo (#690) 2025-02-07 09:13:37 +00:00
Nicholas Wilson
2aa7681fb5 Update with my release procedure (#684) 2025-02-05 09:53:59 +00:00
Nicholas Wilson
1fffb0d44e Updates to the README and some documentation (#681) 2025-02-01 15:50:20 +00:00
Nicholas Wilson
f724b6117b Declutter one cmake file (#662) 2025-01-11 10:29:49 +00:00
Nicholas Wilson
236853194f Update modification dates #noupdate 2024-12-27 00:49:58 +00:00
Nicholas Wilson
23b4df750b Completely redo the substitute-case-callout work (#638)
Fixes #564

The previous API was not extensible to handle multi-character case rules. It required a fair bit of reworking in order to accommodate this. I had to delay the casing transformations to be done later, by buffering up the string to transform, and then allowing the callback to do an in-place transformation on the entire input to be transformed.
2024-12-26 23:46:21 +00:00
Nicholas Wilson
af03ceaf97 Update ChangeLog and NEWS for 10.45 (#643) 2024-12-26 15:12:15 +00:00
Nicholas Wilson
09c07ac7ab Small improvement to combination of substitution callout + overflow (#637)
I reckon that callers are assuming that when you use the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option, it will calculate the entire memory requirement in one go. Just two calls should be sufficient (rather than needing to loop with a gradually-increasing buffer size).

However, with a substitution callout this is not true. If you call once with PCRE2_SUBSTITUTE_OVERFLOW_LENGTH, the buffer length returned might still not be sufficient for the second call to succeed.

This is because the callout might not be called the first time, but the second time it will be called and can affect control flow, by requiring even more buffer to be used. This occurs even if the callout is completely stateless, idempotent and well-behaved.

This fix ensures that when we skip a callout (due to overflow), we still request enough buffer size for either option that the callout might return.
2024-12-19 10:46:03 +00:00
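The idea behind the fix above can be sketched as a worst-case size calculation. This is an illustration only, not the pcre2_substitute code: it assumes that for each match the skipped callout could end up choosing either the replacement text or the original matched text, and `match_sizes` and `worst_case_length` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes for one match: the callout, had it been called, might
 * keep the matched text or accept the replacement. */
typedef struct {
    size_t matched_len;
    size_t replacement_len;
} match_sizes;

/* When callouts are skipped while only measuring, reserve room for the
 * larger of the two outcomes so a second call cannot overflow again. */
static size_t worst_case_length(size_t unmatched_len,
                                const match_sizes *m, size_t count)
{
    assert(m != NULL || count == 0);
    size_t total = unmatched_len;
    for (size_t i = 0; i < count; i++) {
        size_t a = m[i].matched_len;
        size_t b = m[i].replacement_len;
        total += (a > b) ? a : b;
    }
    return total;
}
```

The figure may overestimate what the second call actually writes, which is the price of making two calls sufficient.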
Nicholas Wilson
f15bdd334d Update all man page dates #noupdate (#634) 2024-12-18 14:12:58 +00:00
Nicholas Wilson
f0819ca7c5 Update references to maintainers in the README (#633) 2024-12-18 13:54:08 +00:00
Nicholas Wilson
ac528f2d26 Details on new maintainership (#603)
* Add details on new maintainership
* Remove checked-in autoconf outputs
* Sync & cleanup files with Detrail
* Add CI job for ensuring PrepareRelease is run
* Add Ubuntu-20.04 autoconf runner
* Make CMake installed files match autoconf
* Update acknowledgements
2024-12-11 09:53:59 +00:00
Nicholas Wilson
aee5e9a97e Fix null-dereference bug in pcre2_substitute (#618)
Avoid one crash introduced with recent changes to substitute code as well as clarify what the expected offset value should be when overflowing the provided buffer.

---------

Co-authored-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
2024-12-10 14:27:06 +00:00
Nicholas Wilson
0f22e67e7c Auto-format and minimal cleanup to CMake (#592)
I haven't tackled any controversial steps in this PR - simply tidying the formatting.

I have used the `gersemi` tool, which simply "does its thing". I have additionally renamed a few variables to match standard casing conventions (but I am aware that some lowercased variables are used, for example in package-config files, and have left those alone).
2024-12-07 19:31:05 +00:00
Philip Hazel
55fda7f384 Update EBCDIC documentation; in pcre2pattern move it all into a separate section. 2024-11-27 17:28:11 +00:00
Philip Hazel
adab4b69d8 Expand documentation and error messages for extended character classes 2024-11-26 16:00:01 +00:00
Zoltan Herczeg
1cb968d116 Move character matching code into pcre2_jit_char_inc.h (#569)
Useful for eclass jit implementation.
2024-11-26 12:27:39 +01:00
Nicholas Wilson
e0d4eee05e Implement Perl extended character classes (#553)
Fixes #536
2024-11-15 15:55:10 +01:00
Nicholas Wilson
fc38d9e784 Implement ALT_EXTENDED_CLASS flag (#523)
* Move some existing character class code into pcre2_compile_class.c
* Add a new flag PCRE2_ALT_EXTENDED_CLASS to change the behaviour of
  parsing [...] character classes, to emit new META codes, and new
  OP_ECLASS codes for nested character classes with operators
* Document the behaviour relative to the UTS#18 standard
* No JIT support; it falls back to the interpreter. DFA is supported.
2024-10-30 11:33:29 +01:00
Nicholas Wilson
b72cc97186 Add support for Turkish I casefolding (#521)
New flag: PCRE2_EXTRA_TURKISH_CASING, and pre-pattern flag
(*TURKISH_CASING).

Also added a pre-pattern flag (*CASELESS_RESTRICT) for this existing
flag.
2024-10-14 17:00:06 +01:00
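For readers unfamiliar with the Turkish casing rules behind the commit above: in Turkish, dotless "I" (U+0049) pairs with "ı" (U+0131), and dotted "İ" (U+0130) pairs with "i" (U+0069). The sketch below shows only that pairing for the four "i" letters; `turkish_fold` and `turkish_caseless_equal` are hypothetical helpers, not PCRE2 functions, and no other letters are folded.

```c
#include <assert.h>
#include <stdint.h>

/* Fold the two uppercase "i" letters to their Turkish lowercase partners.
 * All other code points are returned unchanged in this sketch. */
static uint32_t turkish_fold(uint32_t c)
{
    assert(c <= 0x10FFFFu);
    switch (c) {
    case 0x0049: return 0x0131;  /* I -> dotless i */
    case 0x0130: return 0x0069;  /* dotted I -> i  */
    default:     return c;
    }
}

/* Non-zero if the two code points are case-equivalent under these rules. */
static int turkish_caseless_equal(uint32_t a, uint32_t b)
{
    return turkish_fold(a) == turkish_fold(b);
}
```

Under these rules ASCII "I" and "i" are no longer case-equivalent to each other, which is the main difference from default caseless matching.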
Carlo Marcelo Arenas Belón
0d087cce82 pcre2grep: add $& as an alias for $0 (#519)
Perl does not use $0 anymore to refer to the text of the matched subject
and `pcre2_substitute()` was recently updated to also provide that value
using the variable Perl prefers: `$&`.

In a similar context, either as part of the formatted output from a match
or during the processing of a callback, teach pcre2grep to also populate
$&.

While at it, update the ChangeLog with recent changes.
2024-10-09 09:08:27 +01:00
Philip Hazel
60fd745ebc Minor documentation updates 2024-10-04 17:21:33 +01:00
Nicholas Wilson
9503e68b7c Add substitute case callout function (#512)
* Add substitute case callout function

* Fix foolish misunderstanding

* Fix trivial build error

* Fix non-Unicode tests
2024-10-04 16:57:58 +01:00
Nicholas Wilson
32f03ad588 Add option to disable callouts (#499)
* Add option to disable callouts

* Fix pcre2grep issue, and docs

* Add pcre2test docs
2024-10-02 12:00:02 +01:00
Philip Hazel
012ab39bd8 Correct substitution documentation 2024-09-24 09:23:40 +01:00
Zoltan Herczeg
d3095b4761 Improve character classes (#474)
Co-authored-by: Zoltan Herczeg <hzmester@freemail.hu>
2024-09-21 14:26:49 +01:00
Alex Dowad
b868b411e2 Add new API function pcre2_set_optimization() for controlling enabled optimizations (#471)
It is anticipated that over time, more and more optimizations will be
added to PCRE2, and we want to be able to switch optimizations off/on,
both for testing purposes and to be able to work around bugs in a
released library version.

The number of free bits left in the compile options word is very small.
Hence, we will start putting all optimization enable/disable flags in
a separate word. To switch these off/on, the new API function
pcre2_set_optimization() will be used.

The values which can be passed to pcre2_set_optimization() are
different from the internal flag bit values. The values accepted by
pcre2_set_optimization() are contiguous integers, so there is no
danger of ever running out of them. This means in the future, the
internal representation can be changed at any time without breaking
backwards compatibility. Further, the 'directives' passed to
pcre2_set_optimization() are not restricted to control a single,
specific optimization. As an example, passing PCRE2_OPTIMIZATION_FULL
will turn on all optimizations supported by whatever version of
PCRE2 the client program happens to be linked with.

Co-authored-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Co-authored-by: Zoltan Herczeg <hzmester@freemail.hu>
2024-09-21 14:14:32 +01:00
Philip Hazel
f964982eec Add documentation for PCRE2_EXTRA_BS0 and PCRE2_EXTRA_PYTHON_OCTAL 2024-09-21 10:17:10 +01:00
Philip Hazel
b463821c45 Documentation for added interpretation in replacement strings (PR #483) 2024-09-20 15:00:29 +01:00
Philip Hazel
d8b7f31671 Documentation for substitutions processing changes 2024-09-17 16:55:08 +01:00
Philip Hazel
b6d05541ae Document substitute title-casing feature 2024-09-16 17:56:54 +01:00
Philip Hazel
6412606942 Update documentation for scan substring patterns - now supports a list of groups 2024-09-04 12:35:14 +01:00
Alex Dowad
64137d23e9 Add missing 'expand' modifier to list in pcre2test manpage (#458) 2024-09-04 11:48:53 +01:00
Philip Hazel
3808655ed4 Documentation update re (*THEN) 2024-09-01 17:01:38 +01:00
Philip Hazel
7a0eda1f66 Update documentation for scan_substring; also some code trailing space tidies 2024-08-30 17:31:55 +01:00