Commit Graph

131 Commits

Author SHA1 Message Date
Nicholas Wilson
e62c0e0916 Re-apply "Use standard CMake constructs to export the targets. (#260)" (#739)
Additionally, I have attempted to clean up some CMake issues to make the
package's build interface cleaner, in particular, avoiding polluting the
parent directory's include path with our config.h file (if PCRE2 is being
included as a subdirectory).

This re-adds changes from Theodore's commit:
    def175f4a9
and partially reverts changes from Carlo's commit:
    92d56a1f7c

---------

Co-authored-by: Theodore Tsirpanis <teo@tsirpanis.gr>
2025-04-08 17:37:19 +01:00
Nicholas Wilson
a80dd59111 Update my maintainer documentation to remind me to merge releases to master
See user request in #742.
2025-04-02 19:52:28 +01:00
Nicholas Wilson
eb3bd3cf14 New pcre2_next_match() API to simplify pcre2demo, test, and substitute (#733)
* The primary purpose of pcre2_next_match() is to make it much easier for
  PCRE2 clients to iterate over matches, without needing an advanced knowledge
  of regular expressions.
* Secondly, we can simplify our own code by merging the three duplicate
  implementations of the /g global match behaviour: pcre2demo, pcre2_substitute,
  and pcre2test.
* Thirdly, as I look closely at the issue, I can improve the documentation.
* Fourthly, I would like to actually simplify the logic, removing a complex loop
  which makes several match attempts, swallows duplicate matches, and more.
  We can have identical behaviour with a simple retry using
  PCRE2_NOTEMPTY_ATSTART.
2025-03-24 13:29:52 +00:00
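The "simple retry" idea from eb3bd3cf14 can be illustrated with a toy sketch. This is not the PCRE2 implementation and does not call the PCRE2 API: `toy_match` is a hypothetical stand-in engine that only knows the pattern `a*`, and its `notempty_atstart` argument merely imitates what PCRE2_NOTEMPTY_ATSTART requests. The loop shows how, after an empty match, retrying from the same offset while forbidding an empty match there is enough to make progress.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy engine: searches for the pattern a* from offset start.
   When notempty_atstart is set, an empty match located exactly at start is
   rejected, imitating the effect of PCRE2_NOTEMPTY_ATSTART. */
static bool toy_match(const char *subject, size_t len, size_t start,
                      bool notempty_atstart, size_t *mstart, size_t *mend)
{
  assert(start <= len);
  for (size_t pos = start; pos <= len; pos++) {
    size_t end = pos;
    while (end < len && subject[end] == 'a') end++;
    if (end == pos && pos == start && notempty_atstart) continue;
    *mstart = pos;
    *mend = end;
    return true;
  }
  return false;
}

/* Global (/g style) iteration: stores the length of each match in lengths
   (at most max entries) and returns the number of matches found. */
static size_t toy_match_all(const char *subject, size_t *lengths, size_t max)
{
  size_t len = strlen(subject);
  size_t start = 0, count = 0, ms, me;
  bool notempty_atstart = false;

  while (count < max &&
         toy_match(subject, len, start, notempty_atstart, &ms, &me)) {
    lengths[count++] = me - ms;
    /* After an empty match, retry from the same offset but forbid another
       empty match there; otherwise simply continue from the match end. */
    notempty_atstart = (me == ms);
    start = me;
  }
  return count;
}
```

Under these assumptions, iterating over "baac" yields four matches: an empty one at offset 0, "aa", and empty ones at offsets 3 and 4.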
Nicholas Wilson
990d53f192 Add linker scripts with symbol versioning (#721)
Both the Autoconf and CMake build systems are updated to detect linker support for symbol versioning.

Currently, Linux, Solaris, and FreeBSD are tested and working. Windows (COFF) and macOS (Mach-O) have no symbol versioning.

There is an Autoconf/CMake flag to opt out of the versioning behaviour.
2025-03-18 08:55:38 +00:00
Nicholas Wilson
b3ecb621bd Remove the old WORKSPACE.bazel file (#732) 2025-03-17 20:24:33 +00:00
Zoltan Herczeg
d37c6dfe2d JIT compiler update (#723) 2025-03-07 17:26:03 +01:00
Nicholas Wilson
b79ee1dea5 Rename files which are #included (#708)
We have four files which have .c extensions, but which are actually #included rather than treated as their own compilation unit.

This goes against conventions - Autotools, CMake, and Bazel all assume that the .h/.c distinction indicates which files are compilation units.

pcre2_jit_match.c -> _inc.h
pcre2_jit_misc.c -> _inc.h
pcre2_printint.c -> _inc.h
pcre2_ucptables.c -> _inc.h
2025-02-27 06:57:44 +00:00
Nicholas Wilson
929b7404f9 Build fixes for z/OS (#695)
Fixes to enable the code to build with a simple `CC=xlc ./configure --enable-ebcdic --disable-unicode`.

Fixes to the tests, so that `make check` passes on EBCDIC platforms.

Add a CI job to do z/OS testing.
2025-02-26 22:31:51 +00:00
Nicholas Wilson
6e1da609f4 Update 132html to use <h2> and <h3> 2025-02-26 22:27:06 +00:00
Nicholas Wilson
500c68b986 Add testing for malloc() failures (#697)
An additional testing argument, `-malloc` is added to pcre2test and to RunTest.

The ManyConfig tests run this now in CI.

We exercise each malloc failure in the core code by counting how many mallocs are done, then repeating compilation and matching with a failure on each successive malloc.
2025-02-23 09:51:32 +00:00
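The sweep described in 500c68b986 (count the allocations, then fail each one in turn) could be sketched roughly as below. All names here (`fault_alloc`, `fault_malloc`, `operation_under_test`, `sweep_malloc_failures`) are hypothetical and are not taken from pcre2test; the operation under test is a stand-in for "compile and match".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical fault-injecting allocator state. */
typedef struct {
  long count;    /* allocations attempted so far */
  long fail_at;  /* 1-based index of the allocation to fail, 0 = never */
} fault_alloc;

static void *fault_malloc(fault_alloc *fa, size_t size)
{
  assert(fa != NULL);
  fa->count++;
  if (fa->fail_at != 0 && fa->count == fa->fail_at) return NULL;
  return malloc(size);
}

/* Stand-in for the code under test: performs three allocations and must
   release everything and report failure if any of them fails. */
static bool operation_under_test(fault_alloc *fa)
{
  void *a = fault_malloc(fa, 16);
  if (a == NULL) return false;
  void *b = fault_malloc(fa, 32);
  if (b == NULL) { free(a); return false; }
  void *c = fault_malloc(fa, 64);
  if (c == NULL) { free(b); free(a); return false; }
  free(c); free(b); free(a);
  return true;
}

/* Returns the number of allocation sites exercised, or -1 if the baseline
   run failed or an injected failure went unreported. */
static long sweep_malloc_failures(void)
{
  fault_alloc baseline = { 0, 0 };
  if (!operation_under_test(&baseline)) return -1;
  long total = baseline.count;

  for (long n = 1; n <= total; n++) {
    fault_alloc inject = { 0, n };
    if (operation_under_test(&inject)) return -1;
  }
  return total;
}
```

Running a sweep like this under a leak checker or sanitizer would additionally show whether each failure path frees what it allocated.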
Zoltan Herczeg
861a8aae41 Improve named group handling (#700)
Add a simple hash code for group names to improve search speed.
Ignore duplicates when group names are searched.
Improve finding of duplicates (they have the same name pointer).
Improve creating name table (duplicates are handled in one step).
Create a new file for name management.
2025-02-18 18:04:14 +01:00
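As a rough illustration of the hashing idea in 861a8aae41, the sketch below stores a small hash next to each group name so that most non-matching entries are rejected without a string comparison. The hash function, the `named_group` layout, and the helper names are assumptions made for this example, not the ones used in PCRE2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name table entry: the hash is computed once, on insertion. */
typedef struct {
  const char *name;
  size_t length;
  uint16_t hash;
  int number;
} named_group;

/* Simple multiplicative hash, folded down to 16 bits. */
static uint16_t name_hash(const char *name, size_t length)
{
  uint32_t h = 0;
  for (size_t i = 0; i < length; i++)
    h = h * 31u + (unsigned char)name[i];
  return (uint16_t)(h ^ (h >> 16));
}

static named_group make_group(const char *name, int number)
{
  assert(name != NULL);
  named_group g = { name, strlen(name), 0, number };
  g.hash = name_hash(name, g.length);
  return g;
}

/* Returns the group number for name, or -1 if the name is not present. */
static int find_group(const named_group *groups, size_t count, const char *name)
{
  size_t length = strlen(name);
  uint16_t hash = name_hash(name, length);
  for (size_t i = 0; i < count; i++) {
    if (groups[i].hash != hash) continue;      /* cheap rejection */
    if (groups[i].length == length &&
        memcmp(groups[i].name, name, length) == 0)
      return groups[i].number;
  }
  return -1;
}
```

The full comparison is still needed after a hash hit, because different names can share a hash value.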
Nicholas Wilson
db3b532aa0 Improve RunTest to continue after a test failure (#696)
This makes it easier to see all the failures at once, rather than
having to fix one at a time. The output is now grouped into
directories.
2025-02-15 11:50:25 +00:00
Nicholas Wilson
0d0ac3aa0f Update EBCDIC support to support testing on normal ASCII systems (#656)
The pcre2test utility needs quite a few changes to accommodate this.
It is simpler to add a new mode to it, than to make it fully
EBCDIC-native. On an ASCII system, pcre2test performs ASCII I/O, but
translates the input when passing it to the fully-EBCDIC-supporting
library.
2025-02-12 22:31:00 +00:00
Nicholas Wilson
ce6e960c49 Add tests using nm to check so contents (#693)
We want to ensure that the symbols included in shared object files (.so, .dylib, .dll) are as expected.

This verifies that our Autoconf and CMake builds are using the `-fvisibility` flags and attributes correctly.

As it turns out, the CMake build on Solaris is not working, and was exposing internal symbols due to a CMake issue.
2025-02-12 21:30:56 +00:00
Nicholas Wilson
d8a9f2fe55 Add LICENSE file for sljit in the tarball (#692) 2025-02-11 21:04:19 +00:00
Nicholas Wilson
2aa7681fb5 Update with my release procedure (#684) 2025-02-05 09:53:59 +00:00
Nicholas Wilson
1fffb0d44e Updates to the README and some documentation (#681) 2025-02-01 15:50:20 +00:00
MatthewVernon
0d579d3568 pcre2grep.1 - fix warning about undefined macro 0 (#673)
Debian's "lintian" picked this up - line 950 in the man page starts
with a ' which is how you start a roff request. You can reproduce the
warning thus:

```
LC_ALL=C.UTF-8 MANROFFSEQ='' MANWIDTH=80 \
man --warnings -E UTF-8 -l -Tutf8 -Z doc/pcre2grep.1 >/dev/null
```

The fix is to add a zero-width space (`\&`) to the start of the
relevant line (indeed `groff_man(7)` suggests exactly this use for \&).

---------

Co-authored-by: Matthew Vernon <matthew@debian.org>
2025-01-24 11:44:47 +00:00
Carlo Marcelo Arenas Belón
81dced9442 maint: avoid clang masquerading as gcc in ManyConfigTests (#671)
__GNUC__ is defined by any compiler that claims compliance to GNU
but that doesn't include the cmdline interface, so avoid passing
GCC specific warning flags to clang.
2025-01-14 14:12:26 +00:00
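The point made in 81dced9442 can be seen from the preprocessor side. The sketch below is only an illustration in C (ManyConfigTests itself is a script and may detect the compiler differently): because clang also defines `__GNUC__`, `__clang__` has to be tested first. The function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Classifies the compiler building this file. __clang__ is checked first
   because clang defines __GNUC__ as well; other GNU-compatible compilers
   also land in the second branch, so it does not prove "real" GCC. */
static const char *compiler_family(void)
{
#if defined(__clang__)
  return "clang";
#elif defined(__GNUC__)
  return "gcc-compatible";
#else
  return "other";
#endif
}

/* Small self-check: exactly one of the three names is returned. */
static void check_compiler_family(void)
{
  const char *f = compiler_family();
  assert(strcmp(f, "clang") == 0 ||
         strcmp(f, "gcc-compatible") == 0 ||
         strcmp(f, "other") == 0);
}
```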
Nicholas Wilson
03c097797a Fix up GCC compiler detection in ManyConfigTests (#670)
Also, add fixes/suppressions for the inevitable warnings that have appeared due to the CI job not running with warnings.
2025-01-13 16:40:52 +00:00
Carlo Marcelo Arenas Belón
95181ffc05 macOS related patches for 10.45 (#668)
* autotools: retire conditional for debug build

likely added by mistake, the functionality works through `--enable-debug`
instead.

* maint: allow selecting compiler for ManyConfigTests

Instead of hardcoding the compiler as `cc`, let a CC environment
variable dictate which compiler to use.

For example, in macOS/arm64 where the GNU compiler is provided by
brew the following will allow using it instead of the system compiler
(which ALSO answers to `gcc` even though it is `clang`)

  % CC=gcc-13 maint/ManyConfigTests
2025-01-13 15:42:07 +00:00
Nicholas Wilson
9a868b0605 Tidy up config.h management (task from README) (#658)
This fortuitously fixes the 16/32-bit 'unity' build.

Also tidy up the ckd_smul macros, for the sake of the Unity build.
2025-01-11 19:01:08 +00:00
Nicholas Wilson
f724b6117b Declutter one cmake file (#662) 2025-01-11 10:29:49 +00:00
Nicholas Wilson
acb4b56944 Enable and fix additional build warnings (#655)
Part of #651

* Use much stricter Windows warnings (`/W3` rather than `/W1`). This requires quite a few fixes for all the sloppy places where we do implicit assignment of 64-bit values to 32-bit storage.
* Use and test CMake build & install on FreeBSD and Solaris
* Add 64 bit Solaris build (`cc -m64`) and fix existing Solaris warnings
* Make compile flags used in CI consistent across platforms. Previously Mac & Linux were building with different warning flags.
* Add `--enable-Werror` to `configure.ac`. This means that you can build with `-Werror` in a clean way. Previously, you had to hackily override the CPPFLAGS when calling `make` since you can't pass `-Werror` as a CFLAG into `./configure` (it messes with compiler feature detection).
2025-01-10 13:33:52 +00:00
Nicholas Wilson
070f561c62 Add maint scripts for checking and updating library version & updated dates (#635)
The workflow shall be:

* When the release number is bumped, all references to that release number need to be bumped immediately. (For example, when the source code moves from 10.45 → 10.46, the man pages must do so as well.)
* When documentation is updated, there's no need to update the "last modified" dates by hand. We can sweep those all up during the release process. Or update them immediately - there's no harm in it; we simply aren't obliged to.
2024-12-20 22:00:32 +00:00
Nicholas Wilson
4d3eada79b Update soversion passed to libtool for 10.45 (#636)
I believe this is the correct procedure, based on Philip's documentation.

The libpcre2-posix interface is completely unchanged, but the source code has been updated.

The libpcre2-NN interface has been extended in backwards-compatible ways, with new enum values and API functions, so callers compiled and linked against the old version may use the newer as a drop-in replacement, but callers compiled against the 10.45 headers will fail when used against the old version (if they require any newly-added functions).
2024-12-20 13:36:08 +00:00
Nicholas Wilson
5b3edae9d2 Add CI test to ensure installation manifest is correct (#630)
The new CI job ensures that `make distcheck` passes.

It also bundles up the tarball and includes it in the GitHub artifacts, along with a GitHub-provided attestation that the tarball is derived from the given build steps.
2024-12-18 12:02:23 +00:00
Nicholas Wilson
ac528f2d26 Details on new maintainership (#603)
* Add details on new maintainership
* Remove checked-in autoconf outputs
* Sync & cleanup files with Detrail
* Add CI job for ensuring PrepareRelease is run
* Add Ubuntu-20.04 autoconf runner
* Make CMake installed files match autoconf
* Update acknowledgements
2024-12-11 09:53:59 +00:00
Zoltan Herczeg
182461aba1 Improve character range matching with binary search (#524)
Co-authored-by: Zoltan Herczeg <hzmester@freemail.hu>
2024-10-18 07:00:19 +02:00
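A generic sketch of the technique named in 182461aba1: if the ranges of a character class are kept sorted and non-overlapping, membership can be decided by binary search instead of a linear scan. The `char_range` type and `in_ranges` function are assumptions for this example and do not reflect how PCRE2 encodes character classes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive code point range. The array passed to in_ranges is assumed to
   be sorted by first and to contain no overlapping ranges. */
typedef struct {
  uint32_t first;
  uint32_t last;
} char_range;

static bool in_ranges(const char_range *ranges, size_t count, uint32_t c)
{
  assert(ranges != NULL || count == 0);
  size_t lo = 0, hi = count;          /* search the half-open span [lo, hi) */
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (c < ranges[mid].first)
      hi = mid;                       /* c lies before this range */
    else if (c > ranges[mid].last)
      lo = mid + 1;                   /* c lies after this range */
    else
      return true;                    /* first <= c <= last */
  }
  return false;
}
```

For a class with many ranges this needs a logarithmic rather than linear number of comparisons per character.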
Nicholas Wilson
b72cc97186 Add support for Turkish I casefolding (#521)
New flag: PCRE2_EXTRA_TURKISH_CASING, and pre-pattern flag
(*TURKISH_CASING).

Also added a pre-pattern flag (*CASELESS_RESTRICT) for this existing
flag.
2024-10-14 17:00:06 +01:00
Patrick
dcbf9a0a13 Fixing typo in ManyConfigTests (#515) 2024-10-07 06:39:34 +02:00
Carlo Marcelo Arenas Belón
a3011c4378 complete update to Unicode 16 (#513)
UCD 16 makes a lot of changes to scripts, so make sure that we have
sufficient coverage by keeping the original autogenerated tests in
addition.

Complete the code updates for changes to ScriptExtensions.txt which
is no longer sorted by script and allow for multiple unicode property
test files, depending on Unicode version.
2024-10-05 12:39:48 +01:00
Carlo Marcelo Arenas Belón
8c84b4ba58 reimplement asserts with a safer approach (#490)
The original asserts weren't very useful in debug mode as they
were lacking information on where they were being triggered and
were also unreliable and dangerous as they could result in
important code being removed and trigger crashes (even in non
debug mode).

Instead of implementing one generic assert for both modes, build
a more useful one for each one, so PCRE2_UNREACHABLE() could be
also used in non debug builds to help with optimization.

Reinstate all original assertions to use the new versions, which
will have the side effect of fixing indentation issues introduced
in the original, and include additional asserts that were provided
as the original ones were being audited for safety. Note that during
such audit the use of the original asserts might have been refactored
so it also includes all those relevant code changes.

While at it, update cmake and CI to help with testing as well as
other documentation.

Co-authored-by: Alex Dowad <alexinbeijing@gmail.com>
2024-10-02 15:08:18 +01:00
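The two-mode approach described in 8c84b4ba58 might look roughly like the sketch below. `DEMO_DEBUG`, `DEMO_UNREACHABLE`, and `apply` are hypothetical names used only for illustration; the actual PCRE2_UNREACHABLE() definition is not reproduced here. In the debug variant the macro reports where it was hit and aborts; in the non-debug variant it becomes an optimizer hint where the compiler offers one.

```c
#include <stdio.h>
#include <stdlib.h>

#if defined(DEMO_DEBUG)
/* Debug builds: report the location, then stop. */
#define DEMO_UNREACHABLE() \
  do { \
    fprintf(stderr, "Execution reached unexpected point at %s:%d\n", \
            __FILE__, __LINE__); \
    abort(); \
  } while (0)
#elif defined(__GNUC__) || defined(__clang__)
/* Non-debug builds: tell the optimizer this point is never reached. */
#define DEMO_UNREACHABLE() __builtin_unreachable()
#else
/* Unknown compiler: expand to nothing. */
#define DEMO_UNREACHABLE() do {} while (0)
#endif

typedef enum { OP_ADD, OP_SUB, OP_MUL } demo_op;

static int apply(demo_op op, int a, int b)
{
  switch (op) {
    case OP_ADD: return a + b;
    case OP_SUB: return a - b;
    case OP_MUL: return a * b;
  }
  DEMO_UNREACHABLE();  /* every valid demo_op returned above */
  return 0;
}
```

The hint variant is only safe where the point truly cannot be reached, since the compiler is free to delete code that follows from it; that is the hazard the commit message refers to.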
Ayesh Karunaratne
43cd99b632 Unicode: Update to UCD 16 (#503)
Updates to Unicode files to Unicode 16, adjusts tests, and the
scripts used to parse UCD, to adapt to minor formatting differences
in UCD 16.

The `GenerateTest26.py` and `GenerateCommon.py` had a regexp to
extract properties from the `ScriptExtensions.txt` file. Previously,
all property lines had one space after the space-separated list of scripts.
In UCD-16, this list is adjusted with right-padding, which throws off
the parser.

This commit adjusts the regexps to ignore padding spaces.
2024-10-02 12:06:11 +01:00
Carlo Marcelo Arenas Belón
29494c1dfd ci: add perl DEV job (#505)
Validates Perl compatibility
2024-09-30 16:05:45 +01:00
Carlo Marcelo Arenas Belón
6915395e90 ci: add a full integer sanitizer clang job (#495)
JIT has several good uses of unsigned integer wraparounds, that
clang's UBSAN doesn't like (which is controversial, because it is
clearly not undefined behaviour), but since it is usually good to
know when they happen by accident it makes sense to make sure the
rest of PCRE2 codebase benefits from checking it.

While at it, upgrade the version of the base image to use a newer
OS as a canary from when the rest of the jobs upgrade themselves
and be a little more strict to catch other constructs that are not
welcomed in our codebase.
2024-09-25 16:53:00 +01:00
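The remark in 6915395e90 that unsigned wraparound is not undefined behaviour can be shown with a short example. In C, unsigned arithmetic is reduced modulo one more than the type's maximum value, so the code below is well defined even though a checker for unsigned overflow (clang groups one under `-fsanitize=integer`) would flag the addition. The function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Well-defined: the result wraps modulo UINT_MAX + 1. */
static unsigned int wrapping_add(unsigned int a, unsigned int b)
{
  return a + b;
}

/* Detects wraparound after the fact: a wrapped sum is smaller than either
   operand. */
static bool add_would_wrap(unsigned int a, unsigned int b)
{
  return a + b < a;
}

/* Small demonstration of both helpers. */
static void check_wraparound(void)
{
  assert(wrapping_add(UINT_MAX, 1u) == 0u);
  assert(add_would_wrap(UINT_MAX, 1u));
  assert(!add_would_wrap(1u, 2u));
}
```

The same addition on `int` would be undefined on overflow, which is why the two cases are treated so differently.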
Zoltan Herczeg
0333a783a4 Improve caseless character range processing when utf is enabled (#477)
Co-authored-by: Zoltan Herczeg <hzmester@freemail.hu>
2024-09-18 17:16:54 +01:00
Philip Hazel
9a5ff3e3af Make maint/RunPerlTest public 2024-08-22 17:10:16 +01:00
Philip Hazel
dbbde80109 Update maintainer documentation 2024-06-27 08:51:54 +01:00
Carlo Marcelo Arenas Belón
83ce64305f ci: add dev job using clang (#344)
Adds more sanitizers and includes "fixes" to the code to make them
mostly clean, while also using link-size 3.

While at it, make sure the original "basic" job (now renamed) errors
on failure for better visibility, clean up some 32-bit signed shifts
and update documentation.
2024-02-23 12:10:12 +00:00
Philip Hazel
8e83acc599 Upgrade interpreter to match JIT in handling of nested pattern recursions 2023-11-30 16:05:33 +00:00
Carlo Marcelo Arenas Belón
4a2bf0c5ab minor cleanup for previous fix (#299)
Move tests to the non UTF file and remove dead code.

While at it, do some other minor changes everywhere.
2023-10-01 17:16:15 +01:00
Carlo Marcelo Arenas Belón
1bc34ffa64 pcre2grep: document better possible multiline matching misses (#252)
While at it, remove a misplaced cast that would cause problems for
subjects over 2GB and a few typos.
2023-05-12 15:54:02 +01:00
Carlo Marcelo Arenas Belón
9c905ce0c1 maint: avoid duplicated boolean properties and bad script extensions (#202)
`ucptest` was misbehaving and showing the wrong properties and
finding the wrong characters.
2023-02-03 14:57:32 +00:00
Carlo Marcelo Arenas Belón
af0839f911 maint: honor @missing in DerivedBidiClass and report non values (#201)
Starting with Unicode 15, the provided DerivedBidiClass data file
reports different default values to use for unassigned characters
in different groups.

Process the additional hints for that specific file, and allow
overriding the values later if more specific.

Since that was previously forbidden, change get_other_case() to
report when no valid value could be provided and allow skipping
conflicting rule lines that required that restriction.

While at it, allow using the long identifiers in `ucptest` with
the `find bidi` command (underscores also allowed).
2023-02-02 17:31:13 +00:00
Carlo Marcelo Arenas Belón
72c9b57695 Ucptest updates (#199)
* ucptest: regenerate testoutput

Last sync with 1a5fcd (Remove unused variables in ucptest.c and update test data
for added properties, 2022-04-25), and showing significant differences.

* fix `findprop +` with UTF-8 characters and duplicated other case
2023-02-01 15:38:58 +00:00
Carlo Marcelo Arenas Belón
40626f0cb6 upgrade to Unicode 15 (#200)
Reverting several reserved characters that were removed from the
previous release, and that are only referenced as "@missing" in
DerivedBidiClass.txt
2023-02-01 15:28:21 +00:00
Philip Hazel
c13d54f658 Implement PCRE2_EXTRA_CASELESS_RESTRICT and related features 2023-01-29 16:46:24 +00:00
Philip Hazel
0746b3d523 Update the ManyConfigTests to include new POSIX freestanding test, add a JIT test in a non-source directory, and update selectors 2022-12-14 10:47:39 +00:00
Philip Hazel
51a5fcdc1f Remove unused variables in ucptest.c and update test data for added properties 2022-04-25 15:19:09 +01:00