21 Commits

Author SHA1 Message Date
Nicholas Wilson
d59672a719 Address three minor TODO comments in the test code (#816) 2025-10-12 11:33:37 +01:00
Nicholas Wilson
9a868b0605 Tidy up config.h management (task from README) (#658)
This fortuitously fixes the 16/32-bit 'unity' build.

Also tidy up the ckd_smul macros, for the sake of the Unity build.
2025-01-11 19:01:08 +00:00
Nicholas Wilson
5b3edae9d2 Add CI test to ensure installation manifest is correct (#630)
The new CI job ensures that `make distcheck` passes.

It also bundles up the tarball and includes it in the GitHub artifacts, along with a GitHub-provided attestation that the tarball is derived from the given build steps.
2024-12-18 12:02:23 +00:00
Nicholas Wilson
b72cc97186 Add support for Turkish I casefolding (#521)
New flag: PCRE2_EXTRA_TURKISH_CASING, and pre-pattern flag
(*TURKISH_CASING).

Also added a pre-pattern flag (*CASELESS_RESTRICT) for the existing
PCRE2_EXTRA_CASELESS_RESTRICT flag.
2024-10-14 17:00:06 +01:00
Carlo Marcelo Arenas Belón
a3011c4378 complete update to Unicode 16 (#513)
UCD 16 makes a lot of changes to scripts, so make sure that we have
sufficient coverage by keeping the original autogenerated tests in
addition.

Complete the code updates for changes to ScriptExtensions.txt, which
is no longer sorted by script, and allow for multiple Unicode property
test files, depending on Unicode version.
2024-10-05 12:39:48 +01:00
Carlo Marcelo Arenas Belón
8c84b4ba58 reimplement asserts with a safer approach (#490)
The original asserts weren't very useful in debug mode, as they
lacked information on where they were being triggered. They were
also unreliable and dangerous, as they could result in important
code being removed and trigger crashes (even in non-debug mode).

Instead of implementing one generic assert for both modes, build
a more useful one for each mode, so that PCRE2_UNREACHABLE() can
also be used in non-debug builds to help with optimization.

Reinstate all original assertions using the new versions, which
has the side effect of fixing indentation issues introduced in
the original, and include additional asserts that were provided
while the original ones were being audited for safety. Note that
during that audit, code using the original asserts might have been
refactored, so this also includes all those relevant code changes.

While at it, update cmake and CI to help with testing as well as
other documentation.

Co-authored-by: Alex Dowad <alexinbeijing@gmail.com>
2024-10-02 15:08:18 +01:00
Zoltan Herczeg
0333a783a4 Improve caseless character range processing when utf is enabled (#477)
Co-authored-by: Zoltan Herczeg <hzmester@freemail.hu>
2024-09-18 17:16:54 +01:00
Carlo Marcelo Arenas Belón
9c905ce0c1 maint: avoid duplicated boolean properties and bad script extensions (#202)
`ucptest` was misbehaving and showing the wrong properties and
finding the wrong characters.
2023-02-03 14:57:32 +00:00
Carlo Marcelo Arenas Belón
af0839f911 maint: honor @missing in DerivedBidiClass and report non values (#201)
Starting with Unicode 15, the provided DerivedBidiClass data file
reports different default values to use for unassigned characters
in different groups.

Process the additional hints for that specific file, and allow
overriding the values later if more specific.

Since that was previously forbidden, change get_other_case() to
report when no valid value could be provided and allow skipping
conflicting rule lines that required that restriction.

While at it, allow using the long identifiers in `ucptest` with
the `find bidi` command (underscores also allowed).
2023-02-02 17:31:13 +00:00
Philip Hazel
c13d54f658 Implement PCRE2_EXTRA_CASELESS_RESTRICT and related features 2023-01-29 16:46:24 +00:00
Philip Hazel
f7a7341726 Update ucd.c generation script for overlong initializer 2022-03-04 08:41:57 +00:00
Philip Hazel
419e3c68a3 Tidy comments 2022-01-14 16:05:30 +00:00
Zoltan Herczeg
e21345de97 Extend unicode boolean property bitset index to 12 bit (#81)
Co-authored-by: Zoltan Herczeg <hzmester@freemail.hu>
2022-01-14 15:51:03 +00:00
Philip Hazel
360a84e80b Update descriptive comments in UCD generation. 2022-01-12 17:38:48 +00:00
Zoltan Herczeg
061e57695a Merge scriptx and bidi fields (#78)
Co-authored-by: Zoltan Herczeg <hzmester@freemail.hu>
2022-01-12 17:00:12 +00:00
Philip Hazel
87571b5af3 Update documentation and comments for UCD generation 2022-01-10 16:26:41 +00:00
Philip Hazel
636569a957 Initial code for Boolean property support 2022-01-09 14:46:43 +00:00
Philip Hazel
d888d36013 Update script run code to work with new script extensions coding 2021-12-31 16:06:05 +00:00
Zoltan Herczeg
6614b281bc Implement script extension support in JIT. (#66)
Fix incorrect operator in GenerateUcd.py (modulo -> bitwise and)

Co-authored-by: Zoltan Herczeg <hzmester@freemail.hu>
2021-12-29 15:57:32 +00:00
Zoltan Herczeg
afa4756d19 Rework script extension handling (#64)
Co-authored-by: Zoltan Herczeg <hzmester@freemail.hu>
2021-12-29 09:35:22 +00:00
Philip Hazel
98e7d70bc6 Refactor Python scripts for generating Unicode property data 2021-12-26 17:49:58 +00:00