pcre2

mirror of https://github.com/PCRE2Project/pcre2.git synced 2025-10-17 23:57:23 +08:00

Author	SHA1	Message	Date
Nicholas Wilson	16d7edb567	Fix the lookahead after [\d or [[:posix] to skip whitespace (#544 )	2024-11-01 17:13:34 +00:00
Nicholas Wilson	ccb259f089	Use assert() for unreachable assertions (#543 )	2024-10-31 12:48:34 +01:00
Zoltan Herczeg	24f9d8df0b	Move character lists data before the byte code in a pattern (#540 ) This ensures aligned data store even when the range is repeated. Furthermore character lists are stored once regardless of repeats. Co-authored-by: Zoltan Herczeg <hzmester@freemail.hu>	2024-10-31 08:29:49 +01:00
Nicholas Wilson	fc38d9e784	Implement ALT_EXTENDED_CLASS flag (#523 ) * Move some existing character class code into pcre2_compile_class.c * Add a new flag PCRE2_ALT_EXTENDED_CLASS to change the behaviour of parsing [...] character classes, to emit new META codes, and new OP_ECLASS codes for nested character classes with operators * Document the behaviour relative to the UTS#18 standard * No JIT support; it falls back to the interpreter. DFA is supported.	2024-10-30 11:33:29 +01:00
Zoltan Herczeg	96f0653589	Rework character list matching in JIT (#535 ) Co-authored-by: Zoltan Herczeg <hzmester@freemail.hu>	2024-10-21 15:08:15 +01:00
Carlo Marcelo Arenas Belón	ef11bee735	pcre2_jit_compile: avoid potential wraparound if framesize <= 0 (#531 ) Change the minimum framesize value to match what the code can support, while at it, refactor some of the conditionals used so that extracting the framesize is more reliable (as the assert is polymorphic) and update other seemingly unrelated bits	2024-10-21 15:05:07 +01:00
Carlo Marcelo Arenas Belón	295f94cf90	perltest: add support for locale modifier (#534 ) Use a similar syntax to pcre2test to set a per pattern locale, and teach pcre2test to recognize the modifier as perl compatible. While at it, update tests and fix a recent regression that wasn't covered by them.	2024-10-21 14:54:34 +01:00
Zoltan Herczeg	998d2e06f8	Use less branches when matching xclass in jit (#530 ) Also move clist processing out of xclass Co-authored-by: Zoltan Herczeg <hzmester@freemail.hu>	2024-10-18 09:23:47 +01:00
Zoltan Herczeg	182461aba1	Improve character range matching with binary search (#524 ) Co-authored-by: Zoltan Herczeg <hzmester@freemail.hu>	2024-10-18 07:00:19 +02:00
Carlo Marcelo Arenas Belón	1e09555d69	perltest: add support for hex modifier (#529 ) * pcre2test: tighten \N{U+hh...} support When \N{U+hh...} was added it was meant to support all unicode characters that can be encoded by pcre2test and Perl, but its use outside what is officially considered valid can be confusing so print a warning for those cases. * perltest: add support for hex modifier The use of \xhh can be ambiguous when used together with the utf modifier, so allow for describing code points individually in the pattern using hex, with the same syntax that is already supported by pcre2test.	2024-10-17 16:42:31 +01:00
Carlo Marcelo Arenas Belón	03be4d2d7f	pcre2test: add support for \N{U+hh...} escapes in subject (#528 ) When providing escaped values in the subject, the syntax can be ambiguous, so add support for a new escape that is always meant to refer to a Unicode character and that is already supported by the library in utf mode. While at it, refactor the code to support octal escapes and fix bugs with overlong numbers, as well as to simplify the logic that decides if an escape is encoded as a code unit or as a Unicode character, that could require multiple code units.	2024-10-16 15:23:57 +01:00
Nicholas Wilson	b72cc97186	Add support for Turkish I casefolding (#521 ) New flag: PCRE2_EXTRA_TURKISH_CASING, and pre-pattern flag (TURKISH_CASING). Also added a pre-pattern flag (CASELESS_RESTRICT) for this existing flag.	2024-10-14 17:00:06 +01:00
Philip Hazel	c9bf83398a	Update perltest.sh to support specifying a locale.	2024-10-11 17:44:47 +01:00
Carlo Marcelo Arenas Belón	0d087cce82	pcre2grep: add $& as an alias for $0 (#519 ) Perl does not use $0 anymore to refer to the text of the matched subject and `pcre2_substitute()` was recently updated to also provide that value using the variable Perl prefers: `$&`. In a similar context, either as part of the formatted output from a match or during the processing of a callback, teach pcre2grep to also populate $&. While at it, update the ChangeLog with recent changes.	2024-10-09 09:08:27 +01:00
Nicholas Wilson	223941425f	Fix OP_REFI for caseless_restrict (#516 )	2024-10-08 13:37:35 +02:00
Zoltan Herczeg	440f5d1819	Fix case restricted range processing (#518 ) Co-authored-by: Zoltan Herczeg <hzmester@freemail.hu>	2024-10-08 12:04:44 +02:00
Patrick	dcbf9a0a13	Fixing typo in ManyConfigTests (#515 )	2024-10-07 06:39:34 +02:00
Philip Hazel	a94d1f4509	Fix minor omission in #513	2024-10-05 12:46:34 +01:00
Carlo Marcelo Arenas Belón	a3011c4378	complete update to Unicode 16 (#513 ) UCD 16 makes a lot of changes to scripts, so make sure that we have sufficient coverage by keeping the original autogenerated tests in addition. Complete the code updates for changes to ScriptExtensions.txt which is no longer sorted by script and allow for multiple unicode property test files, depending on Unicode version.	2024-10-05 12:39:48 +01:00
Philip Hazel	92ee469997	Get pcre2.h.generic up to date	2024-10-04 17:34:31 +01:00
Philip Hazel	60fd745ebc	Minor documentation updates	2024-10-04 17:21:33 +01:00
Nicholas Wilson	9503e68b7c	Add substitute case callout function (#512 ) * Add substitute case callout function * Fix foolish misunderstanding * Fix trivial build error * Fix non-Unicode tests	2024-10-04 16:57:58 +01:00
Zoltan Herczeg	07617a3845	Optimize character class processing (#511 ) Also fix empty class character matching Co-authored-by: Zoltan Herczeg <hzmester@freemail.hu>	2024-10-04 16:42:51 +01:00
Carl Weaver	75dd028862	Add PCRE2_EXTRA_NEVER_CALLOUT flag to pcre2.h.generic (#510 )	2024-10-04 08:53:41 +01:00
Zoltan Herczeg	c49e596481	Improve character range generation (#508 ) This patch improves the 8 bit and \p{Any} handling of the code. Co-authored-by: Zoltan Herczeg <hzmester@freemail.hu>	2024-10-03 17:11:41 +01:00
Carlo Marcelo Arenas Belón	8c84b4ba58	reimplement asserts with a safer approach (#490 ) The original asserts weren't very useful in debug mode as they were lacking information on where they were being triggered and were also unreliable and dangerous as they could result in important code being removed and trigger crashes (even in non debug mode). Instead of implementing one generic assert for both modes, build a more useful one for each one, so PCRE2_UNREACHABLE() could be also used in non debug builds to help with optimization. Reinstate all original assertions to use the new versions, which will have the side effect of fixing indentation issues introduced in the original, and include additional asserts that were provided as the original ones were being audited for safety. Note that during such audit the use of the original asserts might have been refactored so it also includes all those relevant code changes. While at it, update cmake and CI to help with testing as well as other documentation. Co-authored-by: Alex Dowad <alexinbeijing@gmail.com>	2024-10-02 15:08:18 +01:00
Carlo Marcelo Arenas Belón	c0d86f7d21	pcre2test: tighten \x{...} parsing in subject (#504 ) Even though it is documented that invalid escapes will be reported, the code would fall back in that case and result in a NUL being generated whenever an incomplete \x{ escape was being parsed. Refactor the code to report the error instead and fix the logic used for overlong numbers so that the truncation doesn't result in an unexpected value being used. There was an old (from PCRE 4.0) test that was affected but which is no longer relevant, because it could only be triggered with invalid UTF (which isn't supported), and that was therefore removed as a result. Additionally, it was found that the same syntax error was affecting perltest so correct that as well by reporting syntax errors in the subject lines. While at it update related documentation for Perl's compatibility.	2024-10-02 12:13:37 +01:00
Ayesh Karunaratne	43cd99b632	Unicode: Update to UCD 16 (#503 ) Updates to Unicode files to Unicode 16, adjusts tests, and the scripts used to parse UCD, to adapt to minor formatting differences in UCD 16. The `GenerateTest26.py` and `GenerateCommon.py` had a regexp to extract properties from the `ScriptExtensions.txt` file. Previously, all property lines had one space after space-separated list of scripts. In UCD-16, this list is adjusted with right-padding, which throws off the parser. This commit adjusts the regexps to ignore padding spaces.	2024-10-02 12:06:11 +01:00
Nicholas Wilson	32f03ad588	Add option to disable callouts (#499 ) * Add option to disable callouts * Fix pcre2grep issue, and docs * Add pcre2test docs	2024-10-02 12:00:02 +01:00
Zoltan Herczeg	7c215fa51b	Add property data to classbits (#506 ) Co-authored-by: Zoltan Herczeg <hzmester@freemail.hu>	2024-09-30 16:50:47 +01:00
Carlo Marcelo Arenas Belón	29494c1dfd	ci: add perl DEV job (#505 ) Validates Perl compatibility	2024-09-30 16:05:45 +01:00
Nicholas Wilson	d704ee40c5	Improve error message for \N{name} in character classes (#502 )	2024-09-27 16:31:01 +01:00
Nicholas Wilson	311eef6e7a	Fix two small issues with substitution nulls (#501 )	2024-09-27 16:03:11 +01:00
Nicholas Wilson	a79dc73f86	Fix handling of \g<0> in pcre2_substitute (#498 )	2024-09-27 15:44:37 +01:00
Zoltan Herczeg	46668dd9ba	Simplify class range processing (#496 ) Co-authored-by: Zoltan Herczeg <hzmester@freemail.hu>	2024-09-26 17:45:17 +01:00
Carlo Marcelo Arenas Belón	6915395e90	ci: add a full integer sanitizer clang job (#495 ) JIT has several good uses of unsigned integer wraparounds, that clang's UBSAN doesn't like (which is controversial, because it is clearly not undefined behaviour), but since it is usually good to know when they happen by accident it makes sense to make sure the rest of PCRE2 codebase benefits from checking it. While at it, upgrade the version of the base image to use a newer OS as a canary from when the rest of the jobs upgrade themselves and be a little more strict to catch other constructs that are not welcomed in our codebase.	2024-09-25 16:53:00 +01:00
Philip Hazel	d8718cd728	Get rid of shadow variable warning	2024-09-24 16:38:00 +01:00
Zoltan Herczeg	30d48f2342	Handle spaces by the new range system (#494 ) Co-authored-by: Zoltan Herczeg <hzmester@freemail.hu>	2024-09-24 16:34:38 +01:00
Carlo Marcelo Arenas Belón	6f5a4d9fd0	doc: fix silly confusion about re "strict" (#493 ) in Perl, `use strict` has no effect on this feature but instead the "strict" mode referenced applies to `use re 'strict'`	2024-09-24 16:28:52 +01:00
Philip Hazel	012ab39bd8	Correct substitution documentation	2024-09-24 09:23:40 +01:00
Carlo Marcelo Arenas Belón	c715450695	pcre2_convert: fix return value again and refactor for clarity (#489 ) Since `4f6c43d` (Add assertion macros, use new PCRE2_UNREACHABLE assertion at unreachable points in code (#446), 2024-08-28) and then again after `04dc664` (Implement PCRE2_UNREACHABLE assertion for MS Visual C++ (#465), 2024-09-10), this API could return random values on failure. Remove assertion, until it could be added back in a way that wouldn't trigger a crash in non debug builds or result in the function returning without an API expected value.	2024-09-23 17:00:59 +01:00
Philip Hazel	82a54640ea	Add more $ interpretation to substitutions	2024-09-23 16:59:22 +01:00
Nicholas Wilson	92f5f621de	Substitute support for $& $` $' $_ and \b \v (#491 )	2024-09-23 16:23:37 +01:00
Zoltan Herczeg	a7c6109178	Implement pre-processed character range list caching (#488 ) Co-authored-by: Zoltan Herczeg <hzmester@freemail.hu>	2024-09-23 16:08:48 +01:00
Carlo Marcelo Arenas Belón	bc367f1880	pcre2_compile: avoid 1 byte buffer overread parsing VERBs (#487 ) As reported recently by `ef218fb` (Guard against out-of-bounds memory access when parsing LIMIT_HEAP et al (#463), 2024-09-07), a malformed pattern could result in reading 1 byte past its end. Fix a similar issue that affects all VERBs and add test cases to ensure the original bug and all its siblings are no longer an issue. While at it fix the wording of the related documentation.	2024-09-22 09:49:03 +01:00
Carlo Marcelo Arenas Belón	328b80010c	doc: include summary of directives in pcre2_set_optimize.3 (#486 ) Make the documentation of the new API more useful at a first glance by providing the list of values that can be used while keeping all details in pcre2api.3, and just like it is done in other similar pages. While at it reorder the entries for the directives in pcre2api.3 so it is more natural and to match the one used here.	2024-09-22 09:45:36 +01:00
Philip Hazel	cd4c0e3fc1	Improve error messages for NO_BS0 and PYTHON_OCTAL errors	2024-09-21 15:01:44 +01:00
Philip Hazel	1d73e1995e	Get rid of declaration/code warning	2024-09-21 14:31:14 +01:00
Zoltan Herczeg	d3095b4761	Improve character classes (#474 ) Co-authored-by: Zoltan Herczeg <hzmester@freemail.hu>	2024-09-21 14:26:49 +01:00
Carlo Marcelo Arenas Belón	8a109eab14	Do not use __builtin_unreachable() in PCRE2_ASSERT() (#485 ) __builtin_unreachable() implementation is not defined and has been known to not trigger failures under some compilers. Default instead to using assert(), which has the added benefit of printing a descriptive message and it is also likely more portable as it is part of ANSI C. While at it, really allow configuring builtins with cmake.	2024-09-21 14:19:25 +01:00

2095 Commits