15 Commits

Author SHA1 Message Date
Ayesh Karunaratne
43cd99b632 Unicode: Update to UCD 16 (#503)
Updates the Unicode files to Unicode 16, and adjusts the tests and the
scripts used to parse UCD to adapt to minor formatting differences
in UCD 16.

`GenerateTest26.py` and `GenerateCommon.py` each had a regexp to
extract properties from the `ScriptExtensions.txt` file. Previously,
all property lines had one space after the space-separated list of scripts.
In UCD 16, this list is right-padded, which throws off the parser.

This commit adjusts the regexps to ignore padding spaces.
2024-10-02 12:06:11 +01:00
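
The padding problem described in 43cd99b632 can be illustrated with a small sketch. This is not the regexp from `GenerateCommon.py` or `GenerateTest26.py`; the pattern, the function name `parse_script_extensions_line`, and the sample line layout are assumptions made for illustration. The idea is to match any amount of whitespace between the script list and the `#` comment instead of exactly one space.

```python
import re

# Hypothetical pattern (not the one used in the PCRE2 maintenance scripts).
LINE_RE = re.compile(
    r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?"  # code point or range
    r"\s*;\s*"                                    # field separator
    r"(\w+(?: \w+)*)"                             # space-separated script codes
    r"\s*#"                                       # any padding before the comment
)

def parse_script_extensions_line(line):
    """Return (first, last, [scripts]) or None for lines that do not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    first = int(m.group(1), 16)
    last = int(m.group(2), 16) if m.group(2) else first
    return first, last, m.group(3).split(" ")
```

Because the trailing `\s*` absorbs the padding, a line with one space before `#` and a line with many spaces before `#` yield the same result.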
Carlo Marcelo Arenas Belón
af0839f911 maint: honor @missing in DerivedBidiClass and report non values (#201)
Starting with Unicode 15, the provided DerivedBidiClass data file
reports different default values to use for unassigned characters
in different groups.

Process the additional hints for that specific file, and allow
overriding the values later if more specific.

Since that was previously forbidden, change get_other_case() to
report when no valid value could be provided and allow skipping
conflicting rule lines that required that restriction.

While at it, allow using the long identifiers in `ucptest` with
the `find bidi` command (underscores also allowed).
2023-02-02 17:31:13 +00:00
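
The `@missing` handling described in af0839f911 can be sketched as follows. This is an illustration only, not the code from the maintenance scripts: the function names `read_missing_rules` and `default_bidi_class` are hypothetical, and the sketch assumes that hint lines have the form `# @missing: XXXX..YYYY; Value` and that a later hint overrides an earlier one for the code points it covers.

```python
import re

# Assumed layout of a hint line: "# @missing: 0000..10FFFF; Left_To_Right"
MISSING_RE = re.compile(
    r"^#\s*@missing:\s*([0-9A-F]{4,6})\.\.([0-9A-F]{4,6})\s*;\s*(\w+)"
)

def read_missing_rules(lines):
    """Collect (first, last, value) tuples in file order."""
    rules = []
    for line in lines:
        m = MISSING_RE.match(line)
        if m:
            rules.append((int(m.group(1), 16), int(m.group(2), 16), m.group(3)))
    return rules

def default_bidi_class(cp, rules):
    """Return the default for cp, or None if no hint covers it."""
    value = None
    for first, last, name in rules:
        if first <= cp <= last:
            value = name  # a later, more specific hint overrides an earlier one
    return value
```

Returning `None` when no hint applies lets the caller report the missing value instead of silently substituting a default.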
Carlo Marcelo Arenas Belón
40626f0cb6 upgrade to Unicode 15 (#200)
Reverting several reserved characters that were removed from the
previous release, and that are only referenced as "@missing" in
DerivedBidiClass.txt
2023-02-01 15:28:21 +00:00
Philip Hazel
636569a957 Initial code for Boolean property support 2022-01-09 14:46:43 +00:00
Zoltan Herczeg
f90542a209 Improve unicode property abbreviation support (#74)
* Improve unicode property abbreviation support

* Auto-generate script names

Co-authored-by: Zoltan Herczeg <hzmester@freemail.hu>
2022-01-07 10:01:18 +00:00
Philip Hazel
823d4ac956 Add bidi class and control information to Unicode property data 2021-12-05 18:00:10 +00:00
Carlo Marcelo Arenas Belón
f5e4e10042 Update to Unicode 14.0.0 (#29) 2021-10-29 14:44:17 +01:00
Philip.Hazel
c472f3f91a Update to Unicode 13.0.0. 2020-03-25 17:18:33 +00:00
Philip.Hazel
aff5a78056 Upgrade to Unicode 12.1.0 2019-07-29 15:32:36 +00:00
Philip.Hazel
04ba4bce0f Unicode properties data records extended to 12 bytes to include a
ScriptExtensions property.
2018-10-06 17:39:52 +00:00
Philip.Hazel
937617f343 Update to Unicode 11.0.0 2018-07-07 16:10:29 +00:00
Philip.Hazel
41bb787fb3 Update to Unicode 10.0.0 and add callout_no_where to pcre2test to aid testing. 2017-07-02 16:32:01 +00:00
Philip.Hazel
d702527628 Update Unicode tables to 8.0.0. 2015-07-17 15:44:51 +00:00
Philip.Hazel
bf2bc83ed8 Update for Unicode 7.0.0 2014-06-20 12:40:32 +00:00
Philip.Hazel
225992aa3a Further work on pcre2test (can now display compiled code). 2014-05-13 11:20:03 +00:00