Commit

Update to Unicode 15.0.0 (#92)
* Update ucd.sh

* Regenerate files

* Update references to Unicode version

* Bump packages versions

* Restrict comparison to base to compatible GHC

* Fix bounds of unicode-data

* Add missing packages to stack.yaml
wismill committed Oct 11, 2022
1 parent bf8bb53 commit 5c5013f
Showing 50 changed files with 378 additions and 222 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -12,7 +12,7 @@ This repository provides packages to use the

The Haskell data structures are generated programmatically from the UCD files.
The latest Unicode version supported by these libraries is
[`14.0.0`](https://www.unicode.org/versions/Unicode14.0.0/).
[`15.0.0`](https://www.unicode.org/versions/Unicode15.0.0/).

### `unicode-data`

@@ -22,6 +22,7 @@ does not match the version of this package.
| 9.0.[1-2] | 4.15.0 | 12.1 |
| 9.2.[1-4] | 4.16.0 | 14.0 |
| 9.4.[1-2] | 4.17.0 | 14.0 |
| 9.6.1 | 4.18.0 | 15.0 |
+-------------+----------------+-----------------+
-}
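The rows visible in this hunk appear to map GHC releases to the `base` version they ship and to the Unicode version of `base`'s character data (the column headers are outside the hunk, so this reading is an assumption). The `9.0.[1-2]`-style ranges happen to be valid POSIX shell glob patterns, so a minimal sketch can reuse them verbatim in a `case` statement. `ghc_unicode_version` is a hypothetical helper and encodes only the rows shown here.

```shell
#!/bin/sh
# Hypothetical helper: map a GHC version to the Unicode version of its base
# library, using only the table rows visible in this hunk. The "[1-2]"
# ranges from the table are also valid shell glob patterns, so they are
# reused unchanged; "." is a literal character in case patterns.
ghc_unicode_version() {
  case "$1" in
    9.0.[1-2]) echo 12.1 ;;
    9.2.[1-4]) echo 14.0 ;;
    9.4.[1-2]) echo 14.0 ;;
    9.6.1)     echo 15.0 ;;
    *)         echo unknown ;;  # rows outside this hunk are not encoded
  esac
}

ghc_unicode_version 9.4.2   # prints 14.0
```

With GHC on `PATH`, `ghc_unicode_version "$(ghc --numeric-version)"` would report the expected Unicode version for the active compiler, or `unknown` for releases these rows do not cover.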

6 changes: 3 additions & 3 deletions experimental/unicode-data-text/unicode-data-text.cabal
@@ -68,7 +68,7 @@ library
build-depends:
base >= 4.7 && < 4.18,
text >= 1.2.4 && < 2.1,
unicode-data >= 0.3 && < 0.4
unicode-data >= 0.3 && < 0.5

test-suite test
import: default-extensions, compile-options
@@ -85,7 +85,7 @@ test-suite test
unicode-data-text
build-tool-depends:
hspec-discover:hspec-discover >= 2.0 && < 2.11
if impl(ghc >= 9.2.1)
if impl(ghc >= 9.5.1)
cpp-options: -DCOMPATIBLE_GHC_UNICODE
default-language: Haskell2010

@@ -100,6 +100,6 @@ benchmark bench
tasty-bench >= 0.2.5 && < 0.4,
tasty >= 1.4.1,
text >= 1.2.4 && < 2.1,
unicode-data >= 0.3 && < 0.4,
unicode-data >= 0.3 && < 0.5,
unicode-data-text
ghc-options: -O2 -fdicts-strict -rtsopts
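Every change in this file is a version comparison: the `unicode-data` upper bound widens from `< 0.4` to `< 0.5` so that 0.4.x is accepted, and the `-DCOMPATIBLE_GHC_UNICODE` condition moves from `impl(ghc >= 9.2.1)` to `impl(ghc >= 9.5.1)`, which, judging by the commit message, restricts the comparison against `base` to compatible GHC versions. The sketch below approximates both checks in shell, assuming a `sort` that supports `-V` (GNU coreutils). Cabal compares versions component by component as lists of integers; `sort -V` agrees for plain dotted numeric versions like these, but treat it as an approximation. `version_ge` and `in_range` are hypothetical helpers.

```shell
#!/bin/sh
# Hypothetical helpers approximating Cabal version checks with `sort -V`.
# Assumes plain dotted numeric versions such as 0.4.0 or 9.6.1.

# version_ge A B: succeed if A >= B (B sorts first, or A equals B)
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# in_range V LO HI: succeed if LO <= V < HI, like "pkg >= LO && < HI"
in_range() {
  version_ge "$1" "$2" && ! version_ge "$1" "$3"
}

# The widened bound "unicode-data >= 0.3 && < 0.5" admits 0.4.x,
# which the old "< 0.4" bound rejected:
in_range 0.4.0 0.3 0.5 && echo "0.4.0 accepted"
in_range 0.4.0 0.3 0.4 || echo "0.4.0 rejected by the old < 0.4 bound"

# The "if impl(ghc >= 9.5.1)" condition, evaluated for two compilers:
for v in 9.4.2 9.6.1; do
  if version_ge "$v" 9.5.1; then
    echo "$v: -DCOMPATIBLE_GHC_UNICODE"
  else
    echo "$v: flag not set"
  fi
done
```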
3 changes: 3 additions & 0 deletions stack.yaml
@@ -2,6 +2,9 @@ resolver: lts-18.18
packages:
- './unicode-data'
- './unicode-data-names'
- './unicode-data-scripts'
- './unicode-data-security'
- './experimental/unicode-data-text'
extra-deps:
- streamly-0.8.0
flags:
38 changes: 19 additions & 19 deletions ucd.sh
@@ -5,7 +5,7 @@
# we used to generate them earlier are exactly the same as the ones we are
# downloading. To ensure that verfication of the checksum is necessary.

VERSION=14.0.0
VERSION=15.0.0

# When downloading fresh new version comment this out
VERIFY_CHECKSUM=y
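The comment above implies a workflow for a new Unicode release: bump `VERSION`, temporarily disable `VERIFY_CHECKSUM`, download the files, then record fresh checksums in `UCD_FILES` and `SECURITY_FILES`. The recorded values are 64 hex digits, which is consistent with SHA-256. A minimal sketch of the recording step follows, assuming the files are already downloaded into a local directory and that GNU coreutils' `sha256sum` is available; `make_entries` is hypothetical and not part of `ucd.sh`.

```shell
#!/bin/sh
# Hypothetical helper: print "Filename:checksum \" lines, the layout used by
# UCD_FILES and SECURITY_FILES, for files already downloaded into DIR.
# Assumes SHA-256 checksums computed with GNU coreutils' sha256sum.
make_entries() {
  dir=$1
  shift
  for file in "$@"; do
    # sha256sum prints "<hash>  <path>"; keep only the hash
    sum=$(sha256sum "$dir/$file" | cut -d ' ' -f 1)
    printf '%s:%s \\\n' "$file" "$sum"
  done
}

# Example (directory and file list are illustrative):
# make_entries ucd Blocks.txt CaseFolding.txt extracted/DerivedName.txt
```

Every output line keeps the trailing ` \` continuation, so when pasting into `ucd.sh` the last entry's continuation has to be replaced by the closing `"`.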
@@ -14,29 +14,29 @@ VERIFY_CHECKSUM=y
UCD_URL="https://www.unicode.org/Public/$VERSION/ucd"
# Filename:checksum
UCD_FILES="\
Blocks.txt:598870dddef7b34b5a972916528c456aff2765b79cd4f9647fb58ceb767e7f17 \
CaseFolding.txt:a566cd48687b2cd897e02501118b2413c14ae86d318f9abbbba97feb84189f0f \
DerivedCoreProperties.txt:e3eddd7d469cd1b0feed7528defad1a1cc7c6a9ceb0ae4446a6d10921ed2e7bc \
DerivedNormalizationProps.txt:b2c444c20730b097787fdf50bd7d6dd3fc5256ab8084f5b35b11c8776eca674c \
NameAliases.txt:14b3b677d33f95c51423dce6eef4a6a28b4b160451ecedee4b91edb6745cf4a3 \
PropertyValueAliases.txt:eb755757e20b72b330b2948df3cf2ff7adb0e31bb060140dc09dafb132ace2cd \
PropList.txt:6bddfdb850417a5bee6deff19290fd1b138589909afb50f5a049f343bf2c6722 \
Scripts.txt:52db475c4ec445e73b0b16915448c357614946ad7062843c563e00d7535c6510 \
ScriptExtensions.txt:d37eedf63ff9c48bac863d5f76862373d6cf5269fd21253d499e2430d638c01d \
SpecialCasing.txt:c667b45908fd269af25fd55d2fc5bbc157fb1b77675936e25c513ce32e080334 \
UnicodeData.txt:36018e68657fdcb3485f636630ffe8c8532e01c977703d2803f5b89d6c5feafb \
extracted/DerivedCombiningClass.txt:12b0c3af9b600b49488d66545a3e7844ea980809627201bf9afeebe1c9f16f4e \
extracted/DerivedName.txt:fef3e11514ba152f0d38a09f8018c03a825f846dbb912334c1e5c9fb29392a02 \
extracted/DerivedNumericValues.txt:11075771b112e8e7ccf6ffa637c4c91eadc3ef3db0517b24e605df8fd3624239"
Blocks.txt:529dc5d0f6386d52f2f56e004bbfab48ce2d587eea9d38ba546c4052491bd820 \
CaseFolding.txt:cdd49e55eae3bbf1f0a3f6580c974a0263cb86a6a08daa10fbf705b4808a56f7 \
DerivedCoreProperties.txt:d367290bc0867e6b484c68370530bdd1a08b6b32404601b8c7accaf83e05628d \
DerivedNormalizationProps.txt:d5687a48c95c7d6e1ec59cb29c0f2e8b052018eb069a4371b7368d0561e12a29 \
NameAliases.txt:3e39509e8fae3e5d50ba73759d0b97194501d14a9c63107a6372a46b38be18e8 \
PropertyValueAliases.txt:13a7666843abea5c6b7eb8c057c57ab9bb2ba96cfc936e204224dd67d71cafad \
PropList.txt:e05c0a2811d113dae4abd832884199a3ea8d187ee1b872d8240a788a96540bfd \
Scripts.txt:cca85d830f46aece2e7c1459ef1249993dca8f2e46d51e869255be140d7ea4b0 \
ScriptExtensions.txt:7e07313d9d0bee42220c476b64485995130ae30917bbcf7780b602d677d7e33f \
SpecialCasing.txt:78b29c64b5840d25c11a9f31b665ee551b8a499eca6c70d770fcad7dd710f494 \
UnicodeData.txt:806e9aed65037197f1ec85e12be6e8cd870fc5608b4de0fffd990f689f376a73 \
extracted/DerivedCombiningClass.txt:ca54f6360cd288ad92113415bf1f77749015abe11cbd6798d21f7fa81f04205d \
extracted/DerivedName.txt:f76288153e20de185a40f7ee6e0e365f3c6c80e9e3019b5aa0afc8ac2c1b15f2 \
extracted/DerivedNumericValues.txt:6bd30f385f3baf3ab5d5308c111a81de87bea5f494ba0ba69e8ab45263b8c34d"

# Security files (https://www.unicode.org/Public/security/$VERSION/$file)
SECURITY_URL="https://www.unicode.org/Public/security/$VERSION"
# Filename:checksum
SECURITY_FILES="\
IdentifierStatus.txt:3f3f368fccdb37f350ecedc20b37fa71ab31c04e847884c77780d34283539f73 \
IdentifierType.txt:45a150c23961b58d7784704af6c4daccd6517d97b6489e53d13bbdbf9e4f065f \
confusables.txt:f901938af166c3afa471bd10c224b0979cd024340f290649e16b29f779d48bfe \
intentional.txt:42243c12a2e20546e836576e3091a5a5db2c1fc506899b1d8b56f7b6eab77cb3"
IdentifierStatus.txt:fd5c5e510914a2018e092bc51ea653bd2bfcf7daa116a346f09179a0f74704b0 \
IdentifierType.txt:71e95d5811999776a39c33a9149e5bf3c3311217a36b89005c678f34f08debc0 \
confusables.txt:2b10130885c3370b101c52d7baedc452ab7f0e257b86c1e52ee657ecfc29ce64 \
intentional.txt:4550bcc406b5ce3b1a40ff857a3f8b703ea0c868c35f2f7c93d86bfb733215f9"

# Download the files
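With `VERIFY_CHECKSUM=y`, the downloaded files are meant to be checked against these lists. Inside double quotes the shell removes each backslash-newline pair, so `UCD_FILES` and `SECURITY_FILES` each hold a single space-separated list of `Filename:checksum` entries. The download and verification code of `ucd.sh` is not shown in this diff, so what follows is a sketch of such a check, not the script's own implementation; `verify_entries` is hypothetical and assumes SHA-256 with GNU `sha256sum`.

```shell
#!/bin/sh
# Hypothetical helper: check files in DIR against a whitespace-separated list
# of "Filename:checksum" entries (the layout of UCD_FILES and SECURITY_FILES).
# Assumes SHA-256 checksums and GNU coreutils' sha256sum.
verify_entries() {
  dir=$1
  status=0
  for entry in $2; do          # unquoted on purpose: split on whitespace
    file=${entry%%:*}          # text before the first ':'
    expected=${entry#*:}       # text after the first ':'
    # A missing file yields an empty hash and therefore a mismatch
    actual=$(sha256sum "$dir/$file" 2>/dev/null | cut -d ' ' -f 1)
    if [ "$actual" = "$expected" ]; then
      echo "OK   $file"
    else
      echo "FAIL $file" >&2
      status=1
    fi
  done
  return "$status"
}

# Example (directory name is illustrative):
# verify_entries ucd "$UCD_FILES" || exit 1
```

Leaving `$2` unquoted is safe here only because the entries contain no whitespace or glob characters; paths such as `extracted/DerivedName.txt` contain `/` but no `:`, so splitting on the first `:` is unambiguous.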

4 changes: 4 additions & 0 deletions unicode-data-names/Changelog.md
@@ -1,5 +1,9 @@
# Changelog

## 0.2.0 (September 2022)

- Update to [Unicode 15.0.0](https://www.unicode.org/versions/Unicode15.0.0/).

## 0.1.0 (June 2022)

- Initial release
2 changes: 1 addition & 1 deletion unicode-data-names/README.md
@@ -7,7 +7,7 @@ character names and aliases from the
The Haskell data structures are generated programmatically from the
Unicode character database (UCD) files. The latest Unicode version
supported by this library is
[`14.0.0`](https://www.unicode.org/versions/Unicode14.0.0/).
[`15.0.0`](https://www.unicode.org/versions/Unicode15.0.0/).

Please see the
[Haddock documentation](https://hackage.haskell.org/package/unicode-data-names)
4 changes: 2 additions & 2 deletions unicode-data-names/lib/Unicode/Char/General/Names.hs
@@ -6,7 +6,7 @@
-- Stability : experimental
--
-- Unicode character names and name aliases.
-- See Unicode standard 14.0.0, section 4.8.
-- See Unicode standard 15.0.0, section 4.8.
--
-- @since 0.1.0

@@ -84,7 +84,7 @@ nameAliasesWithTypes
= fmap (fmap (fmap unpack))
. NameAliases.nameAliasesWithTypes

-- Note: names are ASCII. See Unicode Standard 14.0.0, section 4.8.
-- Note: names are ASCII. See Unicode Standard 15.0.0, section 4.8.
{-# INLINE unpack #-}
unpack :: CString -> String
unpack = unsafePerformIO . peekCAString

@@ -1,4 +1,4 @@
-- autogenerated from https://www.unicode.org/Public/14.0.0/ucd/NameAliases.txt
-- autogenerated from https://www.unicode.org/Public/15.0.0/ucd/NameAliases.txt
-- |
-- Module : Unicode.Internal.Char.UnicodeData.NameAliases
-- Copyright : (c) 2022 Composewell Technologies and Contributors
@@ -17,7 +17,7 @@ import Data.Maybe (fromMaybe)
import Foreign.C.String (CString)
import GHC.Exts (Ptr(..))

-- | Type of name alias. See Unicode Standard 14.0.0, section 4.8.
-- | Type of name alias. See Unicode Standard 15.0.0, section 4.8.
--
-- @since 0.1.0
data NameAliasType
@@ -86,7 +86,7 @@ nameAliasesWithTypes = \case
'\x0016' -> [(Control,[Ptr "SYNCHRONOUS IDLE\0"#]),(Abbreviation,[Ptr "SYN\0"#])]
'\x0017' -> [(Control,[Ptr "END OF TRANSMISSION BLOCK\0"#]),(Abbreviation,[Ptr "ETB\0"#])]
'\x0018' -> [(Control,[Ptr "CANCEL\0"#]),(Abbreviation,[Ptr "CAN\0"#])]
'\x0019' -> [(Control,[Ptr "END OF MEDIUM\0"#]),(Abbreviation,[Ptr "EOM\0"#])]
'\x0019' -> [(Control,[Ptr "END OF MEDIUM\0"#]),(Abbreviation,[Ptr "EOM\0"#,Ptr "EM\0"#])]
'\x001a' -> [(Control,[Ptr "SUBSTITUTE\0"#]),(Abbreviation,[Ptr "SUB\0"#])]
'\x001b' -> [(Control,[Ptr "ESCAPE\0"#]),(Abbreviation,[Ptr "ESC\0"#])]
'\x001c' -> [(Control,[Ptr "INFORMATION SEPARATOR FOUR\0"#,Ptr "FILE SEPARATOR\0"#]),(Abbreviation,[Ptr "FS\0"#])]
@@ -132,6 +132,7 @@ nameAliasesWithTypes = \case
'\x01a2' -> [(Correction,[Ptr "LATIN CAPITAL LETTER GHA\0"#])]
'\x01a3' -> [(Correction,[Ptr "LATIN SMALL LETTER GHA\0"#])]
'\x034f' -> [(Abbreviation,[Ptr "CGJ\0"#])]
'\x0616' -> [(Correction,[Ptr "ARABIC SMALL HIGH LIGATURE ALEF WITH YEH BARREE\0"#])]
'\x061c' -> [(Abbreviation,[Ptr "ALM\0"#])]
'\x0709' -> [(Correction,[Ptr "SYRIAC SUBLINEAR COLON SKEWED LEFT\0"#])]
'\x0cde' -> [(Correction,[Ptr "KANNADA LETTER LLLA\0"#])]
@@ -149,6 +150,7 @@ nameAliasesWithTypes = \case
'\x180d' -> [(Abbreviation,[Ptr "FVS3\0"#])]
'\x180e' -> [(Abbreviation,[Ptr "MVS\0"#])]
'\x180f' -> [(Abbreviation,[Ptr "FVS4\0"#])]
'\x1bbd' -> [(Correction,[Ptr "SUNDANESE LETTER ARCHAIC I\0"#])]
'\x200b' -> [(Abbreviation,[Ptr "ZWSP\0"#])]
'\x200c' -> [(Abbreviation,[Ptr "ZWNJ\0"#])]
'\x200d' -> [(Abbreviation,[Ptr "ZWJ\0"#])]
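The regenerated `NameAliases` module picks up the 15.0.0 data: U+0019 gains a second abbreviation, `EM`, and U+0616 and U+1BBD gain correction aliases. In `NameAliases.txt`, each data line has three semicolon-separated fields: code point, alias, and type (lowercase names such as `control`, `abbreviation` and `correction`). A quick way to cross-check a generated entry against a downloaded copy of the file is sketched below; `aliases_for` is hypothetical and is not taken from the repository's generator.

```shell
#!/bin/sh
# Hypothetical helper: print "type: alias" for each alias of code point CP
# (hex, written as in the file, e.g. 0019 or 1BBD) in a NameAliases.txt copy.
# Assumes data lines of the form "code;alias;type" and "#"-prefixed comments.
aliases_for() {
  awk -F ';' -v cp="$1" '
    /^#/ || NF < 3 { next }             # skip comments and blank lines
    # Compare as strings: a code point such as 1E00 could otherwise be
    # read by awk as the number 1e00 and match the wrong entry.
    ($1 "") == cp { print $3 ": " $2 }  # aliases are printed in file order
  ' "$2"
}

# Example (file location is illustrative):
# aliases_for 0019 NameAliases.txt
```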
6 changes: 3 additions & 3 deletions unicode-data-names/test/Unicode/Char/General/NamesSpec.hs
@@ -28,7 +28,7 @@ spec = do
name '\x1f41d' `shouldBe` Just "HONEYBEE"
-- Name generated using pattern (example from UCD file)
name '\x2f89f' `shouldBe` Just "CJK COMPATIBILITY IDEOGRAPH-2F89F"
-- Last name defined, as of Unicode 14.0.0
-- Last name defined, as of Unicode 15.0.0
name '\xe01ef' `shouldBe` Just "VARIATION SELECTOR-256"
name maxBound `shouldBe` Nothing
it "correctedName: Test some characters" do
@@ -48,7 +48,7 @@ spec = do
correctedName '\x1f41d' `shouldBe` Just "HONEYBEE"
-- Name generated using pattern (example from UCD file)
correctedName '\x2f89f' `shouldBe` Just "CJK COMPATIBILITY IDEOGRAPH-2F89F"
-- Last name defined, as of Unicode 14.0.0
-- Last name defined, as of Unicode 15.0.0
correctedName '\xe01ef' `shouldBe` Just "VARIATION SELECTOR-256"
correctedName maxBound `shouldBe` Nothing
it "nameOrAlias: Test some characters" do
@@ -68,7 +68,7 @@ spec = do
nameOrAlias '\x1f41d' `shouldBe` Just "HONEYBEE"
-- Name generated using pattern (example from UCD file)
nameOrAlias '\x2f89f' `shouldBe` Just "CJK COMPATIBILITY IDEOGRAPH-2F89F"
-- Last name defined, as of Unicode 14.0.0
-- Last name defined, as of Unicode 15.0.0
nameOrAlias '\xe01ef' `shouldBe` Just "VARIATION SELECTOR-256"
nameOrAlias maxBound `shouldBe` Nothing
it "Every defined character has at least a name or an alias" do
6 changes: 3 additions & 3 deletions unicode-data-names/unicode-data-names.cabal
@@ -1,6 +1,6 @@
cabal-version: 2.2
name: unicode-data-names
version: 0.1.0
version: 0.2.0
synopsis: Unicode characters names and aliases
description:
@unicode-data-names@ provides Haskell APIs to access the Unicode
@@ -9,7 +9,7 @@ description:
.
The Haskell data structures are generated programmatically from the UCD files.
The latest Unicode version supported by this library is
@<https://www.unicode.org/versions/Unicode14.0.0/ 14.0.0>@.
@<https://www.unicode.org/versions/Unicode15.0.0/ 15.0.0>@.
homepage: http://github.com/composewell/unicode-data
bug-reports: https://github.com/composewell/unicode-data/issues
license: Apache-2.0
@@ -96,7 +96,7 @@ test-suite test
build-depends:
base >= 4.7 && < 4.18
, hspec >= 2.0 && < 2.11
, unicode-data
, unicode-data >= 0.4 && < 0.5
, unicode-data-names
build-tool-depends:
hspec-discover:hspec-discover >= 2.0 && < 2.11
4 changes: 4 additions & 0 deletions unicode-data-scripts/Changelog.md
@@ -1,5 +1,9 @@
# Changelog

## 0.2.0 (September 2022)

- Update to [Unicode 15.0.0](https://www.unicode.org/versions/Unicode15.0.0/).

## 0.1.0 (September 2022)

Initial release
2 changes: 1 addition & 1 deletion unicode-data-scripts/README.md
@@ -7,7 +7,7 @@ character [scripts](https://www.unicode.org/reports/tr24/) from the
The Haskell data structures are generated programmatically from the
Unicode character database (UCD) files. The latest Unicode version
supported by this library is
[`14.0.0`](https://www.unicode.org/versions/Unicode14.0.0/).
[`15.0.0`](https://www.unicode.org/versions/Unicode15.0.0/).

Please see the
[Haddock documentation](https://hackage.haskell.org/package/unicode-data-scripts)