Replies: 3 comments 2 replies
-
I don't know of such a plan, but our plans depend on prospective clients, so if people want something like that we could potentially add it (or accept patches for it). It depends, though; I'm not familiar enough with this API to be sure.
-
I know XPath is not in any way the real specification, but it's the particular perspective I'm coming from. The XPath standard has this section on substring matching in light of collation:
https://www.w3.org/TR/xpath-functions-31/#substring.functions
It talks about "collation units", which are, it says, the same as Unicode "collation elements", with a reference to "Unicode Technical Standard #10: Unicode Collation Algorithm". So it looks like those algorithms could be supported if we had access to "collation elements". XPath implies that not all collations support this.
I looked through the icu4x source code and indeed the collator does seem to implement this concept, but it looks like neither
As a next step I should examine the older ICU implementations to get a better idea of what such an API should look like.
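To illustrate what access to "collation elements" could buy, here is a rough sketch in a toy model. Everything in it is an assumption made for illustration: `ToyElement`, `toy_elements` and `find_by_elements` are hypothetical names, not ICU4X or ICU APIs, and the "collation" is just case folding via `to_lowercase` with U+00AD (soft hyphen) treated as fully ignorable. The idea is to turn both strings into element sequences that remember their source byte ranges, match on the weights, and report the range in the original text.

```rust
/// One toy "collation element": a weight plus the byte range of the
/// source character it came from. Hypothetical type, not an ICU4X one.
#[derive(Debug, Clone, PartialEq)]
struct ToyElement {
    weight: char,
    start: usize,
    end: usize,
}

/// Toy element mapping (an assumption, not a real collation):
/// lowercases each character and emits nothing for U+00AD.
fn toy_elements(s: &str) -> Vec<ToyElement> {
    let mut out = Vec::new();
    for (start, c) in s.char_indices() {
        if c == '\u{AD}' {
            continue; // ignorable: contributes no element
        }
        let end = start + c.len_utf8();
        for weight in c.to_lowercase() {
            out.push(ToyElement { weight, start, end });
        }
    }
    out
}

/// Finds the first run of haystack elements whose weights equal the
/// pattern's weights, and returns the matching byte range in `haystack`.
fn find_by_elements(haystack: &str, pattern: &str) -> Option<std::ops::Range<usize>> {
    let hay = toy_elements(haystack);
    let pat: Vec<char> = toy_elements(pattern).iter().map(|e| e.weight).collect();
    if pat.is_empty() || pat.len() > hay.len() {
        return None;
    }
    hay.windows(pat.len())
        .find(|w| w.iter().map(|e| e.weight).eq(pat.iter().copied()))
        .map(|w| w[0].start..w[pat.len() - 1].end)
}
```

With this model, `find_by_elements("Co\u{AD}operate", "COOP")` yields the byte range `0..6`, which covers the soft hyphen in the source text. A real implementation would also have to deal with multi-level weights, contractions and match boundaries, none of which this sketch attempts.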
Beta Was this translation helpful? Give feedback.
-
We've had discussions about search collations in the past, such as #3174 (comment). Basically, we need a client with a clear and compelling use case who ideally can make some contributions, and then the team can provide mentorship to help land this type of feature.
-
I looked through the project to figure out how to do collator-aware string search, such as substring matching. I couldn't find any API in icu4x that helps me implement this. Is this correct or did I miss something?
I did some digging. First, this document suggests that the string search algorithm implemented by the other ICU implementations has shifted to a less performant but more accurate linear search:
https://unicode-org.github.io/icu/userguide/collation/string-search.html#performance-and-other-implications
I'm also trying to imagine how difficult it would be to implement linear collation-aware search myself. Since collations can ignore characters, a naive implementation that uses the collator compare ordering `is_eq()` for a range of characters starting at a position won't work, I think.
But before I delve into this further, is a (linear or not) search implementation planned for ICU4X?
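To make the concern about ignored characters concrete, here is a minimal sketch under stated assumptions. It does not call ICU4X at all: `toy_key` is a hypothetical stand-in for "what the collator considers equal" (it lowercases and drops U+00AD, the soft hyphen), and `naive_find` and `flexible_find` are made-up names. The first function compares fixed-width windows, which is the naive approach; the second lets the window grow, which is one simple (quadratic) way around the problem.

```rust
/// Toy stand-in for a collation key (an assumption, not an ICU4X call):
/// lowercases and drops U+00AD, which this model treats as ignorable.
fn toy_key(s: &str) -> Vec<char> {
    s.chars()
        .filter(|&c| c != '\u{AD}')
        .flat_map(|c| c.to_lowercase())
        .collect()
}

/// Naive search: only tries windows with as many chars as the needle.
/// Returns a (start, end) range in char indexes.
fn naive_find(haystack: &str, needle: &str) -> Option<(usize, usize)> {
    let hay: Vec<char> = haystack.chars().collect();
    let n = needle.chars().count();
    let target = toy_key(needle);
    if n == 0 || n > hay.len() {
        return None;
    }
    (0..=hay.len() - n).find_map(|start| {
        let window: String = hay[start..start + n].iter().collect();
        (toy_key(&window) == target).then_some((start, start + n))
    })
}

/// Variable-width search: grows each window until its key matches
/// the needle's key or becomes longer than it.
fn flexible_find(haystack: &str, needle: &str) -> Option<(usize, usize)> {
    let hay: Vec<char> = haystack.chars().collect();
    let target = toy_key(needle);
    if target.is_empty() {
        return None;
    }
    for start in 0..hay.len() {
        for end in start + 1..=hay.len() {
            let window: String = hay[start..end].iter().collect();
            let key = toy_key(&window);
            if key == target {
                return Some((start, end));
            }
            if key.len() > target.len() {
                break; // overshot: no longer window from this start can match
            }
        }
    }
    None
}
```

For the haystack `"co\u{AD}operate"` and the needle `"coop"`, `naive_find` returns `None` because no four-character window has the right key, while `flexible_find` returns `Some((0, 5))`. Whether a real collator's equality can be probed this way efficiently is exactly the open question.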