Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lookup all codepoints with general_category (eg) "Lu" #213

Open
jrochkind opened this issue May 4, 2018 · 11 comments
Open

lookup all codepoints with general_category (eg) "Lu" #213

jrochkind opened this issue May 4, 2018 · 11 comments

Comments

@jrochkind
Copy link
Contributor

jrochkind commented May 4, 2018

I see the fancy code for using a trie to lookup codepoints with certain properties, but either it doesn't work for general_category, or I'm not using it right.

TwitterCldr::Shared::CodePoint.code_points_for_property("general_category", "Lu").count
# => 0

TwitterCldr::Shared::CodePoint.properties.property_values_for("general_category")
# => ["Cc", "Cf", "Co", "Cs", "Ll", "Lm", "Lo", "Lt", "Lu", "Mc", "Me", "Mn", "Nd", "Nl", "No", "Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps", "Sc", "Sk", "Sm", "So", "Zl", "Zp", "Zs"]

(PS: Thank you for this awesome gem, a real service to the community)

@camertron
Copy link
Collaborator

camertron commented May 4, 2018

By any chance are you using a Mac? Apple's case-insensitive file system strikes again...

Try this:

TwitterCldr::Shared::CodePoint.code_points_for_property('General_Category', 'Lu')

You can also use "gc" in place of "General_Category". This could be a bit more forgiving of capitalization :)

@jrochkind
Copy link
Contributor Author

I am indeed using a Mac. okay then! I wonder if there's any way to have the code catch/correct. maybe not. I guess I should have noticed that the property_values_for came back with certain case, and used the same case -- may be nothing but user error then.

TwitterCldr::Shared::CodePoint.code_points_for_property('General_Category', 'Lu').count
# => 1441

okay!

Without having that working, I was trying to iterate through every single codepoint and checking it's general_category property, just now I interupted the process 2.5 hours in and still not finished, heh.

@jrochkind
Copy link
Contributor Author

jrochkind commented May 4, 2018

Oh wait, it's not just the case for Lu, which I think is user error, but the case for General_Category, which is a lot more of a gotcha.

I wonder if there is some way to have the code recognize known property names (it is a finite and compile-time known set, right?), and case them properly -- or maybe better, just raise if you use one that isn't valid. So you know you're doing something wrong, instead of just getting empty set back.

@jrochkind jrochkind reopened this May 4, 2018
@camertron
Copy link
Collaborator

camertron commented May 4, 2018

Yeah, the gem should be able to normalize the property name so this doesn't happen in the future. We can leave this issue open until I can get to fixing it. Or, if you've got time for a pull request, the place to look is lib/twitter_cldr/shared/properties_database.rb 😉

@jrochkind
Copy link
Contributor Author

jrochkind commented May 4, 2018

I wonder if normalizing is really the right goal, or just raising on unrecognized properties.

I can totally make a property up (in real life: typo, or confusion about what property names are), and get no indication I did something wrong, just get an empty set result.

TwitterCldr::Shared::CodePoint.code_points_for_property('this_is_not_a_thing', 'Lu')

Shouldn't that raise?

@camertron
Copy link
Collaborator

Maybe? Is that what you would expect as a client of the interface?

@jrochkind
Copy link
Contributor Author

Myself, yes.

@camertron
Copy link
Collaborator

Ok, I'm cool with that.

@camertron
Copy link
Collaborator

camertron commented Nov 22, 2019

I totally forgot to do this for v5.0, dang it. I would consider it a breaking change meaning it should be released with a major version bump. Hopefully this isn't too big a deal for clients of the lib. Crossing my fingers I remember to look through old issues before releasing v6.0 whenever that happens to be.

@jrochkind
Copy link
Contributor Author

What makes it a breaking change, that you can get a raise instead of empty return value on invalid input? Doh.

I guess it could be released with an opt-in of some kind, either globally or as an argument to the method call.

Alternately, oh this is probably better, you could provide a different method name with the input normalizing/validating before. validated_code_points_for_property or something. With the old method name getting a deprecation warning?

But I personally haven't been working too much in this area lately.

@camertron
Copy link
Collaborator

Oh yeah, I like the different method name idea. What about code_points_for_property!, following Ruby's idiom for "dangerous" operations?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants