Feature request: Searching by name #1

Open
themobydisk opened this issue Jan 7, 2016 · 7 comments

Comments

@themobydisk

I want to make an app that searches Unicode characters by name. It looks like the UnicodeInfo class only lets me search by character. For example, if I want to find a Unicode music note, I want to call something like:

// Returns 2669 - 266C, 1D13B - 1D164, 1F3B5, etc.
IEnumerable<int> matchingCharacters = UnicodeInfo.FindByName("note");

@hexawyz
Owner

hexawyz commented Jan 9, 2016

Hello,

What you are requesting is a full-text index on name data. Providing a full-text search algorithm is well outside the scope of this library, but there are tools that can provide this feature, such as SQLite.
If you wish, you can already build such an index yourself by requesting the name for every possible code point (0x0000 to 0x10FFFF).

I may implement a few helper methods for scenarios like this one, allowing you to enumerate things like valid code points or known names, but that would be it.

@themobydisk
Author

Okay, fair enough. Probably not a good fit for this project. Thanks for the reply!

@KirillOsenkov

Actually, you'll be surprised how simple it is, even without a full-text search engine. Here's sample code that works for me:

        private Dictionary<int, string> descriptions = new Dictionary<int, string>();

        private void BuildUnicodeList()
        {
            var blocks = UnicodeInfo.GetBlocks();

            foreach (var block in blocks)
            {
                foreach (var codepoint in block.CodePointRange)
                {
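                    // Compare the int directly: casting to char would truncate
                    // code points above U+FFFF and wrongly skip some of them.
                    if (codepoint >= 0xD800 && codepoint <= 0xDFFF)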
                    {
                        continue;
                    }

                    var charInfo = UnicodeInfo.GetCharInfo(codepoint);
                    var displayText = charInfo.Name;
                    if (displayText != null)
                    {
                        descriptions[codepoint] = displayText;
                    }
                }
            }
        }

...
            var sb = new StringBuilder();
            int hitcount = 0;
            foreach (var d in descriptions)
            {
                if (hitcount > 20)
                {
                    return sb.ToString();
                }

                if (d.Value.IndexOf(input, StringComparison.OrdinalIgnoreCase) > -1)
                {
                    // AppendLine takes a string, so format the code point (hex here).
                    sb.AppendLine(d.Key.ToString("X4"));
                    hitcount++;
                }
            }

            if (sb.Length > 0)
            {
                return sb.ToString();
            }
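
For anyone who wants to paste the lookup part as one piece, here is a minimal self-contained sketch of the same linear scan. The class name `NameSearch`, the method name `FindByName`, and the `maxHits` parameter are made up for illustration and are not part of the library; the method takes the code point to name map built above and returns code points instead of a string.

```csharp
using System;
using System.Collections.Generic;

public static class NameSearch
{
    // Hypothetical helper, not a library API.
    // Returns up to maxHits code points whose name contains `input`,
    // ignoring case. `descriptions` maps code point -> character name.
    public static List<int> FindByName(
        IEnumerable<KeyValuePair<int, string>> descriptions,
        string input,
        int maxHits = 20)
    {
        var hits = new List<int>();
        foreach (var d in descriptions)
        {
            // Stop scanning once enough matches have been collected.
            if (hits.Count >= maxHits)
            {
                break;
            }

            if (d.Value.IndexOf(input, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                hits.Add(d.Key);
            }
        }

        return hits;
    }
}
```

Usage would look something like `NameSearch.FindByName(descriptions, "note")`. Note that a `Dictionary` does not guarantee enumeration order, so sort the result if the order matters.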

@KirillOsenkov

The performance on my machine is about 70-80 ms per lookup, so an in-memory trie or another index could speed it up significantly. However, if you're OK with these numbers, it works great and is super simple.
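
As a rough idea of what "other index" could mean, here is a sketch of a word-level inverted index. This is a different technique from the substring scan above: it only matches whole words of a name (so "note" will not find "NOTES"), and the class name `NameWordIndex` and its members are hypothetical. It assumes names are split on spaces and hyphens.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not a library API.
public sealed class NameWordIndex
{
    // word (case-insensitive) -> sorted code points whose name contains that word
    private readonly Dictionary<string, SortedSet<int>> _byWord =
        new Dictionary<string, SortedSet<int>>(StringComparer.OrdinalIgnoreCase);

    // `descriptions` maps code point -> character name.
    public NameWordIndex(IEnumerable<KeyValuePair<int, string>> descriptions)
    {
        var separators = new[] { ' ', '-' };
        foreach (var d in descriptions)
        {
            foreach (var word in d.Value.Split(separators, StringSplitOptions.RemoveEmptyEntries))
            {
                if (!_byWord.TryGetValue(word, out var set))
                {
                    set = new SortedSet<int>();
                    _byWord[word] = set;
                }

                set.Add(d.Key);
            }
        }
    }

    // Whole-word lookup: a single dictionary probe instead of a scan over every name.
    public IEnumerable<int> Find(string word)
    {
        return _byWord.TryGetValue(word, out var set) ? set : Enumerable.Empty<int>();
    }
}
```

The index is built once from the same code point to name map, after which `Find("note")` avoids touching every name. Whether that trade-off is worth the lost substring matching depends on the app.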

@hexawyz
Owner

hexawyz commented Sep 6, 2017

Nice solution with so little code. 👍
It would likely be enough for most scenarios.

I did write some code that you can use to create an index of Unicode characters, but it's not production-ready.
(@themobydisk, I apologize to you. I had totally forgotten about this issue… :( )

You can try it and/or benchmark it if you want:
https://gist.github.com/GoldenCrystal/0071772cd111ac4b45b21470f1ac101f
It needs a bit of cleanup, but as far as I remember, the code was working.
Once I have cleaned it up a bit, I will include it as an example instead of letting it rot on my hard drive…

@KirillOsenkov

BTW I'm using my naive lookup algorithm here:
http://quickinfo.io/?char%20cherries

You can try searching for various emoji names, paste emoji to view their info, etc.
