How do I find all occurrences of emojis in my codebase? #1623

Answered by BurntSushi
ssbarnea asked this question in General
I think you might be a bit confused about what \p{Emoji} actually is. \p{Emoji} is a single Unicode property that matches exactly one codepoint. But a single emoji can of course span multiple codepoints, and there is no fixed upper bound on how many. Your regex, for example, only matches emoji that are at most two codepoints long. This leaves out any emoji that is encoded with more than two codepoints. You can find lots of examples in Unicode's full list of emoji: https://unicode.org/emoji/charts/full-emoji-list.html
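To make the "no fixed length" point concrete, here is a minimal Python sketch that counts the codepoints in a few emoji. The sample set and the helper name `codepoint_count` are illustrative choices, not something from the original question, and the emoji are written as escapes so the result does not depend on how an editor renders them.

```python
# Sample emoji chosen for illustration; each value is one user-visible emoji.
SAMPLES = {
    "grinning face": "\U0001F600",
    "flag: United States": "\U0001F1FA\U0001F1F8",
    "family: man, woman, girl, boy": (
        "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"
    ),
}


def codepoint_count(text: str) -> int:
    """Python 3 strings are sequences of codepoints, so len() counts them."""
    return len(text)


for label, emoji in SAMPLES.items():
    print(f"{label}: {codepoint_count(emoji)} codepoint(s)")
```

The family emoji is four person codepoints joined by three zero-width joiners (U+200D), so any pattern that caps a match at two codepoints cannot cover it as a whole.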

One such example is 0️⃣. If that doesn't render correctly, then it looks like this:

This particular emoji is made up of three codepoints: U+0030 U+FE0F U+20E3. Notice that U+0030
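As a rough way to check that breakdown yourself, the following Python sketch lists each codepoint in the sequence along with its Unicode character name, using only the standard library's `unicodedata` module. The names `KEYCAP_ZERO` and `describe` are hypothetical helpers introduced here for illustration.

```python
import unicodedata

# The three codepoints quoted above, written as escapes.
KEYCAP_ZERO = "\u0030\uFE0F\u20E3"


def describe(text: str) -> list[tuple[str, str]]:
    """Return (U+XXXX, Unicode name) for every codepoint in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


for codepoint, name in describe(KEYCAP_ZERO):
    print(codepoint, name)
```

Running the same helper over a line of source code is one way to see which individual codepoints a single-codepoint property such as \p{Emoji} would be tested against.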
