use fully qualified / unified names in svg and png paths #422

Open
wants to merge 3 commits into
base: master

BrianHung

This PR attempts to fix the issues mentioned in #405 and #419 by using Unicode fully qualified / unified names in the file paths for the SVGs and PNGs.

To get the fully qualified name for each emoji, I used the emoji.json provided by https://github.com/iamcal/emoji-data and the following Node script.

const fs = require('fs');
let emojiList = JSON.parse(fs.readFileSync("./emoji.json"))

// Flatten skin variations and append them to emojiList as separate entries.
emojiList.filter(e => e.skin_variations)
  .forEach(e => emojiList = emojiList.concat(Object.values(e.skin_variations)))

function unifiedToNative(unified) {
  // e.g. "1F441-FE0F" -> [0x1f441, 0xfe0f]
  const codePoints = unified.split('-').map(u => parseInt(u, 16));
  return String.fromCodePoint(...codePoints);
}

// Convert each unified code point sequence to its native representation.
emojiList.forEach(e => e.native = unifiedToNative(e.unified))

// Parse each native representation into a twemoji entity.
const { parse } = require('twemoji-parser');
emojiList.forEach(e => e.entity = parse(e.native)[0])

function getTwemojiUnicode(url) {
  return url.match(/([^\/]+)(?=\.\w+$)/)[0]
}

// Get the twemoji unicode representation from entity url.
emojiList.forEach(e => e.twemojiUnicode = getTwemojiUnicode(e.entity.url))

// Collect the emojis whose twemoji file name differs from the lowercased unified name.
let diff = emojiList.filter(e => e.twemojiUnicode !== e.unified.toLowerCase())
  .filter(d => d.twemojiUnicode !== "1f441") // BUG: see https://github.com/twitter/twemoji/issues/419

diff.forEach(e => { 
  fs.renameSync(`./assets/72x72/${e.twemojiUnicode}.png`, `./assets/72x72/${e.unified.toLowerCase()}.png`); 
  fs.renameSync(`./assets/svg/${e.twemojiUnicode}.svg`, `./assets/svg/${e.unified.toLowerCase()}.svg`); 
})

// To-do: manually handle 1f441.

The only exception to this was the eye emoji mentioned in #405, because both 👁️ and 👁️‍🗨️ resolve to "1f441" with the twemoji-parser. For the eye emoji, I had to manually rename two files.
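
To illustrate why the eye emoji needs manual handling, here is a minimal sketch that groups unified names by the file name they resolve to and reports any collisions. The helper `findCollisions` and the `sample` array are hypothetical and only mirror the fields built by the script above; the sample values reflect the eye case described in this comment.

```javascript
// Hypothetical helper: group unified names by the twemoji file name they map to.
function findCollisions(entries) {
  const byFile = new Map();
  for (const { unified, twemojiUnicode } of entries) {
    const names = byFile.get(twemojiUnicode) || [];
    names.push(unified.toLowerCase());
    byFile.set(twemojiUnicode, names);
  }
  // Keep only file names shared by more than one unified sequence.
  return [...byFile].filter(([, names]) => names.length > 1);
}

// Illustrative sample data (assumed shape: { unified, twemojiUnicode }).
const sample = [
  { unified: '1F441-FE0F', twemojiUnicode: '1f441' },
  { unified: '1F441-FE0F-200D-1F5E8-FE0F', twemojiUnicode: '1f441' },
  { unified: '1F600', twemojiUnicode: '1f600' },
];

console.log(findCollisions(sample));
```

Any entry this returns cannot be renamed automatically, because a single source file would have to become two different targets.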

@CLAassistant

CLAassistant commented Jul 5, 2020

CLA assistant check
All committers have signed the CLA.

@jdecked
Contributor

jdecked commented Oct 13, 2020

Thanks for giving it a shot! Twemoji and twemoji-parser are intended to be interoperable as part of how we use them at Twitter, so we're working on a more complete solution internally to #405, hopefully by the end of the year. Since this breaks interoperability and would cause a pretty substantial divergence in our internal vs open sourced version of this package, I'm leaving it open for now.

@JoshyPHP

In my opinion, it should be the other way around: instead of using a fully qualified sequence, remove all modifiers and variant selectors. For instance, U+FE0F (VS-16) exists to indicate that a character should be rendered as a colourful image rather than monochrome text. Since those files are already images, it's not needed. The same goes for U+200D (ZWJ), which is used to join several characters as one. Each file is already a single image, so the joiner isn't meaningful.
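
As a rough sketch of that idea, the function below drops U+FE0F and U+200D from a hyphen-separated code point sequence. The name `stripSelectorsAndJoiners` is hypothetical and not part of twemoji or twemoji-parser, and the sketch does not check that the shortened names stay unique across the whole emoji set.

```javascript
// Hypothetical helper, not part of twemoji or twemoji-parser.
const VS16 = 'fe0f'; // U+FE0F VARIATION SELECTOR-16
const ZWJ = '200d';  // U+200D ZERO WIDTH JOINER

function stripSelectorsAndJoiners(unified) {
  return unified
    .toLowerCase()
    .split('-')
    .filter(cp => cp !== VS16 && cp !== ZWJ)
    .join('-');
}

console.log(stripSelectorsAndJoiners('1F441-FE0F'));                 // 1f441
console.log(stripSelectorsAndJoiners('1F441-FE0F-200D-1F5E8-FE0F')); // 1f441-1f5e8
```

With this scheme the two eye emoji would end up with distinct file names without needing either character.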

In addition to being shorter, it's more robust against possible changes in future Unicode versions if some sequences are retooled to make some of those characters optional.

4 participants