Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Intl] [Emoji] Move emoji data in a new component #53096

Merged
merged 1 commit into from
Feb 3, 2024

Conversation

smnandre
Copy link
Contributor

@smnandre smnandre commented Dec 16, 2023

Q A
Branch? 7.1
Bug fix? no
New feature? yes
Deprecations? yes
Issues
License MIT

This PR move all the emoji data & code from the Intl component into its own new Emoji component.

Objectives/reasons:

  • reduce the size of a standard "--webapp" install
  • allow usage of intl (required if your app uses those validators/form types: BIC, country, currency, language, locale, timezone) without downloading all the emoji data
  • allow usage of Emoji without downloading all the Intl data

Thanks to all the reviewers for the feedbacks, opinions, advices ❤️


--- Original (obselete) post below ---

This PR move all the emoji data & code from the Intl component into its own new IntlEmoji component.

... and hopefully open a debate aboute the future of the Intl component, its role, and the way we handle "data" in the framework and the repositories

Important

🎙️ DISCLAIMER: This PR contains both metrics and opinions. The metrics were collected this morning in a neutral and transparent manner, ensuring they can be reproduced by anyone. However, the opinions I present here are just that – opinions, and should not be interpreted as objective truths or claims of fact.

Update: details/summary added to improve the readability

What is symfony/intl ?

Repository: https://github.com/symfony/intl
Documentation: https://symfony.com/doc/current/components/intl.html
Unicode: https://home.unicode.org/technical-quick-start-guide/

Responsabilities

Currently, it seems to me this component:

  • provides a polifill-ish layer for the intl PHP extension
  • provides exhaustive formatting + locale data for countries, currencies, date, ...
  • provides methods to check ISO codes or identifiers
  • provides a dictionnary emoji -> description for every possible combinaison

Opiniated remarks

Two comments highlight the "blurred lines" I believe this component navigates:

1) Access or data ?

This component provides access to the localization data of the ICU library.

Maybe my english is in fault, but it seems to me it does not provides access to the data... it provides the data.

2) Unicode = CLDR + ICU + UTC

This component provides access to the localization data of the ICU library.

CLDR (where the emoji data comes from) is not in the ICU library. I'm quibbling over details here, i know. But i think that illustrates the volatile "scope" and "responsabilities" of this component.

So we come close to the problem...

Symfony/Intl is massive

The data included in the Intl component is massive (especially the emoji descriptions), and will grow more every semester.

I looked at the following cases

Versions:

  • 6.0 6.1 6.2 6.3 6.4 7.0
  • Emoji data were added in 6.2

Some metrics...

Size (zip) Size (unzipped) Files PHP Files
symfony/symfony 12.91 58.6 6006 4729
symfony/intl 7.5 43.2 1517 1487
% 58.1% 73.8% 25.3% 31.5%

So symfony/intl accounts for 30% of the files in the monorepo ... and nearly 75% of its total disk size.

...over time

Size (in MB) of the sources

6.0 6.1 6.2 6.3 6.4 7.0
intl 15.1 15.1 41.8 41.9 43.2 43.2
symfony 28.8 29.1 56.5 58.1 59.8 58.6

It was already big in previous versions, but since the emoji data integration, it's off charts.

The symfony/intl alone is twice as big as: all the other components, all the bridges and all the bundles. Combined.

And it's not over. At all.

Why it'll grow more

The ICU components used in the component are well-defined and constrained by 'real-world' factors, so we can expect minor changes regarding countries, formatting data, etc. It's unlikely, for instance, that 200 new countries will suddenly emerge in 2024.

However, emojis may present a major challenge in the near future. New ones are added with every CLDR release. Except for a significant drop (like the upcoming 2000 hieroglyphs), this should be a gradual increase.

What bothers me more is the 'combinatorial nature' of these descriptions. We generate a line of text for every combination. And that's why this component is so large. But it's just the beginning of what could be exponential growth.

As of today, the 'hand emoji' has variations for skin color (I'm not certain, but let's say there are 6 possible colors), and emojis with multiple people often vary by gender ('boy and two girls').

In the upcoming release, a new variable is the concept of 'left-handed' versus 'right-handed'. So, we'll create a new line for every existing emoji with a visible hand. But we'll need way more than just a new line, because of every emoji where two hands are visible. I don't remember if it's already implemented, but there was discussion about including the same thing for the age of a person, or some hairstyles.

So, the symfony/intl component could very soon be 50GB, and a short while later 10^80TB. But there's no way it reduces in size... or even slows its growth.

And.... where is the problem ?

I see negative effect on three very different layers.

Developper Experience

Whether these values are low or not in absolute terms (and I have no doubt that everyone will have their own opinion on this)... the reality is that users are downloading a component that is twice as heavy as all the others combined... and this inevitably affects installation times, bandwidth, update times, static analysis, IDE indexing, etc. A prime example is Docker on macOS, which was a real pain until recently with Orbstack, and the performance nightmare was directly related to the number of files mounted in a volume.

Contributor experience

I've lost count of how many times I've seen a contributor propose a feature only to be told: it's userland. (Full disclosure, I understand and share this point of view). But it can be frustrating to see closed doors for a few classes, while at the same time Symfony contains hundreds of lines like 'young woman with dark hair and kid'

Real world consequences: Ecological & financial costs

I have no desire to open a debate (on either of those topics). But again, these small things have real-world consequences. We are talking about Symfony, so the impact is enormous, even on small matters

What is the real impact ?

Downloads data, as provided by packagist (collected today)

Package install total install (last 30 days)
symfony/intl 111,000,000 2,800,000
symfony/http-foundation 528,000,000 11,250,000
symfony/console 678,000,000 12,900,000
symfony/console 678,000,000 12,900,000
symfony/translation 508,000,000 10,400,000

Let's agree on: "it's not without an impact".

Why is it used ?

For its quality

Please don't misinterpret my message. I'm not criticizing the value of the component or questioning its qualities. Besides, my opinion wouldn't have any value for that matter anyway. And i'm absolutely convinced a lot of people decide to install this component knowing what they do.

For another reason

But there are also people who install... Symfony. The recommended installation procedure, as outlined in the documentation on the website, is to install the web application skeleton, which requires symfony/intl.

To revisit the argument from earlier, I'm really not sure if anyone realizes after installation why its vendors directory is 80MB and what it's used for (young woman with...).

I'm unsure why symfony/intl is included by default in a new project, while other components are not. As a developer, I would appreciate the ability to install a small, lightweight application or to have more packages for the same amount of overhead :)

For another reason (bis)

"There is a third reason, and once again, I'm not fully understanding the situation (and may not have all the backstory required for it).

The Country validator requires symfony/intl to validate a given string as a valid ISO alpha country code. To do this, it tries to retrieve the list of country names (indexed by code) from the locale data. Consequently, it's not possible to use BIC, Country, Currency, and probably others without symfony/intl.

So, if a developer installs symfony/validator and then wants to validate a BIC, they cannot do so without downloading 80MB of locale-specific data.

Wouldn't it be simpler to have a couple of ISO classes/methods in the Validator component? Or perhaps create a small component just for that purpose? Because having to parse giant files just to check if "FR" is a valid ISO country code seems quite inefficient to me.

Well: Suggestions

So, personal conclusion and some suggestions..

TODO list

Sooner

  • move "emoji" out of Symfony\Intl
  • remove "intl" from symfony/webapp
  • fix the validator requirements

Later

  • move "intl" out of the monorepo
  • create a distinct class/component to handle iso lists

Discussions

  • Handle the BC layer (require symfony/intl-emoji in symfony/intl ?)
  • Find a way to handle the exponential growth of those files

--

Open to any feedback :)

@carsonbot carsonbot added this to the 7.1 milestone Dec 16, 2023
@carsonbot carsonbot changed the title [Intl][Emoji] Move emoji data in a new component [Intl] [Emoji] Move emoji data in a new component Dec 16, 2023
@OskarStark
Copy link
Contributor

Related

@smnandre
Copy link
Contributor Author

Related

* [Splitt intl packages in smaller packages #49943](https://github.com/symfony/symfony/issues/49943)

I have no memory of this PR, but there is a thumbs-up from me, so... 😮‍💨

@smnandre
Copy link
Contributor Author

Fabbot fails but i cannot fix it (as it wants to edit emoji files to add licence, fix typos, etc.)

@smnandre
Copy link
Contributor Author

smnandre commented Dec 18, 2023

In the end, i renamed IntlEmoji just Emoji, as nothing from Emoji uses/interact with the Intl component.

Moreover, Emoji requires native php Intl extension.

@nicolas-grekas
Copy link
Member

nicolas-grekas commented Dec 19, 2023

My understanding of this topic:

  • the intl data is NOT in the base skeleton. It is in webapp-pack.
  • even if the data is not compressed in the source, it's always compressed on the network, so that uncompressed size doesn't matter when considering transfer time/overhead
  • a fresh compressed webapp is ~56MB
  • when we remove all intl data, the compressed size becomes ~48MB (-13%)
  • when we remove only the emoji data, the compressed size becomes ~52MB (-7%)

About symfony/intl, I think it does what is should: make the intl data easily available.
About splitting emoji data, it's worth 7% of the webapp skeleton.
I've no strong objection to make the split (but no strong motivation either, so thanks for pushing the topic :) )

Copy link
Member

@nicolas-grekas nicolas-grekas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this approach works, thanks for proposing.
Here are some comments.

.github/workflows/emoji-data-tests.yml Outdated Show resolved Hide resolved
composer.json Outdated Show resolved Hide resolved
src/Symfony/Component/Emoji/CHANGELOG.md Outdated Show resolved Hide resolved
src/Symfony/Component/Emoji/EmojiTransliterator.php Outdated Show resolved Hide resolved
src/Symfony/Component/Emoji/README.md Outdated Show resolved Hide resolved
src/Symfony/Component/Emoji/README.md Outdated Show resolved Hide resolved
src/Symfony/Component/Emoji/Resources/bin/build.php Outdated Show resolved Hide resolved
src/Symfony/Component/Intl/CHANGELOG.md Outdated Show resolved Hide resolved
src/Symfony/Component/Intl/composer.json Outdated Show resolved Hide resolved
@smnandre
Copy link
Contributor Author

when we remove all intl data, the compressed size becomes ~48MB (-13%)

There is something wrong with your computation... an uncompressed webapp without symfony/intl is only 43MB

webapp webapp - Intl
Capture d’écran 2023-12-19 à 12 46 32 Capture d’écran 2023-12-19 à 12 47 36

Thank you for the feedbacks, i'll fix/adapt things accordingly this afternoon 😄

@nicolas-grekas
Copy link
Member

nicolas-grekas commented Dec 19, 2023

symfony new test-webapp --webapp
du -sh test-webapp
#displays 154MB (uncompressed - 141MB just for vendor)
rm test-webapp/vendor/symfony/intl/ -rf
du -sh test-webapp
#displays 109MB (still uncompressed - 96MB just for vendor)

@stof
Copy link
Member

stof commented Dec 19, 2023

Currently, it seems to me this component:

* provides a polifill-ish layer for the intl PHP extension

this is not true. The polyfills are in symfony/polyfill-intl-icu

@smnandre
Copy link
Contributor Author

du -sh test-webapp

That is interesting. Really. Because i have clearly not the same numbers. And i wonder why.

du -sh test-webapp
105M	test-webapp
rm test-webapp/vendor/symfony/intl/ -rf
rm: test-webapp/vendor/symfony/intl/: is a directory
rm: -rf: No such file or directory
rm -rf test-webapp/vendor/symfony/intl/
du -sh test-webapp
 60M	test-webapp

(yes i let you enjoy my -rf fail)

@smnandre
Copy link
Contributor Author

I'm not sure how i can fix the remaining test... if someone can help me / point me in the good direction that'd be nice :)

@OskarStark OskarStark changed the title [Intl] [Emoji] Move emoji data in a new component [Intl][Emoji] Move emoji data in a new component Dec 20, 2023
@javiereguiluz
Copy link
Member

javiereguiluz commented Dec 27, 2023

@smnandre just a random comment related to this.

In Symfony we have a config option called enabled_locales. We could delete (automatically or letting developers doing it explicitly via a command provided by us) all the data of all languages not included in that config option. We'd delete 95% of Intl data (emojis transliterators, names of coutnries/languages/currencies in other languages, etc.) without affecting to the application.

@smnandre
Copy link
Contributor Author

@smnandre just a random comment related to this.

In Symfony we have a config option called enabled_locales. We could delete (automatically or letting developers doing it explicitly via a command provided by us) all the data of all languages not included in that config option. We'd delete 95% of Intl data without affecting to the application.

I suppose that could solve the "final deployable size" question... even more if combined with the "compress script"

But... that would still download every emojis variants x every languages everytime you create a webapp, run composer install, or need the CountryValidator.

I won't push too hard on this, even less if i'm far from beeing in the majority, thinking that this data should not be in the main repo. I guess my pov on this is more "philosophical" or "opiniated" than i imagined... :)

As i said while investigating this issue, i have some alternative ideas to reduce/optimise those files, it can be a step in the good* direction ! (*according to me :) )

@nicolas-grekas
Copy link
Member

Let's continue with this approach. 7% compressed diff also means less time to uncompress the app so that can lead to a slightly better DX.
rebase needed though :)

@smnandre
Copy link
Contributor Author

Let's continue with this approach. 7% compressed diff also means less time to uncompress the app so that can lead to a slightly better DX. rebase needed though :)

Ok will do tomorrow. I can reduce 25% of uncompressed data files... by removing all spaces (only 3/4% when compressed... as expected)

Do we need to php-cs-fix those files ?

@nicolas-grekas
Copy link
Member

I can reduce 25% of uncompressed data files... by removing all spaces (only 3/4% when compressed... as expected)

not worth it to me

Do we need to php-cs-fix those files ?

nope, fabbot can be ignored on those

@smnandre
Copy link
Contributor Author

smnandre commented Jan 1, 2024

Ok !

I'll fix what i can/figure out.. and probably will call for a bit of help :)

Copy link
Member

@nicolas-grekas nicolas-grekas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to solve the deps=high failure by stating that intl conflicts with string < 7.1

.github/workflows/intl-data-tests.yml Outdated Show resolved Hide resolved
src/Symfony/Component/Intl/CHANGELOG.md Outdated Show resolved Hide resolved
src/Symfony/Component/Intl/composer.json Outdated Show resolved Hide resolved
Copy link
Member

@nicolas-grekas nicolas-grekas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just one minor thing 🚀

@carsonbot carsonbot changed the title [Intl][Emoji] Move emoji data in a new component [Intl] [Emoji] Move emoji data in a new component Jan 2, 2024
@nicolas-grekas
Copy link
Member

Merge branch 'symfony:7.1' into feat/component-intl-emoji

💥
We don't merge PRs with merge commits :)

@smnandre
Copy link
Contributor Author

smnandre commented Jan 2, 2024

If possible, could you configure fabbot to ignore src/Symfony/Component/Emoji/Resources/data/

(cf https://fabbot.io/report/symfony/symfony/53096/25c5b345a64dbce84dd180c8e8c779af5e2d8fe0 )

Transfert emoji data from Intl to emoji component

Update main composer.json

Fix phpunit config

Update composer and README descriptions

Fix LICENCE date

Update src/Symfony/Component/Intl/CHANGELOG.md

Co-authored-by: Oskar Stark <oskarstark@googlemail.com>

Fix Changelog

I feel that cool resolve some of the recent issues linked to the Profiler.

Rename component Emoji + unlink from Intl

(no shared resp/code)

Isolated commit to move data

Update Github worflows

Use Emoji in String component

Update src/Symfony/Component/Emoji/CHANGELOG.md

Co-authored-by: Nicolas Grekas <nicolas.grekas@gmail.com>

Update src/Symfony/Component/Emoji/README.md

Co-authored-by: Nicolas Grekas <nicolas.grekas@gmail.com>

Present the compress command in both README's

Update src/Symfony/Component/Intl/CHANGELOG.md

Co-authored-by: Nicolas Grekas <nicolas.grekas@gmail.com>

Fix main composer.json

Revert symfony/intl requires symfony/emoji

Remove EmojiTransliteratorTrait

Move emoji data

Add  "symfony/deprecation-contracts" to Intl

Revert data test split

Add symfony/emoji to String (dev)

Fix String Test namespace

Fix .gitattributes hides "bin/compress" script

Please Psalm ?

Compute quickCheck once

Update LICENCE

Add Intl conflict with string < 7.1

Fix Int changelog

Fix composer.json CS

Throw exception in Intl BC layer when symfony/emoji is not installed

Test Intl & Emoji in the same job

Remove useless check

Remove useless check (without breaking things)
@smnandre
Copy link
Contributor Author

smnandre commented Jan 2, 2024

(squashed / rebased on 7.1)

@smnandre
Copy link
Contributor Author

Is there something else i can do here ? :)

@stof
Copy link
Member

stof commented Feb 2, 2024

To make the description fairer, installing symfony/validator does not force installing symfony/intl (we don't have a composer requirement on it. We have a runtime check when using the intl-based constraint)

@smnandre
Copy link
Contributor Author

smnandre commented Feb 3, 2024

@stof I updated the description

allow usage of intl (required if your app uses those validators/form types: BIC, country, currency, language, locale, timezone) without downloading all the emoji data

Feel free to rephrase or edit directly if you have another idea in mind!

@fabpot
Copy link
Member

fabpot commented Feb 3, 2024

Thank you @smnandre.

@fabpot fabpot merged commit bf3b54b into symfony:7.1 Feb 3, 2024
7 of 10 checks passed
"symfony/var-exporter": "^6.4|^7.0"
},
"conflict": {
"symfony/string": "<7.1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need the conflict?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better link to this message than trying to rephrase it (i'd fail)
#53096 (review)

@@ -24,8 +24,9 @@
},
"require-dev": {
"symfony/error-handler": "^6.4|^7.0",
"symfony/intl": "^6.4|^7.0",
"symfony/emoji": "^7.0",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^7.1 (fixed in f0ddd8d)

@alexander-schranz
Copy link
Contributor

Here is a small bc break if we do not require symfony/emoji component inside symfony/string component it breaks AsciiSlugger withEmoji support when update from Symfony 7.0 -> 7.1 and application throws then an exception which did work on Symfony 7.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants