Fluent vs gettext

Zibi Braniecki edited this page Apr 17, 2019 · 3 revisions

Gettext is a localization system deeply rooted in the GNU project and its design choices. The Fluent project regards gettext as a good example of a complete, low-level, platform-independent ecosystem of libraries and tools that manages the full release cycle workflow with a human-readable file format. At the same time, Fluent's paradigms lead us to different decisions on several core localization choices, which in turn lead to vastly different APIs and lifecycles.

In other words, we believe that gettext is a very good project, but we disagree with how it approaches localization.

Below, we list the significant differences between gettext and Fluent:

| Feature | gettext | Fluent |
| --- | --- | --- |
| Message identifier | source string | developer provided |
| Argument bindings | positional | key-based |
| Translation invalidation | fuzzy matched | id-change |
| Data storage | human-readable (.po) and compiled (.mo) | human-readable (.ftl) |
| External arguments | none | rich support |
| Plural support | special-cased | part of generic variant-selection syntax |
| Plural support span | developer decision, spanning across translations | localizer decision, per locale |
| Designed for | C family of languages | Web, modern client-side languages |
| Message references | by developer | by localizer |
| Message templates | required (.pot) | none |
| Localizer comments | none | fully supported |
| Error recovery | fragile | resilient, strong recovery logic |
| Compound messages | none | value + attributes per message |
| BiDi | none | bidirectional isolation |
| Intl formatters | none | explicit and implicit |

Social contract

The most important difference between gettext and Fluent is the choice of message identifier. Gettext uses the source string (often English) as the identifier. While the choice seems simple, it has long-standing consequences in the form of two limitations.

First, any change to the source string invalidates all translations of that string. This puts pressure on developers never to alter messages in the source language, since every alteration means that all translations have to be updated.

Secondly, it makes it harder to introduce multiple messages that share a source string but should be translated differently. For example, a button labeled "Open" and a status label "Open" may need different translations, since one is a command while the other is a description. Gettext offers an optional context string, msgctxt, to disambiguate between two or more messages with the same source string. This approach puts the burden on developers to recognize such scenarios, which goes against the separation of concerns principle. For that reason, Fluent recommends against reusing messages. Decoupling the source string from the translations is also important for our ability to introduce compound messages (which hold multiple strings for a single translation unit bound to a single UI widget) and to enable message referencing based on the message identifier.

Fluent establishes a social contract between the developer and the localizers. The developer introduces a unique identifier and provides a set of variables, such as the number of unread emails or the name of the user, and localizers use Fluent syntax features to construct the best possible translation for that identifier.

The developer is not, and should not be, bothered with the details of how such translations are constructed. All they know is that the result of a query for the identifier will be a single, opaque string that contains the right translation to be placed in the UI.
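To make the contrast concrete, here is a minimal Python sketch of the two keying schemes. It is a toy model built on plain dictionaries, not the gettext or Fluent API. The identifiers `open-button` and `open-label`, the context string, the helper names, and the German strings are all assumptions made for illustration.

```python
# Toy model: catalogs as plain dicts (not the real gettext or Fluent API).

# gettext-style: the source string is the key, so both uses of "Open"
# collide unless the developer adds a context (msgctxt).
gettext_de = {
    (None, "Open"): "Öffnen",          # command on a button
    ("file status", "Open"): "Offen",  # description of a state
}


def gettext_lookup(catalog, source, context=None):
    """Return the translation, falling back to the source string."""
    return catalog.get((context, source), source)


# Fluent-style: the developer assigns one unique identifier per message.
fluent_de = {
    "open-button": "Öffnen",
    "open-label": "Offen",
}


def fluent_lookup(catalog, message_id):
    """Return the translation, falling back to the identifier (toy behaviour)."""
    return catalog.get(message_id, message_id)


print(gettext_lookup(gettext_de, "Open"))
print(gettext_lookup(gettext_de, "Open", context="file status"))
print(fluent_lookup(fluent_de, "open-button"))
print(fluent_lookup(fluent_de, "open-label"))
```

In the source-keyed catalog the developer has to notice the ambiguity and supply a context. In the ID-keyed catalog the two messages are distinct from the start.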

Message variants

Gettext supports a limited set of internationalization features, most notably plural rules. But gettext's support for plural rules is a special-cased addition on top of the original gettext syntax; as such, it feels out of place and doesn't scale beyond plural rules. Fluent supports a generic concept of string variants that are chosen by a selector. Commonly, a plural rule will be such a selector, but depending on the grammatical features of a language there may be others as well, such as gender, declension, or even environmental values such as the time of day or the operating system. This allows localizers to design messages with multiple variants as they see fit.
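The selector mechanism can be sketched in a few lines of Python. This is a toy model, not Fluent's implementation: the function names, the simplified English plural rule, and the example messages are assumptions for illustration, and the FTL snippet in the comment shows roughly how a localizer might write the same message.

```python
# Toy model of Fluent-style variant selection (not the real Fluent API).


def plural_category_en(n):
    """Simplified English cardinal rule for non-negative integers."""
    return "one" if n == 1 else "other"


def select(selector, variants, default):
    """Pick the variant matching the selector, else the default variant."""
    return variants.get(selector, variants[default])


def unread_emails(count):
    # Roughly what a localizer could express in FTL as:
    #   emails = { $unreadEmails ->
    #       [one] You have one unread email.
    #      *[other] You have { $unreadEmails } unread emails.
    #   }
    variants = {
        "one": "You have one unread email.",
        "other": f"You have {count} unread emails.",
    }
    return select(plural_category_en(count), variants, default="other")


def settings_label(platform):
    # The same mechanism works with a non-plural selector,
    # here a hypothetical platform name.
    variants = {"macos": "Preferences", "other": "Settings"}
    return select(platform, variants, default="other")


print(unread_emails(1))
print(unread_emails(5))
print(settings_label("macos"))
print(settings_label("linux"))
```

Plural categories and platform names go through the same `select` step, which is the point of treating plurals as one selector among many.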

External arguments

Gettext doesn't support external arguments, which means that formatting a message doesn't include any parameter formatting. When arguments are needed, gettext recommends returning a string that can then be passed to printf, or running String.prototype.replace on the result.

Fluent's support for external arguments is deeply rooted in the syntax. External arguments are not only interpolated, but can also be used to select message variants or be passed to built-in functions. That allows Fluent localizers to construct much more fine-tuned localizations. On top of that, Fluent places FSI/PDI markers around placeables to preserve directional isolation in bidirectional text, and strongly discourages any manipulation of the resulting strings, which reduces the burden on the developer.
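As a rough illustration of key-based arguments and bidirectional isolation, the Python sketch below wraps every interpolated value in FSI (U+2068) and PDI (U+2069). It is a toy model, not Fluent's formatter: `format_message`, the `{name}` placeholder syntax, and the sample messages are assumptions for illustration.

```python
# Toy model of key-based argument interpolation with bidi isolation
# (not the real Fluent API).
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def format_message(pattern, args):
    """Replace {name} placeholders with isolated argument values."""
    isolated = {key: f"{FSI}{value}{PDI}" for key, value in args.items()}
    return pattern.format(**isolated)


# printf-style: arguments are bound by position after the lookup.
positional = "%s sent you %d messages" % ("Anna", 3)

# Key-based: arguments are bound by name, so a translation can place
# them wherever its grammar requires.
args = {"user": "Anna", "count": 3}
named = format_message("{user} sent you {count} messages", args)
reordered = format_message("{count} messages from {user}", args)

print(positional)
print(repr(named))
print(repr(reordered))
```

Because the isolation marks are already part of the result, any later string manipulation by the developer risks stripping or splitting them, which is one reason to treat the result as opaque.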

Isolation of concerns

On top of that, the way gettext handles plural rules requires the developer to decide whether a message will be a multi-variant message or a single string. Fluent holds that a developer is not in the best position to make such decisions. In many cases, a message that does not require plural variants in English may require them in other languages.

More generally, Fluent assumes that developers should not be required to understand the linguistic requirements of every language their software is translated into, and that each language may want to use different features to construct the translation.

As a result, Fluent keeps each translation separate, without "leaking" the requirements of one language onto others, and keeps all translations opaque to the developer, who doesn't need to decide which features localizers may need for a given string.
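The sketch below illustrates this separation in Python. The gettext half uses the standard library's `gettext.NullTranslations`, which simply returns the source strings; the Fluent half is a toy model, not the Fluent API. The message identifier `files-selected`, the helper names, the simplified Polish plural rule (integers only), and the sample strings are assumptions for illustration.

```python
import gettext

# gettext-style: the developer chooses the plural-aware call at the call
# site, and that choice applies to every translation of the message.
null = gettext.NullTranslations()  # no catalog: returns the source strings
count = 5
gettext_result = null.ngettext("%d file selected", "%d files selected", count) % count


# Fluent-style (toy model): each locale decides on its own whether the
# message needs variants.
def plural_category_pl(n):
    """Simplified Polish cardinal rule for non-negative integers."""
    if n == 1:
        return "one"
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return "few"
    return "many"


def files_selected_en(count):
    # The English localizer needs no variants for this phrasing.
    return f"Files selected: {count}"


def files_selected_pl(count):
    # The Polish localizer chooses a phrasing that needs three variants.
    variants = {
        "one": f"Wybrano {count} plik",
        "few": f"Wybrano {count} pliki",
        "many": f"Wybrano {count} plików",
    }
    return variants[plural_category_pl(count)]


MESSAGES = {
    "en": {"files-selected": files_selected_en},
    "pl": {"files-selected": files_selected_pl},
}


def format_message(locale, message_id, **args):
    """The developer's call is identical for every locale."""
    return MESSAGES[locale][message_id](**args)


print(gettext_result)
print(format_message("en", "files-selected", count=5))
print(format_message("pl", "files-selected", count=22))
```

The call to `format_message` never changes; only the per-locale entry knows whether variants are involved.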

Translation invalidation

Generally speaking, in a release cycle we recognize three types of message invalidation:

  • Minor: doesn't affect translations (e.g. spelling, or punctuation)
  • Medium: does affect how the message is constructed, but does not invalidate the content of the message (e.g. "Show All Bookmarks" -> "Show Bookmarks Manager")
  • Major: changes the meaning of the message (e.g. "Click to save" -> "Click to open")

Due to its design decisions, gettext collapses all three levels into a single state called "fuzzy". Any change to the source string, no matter how minor or major, results in all translations being invalidated.

Fluent's use of unique identifiers allows at least two of those three states to remain separate. Minor changes may be applied, and as long as the developer does not alter the unique identifier (i.e. does not change the social contract), all translations remain valid. On the other hand, if the developer changes the ID, all translations become invalid and must be updated.

While we believe this design decision to be better for most product release cycles, we recognize that it does not address the "medium" level of changes. It forces the developer to choose between keeping and altering the ID, thus turning the change into a minor or a major one respectively.
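The difference between the two invalidation models can be sketched as follows. This is a toy model in Python, not real gettext or Fluent tooling: the catalog layout, the function names, the identifiers `save-button` and `open-button`, and the German translation are assumptions for illustration.

```python
# Toy model of the two invalidation behaviours
# (not real gettext or Fluent tooling).


def update_source_keyed(catalog, old_source, new_source):
    """gettext-style: the source string is the key, so any edit to it
    flags the existing translation as fuzzy, whatever the size of the edit."""
    updated = dict(catalog)
    translation = updated.pop(old_source)
    fuzzy = new_source != old_source
    updated[new_source] = {"text": translation["text"], "fuzzy": fuzzy}
    return updated


def update_id_keyed(catalog, old_id, new_id):
    """Fluent-style: translations stay valid until the identifier changes."""
    updated = dict(catalog)
    if new_id != old_id:
        updated.pop(old_id)  # the old translation no longer applies
    return updated


po = {"Click to save": {"text": "Zum Speichern klicken", "fuzzy": False}}
ftl = {"save-button": "Zum Speichern klicken"}

# Minor edit (punctuation): the source-keyed entry turns fuzzy, while the
# ID-keyed entry is untouched because the developer kept the identifier.
minor_po = update_source_keyed(po, "Click to save", "Click to save.")
minor_ftl = update_id_keyed(ftl, "save-button", "save-button")

# Major edit (new meaning): the developer signals it by changing the ID.
major_ftl = update_id_keyed(ftl, "save-button", "open-button")

print(minor_po)
print(minor_ftl)
print(major_ftl)
```

A "medium" change has no representation of its own in either function, which matches the gap described above.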

We're investigating the idea of message versioning, which would allow the developer to mark a translation as outdated without invalidating it completely. Such a state would keep the translation valid, on the assumption that the older version of the translation is better than an untranslated string, but would allow all tools to notify the localizer that their translation is outdated.

Data formats

Gettext uses three file formats (.po, .pot and .mo) across the life cycle of a product. This decision impacts how gettext fits into a release cycle, introducing required steps such as message extraction and compilation. Fluent uses a single file format, .ftl, which makes it easier to fit into different workflow models and removes those extra steps, which may cause data misalignment.

Unicode support

Gettext files can be encoded as UTF-8, but that is roughly as far as its support for the Unicode standard goes. It uses a custom plural rules dataset, does not handle any date, time or number formatting, and does not help with bidirectional messages. Fluent leverages CLDR, ICU and ECMA-402 extensively, benefiting from well-designed and standardized databases and algorithms for many internationalization features, and blends localization and internationalization together.

Summary

We believe that both the Fluent API and the Fluent syntax represent a substantial improvement over gettext, and we recommend Fluent over gettext for all multilingual software.