Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FTL Serializer #182

Closed
Michael-F-Bryan opened this issue Jun 23, 2020 · 9 comments
Closed

FTL Serializer #182

Michael-F-Bryan opened this issue Jun 23, 2020 · 9 comments
Labels
crate:fluent-syntax Issues related to fluent-syntax crate enhancement good first issue Want to help? Those are great bugs to start with! help wanted We need help making decisions or writing PRs for this.

Comments

@Michael-F-Bryan
Copy link
Contributor

Michael-F-Bryan commented Jun 23, 2020

At work we're planning to use Google Translate to generate some initial translations for our app. Does a pretty-printer already exist to help turn the translated FluentMessages back into text form?

If not, I'll put together my own and publish it to crates.io.

@zbraniecki zbraniecki added enhancement crate:fluent-syntax Issues related to fluent-syntax crate good first issue Want to help? Those are great bugs to start with! help wanted We need help making decisions or writing PRs for this. labels Jun 23, 2020
@zbraniecki
Copy link
Collaborator

We call it a serializer.

It's not going to be super easy because we don't have a writable AST. See #104.

You'll need to first duplicate the AST in writable form (I wouldn't mind if you offered a macro to generate read-only and RW one, or experimented with Cow if runtime perf is not compromised).

Then you'll need to replicate https://github.com/projectfluent/fluent.js/blob/master/fluent-syntax/src/serializer.ts or https://github.com/projectfluent/python-fluent/blob/master/fluent.syntax/fluent/syntax/serializer.py

@Michael-F-Bryan
Copy link
Contributor Author

It's not going to be super easy because we don't have a writable AST.

Yeah I've already looked into that.

A while back I was looking at the fluent AST and wrote up a string pool which you can use to extend the lifetime of a &'a str. That way you create the string pool before loading the AST and we're able to rewrite part of the tree to use the updated versions instead of the original strings.

You'll need to first duplicate the AST in writable form (I wouldn't mind if you offered a macro to generate read-only and RW one, or experimented with Cow if runtime perf is not compromised).

I might go down the path of swapping everything to Cow<str> in my own fork and seeing how it goes. I doubt it'd be much of a runtime issue because loading translations typically only happens once on startup, and translation files are measured in the 10s or 100s of KB.

There might also be some sort of reference-counted sliceable str that we can use (kinda like bytes::Bytes), That way we don't get the second level of indirection from Cow<str> and only pay the cost of copying the original text to the heap once. Owning the string an AST comes from also means you can avoid needing rental for FluentResource ... Thoughts?

I'm not a fan of the macro route because it'll be a massive pain to implement and maintain.

@zbraniecki
Copy link
Collaborator

I might go down the path of swapping everything to Cow in my own fork and seeing how it goes. I doubt it'd be much of a runtime issue because loading translations typically only happens once on startup, and translation files are measured in the 10s or 100s of KB.

It may depend on the use case. fluent-rs is used on the startup of Firefox and that's quite a high number of strings in a very performance-critical area.

I encourage you to look into Cow, to get your prototype running, but depending on performance we may still end up with a macro to produce both.

Owning the string an AST comes from also means you can avoid needing rental for FluentResource ... Thoughts?

Same as previous. I care about edge case critical performance in ways that most use cases probably don't care as much. For that reason, I'd much rather maintain two ASTs than take any non-insignificant perf/mem hit.

String+&str is likely an optional approach for the runtime scenario, but I'm aware that all tooling will need writable AST, so we have to solve it, one way or the other.

I'm not a fan of the macro route because it'll be a massive pain to implement and maintain.

That's not my experience working with macros :) I hope you're wrong! :)

The bottom line is - go with Cow and lets measure. We have criterion on hand and I can hook up any branch to Firefox and measure startup impact.
If that pans out, great. If not, I'd like to try to get a non PITA macros over string pool or ref counting tbh. I doubt those two would give better perf than Cow while they'd definitely be more hassle than Cow.

@Michael-F-Bryan
Copy link
Contributor Author

It may depend on the use case. fluent-rs is used on the startup of Firefox and that's quite a high number of strings in a very performance-critical area.

Thanks, I forgot that fluent is used by Firefox!

What operations are we wanting to benchmark? Should I be comparing just the existing parsing benchmark, or are there additional criteria that we care about?

@Pike
Copy link
Contributor

Pike commented Jun 23, 2020

Michael, can you add a bit more detail on what you're trying to do? What do you want to send to GT, when, what do you expect to get back, what do you want to do with those results?

I'm curious, 'cause we haven't figured that one out.

@Michael-F-Bryan
Copy link
Contributor Author

What do you want to send to GT, when, what do you expect to get back, what do you want to do with those results?

We currently have English translations for our app and would like to add support for other languages. I'm hoping we can use machine translation to get some initial translations, then tweak them over time as people report better versions.

There are a couple factors which make Google Translate feasible for us:

  • Of our app's 1000 or so translations, the vast majority are simple text (e.g. "Exit" or "You have unsaved changes. Would you like to continue?")
  • We have agents in various countries who understand our product, can speak multiple languages, and have done translations for us in the past

My plan was to:

  1. Separate a resource bundle into "simple" messages (i.e. PatternElement::TextElement()s) and complex messages
  2. Extract all strings from the simple messages and pass them through Google Translate
  3. Once the translations come back, walk the tree and construct translated versions by substituting in the translation
  4. Concatenate the list of translated simple messages and un-translated complex messages
  5. Write them all to a new *.ftl file

I don't want to try anything fancy with the complex messages (i.e. something using variables like "Object { $object_number } must be a closed shape") because it'll probably mess up word order or not understand the context. There are 10-20 of these more complex messages, which is manageable enough that we can ask a human to do the translations and get results back in a reasonable amount of time.

@Michael-F-Bryan
Copy link
Contributor Author

@Pike, my final implementation is available on GitLab, feel free to poke around the code. It works well for our purposes, but I imagine I'll be revisiting it in a couple weeks/months when we need to merge human translations with the machine generated stuff.

@zbraniecki zbraniecki changed the title Pretty Printer FTL Serializer Apr 24, 2021
@RedMser
Copy link

RedMser commented May 1, 2024

I used the latest master branch and the API created in #241 and it works as expected. This issue can likely be closed now?

@alerque
Copy link
Collaborator

alerque commented May 2, 2024

Yes, I believe so. This should be part of the next release (see issue #347 and changelog already in PR #349).

@alerque alerque closed this as completed May 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
crate:fluent-syntax Issues related to fluent-syntax crate enhancement good first issue Want to help? Those are great bugs to start with! help wanted We need help making decisions or writing PRs for this.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants