FEAT: String interpolation #5085

Draft: wants to merge 4 commits into master

hiiamboris (Collaborator) commented Feb 21, 2022

This PR is technically a part of our Format effort, but:

  • it's not tied to anything exported by the Format module
  • it's widely useful outside the scope of Format and should not require one to include that module

Intro

String interpolation is used in many languages as a readable way of inserting values into a string.

Though the first question each of us may have is "what, another rejoin?", if rejoin were good enough for me or Gregg, this design would never have been born.

Here are just a few common examples written using the rejoin function:

avcmd: rejoin [{"} player {" "} vfile {" --audio-file "} afile {"}]
print rejoin ["Download OK: '" remote "' [" info/Content-Type ", " info/Content-Length " bytes]"]
do make error! rejoin [token1 " cannot follow " token2 " in the " name " part of " mold .mask]

Try to look at these expressions and visualize what the resulting string will look like, and whether I've got all the spaces and quotes right.
Not a human task, eh? Even telling the strings apart from the code requires syntax highlighting, quote counting, or guessing.

Here are the equivalent expressions written using string interpolation:
avcmd: #rejoin {"(player)" "(vfile)" --audio-file "(afile)"}
#print "Download OK: '(remote)' [(info/Content-Type), (info/Content-Length) bytes]"
#error "(token1) cannot follow (token2) in the (name) part of (mold .mask)"

Pretty clear, right?

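Another, even more solid reason for this design is translation. Whatever way we choose to translate our software, we have to give whole messages to the translator: a rejoin block splits each message into fragments interleaved with code, while an interpolated string keeps it in one piece.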

For more background you can read the related Format discussion, the preliminary design document and the original sad-emoji dialect design, but I'll try to explain the main points here.

A list of the use cases registered so far can be found here, but those are only mine; maybe Gregg can find his own somewhere. What follows is a description of the design implemented in this PR, the result of my experience with interpolation.

Why a macro?

Since the input is a string, it contains no word! values, so nothing in it gets bound when contexts and functions are created. Once we extract words from the string, we can only bind them to the global context or to an explicitly provided list of contexts. Gabriele expressed the concern very clearly.
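The trap can be reproduced in a few lines. This is a minimal sketch, assuming plain `load` stands in for whatever extracts words from the string; `greet` and the global `name` are made up for the illustration. A word written in a function body is bound to the function's context, while the same word loaded from a string at run time resolves in the global context.

```red
Red [Title: "Sketch: words loaded from a string do not see function locals"]

name: "global"                          ; hypothetical global word with the same spelling

greet: function [name [string!]][
    literal: reduce [name]              ; written in the body: bound to greet's context
    loaded: reduce load "[name]"        ; created from a string at run time: global context
    reduce [first literal first loaded]
]

probe greet "local"                     ; ["local" "global"]
```
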

A macro helps us avoid this trap: it gets expanded and produces words before those words are bound, resulting in code that just works.

We also get a speed benefit: if the program is compiled, all macros are expanded at compile time.

A function-based approach is still provided for advanced users, as it has its merit: a macro cannot work on a message generated at run time (strictly speaking it can, but it has no knowledge of the contexts in which to bind the words).

Syntax

Every paren inside the string becomes code:
#rejoin "(x) + (y) = (x + y)" -> rejoin ["" (x) " + " (y) " = " (x + y)]
Why parentheses? Because, as everywhere else in Red, they visually hint at evaluation.

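To treat a paren literally, escape it by putting a backslash right after the opening paren. Its matching closing paren then stays literal as well, while parens nested inside it are still evaluated: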
#rejoin "total (n) hits (\ratio: (100% * n / total))" -> rejoin ["total " (n) " hits (ratio: " (100% * n / total) ")"]
This relies on the backslash being invalid in Red syntax, so if there is a plan to give \ a meaning, another escape sigil must be found.

The original string type is preserved (the leading "" also serves this purpose):
#rejoin %"file-(n).(ext)" -> rejoin [%"file-" (n) "." (ext)]
#rejoin <x (y)=(z)> -> rejoin [<x > (y) "=" (z)]
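The splitting rules above can be sketched as an ordinary Red function. `expand-template` and `flush-literal` are hypothetical names, not the PR's code. The sketch assumes parens inside the code are balanced and that string literals inside the code contain no unbalanced parens. It uses `transcode`, so the loaded words should end up in the global context, which is precisely the limitation the macro is meant to avoid.

```red
Red [Title: "Sketch: split an interpolation template into a rejoin block"]

; Hypothetical helpers, not the PR's implementation.
flush-literal: function [
    "Move pending literal text from buf into result"
    result [block!] buf [string!] type [datatype!]
][
    either empty? result [
        append result to type copy buf          ; first fragment keeps the template's type
    ][
        unless empty? buf [append result copy buf]
    ]
    clear buf
]

expand-template: function [
    "Convert a template into a block of literal strings and paren! code"
    template [any-string!]
][
    result: make block! 8
    buf: make string! 32                        ; pending literal text
    escaped: 0                                  ; number of open "(\" groups
    pos: template
    while [not tail? pos][
        char: first pos
        case [
            all [char = #"(" #"\" = pick pos 2][ ; "(\" opens a literal paren
                append buf #"("
                escaped: escaped + 1
                pos: skip pos 2
            ]
            char = #"(" [                       ; code paren: scan to its match
                depth: 1
                end: next pos
                while [all [depth > 0 not tail? end]][
                    switch first end [
                        #"(" [depth: depth + 1]
                        #")" [depth: depth - 1]
                    ]
                    end: next end
                ]
                if depth > 0 [do make error! "unbalanced paren in template"]
                flush-literal result buf type? template
                code: to string! copy/part next pos back end
                append/only result as paren! transcode code
                pos: end
            ]
            all [char = #")" escaped > 0][      ; closes a "(\" group literally
                append buf #")"
                escaped: escaped - 1
                pos: next pos
            ]
            true [
                append buf char
                pos: next pos
            ]
        ]
    ]
    flush-literal result buf type? template
    result
]

probe expand-template "(x) + (y) = (x + y)"
; ["" (x) " + " (y) " = " (x + y)]
probe expand-template "total (n) hits (\ratio: (100% * n / total))"
; ["total " (n) " hits (ratio: " (100% * n / total) ")"]
probe expand-template %"file-(n).(ext)"
; [%file- (n) "." (ext)]
```

At the top level, after `x: 1 y: 2`, `rejoin expand-template "(x) + (y) = (x + y)"` yields "1 + 2 = 3". Inside a function with local `x` and `y`, the same call would still look up the global words.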

Extensions

  1. The Format module will also include a #format macro, backward-compatible with #rejoin but with two more features:

    • it will convert (expression as mask) into (format (expression) mask)
    • it will support per-message locale specification:
      #format/in "you've spent (spent as {$0.00}) of (spent + left as {$0.00})" locale
      will be preprocessed into
      #rejoin "you've spent (format/in (spent) {$0.00} locale) of (format/in (spent + left) {$0.00} locale)"
      This will require issue #5009 (Issues in paths are not lexed) to be solved.
  2. It is expected that users will write their own macros for common cases based on #rejoin, e.g.:

    #macro [#log any-string!] func [[manual] s e] [insert remove s [log #rejoin] s]
    
    #log "(now) test message: (1) + (2) = (1 + 2)"		;) `#log` is now synonymous with `log #rejoin`
    #log "(now) test message: (2) * (3) = (2 * 3)"
    
  3. Another extension idea concerns the #error macro. Right now the only fully custom error we have is the User error. I propose adding a fully custom error to every error category, so that we could produce Script, Math and other errors with custom messages. #error would then default to a Script error, with the #error/math, #error/syntax, #error/user, etc. forms changing the error type.

  4. For cases when the template is only known at run time (e.g. report generation with a user-defined template), the rejoin function is extended to accept an any-string! argument with exactly the same syntax as the macro:

    • rejoin <img src=(url) size=(as-pair sizex sizey)> -> <img src=http://../image.png size=100x100>
    • to overcome the binding issue, rejoin/with accepts one or more contexts to which the produced expressions are bound before being reduced
    • as a feature, rejoin/trap allows one to replace evaluation errors with some text (as in Gregg's original design)
    • /with and /trap only apply to string case and have no effect on block argument
  5. REP#112 (WISH: URLs to support parens () for string interpolation) should be considered to bring URL support to this design.
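The idea behind rejoin/with can be illustrated with the standard `bind` native: code produced at run time is bound to each supplied context in turn, then reduced. This is a minimal sketch, not the PR's implementation; `reduce-with`, `response` and `site` are hypothetical names, and the run-time template is given already split into a block so the example stays self-contained.

```red
Red [Title: "Sketch: binding run-time code to explicit contexts"]

; Hypothetical helper, not the PR's rejoin/with.
; Words defined in several contexts end up bound to the last one.
reduce-with: function [code [block!] contexts [block!]][
    foreach ctx contexts [bind code ctx]    ; bind modifies code in place
    reduce code
]

response: object [Content-Type: "image/png"]
site: object [remote: "example.org"]

code: load {["Download OK: " remote " (" Content-Type ")"]}
print rejoin reduce-with code reduce [response site]
; Download OK: example.org (image/png)
```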

P.S. I need some help reducing the docstrings :)

hiiamboris marked this pull request as draft on February 21, 2022 19:13

hiiamboris (Collaborator, Author) commented Jan 25, 2024

How it could in theory work without a macro:

  1. A special string-like lexer syntax that gets transcoded as a block, e.g. `(x) + (y) = (x + y)` -> ["" (x) " + " (y) " = " (x + y)]

    Seems to me a much more complex solution than a macro.

  2. Introduce lexical scoping to the language, and let strings infer it so they can be expanded properly.

    An extremely complex solution with uncertain outcomes. But it may resolve the issue of loops leaking words?

  3. Let the string expansion routine access the stack and automatically bind the result to the contexts it finds there. It would only bind to the contexts of entered functions and of objects being built by make object, ignoring arguments pushed to the stack.

    Seems simple enough, but it contradicts our definitional scoping model, where words are bound when an entity is created: the resulting block would be bound not where the string appears, but where it is expanded. Arguably, for strings this may even be the desired outcome: e.g. we define a set of template strings in one place, then fetch them by word and expand them in different places, automatically binding to different contexts. It is still an inconsistency, though: if a function defined in an object uses such expansion, it will have access to the function's words but not to the object's words, because by that time we have most likely left the make object scope.

    This option would also need a function flag akin to [no-trace], e.g. [no-bind], telling the expand function to skip that function's context. For example, a log wrapper that calls expand would want to hide its own context from being bound to; so would a possible wrapper around that log wrapper, and so on.
