Skip to content
This repository has been archived by the owner on Oct 22, 2023. It is now read-only.

RFC: new expression syntax #990

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

charleskorn
Copy link
Collaborator

@charleskorn charleskorn commented Aug 29, 2021

This is a request for comment and feedback on the idea in proposal.md.

The RFC process is intended to allow anyone with an interest in Batect to get involved and help shape it to suit their needs, so please don't be shy! Any comments, suggestions, questions or constructive criticism are more than welcome.

Please comment directly on the file itself or add general comments in this PR.

@codecov
Copy link

codecov bot commented Aug 29, 2021

Codecov Report

Merging #990 (4b741ea) into main (f936697) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##               main     #990   +/-   ##
=========================================
  Coverage     83.86%   83.86%           
  Complexity     2686     2686           
=========================================
  Files           353      353           
  Lines         11773    11773           
  Branches       1412     1412           
=========================================
  Hits           9874     9874           
  Misses         1598     1598           
  Partials        301      301           
Flag Coverage Δ
linux 82.17% <ø> (ø)
mac 82.24% <ø> (ø)
windows 82.88% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.


Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f936697...4b741ea. Read the comment docs.

@binkley
Copy link
Contributor

binkley commented Aug 29, 2021

Can you share some concrete use case projects?
My first instinct is YAGNI, but I trust this RFC solves some real problems. I'd like to see those (and find out how I could improve my own build via this RFC).

@charleskorn
Copy link
Collaborator Author

@binkley:

Can you share some concrete use case projects?
My first instinct is YAGNI, but I trust this RFC solves some real problems. I'd like to see those (and find out how I could improve my own build via this RFC).

The primary use case for this is to enable more configurability for bundles. Two particular examples come to mind:

Example 1: imagine a bundle that provides an opinionated build environment for Golang, and defaults to a known stable version of the Golang toolchain. As a consumer of that bundle, I might want to be able to use an older or newer version of Golang, but retain all the other opinionated bits and pieces that are included in the bundle. At the moment, the only way to achieve this would be through build args to build an image with the correct version of Golang, but with the addition of being able to use expressions (almost) everywhere, this could just change the image used for the build environment. (Imagine a container with image: golang:${{ var.golang_version }}.)

Example 2: imagine a bundle that provides a linting tool, and we only want to allow consumers of the bundle to tweak a limited number of configuration options for the tool. With the current functionality, the only way this could be done is either: have some kind of script that wraps the tool and reads those options from a configuration file in a hardcoded location in the consuming project, or provide config variables that are then passed through to the tool via environment variables. For example:

Bundle:

containers:
  tool:
    command: sh -c "tool --scan '$SCAN_DIRECTORY'"
    environment:
      SCAN_DIRECTORY: <{scan_directory}

Consuming project's batect.local.yml:

scan_directory: scripts

This proposal, in conjunction with #991, would allow something like this:

Bundle:

containers:
  tool:
    command: tool --scan '${{ var.scan_directory }}'

(note that sh is no longer required because we're no longer trying to get environment variables interpolated into a command line)

Consuming project's configuration file:

bundles:
  - type: git
    repo: ...
    config:
      scan_directory: scripts

Does that make sense? Open to other ideas and suggestions.

@Tzrlk
Copy link

Tzrlk commented Sep 3, 2021

For the implementation side, do you have an idea what you'll use an an expression evaluator? There are a few that could be dropped-in such as JEXL, though finding one that doesn't add too much functionality is tricky.

I'd prefer a simpler internal syntax, such as with envsubst, where the only operations available (broadly speaking) are length, casing, slices, trimming, defaults, and replacement. Anything beyond that starts to stray into logic and scripting.

Not sure if it's helpful, but there's a neat interpolation library that I found while looking for a java envsubst implementation: https://github.com/lantunes/interpolatd

@charleskorn
Copy link
Collaborator Author

@Tzrlk:

For the implementation side, do you have an idea what you'll use an an expression evaluator?

My initial plan was to extend the existing expression parsing and evaluation logic that already exists in Batect. I'm hesitant to take on any dependencies, as they will make it harder to move to Kotlin/Native (and away from the JVM), which is something I'd like to do sooner rather than later. I'm also cautious that many libraries bring in many other features that aren't needed, which might lead to unexpected behaviour, security or stability issues or difficulty replacing the library with something else in the future.

I'd prefer a simpler internal syntax, such as with envsubst, where the only operations available (broadly speaking) are length, casing, slices, trimming, defaults, and replacement. Anything beyond that starts to stray into logic and scripting.

+1, I definitely don't want to encourage / permit people to attempt to use YAML as a programming language. I think the examples you've given there are good examples of the kinds of things that I'd be OK with adding in the future. Anything beyond that belongs as a script or application inside containers.

@Tzrlk
Copy link

Tzrlk commented Sep 5, 2021

Legit. I guess the question then becomes whether to use a syntax as terse as envsubst, or use actual words for the functions. I think I lean on the side of words, so as to promote readability. I'd want to keep the functions short though, or use reasonably well-known shorthand to avoid lengthy expressions.

The gotpl pipeline syntax used in Helm templating is a reasonable example. There's a very limited list of functions, and the output of one function can be piped into another to have one fluid expression, though parentheses can be used to have sub-expressions, e.g. "{{ .myinput | trim | default ( .myotherinput | b64enc ) }}". The downside is that it needs tons of whitespace to be readable, where an equally fluid dot syntax could do the trick just fine: myinput.trim.default(myotherinput.b64enc).

Having symbolic shorthands for things might be nice, though it would complicate the parser more than necessary, as you'd probably need to implement prefix and postfix notation handling.

I hope my rambling is helpful.

@Tzrlk
Copy link

Tzrlk commented Sep 5, 2021

Oh, and for another concrete use-case: how to execute the same set of tasks on different submodules of the same project.

For instance:

/
|- modules/
| |- module-1/
| | \- <files>
| \- module-2/
|   \- <files>
|- batect
|- batect.cmd
\- batect.yml

Given this project, how do I run the exact same tasks (build, test, package) on both modules without explicitly defining each task multiple times (build -> [ build:module-1, build:module-2 ])? With this, I can write the build, test, and package commands once, and parameterise the working_directory to modules/${{ submodule }}.

@charleskorn
Copy link
Collaborator Author

@Tzrlk

I hope my rambling is helpful.

Definitely helpful.

I was leaning more towards the myinput.trim.default(myotherinput.b64enc) style of syntax that you suggested - at least in my experience, I've found the chaining style (eg. .myinput | trim | default ( .myotherinput | b64enc )) to be difficult to follow, especially when subexpressions are required or a single function takes arguments. Open to other opinions though.

@charleskorn
Copy link
Collaborator Author

charleskorn commented Nov 2, 2021

📢 Update on implementing this RFC

I realise there's been radio silence on this RFC since I posted it and so I wanted to give an update on what's going on.

The short version is that while I'm very, very keen to implement this, I'm currently prioritising some other work to improve the experience of working on Batect itself.

Batect has grown quite a bit since it first began a few years ago, and unfortunately that is starting to show - builds take longer to run than I'd like (both locally and on CI), flaky tests are going unfixed and I'm finding more and more of my precious time diverted away to fixing things that are invisible to most users, like edge cases in the Docker client. All of these things make it less pleasant to work on Batect, and make it much harder for others to contribute as well.

I have two major things on my backlog to address these issues:

  • The biggest contributor to the slow builds and flaky tests are the journey tests, where Batect is started from the command line and used to verify a variety of scenarios. This group of tests has grown to the point where they take up to ten minutes to run as part of the CI build.

    I'm going to spend some time trimming back the test scenarios here, either removing them altogether or covering the functionality with other kinds of tests, and also looking into other ways to improve their performance.

  • The second biggest contributor to this is the Docker API client. It comprises nearly 40% of the production code in the codebase and has a similarly large proportion of the unit tests. Not only does the API client carry a significant maintenance burden, it consumes a significant amount of time in CI, it is often the source of bugs and instability, and having to implement new Docker client features myself means that there is always a lag between new features being added to Docker and being able to provide them in Batect.

    I'm currently working on a new Docker client which is currently very much in the experimental stage but is already showing promise:

    • Separating the client out of the main project means it does not need to be rebuilt and retested on every commit to the main project, saving a significant amount of time
    • Embedding the official Docker Golang client as a library inside the client (rather than implementing these features myself) means adding support for new features is far simpler and faster, and also means that the behaviour of Batect should more closely match that of the docker CLI, which in turn makes supporting alternatives to Docker for Desktop easier as well
    • This new approach also removes a significant impediment to migrating to Kotlin/Native, which would remove the need for users to install a JVM to use Batect

Once I've finished addressing these issues, I will come back to this feature as a priority.

@charleskorn
Copy link
Collaborator Author

📢 Update

Apologies for the continuing radio silence on this RFC.

Replacing the Docker client in Batect has taken much longer than I planned / hoped / expected, however, I can see the light at the end of the tunnel.

As soon as this is complete, the features described here are my next priority.

@sschuberth
Copy link
Contributor

It would be nice if a new expression syntax would also allow some simple kind of "mapping". I'm facing a situation where I'd like to translate a boolean config variable to either the name of a command line option (that I pass on to a command), or an empty string (to not use that option). Currently, I'm struggling to do that.

sschuberth added a commit to oss-review-toolkit/ort that referenced this pull request Sep 27, 2022
Unfortunately, Batect's expressions [1] are not yet powerful enough [2]
to elegantly map booleans to strings, so a rather lengthy `sed`
construct has to be used. Gradle build scans are disabled by default. In
order to enable them, run `./batect --config-var gradle_build_scan=true`.

[1]: https://batect.dev/docs/reference/config/expressions
[2]: batect/batect#990

Signed-off-by: Sebastian Schuberth <sschuberth@gmail.com>
sschuberth added a commit to oss-review-toolkit/ort that referenced this pull request Sep 27, 2022
Unfortunately, Batect's expressions [1] are not yet powerful enough [2]
to elegantly map booleans to strings, so a rather lengthy `sed`
construct has to be used. Gradle build scans are disabled by default. In
order to enable them, run `./batect --config-var gradle_build_scan=true`.

[1]: https://batect.dev/docs/reference/config/expressions
[2]: batect/batect#990

Signed-off-by: Sebastian Schuberth <sschuberth@gmail.com>
sschuberth added a commit to oss-review-toolkit/ort that referenced this pull request Sep 27, 2022
Unfortunately, Batect's expressions [1] are not yet powerful enough [2]
to elegantly map booleans to strings, so a rather lengthy `sed`
construct has to be used. Gradle build scans are disabled by default. In
order to enable them, run `./batect --config-var gradle_build_scan=true`.

[1]: https://batect.dev/docs/reference/config/expressions
[2]: batect/batect#990

Signed-off-by: Sebastian Schuberth <sschuberth@gmail.com>
@hqtoan94
Copy link

Just wonder is there any new updates on this?

@binkley
Copy link
Contributor

binkley commented Jun 30, 2023

I've rereviewed the proposal, and see good use case examples.

Observations

  • Happy to see expressions become general across all parts of my build file -- special cases are always a road bump for those learning a new feature or facility
  • A little sad to see the shell-like syntax go away (example: ${NAME:-default value} because it was immediately familiar to me; however, for those without shell experience the new explicit syntax will be of big help
  • Quotes work! ❤️
  • I see the reasoning behind double curlies (${{ NAME }}). I don't care for the extra typing, but it seems the most sensible way to disabiguate
  • The existing <NAME syntax is bletcherous: is this reading from a file? piping from a variable? It clashes with the existing shell syntax features. Great improvement with the new syntax!
  • Glad to see an explicit deprecation policy described

Suggestions

Provide a BNF specification in the proposal doc. For example, this would make explicit that padding whitespace around the expression is OK inside of the double curlies.

Questions

Do any current OS's support whitespace in environment variables? The "C" API for Linux/UNIX let one craft a main entry point to a program that sets up environment variables:

int main(int argc, char **argv, char **env);

(See https://en.wikipedia.org/wiki/Entry_point#C_and_C.2B.2B)

This is a sad path example if I pass an array as env that includes a key like "BOB'S UNCLE", so that Batect would inherit this from the user's program or shell.

Maybe this isn't worth considering. 🤔

@charleskorn
Copy link
Collaborator Author

Just wonder is there any new updates on this?

Unfortunately I don't have any news on this - I have far less time to work on Batect these days than I used to, and keeping the lights on is the highest priority over everything else, including this.

If you're really keen to see this, I'm always open to PRs 😄

@charleskorn
Copy link
Collaborator Author

Thanks for all the feedback @binkley! Will definitely consider this when I get time to work on it.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
frozen Issue should not be marked as stale is:request for comment
Development

Successfully merging this pull request may close these issues.

None yet

5 participants