Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use shell environment variables in migrations using Go's text/template parsing #439

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

davidecavestro
Copy link

@davidecavestro davidecavestro commented May 1, 2023

Add support to selectively enable usage of env vars within migrations.
Named env vars are enabled though "env:VAR_NAME" entries within the up/down block headers., reusing the existing parsing logic
The name of activated env vars are then kept with migration options and passed to go templating just before executing relevant migrations.

See discussion #352 and issue #118

Add support to selectively enable usage of env vars within migrations.
Named env vars are enabled though "env-var:VAR_NAME" entries within
the up/down block headers., reusing the existing parsing logic
The name of activated env vars are then kept with migration options and
passed to go templating just before executing relevant migrations.
@amacneil
Copy link
Owner

@dossy @gregwebs for review

Copy link
Collaborator

@dossy dossy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see a few additional test cases that will codify the following behaviors and help set expectations:

  • What does dbmate do if the migration specifies an environment variable (e.g., env:FOO), and that environment variable is unset in the environment at the time dbmate is executed? What should the expected behavior/outcome be? Silently interpolate an empty string? Raise an error? Something else entirely?
  • What does dbmate do if a migration has a template string (e.g., {{ .FOO }}) and env:FOO is not specified in the migration? What should the expected behavior/outcome be? Silently interpolate an empty string? Raise an error? Leave the template fragment as-is? Something else entirely?
  • What does dbmate do if there are characters in the environment variable that should be escaped, e.g., YABBA_DABBA_DOO="this ' is ' a string"? Is the expectation that the user executing dbmate must ensure that all environment variable contents are properly escaped string values for the target database, and otherwise be susceptible to SQL injections and general errors due to improperly escaped strings? Is this a reasonable expectation? Will this behavior violate the principle of least surprise?

Also, please update the README's section on "Migration files" documenting this new functionality.

@davidecavestro
Copy link
Author

davidecavestro commented May 13, 2023

I'd like to see a few additional test cases that will codify the following behaviors and help set expectations:

  • What does dbmate do if the migration specifies an environment variable (e.g., env:FOO), and that environment variable is unset in the environment at the time dbmate is executed? What should the expected behavior/outcome be? Silently interpolate an empty string? Raise an error? Something else entirely?
  • What does dbmate do if a migration has a template string (e.g., {{ .FOO }}) and env:FOO is not specified in the migration? What should the expected behavior/outcome be? Silently interpolate an empty string? Raise an error? Leave the template fragment as-is? Something else entirely?
  • What does dbmate do if there are characters in the environment variable that should be escaped, e.g., YABBA_DABBA_DOO="this ' is ' a string"? Is the expectation that the user executing dbmate must ensure that all environment variable contents are properly escaped string values for the target database, and otherwise be susceptible to SQL injections and general errors due to improperly escaped strings? Is this a reasonable expectation? Will this behavior violate the principle of least surprise?

Also, please update the README's section on "Migration files" documenting this new functionality.

Good points: I'd like some more feedback before proceeding with changes.

For the 1st and 2nd ones I propose to raise an error, i.e. we could simply add .Option("missingkey=error") to the template.
Migration authors can still set default values within the template expression in case they prefer, and we could document it.

About SQL injections we could abandon templates in favor of query parameters/placeholders, passing env var values i.e. tx.Exec(upScript, envVars).

IMHO the downsides of placeholders are:

  • positional binding -> migration authors must pay attention to the relative position of env vars declarations
  • the placeholder syntax varies between db drivers

AFAIK named parameters are only available through additional packages like https://github.com/jmoiron/sqlx, so if the mandate is keeping thing simple, that's not a viable option.

OTOH if we keep both templates and placeholders - leaving the migration author choose between risks of SQL injection and portability/positional declarations - we end up violating both the POLA and KISS principle.

@gregwebs
Copy link

I agree that erroring on unset/unspecified is best.

Perhaps we should think about placeholders more? Then we don't need our own variable feature but instead a feature for working with placeholders.

@davidecavestro
Copy link
Author

davidecavestro commented May 14, 2023

... a feature for working with placeholders.

I'm not sure what you mean.

@gregwebs
Copy link

What's the actual downside of using sqlx?

@gregwebs
Copy link

It seems placeholders may end up being too limited and templates will be needed.
The pgx author himself created a migration system and it has templates.

@gregwebs
Copy link

The migration system should be referencing static values in its template variables rather than user input, so SQL injection would have to be an insider attack that uses a static value that makes it through code review (or an environment variable and a controlled deployed environment)- quite different from what we normally think of for SQL injection.

With respect to escaping, I think we can put the burden on the user that wants to use the template system. Over time we can provide some helper functions for different types of escaping.

I think this PR is good once errors are raised for missing variables.

@davidecavestro
Copy link
Author

davidecavestro commented May 19, 2023

Is it ok if I place the env var refs resolution logic into an internal package, in order to unit test it but still retaining private access?

image

@gregwebs
Copy link

Few things I like hearing more than talk of testing. I also like smaller packages. It's possible other maintainers don't want to introduce a new pattern and can speak up, but I am fine with it.

Fail on referring non existent env vars
Increase test coverage
@davidecavestro davidecavestro requested a review from dossy May 20, 2023 20:14
@davidecavestro
Copy link
Author

Ouch, I forgot to update docs, but I'll do it soon

Beef up docs
Add test case for erroring on missing env var and
js function to mitigate naive sql injection attempts
@davidecavestro
Copy link
Author

So I added some references to the env option within the Migration Options section of the readme.

I also played with naive sql injection attempts and saw they could be mitigated using the js function, hence I mentioned it in the docs and added some explicit tests.

Copy link
Collaborator

@dossy dossy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. Some minor English language corrections requested, but otherwise 👍.

pkg/dbmate/internal/envvars_test.go Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
Fix typos and grammar errors
create role '{{ or (index . "THE_ROLE") "adminuser" }}' login password '{{ .THE_PASSWORD }}';
```

The `js` function may help to mitigate risks of SQL injection:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry if I missed the discussion- how does this mitigate SQL injections risks? Can't I insert a string with a single quote still?

Copy link
Collaborator

@dossy dossy May 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The go/template functions includes js which is described as:

js
	Returns the escaped JavaScript equivalent of the textual
	representation of its arguments.

While this may not necessarily be sufficient to prevent all possible SQL injection attacks (see: Unicode homoglyph injection attacks aka Unicode smuggling [1] [2], character set transcoding SQL injection attacks [3]), it prevents the most trivial kind where an unescaped single quote ' is in a string that will be interpolated into the SQL query, by escaping it with a backslash, thus turning ' into \'.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trivial case of unescaped single quote is covered by TestResolveMitigatingSqlInjection.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice test case!

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment in the main PR thread - I'm not a fan of introducing functions to the template syntax, or encouraging use of js escaping functions for sql escaping.

@dossy
Copy link
Collaborator

dossy commented May 22, 2023

Before merging this, I'm only very slightly concerned that we are technically introducing a backwards-incompatible change, in that if anyone actually has {{ and }} strings in their migrations for some reason, that would previously be interpreted literally as plain text, but now will be parsed and processed with text/template and will likely cause errors.

I suspect the likelihood of anyone having {{ and }} in their migration files already is unlikely, but it may not be zero probability.

@amacneil, how have you handled introducing backwards-incompatible changes into releases? Does the dbmate project follow semantic versioning for its version numbering? If yes, then this probably means needing to bump the version to v3.0.0, and not v2.4.0, which seems a bit much considering the amount of functionality introduced by this change.

Perhaps we should put migration file parsing behind a command line argument, default off, which would allow this change to be fully backwards-compatible? @davidecavestro & @gregwebs, how do you feel about that? I propose --use-env or --enable-templates as the switch to control this.

@dossy dossy changed the title Use system variables in migrations Use shell environment variables in migrations using Go's text/template parsing May 22, 2023
@davidecavestro
Copy link
Author

davidecavestro commented May 22, 2023

Before merging this, I'm only very slightly concerned that we are technically introducing a backwards-incompatible change, in that if anyone actually has {{ and }} strings in their migrations for some reason, that would previously be interpreted literally as plain text, but now will be parsed and processed with text/template and will likely cause errors.

I suspect the likelihood of anyone having {{ and }} in their migration files already is unlikely, but it may not be zero probability.

@amacneil, how have you handled introducing backwards-incompatible changes into releases? Does the dbmate project follow semantic versioning for its version numbering? If yes, then this probably means needing to bump the version to v3.0.0, and not v2.4.0, which seems a bit much considering the amount of functionality introduced by this change.

Perhaps we should put migration file parsing behind a command line argument, default off, which would allow this change to be fully backwards-compatible? @davidecavestro & @gregwebs, how do you feel about that? I propose --use-env or --enable-templates as the switch to control this.

We could also enable templating just when at least one occurrence of env:VAR_NAME has been set into the header.

OTOH if an additional switch is needed, IMHO a self-explanatory option is preferable, so --enable-templates.

@gregwebs
Copy link

We could also enable templating just when at least one occurrence of env:VAR_NAME has been set into the header.

That seems like a great way to do it- it ends up being enabled automatically when needed on a per-migration basis- so it would be fully backwards compatible.

@dossy
Copy link
Collaborator

dossy commented May 23, 2023

I'm indifferent to the two options. My concern is guaranteeing backwards-compatibility in all cases, which both options, (a) only enable template parsing when a migration defines an env variable for interpolation, or (b) only enable template parsing when a command line argument is specified, satisfy.

I would vote in favor of enabling Go template parsing with --enable-templates over auto-enabling it only when an env variable is specified, because there are other potentially useful ways that writing migration scripts using Go templating aside from interpolating environment variables, so it would be nice to have a way to enable it without having to define a placeholder env var in the migration script.

While I personally dislike migration scripts that are not plain text with literal values because it potentially introduces issues with repeatability (a future execution of the unmodified migration script can result in a different migration), I understand others do not try to abide by this constraint and a template-driven migration script might be really useful.

Introducing Go template parsing in general opens the door for a wide range of functionality we could make available to migration script writers aside from only interpolating environment variables.

We could even go one step further and remove the entire env:var mechanism per-migration but keep the --enable-templates functionality, and instead expose os.Getenv and os.ExpandEnv to text/template using template.Funcs and a template.FuncMap, like Hugo or gomplate does.

Users could then use {{ os.Getenv "ENV_VAR_NAME" "defaultValue" }} in their migration script. This would mean that migration script writers would have unrestricted to any environment variable, which may be an issue. Hugo implements a Security Policy that has a default filter on what environment variables can be accessed by default. If necessary, we could do something similar, and the filter could have a reasonable default value and overridable with a command-line argument like --os-env-regexp or something.

Something to consider when deciding between enabling templating if and only if env:var is specified on a migration, or when --enable-templates is specified as a command-line argument.

@gregwebs
Copy link

If you want reproducibility, the current system has it- it just needs to record the values of the variables used.

We can do auto-enabling now and then still add the flag in the future if we support reading in template values that are not pre-declared.

@amacneil
Copy link
Owner

@amacneil, how have you handled introducing backwards-incompatible changes into releases? Does the dbmate project follow semantic versioning for its version numbering?

Yes, following semver. In general I try not to make backwards incompatible changes to the CLI/migration spec.

If yes, then this probably means needing to bump the version to v3.0.0, and not v2.4.0, which seems a bit much considering the amount of functionality introduced by this change.

Agreed, better if we can do it in a backwards compatible way.

Introducing Go template parsing in general opens the door for a wide range of functionality we could make available to migration script writers aside from only interpolating environment variables.

We could even go one step further and remove the entire env:var mechanism per-migration but keep the --enable-templates functionality, and instead expose os.Getenv and os.ExpandEnv to text/template using template.Funcs and a template.FuncMap, like Hugo or gomplate does.

My initial reaction is that this goes against dbmate's philosophy of keeping things simple. Now we are introducing a whole templating language (and one that will be completely unfamiliar to non-Go developers). I've stated in the past that I think env vars in migrations are not a great idea, but if we are going to support it, then I think we should strictly limit this to env vars and not support any other functions.

We could also enable templating just when at least one occurrence of env:VAR_NAME has been set into the header.

Agreed, this seems sufficient to ensure backwards compatibility (existing migrations will not break, future migrations that use this feature will need to be aware of it.

I also played with naive sql injection attempts and saw they could be mitigated using the js function, hence I mentioned it in the docs and added some explicit tests.

I disagree with adding the js function, because again it is something new for the user to learn, and javascript escaping is not sql escaping so it won't work in all cases. Doing a half assed job of sql escaping is worse than not doing it at all. And we cannot easily provide a sql escaping function because it is different for each database and depending on what type of escaping you need (string vs identifier, etc).

Here is an alternative solution to address the problem of sql injection:

-- migrate:up unsafe-env:ROLE env:PASSWORD
CREATE USER {{.ROLE}} WITH PASSWORD {{.PASSWORD}};
  • Keep things simple: no fancy functions for the user to learn, only variables are available
  • env:FOO passes the variable through database-specific string escaping
  • unsafe-env:FOO means I know what I am doing and disable string escaping for that variable (e.g. you need it to create an identifier in postgres). Naming convention is inspired by web standards, e.g. CSP's unsafe-eval and React's dangerouslySetInnerHTML
  • if you specify the same variable twice or otherwise provide an invalid --migrate directive we should abort with an error

@amacneil
Copy link
Owner

The migration system should be referencing static values in its template variables rather than user input, so SQL injection would have to be an insider attack that uses a static value that makes it through code review (or an environment variable and a controlled deployed environment)- quite different from what we normally think of for SQL injection.

With respect to escaping, I think we can put the burden on the user that wants to use the template system. Over time we can provide some helper functions for different types of escaping.

I do agree with this, and that would be a perfectly reasonable stance. However, the only real use case I have heard articulated for this feature seems to revolve around not storing passwords in your migration files and injecting them at runtime. In that case, it would be nice if your long secure passwords don't accidentally trip up migrations due to accidental sql injection (even if it will not be exploited). So I think doing my automatic string escaping proposal above would be worthwhile, as long as we give people an escape hatch.

@gregwebs
Copy link

Another use case that I saw was to use a different schema name for a different environment. So rather than inserting a string, it would be inserting a database name.

@gregwebs
Copy link

I have seen other migration systems use templating for some thing, like adding updated at and created at columns to a table with the trigger. There is a significant amount of boiler plate around attaching that to all the tables. However, it would probably be better to do such a thing in a pre-phase that still generates a static SQL output.

@gregwebs
Copy link

We could expose a new command "dbmate template" for those that want to do ad hoc templating. It would be used as a pre phase- take a file as an input and output, a file.

Then, for this environment, variable approach we could only support placeholders values such as strings.

@davidecavestro
Copy link
Author

ok guys, it's plenty of viable options here, and it's certainly up to you maintainers driving the change with proper vision.

I drafted this PR with the aim to keep things really simple, avoiding to tackle with SQL injection for many reasons.
Then during reviews I added an example of standard js function usage along with a specific test case.
I can easily remove them to avoid confusion.

That said, I totally understand if you prefer to postpone and or/restart designing the whole thing from scratch to get better features.

In other words I'm not in a hurry to have this feature: I contributed this PR just to give back and keep hands on golang.
Just my two cents.

@gregwebs
Copy link

Thanks for the understanding. It is obvious we are lacking in a process to agree on how new features should behave before they start being developed.

I guess one question that is being asked now is- what does the implementation look like to focus on the password insert case and use a placeholder for a string? Are there inherit difficulties or is there a straight-forward approach.

As stated, I think even if we go that route we can still expose full templating for those that want it.

@dossy
Copy link
Collaborator

dossy commented May 25, 2023

My initial reaction is that this goes against dbmate's philosophy of keeping things simple. Now we are introducing a whole templating language (and one that will be completely unfamiliar to non-Go developers).

The upside is that if we ensure backwards compatibility, and enable the templating functionality if and only if the user opts-in to using it by explicitly activating it, either through a flag in a migration script or a command line arg, whatever path we decide to take, then it becomes completely optional.

I've stated in the past that I think env vars in migrations are not a great idea, but if we are going to support it, then I think we should strictly limit this to env vars and not support any other functions.

Personally, I agree with you, that anything that modifies the migration before it's executed isn't a great idea.

In my opinion, one of dbmate's strengths is its forced simplicity, which makes its execution behavior easier to reason about and predict exactly what the outcome will be, and there's incredible value in that. It's this characteristic that drove my decision to use dbmate over alternate options.

I disagree with adding the js function, [...]

Unfortunately, if we use Go's text/template, then js along with a handful of other built-in functions will be available, included in the global function map. We aren't exactly "adding" these functions, they're mandatory built-ins that are part of text/template.

It's a shame that there's no easy way to disable the built-in functions, without forking the package and removing these lines of code:

https://cs.opensource.google/go/go/+/refs/tags/go1.20.4:src/text/template/funcs.go;l=149-151;drc=29604312784cfbf530fcf54837b7cf42c0500d0b

Oh, I just had an idea after looking at that code: maybe it's possible to define a custom function map with function names that shadow the built-in names, and make the implementation of the custom functions raise errors prevent them from being used.

Not sure the juice is worth the squeeze, though.

[...] because again it is something new for the user to learn, and javascript escaping is not sql escaping so it won't work in all cases. Doing a half assed job of sql escaping is worse than not doing it at all. And we cannot easily provide a sql escaping function because it is different for each database and depending on what type of escaping you need (string vs identifier, etc).

If dbmate.Driver had QuoteIdentifier in its interface, we could possibly even implement a custom function and add it to the parser, which we would have to delay template parsing until Migration.Parse time and pass along the database Driver to the parser, and ... and ... probably not worth the complexity it adds.

  • env:FOO passes the variable through database-specific string escaping

Hm. This would require the change I just described above, making a database-specific quoting function part of the database Driver public interface, and passing along the driver object to the migration parser ...

  • unsafe-env:FOO means I know what I am doing and disable string escaping for that variable (e.g. you need it to create an identifier in postgres). Naming convention is inspired by web standards, e.g. CSP's unsafe-eval and React's dangerouslySetInnerHTML

Maybe for now we only support the unsafe-env functionality, which makes it explicitly clear that WYSIWYG and use at your own risk, and so on?

Should we proceed with changing env to unsafe-env as the way forward for this PR, and update the tests and README accordingly?

@gregwebs
Copy link

I don't think there's anything inherently unsafe here- again this isn't dealing with arbitrary user input- it's variables values are known ahead of time, except for passwords. Randomly generated passwords aren't really a threat- this would have to be a user supplied password. That just needs to be given as a string. It might make more sense to use terms like escaped, quoted, raw, text, etc and make a note in the docs about passwords.

@dossy
Copy link
Collaborator

dossy commented May 26, 2023

I don't think there's anything inherently unsafe here [...]

In this context, "unsafe" communicates that you, the user, are wholly and entirely responsible for escaping the input string as necessary, that passing through values should be considered an unsafe operation if you would have otherwise expected your strings to be escaped and quoted for you. Essentially, "unsafe" is a warning signal to the operator that there are NO protections offered here: you are wielding a footgun. You are responsible for how you aim it. Do not blame anyone but yourself if the string you pass in causes bad things to happen because you failed to escape it properly.

@amacneil
Copy link
Owner

If we can't disable the Go template helpers, then how about we just do our own templating using ${} syntax? It doesn't need to be complicated.

E.g. this would work fine:

-- migrate:up unsafe-env:ROLE env:PASSWORD
CREATE USER ${ROLE} WITH PASSWORD ${PASSWORD};

While we're here, it should probably be an error to run a migration that specifies an environment variable if that environment variable is unset (empty is ok, unset should be an error).

@DevopsMercenary
Copy link

I found a nuance / issue that I wanted to let you know about.

In a db migration, if I want to use variables I follow this syntax....

-- migrate:up env:DBNAME env:OWNER

Therefore I would assume that for the down the syntax would be the same

-- migrate:down env:DBNAME env:OWNER

But I get an error...

Error: dbmate requires each migration to define a down block with '-- migrate:down'

BUT I did figure out that the following syntax works...

-- migrate:down env:DBNAME,OWNER

If you need any more details please let me know. I have the latest from your branch and I have a current version of GoLang

[greg@CM-LT-0007 dbmate (env-vars)]$ git status
On branch env-vars
Your branch is up to date with 'origin/env-vars'.
nothing to commit, working tree clean

go version go1.20.6 darwin/arm64

@o-lee
Copy link

o-lee commented Jan 17, 2024

Will this PR be released soon? Thanks.

@dossy
Copy link
Collaborator

dossy commented Jan 17, 2024

Will this PR be released soon? Thanks.

@o-lee There's no expected release date for this PR.

At a minimum, I think this needs to be implemented before this is safe to merge:

We could also enable templating just when at least one occurrence of env:VAR_NAME has been set into the header.

Agreed, this seems sufficient to ensure backwards compatibility (existing migrations will not break, future migrations that use this feature will need to be aware of it.

Or, alternatively, this:

Perhaps we should put migration file parsing behind a command line argument, default off, which would allow this change to be fully backwards-compatible? @davidecavestro & @gregwebs, how do you feel about that? I propose --use-env or --enable-templates as the switch to control this.

And, someone needs to investigate @DevopsMercenary's issue and determine what's going on.

We need someone to work on these.

@dossy dossy self-requested a review January 17, 2024 16:25
Copy link
Collaborator

@dossy dossy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on @DevopsMercenary's comment, it's clear that we need more test cases. Our current tests only cover migrate:down with only one variable. At a minimum, we need a test with one, and a test with more than one.

@o-lee
Copy link

o-lee commented Jan 18, 2024

@dossy, thanks for the update.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants