Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate uploads settings to the db-level behind the scenes, so the uploads DB can be set by the config file #42869

Merged
merged 41 commits into from
May 27, 2024

Conversation

calherries
Copy link
Contributor

@calherries calherries commented May 17, 2024

This is necessary for the Cloud team to set the uploads DB using the config file without knowing the database ID in advance. Context behind the request is here.

BE changes:

This PR includes a breaking change to the way uploads settings are configured by the config file. Instead of being on the global level, they can be configured on a database level.

Before, you could set the uploads-enabled, uploads-table-prefix, and uploads-schema-name settings globally like this:

config:
  settings:
    - uploads-enabled: true
    - uploads-database-id: 2
    - uploads-schema-name: uploads
    - uploads-table-prefix: uploads_
  databases:
    - name: test-data
      engine: postgres
      ...

but you would have to know the ID of the database you wanted to be the uploads database. That should not be possible until after the Metabase instance has already been set up (without depend on the auto-incrementing behaviour of database IDs, and the databases being created in a certain order, but that's unstable and users shouldn't have to depend on it).

Instead, all upload settings can be set on the database level:

config:
  settings:
    - ...
  databases:
    - name: test-data
      engine: postgres
      ...
      uploads_enabled: true
      uploads_schema_name: uploads
      uploads_table_prefix: uploads_

Since there can only be one uploads database, if there is more than one database with uploads_enabled: true in the config file, then only the last database in the config file will actually have uploads_enabled=true in the application database.

They way this works is by calling this function inside the before-insert and before-update toucan hooks for a database:

(defn- handle-uploads-enabled!
"This function maintains the invariant that only one database can have uploads_enabled=true."
[db]
(when (:uploads_enabled db)
(t2/update! :model/Database :uploads_enabled true {:uploads_enabled false :uploads_table_prefix nil :uploads_schema_name nil}))
db)

FE changes:

The FE changes are just a refactoring, there should be no observable change in behaviour. See this file to understand how the data returned by the API endpoints have changed. The rest of the FE changes are simply to accomodate this change in schema. Having everything under one setting is needed to avoid multiple relatively expensive executions of (t2/select-one :model/Database ...) whenever the uploads settings are queried. There's no equivalent for the uploads-enabled setting because it's redundant.

Breaking changes to settings:

The settings uploads-enabled, uploads-database-id, uploads-schema-name, and uploads-table-prefix no longer exist. These were never documented, so I've removed them without a deprecation cycle. Instead they have been replaced with one setting: uploads-settings. It contains the same information as the previous settings but in one map.

e.g.

(uploads-settings)
=> {:db_id 1, :schema_name "PUBLIC", :table_prefix nil}

uploads-settings is really just a convenient proxy for the data that is actually stored in database columns. See the getter and setter:

:getter (fn []
(let [db (t2/select-one :model/Database :uploads_enabled true)]
{:db_id (:id db)
:schema_name (:uploads_schema_name db)
:table_prefix (:uploads_table_prefix db)}))
:setter (fn [{:keys [db_id schema_name table_prefix]}]
(cond
(nil? db_id)
(t2/update! :model/Database :uploads_enabled true {:uploads_enabled false
:uploads_schema_name nil
:uploads_table_prefix nil})
(or (not-handling-api-request?)
(mi/can-write? :model/Database db_id))
(t2/update! :model/Database db_id {:uploads_enabled true
:uploads_schema_name schema_name
:uploads_table_prefix table_prefix})
:else
(api/throw-403))))

@calherries calherries requested a review from camsaul as a code owner May 17, 2024 22:19
@metabase-bot metabase-bot bot added the .Team/BackendComponents also known as BEC label May 17, 2024
@calherries calherries changed the title Migrate uploads settings to the db-level Migrate uploads settings to the db-level behind the scenes, so it can be set by config file May 18, 2024
@calherries calherries changed the title Migrate uploads settings to the db-level behind the scenes, so it can be set by config file Migrate uploads settings to the db-level behind the scenes, so the uploads DB can be set by the config file May 18, 2024
Copy link

github-actions bot commented May 18, 2024

Codenotify: Notifying subscribers in CODENOTIFY files for diff ba7eae7...3b9584f.

Notify File(s)
@ranquild frontend/src/metabase/status/components/FileUploadStatus/FileUploadStatus.unit.spec.tsx

@calherries calherries added the backport Automatically create PR on current release branch on merge label May 21, 2024
Copy link

replay-io bot commented May 21, 2024

Status Complete ↗︎
Commit 3b9584f
Results
2571 Passed

@calherries calherries requested a review from a team May 23, 2024 14:16
Copy link
Contributor

@qnkhuat qnkhuat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll need to rewrite the backfill migrations using custom migrations because setting.value could be encrypted.

@calherries calherries requested review from a team and qnkhuat May 23, 2024 23:55
:uploads_table_prefix uploads-table-prefix
:uploads_schema_name uploads-schema-name}
:where [:= :id db-id]})))
(when-let [db (t2/query-one {:select [:*], :from :metabase_database, :where :uploads_enabled})]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you think about cases where we have more than one upload db here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There cannot be more than one DB with uploads_enabled. This invariant is maintained by the toucan hooks

(defn- handle-uploads-enabled!
"This function maintains the invariant that only one database can have uploads_enabled=true."
[db]
(when (:uploads_enabled db)
(t2/update! :model/Database :uploads_enabled true {:uploads_enabled false :uploads_table_prefix nil :uploads_schema_name nil}))
db)

:where [:= :id db-id]})))
(when-let [db (t2/query-one {:select [:*], :from :metabase_database, :where :uploads_enabled})]
(let [settings [{:key "uploads-database-id", :value (encryption/maybe-encrypt (str (:id db)))}
{:key "uploads-enabled", :value (encryption/maybe-encrypt (str "true"))}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{:key "uploads-enabled", :value (encryption/maybe-encrypt (str "true"))}
{:key "uploads-enabled", :value (encryption/maybe-encrypt "true")}

it's already a string

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true that

;; These settings were removed in v50 and will be removed in a future release. They are kept here for backwards compatibility
;; to avoid breaking existing installations that may have set these values with environment variables, or via the config file.

(defsetting uploads-enabled
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not fully understand why we still need these here. What's wrong if we remove them now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Without these, if you start MB with these settings set via the config file, Metabase will fail to launch.
  2. Without these, if you start MB with these settings set via env vars, it might be confusing because you wouldn't see why they weren't working anymore, even though you expected them to work (although that's much less important than 1).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The requirements for this was agreed with Vamsi here

:table_prefix (:uploads_table_prefix db)}))
:setter (fn [{:keys [db_id schema_name table_prefix]}]
(cond
(nil? db_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we check write perm in this case too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, that's a great question. We currently don't check any write perms when disabling uploads, but maybe we should. We probably didn't realise this before because uploads weren't a database-level setting. It's a very niche problem though: settings managers can disable uploads for databases that they don't have write permissions for. I'll make a fix and test it works but I wouldn't block this PR on it

@@ -316,8 +321,8 @@
(= :unrestricted (data-perms/full-db-permission-for-user api/*current-user-id*
:perms/view-data
(u/the-id db)))
;; previously this required `unrestricted` data access, i.e. not `no-self-service`, which corresponds to *both*
;; (at least) `:query-builder` plus unrestricted view-data
;; previously this required `unrestricted` data access, i.e. not `no-self-service`, which corresponds to *both*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was correct before

Copy link
Contributor

@iethree iethree left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Frontend LGTM! 🎈

@@ -217,7 +217,7 @@ class MetabaseSettings {
* @deprecated use getSetting(state, "anon-tracking-enabled")
*/
uploadsEnabled() {
return !!(this.get("uploads-enabled") && this.get("uploads-database-id"));
return !!this.get("uploads-settings")?.db_id;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep! 🗑️

Copy link
Contributor

@crisptrutski crisptrutski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate the care with which this was done, and the test clean ups are quite nice too.

There's a part of me that's unconvinced with the necessity for the schema change, as the new schema doesn't prevent against invalid states, and peppering the code assumptions that rely on a toucan hook is a bit fragile.

I'm curious whether you explored the idea of rather intercepting the reading of this config to update the settings instead, to me this seems like it would be simpler and more robust. In the worst case we could probably strip off and re-route these "database options" in a hook as well, although I would prefer doing it in a more localized spot that doesn't run again after start up. This would also give us a chance to validate that there is no redundancy across config files, rather than silent order dependent clobbering.

I don't feel strongly enough to block this change, especially since it half opens a door to support multiple upload databases in future, which seems plausibly useful. Perhaps you even considered an approach like this, in which case I'd be curious to hear the reasons for rejecting it.

`(do-with-uploads-enabled (fn [] ~@body)))

(defn do-with-uploads-disabled
"Set uploads_enabled to true the current database, and as an admin user, run the thunk"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Set uploads_enabled to true the current database, and as an admin user, run the thunk"
"Set uploads_enabled to false for the current database, and as an admin user, run the thunk"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 3b9584f

[thunk]
(mt/with-discard-model-updates [:model/Database]
(t2/update! :model/Database :uploads_enabled true {:uploads_enabled false})
(t2/update! :model/Database (mt/id) {:uploads_enabled false})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems redundant with the previous line, guessing just copy-pasta

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 3b9584f

@@ -3457,9 +3439,10 @@
This function exists to deduplicate test logic for all API endpoints that must return `based_on_upload`,
including GET /api/collection/:id/items and GET /api/card/:id"
[request]
(mt/with-driver :h2 ; just test on H2 because failure should be independent of drivers
(mt/with-temporary-setting-values [uploads-enabled true]
(mt/test-driver :h2 ; just test on H2 because failure should be independent of drivers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside, it would be better to use :mb/once IMHO

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 3b9584f

@@ -328,7 +333,8 @@
(and (some? schema-name)
(not (driver.s/include-schema? db schema-name)))
(ex-info (tru "The schema {0} is not syncable." schema-name)
{:status-code 422}))))
{:status-code 422}))
(can-use-uploads-error db)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing the slowest stuff last 🙌

:export? false
:type :boolean
:default false
:getter (fn [] (log/warn "'uploads-enabled' has been removed; use 'uploads_enabled' on the database instead"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the rationale for returning nil rather than falling back to getting the answer from uploads-settings?

I guess one concern is that we could re-introduce code that depends on this setting, unless we had a linter or context sensitive exception to prevent that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're just talking about the getter, there's little benefit from this because we don't promise backwards
compatibility of the API and we generally don't have a problem breaking things like this that are unlikely to be
depended upon. I'd rather break the API and make it impossible for FE code to be written that depends on these settings.

If you're talking about the setter, then that's more interesting. Unfortunately there's no way to be perfectly backwards
compatible with the new design without complicating the way env vars or config file data translates into settings.

Before both uploads-enabled: true AND uploads-database-id: <some ID> were needed to enable uploads for a database. Now we
only have one boolean field, metabase_database.uploads_enabled to enable uploads for a database. So there's no way to have two
independent setters and preserve perfect backwards compatibility of behaviour.

Given this complication I'd rather just migrate the settings, and show a warning and have the admin for the metabase instance change the config
file / env vars.

;;; Deprecated uploads settings begin
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;; These settings were removed in v50 and will be removed in a future release. They are kept here for backwards compatibility
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a mechanism to remind us to do the removal, and is there a policy on which version this will happen in?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. There isn't a mechanism for deprecating and removing settings currently but I've decided to add deprecated metadata to the settings in this commit and so we'll default to remove these in 53 following the standard deprecation cycle for driver changes (Cam does this manually). I also asked here to change the behaviour of the config file loading so that removing these shouldn't be such a drama. If we do that we can happily remove these settings by 53 and all we'll lose is the logged messages.

author: calherries
comment: Remove uploads settings
changes:
- sql:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not do this clean-up in the custom migration instead? That avoids the quirk of having the rollback in a previous migration, and toucan2 can paper over the dialect differences.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no good reason, I had just written this SQL migration first. I've tidied it up as you suggest here 3369753

Copy link
Contributor Author

@calherries calherries left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it half opens a door to support multiple upload databases in future, which seems plausibly useful

Yes, that's the main motivation behind this approach. Having multiple upload databases has been discussed a couple of times since uploads was first introduced, but it was left out of the initial version to keep the upload flow as simple as possible. I suspect we'll change it soon enough

:export? false
:type :boolean
:default false
:getter (fn [] (log/warn "'uploads-enabled' has been removed; use 'uploads_enabled' on the database instead"))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're just talking about the getter, there's little benefit from this because we don't promise backwards
compatibility of the API and we generally don't have a problem breaking things like this that are unlikely to be
depended upon. I'd rather break the API and make it impossible for FE code to be written that depends on these settings.

If you're talking about the setter, then that's more interesting. Unfortunately there's no way to be perfectly backwards
compatible with the new design without complicating the way env vars or config file data translates into settings.

Before both uploads-enabled: true AND uploads-database-id: <some ID> were needed to enable uploads for a database. Now we
only have one boolean field, metabase_database.uploads_enabled to enable uploads for a database. So there's no way to have two
independent setters and preserve perfect backwards compatibility of behaviour.

Given this complication I'd rather just migrate the settings, and show a warning and have the admin for the metabase instance change the config
file / env vars.

author: calherries
comment: Remove uploads settings
changes:
- sql:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no good reason, I had just written this SQL migration first. I've tidied it up as you suggest here 3369753

;;; Deprecated uploads settings begin
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;; These settings were removed in v50 and will be removed in a future release. They are kept here for backwards compatibility
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. There isn't a mechanism for deprecating and removing settings currently but I've decided to add deprecated metadata to the settings in this commit and so we'll default to remove these in 53 following the standard deprecation cycle for driver changes (Cam does this manually). I also asked here to change the behaviour of the config file loading so that removing these shouldn't be such a drama. If we do that we can happily remove these settings by 53 and all we'll lose is the logged messages.

@@ -3457,9 +3439,10 @@
This function exists to deduplicate test logic for all API endpoints that must return `based_on_upload`,
including GET /api/collection/:id/items and GET /api/card/:id"
[request]
(mt/with-driver :h2 ; just test on H2 because failure should be independent of drivers
(mt/with-temporary-setting-values [uploads-enabled true]
(mt/test-driver :h2 ; just test on H2 because failure should be independent of drivers
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 3b9584f

[thunk]
(mt/with-discard-model-updates [:model/Database]
(t2/update! :model/Database :uploads_enabled true {:uploads_enabled false})
(t2/update! :model/Database (mt/id) {:uploads_enabled false})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 3b9584f

`(do-with-uploads-enabled (fn [] ~@body)))

(defn do-with-uploads-disabled
"Set uploads_enabled to true the current database, and as an admin user, run the thunk"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 3b9584f

@calherries calherries enabled auto-merge (squash) May 27, 2024 16:59
@calherries calherries merged commit f3ddcb6 into master May 27, 2024
109 checks passed
@calherries calherries deleted the migrate-uploads-settings-to-db branch May 27, 2024 17:18
Copy link

@calherries Did you forget to add a milestone to the issue for this PR? When and where should I add a milestone?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport Automatically create PR on current release branch on merge .Team/BackendComponents also known as BEC
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants