CSV task fails if uuid is used for active_storage_attachments.record_id #462

shouichi · 2021-07-29T05:07:23Z

Background

We use uuid for primary keys, thus active_storage_attachments.record_id is uuid. On the other hand, maintenance_tasks_runs.id is bigint. Therefore, when a CSV task tries to upload a CSV as an ActiveStorage::Attachment, the attachment fails to reference the CSV task's run record (MaintenanceTasks::Run) because of the type inconsistency.

Possible solutions

Document that the generated migration file must be manually edited when uuid is used.
Respect g.orm :active_record, primary_key_type: :uuid when generating the migration file.

Thanks in advance.

The text was updated successfully, but these errors were encountered:

shouichi · 2021-07-29T07:02:43Z

Unfortunately simply changing maintenance_tasks_runs.id to uuid did not work because of the following code.

maintenance_tasks/app/models/maintenance_tasks/task_data.rb

Line 44 in e64cdf0

id: available_task_runs.select("MAX(id) as id").group(:task_name)

etiennebarrie · 2021-07-29T14:01:06Z

Thanks for the report. Indeed we'd want to support primary_key_type: :uuid, but not sure how to deal with this query yet.

shouichi · 2021-08-04T04:47:10Z

The project only uses sqlite for testing. It seems that the first step is using both sqlite and postgres for testing. Is this the right direction @etiennebarrie ?

etiennebarrie · 2021-08-04T12:12:13Z

Yes, if it's necessary. I wonder if we could just test UUIDs with sqlite?

shouichi · 2021-08-05T10:16:53Z

Mmm, maybe we could use blob as a primary key and store uuid. I'll take a look.

shouichi · 2021-08-11T08:35:24Z

I tried using string type as a primary key but it seems this comes with a lot of trouble because we cannot rely on database to generate ids. Personally testing both sqlite and postgress seems the way to go because it truly tests type: :uuid. @etiennebarrie What do you think?

shouichi · 2021-08-30T09:28:35Z

Changed primary keys to uuid, ran tests, and found that the current code depends on monotonically increasing ID more than I thought. For example, Run.last or Run.where("id < ?", @cursor). It would be better to change incrementally. The first step should be to stop depending on monotonically increasing IDs. Does this make sense?

etiennebarrie · 2021-08-30T12:02:49Z

I'm not sure how which approach we should take:

stop depending on monotonically-increasing IDs
make all code dependencies on monotonically-increasing IDs abstract so we can change the behavior when using UUIDs, but keep the behavior as it is for existing users.

For the first solution: from your investigation, do we need a migration, or can we rely on the existing (task_name, created_at) index? It seems that would be the case for last run and pagination (runs#show), but for the query you mentioned above (#462 (comment)), is that the case too?

If we can avoid a migration, then the first solution is great: we don't need two code paths for UUIDs. But if we need more work for existing users, I think the second solution would be preferable.

In these situations, I tend to do a spike so that I can see all the moving parts, even if I end up shipping change incrementally. I would favor having a branch where we can see everything we will need, even things aren't perfect or tests fail.

shouichi · 2021-09-01T10:41:16Z

Thanks for the advice.

If we can avoid a migration, then the first solution is great: we don't need two code paths for UUIDs. But if we need more work for existing users, I think the second solution would be preferable.

Honestly, I'm not sure if we can avoid migration or not.

In these situations, I tend to do a spike so that I can see all the moving parts, even if I end up shipping change incrementally. I would favor having a branch where we can see everything we will need, even things aren't perfect or tests fail.

OK. I'm going to open a PR that simply changes pk type to uuid so we can discuss it more in detail. Then, I'm also going to open several PRs against the first PR branch. So that PRs stay small (though the last merge might end up large).

dhnaranjo · 2022-11-28T22:35:46Z

I found myself in this situation and decided to hack it out with some monkeypatching and I thought I'd just offer this to anyone else who has a similar short-sighted need.

I'm using Zeitwerk overrides as described here: https://guides.rubyonrails.org/classic_to_zeitwerk_howto.html#decorating-classes-and-modules-from-engines

It uses ActiveRecordExtended for a window function and CTE because I was already using it, I'm sure we wouldn't want to pull this dependency into the gem but it could certainly be written without it.

# app/overrides/maintenance_tasks/run_override.rb

module MaintenanceTasks
  class Run < ApplicationRecord
    scope :last_completed_per_task, -> do
      window_cte = define_window(:ended_at_window)
        .partition_by(:task_name, order_by: {ended_at: :desc})
        .select(:id)
        .select_window(:row_number, over: :ended_at_window, as: :ended_position)

      completed
        .with(run_ended_positions: window_cte)
        .joins("JOIN run_ended_positions ON run_ended_positions.id = maintenance_tasks_runs.id")
        .where(run_ended_positions: {ended_position: 1})
    end
  end
end

# app/overrides/maintenance_tasks/task_data_index_override.rb

module MaintenanceTasks
  class TaskDataIndex
    class << self
      def available_tasks
        tasks = []

        task_names = Task.available_tasks.map(&:name)

        active_runs = Run.with_attached_csv.active.where(task_name: task_names)
        active_runs.each do |run|
          tasks << TaskDataIndex.new(run.task_name, run)
          task_names.delete(run.task_name)
        end

        last_runs = Run.with_attached_csv.last_completed_per_task.where(task_name: task_names)
        task_names.map do |task_name|
          last_run = last_runs.find { |run| run.task_name == task_name }
          tasks << TaskDataIndex.new(task_name, last_run)
        end

        # We add an additional sorting key (status) to avoid possible
        # inconsistencies across database adapters when a Task has
        # multiple active Runs.
        tasks.sort_by! { |task| [task.name, task.status] }
      end
    end
  end
end

The meat of the change above is substituting the old:

        completed_runs = Run.completed.where(task_name: task_names)
        last_runs = Run.with_attached_csv.where(id: completed_runs.select("MAX(id) as id").group(:task_name))

for the new scope:

        last_runs = Run.with_attached_csv.last_completed_per_task.where(task_name: task_names)

With no other changes

# db/migrate/###_maintenance_tasks_runs_uuid_pkey.rb

class MaintenanceTasksRunsUuidPkey < ActiveRecord::Migration[7.0]
  def change
    add_column :maintenance_tasks_runs, :uuid, :uuid, default: "gen_random_uuid()", null: false
    change_table :maintenance_tasks_runs do |t|
      t.remove :id
      t.rename :uuid, :id
    end
    execute "ALTER TABLE maintenance_tasks_runs ADD PRIMARY KEY (id);"
  end
end

And now I can run CSV tasks. No tests, no promises, works for me.

gmcgibbon · 2024-01-19T05:02:18Z

Can we not refactor to use run timestamps in all of these cases? If someone is interested in issuing a PR where we use created_at and not id, I think that would solve both problems identified in this issue. There may be further issues, but IMO refactoring to not depend on incrementing IDs is better in the long run.

github-actions · 2024-03-20T01:43:45Z

This issue has been marked as stale because it has not been commented on in two months.
Please reply in order to keep the issue open. Otherwise, it will close in 14 days.
Thank you for contributing!

karnowski · 2024-04-01T19:27:56Z

Any movement on this? We're still impacted.

etiennebarrie · 2024-04-02T09:48:44Z

Are you interested in contributing? I think with the task_name, status, created_at DESC index, we could change this query to use timestamps like @gmcgibbon said. The main blocker for us maintainers is that we don't use UUID as primary keys, so we're not really confident in contributing this, but as mentioned above, it's something we want to support since it's in Rails/Active Record.

karnowski · 2024-04-05T17:42:42Z

My team is currently pretty heads down, but we could potentially look at it later this year. If this issue gets closed as stale, can we reopen it later in the year when we're able to better contribute?

shouichi changed the title ~~CSV fails if uuid is used for active_storage_attachments.record_id~~ CSV task fails if uuid is used for active_storage_attachments.record_id Jul 29, 2021

shouichi mentioned this issue Aug 11, 2021

Run tests against sqlite and postgres #468

Merged

shouichi mentioned this issue Sep 5, 2021

DO NOT MERGE: Use uuid for primary keys #482

Closed

nvasilevski mentioned this issue Jan 8, 2024

CSV Task Failure: ActiveRecord::NotNullViolation Error on UUID record_id #946

Closed

github-actions bot added the stale label Mar 20, 2024

github-actions bot removed the stale label Apr 2, 2024

nvasilevski added the railsconf-hackday label Apr 25, 2024

nvasilevski removed the railsconf-hackday label May 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CSV task fails if uuid is used for active_storage_attachments.record_id #462

CSV task fails if uuid is used for active_storage_attachments.record_id #462

shouichi commented Jul 29, 2021 •

edited

shouichi commented Jul 29, 2021

etiennebarrie commented Jul 29, 2021

shouichi commented Aug 4, 2021

etiennebarrie commented Aug 4, 2021

shouichi commented Aug 5, 2021

shouichi commented Aug 11, 2021

shouichi commented Aug 30, 2021

etiennebarrie commented Aug 30, 2021 •

edited

shouichi commented Sep 1, 2021

dhnaranjo commented Nov 28, 2022 •

edited

gmcgibbon commented Jan 19, 2024 •

edited

github-actions bot commented Mar 20, 2024

karnowski commented Apr 1, 2024

etiennebarrie commented Apr 2, 2024

karnowski commented Apr 5, 2024

CSV task fails if uuid is used for active_storage_attachments.record_id #462

CSV task fails if uuid is used for active_storage_attachments.record_id #462

Comments

shouichi commented Jul 29, 2021 • edited

shouichi commented Jul 29, 2021

etiennebarrie commented Jul 29, 2021

shouichi commented Aug 4, 2021

etiennebarrie commented Aug 4, 2021

shouichi commented Aug 5, 2021

shouichi commented Aug 11, 2021

shouichi commented Aug 30, 2021

etiennebarrie commented Aug 30, 2021 • edited

shouichi commented Sep 1, 2021

dhnaranjo commented Nov 28, 2022 • edited

gmcgibbon commented Jan 19, 2024 • edited

github-actions bot commented Mar 20, 2024

karnowski commented Apr 1, 2024

etiennebarrie commented Apr 2, 2024

karnowski commented Apr 5, 2024

shouichi commented Jul 29, 2021 •

edited

etiennebarrie commented Aug 30, 2021 •

edited

dhnaranjo commented Nov 28, 2022 •

edited

gmcgibbon commented Jan 19, 2024 •

edited