Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add side_by_side mode to update_materialized_view #387

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

Roguelazer
Copy link

This adds a side_by_side kwarg to the update_materialized_view method, which builds the new view alongside the old one and then atomically swaps them to reduce downtime at the cost of increasing disk usage. It is plumbed through to migrations as a hash value for the materialized kwarg of update_view.

Fixes #386.

This adds a `side_by_side` kwarg to the `update_materialized_view`
method, which builds the new view alongside the old one and then
atomically swaps them to reduce downtime at the cost of increasing disk
usage. It is plumbed through to migrations as a hash value for the
`materialized` kwarg of `update_view`.
@Roguelazer
Copy link
Author

it's somewhat maddening that this repository doesn't have a working rubocop.yml (what is in there is a many-years-old version of rubocop and seems to disagree with how hound is configured).

@ferblape
Copy link

ferblape commented Apr 5, 2023

We also implemented something similar in a project using Scenic, but we had some issues with the indexes of the view you are replacing: you should drop them in the current view before adding the swap one, otherwise when you rename the view you'll have conflicts in the naming.

@Roguelazer
Copy link
Author

Ah, good point. I could probably modify this to rename all the indices, too…

@Roguelazer
Copy link
Author

Actually, this won't cause conflicts because the indexes on the new view get created after the swap, and the drop at the end of the swap removes the conflicting indexes.

That being said, building the indexes after the drop means that the indexes are built while holding the access exclusive lock, which probably is not a good idea.

@renchap
Copy link

renchap commented Jun 9, 2023

@calebhearth @derekprior Gentle ping on this, do you need anything else for this PR to be merged?

This would really help some Mastodon materialized view changes :)

@Roguelazer
Copy link
Author

Any changes you'd like me to make to this?

@gsinkin
Copy link

gsinkin commented Aug 23, 2023

Just adding another nudge here

@Roguelazer
Copy link
Author

Anything you'd like me to do to help move this forward?

Copy link
Contributor

@derekprior derekprior left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are some issues that we're likely to encounter with this in the wild that require some more thought.

I also would like to encapsulate the functionality a bit differently but I don't have and easily explainable idea at the moment. Best bet may be to tackle the outstanding questions to get to a mergeable idea and then we can refactor after that.

@@ -155,17 +155,36 @@ def create_materialized_view(name, sql_definition, no_data: false)
# @param no_data [Boolean] Default: false. Set to true to create
# materialized view without running the associated query. You will need
# to perform a non-concurrent refresh to populate with data.
# @param side_by_side [Boolean] Default: false. Set to true to create the
# new version under a different name and atomically swap them, limiting
# downtime at the cost of doubling disk usage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be more precise than downtime. We limit the amount of time the view is inaccessible, perhaps?

Comment on lines 166 to 168
def update_materialized_view(
name, sql_definition, no_data: false, side_by_side: false
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know hound is not going to like this due to line length, but I don't care....

Suggested change
def update_materialized_view(
name, sql_definition, no_data: false, side_by_side: false
)
def update_materialized_view(name, sql_definition, no_data: false, side_by_side: false)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd need to also add some comments to make rubocop be quiet. I don't mind doing so (and actually set Metrics/LineLength to 120 on all my projects anyway).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess rubocop isn't automatically run (and the config file is remarkably old; at least <1.7 and broken with current rubocop versions) so I won't worry about it

#
# @raise [MaterializedViewsNotSupportedError] if the version of Postgres
# in use does not support materialized views.
#
# @return [void]
def update_materialized_view(name, sql_definition, no_data: false)
def update_materialized_view(
name, sql_definition, no_data: false, side_by_side: false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure side_by_side is the name I'd opt for here but I'm lacking a better suggetion at the moment other than background which I don't love either.

In your experience, is side_by_side an established name for this process?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "online" might be a more established name but I was trying to think of something that was very specific about what it did

) do
create_materialized_view(new_name, sql_definition, no_data: no_data)
end
rename_materialized_view(name, old_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason not to drop here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I did it this way out of the assumption that we might commit after the rename (but before the drop) and pause to give other threads a chance to finish using that view before we drop it, to reduce the lock impact. I didn't actually build that, of course.

create_materialized_view(name, sql_definition, no_data: no_data)
if side_by_side
session_id = Time.now.to_i
new_name = "#{name}_new_#{session_id}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Postgres identifiers are limited to 63 characters. We're removing 15 characters from that allotment already. This will likely be problematic for some view names. What does Rails do when, for example, a generated index name would be too long? We may need to access that functionality (or copy it) here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC they use a truncated hash of the generated components.

The name only matters for debugging, so I'd be fine with doing that. Something like

new_name = "#{name}_new_#{session_id}"
if new_name.size > 63
  # session_id is 10c, _new_ is 5c; leave lots of space free
  new_name = "#{Digest::SHA256.hexdigest(name)[...40]}_new_#{session_id}"
end

def on_side_by_side(name, new_table_name, temporary_id)
indexes = Indexes.new(connection: connection).on(name)
indexes.each_with_index do |index, i|
old_name = "predrop_index_#{temporary_id}_#{i}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it cuts 24 characters off the alloted 63 for postgres identifiers (see earlier comment). We may need to think more about how to lessen the chance this is an issue for folks

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one doesn't actually have the index name in it (it just looks like predrop_index_1705535993_1). I'm pretty sure I did it that way because I hit the index size limit. :-D

@@ -271,9 +304,19 @@ def refresh_dependencies_for(name, concurrently: false)
name,
self,
connection,
concurrently: concurrently,
concurrently: concurrently
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style/TrailingCommaInArguments: Put a comma after the last parameter of a multiline method call.

if side_by_side
session_id = Time.now.to_i
new_name = generate_name name, "new_#{session_id}"
drop_name = generate_name name, "drop_#{session_id}"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lint/UselessAssignment: Useless assignment to variable - drop_name.

#
# @raise [MaterializedViewsNotSupportedError] if the version of Postgres
# in use does not support materialized views.
#
# @return [void]
def update_materialized_view(name, sql_definition, no_data: false)
def update_materialized_view(name, sql_definition, no_data: false, side_by_side: false)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Metrics/LineLength: Line is too long. [93/80]

@@ -155,17 +157,34 @@ def create_materialized_view(name, sql_definition, no_data: false)
# @param no_data [Boolean] Default: false. Set to true to create
# materialized view without running the associated query. You will need
# to perform a non-concurrent refresh to populate with data.
# @param side_by_side [Boolean] Default: false. Set to true to create the
# new version under a different name and atomically swap them, limiting
# the time that a view is inaccessible at the cost of doubling disk usage
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Metrics/LineLength: Line is too long. [81/80]

#{sql_definition.rstrip.chomp(';')}
#{'WITH NO DATA' if no_data};
#{sql_definition.rstrip.chomp(";")}
#{"WITH NO DATA" if no_data};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style/StringLiteralsInInterpolation: Prefer single-quoted strings inside interpolations.

@@ -137,8 +139,8 @@ def create_materialized_view(name, sql_definition, no_data: false)

execute <<-SQL
CREATE MATERIALIZED VIEW #{quote_table_name(name)} AS
#{sql_definition.rstrip.chomp(';')}
#{'WITH NO DATA' if no_data};
#{sql_definition.rstrip.chomp(";")}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style/StringLiteralsInInterpolation: Prefer single-quoted strings inside interpolations.

if rails_version == "master"
rails_constraint = { github: "rails/rails" }
rails_constraint = if rails_version == "master"
{github: "rails/rails"}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Layout/IndentationWidth: Use 2 (not -17) spaces for indentation.
Layout/SpaceInsideHashLiteralBraces: Space inside { missing.
Layout/SpaceInsideHashLiteralBraces: Space inside } missing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

make update_view for materialized views create less downtime
5 participants