Fix for slow query on MS SQL #22522

Open. Wants to merge 3 commits into base: main.

boring-joey

What's changed:

  • Resolved performance issue related to long-running queries in Azure SQL Database.
  • Simplified conditions in SQL queries to improve execution times.
  • Added optimization steps for better handling of schema updates in Directus.

Potential Risks / Drawbacks

  • Changes in SQL query handling might affect other parts of the system that rely on similar queries.

Review Notes / Questions

  • Please review the changes made to SQL query conditions and confirm they do not impact other functionalities.
  • Special attention should be paid to the handling of NULL values and the use of temporary tables for intermediate results.

Fixes #19486

You can review the related issue here.
#19486


changeset-bot bot commented May 19, 2024

⚠️ No Changeset found

Latest commit: 64bdde8

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@jaads jaads self-requested a review May 21, 2024 15:04
@jaads jaads self-assigned this May 21, 2024
Member

@nickrum nickrum left a comment


If I understand it correctly, this completely removes the check which ensures that is_primary_key and is_unique are only set if index_column_count and index_priority are either 1 or NULL. Can we really drop that?

My best guess is that the query is slow because the ON expression of the join contains a function call and a comparison that do not relate the joined tables to each other, and this somehow hits a slow path in Azure. Maybe moving the check into the subquery would help.
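To illustrate the suggestion of moving the check into the subquery, here is a minimal sketch. It assumes SQLite (through Python's `sqlite3`) as a stand-in for SQL Server, so `IFNULL` replaces `ISNULL`, and the `sys_*` tables are hypothetical miniature versions of the catalog views. It only shows that both forms return the same rows on this toy data; whether the rewritten form avoids the slow path on Azure is untested.

```python
import sqlite3

# Hypothetical miniature stand-ins for sys.columns, sys.indexes, sys.index_columns.
SCHEMA = """
CREATE TABLE sys_columns (object_id INT, column_id INT, name TEXT);
CREATE TABLE sys_indexes (object_id INT, index_id INT, is_unique INT, is_primary_key INT);
CREATE TABLE sys_index_columns (object_id INT, index_id INT, column_id INT,
                                index_column_id INT NOT NULL);
INSERT INTO sys_columns VALUES (1,1,'id'), (1,2,'email'), (1,3,'a'), (1,4,'b'), (1,5,'note');
-- index 1: PK(id), 2: UNIQUE(email), 3: non-unique(email), 4: UNIQUE(a, b)
INSERT INTO sys_indexes VALUES (1,1,1,1), (1,2,1,0), (1,3,0,0), (1,4,1,0);
INSERT INTO sys_index_columns VALUES (1,1,1,1), (1,2,2,1), (1,3,2,1), (1,4,3,1), (1,4,4,2);
"""

# Same shape as the index subquery in the introspection query.
INDEX_INFO = """
SELECT ic.object_id, ic.column_id, ix.is_unique, ix.is_primary_key,
       MAX(ic.index_column_id) OVER (PARTITION BY ic.index_id, ic.object_id)
           AS index_column_count,
       ROW_NUMBER() OVER (PARTITION BY ic.object_id, ic.column_id
                          ORDER BY ix.is_primary_key DESC, ix.is_unique DESC)
           AS index_priority
FROM sys_index_columns ic
JOIN sys_indexes ix ON ix.object_id = ic.object_id AND ix.index_id = ic.index_id
"""

# Current form: the check lives in the ON expression of the LEFT JOIN.
FILTER_IN_ON = f"""
SELECT c.name, i.is_primary_key, i.is_unique
FROM sys_columns c
LEFT JOIN ({INDEX_INFO}) AS i
  ON i.object_id = c.object_id AND i.column_id = c.column_id
 AND IFNULL(i.index_column_count, 1) = 1
 AND IFNULL(i.index_priority, 1) = 1
ORDER BY c.column_id
"""

# Suggested form: filter inside the derived table, leaving only equality
# predicates between the joined tables in the ON expression.
FILTER_IN_SUBQUERY = f"""
SELECT c.name, i.is_primary_key, i.is_unique
FROM sys_columns c
LEFT JOIN (SELECT * FROM ({INDEX_INFO}) AS w
           WHERE w.index_column_count = 1 AND w.index_priority = 1) AS i
  ON i.object_id = c.object_id AND i.column_id = c.column_id
ORDER BY c.column_id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
rows_on = conn.execute(FILTER_IN_ON).fetchall()
rows_subquery = conn.execute(FILTER_IN_SUBQUERY).fetchall()
print(rows_on == rows_subquery)
for row in rows_subquery:
    print(row)
```

The inner `WHERE` drops the null handling because rows produced by the subquery always carry a value for both window columns; the NULLs only appear after the LEFT JOIN finds no match.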

@boring-joey
Author

@nickrum, is this something the Directus team will investigate further, or would you like me to look into it more? I noticed the issue has been moved to 'ready,' so I'm curious about the status.

@jaads
Member

jaads commented May 24, 2024

I'm currently looking into it. I cannot reproduce the issue, but that's only because I don't have a big and complex enough schema in my MS SQL database. We're currently thinking about how we can spin up an arbitrary schema to do performance tests for introspection, but right now that's tricky and time-consuming.

But I can confirm that the query in this PR does not return the same result as the current one: the one in this PR returns more records. I now have to check that this does not introduce any other problems.

However, the fact that the PR does not cause any additional integration tests to fail is a good sign that we can use the updated query :)

@jaads
Member

jaads commented May 27, 2024

I compared the query results further and noticed that is_primary_key and is_unique are not correct. So unfortunately, we cannot use the updated query as it is right now.
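A sketch of how both observations (more records, and different is_primary_key / is_unique values) can arise once the check is dropped. It assumes SQLite via Python's `sqlite3` as a stand-in for SQL Server, with `IFNULL` in place of `ISNULL`; the `sys_*` tables and the `introspect` helper are hypothetical and only mimic the shape of the real query. The toy table has a column with two indexes and a two-column unique index, which may or may not match the schema that was actually tested.

```python
import sqlite3

# Hypothetical miniature stand-ins for sys.columns, sys.indexes, sys.index_columns.
SCHEMA = """
CREATE TABLE sys_columns (object_id INT, column_id INT, name TEXT);
CREATE TABLE sys_indexes (object_id INT, index_id INT, is_unique INT, is_primary_key INT);
CREATE TABLE sys_index_columns (object_id INT, index_id INT, column_id INT,
                                index_column_id INT NOT NULL);
INSERT INTO sys_columns VALUES (1,1,'id'), (1,2,'email'), (1,3,'a'), (1,4,'b'), (1,5,'note');
-- index 1: PK(id), 2: UNIQUE(email), 3: non-unique(email), 4: UNIQUE(a, b)
INSERT INTO sys_indexes VALUES (1,1,1,1), (1,2,1,0), (1,3,0,0), (1,4,1,0);
INSERT INTO sys_index_columns VALUES (1,1,1,1), (1,2,2,1), (1,3,2,1), (1,4,3,1), (1,4,4,2);
"""

# The check that the current query keeps in the ON expression.
CHECK = """
 AND IFNULL(i.index_column_count, 1) = 1
 AND IFNULL(i.index_priority, 1) = 1"""


def introspect(conn, with_check):
    """Run a reduced introspection query, with or without the ON-clause check."""
    # The PR's outer query no longer references the two window columns,
    # so its COALESCE wrappers are left out of this sketch.
    sql = f"""
    SELECT c.name, i.is_primary_key, i.is_unique
    FROM sys_columns c
    LEFT JOIN (
        SELECT ic.object_id, ic.column_id, ix.is_unique, ix.is_primary_key,
               MAX(ic.index_column_id) OVER (PARTITION BY ic.index_id, ic.object_id)
                   AS index_column_count,
               ROW_NUMBER() OVER (PARTITION BY ic.object_id, ic.column_id
                                  ORDER BY ix.is_primary_key DESC, ix.is_unique DESC)
                   AS index_priority
        FROM sys_index_columns ic
        JOIN sys_indexes ix ON ix.object_id = ic.object_id AND ix.index_id = ic.index_id
    ) AS i
      ON i.object_id = c.object_id AND i.column_id = c.column_id{CHECK if with_check else ""}
    ORDER BY c.column_id, i.is_unique DESC
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
current_rows = introspect(conn, with_check=True)
updated_rows = introspect(conn, with_check=False)

print(len(current_rows), len(updated_rows))
for row in updated_rows:
    print(row)
```

On this data the version without the check returns `email` twice (once per index) and reports `a` and `b` as unique on their own, although they are only unique in combination.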

Member

@jaads jaads left a comment


as described above

@alexchopin alexchopin modified the milestone: Next Minor Release May 30, 2024
@boring-joey
Author

boring-joey commented Jun 3, 2024

> I compared the query results further and noticed that the is_primary_key and is_unique are not correct. so unfortunately, we cannot use the updated query as it is right now

@jaads How did you verify this? Is there a test setup I can use to further evaluate the query's outcome? I'm happy to help resolve this issue. It remains a significant problem in our Azure environment, especially when editing our data model, as we are doing this month for our production website. Each data model edit currently takes 30-60 seconds, which is quite disruptive.

@jaads
Member

jaads commented Jun 3, 2024

I enabled logging, started Directus on the main branch, copied the introspection query from the logs into my SQL editor (deleting the schema condition at the very end of the query), and ran it. Then I adapted the query with the changes from this PR, ran it again, and noticed the differences mentioned above.

However! Now I did the same thing again, but this time the output seems to be the same 🫨 So I'm either doing something wrong now, or I did last time.

Here are the queries for which I compared the results.
The current query:
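Since comparing two result grids by eye is error-prone, a scripted comparison may help here. The following is a minimal sketch, assuming SQLite via Python's `sqlite3` as a stand-in for the real database; `diff_results`, the table `t`, and both queries are hypothetical. It compares the results as multisets, so an extra duplicate row also shows up as a difference.

```python
import sqlite3
from collections import Counter


def diff_results(conn, sql_a, sql_b):
    """Return (rows only in A, rows only in B), counting duplicate rows."""
    rows_a = Counter(conn.execute(sql_a).fetchall())
    rows_b = Counter(conn.execute(sql_b).fetchall())
    # Counter subtraction keeps only positive counts, i.e. the surplus rows.
    only_a = sorted((rows_a - rows_b).elements(), key=repr)
    only_b = sorted((rows_b - rows_a).elements(), key=repr)
    return only_a, only_b


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (name TEXT, is_unique INT);
INSERT INTO t VALUES ('id', 1), ('email', 1), ('email', 0), ('note', NULL);
""")

# Two hypothetical variants of a query whose outputs should be compared.
QUERY_A = "SELECT name, is_unique FROM t WHERE is_unique = 1 OR is_unique IS NULL"
QUERY_B = "SELECT name, is_unique FROM t"

only_a, only_b = diff_results(conn, QUERY_A, QUERY_B)
print("only in A:", only_a)
print("only in B:", only_b)
```

Directly in SQL, running `EXCEPT` in both directions gives a similar check, with the caveat that `EXCEPT` works on distinct rows and therefore does not reveal a row that merely appears more often in one result.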

select [o].[name]                                 AS [table],
       [c].[name]                                 AS [name],
       [t].[name]                                 AS [data_type],
       [c].[max_length]                           AS [max_length],
       [c].[precision]                            AS [numeric_precision],
       [c].[scale]                                AS [numeric_scale],
       CASE
           WHEN [c].[is_nullable] = 0 THEN
               'NO'
           ELSE
               'YES'
           END                                    AS [is_nullable],
       object_definition([c].[default_object_id]) AS [default_value],
       [i].[is_primary_key],
       [i].[is_unique],
       CASE [c].[is_identity]
           WHEN 1 THEN
               'YES'
           ELSE
               'NO'
           END                                    AS [has_auto_increment],
       OBJECT_NAME([fk].[referenced_object_id])   AS [foreign_key_table],
       COL_NAME([fk].[referenced_object_id],
                [fk].[referenced_column_id])      AS [foreign_key_column],
       [cc].[is_computed]                         as [is_generated],
       [cc].[definition]                          as [generation_expression]
from [master].[sys].[columns] [c]
         JOIN [sys].[types] [t] ON [c].[user_type_id] = [t].[user_type_id]
         JOIN [sys].[tables] [o] ON [o].[object_id] = [c].[object_id]
         JOIN [sys].[schemas] [s] ON [s].[schema_id] = [o].[schema_id]
         LEFT JOIN [sys].[computed_columns] AS [cc]
                   ON [cc].[object_id] = [c].[object_id] AND [cc].[column_id] = [c].[column_id]
         LEFT JOIN [sys].[foreign_key_columns] AS [fk]
                   ON [fk].[parent_object_id] = [c].[object_id] AND [fk].[parent_column_id] = [c].[column_id]
         LEFT JOIN (SELECT [ic].[object_id],
                           [ic].[column_id],
                           [ix].[is_unique],
                           [ix].[is_primary_key],
                           MAX([ic].[index_column_id])
                               OVER (partition by [ic].[index_id], [ic].[object_id])       AS index_column_count,
                           ROW_NUMBER() OVER (
                               PARTITION BY [ic].[object_id], [ic].[column_id]
                               ORDER BY [ix].[is_primary_key] DESC, [ix].[is_unique] DESC) AS index_priority
                    FROM [sys].[index_columns] [ic]
                             JOIN [sys].[indexes] AS [ix] ON [ix].[object_id] = [ic].[object_id]
                        AND [ix].[index_id] = [ic].[index_id]) AS [i] ON [i].[object_id] = [c].[object_id]
    AND [i].[column_id] = [c].[column_id]
    AND ISNULL([i].[index_column_count], 1) = 1
    AND ISNULL([i].[index_priority], 1) = 1;

And the updated query from this PR:

select [o].[name]                                 AS [table],
       [c].[name]                                 AS [name],
       [t].[name]                                 AS [data_type],
       [c].[max_length]                           AS [max_length],
       [c].[precision]                            AS [numeric_precision],
       [c].[scale]                                AS [numeric_scale],
       CASE
           WHEN [c].[is_nullable] = 0 THEN
               'NO'
           ELSE
               'YES'
           END                                    AS [is_nullable],
       object_definition([c].[default_object_id]) AS [default_value],
       [i].[is_primary_key],
       [i].[is_unique],
       CASE [c].[is_identity]
           WHEN 1 THEN
               'YES'
           ELSE
               'NO'
           END                                    AS [has_auto_increment],
       OBJECT_NAME([fk].[referenced_object_id])   AS [foreign_key_table],
       COL_NAME([fk].[referenced_object_id],
                [fk].[referenced_column_id])      AS [foreign_key_column],
       [cc].[is_computed]                         as [is_generated],
       [cc].[definition]                          as [generation_expression]
from [master].[sys].[columns] [c]
         JOIN [sys].[types] [t] ON [c].[user_type_id] = [t].[user_type_id]
         JOIN [sys].[tables] [o] ON [o].[object_id] = [c].[object_id]
         JOIN [sys].[schemas] [s] ON [s].[schema_id] = [o].[schema_id]
         LEFT JOIN [sys].[computed_columns] AS [cc]
                   ON [cc].[object_id] = [c].[object_id] AND [cc].[column_id] = [c].[column_id]
         LEFT JOIN [sys].[foreign_key_columns] AS [fk]
                   ON [fk].[parent_object_id] = [c].[object_id] AND [fk].[parent_column_id] = [c].[column_id]
         LEFT JOIN (SELECT [ic].[object_id],
                           [ic].[column_id],
                           [ix].[is_unique],
                           [ix].[is_primary_key],
                           COALESCE(MAX([ic].[index_column_id])
                                        OVER (partition by [ic].[index_id], [ic].[object_id]), 1) AS index_column_count,
                           COALESCE(ROW_NUMBER() OVER (
                               PARTITION BY [ic].[object_id], [ic].[column_id]
                               ORDER BY [ix].[is_primary_key] DESC, [ix].[is_unique] DESC), 1)   AS index_priority
                    FROM [sys].[index_columns] [ic]
                             JOIN [sys].[indexes] AS [ix] ON [ix].[object_id] = [ic].[object_id]
                        AND [ix].[index_id] = [ic].[index_id]) AS [i] ON [i].[object_id] = [c].[object_id]
    AND [i].[column_id] = [c].[column_id];

Are the results the same for you, @boring-joey? 🤔
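One detail that may be worth checking when comparing the two queries: inside the subquery, `MAX(...) OVER (...)` over the non-nullable `index_column_id` and `ROW_NUMBER()` should never yield NULL, so the `COALESCE(..., 1)` wrappers in the updated query would not change any value there. The sketch below tests that on a toy schema, assuming SQLite via Python's `sqlite3` as a stand-in for SQL Server and hypothetical `sys_*` tables in place of the catalog views.

```python
import sqlite3

# Hypothetical miniature stand-ins for sys.indexes and sys.index_columns.
SCHEMA = """
CREATE TABLE sys_indexes (object_id INT, index_id INT, is_unique INT, is_primary_key INT);
CREATE TABLE sys_index_columns (object_id INT, index_id INT, column_id INT,
                                index_column_id INT NOT NULL);
-- index 1: PK(col 1), 2: UNIQUE(col 2), 3: non-unique(col 2), 4: UNIQUE(col 3, col 4)
INSERT INTO sys_indexes VALUES (1,1,1,1), (1,2,1,0), (1,3,0,0), (1,4,1,0);
INSERT INTO sys_index_columns VALUES (1,1,1,1), (1,2,2,1), (1,3,2,1), (1,4,3,1), (1,4,4,2);
"""

TEMPLATE = """
SELECT ic.object_id, ic.column_id,
       {count} AS index_column_count,
       {priority} AS index_priority
FROM sys_index_columns ic
JOIN sys_indexes ix ON ix.object_id = ic.object_id AND ix.index_id = ic.index_id
ORDER BY ic.object_id, ic.column_id, index_priority
"""

COUNT = "MAX(ic.index_column_id) OVER (PARTITION BY ic.index_id, ic.object_id)"
PRIORITY = ("ROW_NUMBER() OVER (PARTITION BY ic.object_id, ic.column_id "
            "ORDER BY ix.is_primary_key DESC, ix.is_unique DESC)")

# Subquery as in the current query, and with the COALESCE wrappers from the PR.
plain_sql = TEMPLATE.format(count=COUNT, priority=PRIORITY)
wrapped_sql = TEMPLATE.format(count=f"COALESCE({COUNT}, 1)",
                              priority=f"COALESCE({PRIORITY}, 1)")

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
plain_rows = conn.execute(plain_sql).fetchall()
wrapped_rows = conn.execute(wrapped_sql).fetchall()

print(plain_rows == wrapped_rows)
print(any(value is None for row in plain_rows for value in row))
```

If that holds on SQL Server as well, the only effective difference between the two queries is the removal of the two conditions from the ON expression.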

Labels
None yet
Projects
Status: 🏗 In progress
Development

Successfully merging this pull request may close these issues.

Slow schema introspection query with Azure SQL
5 participants