Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

generate_status_report performance - SQL improvements #596

Open
JoshAudette opened this issue Feb 13, 2024 · 4 comments
Open

generate_status_report performance - SQL improvements #596

JoshAudette opened this issue Feb 13, 2024 · 4 comments

Comments

@JoshAudette
Copy link

JoshAudette commented Feb 13, 2024

Issue:
The \tool_objectfs\task\generate_status_report task can be quite slow and costly in terms of database resources.

Recently we have experienced SQL queries failing on MySQL with a message indicating temporary storage space full. On reconfiguring MySQL to favour memory (temptable_max_ram tweak) we saw the query succeed for awhile, but lately the MySQL database exhausts memory causing it to restart.

We have had a look at changing the SQL queries used by the report.

See also: #572

@JoshAudette
Copy link
Author

JoshAudette commented Feb 13, 2024

Hi,

We've come up with a SQL query using Common Table Elements approach. This was meant to help make the queries easier to read / understand but in practice trials these perform much faster. As well, the MySQL server doesn't exhaust resources running these, even when using a lower instance class.

Downside, MySQL 5.7 supposedly doesn't support CTE / the WITH clause, so I guess this would require a new branch for Moodle 4.2 where MySQL 8 is the minimum.

For location = DUPLICATED, EXTERNAL, ERROR

WITH
  cte_objects AS (
    SELECT 
    o.contenthash, 
    o.location
    FROM {tool_objectfs_objects} o
    WHERE o.location = ? ),
  cte_obj_files AS (
    SELECT 
    f.contenthash, 
    MAX(f.filesize) AS filesize
    FROM {files} f
    INNER JOIN cte_objects co ON f.contenthash = co.contenthash
    WHERE filesize > 0 
    GROUP BY f.contenthash, f.filesize )
SELECT
COALESCE(COUNT(cof.contenthash),0) AS objectcount,
COALESCE(SUM(cof.filesize),0) AS objectsum
FROM cte_obj_files cof

For location = LOCAL

WITH
  cte_objects AS (
    SELECT 
    o.contenthash, 
    o.location
    FROM {tool_objectfs_objects} o ),
  cte_obj_files AS (
    SELECT 
    f.contenthash, 
    MAX(f.filesize) AS filesize
    FROM {files} f
    LEFT JOIN cte_objects co ON f.contenthash = co.contenthash
    WHERE filesize > 0 AND ( co.location = ? OR co.location IS NULL )
    GROUP BY f.contenthash, f.filesize )
SELECT
COALESCE(COUNT(cof.contenthash),0) AS objectcount,
COALESCE(SUM(cof.filesize),0) AS objectsum
FROM cte_obj_files cof

For location = ORPHANED

WITH
  cte_objects AS (
    SELECT 
    o.contenthash 
    FROM {tool_objectfs_objects} o
    WHERE o.location = ? )
SELECT
COALESCE(COUNT(co.contenthash),0) AS objectcount
FROM cte_objects co

@JoshAudette
Copy link
Author

We've trialled these queries on various Moodle databases of various sizes on MySQL 8 and on PostgreSQL 14.

In all cases they return the same numbers as their current counterparts.

For all cases except location = LOCAL, they execute much faster.

For case of location = LOCAL there is not as much improvement in speed compared to its current counterpart, but it's worth noting that even with a lower AWS instance class (meaning less CPU and less memory) the new query returns whereas the old one causes the issues described earlier.

@brendanheywood
Copy link
Contributor

@JoshAudette rather than code in a comment can you turn this into a PR please?

@JoshAudette
Copy link
Author

Hi @brendanheywood PR is ready, although these SQL queries use the WITH clause (Common Table Expressions style) and that's only supported in MySQL since version 8, PostgreSQL since version 12.

I think we may need a base branch MOODLE_402_STABLE and change the PR to merge with that branch.

michaelkotlyar added a commit that referenced this issue Feb 28, 2024
…report_sql_tweaks

Issue #596: SQL change with some performance / resource improvements
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants