Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect query results and problems with FULL GROUP BY #185

Open
solomonty opened this issue Jan 24, 2017 · 5 comments
Open

Incorrect query results and problems with FULL GROUP BY #185

solomonty opened this issue Jan 24, 2017 · 5 comments

Comments

@solomonty
Copy link

solomonty commented Jan 24, 2017

I noticed that I was getting an invalid mapping of hostname_max and db_max after adding them to the query.

It appears that the GROUP BY is not properly grouping the data.

If I turn on FULL GROUP BY then I get an error message from MySQL.

Expression #2 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'slow_query_log.dimension.sample' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by (1055)
@solomonty
Copy link
Author

These two columns get added to the query

  MAX(hostname_max) AS hostname_max,
  MAX(db_max) AS db_max,

@solomonty
Copy link
Author

solomonty commented Jan 24, 2017

Here is the query generated by Anemometer

SELECT checksum AS `checksum`,
  LEFT(dimension.sample,20) AS `snippet`,
  ROUND(SUM(Rows_examined_sum)/SUM(rows_sent_sum),2) AS `index_ratio`,
  SUM(Query_time_sum) / SUM(ts_cnt) AS `query_time_avg`,
  ROUND(SUM(Rows_sent_sum)/SUM(ts_cnt),0) AS `rows_sent_avg`,
  MAX(hostname_max) AS `hostname_max`,
  MAX(db_max) AS `db_max`,
  SUM(ts_cnt) AS `ts_cnt`,
  SUM(Query_time_sum) AS `Query_time_sum`,
  SUM(Lock_time_sum) AS `Lock_time_sum`,
  SUM(Rows_sent_sum) AS `Rows_sent_sum`,
  SUM(Rows_examined_sum) AS `Rows_examined_sum`,
  SUM(Full_scan_sum) AS `Full_scan_sum`,
  SUM(Tmp_table_sum) AS `Tmp_table_sum`,
  SUM(Filesort_sum) AS `Filesort_sum`
 FROM `global_query_review` AS `fact`
 JOIN `global_query_review_history` AS `dimension` USING (`checksum`)
 WHERE dimension.ts_min >= "2017-01-23 02:39:13"
  AND dimension.ts_min <= "2017-01-24 02:39:13"
 GROUP BY checksum
 ORDER BY Query_time_sum DESC
 LIMIT 20

@solomonty
Copy link
Author

The issue is that

MAX(hostname_max) and MAX(db_max) don't produce values that are from the same row.

@solomonty
Copy link
Author

Adding snippet to the GROUP BY eliminates the FULL GROUP BY error but doesn't fix the query result.

GROUP BY checksum, snippet

@solomonty
Copy link
Author

Removing the MAX() for hostname and db and adding them to the GROUP BY appears to return correct results and works with FULL GROUP BY.

 SELECT checksum AS `checksum`,
  LEFT(dimension.sample,20) AS `snippet`,
  ROUND(SUM(Rows_examined_sum)/SUM(rows_sent_sum),2) AS `index_ratio`,
  SUM(Query_time_sum) / SUM(ts_cnt) AS `query_time_avg`,
  ROUND(SUM(Rows_sent_sum)/SUM(ts_cnt),0) AS `rows_sent_avg`,
  hostname_max AS `hostname_max`,
  db_max AS `db_max`,
  SUM(ts_cnt) AS `ts_cnt`,
  SUM(Query_time_sum) AS `Query_time_sum`,
  SUM(Lock_time_sum) AS `Lock_time_sum`,
  SUM(Rows_sent_sum) AS `Rows_sent_sum`,
  SUM(Rows_examined_sum) AS `Rows_examined_sum`,
  SUM(Full_scan_sum) AS `Full_scan_sum`,
  SUM(Tmp_table_sum) AS `Tmp_table_sum`,
  SUM(Filesort_sum) AS `Filesort_sum`
 FROM `global_query_review` AS `fact`
 JOIN `global_query_review_history` AS `dimension` USING (`checksum`)
 WHERE dimension.ts_min >= "2017-01-23 02:39:13"
  AND dimension.ts_min <= "2017-01-24 02:39:13"
 GROUP BY checksum, snippet, hostname_max, db_max
 ORDER BY Query_time_sum DESC
 LIMIT 20

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant