
Oracle CERT May Have A High False Positive Rate #855

Open
jinhui-lai opened this issue Jul 11, 2023 · 7 comments
Comments

@jinhui-lai

Hi. I have tested MySQL with Oracle CERT, and more than 1000 .log files can be generated in an hour. This suggests that Oracle CERT may have a high false positive rate, which makes the reports difficult to reproduce. Most of the .log files look like the following one.

--java.lang.AssertionError: Inconsistent result for query: EXPLAIN SELECT t0.c2 AS ref0 FROM t0; --1
--EXPLAIN SELECT ALL t0.c2 AS ref0 FROM t0 WHERE (LEAST(-153311606, "", "-1875220950", NULL)) OR (t0.c0); --2
--	at sqlancer.mysql.oracle.MySQLCERTOracle.check(MySQLCERTOracle.java:88)
--	at sqlancer.ProviderAdapter.generateAndTestDatabase(ProviderAdapter.java:61)
--	at sqlancer.Main$DBMSExecutor.run(Main.java:387)
--	at sqlancer.Main$2.run(Main.java:582)
--	at sqlancer.Main$2.runThread(Main.java:564)
--	at sqlancer.Main$2.run(Main.java:555)
--	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
--	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
--	at java.base/java.lang.Thread.run(Thread.java:829)
---- Time: 2023/07/10 05:37:00
-- Database: database0
-- Database version: 8.0.33
-- seed value: 1688967294765
DROP DATABASE IF EXISTS database0;
CREATE DATABASE database0;
USE database0;
CREATE TABLE t0(c0 INT(73) ZEROFILL  NULL  COMMENT 'asdf'  UNIQUE STORAGE MEMORY COLUMN_FORMAT DEFAULT, c1 DOUBLE  UNIQUE  COMMENT 'asdf'  , c2 SMALLINT ZEROFILL  STORAGE MEMORY COLUMN_FORMAT DYNAMIC NULL  COMMENT 'asdf' ) MIN_ROWS = 1982191339202676555, STATS_PERSISTENT = DEFAULT, COMPRESSION = 'LZ4', MAX_ROWS = 4880837524191020543;
REPLACE INTO t0(c1) VALUES('');
INSERT INTO t0(c2) VALUES('859573787');
INSERT INTO t0(c1, c2) VALUES("_⶷_w	nI", 0.752389860419101);
REPLACE INTO t0(c1, c0, c2) VALUES(-239589010, 1079229920, NULL), (0.14520110024174904, 1212654633, NULL), (-546666147, 0.07258431801038334, NULL), (0.3822153866378174, NULL, NULL), (0.923574233622857, 0.1118041313546595, '-788119057');
REPLACE LOW_PRIORITY INTO t0(c0, c1) VALUES("", -1396893531);
INSERT HIGH_PRIORITY INTO t0(c2) VALUES(0.9528411231403249);
select TABLE_NAME, ENGINE from information_schema.TABLES where table_schema = 'database0';
REPLACE LOW_PRIORITY INTO t0(c1) VALUES(NULL);
INSERT INTO t0(c2, c1, c0) VALUES('', 0.375561287719101, -561381508);
REPLACE DELAYED INTO t0(c2, c0) VALUES(-2025801188, NULL);
REPLACE DELAYED INTO t0(c1) VALUES(NULL), (NULL), (1762071404);
INSERT LOW_PRIORITY IGNORE INTO t0(c1) VALUES('쀞0Jd');
DELETE IGNORE FROM t0;
REPLACE INTO t0(c2, c0, c1) VALUES(0.6253181964749023, NULL, NULL), (888001759, -239081148, 0.3906082337718525), (NULL, 0.5586688737539388, NULL), (NULL, 615794320, NULL), (-1849166220, 0.9393324784034404, NULL);
REPLACE INTO t0(c1) VALUES(NULL);
REPLACE LOW_PRIORITY INTO t0(c2, c0, c1) VALUES(NULL, NULL, 'o(fTg'), (-909126894, -5.61381508E8, '6ꐩK^Y4[#Q'), (0.5765530836566802, -307854252, 880914973);
INSERT INTO t0(c1, c2) VALUES("", 0.5807169089183865), (NULL, "1432360005"), (670635765, 1628746943);
INSERT INTO t0(c2, c0) VALUES("0.7126051377144537", "៎AI");
REPLACE INTO t0(c2, c0, c1) VALUES("Be#0!랃YN", NULL, "?8uJ2");
INSERT HIGH_PRIORITY INTO t0(c1, c2) VALUES(NULL, "1673248112");
REPLACE INTO t0(c2) VALUES(NULL);
TRUNCATE TABLE t0;
DELETE IGNORE FROM t0 WHERE 0.5795074530490835;
REPLACE DELAYED INTO t0(c2, c0) VALUES(NULL, 1.942140452E9);
DELETE LOW_PRIORITY QUICK IGNORE FROM t0 WHERE LEAST((t0.c0) > (1518128610), IF(t0.c1, -19466496, t0.c2), (NOT (0.4224603235271819)),  EXISTS (SELECT 1 wHERE FALSE));
select TABLE_NAME, ENGINE from information_schema.TABLES where table_schema = 'database0';
CHECKSUM TABLE t0 EXTENDED;
INSERT HIGH_PRIORITY IGNORE INTO t0(c0) VALUES('NWD');
REPLACE INTO t0(c0, c1, c2) VALUES(0.007537957146379282, "\ru0J", NULL);
INSERT IGNORE INTO t0(c1, c0) VALUES('0.021155648495690293', 2055866235);
INSERT DELAYED IGNORE INTO t0(c2) VALUES(1287214582);
REPLACE INTO t0(c1) VALUES("-278424497"), (NULL), (''), (NULL), (347504061);
INSERT DELAYED INTO t0(c1, c2) VALUES(990540500, "HB");
REPLACE INTO t0(c2, c0) VALUES(823067160, NULL);
INSERT LOW_PRIORITY INTO t0(c0, c1, c2) VALUES("5v", 0.5654704512459118, -3.46589147E8);
REPLACE INTO t0(c0, c1, c2) VALUES('', -225388183, 0.27747208297512005);
REPLACE DELAYED INTO t0(c0) VALUES(NULL);
select TABLE_NAME, ENGINE from information_schema.TABLES where table_schema = 'database0';
INSERT INTO t0(c0) VALUES(NULL);
DELETE QUICK IGNORE FROM t0;
select TABLE_NAME, ENGINE from information_schema.TABLES where table_schema = 'database0';
select TABLE_NAME, ENGINE from information_schema.TABLES where table_schema = 'database0';
INSERT DELAYED INTO t0(c0, c2, c1) VALUES(1552161907, NULL, -1586513798), ('A+ऄr#N', 0.4236676639869805, NULL), (-1467807582, NULL, 0.6503392130416082);
REPLACE LOW_PRIORITY INTO t0(c0) VALUES(NULL);
INSERT IGNORE INTO t0(c2, c0) VALUES(-1747530225, 0.4671671441778463);
select TABLE_NAME, ENGINE from information_schema.TABLES where table_schema = 'database0';
SET SESSION max_points_in_geometry = 298633;
INSERT IGNORE INTO t0(c2) VALUES('1137262844');
SET SESSION show_create_table_verbosity = OFF;
REPLACE LOW_PRIORITY INTO t0(c0) VALUES(0.2034715224028908), (-1438613202), (0.4978720086307491);
REPLACE INTO t0(c0, c2, c1) VALUES(0.7099963347084008, 1628746943, 8.089672859561414E-4);
INSERT LOW_PRIORITY IGNORE INTO t0(c1) VALUES('');
select TABLE_NAME, ENGINE from information_schema.TABLES where table_schema = 'database0';
REPLACE INTO t0(c2, c1) VALUES(0.8478241756046616, 2031353683);
REPLACE INTO t0(c1, c2, c0) VALUES(NULL, '_D(]kA-B', 1001736692);
INSERT LOW_PRIORITY INTO t0(c2) VALUES(1309312895);
REPLACE INTO t0(c1) VALUES(0.41964587154852273);
REPLACE INTO t0(c0) VALUES(0.19699186064316832);
SET SESSION max_length_for_sort_data = 3783978;
INSERT INTO t0(c0, c1) VALUES(0.7746892469573985, NULL);
INSERT IGNORE INTO t0(c1) VALUES(-1492747101), (NULL), (-1875220950);
INSERT HIGH_PRIORITY IGNORE INTO t0(c2) VALUES(-346589147);
DELETE QUICK IGNORE FROM t0;
INSERT IGNORE INTO t0(c1, c0, c2) VALUES(NULL, '{}', -1903308817), (0.08155896092142245, -2113156198, 484991709), (-2080279731, -1981754836, -553381463);
REPLACE LOW_PRIORITY INTO t0(c2, c1, c0) VALUES(NULL, 0.3856561940870582, 0.3562319398130974);
REPLACE INTO t0(c1) VALUES("");
INSERT HIGH_PRIORITY INTO t0(c1, c2) VALUES(NULL, NULL);
SET SESSION max_points_in_geometry = 329273;
INSERT DELAYED IGNORE INTO t0(c1, c0, c2) VALUES("TRUE", 1945914573, -2024566661);
select TABLE_NAME, ENGINE from information_schema.TABLES where table_schema = 'database0';
ANALYZE TABLE t0 UPDATE HISTOGRAM ON c0, c1, c2;

When the initial state of the database is too simple, CERT's way of judging performance issues by comparing rowCount1 and rowCount2 may not work. The check is implemented as follows:

if (increase && rowCount1 > rowCount2 || !increase && rowCount1 < rowCount2) {
    throw new AssertionError("Inconsistent result for query: EXPLAIN " + queryString1 + "; --" + rowCount1
            + "\nEXPLAIN " + queryString2 + "; --" + rowCount2);
}
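For concreteness, plugging the values from the error message above into this check illustrates why the oracle fires. This is only an illustration; the value of increase is an assumption here (the second query only adds a WHERE clause, so its estimate is expected not to grow):

int rowCount1 = 1;        // EXPLAIN estimate of the first query (marked "--1" above)
int rowCount2 = 2;        // EXPLAIN estimate of the more restrictive second query ("--2" above)
boolean increase = false; // assumed: the added predicate should not increase the estimate
if (increase && rowCount1 > rowCount2 || !increase && rowCount1 < rowCount2) {
    // !increase && 1 < 2 holds, so the AssertionError shown above is thrown
    System.out.println("CERT reports an inconsistent estimate: 1 < 2");
}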

Do you think it is possible to improve this in the following ways?

  1. Generating more complex initial database states.
  2. Using a more reliable way to judge performance issues.

Thank you!
@bajinsheng
Collaborator

Thanks for your interest!

CERT does report a lot of issues, but that does not mean these issues are false alarms.

CERT identifies performance issues by examining whether a more restrictive query has a higher estimated cardinality than a less restrictive query (e.g., LEFT JOIN vs. FULL JOIN). If so, there is an issue in cardinality estimation, which would affect performance. Therefore, according to this report, there is an issue in cardinality estimation, as the first (less restrictive) query should be estimated at no fewer than 2 rows.

Nevertheless, I cannot reproduce your results. I reran this test case on 8.0.33; the first query returns 3 rows and the second query returns 2 rows. Both estimated cardinalities are consistent with the oracle, and CERT will not report it in theory.

@jinhui-lai
Author

Both estimated cardinalities are consistent with the oracle, and CERT will not report it in theory.

Regarding this point, the following case may illustrate it:

--java.lang.AssertionError: Inconsistent result for query: EXPLAIN SELECT DISTINCTROW t0.c0 AS ref0 FROM t0, t1 WHERE  EXISTS (SELECT 1) GROUP BY t0.c0; --5
--EXPLAIN SELECT DISTINCTROW t0.c0 AS ref0 FROM t0, t1 WHERE ( EXISTS (SELECT 1)) AND (((+ (t1.c0))) IN (NULL, t1.c0)) GROUP BY t0.c0; --12
--	at sqlancer.mysql.oracle.MySQLCERTOracle.check(MySQLCERTOracle.java:88)
--	at sqlancer.ProviderAdapter.generateAndTestDatabase(ProviderAdapter.java:61)
--	at sqlancer.Main$DBMSExecutor.run(Main.java:387)
--	at sqlancer.Main$2.run(Main.java:582)
--	at sqlancer.Main$2.runThread(Main.java:564)
--	at sqlancer.Main$2.run(Main.java:555)
--	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
--	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
--	at java.base/java.lang.Thread.run(Thread.java:829)
---- Time: 2023/07/10 05:36:45
-- Database: database11
-- Database version: 8.0.33
-- seed value: 1688967294780
DROP DATABASE IF EXISTS database11;
CREATE DATABASE database11;
USE database11;
CREATE TABLE t0(c0 TEXT  STORAGE MEMORY COLUMN_FORMAT FIXED) ;
CREATE TABLE t1 LIKE t0;
CREATE TABLE IF NOT EXISTS t2(c0 LONGTEXT   COMMENT 'asdf'  STORAGE DISK, c1 TINYTEXT  COLUMN_FORMAT DYNAMIC STORAGE DISK NULL) ;
REPLACE INTO t2(c0) VALUES('');
REPLACE DELAYED INTO t2(c1) VALUES(NULL);
TRUNCATE TABLE t2;
DELETE LOW_PRIORITY QUICK IGNORE FROM t2 WHERE (t2.c0) IS FALSE;
REPLACE LOW_PRIORITY INTO t1(c0) VALUES(NULL), (-307622127), (-1020641883), (NULL), (0.41344834529577634);
REPLACE INTO t0(c0) VALUES(NULL);
INSERT DELAYED INTO t0(c0) VALUES(1475373537);
ANALYZE  TABLE t1, t0, t2;
REPLACE INTO t2(c0, c1) VALUES(NULL, 'Qm**8G+');
REPLACE DELAYED INTO t2(c0, c1) VALUES(0.20071582645132857, -1122950538);
INSERT INTO t1(c0) VALUES(0.021357907471556414);
DELETE QUICK IGNORE FROM t0 WHERE (- (LEAST(t0.c0, 0.869009973875369)));
DELETE LOW_PRIORITY QUICK IGNORE FROM t0;
INSERT INTO t1(c0) VALUES(-914502975);
ALTER TABLE t0 INSERT_METHOD FIRST, ENABLE KEYS, ROW_FORMAT COMPACT;
select TABLE_NAME, ENGINE from information_schema.TABLES where table_schema = 'database11';
DELETE QUICK FROM t2;
INSERT INTO t0(c0) VALUES(566309486);
select TABLE_NAME, ENGINE from information_schema.TABLES where table_schema = 'database11';
INSERT IGNORE INTO t0(c0) VALUES(NULL), (0.48473252118101195), ("1263547280");
select TABLE_NAME, ENGINE from information_schema.TABLES where table_schema = 'database11';
select TABLE_NAME, ENGINE from information_schema.TABLES where table_schema = 'database11';
REPLACE DELAYED INTO t1(c0) VALUES(0.9719854361175173);
select TABLE_NAME, ENGINE from information_schema.TABLES where table_schema = 'database11';
REPLACE LOW_PRIORITY INTO t1(c0) VALUES(NULL);
DELETE IGNORE FROM t2;
select TABLE_NAME, ENGINE from information_schema.TABLES where table_schema = 'database11';
DELETE LOW_PRIORITY QUICK IGNORE FROM t2;
REPLACE INTO t0(c0) VALUES(275859224);
CREATE UNIQUE INDEX i0 ON t2((CAST( EXISTS (SELECT 1) AS SIGNED)));
INSERT INTO t1(c0) VALUES(NULL);
ALTER TABLE t2 COMPRESSION 'ZLIB';
select TABLE_NAME, ENGINE from information_schema.TABLES where table_schema = 'database11';
select TABLE_NAME, ENGINE from information_schema.TABLES where table_schema = 'database11';
ALTER TABLE t2 DISABLE KEYS, PACK_KEYS DEFAULT, DROP c1;
ALTER TABLE t0 COMPRESSION 'ZLIB', PACK_KEYS 0, STATS_AUTO_RECALC 1, RENAME TO t0, INSERT_METHOD FIRST;
REPLACE LOW_PRIORITY INTO t1(c0) VALUES('BU3');
INSERT HIGH_PRIORITY INTO t1(c0) VALUES(1635186931);
REPLACE INTO t2(c0) VALUES(0.8328083827218419);
ANALYZE TABLE t0 UPDATE HISTOGRAM ON c0;
ANALYZE TABLE t1 UPDATE HISTOGRAM ON c0;
ANALYZE TABLE t2 UPDATE HISTOGRAM ON c0;

Executing EXPLAIN SELECT DISTINCTROW t0.c0 AS ref0 FROM t0, t1 WHERE EXISTS (SELECT 1) GROUP BY t0.c0; returns:

mysql>  EXPLAIN SELECT DISTINCTROW t0.c0 AS ref0 FROM t0, t1 WHERE  EXISTS (SELECT 1) GROUP BY t0.c0;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                         |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------------------------+
|  1 | PRIMARY     | t0    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    5 |   100.00 | Using temporary               |
|  1 | PRIMARY     | t1    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   12 |   100.00 | Using join buffer (hash join) |
|  2 | SUBQUERY    | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | No tables used                |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------------------------+

Executing SELECT DISTINCTROW t0.c0 AS ref0 FROM t0, t1 WHERE EXISTS (SELECT 1) GROUP BY t0.c0; returns:

mysql> SELECT DISTINCTROW t0.c0 AS ref0 FROM t0, t1 WHERE  EXISTS (SELECT 1) GROUP BY t0.c0;
+---------------------+
| ref0                |
+---------------------+
| 275859224           |
| 1263547280          |
| 0.48473252118101195 |
| NULL                |
| 566309486           |
+---------------------+
5 rows in set (0.00 sec)

Executing EXPLAIN SELECT DISTINCTROW t0.c0 AS ref0 FROM t0, t1 WHERE ( EXISTS (SELECT 1)) AND (((+ (t1.c0))) IN (NULL, t1.c0)) GROUP BY t0.c0; returns:

mysql> EXPLAIN SELECT DISTINCTROW t0.c0 AS ref0 FROM t0, t1 WHERE ( EXISTS (SELECT 1)) AND (((+ (t1.c0))) IN (NULL, t1.c0)) GROUP BY t0.c0;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                         |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------------------------+
|  1 | PRIMARY     | t1    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   12 |    20.00 | Using where; Using temporary  |
|  1 | PRIMARY     | t0    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    5 |   100.00 | Using join buffer (hash join) |
|  2 | SUBQUERY    | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | No tables used                |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------------------------+
3 rows in set, 1 warning (0.00 sec)

Executing SELECT DISTINCTROW t0.c0 AS ref0 FROM t0, t1 WHERE ( EXISTS (SELECT 1)) AND (((+ (t1.c0))) IN (NULL, t1.c0)) GROUP BY t0.c0; returns:

mysql> SELECT DISTINCTROW t0.c0 AS ref0 FROM t0, t1 WHERE ( EXISTS (SELECT 1)) AND (((+ (t1.c0))) IN (NULL, t1.c0)) GROUP BY t0.c0;
+---------------------+
| ref0                |
+---------------------+
| 566309486           |
| NULL                |
| 0.48473252118101195 |
| 1263547280          |
| 275859224           |
+---------------------+
5 rows in set (0.00 sec)

The order of the tables returned by EXPLAIN is not fixed, so SQLancer may write a wrong rowCount2 into the .log file. The relevant code is as follows:

    private int getRow(SQLGlobalState<?, ?> globalState, String selectStr, List<String> queryPlanSequences)
            throws AssertionError, SQLException {
        int row = -1;
        String explainQuery = "EXPLAIN " + selectStr;

        // Log the query
        if (globalState.getOptions().logEachSelect()) {
            globalState.getLogger().writeCurrent(explainQuery);
            try {
                globalState.getLogger().getCurrentFileWriter().flush();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        // Get the row count
        SQLQueryAdapter q = new SQLQueryAdapter(explainQuery, errors);
        try (SQLancerResultSet rs = q.executeAndGet(globalState)) {
            if (rs != null) {
                while (rs.next()) {
                    int estRows = rs.getInt(10); // column 10 of EXPLAIN output: "rows" (estimated rows examined)
                    if (row == -1) {
                        row = estRows; // only the estimate of the first EXPLAIN row is kept
                    }
                    String operation = rs.getString(2); // column 2 of EXPLAIN output: "select_type"
                    queryPlanSequences.add(operation);
                }
            }
        } catch (Exception e) {
            throw new AssertionError(q.getQueryString(), e);
        }
        if (row == -1) {
            throw new IgnoreMeException();
        }
        return row;
    }
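If the mismatch is indeed caused by EXPLAIN listing the operations in a different order, a minimal sketch of an order-independent reading could look like the following. This is not SQLancer's actual implementation; it reuses the SQLancer helper types from the method above and would additionally need java.util.Map and java.util.HashMap:

// Hypothetical, order-independent variant of getRow(): collect the estimate of
// each operation keyed by the "table" column (column 3 of MySQL's EXPLAIN output)
// and aggregate the values, so the result no longer depends on which table the
// optimizer happens to list first.
private int getAggregatedRowEstimate(SQLGlobalState<?, ?> globalState, String selectStr)
        throws SQLException {
    String explainQuery = "EXPLAIN " + selectStr;
    Map<String, Integer> estimatePerTable = new HashMap<>();
    SQLQueryAdapter q = new SQLQueryAdapter(explainQuery, errors);
    try (SQLancerResultSet rs = q.executeAndGet(globalState)) {
        if (rs != null) {
            while (rs.next()) {
                String table = rs.getString(3); // "table"; NULL for operations without a table
                if (table != null) {
                    estimatePerTable.put(table, rs.getInt(10)); // "rows" estimate
                }
            }
        }
    } catch (Exception e) {
        throw new AssertionError(q.getQueryString(), e);
    }
    if (estimatePerTable.isEmpty()) {
        throw new IgnoreMeException();
    }
    // Summing the per-table estimates is only one possible aggregation; comparing
    // the estimates table by table would be another option.
    return estimatePerTable.values().stream().mapToInt(Integer::intValue).sum();
}

The point of the sketch is only that the value compared by the oracle should not depend on which table MySQL lists first in the EXPLAIN output.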

All in all, my point is that finding real performance bugs among thousands of .log files is not easy.

Thank you for your reply!

@bajinsheng
Collaborator

bajinsheng commented Jul 11, 2023

SQLancer may write a wrong rowCount2 into the .log file.

Can you clarify why you think it is wrong?

From the query plans you posted, the estimated cardinality of the first query is 5, and that of the second query is 12. The second query is more restrictive than the first query, so these numbers are unexpected and indicate a potential performance issue. What we examine is the number of rows in the query plans, not the actual number of rows returned.

All in all, my point is that finding real performance bugs among thousands of .log files is not easy.

As far as I know, I have not found any false alarms in these reports. These issues are true issues in cardinality estimation, and fixing them would definitely improve performance, as we demonstrated in the paper.

@jinhui-lai
Author

Hello!

The estimated cardinality of the first query is 5 (t0) and 12 (t1), and that of the second query is likewise 12 (t1) and 5 (t0). So I think there is no difference between the estimated cardinalities of these two queries.

As I have mentioned, the order of the table rows returned by EXPLAIN is not fixed (sometimes t0 is returned first, sometimes not), but the code above does not account for that.

Thank you!

@jinhui-lai jinhui-lai changed the title Oracle CER May Have A High False Positive Rate Oracle CERT May Have A High False Positive Rate Jul 11, 2023
@bajinsheng
Collaborator

Thanks for pointing out this issue!

I see. The reason is that the two query plans list the table-scan operations in different orders. Can you give me more details on how you configured MySQL? I did not know that the ordering of operations in query plans can vary. Also, I reran this test case and have not observed the second query plan, in which t1 appears before t0. In theory, the first operation in a query plan represents the root operation, and the ordering of operations should be fixed. Therefore, I am not sure how this can happen, as I have never observed it before.

@jinhui-lai
Author

Thank you for your reply.

I compiled and installed MySQL from the GitHub source and initialized the data directory with --initialize-insecure, as follows:

/home/mysql/myinstall/bin/mysqld  --initialize-insecure --basedir=/home/mysql/myinstall --datadir=/home/mysql/myinstall/data
/home/mysql/myinstall/bin/mysqld --basedir=/home/mysql/myinstall --datadir=/home/mysql/myinstall/data -P 3306  &
java -jar /home/sqlancer/target/sqlancer-2.0.0.jar --num-tries 10000 --username root --password '' mysql --oracle CERT

There is no other configuration. As for whether the order of operations should be fixed, I have not been able to verify it yet.

@bajinsheng
Collaborator

I may need some time to investigate why the query plans in MySQL are not in a fixed order.

For your first suggestion, a more complicated database state, I agree. You can consider adjusting the parameter --max-num-inserts to insert more rows in tables. The default number is 30, which is not ideal for the CERT oracle.
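For example, the invocation from the earlier comment could be extended like this (assuming --max-num-inserts is a general option passed before the DBMS name; the value 300 is only an illustrative choice, not a recommended setting):

java -jar /home/sqlancer/target/sqlancer-2.0.0.jar --num-tries 10000 --max-num-inserts 300 --username root --password '' mysql --oracle CERT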

Thanks for your feedback!
