Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong results returned from RUM index with order_by_attach=TRUE #79

Open
michaelwu0505 opened this issue Feb 22, 2020 · 0 comments
Open

Comments

@michaelwu0505
Copy link

michaelwu0505 commented Feb 22, 2020

Below are two cases showing wrong results returned from RUM index with order_by_attach=TRUE. In both test cases, if order_by_attach=FALSE, correct results will be returned.

Tested with PostgreSQL 12.2 & newest commit e34375a.

CASE 1:

CREATE TABLE test (
	id bigint NOT NULL,
	folder bigint NOT NULL,
	time bigint NOT NULL,
	tsv tsvector NOT NULL
)
INSERT INTO test (id, folder, time, tsv) VALUES (1, 10, 100, to_tsvector('wordA'));
INSERT INTO test (id, folder, time, tsv) VALUES (2, 20, 200, to_tsvector('wordB'));
INSERT INTO test (id, folder, time, tsv) VALUES (3, 10, 300, to_tsvector('wordA'));
INSERT INTO test (id, folder, time, tsv) VALUES (4, 20, 400, to_tsvector('wordB'));

Below shows expected results when select without index:

SELECT * FROM test WHERE tsv @@ (to_tsquery('wordA')) AND (folder = 10::bigint)

Returns rows with id 1 and 3.

SELECT * FROM test WHERE tsv @@ (to_tsquery('wordA')) AND (folder = 10::bigint) ORDER BY time <=| 500::bigint

Returns rows with id 3 and 1.

After creating the following index, wrong results will be returned:

CREATE INDEX test_idx ON test USING rum(folder, tsv rum_tsvector_addon_ops, time) with (attach = 'time', to = 'tsv', order_by_attach=TRUE);
SET enable_seqscan = OFF;
SELECT * FROM test WHERE tsv @@ (to_tsquery('wordA')) AND (folder = 10::bigint)

Wrong result: Returns only row with id 1. (Expects 1 and 3)

SET enable_seqscan = OFF;
SELECT * FROM test WHERE tsv @@ (to_tsquery('wordA')) AND (folder = 10::bigint) ORDER BY time <=| 500::bigint

Wrong result: Returns nothing. (Expects rows with id 3 and 1)

CASE 2:

CREATE TABLE test2 (
	id bigint NOT NULL,
	time bigint NOT NULL,
	tsv tsvector NOT NULL
)
CREATE OR REPLACE PROCEDURE test2_init()
AS $$
DECLARE
	counter INTEGER := 1; 
	str TEXT;
	time BIGINT;
BEGIN
	WHILE counter <= 1000 LOOP
	
		IF counter % 10 = 0 THEN
			str := 'wordA wordB';
		ELSEIF counter % 11 = 0 THEN
			str := 'wordA wordB wordC';
		ELSE
			str := 'wordA wordD';
		END IF;
		
		-- insert rows with alternating time
		IF counter % 2 = 0 THEN
			time := counter;
		ELSE
			time := -counter;
		END IF;
		
		
		INSERT INTO test2 (id, time, tsv) VALUES (counter, time, to_tsvector(str));
		
		counter := counter + 1;
	END LOOP;
END
$$
LANGUAGE plpgsql;

CALL test2_init();

Below shows expected results when select without index:

SELECT * from test2 where tsv @@ (to_tsquery('wordA & wordB')) ORDER BY time <=| 1001::bigint 

Returned 181 rows.

After creating the following index, wrong results will be returned:

CREATE INDEX test2_idx ON test2 USING rum(tsv rum_tsvector_addon_ops, time) with (attach = 'time', to = 'tsv', order_by_attach=TRUE);
SET enable_seqscan = OFF;
SELECT * from test2 where tsv @@ (to_tsquery('wordA & wordB')) ORDER BY time <=| 1001::bigint 

Wrong result: Returns only 153 rows. (Expects 181 rows)

If the above query changes from ORDER BY time <=| 1001::bigint to ORDER BY <=> 1001::bigint, then correct number of rows will be returned.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant