Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batch rewrite support for psql 15 Merge statement (#2579) #2582

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

mmimica
Copy link

@mmimica mmimica commented Aug 2, 2022

See #2579.

Adds parser support for MERGE statements. Checks its specific to enable "batchReWrite" feature originally written for INSERT statements.

@vlsi
Copy link
Member

vlsi commented Aug 2, 2022

Thanks for putting the fix together.

However, I wonder what would be the behavior
a) if the syntax does not match exactly. For instance merge into ... using (select * from values(?,?)). Would it still work?
b) What happens if the bind variable is used in when block. For instance when matched then insert (id, name) values (src.id, ?). Would it still replace?

Would you please update the documentation to list the supported and unsupported syntaxes? It would be nice to have them tested as well (I mean the tests should use end-to-end MERGE + batch API).

@mmimica
Copy link
Author

mmimica commented Aug 2, 2022

Hi.

I believe the correct answers are
a) yes
b) no

I've pushed a couple of more test-cases that confirm it.

I admit the solution is a bit hackish, but so is the whole feature.

@davecramer
Copy link
Member

@vlsi can we merge this ?

@vlsi vlsi force-pushed the feature/support_merge_rewrite branch 2 times, most recently from 6537884 to a0861f2 Compare January 16, 2023 08:58
@davecramer
Copy link
Member

Is it possible to test when MERGE won't work ?

@vlsi
Copy link
Member

vlsi commented Jan 17, 2023

We should definitely test and document cases when rewrite is not activated

Comment on lines 1390 to 1392
if (!TestUtil.haveMinimumServerVersion(con, ServerVersion.v15)) {
return;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference, this should be assume so the test is displayed as skipped rather than successful.

@vlsi
Copy link
Member

vlsi commented Jan 17, 2023

I'm inclined to add pgjdbc-jfrunit subproject with https://github.com/moditect/jfrunit tests.
That would be executed with Java 17+ only, however, it would enable us to assert on the number of network roundtrips, so we could properly assert that batching works as expected.

@davecramer
Copy link
Member

Not sure the effort justifies, but I'll leave that up to you.

@sehrope
Copy link
Member

sehrope commented Jan 17, 2023

-1 to adding more dependencies for something like that.

If you want to test if the batching is done in a single command you could use defaults on the table to infer how many actual commands were used to populate the data. That'd work on across all JDK and PG versions too:

CREATE TABLE t (
 a int PRIMARY KEY,
 c_timestamp timestamptz DEFAULT clock_timestamp(), -- instatanenous wall time
 s_timestamp timestamptz DEFAULT statement_timestamp(), -- statement start time
 tx_timestamp timestamptz DEFAULT transaction_timestamp() -- transactionst start time
);

BEGIN;
INSERT INTO t (a) VALUES (1);
INSERT INTO t (a) VALUES (2), (3);
INSERT INTO t (a) SELECT generate_series(4, 100);
COMMIT;

SELECT
  COUNT(DISTINCT c_timestamp),
  COUNT(DISTINCT s_timestamp),
  COUNT(DISTINCT tx_timestamp)
FROM t;

This results in:

 count | count | count 
-------+-------+-------
   100 |     3 |     1

The only thing that we care about is the number of statements which would be the number of distinct statement timestamps. While it's theoretically possible for back to back statements to have the same timestamp, that's not going to happen in practice as we're talking about microsecond resolution. That's plenty for something that's just part of a test suite.

@vlsi
Copy link
Member

vlsi commented Jan 17, 2023

Is there a query to measure the number of network roundtrips and bytes sent/received?

@sehrope
Copy link
Member

sehrope commented Jan 17, 2023

I'm not aware of any functions for network traffic exposed by the server. There's also a bit of Heisenberg principle in asking the server for that information.

You could use a custom socket factory that tracks that kind of information. At least of the total bytes. Round trips are tricky as you wouldn't really know whether it's being buffered.

In any case, I can't imagine it'd be cleaner than the statement timestamp approach. I suggest going that route if you want to verify the number of individual server statements actually executed.

@mmimica
Copy link
Author

mmimica commented Jan 18, 2023

There is also SELECT xact_commit FROM pg_stat_database. It's global to the database and affected by background processes, but it should glow at a lower rate with bulk operations.

@vlsi
Copy link
Member

vlsi commented Jan 18, 2023

@mmimica do you think you could add tests that would cover the following?
a) ensure the rewrite did happen (e.g. via statement_timestamp)
b) ensure the rewrite does not happen for unsupported cases, and they still work

}

TestUtil.assertNumberOfRows(con, "batchingMerge", 2, "2 rows expected");
TestUtil.assertNumberOfRows(con, "batchingMerge", cnt - 2, "14 rows expected");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

14 will be a part of "expected: ... got:..." message anyway.
Please clarify in the message parameter the reasons why you expect certain things to happen.

Assert.assertEquals("hello 102", TestUtil.queryForString(con, "SELECT col1 FROM batchingMerge WHERE id = 102"));
Assert.assertEquals("hello 103", TestUtil.queryForString(con, "SELECT col1 FROM batchingMerge WHERE id = 103"));
Assert.assertEquals(insertRewrite ? "1" : String.valueOf(cnt - 2), TestUtil.queryForString(con, "SELECT COUNT(DISTINCT ts) FROM batchingMerge"));
TestUtil.assertNumberOfRows(con, "batchingMerge", expectedCnt, "Number of inserted rows.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suppose the test fails with "Number of inserted rows. expected: 14 got: 15".
Does it look like a good bug report to you?

Frankly speaking, I would not say that message clarifies the reasons why 14 was expected.
It would be much better if you put a message that clarifies why you expect certain things.

As an alternative, it might work even better if you select full table contents, and verify it exactly. Then the failure message would shed some light on which rows were not expected, and so on.

@vlsi vlsi force-pushed the feature/support_merge_rewrite branch from f2c3728 to 073ea73 Compare May 12, 2024 13:09
@vlsi vlsi added the triage/needs-review Issue that needs a review - remove label if all is clear label May 12, 2024
@vlsi
Copy link
Member

vlsi commented May 12, 2024

I've rebased the PR, however, I have not reviewed the code and tests yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
triage/needs-review Issue that needs a review - remove label if all is clear
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants