New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Database HA with multiple <connection-url> values + <check-valid-connection-sql> always connecting to first database even if it is a bad instance #748
Comments
Did you try setting targetServerType to primary? |
I did not try that, I was not even aware of its existence. It would mean using a single And yes that might be another possible workaround but the JDBC driver for Postgresql would afaict need to enforce the following:
I don't really see this issue as a Postgresql issue though, I see it as a general problem that non-validated connections are handed out to the Java application. Partly this could also be seen as a documentation problem where |
I have tried this with JBoss with flush proeprty set to flush all failed and invalid connections. With targetServerType primary, it always tries to connect to master/writer and in case of failover, still will connect to master, if any of the node2 or node3 replicas has been promoted to master
Regards
Aamir
…________________________________
From: bergner ***@***.***>
Sent: Tuesday, March 29, 2022 12:07:16 AM
To: ironjacamar/ironjacamar ***@***.***>
Cc: aamirraza ***@***.***>; Comment ***@***.***>
Subject: Re: [ironjacamar/ironjacamar] Database HA with multiple <connection-url> values + <check-valid-connection-sql> always connecting to first database even if it is a bad instance (Issue #748)
I did not try that, I was not even aware of its existence. It would mean using a single instead with jdbc:postgresql://node1,node2,node3/accounting?targetServerType=master as documented here https://jdbc.postgresql.org/documentation/head/connect.html
And yes that might be another possible workaround but the JDBC driver for Postgresql would afaict need to enforce the following:
* On initial connection discover the master
* Either the JDBC driver needs to on its own validate the connection on every use and transparently switch to a different node, or we still need the to fail so a new connection is initialized
I don't really see this issue as a Postgresql issue though, I see it as a general problem that non-validated connections are handed out to the Java application. Partly this could also be seen as a documentation problem where and documentation and examples should be much more clear about the need for exceptions to be thrown in order for a connection to be considered "bad". It is imho very unintuitive that a new connection does not go through the + check before being considered a valid connection.
—
Reply to this email directly, view it on GitHub<#748 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ABEIY4JGJMBWCDD6GGXRM6LVCGVIJANCNFSM5QZNK3TQ>.
You are receiving this because you commented.Message ID: ***@***.***>
|
If you stumble on this issue facing a similar problem, see the "Workaround" section at the bottom.
Setup
<connection-url>
contains jdbc:postgresql:... URLs to all 3 database servers<validate-on-match>
is set to true<background-validation>
is set to false<check-valid-connection-sql>
has been set to a couple of different statements in order to try to get the desired result that Wildfly/Ironjacamar only hands out connections to the primary database, regardless of which of the 3 nodes is the primary database.Problem
The
<check-valid-connection-sql>
does not seem to run at all on initial connection. Instead the connection returned back to the application is the first entry in the<connection-url>
list, regardless if that would be considered valid by the<check-valid-connection-sql>
or not. I see no signs in JDBC Spy log output that this SQL statement actually runs. The application can end up getting connected to a read replica and repeatedly fails with this error, without ever performing the SQL connection check:Attempt 1: SELECT 1 WHERE pg_is_in_recovery() = FALSE
This seemed like the first logical attempt to distinguish the primary database from a read replica, and also the reason why I felt
<validate-on-match>
true would be a must. The statement just doesn't seem to get executed on initial startup. If I reconfigure<connection-url>
so that the first entry is the current primary it does connect ok and I can see the<check-valid-connection-sql>
being executed, but that seems to happen MUCH later than the initial connection attempt.Attempt 2: Raise Postgresql exception on replicas
After skimming through 1000s of lines of Java code in Ironjacamar and Wildfly it seemed like a lot of the connection validation is centered around use of SQLException. For that reason I tried to change the
<check-valid-connection-sql>
to a statement that would raise an error on read replicas which I then hoped would propagate as an SQLException in the Java code. This was a long shot but I tried this after some testing in the psql cli:Again this does not work because the valid connection check seems to happen much later and if we accidentally chose a read replica to begin with we never get to the point of checking the connection with this SQL statement.
Expected behavior
Any connection returned to Wildfly or the Java application in question should have been validated according to the declared rules.
Questions
<validate-on-match>
true should mean that anyone requesting a connection would have synchronous connection validation happening before the connection is handed out to the application? I believe so but documentation is slightly vague here.<check-valid-connection-sql>
wouldn't be the first statement executed on a database connection?Log snippets
In this log example we have dbserv3, dbserv2, dbserv as databases in that order in
<connection-url>
with dbserv being the current primary. Putting dbserv as the first entry makes the service start ok.Further debugging and pinpointing
Afaict the gory details of the call chain to get a connection to a data source is shown below but in essence it boils down to a long chain of calls starting frmo WrapperDataSource's getConnection() without arguments ending up calling the LocalManagedConnection constructor with the JDBC connection to the first database listed in
<connection-url>
. LocalManagedConnection constructor does not lead to immediate validation of the connection and hence no exception is thrown and the second URL in<connection-url>
is not checked (LocalManagedConnectionFactory tries the other URLs if the first one fails, but if doesn't fail) because the JDBC connection to a read replica is still considered an ok connection at that stage.What I can seemingly try here is to set
<new-connection-sql>
to the earlier<check-valid-connection-sql>
that raised a Postgresql exception and then maybe get past this initial connection problem. The ValidateOnMatch logic in SemaphoreConcurrentLinkedDequeManagedConnectionPool only seems to come into play for already established connections in the connection pool and not for the initial connection.Workaround
The below data source configuration is the only one I've found that seems to actually work. Both
<new-connection-sql>
and<check-valid-connection-sql>
needs to be set to SQL statements that cause an exception to be thrown on the replica databases. This is because the<validate-on-match>
logic is too far away from the logic that tries other URLs. I still consider this to be an issue in the Ironjacamar library but at least this workaround is a viable alternative for Postgresql. You will have some ugly stack traces in your log file but at least those stack traces are clear in the sense that they contain the "Database is a read-only replica" message.The text was updated successfully, but these errors were encountered: