Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use SELECT...FOR UPDATE to update session data in the database instead of DELETE, INSERT. #596

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

ChristopherSchultz
Copy link
Contributor

I have only compile-tested this; I wanted to get feedback on the approach, how to handle errors, etc.

@isapir
Copy link
Member

isapir commented Mar 8, 2023

@ChristopherSchultz Is there a list of supported database systems with which the DataSourceStore is compatible? Are you sure that they all support "SELECT FOR UPDATE"? I tried to look that up and the "best" thing I found was https://www.sql-workbench.eu/dbms_comparison.html

}

_conn.commit();
} catch (SQLException sqle) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to put all the handlers in the same block [1]? e.g.

} catch (SQLException | RuntimeException | Error e) {
    // looks like mostly the same block to me
}

We don't need to support Java 6

[1] https://docs.oracle.com/javase/7/docs/technotes/guides/language/catch-multiple.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be okay merging them together, but we might want to use different error messages. This is one of the things I wanted to get feedback on: how important is it to say "we got an Error" vs "we got an SQLException", etc. when the actual exception and stack trace will likely be logged? I think it's probably okay to merge these exception handlers together but wanted some feedback about the logging.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Stack Trace would be the same, true, but the exception class and message would still provide the details

@ChristopherSchultz
Copy link
Contributor Author

@ChristopherSchultz Is there a list of supported database systems with which the DataSourceStore is compatible? Are you sure that they all support "SELECT FOR UPDATE"? I tried to look that up and the "best" thing I found was https://www.sql-workbench.eu/dbms_comparison.html

Good question. I was originally going to use MERGE but the database I have at-hand (MariaDB) doesn't support it, so I went with SELECT...FOR UPDATE. It's not clear to me if the SELECT...FOR UPDATE NOWAIT listed in that table is specifically talking about the NOWAIT part of the query. For example, I know that both MariaDB and MySQL have supported SELECT...FOR UPDATE for a long time, but the entry for MySQL states "Since 8.0" and MariaDB says "No". A quick check of the HSQLDB Documentation shows that SELECT...FOR UPDATE is available while that list claims it is not supported. Same thing with IBM DB2 and SQL Server. But it's clear that not all RDMBSs support that syntax.

Maybe it's not possible to write a single code-path to satisfy all major RDBMS systems. I'd be happy to change the PR to offer a selection between the two via configuration.

@isapir
Copy link
Member

isapir commented Mar 9, 2023

Yeah, I guess that site is not up to date. I also used SELECT FOR UPDATE in MySQL 5.7.

There is also INSERT ON CONFLICT UPDATE support in MySQL and Postgres, but it would be difficult to find an optimized solution that fits all.

@aooohan
Copy link
Member

aooohan commented Mar 10, 2023

I have a question that why we don't add a real primary key(auto-increment) to solve the problem that primary key constraint violation when insert data to database simultaneously? And we can select session data from table by session-id and order (DESC) by ID (real primary key) when load session, then the newest result is what we need. For save can still keep the original code logic, delete (by session-id ) first and insert later. Thus, we can avoid adding lock(FOR UPDATE or others) from the database level by this way.
I think this will work and so simple, but I don't know if there will be any security issues.

Thoughts?

@ChristopherSchultz
Copy link
Contributor Author

The problem is that there is a window of opportunity between the existing DELETE and INSERT where the session_id column (which is UNIQUE or equivalent) can be INSERTed by another thread. Changing the definition of the table to include a separate PK and removing the UNIQUE constraint from the session_id column is one way to solve this issue, but it would break any existing installation which is not using that kind of table definition.

I would definitely not want to do that for a point-release, and my goal here is to back-port this all the way back to 8.5 at this point. I think some larger changes could be made separately from this current effort, and restricted to maybe Tomcat 11.0.x and later.

@ChristopherSchultz
Copy link
Contributor Author

I've been thinking about this more and I think it could cause problems for people not using appId+sessionId (or just sessionId) as the primary key for the DB table storing the sessions. INSERT and DELETE are not required to include a value for the PK. Some databases can automatically-allocate a value for that purpose and so the existing DELETE+INSERT scheme works for them. But the SELECT ... FOR UPDATE will not because it does not include the table's PK.

One way to fix this would be to add an optional PK column (columns?) to the configuration.

I think this is pretty important, because databases which use PK-indexed organization (SQL Server, InnoDB, and others) will thrash-around if appId+sessionId are the PK columns, so anybody using those kinds of DBs would likely choose a separate PK column that doesn't cause those ugly performance problems.

Any other thoughts?

@MikeNeilson
Copy link

Considering the difficult of handling all situation for all databases perhaps the following:

Punt the session update operation to a callback or interface, have one or two sensible defaults (like for the major databases and specific table configuration) and have a way for the user to provide their own implementation? Adds a little of complexity but code flow would arguably be cleaner and more organized since the DataStoreSource::save wouldn't need to have a bunch of knowledge on how it works. And it would be more flexible for users with more complex setups that they can't change.

Basically something similar to how the X509UserNameProvider works.

@isapir
Copy link
Member

isapir commented Mar 11, 2023

I would love to see an interface that goes a step further and allows for NoSQL implementations as well. For example, Redis is an excellent option for a data store IMO.

@isapir
Copy link
Member

isapir commented Mar 12, 2023

One way to fix this would be to add an optional PK column (columns?) to the configuration

+1 for a Primary Key column. It improves performance and is supported by all SQL compliant DBMS's.

@aooohan
Copy link
Member

aooohan commented Mar 13, 2023

One way to fix this would be to add an optional PK column (columns?) to the configuration.

+1

@ChristopherSchultz
Copy link
Contributor Author

I have a question that why we don't add a real primary key(auto-increment) to solve the problem that primary key constraint violation when insert data to database simultaneously?

The problem is that there is a window of opportunity between the existing DELETE and INSERT where the session_id column (which is UNIQUE or equivalent) can be INSERTed by another thread. Changing the definition of the table to include a separate PK and removing the UNIQUE constraint from the session_id column is one way to solve this issue, but it would break any existing installation which is not using that kind of table definition.

I would definitely not want to do that for a point-release, and my goal here is to back-port this all the way back to 8.5 at this point. I think some larger changes could be made separately from this current effort, and restricted to maybe Tomcat 11.0.x and later.

@ChristopherSchultz
Copy link
Contributor Author

I would love to see an interface that goes a step further and allows for NoSQL implementations as well. For example, Redis is an excellent option for a data store IMO.

This is the whole point of the Store interface. If you want to implement your own strategy, you can implement it however you want. I believe there are some Redis- and Memcached-based Store implementations already available from third-parties.

@ChristopherSchultz
Copy link
Contributor Author

Punt the session update operation to a callback or interface, have one or two sensible defaults (like for the major databases and specific table configuration) and have a way for the user to provide their own implementation?

If you are going to provide your own implementation, you could just subclass DataSourceStore and change the behavior of the store (or load, etc.) methods.

This does bring up a good point: rather than having a new configuration option, I could simply have a subclass of DataSourceStore which overrides the store method to do the SELECT...FOR UPDATE. Then there is a much smaller change to the existing code, which is beneficial for stability. In fact, I think no existing code needs to change; everything is in the new class, including this optional new primary key column option.

@isapir
Copy link
Member

isapir commented Mar 13, 2023

This does bring up a good point: rather than having a new configuration option, I could simply have a subclass of DataSourceStore which overrides the store method to do the SELECT...FOR UPDATE. Then there is a much smaller change to the existing code, which is beneficial for stability. In fact, I think no existing code needs to change; everything is in the new class, including this optional new primary key column option.

That's a great idea. It can also serve as a model for others who might want to extend the DataSourceStore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants