Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: clear session pool on connection close #543

Merged
merged 5 commits into from Nov 4, 2020

Conversation

IlyaFaer
Copy link
Contributor

@IlyaFaer IlyaFaer commented Oct 23, 2020

To avoid keeping Cloud Spanner sessions alive after connection close we should clear the related pool on the connection close.

@IlyaFaer IlyaFaer added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. api: spanner Issues related to the googleapis/python-spanner-django API. labels Oct 23, 2020
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Oct 23, 2020
@IlyaFaer IlyaFaer requested a review from c24t October 23, 2020 09:02
@IlyaFaer IlyaFaer marked this pull request as ready for review October 23, 2020 09:03
@IlyaFaer IlyaFaer requested a review from a team as a code owner October 23, 2020 09:03
Copy link
Contributor

@c24t c24t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @olavloite in https://github.com/googleapis/python-spanner-django/pull/543/files#r510984476 that this shouldn't clear the session pool if it doesn't own it.

FWIW in python-spanner Database doesn't expose the session pool or give users an option to clear it (unless they created the pool themselves). Like #535 (comment) this might be an argument for changes to python-spanner instead of this library.

@IlyaFaer
Copy link
Contributor Author

I still don't see how one can use single pool for several databases. Pool is binded to a concrete database, if we'll use the same pool for connections to different databases, it's not gonna be correct. Thus, for me it looks like users should not use global pools at all, even for several connections to the same database, as we're not very able to control whether the pool is used only for single database.

@olavloite
Copy link
Contributor

olavloite commented Oct 27, 2020

I still don't see how one can use single pool for several databases. Pool is binded to a concrete database, if we'll use the same pool for connections to different databases, it's not gonna be correct. Thus, for me it looks like users should not use global pools at all, even for several connections to the same database, as we're not very able to control whether the pool is used only for single database.

Yes, that is correct, but I'm not sure that is what @c24t meant. So to summarize:

  1. A session pool is as @IlyaFaer noted always linked with one database. It is not possible to have a single session pool for multiple different databases.
  2. The current implementation of the DB API requires the user to supply a database instance when a connection is created. That means that the connection is not the owner of the database instance, and the connection should therefore not close or clear the session pool when the connection is closed. That is an action that the client application should take. We should add this to the documentation so it's clear for users what we expect from them.
  3. Users should not create a new database instance for each connection they want to open. Instead, connections for the same database should use the same database instance. This will effectively allow multiple connections to share the same session pool. This should also be documented.
  4. Assuming that we will be supporting SQLAlchemy in the future: SQLAlchemy expects users to create connections by specifying a connection string in a very similar way as connections are created in for example JDBC or ADO.NET. Once we support that, we will need to implement some kind of 'database pool' to ensure that when a user creates multiple connections to the same database, we re-use the database instance that we have already created for a previous connection for the same database.

My comment on using something like a global session pool was aimed at that last point from the list above, where 'global' means 'global per database'. So if a user creates multiple connections to different databases, we also need multiple session pools (one for each unique database).

The current solution of requiring the user to supply a database instance when creating a connection is as far as I can tell fine for the current use case (Django support).

@IlyaFaer
Copy link
Contributor Author

IlyaFaer commented Oct 30, 2020

@olavloite, our main way to create a connection is actually connect function, and it works with string ids of instance and database. However, one can send a session pool into this function. If user not setting pool, it'll be created by the original Spanner Database object here.

Thus, in fact user can use global pools, but looking at how Database works with the given pool, I'm starting to think that the original Spanner client doesn't consider using global pools - it'll bind the given pool to the database without checking if this pool is already binded to another database. I mean, in the original client user can propagate one pool to several different databases, it'll be incorrect, but we don't think about it (it looks like we don't). So, for me, it seems like Database intended to use its own pool. Still, one can pass the concrete pool to set the right type of a pool - not to implement one global pool for several Database objects. PTAL at the docs - they also describe only different types of pools, not practices about using global ones. That's why I consider closing pool by the connection as a right way of things. WDYT?

As the original client doesn't care about closing a binded pool of the database, I don't think this PR is very major. If we don't have a good and simple way to close the pool, I'm okay to just close the PR.

@olavloite
Copy link
Contributor

@IlyaFaer I would say that it would normally be a good practice to let the owner of the pool decide when a pool is closed. So that would mean:

  1. If the client application passes in a pool to a database or a connection, then the client application is the owner of the pool and should close it when appropriate. If we were to let the connection close such a session pool, we might cause unexpected errors if the same session pool is used for multiple connections.
  2. If the client application does not pass in a pool to a database or a connection, then the database/connection creates its own session pool. That session pool is owned by the database/connection, and should be closed by the database/connection whenever the database/connection is closed. The drawback of this solution for connections is that a connection will never really need more than one session, so it's quite an overkill to create a separate session pool for each connection.

A global session pool that can be used for different backend databases is as you say not possible, as a session pool is always bound to a specific database on the backend. What I meant with a global session pool is a global session pool per database. That would basically mean a pool of session pools with the database-id as the key for getting a session pool from this 'session pool pool'. That is something that would be preferable in the end for the SQLAlchemy implementation.

I had a closer look at the current database and sessionpool implementations in the Spanner Python client library, and they do a lot less than I had expected... For now I would say that we should do the following:

  1. When a user creates a connection and thereby specifies a pool that should be used (i.e. the client application passes in an actual session pool): When closing the connection we should do nothing.
  2. When a user creates a connection without a specific pool: Create a new session pool and use that to create the database instance. When the connection is closed, we should also close the session pool that was created when the connection was created.

@IlyaFaer
Copy link
Contributor Author

IlyaFaer commented Nov 2, 2020

@olavloite, the thing which I don't like in our case is that we're initiating Database() object by connect() function, and then we're passing it into Connection() object. So, Connection() object getting Database() object with initiated pool, and it doesn't know how this database got the pool - created its own or used the one passed into constructor. We could expose Connection() object constructor argument or a class property to let Connection() object know about pool, but it looks not very beautiful for me.

What if we'll expose a clear_pool arg in Connection.close() method, which will be True by default? If user wants to use global pool and control its close by himself, he'll have to call close like this: connection.close(clear_pool=False).

@olavloite
Copy link
Contributor

@olavloite, the thing which I don't like in our case is that we're initiating Database() object by connect() function, and then we're passing it into Connection() object. So, Connection() object getting Database() object with initiated pool, and it doesn't know how this database got the pool - created its own or used the one passed into constructor. We could expose Connection() object constructor argument or a class property to let Connection() object know about pool, but it looks not very beautiful for me.

What if we'll expose a clear_pool arg in Connection.close() method, which will be True by default? If user wants to use global pool and control its close by himself, he'll have to call close like this: connection.close(clear_pool=False).

I don't think we should add a parameter to the Connection.close() function for that. I don't think it would be clear for most users what that would mean. I would prefer a solution where the decision is automatically made by the Connection based on what was passed in to the connect() method. Can't we add a private field to the Connection class that indicates whether it owns the pool or not, and use that to determine whether to automatically close the pool when the connection is closed? That field can be set by the connect() method when the connection is created.

@IlyaFaer
Copy link
Contributor Author

IlyaFaer commented Nov 3, 2020

Can't we add a private field to the Connection class that indicates whether it owns the pool or not, and use that to determine whether to automatically close the pool when the connection is closed? That field can be set by the connect() method when the connection is created.

Of course we can. Can't say I like it, but seems like there is no ideal solution in here, so I'm pushing a protected property.

from google.cloud.spanner_dbapi import Connection

connection = Connection(self.INSTANCE, self.DATABASE)
connection = self._make_connection()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test has an error: self.DATABASE is actually a string, not a Database object, so passing it into Connection object constructor will cause errors.

@olavloite
Copy link
Contributor

Can't we add a private field to the Connection class that indicates whether it owns the pool or not, and use that to determine whether to automatically close the pool when the connection is closed? That field can be set by the connect() method when the connection is created.

Of course we can. Can't say I like it, but seems like there is no ideal solution in here, so I'm pushing a protected property.

Thank you. I think we should look some more into this later if we start adding support for SqlAlchemy. From what I understand from the documentation on that, a connection is created by SqlAlchemy itself based on a connection string, which means that we probably need to do some more management of databases and session pools.

Copy link
Contributor

@olavloite olavloite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change itself looks good to me.

Can we do anything with the build failures, or are those related to the general Kokoro / test problems?

@IlyaFaer
Copy link
Contributor Author

IlyaFaer commented Nov 4, 2020

Can we do anything with the build failures, or are those related to the general Kokoro / test problems?

Yes, system tests on Kokoro are broken because of parallelization. I'm checking all other sessions, but system tests for now only can be run succesfully on emulator (separate GitHub Workflow check).

@IlyaFaer
Copy link
Contributor Author

IlyaFaer commented Nov 4, 2020

@c24t, we've processed this PR. Do you have any final comments?

@c24t c24t merged commit 14e4cac into googleapis:master Nov 4, 2020
@IlyaFaer IlyaFaer deleted the clear_pool branch November 5, 2020 07:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: spanner Issues related to the googleapis/python-spanner-django API. cla: yes This human has signed the Contributor License Agreement. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants