Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Average length of time for PostgreSQL queries significantly longer after switching tenants #635

Open
bradeaton opened this issue Feb 5, 2020 · 2 comments

Comments

@bradeaton
Copy link

We're attempting to implement apartment in our existing Rails 4.2 application. We've run into a performance issue that we can't seem to get around. The crux of the issue seems to be that database queries that we execute after doing an Apartment::Tenant.switch take significantly longer than if we are not switching tenants. It seems as if optimizations that we were getting from PostgreSQL without switching tenants are lost once we switch. We see this issue even when there aren't any separate private databases configured. In the steps to reproduce below config.tenant_names is set to an empty hash so that everything is still in the primary database. The difference in DB performance illustrated in the steps to reproduce are seen across all the queries we've looked at so far. The results are just slower response times, it brings the application to a crawl and then a halt with average page load times above 15 seconds or more.

Steps to reproduce

# Without switching to primary database each time
# This is just a class that stores benchmark times with a tag and reports the min, max, and averages
bm = BenchmarkReport.new(description: "Apartment Tests")
bm.start(stat: "Overall time")
# To simulate the number of accounts we have in production environment on local or staging, we loop 20 times.
(1..20).each do
  # Account is an excluded model
  Account.all.each do |account|
      bm.start(stat: "account.users.where...")
      # The users table is not excluded
      main_user = account.users.where('main = ?', true).take
      bm.stop(stat: "account.users.where...")
  end
end
bm.stop(stat: "Overall time")
puts bm.report.to_json

# With switching to primary database each time
# This is just a class that stores benchmark times with a tag and reports the min, max, and averages
bm = BenchmarkReport.new(description: "Apartment Tests")
bm.start(stat: "Overall time")
# To simulate the number of accounts we have in production environment on local or staging, we loop 20 times.
(1..20).each do
  # Account is an excluded model
  Account.all.each do |account|
    bm.start(stat: "switch")
    Apartment::Tenant.switch(account.db_name) do
      bm.start(stat: "account.users.where...")
      # The users table is not excluded
      main_user = account.users.where('main = ?', true).take
      bm.stop(stat: "account.users.where...")
    end
    bm.stop(stat: "switch")
  end
end
bm.stop(stat: "Overall time")
puts bm.report.to_json

Expected behavior

I would expect the average time to execute the account.users query with tenant switching and without to be at least close. I would expect some overhead in the switch.

Actual behavior

The query times where we are switching tenants in between are x times longer than when we are not switching tenants. Here are the benchmark results:

# Without switching tenants
{
  "description": "Apartment Tests",
  "stats": [
    {
      "stat": "account.users.where...",
      "count": 1300,
      "min": 0.0018922750023193657,
      "max": 0.13093664599000476,
      "avg": 0.00294993647022728
    },
    {
      "stat": "Overall time",
      "count": 1,
      "min": 4.397003222984495,
      "max": 4.397003222984495,
      "avg": 4.397003222984495
    }
  ]
}

# With switching tenants
{
  "description": "Apartment Tests",
  "stats": [
    {
      "stat": "account.users.where...",
      "count": 1300,
      "min": 0.033399779989849776,
      "max": 2.0633101440034807,
      "avg": 0.055884263064766015
    },
    {
      "stat": "switch",
      "count": 1300,
      "min": 0.06311358700622804,
      "max": 2.130101154005388,
      "avg": 0.10326035546714243
    },
    {
      "stat": "Overall time",
      "count": 1,
      "min": 134.57561817902024,
      "max": 134.57561817902024,
      "avg": 134.57561817902024
    }
  ]
}

System configuration

Heroku

  • Database: (Tell us what database and its version you use.)

PostgreSQL 10.11

  • Apartment version:

2.2.1

  • Apartment config (in config/initializers/apartment.rb or so):
#
# Apartment Configuration
#
Apartment.configure do |config|

  config.excluded_models = %w{
    AccountActivity
    AccountConfiguration
    AccountFeature
    Account
    ApartmentMigration
    ClickTrackingClientUrl
    ClickTrackingUtm
    Delayed::Backend::ActiveRecord::Job
    Document
    Feature
    InboundMailLog
    MobileCarrier
    Notification
    OmniLog
    Permission
    PhoneNumber
    QueuedWorkflowEmail
    ReportDatum
    SetupAccount
    SetupLocation
    SetupUser
    UserPasswordReset
    WorkflowNode
    WorkflowProcessNode
    WorkflowProcessRunnerInstance
    WorkflowProcess
    WorkflowVersion
    Workflow
    ZipCode
  }

 
  # We handle providing the list of tenant names by retrieving Account records from the DB where we have defined a separate database. This allows us
  # to have our multi-tenant clients in the primary database and our single-tenant('ish') clients in their own separate database
  tenant_names = {}

  Account.where('db_name IS NOT NULL').each do |account|
    tenant = {
        adapter: account.db_adapter,
        port: account.db_port,
        host: account.db_host,
        username: account.db_username,
        password: account.db_password,
        database: ENV['PRIMARY_DB_NAME']
    }

    tenant_names[account.db_name] = tenant if account.db_name.present?
  end

  config.tenant_names = tenant_names
  config.use_schemas = false
end
  • use_schemas: (true or false)
    false

  • Rails (or ActiveRecord) version:
    4.2

  • Ruby version:
    2.4.5

@lcjury
Copy link

lcjury commented Feb 6, 2020

Each database system implements multi-tenancy on his own way. Mysql has the use statement to change between databases, postgres has schemas, etc.

But, as you have use_schemas set to false, you're not using the schemas mecanism, instead, you're changing your connection on runtime each time you call to switch!

I don't know the inner workings of active record/apartment, but Apartment calls establish_connection who creates a new pool. Creating new connections is a really expensive operation, that's why we create connection pools and re-use them when possible, but, each you switch tenants, apartment is creating a new pool. This is a really expensive operation.

From the Readme: One can optionally use the full database creation instead if they want, though this is not recommended

I don't think this is an issue, for me it would me an expected outcome when using full database instead of schemas.

@kitsunde
Copy link

I don't know if it's the entire cause but when you change the search path, postgresql will replan prepared statements.

See notes: https://www.postgresql.org/docs/current/sql-prepare.html

I have plans of basically adding .where('? = ?', search_path, search_path) . to force some heavier prepared statements to not get replanned.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants