active_record_connection_pool_waiting always returns 0 #242

bogn83 · 2022-08-02T13:34:29Z

The query for it always returns 0 even though we can see lots of

ActiveRecord::ConnectionTimeoutError
could not obtain a connection from the pool within 5.000 seconds (waited 5.000 seconds); all pooled connections were in use

even fully unfiltered by labels and in a 30d time window:

ruby_active_record_connection_pool_waiting > 0

while

ruby_active_record_connection_pool_idle > 0

works.

The text was updated successfully, but these errors were encountered:

slipperywizard · 2022-08-24T18:13:53Z

+1

modulitos · 2022-08-31T17:22:05Z

I believe this metric is working correctly.

I had the same issue as the OP, and suspect it wasn't working as well. But digging deeper, I realized that this metric isn't likely to get scraped by Prometheus unless the system is under sustained load. Since Prometheus only scrapes metrics once in a while (default is 10 seconds), it's possible to get a short spike in load, raise an ActiveRecord::ConnectionTimeoutError, and have ruby_active_record_connection_pool_waiting return to 0 when scraped. So it's a race condition. For this metric to reliably report a value above 0, the system must be under load long enough for the threads waiting in the queue to get scraped. After running a load test, I can confirm that it incremented correctly.

We can see how this works in the code. This metric is set by calling the .stat method on the connection pool:

prometheus_exporter/lib/prometheus_exporter/instrumentation/active_record.rb

Line 60 in 8c8f4a5

metric.merge!(pool.stat)

and here's the .stat method:
https://github.com/rails/rails/blob/eed417637b594b33b3b58b08dd41daa10c57e083/activerecord/lib/active_record/connection_adapters/abstract/connection_pool.rb#L703-L719
(Referenced AR version is 6.0.5.1)

we can see that the waiting metric is set to num_waiting_in_queue. Tracing through that code, we can see that it's a gauge for the number of threads waiting on a connection, which gets incremented/decremented in this loop:
https://github.com/rails/rails/blob/eed417637b594b33b3b58b08dd41daa10c57e083/activerecord/lib/active_record/connection_adapters/abstract/connection_pool.rb#L203-L226

The above code snippet should convince us that the number of waiting threads is incremented and reported to the waiting metric, and that it's incremented before the ConnectionTimeoutError error is raised. This all happens within fractions of a second, so unless there's a sustained load, it might not get scraped by Prometheus.

bogn83 changed the title ~~active_record_connection_pool_waiting wrongly returns 0~~ active_record_connection_pool_waiting always returns 0 Aug 2, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

active_record_connection_pool_waiting always returns 0 #242

active_record_connection_pool_waiting always returns 0 #242

bogn83 commented Aug 2, 2022

slipperywizard commented Aug 24, 2022

modulitos commented Aug 31, 2022

active_record_connection_pool_waiting always returns 0 #242

active_record_connection_pool_waiting always returns 0 #242

Comments

bogn83 commented Aug 2, 2022

slipperywizard commented Aug 24, 2022

modulitos commented Aug 31, 2022