Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

40% CPU spike due to query on QRTZ_LOCKS #1112

Closed
ndenkha opened this issue Feb 8, 2021 · 14 comments
Closed

40% CPU spike due to query on QRTZ_LOCKS #1112

ndenkha opened this issue Feb 8, 2021 · 14 comments

Comments

@ndenkha
Copy link

ndenkha commented Feb 8, 2021

My app runs on .NET Framework 4.7.2 and was running an older Quartz version 3.0.7 for over a year with low CPU utilization, and a few weeks ago we upgraded Quartz to 3.2.3 and we noticed an immediate 40% CPU increase due to having this query being executed much more often with the newer version.

SELECT *
FROM QRTZ_LOCKS WITH(UPDLOCK,ROWLOCK)
WHERE SCHED_NAME = @schedulerName
AND LOCK_NAME = @lockname

Version used

Version: 3.2.3

To Reproduce

Don't have code to reproduce, but my app creates simple jobs and at a given time it has 10s of them executed on 4 different VM instances.

Expected behavior

The CPU should remain as it was with the older version.

btw, there are no errors reported by Quartz or our app, but this is the only side effect we see with version 3.2.3

@lahma
Copy link
Member

lahma commented Feb 8, 2021

Can you post your scheduler factory configuration (stripping credentials etc)?

@ndenkha
Copy link
Author

ndenkha commented Feb 8, 2021

Sure.

==== processorScheduler ====

<scheduler name="processorScheduler">
<quartz>
  <property key="quartz.scheduler.instanceName" value="ProcessorScheduler" />
  <property key="quartz.scheduler.instanceId" value="AUTO" />
  <property key="quartz.scheduler.idleWaitTime" value="1000" />
  <property key="quartz.scheduler.exporter.type" value="Quartz.Simpl.RemotingSchedulerExporter, Quartz" />
  <property key="quartz.scheduler.exporter.port" value="1111" />
  <property key="quartz.scheduler.exporter.bindName" value="ProcessorScheduler" />
  <property key="quartz.scheduler.exporter.channelType" value="tcp" />
  <property key="quartz.scheduler.exporter.channelName" value="httpQuartz" />
  <property key="quartz.threadPool.type" value="Quartz.Simpl.DefaultThreadPool, Quartz" />
  <property key="quartz.threadPool.threadCount" value="20" />
  <property key="quartz.jobStore.type" value="Quartz.Impl.AdoJobStore.JobStoreTX, Quartz" />
  <property key="quartz.serializer.type" value="binary" />
  <property key="quartz.jobStore.clustered" value="true" />
  <property key="quartz.jobStore.clusterCheckinInterval" value="1000" />
  <property key="quartz.jobStore.misfireThreshold" value="60000" />
  <property key="quartz.jobStore.dataSource" value="default" />
  <property key="quartz.jobStore.driverDelegateType" value="Quartz.Impl.AdoJobStore.SqlServerDelegate, Quartz" />
  <property key="quartz.jobStore.tablePrefix" value="QRTZ_" />
  <property key="quartz.jobStore.useProperties" value="true" />
  <property key="quartz.dataSource.default.connectionString" value="Server=xxx;Database=quartz;user id=xxx;PWD=xxx;" />
  <property key="quartz.dataSource.default.provider" value="SqlServer" />
</quartz>
</scheduler>
<scheduler name="notificationScheduler">
<quartz>
  <property key="quartz.scheduler.instanceName" value="notificationScheduler" />
  <property key="quartz.scheduler.instanceId" value="notificationSchedulerInstance" />
  <property key="quartz.scheduler.proxy" value="true" />
  <property key="quartz.scheduler.proxy.address" value="tcp://xxx:2222/notificationScheduler" />
  <property key="quartz.threadPool.type" value="Quartz.Simpl.DefaultThreadPool, Quartz" />
  <property key="quartz.jobStore.type" value="Quartz.Impl.AdoJobStore.JobStoreTX, Quartz" />
  <property key="quartz.jobStore.misfireThreshold" value="60000" />
  <property key="quartz.jobStore.dataSource" value="default" />
  <property key="quartz.jobStore.driverDelegateType" value="Quartz.Impl.AdoJobStore.SqlServerDelegate, Quartz" />
  <property key="quartz.jobStore.lockHandler.type" value="Quartz.Impl.AdoJobStore.UpdateLockRowSemaphore, Quartz" />
  <property key="quartz.jobStore.tablePrefix" value="QRTZ_" />
  <property key="quartz.jobStore.useProperties" value="true" />
</quartz>
</scheduler>

==== notification scheduler ====

<scheduler name="notificationScheduler">
<quartz>
  <property key="quartz.scheduler.instanceName" value="notificationScheduler" />
  <property key="quartz.scheduler.instanceId" value="AUTO" />
  <property key="quartz.scheduler.idleWaitTime" value="1000" />
  <property key="quartz.scheduler.exporter.type" value="Quartz.Simpl.RemotingSchedulerExporter, Quartz" />
  <property key="quartz.scheduler.exporter.port" value="2222" />
  <property key="quartz.scheduler.exporter.bindName" value="notificationScheduler" />
  <property key="quartz.scheduler.exporter.channelType" value="tcp" />
  <property key="quartz.scheduler.exporter.channelName" value="httpQuartz" />
  <property key="quartz.threadPool.type" value="Quartz.Simpl.DefaultThreadPool, Quartz" />
  <property key="quartz.threadPool.threadCount" value="20" />
  <property key="quartz.jobStore.type" value="Quartz.Impl.AdoJobStore.JobStoreTX, Quartz" />
  <property key="quartz.serializer.type" value="binary" />
  <property key="quartz.jobStore.clustered" value="true" />
  <property key="quartz.jobStore.clusterCheckinInterval" value="1000" />
  <property key="quartz.jobStore.misfireThreshold" value="60000" />
  <property key="quartz.jobStore.dataSource" value="default" />
  <property key="quartz.jobStore.driverDelegateType" value="Quartz.Impl.AdoJobStore.SqlServerDelegate, Quartz" />
  <property key="quartz.jobStore.tablePrefix" value="QRTZ_" />
  <property key="quartz.jobStore.useProperties" value="true" />
  <property key="quartz.dataSource.default.connectionString" value="Server=xxx;Database=quartz;user id=xxx;PWD=xxx;" />
  <property key="quartz.dataSource.default.provider" value="SqlServer" />
</quartz>
</scheduler>

@lahma
Copy link
Member

lahma commented Feb 8, 2021

I'll need some time to investigate but I guess the biggest change has been that the query has been parametrized which also makes it look like being run two times more frequently if you have two separate schedulers (earlier there was different SQL string for each of them).

@ndenkha
Copy link
Author

ndenkha commented Feb 8, 2021

That's correct, I do have multiple schedulers which all have similar configs as the above.
Thanks for looking into it.

@ndenkha
Copy link
Author

ndenkha commented Feb 12, 2021

Hi Marko, touching base for any update?

@lahma
Copy link
Member

lahma commented Feb 12, 2021

I'm hoping to have time to work with this weekend. I haven't found anything obvious causing such performance regression but I think there's something between 2.x and 3.x that can be improved.

@ndenkha
Copy link
Author

ndenkha commented Feb 22, 2021

Hi Marko, did you have the change to take a look at this buddy?

@lahma
Copy link
Member

lahma commented Feb 25, 2021

Sorry, no big wins so far. I've discussed this with SQL Server DBA and he couldn't find any obvious reasons by testing, so it shouldn't be on DB side as it runs against same database page which should be super fast (small row count, two columns). If you have time to profile to pinpoint something obvious that I'm missing that would be super.

@ndenkha
Copy link
Author

ndenkha commented Feb 25, 2021

Unfortunately, we can't profile. But, I have some other insight to share with you.
Before the day we released our app with the upgraded Quartz, this query was being executed 250k times a day. Right after we released Quartz v3.0.7 the query execution times increased to 350k times a day (a jump of 125k). Our application data traffic hasn't changed a bit.

Also, we noticed that your query is doing UPDLOCK, ROWLOCK on a SELECT statement, any reason why it needs to when it's not updating? I think if you change this to NOLOCK would be better.

SELECT *
FROM QRTZ_LOCKS WITH(UPDLOCK,ROWLOCK)
WHERE SCHED_NAME = @schedulerName
AND LOCK_NAME = @lockname

Something in the v3.0.7 is making this query to run more often.

@ndenkha
Copy link
Author

ndenkha commented Feb 25, 2021

I just talked again to my DBA and he did some more digging and found out in the old version this UPDATE query was used to get run, and since the upgrade, this query disappeared and the new SELECT above showed up.

Notice also, the Sched_name was coming out as a text and not a parameter.

(@lockname nvarchar(14))
UPDATE QRTZ_LOCKS
SET LOCK_NAME = LOCK_NAME
WHERE SCHED_NAME = 'MaintenanceScheduler' AND LOCK_NAME = @lockname

And the old SELECT statement was this compared to the new one.

Before Jan 27
(@lockname nvarchar(14))
SELECT * FROM QRTZ_LOCKS WITH (UPDLOCK,ROWLOCK)
WHERE SCHED_NAME = 'MaintenanceScheduler'
AND LOCK_NAME = @lockname

Hope this helps.

@lahma
Copy link
Member

lahma commented Feb 26, 2021

Thank you for the update and information.

Also, we noticed that your query is doing UPDLOCK, ROWLOCK on a SELECT statement, any reason why it needs to when it's not updating? I think if you change this to NOLOCK would be better.

This would mean that there would be no locks to protect from concurrent access, so I wouldn't go there 😉

Notice also, the Sched_name was coming out as a text and not a parameter.

This is the exact behavior that was intended after #818 was merged. You should see a lot more queries now with this exact SQL if you have multiple schedulers, they all use the same query plan thanks to using the query parameter instead of hard-coding the scheduler name into query which causes different plans.

Re-reading your notificationScheduler configuration:

<property key="quartz.jobStore.lockHandler.type" value="Quartz.Impl.AdoJobStore.UpdateLockRowSemaphore, Quartz" />

This will cause it not to use the optimized SQL statement intended for SQL Server. See

if (LockHandler == null)
{
// If the user hasn't specified an explicit lock handler,
// then we *must* use DB locks with clustering
if (Clustered)
{
UseDBLocks = true;
}
if (UseDBLocks)
{
if (Delegate is SqlServerDelegate)
{
if (SelectWithLockSQL == null)
{
const string DefaultLockSql = "SELECT * FROM {0}LOCKS WITH (UPDLOCK,ROWLOCK) WHERE " + ColumnSchedulerName + " = @schedulerName AND LOCK_NAME = @lockName";
Log.InfoFormat("Detected usage of SqlServerDelegate - defaulting 'selectWithLockSQL' to '" + DefaultLockSql + "'.", TablePrefix, "'" + InstanceName + "'");
SelectWithLockSQL = DefaultLockSql;
}
}
Log.Info("Using db table-based data access locking (synchronization).");
LockHandler = new StdRowLockSemaphore(TablePrefix, InstanceName, SelectWithLockSQL, DbProvider);
}
else
{
Log.Info("Using thread monitor-based data access locking (synchronization).");
LockHandler = new SimpleSemaphore();
}
}
for the logic.

@ndenkha
Copy link
Author

ndenkha commented Mar 5, 2021

I've removed this property and also upgraded to the latest version, will let you know how it behaves.

@ndenkha
Copy link
Author

ndenkha commented Mar 25, 2021

Hi Marko, just an update. The CPU is back to normal behaviour after the above updates.
Thanks for your help buddy.

@ndenkha ndenkha closed this as completed Mar 25, 2021
@lahma
Copy link
Member

lahma commented Mar 26, 2021

Great to hear, thanks for closing the loop.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants