Skip to content
This repository has been archived by the owner on Jan 7, 2019. It is now read-only.

mindware/prgov_gmq_cap_workers

Repository files navigation

What we needed:
---------------
- Persistence
- See what's pending
- Modify pending jobs in-place
- Tags
- Priorities
- Fast pushing and popping
- See what workers are doing
- See what workers have done
- See failed jobs
- Kill fat workers
- Kill stale workers
- Kill workers that are running too long
- Distributed workers (run them on multiple machines)
- Workers can watch multiple (or all) tags

We looked at:
-------------

During this project we checked out RabbitMQ, SideKiq, Starling, ActiveMessaging, BackgroundJob, DelayedJob, and beanstalkd. 

Resque:
-------
Resque was identified as a good choice for applications that run multiple queues each with many thousands of job entries, where worker behavior can be volatile. Volatile worker behavior is mitigated by forking children processes to handle jobs which ensures that any out of control workers can be dealt with in isolation.

Resque-Retry:
------------
Resque while meeting all our requirements did not have built in retry by default. This is left to the user to implement. We identified a highly-active (commits to master ocurred just yesterday) project called Resque-Retry which as the capabilty for exponential backoff, a feature we were going to implement. Seeing as the project is stable, active and does exactly what we meant to do, we've incorporated it into our project. 

About the Resque commmand:
--------------------------

Our Procfile contains a series of command that must run in order for this worker system to be online. Among those commands lies the resque command, which receives a series of parameters.

TERM_CHILD
---------- 
TERM_CHILD tells Resque to pass SIGTERM from the parent to child process to ensure that all child worker process have time to execute an orderly shutdown.

RESQUE_TERM_TIMEOUT
-------------------
The default period Resque waits before sending SIGKILL to its child processes is 4 seconds. To modify this value and give our workers more time to gracefully shutdown, modify the RESQUE_TERM_TIMEOUT environment variable, (ie RESQUE_TERM_TIMEOUT=7)

QUEUE
-----
The name of Queue to be used. 


To start the web interface:
---------------------------

Resque-retry patches the resque-web interface. In order to run it, we've created a simple rack configuration (config.ru), which can be easily run with the following command: bundle exec rackup -p 9292 config.ru

That loads the web interface with a few following additions (Scheduled, Delayed and Retry tabs). You can check it out at: http://localhost:9292


Debugging errors:
-----------------

Using the redis-cli you can connect to redis. 
The following are resque keys (since using keys *resque* is not safe in production), here they are:
1) "resque:failed"
2) "resque:stat:processed"
3) "resque:queue:prgov_cap"
4) "resque:stat:failed"

We can check the resque:queue:prgov_cap key to see what type it is:
> type resque:queue:prgov_cap
list 

This tells us the key is of type list. Let's print the last 10 items 
> lrange resque:queue:prgov_cap 0 10 
1) ""
2) ""
3) ""
4) "{\"class\":{},\"args\":[\"028639abedc7445a7a1d04154454bf0d9\",{\"queued_at\":\"2014-08-26 06:13:12 UTC\"}]}"

In order to delete all item matching a string, we could use LREM:
> lrem resque:queue:prgov_cap 0 "{\"class\":{},\"args\":[\"028639abedc7445a7a1d04154454bf0d9\",{\"queued_at\":\"2014-08-26 06:13:12 UTC\"}]}" 
(integer) 1 

This means it was deleted:
> lrange resque:queue:prgov_cap 0 10
1) ""
2) ""
3) ""



----- 
License: MIT License

About

Workers that process Good Standing Certificate-related jobs, asynchronously, retry in case of failures and are part of the Government Messaging Queue

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published