I've been happily using DJ on Heroku for many years (as well as other queue adapters - resque, sidekiq).
I understand that Heroku sends SIGTERM to DJ during worker termination / restart, and that I can configure whether to immediately forward that to my jobs with Delayed::Worker.raise_signal_exceptions = :term.
Most of my jobs are very short-lived, and I'd prefer them to finish before being killed, so I normally don't want to immediately raise SIGTERM in my jobs, and instead wait for them to finish and DJ to clean up. But occasionally, something goes sideways - e.g. a 3rd-party web service takes a long time to respond to requests, say > 30 seconds - and Heroku comes along with a SIGKILL before DJ is able to finish and clean up.
What would be great is if I could configure DJ to give my jobs a few seconds of leeway to finish up after SIGTERM, then raise an error in them if they haven't finished in that timeframe so that DJ can clean up as well before the SIGKILL comes in. This should probably be configurable - perhaps by overloading Delayed::Worker.raise_signal_exceptions = 5 for a 5-second grace period, or a different config mechanism.
But to go further, what I've really found is that I don't actually want my jobs to get SignalException, because in general I'm not looking for that - I mostly handle default StandardError with a plain rescue => e. So perhaps a better mechanism would be to enable a block that is called after a configurable delay (probably should be bigger than sleep_delay?), so that I could choose whether to raise SignalException, 'TERM', or raise 'standard error to stop a job', or some other unknown requirement.
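To illustrate why a SignalException is awkward here, this is a minimal plain-Ruby sketch (no delayed_job involved). `run_job` is a hypothetical stand-in for a job body that uses the bare `rescue => e` idiom; nothing is actually signalled, the exceptions are raised by hand.

```ruby
# Hypothetical stand-in for a job body that uses a bare `rescue => e`.
def run_job
  yield
  :finished
rescue => e # same as `rescue StandardError => e`
  "rescued: #{e.message}"
end

# A plain `raise 'message'` creates a RuntimeError, which is a StandardError,
# so the job's own rescue block sees it.
standard_result = run_job { raise 'standard error to stop a job' }

# SignalException inherits from Exception, not StandardError, so the bare
# rescue inside run_job is skipped and the exception escapes to the caller.
signal_result =
  begin
    run_job { raise SignalException, 'TERM' }
  rescue SignalException
    :escaped_job_rescue
  end

puts standard_result
puts signal_result
```

So a job that only rescues StandardError never gets a chance to react to the SignalException itself, which is why raising a plain error looks more useful for my case.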
Without having support for this built-in, here is what I'm using to augment DJ (in an initializer). The stop method is the easiest to plug into, since that's what is being called in trap('TERM').
module Delayed
  class Worker
    module HandleSigterm
      def stop
        super
        Thread.new do
          Thread.current.abort_on_exception = true
          sleep(25) # bigger than sleep_delay
          raise 'STOP' # StandardError instead of SignalException to engage normal rescue blocks
        end # No #join, so this thread will be killed if main process finishes first.
      end
    end
    prepend(HandleSigterm)
  end
end
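For discussion, here is a rough sketch of what the configurable version might look like, written as plain Ruby so it runs without delayed_job. Everything in it is hypothetical: `GracefulStop`, its `grace:` and `target:` arguments, and `run_with_grace` are names I made up, and the `sleep` in `run_with_grace` just stands in for a running job. Unlike the initializer above, it forwards the error with `Thread#raise` rather than relying on `abort_on_exception`.

```ruby
# Hypothetical helper, not part of delayed_job: after `grace` seconds it runs
# a user-supplied block and forwards whatever the block raises to `target`.
class GracefulStop
  def initialize(grace:, target: Thread.current, &on_timeout)
    @grace = grace
    @target = target
    # Default mirrors the initializer above: a plain RuntimeError.
    @on_timeout = on_timeout || proc { raise 'STOP' }
  end

  # Starts the countdown and returns the watchdog thread so the caller can
  # kill it once the work has finished on its own.
  def start
    Thread.new do
      sleep(@grace)
      begin
        @on_timeout.call
      rescue Exception => e # deliberately broad: the block chooses the error class
        @target.raise(e) if @target.alive?
      end
    end
  end
end

# Stand-in for a job: sleeps for `job_seconds` unless interrupted first.
def run_with_grace(job_seconds, grace)
  watchdog = GracefulStop.new(grace: grace) { raise 'standard error to stop a job' }.start
  sleep(job_seconds)
  :finished
rescue => e
  e.message
ensure
  watchdog&.kill # the job ended, so the countdown is no longer needed
end

puts run_with_grace(1.0, 0.05) # job outlives the grace period
puts run_with_grace(0.01, 5)   # job finishes well within the grace period
```

With something like this, the block decides whether the job sees a SignalException, a StandardError, or anything else, and the grace period is just a number that could live in config.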