
Feature request: Give jobs some time to finish before forwarding SIGTERM #1237

@sbull


I've been happily using DJ (Delayed Job) on Heroku for many years, along with other queue adapters (Resque, Sidekiq).

I understand that Heroku sends SIGTERM to DJ when it terminates or restarts a worker, and that `Delayed::Worker.raise_signal_exceptions = :term` controls whether that signal is immediately forwarded to my jobs as a `SignalException`.

Most of my jobs are very short-lived, and I'd prefer them to finish rather than be killed, so normally I don't want SIGTERM raised in my jobs immediately; instead, I want DJ to wait for them to finish and then clean up. Occasionally, though, something goes sideways (e.g. a third-party web service takes a long time to respond, say more than 30 seconds), and Heroku sends SIGKILL before DJ has been able to finish the job and clean up.

It would be great if I could configure DJ to give my jobs a few seconds of leeway after SIGTERM, and then raise an error in any job that hasn't finished within that window, so that DJ can still clean up before the SIGKILL arrives. This should be configurable, perhaps by overloading the existing setting (`Delayed::Worker.raise_signal_exceptions = 5` for a 5-second grace period) or through a separate config option.
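To make the proposed semantics concrete, here is a minimal sketch of how an overloaded `raise_signal_exceptions` value could be interpreted. The `false`, `:term`, and `true` branches follow the behavior described in DJ's README (no exception; raise only on TERM; raise on both TERM and INT). The Integer branch is the proposal, and applying it to both signals is an assumption. `signal_action` and its return values are hypothetical names, not DJ code.

```ruby
# Hypothetical: how an overloaded Delayed::Worker.raise_signal_exceptions
# value could map to behavior when the worker receives +signal+.
def signal_action(setting, signal)
  case setting
  when Integer
    # Proposed: wait +setting+ seconds, then raise in the running job.
    [:raise_after, setting]
  when :term
    # Existing: raise immediately on TERM; INT waits for the current job.
    signal == 'TERM' ? :raise_now : :wait
  when true
    # Existing: raise immediately on either TERM or INT.
    :raise_now
  else
    # Existing default (false): finish the current job, then exit.
    :wait
  end
end

signal_action(:term, 'TERM') # => :raise_now
signal_action(5, 'TERM')     # => [:raise_after, 5]
```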

Going further, I've found that I don't actually want my jobs to receive a `SignalException` at all, because my job code isn't written to expect it: I mostly rely on a plain `rescue => e`, which only catches `StandardError`, and `SignalException` is not a `StandardError`. So a better mechanism might be a block that DJ calls after a configurable delay (probably longer than `sleep_delay`?), letting me choose whether to `raise SignalException, 'TERM'`, `raise 'standard error to stop a job'`, or handle some other requirement.
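Why a `SignalException` slips past ordinary job code can be shown in a few lines: a bare `rescue => e` only catches `StandardError` and its subclasses, while `SignalException` inherits directly from `Exception`. `caught_by_plain_rescue?` is a hypothetical helper used only for illustration.

```ruby
# Hypothetical helper: does a job-style bare `rescue => e` handle +error+?
def caught_by_plain_rescue?(error)
  raise error
rescue => e # a bare rescue catches StandardError and its subclasses only
  e.equal?(error)
rescue Exception # SignalException (and Interrupt, SystemExit, ...) end up here
  false
end

caught_by_plain_rescue?(RuntimeError.new('STOP'))    # => true; `raise 'STOP'` raises a RuntimeError
caught_by_plain_rescue?(SignalException.new('TERM')) # => false
SignalException.superclass                           # => Exception
```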

Since this isn't built in, here is what I'm using to augment DJ (in an initializer). `stop` is the easiest method to hook into, since it's what DJ calls from `trap('TERM')`.

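```ruby
module Delayed
  class Worker
    module HandleSigterm
      # Delayed::Worker#stop is what the trap('TERM') handler calls.
      def stop
        super # sets the exit flag so the worker stops after the current job
        Thread.new do
          # Re-raise this thread's exception in the main thread, which is running the job.
          Thread.current.abort_on_exception = true
          sleep(25) # bigger than sleep_delay, smaller than Heroku's 30-second SIGKILL deadline
          raise 'STOP' # StandardError instead of SignalException to engage normal rescue blocks
        end # No #join, so this thread will be killed if main process finishes first.
      end
    end
    prepend(HandleSigterm)
  end
end
```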
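The configurable-block mechanism proposed earlier could be sketched as a small, reusable timer. This is a minimal sketch under assumptions: `GraceTimer` is a hypothetical name, not a delayed_job API, and it assumes the job runs on the thread passed as `target` (by default `Thread.main`, which is assumed to be where the worker runs under `rake jobs:work`). Instead of relying on `abort_on_exception`, it uses `Thread#raise` to inject the exception into that thread, and the block decides what to raise.

```ruby
# Hypothetical helper, not part of delayed_job: once armed, it waits
# +grace_period+ seconds and then raises the exception built by the block
# in +target+ (assumed to be the thread running the job), unless disarmed.
class GraceTimer
  def initialize(grace_period, target: Thread.main, &on_timeout)
    @grace_period = grace_period
    @target = target
    # Default mirrors the initializer: a StandardError so `rescue => e` sees it.
    @on_timeout = on_timeout || -> { RuntimeError.new('STOP') }
    @watchdog = nil
  end

  # Start the countdown (only once); call this right after #stop's super.
  def arm
    return self if @watchdog

    @watchdog = Thread.new do
      sleep(@grace_period)
      # Thread#raise injects the exception into the target thread directly.
      @target.raise(@on_timeout.call)
    end
    self
  end

  # Cancel the countdown, e.g. if the job finished within the grace period.
  def disarm
    @watchdog&.kill
    self
  end
end
```

Inside the `stop` override, `GraceTimer.new(25) { RuntimeError.new('STOP') }.arm` after `super` would replace the hand-rolled thread. `disarm` only matters when the process keeps running after the job finishes, which is not the case for a worker that is shutting down.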
