BUGFIX: Prevent deletion of system-critical Quartz jobs #137

Merged
sfmskywalker merged 5 commits into elsa-workflows:main from kk-nuv:quartz-delete-job-fix
May 13, 2026

@kk-nuv
Contributor

@kk-nuv kk-nuv commented Apr 15, 2026

A bug in the Quartz scheduler causes RunWorkflowJob and ResumeWorkflowJob to be deleted from the Quartz job tables when an exception occurs. This brings cron and event-driven workflows to a complete standstill, among several other bugs (#101).

The Hangfire implementation does not have this behavior.

RunWorkflowJob and ResumeWorkflowJob are registered when the workflow server starts, and the server assumes that both jobs are present in the tables at all times.
Re-adding them after they have been deleted is not practical: it introduces new bugs and unpredictable behavior.

The solution is to prevent them from being deleted.

@greptile-apps

greptile-apps Bot commented Apr 15, 2026

Greptile Summary

This PR introduces a guard (QuartzDeleteJobHandler) to prevent RunWorkflowJob and ResumeWorkflowJob from deleting themselves from the Quartz job store when an exception occurs. The fix correctly routes existing context.Scheduler.DeleteJob calls through a new extension method that short-circuits when the job key matches either system-critical job type, addressing the catastrophic bug where a single failing workflow execution could remove the shared durable job definition and stall all cron/event-driven workflows.
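
The guard described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in the PR: the class name `GuardedDeleteSketch` and the method name `DeleteJobIfAllowedAsync` are hypothetical, and string literals stand in for the `nameof()` expressions so the sketch needs no Elsa references. Only `IScheduler.DeleteJob` and `JobKey.Name` are Quartz.NET APIs.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Quartz;

public static class GuardedDeleteSketch
{
    // Names of the shared, durable jobs that must stay in the job store.
    // Assumption: literals replace nameof(RunWorkflowJob) / nameof(ResumeWorkflowJob)
    // to keep this sketch independent of the Elsa job classes.
    private static readonly string[] ProtectedJobNames =
    {
        "RunWorkflowJob",
        "ResumeWorkflowJob",
    };

    // True when the job name does not belong to a protected job.
    public static bool IsJobAllowedToBeDeleted(string jobName) =>
        Array.IndexOf(ProtectedJobNames, jobName) < 0;

    // Hypothetical extension method: deletes the job only if it is not protected.
    // Returns false without calling the scheduler when the deletion is skipped.
    public static async Task<bool> DeleteJobIfAllowedAsync(
        this IScheduler scheduler,
        JobKey jobKey,
        CancellationToken cancellationToken = default)
    {
        if (!IsJobAllowedToBeDeleted(jobKey.Name))
            return false;

        return await scheduler.DeleteJob(jobKey, cancellationToken);
    }
}
```

Under these assumptions, a job's catch block would call `await context.Scheduler.DeleteJobIfAllowedAsync(context.JobDetail.Key)` instead of `context.Scheduler.DeleteJob(...)`, so a failing execution can no longer remove the shared job definition.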

Confidence Score: 5/5

Safe to merge — the core guard logic is correct and directly addresses the critical bug. Remaining findings are style and a pre-existing trigger cleanup gap.

The name-based check in IsJobAllowedToBeDeleted is correct: JobKeyProvider uses typeof(TJob).Name as the key name, which equals nameof(RunWorkflowJob) / nameof(ResumeWorkflowJob) at both registration and execution time. All existing direct DeleteJob call sites have been updated. The only open items are a P2 placement/style issue and a pre-existing trigger cleanup gap that this PR does not worsen.
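
To illustrate why the name comparison holds, here is a small sketch. The empty classes are placeholders for the real job types, and `KeyNameFor<TJob>()` is a hypothetical stand-in for the key-name derivation the review attributes to JobKeyProvider, not its actual implementation.

```csharp
using System;

// Placeholder types standing in for the real Elsa job classes.
public sealed class RunWorkflowJob { }
public sealed class ResumeWorkflowJob { }

public static class JobNameSketch
{
    // Assumed to mirror the derivation described in the review: typeof(TJob).Name.
    public static string KeyNameFor<TJob>() => typeof(TJob).Name;

    // For a non-generic class, typeof(T).Name and nameof(T) yield the same string.
    public static bool MatchesRunWorkflowJob(string keyName) =>
        keyName == nameof(RunWorkflowJob);
}
```

The equality relies on the job classes being non-generic: for a generic type, `typeof(T).Name` carries an arity suffix (for example ``List`1``) while `nameof` does not, so a name-based guard would need extra care there.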

RunWorkflowJob.cs — the WorkflowGraphNotFoundException catch block should unschedule the specific trigger (context.Trigger.Key) now that job deletion is guarded.
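
One possible shape for that trigger cleanup is sketched below. It is an assumption about how the suggestion could be implemented, not code from this PR; the helper name `UnscheduleFiringTriggerAsync` is hypothetical, while `IScheduler.UnscheduleJob`, `IJobExecutionContext.Trigger` and `IJobExecutionContext.CancellationToken` are Quartz.NET APIs.

```csharp
using System.Threading.Tasks;
using Quartz;

public static class TriggerCleanupSketch
{
    // Hypothetical helper: removes only the trigger that fired this execution,
    // leaving the shared job definition and its other triggers in place.
    // Returns true if the trigger was found and removed.
    public static async Task<bool> UnscheduleFiringTriggerAsync(
        this IJobExecutionContext context)
    {
        return await context.Scheduler.UnscheduleJob(
            context.Trigger.Key,
            context.CancellationToken);
    }
}
```

Because Quartz only removes a job together with its last trigger when the job is not durable, unscheduling a single trigger this way should leave a durable job definition intact.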

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/modules/scheduling/Elsa.Scheduling.Quartz/Handlers/QuartzDeleteJobHandler.cs | New static extension-method helper that guards against system-critical job deletion; correctly uses nameof() to match runtime job key names, but is placed in a Handlers folder rather than the existing Extensions pattern and is missing a trailing newline. |
| src/modules/scheduling/Elsa.Scheduling.Quartz/Jobs/RunWorkflowJob.cs | Two DeleteJob call sites updated to use the new guarded extension method; prevents catastrophic job definition deletion on WorkflowGraphNotFoundException and generic exceptions, but the specific firing trigger is not unscheduled in the WorkflowGraphNotFoundException case. |
| src/modules/scheduling/Elsa.Scheduling.Quartz/Jobs/ResumeWorkflowJob.cs | Single DeleteJob call site updated to use the new guarded extension method; straightforward and correct. |

Sequence Diagram

sequenceDiagram
    participant Trigger as Quartz Trigger
    participant RWJ as RunWorkflowJob / ResumeWorkflowJob
    participant Guard as QuartzDeleteJobHandler.DeleteJob()
    participant Scheduler as IScheduler

    Trigger->>RWJ: Execute()
    RWJ->>RWJ: Exception caught
    RWJ->>Guard: context.DeleteJob(context.JobDetail.Key)
    Guard->>Guard: IsJobAllowedToBeDeleted(jobKey.Name)?
    alt name == RunWorkflowJob or ResumeWorkflowJob
        Guard-->>RWJ: Delete skipped (system-critical job protected)
    else other job name
        Guard->>Scheduler: scheduler.DeleteJob(jobKey)
        Scheduler-->>Guard: deleted
        Guard-->>RWJ: Deleted
    end

Reviews (1): Last reviewed commit: "added doc"

Comment thread src/modules/scheduling/Elsa.Scheduling.Quartz/Handlers/QuartzDeleteJobHandler.cs Outdated
@kk-nuv
Contributor Author

kk-nuv commented Apr 15, 2026

@dotnet-policy-service agree company="Nuvotex"

Contributor

Copilot AI left a comment

Pull request overview

Fixes Quartz scheduler behavior where exceptions in system-critical workflow jobs can cause the durable RunWorkflowJob / ResumeWorkflowJob definitions to be removed, stalling all Quartz-based scheduling.

Changes:

  • Route job deletion through a new helper that blocks deletion of RunWorkflowJob and ResumeWorkflowJob.
  • Update RunWorkflowJob and ResumeWorkflowJob to use the new deletion helper.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| src/modules/scheduling/Elsa.Scheduling.Quartz/Jobs/RunWorkflowJob.cs | Switches job deletion calls to a guarded helper method. |
| src/modules/scheduling/Elsa.Scheduling.Quartz/Jobs/ResumeWorkflowJob.cs | Switches job deletion calls to a guarded helper method. |
| src/modules/scheduling/Elsa.Scheduling.Quartz/Handlers/QuartzDeleteJobHandler.cs | Introduces guarded deletion extension to prevent removing system-critical durable jobs. |

Comment thread src/modules/scheduling/Elsa.Scheduling.Quartz/Handlers/QuartzDeleteJobHandler.cs Outdated
Comment thread src/modules/scheduling/Elsa.Scheduling.Quartz/Handlers/QuartzDeleteJobHandler.cs Outdated
@RyanTuckerN
Any movement on this PR?

@sfmskywalker sfmskywalker merged commit 8c6dad7 into elsa-workflows:main May 13, 2026
3 checks passed
4 participants