BUGFIX: Prevent deletion of system-critical Quartz jobs#137
Conversation
Greptile SummaryThis PR introduces a guard ( Confidence Score: 5/5Safe to merge — the core guard logic is correct and directly addresses the critical bug. Remaining findings are style and a pre-existing trigger cleanup gap. The name-based check in IsJobAllowedToBeDeleted is correct: JobKeyProvider uses typeof(TJob).Name as the key name, which equals nameof(RunWorkflowJob) / nameof(ResumeWorkflowJob) at both registration and execution time. All existing direct DeleteJob call sites have been updated. The only open items are a P2 placement/style issue and a pre-existing trigger cleanup gap that this PR does not worsen. RunWorkflowJob.cs — the WorkflowGraphNotFoundException catch block should unschedule the specific trigger (context.Trigger.Key) now that job deletion is guarded.
|
| Filename | Overview |
|---|---|
| src/modules/scheduling/Elsa.Scheduling.Quartz/Handlers/QuartzDeleteJobHandler.cs | New static extension-method helper that guards against system-critical job deletion; correctly uses nameof() to match runtime job key names, but is placed in a Handlers folder rather than the existing Extensions pattern and is missing a trailing newline. |
| src/modules/scheduling/Elsa.Scheduling.Quartz/Jobs/RunWorkflowJob.cs | Two DeleteJob call sites updated to use the new guarded extension method; prevents catastrophic job definition deletion on WorkflowGraphNotFoundException and generic exceptions, but the specific firing trigger is not unscheduled in the WorkflowGraphNotFoundException case. |
| src/modules/scheduling/Elsa.Scheduling.Quartz/Jobs/ResumeWorkflowJob.cs | Single DeleteJob call site updated to use the new guarded extension method; straightforward and correct. |
Sequence Diagram
sequenceDiagram
participant Trigger as Quartz Trigger
participant RWJ as RunWorkflowJob / ResumeWorkflowJob
participant Guard as QuartzDeleteJobHandler.DeleteJob()
participant Scheduler as IScheduler
Trigger->>RWJ: Execute()
RWJ->>RWJ: Exception caught
RWJ->>Guard: context.DeleteJob(context.JobDetail.Key)
Guard->>Guard: IsJobAllowedToBeDeleted(jobKey.Name)?
alt name == RunWorkflowJob or ResumeWorkflowJob
Guard-->>RWJ: Delete skipped (system-critical job protected)
else other job name
Guard->>Scheduler: scheduler.DeleteJob(jobKey)
Scheduler-->>Guard: deleted
Guard-->>RWJ: Deleted
end
Reviews (1): Last reviewed commit: "added doc" | Re-trigger Greptile
|
@dotnet-policy-service agree company="Nuvotex" |
There was a problem hiding this comment.
Pull request overview
Fixes Quartz scheduler behavior where exceptions in system-critical workflow jobs can cause the durable RunWorkflowJob / ResumeWorkflowJob definitions to be removed, stalling all Quartz-based scheduling.
Changes:
- Route job deletion through a new helper that blocks deletion of
RunWorkflowJobandResumeWorkflowJob. - Update
RunWorkflowJobandResumeWorkflowJobto use the new deletion helper.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/modules/scheduling/Elsa.Scheduling.Quartz/Jobs/RunWorkflowJob.cs | Switches job deletion calls to a guarded helper method. |
| src/modules/scheduling/Elsa.Scheduling.Quartz/Jobs/ResumeWorkflowJob.cs | Switches job deletion calls to a guarded helper method. |
| src/modules/scheduling/Elsa.Scheduling.Quartz/Handlers/QuartzDeleteJobHandler.cs | Introduces guarded deletion extension to prevent removing system-critical durable jobs. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
… class and changed behaviour for WorklfowGraphNotFoundException
|
Any movement on this PR? |
A bug in the Quartz scheduler causes RunWorkflowJob and ResumeWorkflowJob to be deleted from the Quartz job tables when an exception occurs, which leaves cron and event-driven workflows in a complete standstill (plus several other bugs) #101
This behavior doesn't exist in the code for
hangfire.RunWorkflowJobandResumeWorkflowJobare getting registered at the start of the workflowserver. The server assumes that both jobs are present in the tables at any time.Adding them back after the delete has happend is not practicable, resulting in new bugs and unpredictable behavior.
Solution would be to prevent them from being deleted.