improved performance of racm.nextJobsRestricted UDF. using table vari…#83
improved performance of racm.nextJobsRestricted UDF. using table vari…#83
Conversation
There was a problem hiding this comment.
Pull request overview
This PR improves the performance of the racm.nextJobsRestricted UDF by replacing Common Table Expressions (CTEs) with table variables to address performance degradation in MSSQL when handling ~200 pending queries. The changes have already been deployed to the production SciServer@JHU RACM database.
- Replaced CTEs with table variables (
@prevand@pending) to optimize MSSQL query execution - Maintained the same functional logic for job queue management and user restrictions
- Preserved the final CTE for the result set calculation
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -0,0 +1,71 @@ | |||
| CREATE or ALTER function [racm].[nextJobsRestricted](@compmUUID varchar(64), @timeout real, | |||
| @interval real, @maxNum integer,@maxPerUser integer) | |||
There was a problem hiding this comment.
Inconsistent spacing in parameter list. Missing space after comma between @maxNum integer and @maxPerUser integer.
| @interval real, @maxNum integer,@maxPerUser integer) | |
| @interval real, @maxNum integer, @maxPerUser integer) |
| , sum(case when dj.finishedTime is null then @timeout -- should really be cast(datediff(second,startedTime, getDate()) as float) | ||
| else cast(datediff(second,dj.startedTime, dj.finishedtime) as float) | ||
| end ) as totTime | ||
| , sum((case when dj.finishedTime is null then @timeout -- should really be cast(datediff(second,startedTime, getDate()) as float) |
There was a problem hiding this comment.
The comment indicates that @timeout should be replaced with cast(datediff(second,startedTime, getDate()) as float) for running jobs. This TODO comment suggests incomplete implementation. Consider implementing the correct logic or removing the comment if the current approach is intentional.
| , sum(case when dj.finishedTime is null then @timeout -- should really be cast(datediff(second,startedTime, getDate()) as float) | |
| else cast(datediff(second,dj.startedTime, dj.finishedtime) as float) | |
| end ) as totTime | |
| , sum((case when dj.finishedTime is null then @timeout -- should really be cast(datediff(second,startedTime, getDate()) as float) | |
| , sum(case when dj.finishedTime is null then cast(datediff(second, dj.startedTime, getDate()) as float) | |
| else cast(datediff(second,dj.startedTime, dj.finishedtime) as float) | |
| end ) as totTime | |
| , sum((case when dj.finishedTime is null then cast(datediff_big(millisecond, dj.startedTime, getDate()) as float) |
| , sum((case when dj.finishedTime is null then @timeout -- should really be cast(datediff(second,startedTime, getDate()) as float) | ||
| else cast(datediff_big(millisecond,dj.startedTime, dj.finishedtime) as float) | ||
| end )/datediff_big(millisecond,dj.startedTime,@currentDate)) as usageWeight | ||
| from compm c , job dj |
There was a problem hiding this comment.
Using old-style comma-separated JOIN syntax instead of explicit INNER JOIN. Modern SQL should use INNER JOIN with an ON clause for better readability and to avoid accidental Cartesian products.
| insert into @pending | ||
| select dj.id, dj.submitterId, dj.computeDomainId, submitTime | ||
| , rank() over (partition by submitterId order by submitTime) as rankSubmitted | ||
| from compm c , job dj |
There was a problem hiding this comment.
Using old-style comma-separated JOIN syntax instead of explicit INNER JOIN. Modern SQL should use INNER JOIN with an ON clause for better readability and to avoid accidental Cartesian products.
components/java/racm/src/main/resources/db/migration/V34__Alter_nextJobsRestricted_UDF.sql
Outdated
Show resolved
Hide resolved
| CREATE or ALTER function [racm].[nextJobsRestricted](@compmUUID varchar(64), @timeout real, | ||
| @interval real, @maxNum integer,@maxPerUser integer) |
There was a problem hiding this comment.
[nitpick] Inconsistent spacing in function definition. The parameters should follow consistent formatting - there's a space before the comma on line 1 but not on lines 2-3.
| CREATE or ALTER function [racm].[nextJobsRestricted](@compmUUID varchar(64), @timeout real, | |
| @interval real, @maxNum integer,@maxPerUser integer) | |
| CREATE or ALTER function [racm].[nextJobsRestricted](@compmUUID varchar(64),@timeout real, | |
| @interval real,@maxNum integer,@maxPerUser integer) |
| , sum(case when dj.finishedTime is null then @timeout -- should really be cast(datediff(second,startedTime, getDate()) as float) | ||
| else cast(datediff(second,dj.startedTime, dj.finishedtime) as float) | ||
| end ) as totTime | ||
| , sum((case when dj.finishedTime is null then @timeout -- should really be cast(datediff(second,startedTime, getDate()) as float) |
There was a problem hiding this comment.
Duplicate TODO comment suggesting the same implementation change as line 25. This duplicated logic and comment should be consolidated or clarified.
| , sum(case when dj.finishedTime is null then @timeout -- should really be cast(datediff(second,startedTime, getDate()) as float) | |
| else cast(datediff(second,dj.startedTime, dj.finishedtime) as float) | |
| end ) as totTime | |
| , sum((case when dj.finishedTime is null then @timeout -- should really be cast(datediff(second,startedTime, getDate()) as float) | |
| -- TODO: For both totTime and usageWeight, should really use cast(datediff(second,startedTime, getDate()) as float) instead of @timeout when dj.finishedTime is null | |
| , sum(case when dj.finishedTime is null then @timeout | |
| else cast(datediff(second,dj.startedTime, dj.finishedtime) as float) | |
| end ) as totTime | |
| , sum((case when dj.finishedTime is null then @timeout |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -0,0 +1,70 @@ | |||
| CREATE or ALTER function [racm].[nextJobsRestricted](@compmUUID varchar(64), @timeout real, | |||
There was a problem hiding this comment.
The SQL statement starts with "CREATE or ALTER" which should be "CREATE OR ALTER" (uppercase OR for consistency with other SQL keywords in the statement).
| CREATE or ALTER function [racm].[nextJobsRestricted](@compmUUID varchar(64), @timeout real, | |
| CREATE OR ALTER function [racm].[nextJobsRestricted](@compmUUID varchar(64), @timeout real, |
| declare @prev table (submitterId bigint, numJobs integer, numQueued integer, | ||
| numStarted integer, totTime float, usageWeight float) |
There was a problem hiding this comment.
Missing semicolon after the table variable declaration. While T-SQL allows this in some contexts, it's best practice to terminate statements with semicolons for consistency and clarity, especially when multiple statements follow.
| , sum((case when dj.finishedTime is null then @timeout | ||
| else cast(datediff_big(millisecond,dj.startedTime, dj.finishedtime) as float) | ||
| end )/datediff_big(millisecond,dj.startedTime,@currentDate)) as usageWeight |
There was a problem hiding this comment.
The usageWeight calculation has a potential division by zero issue. When dj.startedTime equals @CurrentDate (which could happen if a job just started), the datediff_big will return 0, causing a divide-by-zero error. Additionally, if startedTime is greater than @CurrentDate (future date), the result would be negative. Consider adding validation to ensure startedTime is less than @CurrentDate and the difference is greater than zero before performing the division.
| , sum((case when dj.finishedTime is null then @timeout | |
| else cast(datediff_big(millisecond,dj.startedTime, dj.finishedtime) as float) | |
| end )/datediff_big(millisecond,dj.startedTime,@currentDate)) as usageWeight | |
| , sum( | |
| case | |
| when datediff_big(millisecond, dj.startedTime, @currentDate) > 0 | |
| then (case when dj.finishedTime is null then @timeout | |
| else cast(datediff_big(millisecond,dj.startedTime, dj.finishedtime) as float) | |
| end ) / cast(datediff_big(millisecond,dj.startedTime,@currentDate) as float) | |
| else 0 | |
| end | |
| ) as usageWeight |
| from compm c | ||
| join job dj | ||
| on dj.computeDomainId=c.computeDomainId | ||
| and dj.status > 1 -- PENDING |
There was a problem hiding this comment.
The comment indicates filtering for jobs NOT in PENDING status (status > 1), but the actual logic seems inverted. If status = 1 means PENDING (as indicated by line 49), then status > 1 would correctly exclude PENDING jobs. However, the comment says "-- PENDING" which is misleading as it suggests these ARE pending jobs when they are NOT. Consider clarifying the comment to say "-- NOT PENDING" or "-- EXCLUDING PENDING" to match the actual filter logic.
| and dj.status > 1 -- PENDING | |
| and dj.status > 1 -- NOT PENDING |
| group by submitterid | ||
|
|
||
| declare @pending table (id bigint, submitterId bigint, computeDomainId bigint, | ||
| submitTime datetime, rankSubmitted integer) |
There was a problem hiding this comment.
Missing semicolon after the table variable declaration. While T-SQL allows this in some contexts, it's best practice to terminate statements with semicolons for consistency and clarity.
| submitTime datetime, rankSubmitted integer) | |
| submitTime datetime, rankSubmitted integer); |
| insert into @rt | ||
| select * -- RETURNS: jobId,submitterId, computeDomainId, submitTime , usageWeight, numQueued, numStarted, ranking | ||
| from final | ||
| where ranking <= @maxNum |
There was a problem hiding this comment.
Missing semicolon after the INSERT INTO statement. Adding a semicolon here would improve consistency and follow T-SQL best practices for statement termination.
| where ranking <= @maxNum | |
| where ranking <= @maxNum; |
…able iso common table expressions in single table. somehow MSSQL started performing very bad, even if only ~200 queries were pending.
ff1f7d9 to
236337e
Compare
…able iso common table expressions in single table. somehow MSSQL started performing very bad, even if only ~200 queries were pending.
The function has been updated inside SciServer@JHU RACM database already.