Skip to content

Make rxode2 more thread safe#992

Open
billdenney wants to merge 9 commits intomainfrom
parallel-updates
Open

Make rxode2 more thread safe#992
billdenney wants to merge 9 commits intomainfrom
parallel-updates

Conversation

@billdenney
Copy link
Contributor

This is part of an effort I'm trying to make focei estimation in nlmixr2 more thread safe.

billdenney and others added 9 commits March 2, 2026 17:47
…or thread safety

In parallel ODE solving, multiple threads reading/writing the shared
op->badSolve flag creates a race condition where one thread's failed
solve can incorrectly prevent other threads from continuing. This
introduces a localBadSolve variable in ind_indLin0, ind_liblsoda0,
and ind_lsoda0 that tracks bad-solve state per-individual, and removes
the shared op->badSolve check from iniSubject in par_solve.h.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These shared fields are written from multiple threads during parallel
ODE solving. Without synchronization this is undefined behavior in
C/C++. Use #pragma omp atomic write to ensure thread-safe stores.

Also refactor push*Dose allocation failure handling to use local flags
instead of writing op->badSolve inside critical sections then reading
it outside.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rf_errorcall is not thread-safe and will crash or corrupt R state when
called from OpenMP worker threads. Replace with the same NA_REAL
return pattern already used for backward==2 case, plus setting the
badSolve flag atomically to signal the error after the parallel region.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The abort flag is read by all threads and written by the master thread
without synchronization. Using volatile prevents the compiler from
caching the value in a register, ensuring threads see updates promptly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
rx_get_thread(mx) used <= which allowed index==mx (one past the end
of per-thread arrays sized to mx). Changed to < for correct bounds.

Also changed the overflow fallback from returning 0 (which collides
with thread 0's slot) to clamping to mx-1.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
linCmtB used a single global stan::math::linCmtStan object and
associated Eigen matrices, making it unsafe for parallel execution.
Refactored to use the same per-thread vector pattern as linCmtA:
- Created linB_t struct holding all per-thread state
- Changed __linCmtB from a single object to std::vector<linB_t>
- Added ensureLinCmtB(nCores) called during solve setup
- linCmtB() now indexes __linCmtB by omp_get_thread_num()
- R-facing utility functions use element 0 (main thread)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move all 16 file-scope static variables in dop853.c into a dop853_ctx_t
struct defined in dop853.h.  Each call to dop853() now allocates the
context on the stack, making the solver reentrant and safe for concurrent
use from multiple threads.

Changes:
- Add dop853_ctx_t struct to dop853.h containing nfcn/nstep/naccpt/nrejct,
  hout/xold/xout, nrds/indir, yy1/k1-k10, and rcont1-rcont8
- dopcor() takes dop853_ctx_t* as first parameter, uses local aliases
  for array pointers to minimize diff in the computation body
- contd8() becomes static and takes dop853_ctx_t* (never called externally
  since all callers use iout=0)
- Remove accessor functions (nfcnRead, nstepRead, etc.) which were never
  called from outside dop853.c
- Public dop853() API is unchanged - callers need no modifications

This enables future parallelization of par_dop() with OpenMP, which was
previously blocked by the comment "This part CAN be parallelized, if dop
is thread safe..."

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolve conflicts in src/handle_evid.cpp by keeping our thread-safe
version (atomic badSolve flag + NA_REAL return) over main's
Rf_errorcall calls, which are not safe from OpenMP parallel regions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The macro `#define J lcb.J` causes `lcb.J` on line 683 to expand to
`lcb.lcb.J`, which fails because linB_t has no member named lcb.
Use the macro name `J` directly instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mattfidler
Copy link
Member

Some of these are more dangerous like the dop optimizations.

Claude (at least for me), isn't exactly always doing the right things 😄

@mattfidler
Copy link
Member

But they don't hurt. It just takes a review and testing I guess.

@billdenney
Copy link
Contributor Author

billdenney commented Mar 3, 2026

Yeah, I looked at each and they seemed reasonable to me, but I'm not sure that they are. I have several projects now that need faster estimation, so I thought, what faster way than making nlmixr2 faster! 😄

@mattfidler
Copy link
Member

mattfidler commented Mar 3, 2026

Well all the pooled estimation methods are now multi-threaded in the development version.

This issue with focei may be in n1qn1, lbfgsb3, nlmixr2est, rxode2 or a combination of all the packages.

The fixes here may make the threading more stable, but rxode2 is already thread safe for saem, nlm (and relevant methods), and simple simulation.

@john-harrold
Copy link

I think the key is to make tests such that you feel you're adequately plumbing the depths. That's what I've been doing. I'll take a function that I want Claude to work on an just build a battery of tests for it -- I have claude make the tests :). Then I have it do what I want with those tests run at the end to make sure it didn't break anything. I think the key is it may be difficult to do this with all those. Though, like I said, you can have Claude make them.

@mattfidler
Copy link
Member

For #991 I have re-prompted and it still doesn't get the code right yet... It has tests but it isn't producing the right behavior.

@john-harrold
Copy link

Are you using copilot? If you can, try pulling it to your personal computer and use claude code it's indescribably better. Copilot at Pfizer is still telling me to use R functions that dont exist.

@mattfidler
Copy link
Member

mattfidler commented Mar 3, 2026

I am using claude in the terminal. I will have to wait unitl tomorrow since I have maxed the usage again.

@mattfidler
Copy link
Member

(You can see that Bill is also using claude. In github it shows Bill and I are co-authoring with Claude).

@john-harrold
Copy link

Sorry. You mentioned copilot in the meeting the other day.

I'll try and see if I can help tomorrow after work. Can you point out the tests you're using?

@billdenney
Copy link
Contributor Author

And FYI, the reason I came here is that a crash with some of the work I'm having Claude do in nlmixr2est noted a crash in rxode2 (mainly around the unsuccessful integration flag). So, I thought that I'd do a general check rather than just the targeted check for the one issue already identified.

@mattfidler
Copy link
Member

It seems more reasonable now, but it doens't pass the test suite. I am asking for it to fix it. We will see what happens.

@mattfidler
Copy link
Member

This passes, asking for copilot review.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR is part of a broader effort to make rxode2 (and by extension nlmixr2 FOCEI estimation) more thread-safe for parallel ODE solving under OpenMP.

Changes:

  • Refactored dop853 from file-scope static variables to a per-call context struct (dop853_ctx_t), making it fully reentrant.
  • Refactored linCmtB from a single global object to a per-thread std::vector<linB_t>, matching the existing linCmtA threading pattern.
  • Replaced reads of the shared op->badSolve flag mid-solve with per-individual localBadSolve variables, and protected writes to op->badSolve/op->naTime with #pragma omp atomic write; replaced non-thread-safe Rf_errorcall calls in hot-path functions.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/rxomp.h Fixed off-by-one in thread index check (<=<) and changed out-of-range fallback from slot 0 to last valid slot
src/rxData.cpp Added call to ensureLinCmtB(nCores) before parallel solve to pre-allocate per-thread linCmtB state
src/par_solve.h Removed early op->badSolve exit from iniSubject to prevent one thread's failure from blocking other threads' initialization
src/par_solve.cpp Added localBadSolve variables, volatile int abort, and _Pragma("omp atomic write") in badSolveExit macro
src/linCmt.cpp Replaced global __linCmtB singleton with a per-thread std::vector<linB_t>; added ensureLinCmtB
src/handle_evid.cpp Replaced Rf_errorcall (not thread-safe) with atomic write to op->badSolve + return NA
src/dop853.h Added dop853_ctx_t struct; removed declarations of deleted accessor functions
src/dop853.c Moved all static solver state into a local dop853_ctx_t on the stack; deleted global accessor functions
inst/include/rxode2parseHandleEvid.h Used allocFailed flag + #pragma omp atomic write instead of direct op->badSolve assignment inside/outside critical sections
inst/include/rxode2parseGetTime.h Replaced direct writes to op->badSolve/op->naTime with atomic writes via temporary variables

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

#define yp __linCmtByp
#define g __linCmtBg
// Per-thread linCmtB state (matching linCmtA pattern for thread safety)
linB_t &lcb = __linCmtB[omp_get_thread_num()];
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The linCmtB function indexes into __linCmtB using omp_get_thread_num() directly, without any bounds check. If omp_get_thread_num() returns a value >= __linCmtB.size() (e.g., if the function is called from a context where more threads exist than nCores was set to, or if ensureLinCmtB was not called), this produces an out-of-bounds vector access and undefined behavior. The similar linCmtA function has the same pattern, but the safer rx_get_thread(op->cores) is used elsewhere in the codebase (e.g., in _setIndPointersByThread). Consider using rx_get_thread(__linCmtB.size()) or adding a bounds assertion here.

Suggested change
linB_t &lcb = __linCmtB[omp_get_thread_num()];
linB_t &lcb = __linCmtB[rx_get_thread(__linCmtB.size())];

Copilot uses AI. Check for mistakes.
Comment on lines 40 to +46
if (ISNA(ret)) {
op->badSolve=1;
if (op->naTime == 0) {
op->naTime = 1 + 10*cmt;
}
int newBadSolve = 1;
int newNaTime = 1 + 10*cmt;
#pragma omp atomic write
op->badSolve = newBadSolve;
#pragma omp atomic write
op->naTime = newNaTime;
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old code had if (op->naTime == 0) { op->naTime = 1 + 10*cmt; } to preserve the first error's diagnostic code. The new code unconditionally overwrites op->naTime with every error occurrence. In parallel mode this means the error code used for the final error message (in rxode2_df.cpp:212-239) is non-deterministic — it may reflect a later/different error rather than the first one that triggered the bad solve. While op->badSolve just needs to be set once, the op->naTime diagnostic value ideally should preserve the first non-zero value to give a consistent error message. The same behavior change applies to getRate, getDur, and handleInfusionItem in this file.

Copilot uses AI. Check for mistakes.
Comment on lines 160 to +192
Several functions provide access to different values :

xRead x value for which the solution has been computed (x=xend after
successful return).

hRead Predicted step size of the last accepted step (useful for a subsequent
call to dop853).

nstepRead Number of used steps.
naccptRead Number of accepted steps.
nrejctRead Number of rejected steps.
nfcnRead Number of function calls.


*/


#include <stdio.h>
#include <limits.h>

typedef void (*FcnEqDiff)(int *nptr, double x, double *y, double *f);
typedef void (*SolTrait)(long int nr, double xold, double x, double* y, int *nptr, int* irtrn);

/* Context struct holding all solver state, enabling thread-safe reentrant calls.
Previously these were file-scope static variables in dop853.c. */
typedef struct {
long int nfcn, nstep, naccpt, nrejct;
double hout, xold, xout;
int nrds, *indir;
double *yy1, *k1, *k2, *k3, *k4, *k5, *k6, *k7, *k8, *k9, *k10;
double *rcont1, *rcont2, *rcont3, *rcont4;
double *rcont5, *rcont6, *rcont7, *rcont8;
} dop853_ctx_t;
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dop853.h documentation comment block (lines 160-171) still references the now-deleted accessor functions (xRead, hRead, nstepRead, naccptRead, nrejctRead, nfcnRead) and the comment at line 77 documents contd8(i,s) with the old two-argument signature, whereas the refactored contd8 is now static and takes three arguments (dop853_ctx_t *ctx, int ii, double x). This documentation section should be updated or removed to avoid confusion for future maintainers.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants