ENH: Add free-threaded Python support#1580
Conversation
f1a2469 to
8398e31
Compare
|
cc @ngoldbaum too for the assessment over in #1515 (nvm the failures they're from #1581) |
8398e31 to
1e66268
Compare
|
It'd probably be good to drop |
| public: | ||
| Cache() {}; | ||
| bool get(size_t left, size_t right, double *mu, double *dist) const { | ||
| std::lock_guard<std::mutex> lock(mtx_); |
There was a problem hiding this comment.
I think you have an attached thread state here, since you don't detach inside RangeMedian_mu_dist, RangeMedian_mu, or RangeMedian_dist. That means this call can possibly lead to deadlocks with the interpreter on both the GIL-enabled and free-threaded build.
To avoid that, you need to detach from the interpreter before locking native mutexes.
That said, I'm not actually sure whether it makes sense to do that whole dance on the GIL-enabled, given that the GIL is held. So then you'd want to conditionally use locking on the free-threaded build, with #ifdef Py_GIL_DISABLED blocks.
I also suspect that despite the existence of the set method -- which I don't think is used at all from Python -- in practice this is a cache that is initalized once and then thereafter not modified, so using a single initialization API would probably make more sense than adding a mutex to the type, since you'd only have the possibility of blocking calls when the cache is created and not thereafter. Note that the C++ call_once API involves blocking calls, so you need to detach from the interpreter before calling any API that can block.
At a higher level - do you actually want to support sharing RangeMedian objects between threads? A simpler approach would be to not support that, replace the blocking calls to lock here with non-blocking try_lock calls. Then, if acquiring the lock fails, you can bubble up an error to Python saying that another thread is already using the RangeMedian.
Then it becomes the responsibility of the Python programmer to not share RangeMedian objects between threads. I don't think asv does that, so you'll never hit that error in practice until some future point when you want to add multithreaded parallelism using RangeMedian instances.
I'd also definitely add a multithreaded test. I bet if you wrote a multithreaded test that reads and writes to a RangeMedian from more than one thread using this current PR, you'd hit deadlocks. IMO it's always worthwhile to write tests for nontrivial code to handle multithreaded parallelism.
Declare Py_mod_gil with Py_MOD_GIL_NOT_USED so the extension does not force-enable the GIL on free-threaded builds. Thread safety audit and fixes: - Add std::mutex to Cache class so concurrent get/set from multiple threads on the same RangeMedian object is safe (per ngoldbaum airspeed-velocity#1515) - Fix __init__ to free old y/cache before reallocating, preventing a memory leak on reinit and a data race if __init__ is called concurrently on the same object The remaining state is safe: y vector is read-only after init, compute_weighted_median uses only stack locals, find_best_partition uses stack-local vectors, and module state is set once at import. Enable free-threaded wheel builds in cibuildwheel so cp313t and cp314t wheels are published. Closes airspeed-velocity#1515
1e66268 to
8f8b2bb
Compare
Declare
Py_mod_gilwithPy_MOD_GIL_NOT_USEDso the extension does not force-enable the GIL on free-threaded builds. The module holds no global mutable state (all data lives in per-object or per-module-state structs), so running without the GIL should be safe...Enable free-threaded wheel builds in cibuildwheel so cp313t and cp314t wheels are published.
Closes #1515.