Use multiple CMPXCHG16B locks instead of 1 global lock by OFFTKP · Pull Request #527 · OFFTKP/felix86

OFFTKP · 2026-05-27T16:32:26Z

CMPXCHG16B previously used a single global lock to provide some (but not perfect) atomicity for the instruction, since we don't have hardware support for it. This caused high contention: in some games, the blocks that take the lock used up 80% of CPU time and lagged immensely.

This PR implements a fast per-address spinlock. One of 256 spinlocks is picked based on a hash of the address, so the choice is deterministic for a given address. This greatly reduces collisions and significantly improves performance in affected games. If two threads perform a CAS on the same address, the same spinlock is picked, providing the same atomicity guarantees as before, assuming the memory operands are aligned.
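
As a rough illustration of the idea described above (not the code from this PR), here is a minimal sketch of a striped spinlock table guarding an emulated 16-byte compare-and-swap. Every name here (`kLockCount`, `Spinlock`, `g_locks`, `lock_index`, `U128`, `emulated_cmpxchg16b`) is hypothetical, and the hash is an arbitrary multiplicative mix chosen for the example; the PR's actual hash and lock layout may differ.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: names and hash are assumptions, not felix86's code.
constexpr std::size_t kLockCount = 256;

struct Spinlock {
    std::atomic<bool> locked{false};

    void lock() {
        // Spin until we are the thread that flips false -> true.
        while (locked.exchange(true, std::memory_order_acquire)) {
        }
    }

    void unlock() {
        locked.store(false, std::memory_order_release);
    }
};

static Spinlock g_locks[kLockCount];

// Map an address to one of the 256 locks. The low 4 bits are dropped
// because they are identical for every 16-byte-aligned operand.
inline std::size_t lock_index(std::uintptr_t address) {
    std::uint64_t h = static_cast<std::uint64_t>(address) >> 4;
    h *= 0x9E3779B97F4A7C15ull;               // arbitrary odd multiplier
    return static_cast<std::size_t>(h >> 56); // top 8 bits: 0..255
}

struct U128 {
    std::uint64_t lo;
    std::uint64_t hi;
};

// Emulated 16-byte compare-and-swap. If *dest equals expected, stores
// desired and returns true; otherwise copies the current value into
// expected and returns false.
inline bool emulated_cmpxchg16b(U128* dest, U128& expected, U128 desired) {
    Spinlock& lock = g_locks[lock_index(reinterpret_cast<std::uintptr_t>(dest))];
    lock.lock();
    const bool equal = dest->lo == expected.lo && dest->hi == expected.hi;
    if (equal) {
        *dest = desired;
    } else {
        expected = *dest;
    }
    lock.unlock();
    return equal;
}
```

In this sketch, two threads operating on the same aligned address always hash to the same lock, so they serialize exactly as they would under one global lock, while unrelated addresses usually land on different locks. An unaligned operand could straddle two 16-byte slots that hash to different locks, which is why the guarantee depends on alignment. The operation is also only atomic with respect to other accesses that go through the same lock table.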

OFFTKP added 6 commits May 27, 2026 19:11

.

1af73fa

.

02d5976

.

60fae31

.

02bcf80

.

f0d9361

.

0e0c722

OFFTKP merged commit 302eb9a into master May 27, 2026
2 checks passed

OFFTKP deleted the cas branch May 27, 2026 17:21

OFFTKP merged 6 commits into master from cas