Fix UB by checking for invalid moves in RL and play_game #45
LambertWSJ wants to merge 1 commit into jserv:main
LambertWSJ commented on Nov 23, 2025
- In play_rl(), skip writing to table when the agent returns -1 to avoid writing out of bounds.
- Do the same in play_game() so negamax/MCTS/RL won't update the board with an invalid move.
How did you find this problem?
IIUC, get_action_exploit() only returns -1 when the board is full, which causes the issue you described. However, in that scenario, play_game() should stop calling play_rl() because check_win() would return 'D'. Therefore, calling play_rl() when there are no empty spaces shouldn't happen. If this error was observed at runtime, I suspect an implementation error elsewhere is the root cause. Adding a check here would likely just hide the real problem.
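To illustrate the argument above, here is a minimal sketch of a game loop that consults the draw check before asking for a move. The names `check_draw`, `first_empty` and `play_until_draw` are hypothetical stand-ins, not the real check_win(), agent or play_game(); the point is only that a full board ends the loop before the agent can return -1.

```c
#include <assert.h>

#define N_GRIDS 9 /* assumption: a 3x3 board stored as a flat char array */

/* Simplified stand-in for check_win(): only the draw case is modelled. */
static char check_draw(const char *table)
{
    for (int i = 0; i < N_GRIDS; i++)
        if (table[i] == ' ')
            return ' '; /* at least one empty cell: game continues */
    return 'D';         /* board full: draw */
}

/* Stand-in for an agent: first empty cell, or -1 when the board is full. */
static int first_empty(const char *table)
{
    for (int i = 0; i < N_GRIDS; i++)
        if (table[i] == ' ')
            return i;
    return -1;
}

/* Plays until the board is full and returns the number of moves made. */
static int play_until_draw(char *table)
{
    int moves = 0;
    char turn = 'X';
    while (check_draw(table) == ' ') {
        int move = first_empty(table);
        assert(move >= 0); /* cannot be -1: the loop condition saw an empty cell */
        table[move] = turn;
        turn = (turn == 'X') ? 'O' : 'X';
        moves++;
    }
    return moves;
}
```

If the agent still returns -1 inside such a loop, the board and the agent disagree about which cells are empty, which points to a bug somewhere other than the loop itself.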
When porting the RL module to kxo[1], I occasionally encountered cases where move = -1. Using dmesg, I noticed that a bit shift had exceeded 31 bits. Because of this, I added last_e to get_action_exploit() so that if no best action was found, it would return the last visited position. Strangely, when I switched back to this commit[1] and removed the max_action == -1 check, the issue disappeared.
I found that this error occurs when the state space isn't fully initialized at the start, which was the case while I was porting the RL module to kxo; that can cause move to become -1. ttt, however, initializes the entire state space from the beginning, so the agent knows every possible board state and the move = -1 issue does not arise. As this issue does not occur in the current implementation, this PR can be closed.