Fix UB by checking for invalid moves in RL and play_game by LambertWSJ · Pull Request #45 · jserv/ttt

LambertWSJ · 2025-11-23T11:35:15Z

In play_rl(), skip writing to table when the agent returns -1 to avoid writing out of bounds.
Do the same in play_game() so negamax/MCTS/RL won't update the board with an invalid move.
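
The guard described above could look roughly like the sketch below. It is not the code from this PR: `apply_move` is a hypothetical helper, and `N_GRIDS` and the `' '` empty-cell marker are assumptions made for illustration.

```c
#include <stdbool.h>

#define N_GRIDS 16 /* assumed cell count, for illustration only */

/* Hypothetical helper: place `player` at `move` only when `move` is a
 * valid index of an empty cell. Returns false and leaves the board
 * untouched otherwise, so a -1 from an agent never indexes `table`. */
static bool apply_move(char *table, int move, char player)
{
    if (move < 0 || move >= N_GRIDS)
        return false;
    if (table[move] != ' ')
        return false;
    table[move] = player;
    return true;
}
```

A caller would then skip the board update (or stop the game loop) when `apply_move()` returns false, instead of writing to `table[-1]`.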

visitorckw · 2025-11-23T12:29:33Z

How did you find this problem?
Was it caught during runtime or purely by code review?

visitorckw · 2025-11-23T12:35:40Z

IIUC, get_action_exploit() only returns -1 when the board is full, which causes the issue you described. However, in that scenario, play_game() should stop calling play_rl() because check_win() would return 'D'. Therefore, calling play_rl() when there are no empty spaces shouldn't happen.

If this error was observed during runtime, I suspect an implementation error elsewhere is the root cause. Adding a check here would likely just hide the real problem.
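
The invariant argued above can be sketched as follows: if the loop tests for a terminal board before every agent call, an agent that returns -1 only on a full board is never asked to move in that state. `board_full`, `first_empty`, and `count_invalid_moves` are hypothetical stand-ins, not functions from ttt, and the sketch ignores wins and checks only for a full board.

```c
#include <stdbool.h>

#define N_GRIDS 16 /* assumed cell count, for illustration only */

/* True when no empty cell (' ') remains. */
static bool board_full(const char *table)
{
    for (int i = 0; i < N_GRIDS; i++)
        if (table[i] == ' ')
            return false;
    return true;
}

/* Stand-in for an agent: first empty cell, or -1 when the board is full. */
static int first_empty(const char *table)
{
    for (int i = 0; i < N_GRIDS; i++)
        if (table[i] == ' ')
            return i;
    return -1;
}

/* Fill the board, testing for the terminal state before each agent call.
 * Returns how many times the agent returned -1 (expected: 0). */
static int count_invalid_moves(char *table)
{
    int invalid = 0;
    char turn = 'X';
    while (!board_full(table)) {
        int move = first_empty(table);
        if (move < 0) {
            invalid++;
            break;
        }
        table[move] = turn;
        turn = (turn == 'X') ? 'O' : 'X';
    }
    return invalid;
}
```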

LambertWSJ · 2025-11-23T13:16:35Z

When porting the RL module to kxo[1], I occasionally encountered cases where move = -1. Using dmesg, I noticed that a bit shift had exceeded 31 bits. Because of this, I added last_e to get_action_exploit so that if no best action was found, it would return the last visited position.

Strangely, when I switched back to this commit[1] and removed the max_action == -1 check, the issue disappeared.

[1] kxo - Port reinforcement learning from ttt
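
The thread does not show the shift expression that triggered the report. As a general illustration only: shifting a 32-bit value by a negative count or by 32 or more is undefined behavior in C, so a move of -1 used as a shift count needs a range check first. `set_cell_bit` below is a hypothetical helper, not code from kxo or ttt.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: set bit `move` in a 32-bit mask. Rejects any
 * shift count outside 0..31, which would be undefined behavior. */
static bool set_cell_bit(uint32_t *mask, int move)
{
    if (move < 0 || move >= 32)
        return false;
    *mask |= UINT32_C(1) << move;
    return true;
}
```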

LambertWSJ · 2025-11-23T13:36:11Z

I found that this error occurs when the state space isn't fully initialized up front, which was the case while I was porting the RL module to kxo. In that situation move can become -1.

However, ttt initializes the entire state space from the beginning, ensuring that the agent knows every possible board state and preventing the move = -1 issue.

As this issue does not occur in the current implementation, this PR can be closed.
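
The thread does not spell out how an uninitialized state space leads to -1. One plausible mechanism, shown purely as an assumption: if a greedy search starts from `best = -1` and the stored values are never comparable (for example NaN), no candidate ever wins the comparison. `init_values` and `greedy_next` are hypothetical names, not the ttt or kxo API.

```c
#include <math.h>
#include <stddef.h>

/* Give every state value a defined starting point. */
static void init_values(float *v, size_t n, float init)
{
    for (size_t i = 0; i < n; i++)
        v[i] = init;
}

/* Greedy pick over candidate successor states: returns the index (into
 * `next_state`) of the highest-valued candidate, or -1 when no value
 * compares greater than -INFINITY (e.g. every value is NaN). */
static int greedy_next(const float *v, const int *next_state, int n)
{
    int best = -1;
    float best_val = -INFINITY;
    for (int i = 0; i < n; i++) {
        if (v[next_state[i]] > best_val) {
            best_val = v[next_state[i]];
            best = i;
        }
    }
    return best;
}
```

With every entry initialized to a finite value, `greedy_next()` always returns a valid index as long as at least one candidate exists, which matches the observation that ttt does not hit the -1 case.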

Fix UB by checking for invalid moves in RL and play_game

54a4fef

LambertWSJ closed this Nov 23, 2025

LambertWSJ wants to merge 1 commit into jserv:main from LambertWSJ:fix_agent_ub
