Mortal/libriichi/src/state/obs_repr.rs, lines 149 to 165 at 0cff2b5:
```rust
for &score in &state.scores {
    let v = score.clamp(0, 100_000) as f32 / 100_000.;
    self.arr.fill(self.idx, v);
    self.idx += 1;
    match self.version {
        2 | 3 => IntegerEncoder::new(score as usize / 100, 500)
            .rbf_intervals(10)
            .encode(&mut self),
        4 => {
            let v = score.clamp(0, 30_000) as f32 / 30_000.;
            self.arr.fill(self.idx, v);
            self.idx += 1;
        }
        _ => (),
    }
}
```
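To make concrete why this struggles near placement thresholds, here is a minimal standalone illustration (not Mortal code; `encode_v4` is a hypothetical name for the v4 scalar branch above): a placement-relevant 1,000-point gap occupies only a sliver of the input range, and above the 30,000-point clamp all differences vanish entirely.

```rust
// Hypothetical standalone copy of the version-4 per-player score encoding.
fn encode_v4(score: i32) -> f32 {
    score.clamp(0, 30_000) as f32 / 30_000.0
}

// encode_v4(24_000) = 0.8 and encode_v4(25_000) ≈ 0.8333,
// so a 1,000-point gap shrinks to ~0.033 of the input range.
// encode_v4(31_000) == encode_v4(45_000) == 1.0:
// the clamp erases any difference above 30,000 points.
```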
It has been observed that while Mortal never accepts a ron/tsumo that would lock it into 4th place in All Last (`fn rule_based_agari_slow`), it will still build its hand in ways that make it impossible to overtake on score, especially when it would fall short by only a small number of points. This suggests the input encoding of scores is to blame.
To prevent this, scores should be presented to the model in a format that emphasizes the overtake thresholds, rather than as raw integer scores.
My initial (bad) sketch of this idea went something like:
(mirrored in negative)

```
0:       equal score
1/
2/8:     6500 points, 3-40 nondealer-nondealer tsumo diff
3/8:     10k points, nondealer-nondealer mangan tsumo diff
24+1/64: 10400
24+2/64: 11800
24+4/64: 12000
24+5/64: 12800
24+7/64: 13600
4/8:     15k points, nondealer-nondealer haneman tsumo diff
5/8:     20k points, nondealer-nondealer baiman tsumo diff
6/8:     30k points, nondealer-nondealer sanbaiman tsumo diff
7/8:     40k points, nondealer-nondealer yakuman tsumo diff
+1.0:    96100 points, dealer yakuman direct hit not enough to overtake (e.g. entire table at 100 points)
```
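For what it's worth, the sketch above amounts to a piecewise-linear mapping from a score difference to [-1, 1]. A rough Rust rendering, purely illustrative: the function name is made up, the finer 1/64 sub-steps around 10k are omitted, and the knots are just the coarse hand-picked breakpoints from the list.

```rust
/// Sketch only: piecewise-linear encoding of a score difference into [-1, 1],
/// using the coarse hand-picked breakpoints from the sketch above.
/// Not part of Mortal; name and knots are illustrative.
fn encode_score_diff(diff: i32) -> f32 {
    // (points, encoded value) knots; mirrored for negative diffs.
    const KNOTS: [(f32, f32); 8] = [
        (0.0, 0.0),
        (6_500.0, 2.0 / 8.0),
        (10_000.0, 3.0 / 8.0),
        (15_000.0, 4.0 / 8.0),
        (20_000.0, 5.0 / 8.0),
        (30_000.0, 6.0 / 8.0),
        (40_000.0, 7.0 / 8.0),
        (96_100.0, 1.0),
    ];
    let sign = if diff < 0 { -1.0 } else { 1.0 };
    let d = (diff.abs() as f32).min(96_100.0);
    for w in KNOTS.windows(2) {
        let ((x0, y0), (x1, y1)) = (w[0], w[1]);
        if d <= x1 {
            // Linear interpolation between the two surrounding knots.
            return sign * (y0 + (d - x0) / (x1 - x0) * (y1 - y0));
        }
    }
    sign // |diff| >= 96,100: saturate at ±1.0
}
```

The interesting property is the resolution allocation: the first 10k points of difference spend 3/8 of the output range, while everything past 40k shares the last 1/8, which is roughly the opposite of what the linear clamp-and-divide encoding does.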
Now that I've written that out, it seems unlikely that a human-designed set of score thresholds would have any chance of being the optimal encoding.
I think a better idea would be to train an encoder on score differences somehow, plot the resulting curve, and hardcode an approximation of it.
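As a strawman for the "hardcode an approximation of the curve" step: if the learned curve turned out roughly logarithmic (an assumption, not a result), the hardcoded version could be as simple as the following. The 1,000-point scale constant is arbitrary here, standing in for whatever the fit produces.

```rust
/// Sketch: a smooth compressive encoding of a score difference into [-1, 1],
/// as one plausible parametric family to fit against a learned encoder.
/// SCALE is an arbitrary placeholder, not a fitted value.
fn encode_score_diff_log(diff: i32) -> f32 {
    const SCALE: f32 = 1_000.0;
    const MAX_DIFF: f32 = 96_100.0; // dealer yakuman direct hit, per the sketch
    let norm = (1.0 + MAX_DIFF / SCALE).ln();
    let sign = diff.signum() as f32;
    // Small differences get high resolution; large ones are compressed,
    // saturating at ±1.0 once |diff| reaches MAX_DIFF.
    sign * (1.0 + (diff.abs() as f32).min(MAX_DIFF) / SCALE).ln() / norm
}
```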