Mortal/libriichi/src/state/obs_repr.rs, lines 149 to 165 at 0cff2b5:
```rust
for &score in &state.scores {
    let v = score.clamp(0, 100_000) as f32 / 100_000.;
    self.arr.fill(self.idx, v);
    self.idx += 1;
    match self.version {
        2 | 3 => IntegerEncoder::new(score as usize / 100, 500)
            .rbf_intervals(10)
            .encode(&mut self),
        4 => {
            let v = score.clamp(0, 30_000) as f32 / 30_000.;
            self.arr.fill(self.idx, v);
            self.idx += 1;
        }
        _ => (),
    }
}
```
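To make concrete why this struggles near placement thresholds, here is a minimal standalone illustration (not Mortal code; `encode_v4` is a hypothetical name for the v4 scalar branch above): a placement-relevant 1,000-point gap occupies only a sliver of the input range, and above the 30,000-point clamp all differences vanish entirely.

```rust
// Hypothetical standalone copy of the version-4 per-player score encoding.
fn encode_v4(score: i32) -> f32 {
    score.clamp(0, 30_000) as f32 / 30_000.0
}

// encode_v4(24_000) = 0.8 and encode_v4(25_000) ≈ 0.8333,
// so a 1,000-point gap shrinks to ~0.033 of the input range.
// encode_v4(31_000) == encode_v4(45_000) == 1.0:
// the clamp erases any difference above 30,000 points.
```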
It has been observed that while Mortal never accepts a ron/tsumo that would lock it into 4th place in All Last (`fn rule_based_agari_slow`), it will still build its hand in ways that make it impossible to overtake on score, especially when it would fall short by only a small number of points. This suggests the input encoding of scores is to blame.
To prevent this, scores should be presented to the model in a format that emphasizes the overtake thresholds, rather than as raw integer scores.
My initial (bad) sketch of this idea went something like:
(mirrored in negative)

```
0:       equal score
1/
2/8:     6500 points, 3-40 nondealer-nondealer tsumo diff
3/8:     10k points, nondealer-nondealer mangan tsumo diff
24+1/64: 10400
24+2/64: 11800
24+4/64: 12000
24+5/64: 12800
24+7/64: 13600
4/8:     15k points, nondealer-nondealer haneman tsumo diff
5/8:     20k points, nondealer-nondealer baiman tsumo diff
6/8:     30k points, nondealer-nondealer sanbaiman tsumo diff
7/8:     40k points, nondealer-nondealer yakuman tsumo diff
+1.0:    96100 points, dealer yakuman direct hit not enough to overtake (e.g. entire table at 100 points)
```
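For what it's worth, the sketch above amounts to a piecewise-linear mapping from a score difference to [-1, 1]. A rough Rust rendering, purely illustrative: the function name is made up, the finer 1/64 sub-steps around 10k are omitted, and the knots are just the coarse hand-picked breakpoints from the list.

```rust
/// Sketch only: piecewise-linear encoding of a score difference into [-1, 1],
/// using the coarse hand-picked breakpoints from the sketch above.
/// Not part of Mortal; name and knots are illustrative.
fn encode_score_diff(diff: i32) -> f32 {
    // (points, encoded value) knots; mirrored for negative diffs.
    const KNOTS: [(f32, f32); 8] = [
        (0.0, 0.0),
        (6_500.0, 2.0 / 8.0),
        (10_000.0, 3.0 / 8.0),
        (15_000.0, 4.0 / 8.0),
        (20_000.0, 5.0 / 8.0),
        (30_000.0, 6.0 / 8.0),
        (40_000.0, 7.0 / 8.0),
        (96_100.0, 1.0),
    ];
    let sign = if diff < 0 { -1.0 } else { 1.0 };
    let d = (diff.abs() as f32).min(96_100.0);
    for w in KNOTS.windows(2) {
        let ((x0, y0), (x1, y1)) = (w[0], w[1]);
        if d <= x1 {
            // Linear interpolation between the two surrounding knots.
            return sign * (y0 + (d - x0) / (x1 - x0) * (y1 - y0));
        }
    }
    sign // |diff| >= 96,100: saturate at ±1.0
}
```

The interesting property is the resolution allocation: the first 10k points of difference spend 3/8 of the output range, while everything past 40k shares the last 1/8, which is roughly the opposite of what the linear clamp-and-divide encoding does.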
Now that I've written that out, it seems unlikely that a human-designed set of score thresholds would have any chance of being the optimal encoding.
I think a better idea would be to train an encoder on score differences somehow, plot the resulting curve, and hardcode an approximation of it.
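As a strawman for the "hardcode an approximation of the curve" step: if the learned curve turned out roughly logarithmic (an assumption, not a result), the hardcoded version could be as simple as the following. The 1,000-point scale constant is arbitrary here, standing in for whatever the fit produces.

```rust
/// Sketch: a smooth compressive encoding of a score difference into [-1, 1],
/// as one plausible parametric family to fit against a learned encoder.
/// SCALE is an arbitrary placeholder, not a fitted value.
fn encode_score_diff_log(diff: i32) -> f32 {
    const SCALE: f32 = 1_000.0;
    const MAX_DIFF: f32 = 96_100.0; // dealer yakuman direct hit, per the sketch
    let norm = (1.0 + MAX_DIFF / SCALE).ln();
    let sign = diff.signum() as f32;
    // Small differences get high resolution; large ones are compressed,
    // saturating at ±1.0 once |diff| reaches MAX_DIFF.
    sign * (1.0 + (diff.abs() as f32).min(MAX_DIFF) / SCALE).ln() / norm
}
```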