The intuition is that the more options are set to 'y', the larger the kernel size.
I have an older implementation here: https://nbviewer.jupyter.org/github/TuxML/ProjetIrma/blob/dev/miscellaneous/csv_kernels/TUXML-basic.ipynb
I am porting it right now.
Then we will have to experiment with and without this hand-crafted feature (does it pay off in terms of accuracy?)
@HugoJPMartin can we assume that all values are encoded in the same way (e.g. 'n' is 0, 'y' is 1, 'm' is 2)? What's the encoding?
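
For reference, a minimal sketch of the feature I have in mind. It is not the ported code, and it assumes the 'n' = 0, 'y' = 1, 'm' = 2 encoding, which is exactly what needs confirming. The names `count_yes`, `Y` and the toy option names are made up for illustration.

```python
# ASSUMPTION (to be confirmed): 'n' -> 0, 'y' -> 1, 'm' -> 2
Y = 1

def count_yes(config, y_value=Y):
    """Return how many options in `config` are encoded as 'y'."""
    return sum(1 for value in config.values() if value == y_value)

# Toy configurations; option names and values are invented for the example.
configs = [
    {"OPTION_A": 1, "OPTION_B": 0, "OPTION_C": 2},  # one 'y', one 'n', one 'm'
    {"OPTION_A": 1, "OPTION_B": 1, "OPTION_C": 1},  # three 'y'
]

# One extra feature value per configuration.
nb_yes = [count_yes(c) for c in configs]
print(nb_yes)  # [1, 3]
```

Note that with this encoding, simply summing a row would not work, since every 'm' would count as 2. That is why the exact encoding matters here.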