ENH: tighter parallelization


There is existing work on parallelizing KM computation for large N: #3. It needs to be studied thoroughly for both computational and storage efficiency.

Also, we should check whether a better internal data structure than the current `dict` is feasible.
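As a starting point for the storage question, here is a minimal sketch of how the footprint of a `dict` could be compared against a contiguous typed buffer. The layout is an assumption made for illustration (integer index mapped to a float value, `N = 1000`); the actual keys and values held in the current `dict` may differ, so the numbers are only indicative.

```python
import sys
from array import array

N = 1000  # hypothetical number of stored entries

# Hypothetical dict layout (an assumption): integer index -> float value.
as_dict = {i: float(i) for i in range(N)}

# Contiguous alternative: one typed buffer of doubles, indexed by position.
as_array = array("d", (float(i) for i in range(N)))


def dict_bytes(d):
    """Approximate footprint: the hash table plus each boxed key and value."""
    return sys.getsizeof(d) + sum(
        sys.getsizeof(k) + sys.getsizeof(v) for k, v in d.items()
    )


dict_size = dict_bytes(as_dict)
array_size = sys.getsizeof(as_array)
print(f"dict: {dict_size} bytes, array: {array_size} bytes")
```

Whether an array-like structure is actually feasible depends on how dense and regular the keys are, which is what needs to be checked.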