Matt Barnes reviews several measures for evaluating entity resolution (ER) in a tech report. We have already implemented most of them, but could consider adding:
- cluster precision, recall, F-score
- closest cluster precision, recall, F-score
- purity and K measure
- generalized merge distance
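
As a rough illustration of the first item, here is a minimal sketch of cluster precision, recall and F-score. It assumes the usual definition, where a predicted cluster counts as correct only if it exactly matches a true cluster. The function name `cluster_precision_recall_f` and the list-of-lists input format are hypothetical, not part of our existing API, and the definitions should be checked against the tech report before implementing.

```python
def cluster_precision_recall_f(predicted, true):
    """Cluster-level precision, recall and F-score (illustrative sketch).

    Assumption: a predicted cluster is correct only when it is identical
    to some true cluster. Both arguments are iterables of clusters, and
    each cluster is an iterable of hashable record ids.
    """
    # Represent each clustering as a set of frozensets so that exact
    # cluster matches can be found with a set intersection.
    pred = {frozenset(c) for c in predicted}
    tru = {frozenset(c) for c in true}

    matched = len(pred & tru)
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(tru) if tru else 0.0

    # Harmonic mean of precision and recall; 0.0 when both are zero.
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Only the cluster {1, 2} matches exactly: 1 of 3 predicted clusters
# and 1 of 2 true clusters.
p, r, f = cluster_precision_recall_f(
    predicted=[[1, 2], [3], [4, 5]],
    true=[[1, 2], [3, 4, 5]],
)
```

The closest cluster variants would presumably replace the exact-match test with a similarity between each cluster and its best match in the other clustering, so they are not covered by this sketch.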