Dear authors, I have a question about the paper KNOWLEDGE FUSION OF LARGE LANGUAGE MODELS. I noticed that the paper describes the training losses in two different forms. I would expect the loss function to be minimized when the discrepancy function D(·) is minimized, but these two forms seem to show the opposite. Could you clarify?

