Need Complete Example for Running Evaluation

Hi! The current example (`local_BGE_local_LLM.py`) is too simplified: it only runs a basic query and does not show any evaluation.

Could you provide a complete example that shows:
- How to load and evaluate on actual datasets (e.g., ComplexTR)
- How to calculate metrics (accuracy, F1, etc.)
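
To make the request concrete, here is a rough sketch of the kind of scoring loop I have in mind. Everything in it is my own assumption, not something taken from the repo or the paper: the `question`/`answer` record keys, the `answer_fn` callable standing in for the pipeline, and the choice of normalized exact match as "accuracy" plus token-level F1. The paper may well define its metrics differently.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(examples, answer_fn):
    """Average both metrics over a dataset.

    examples:  iterable of dicts with (assumed) 'question' and 'answer' keys.
    answer_fn: hypothetical callable mapping a question string to a predicted
               answer string, i.e. whatever wraps the retrieval + LLM pipeline.
    """
    em_total = f1_total = 0.0
    n = 0
    for ex in examples:
        pred = answer_fn(ex["question"])
        em_total += exact_match(pred, ex["answer"])
        f1_total += token_f1(pred, ex["answer"])
        n += 1
    if n == 0:
        return {"accuracy": 0.0, "f1": 0.0, "n": 0}
    return {"accuracy": em_total / n, "f1": f1_total / n, "n": n}
```

What I am missing is the part around this: how the dataset files should be loaded into records like these, and which metric definitions were actually used.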

Basically, how do I reproduce the paper's results?

Thanks!