AGNO pipeline prompt refinement #5
Open
Description
Problem: So far, we have tested the LLM-as-a-judge for SQL query equivalence on only 56 queries (one query derived from each template). This gave an average score of 57%. In some of its reasoning, the judge LLM assigned a lower score because the generated query contained a clause such as 'LIMIT 1000' or, for concept relationships, because the generated query used JOINs instead of the 'maps to' approach.
Solution: Modify the database agent so that it omits the LIMIT keyword and, for concept_relationships, uses the 'maps to' approach, which is how the ground truth was crafted.
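As a possible safety net alongside the prompt change, a trailing LIMIT could also be stripped from the generated query before it is sent to the judge. This is only a rough sketch: it is a post-processing step, not the prompt refinement itself, and the helper name `strip_trailing_limit` is hypothetical and not part of the current pipeline. It only removes a LIMIT at the very end of the query, leaves LIMITs inside subqueries alone, and also drops a trailing semicolon.

```python
import re

# Hypothetical helper, not part of the existing pipeline.
# Matches a trailing "LIMIT <n>" (optionally followed by "OFFSET <m>")
# and an optional semicolon at the very end of the query.
_TRAILING_LIMIT = re.compile(
    r"\s+LIMIT\s+\d+(?:\s+OFFSET\s+\d+)?\s*;?\s*$",
    re.IGNORECASE,
)


def strip_trailing_limit(sql: str) -> str:
    """Remove a trailing LIMIT clause; LIMITs inside subqueries are untouched."""
    return _TRAILING_LIMIT.sub("", sql.strip())


if __name__ == "__main__":
    print(strip_trailing_limit("SELECT * FROM concept LIMIT 1000;"))
```

If the agent follows the updated prompt, this function is a no-op, so it should be safe to keep in place while the prompt is being tuned.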