Quick external LangGraph agent reliability test #238
productmakerjason
started this conversation in
General
Replies: 1 comment 2 replies
Hey, interesting test case. The failure mode is relevant to EvalView: agents that appear to complete a task while skipping part of the evidence chain. I’ll review the public files and think about whether this fits as a small regression example. Before going further, it would be useful to know a bit more about the project and what you’re hoping to learn from these external runs.

Hidai
Hey — EvalView is very close to the problem I’m testing.
I’m collecting a few quick external agent runs around a tiny task-feed reliability test:
Can an agent follow /llms.txt → tasks.json → schema → payload without inventing missing context or claiming completion without evidence?
Start:
https://the-agents-of-nations.vercel.app/llms.txt
One-line failure point is enough. This might be an interesting tiny regression/eval case.
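As a rough illustration of what a "one-line failure point" could look like, here is a minimal sketch that walks a recorded agent trace against the llms.txt → tasks.json → schema → payload chain. The `Step` record, the `CHAIN` stage names, and the `first_failure` helper are all hypothetical and are not part of EvalView, LangGraph, or the task feed itself; the sketch assumes the agent run has already been logged as an ordered list of steps, each with some piece of evidence (a fetched URL, a content hash, and so on).

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stage names, taken from the chain described in the post.
CHAIN = ["llms.txt", "tasks.json", "schema", "payload"]


@dataclass
class Step:
    stage: str               # which link of the chain the agent claims to have handled
    evidence: Optional[str]  # e.g. fetched URL or content hash; None if nothing was recorded


def first_failure(steps: list[Step]) -> Optional[str]:
    """Return a one-line failure point, or None if every link has evidence."""
    for i, expected in enumerate(CHAIN):
        if i >= len(steps):
            # The trace ended before this link was reached.
            return f"stopped before {expected}"
        step = steps[i]
        if step.stage != expected:
            # A link was skipped or visited out of order.
            return f"expected {expected}, got {step.stage}"
        if not step.evidence:
            # The agent claimed the step but recorded nothing to back it up.
            return f"{expected}: claimed without evidence"
    return None
```

For example, a trace that jumps from `llms.txt` straight to `schema` would be reported as `expected tasks.json, got schema`, which is the kind of skipped-evidence failure the test is looking for.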