Hi, and thank you for maintaining awesome-mlops. It has been a very useful starting point for many teams.
I wanted to ask whether WFGY might fit somewhere in this list, most likely under testing, evaluation, or observability for LLM-based systems.
WFGY is an open-source framework that aims to make large language models more auditable and debuggable at the text layer. It currently has three public stages:
- WFGY 1.0 is a PDF that treats LLMs as self-repairing systems. It defines a loop of modules on top of any base model and reports experimental results on standard benchmarks. The emphasis is on making the reasoning process explicit and reproducible.
- WFGY 2.0 adds a tension-style metric and a 16-case ProblemMap that target real engineering failures, for example RAG pipelines that behave erratically, vector store fragmentation, and deployment ordering issues. It serves as a diagnostic checklist when systems misbehave in production.
- WFGY 3.0 · Singularity Demo is a TXT pack readable by any LLM that supports file input. It acts as a long-horizon stress test with 131 S-class questions: models read and respond inside this TXT-based environment, and their drift or collapse can be observed over long sessions. The same TXT can be replayed across different models or pipelines (a minimal sketch of such a replay follows this list).
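To make the replay idea concrete, here is a minimal harness sketch. It is not part of WFGY itself; the file name, output path, and per-model completion callables are all assumptions you would adapt to the actual TXT pack and your own model clients.

```python
# Minimal cross-model replay sketch, not the official WFGY harness.
# Assumptions (adapt to the real pack): the TXT pack lives at PACK_PATH,
# and you supply one `complete(prompt) -> str` callable per model/pipeline.
import json
from pathlib import Path
from typing import Callable, Dict

PACK_PATH = "wfgy_singularity_demo.txt"  # hypothetical file name

def replay_pack(
    models: Dict[str, Callable[[str], str]],
    pack_path: str = PACK_PATH,
    out_path: str = "replay_transcripts.json",
) -> Dict[str, str]:
    """Feed the same TXT pack to each model and persist the transcripts
    so long-session drift can be diffed across models or runs."""
    pack = Path(pack_path).read_text(encoding="utf-8")
    transcripts = {name: complete(pack) for name, complete in models.items()}
    Path(out_path).write_text(json.dumps(transcripts, indent=2), encoding="utf-8")
    return transcripts

# Example wiring (hypothetical clients):
# transcripts = replay_pack({
#     "model_a": lambda p: client_a.generate(p),
#     "model_b": lambda p: client_b.generate(p),
# })
```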
All stages live in one GitHub repo under the MIT license, with logs and artifacts designed so that other teams can reproduce the behavior or plug it into their own evaluation pipelines.
If you think this is a fit for awesome-mlops, I would be happy to open a small PR adding a concise entry to the most relevant section. If not, that is also totally fine, and I appreciate the time you spent reading this.
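For reference, the entry could look something like this, following the common one-line awesome-list format (the URL is a placeholder for the actual repo, and the wording and section placement are just suggestions):

```
* [WFGY](https://github.com/<owner>/WFGY) - Text-layer framework for auditing, stress-testing, and debugging LLM systems (benchmark loop, failure-mode ProblemMap, long-horizon TXT stress test).
```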
Thanks again for curating this resource.