Another day, another Awesome List repo. A comprehensive list of Chainforge-related content.
Updated Oct 24, 2025
Official implementation of "GLaPE: Gold Label-agnostic Prompt Evaluation and Optimization for Large Language Models" (more updates to come).
The prompt engineering, prompt management, and prompt evaluation tool for Python
Compare, improve, and verify prompt changes with evidence, not vibes.
The prompt engineering, prompt management, and prompt evaluation tool for TypeScript, JavaScript, and NodeJS.
A project to take a suboptimal prompt from LangSmith, enhance it, resubmit it, and then reevaluate the results. #LangSmith #PromptEngineer
pi extension for fixed-task-set eval runs and prompt/system comparisons with reproducible reports
A simple prompt optimization project that tests three different algorithms.
A Streamlit web app that uses a Groq-powered LLM (Llama 3) to act as an impartial judge for evaluating and comparing two model outputs. Supports custom criteria, presets like creativity and brand tone, and returns structured scores, explanations, and a winner. Built end-to-end with Python, Groq API, and Streamlit.
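Several of the repos above use the LLM-as-judge pattern: a judging model scores two candidate outputs against named criteria and returns a structured verdict. A minimal sketch of that loop, with the judge left pluggable (a stub stands in for a real Groq / Llama 3 call; `judge_outputs`, `stub_judge`, and the JSON verdict shape are illustrative assumptions, not any particular repo's API):

```python
import json
from typing import Callable

def judge_outputs(prompt: str, output_a: str, output_b: str,
                  judge: Callable[[str], str],
                  criteria: list[str]) -> dict:
    """Ask a judge model to score two outputs and pick a winner.

    `judge` is any callable that takes the full judging prompt and
    returns a JSON string with per-criterion scores and a winner.
    """
    judging_prompt = (
        "You are an impartial judge. Score each output 1-10 on: "
        + ", ".join(criteria)
        + f"\n\nPrompt: {prompt}\nOutput A: {output_a}\nOutput B: {output_b}"
        + '\nReply as JSON: {"scores": {...}, "winner": "A" or "B", '
          '"explanation": "..."}'
    )
    return json.loads(judge(judging_prompt))

# Stub judge standing in for a real LLM API call, so the sketch
# runs offline; a production judge would call the model here.
def stub_judge(_: str) -> str:
    return json.dumps({
        "scores": {"A": {"creativity": 8}, "B": {"creativity": 6}},
        "winner": "A",
        "explanation": "Output A is more vivid.",
    })

verdict = judge_outputs("Write a tagline.", "Spark joy daily.",
                        "Buy our stuff.", stub_judge, ["creativity"])
print(verdict["winner"])  # A
```

Keeping the judge as a plain callable makes the scoring logic testable without network access and lets the same harness swap between providers or presets (e.g. creativity vs. brand tone) by changing only the criteria list.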
Building a framework to run prompt evaluation tasks.
The prompt engineering, prompt management, and prompt evaluation tool for Ruby.
Test prompt variants across LLM providers with LLM-as-judge evaluation
Free, local Langfuse OSS setup with Ollama for LLM evaluation, scoring, and datasets.
A few prompts stored in a repo for running controlled experiments that compare and benchmark different LLMs on defined use cases.
The prompt engineering, prompt management, and prompt evaluation tool for Java.
The prompt engineering, prompt management, and prompt evaluation tool for Kotlin.
Runs two simple test prompts against 5 Anthropic models and visually compares speed, capability, and cost.
The prompt engineering, prompt management, and prompt evaluation tool for C# and .NET
A hybrid machine learning system for scoring LLM prompts. Features a BERT-based gatekeeper for structural validation and an LLM-based classifier to ensure semantic intent, delivering consistent empirical metrics for prompt engineering.
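The two-stage design described above (a cheap structural gatekeeper in front of a more expensive semantic classifier) can be sketched as a small pipeline. Everything here is a toy stand-in under stated assumptions: `score_prompt`, the heuristics, and the score shape are hypothetical, not the repo's actual BERT or LLM components:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class PromptScore:
    structural_ok: bool   # did the gatekeeper accept the prompt?
    intent_score: float   # semantic score, 0.0 when gated out

def score_prompt(prompt: str,
                 gatekeeper: Callable[[str], bool],
                 classifier: Callable[[str], float]) -> PromptScore:
    """Gate first, classify second: prompts failing the fast
    structural check never reach the costly semantic stage."""
    if not gatekeeper(prompt):
        return PromptScore(structural_ok=False, intent_score=0.0)
    return PromptScore(structural_ok=True, intent_score=classifier(prompt))

# Toy stand-ins for the BERT gatekeeper and LLM intent classifier.
def toy_gatekeeper(p: str) -> bool:
    return len(p.split()) >= 3          # reject near-empty prompts

def toy_classifier(p: str) -> float:
    return min(1.0, len(p) / 100)       # placeholder "intent" metric

print(score_prompt("hi", toy_gatekeeper, toy_classifier))
print(score_prompt("summarize this report clearly",
                   toy_gatekeeper, toy_classifier))
```

Separating the stages this way keeps the empirical metric consistent: every prompt gets the same deterministic gate, and only structurally valid prompts consume classifier budget.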