Merged
1 change: 1 addition & 0 deletions .github/workflows/good-egg.yml
@@ -15,3 +15,4 @@ jobs:
with:
github-token: ${{ secrets.GITHUB_TOKEN }}
scoring-model: v2
skip-known-contributors: 'false'
1 change: 1 addition & 0 deletions CLAUDE.md
@@ -77,6 +77,7 @@ pip install good-egg[mcp] # With MCP server support
- YAML config key for scoring parameters is `graph_scoring` (not "pagerank").
- Environment variable overrides use `GOOD_EGG_` prefix.
- The `[mcp]` optional extra adds the `mcp` dependency for the MCP server.
- `skip_known_contributors` (default `true`) skips full scoring for PR authors who already have merged PRs in the target repo. The CLI `--force-score` flag and MCP `force_score` parameter override this.

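The override order in the note above can be pictured as a small resolver. This is a sketch, not Good Egg's code: `effective_skip_known` is a hypothetical helper, and the ordering (built-in default, then `.good-egg.yml`, then `GOOD_EGG_SKIP_KNOWN_CONTRIBUTORS`, then `--force-score` / `force_score`) is inferred from the `load_config` and CLI changes in this diff.

```python
from typing import Optional


def effective_skip_known(
    file_value: Optional[bool],
    env_value: Optional[bool],
    force_score: bool,
) -> bool:
    """Hypothetical resolver for skip_known_contributors; later sources win."""
    value = True  # GoodEggConfig default
    if file_value is not None:  # skip_known_contributors in .good-egg.yml
        value = file_value
    if env_value is not None:  # GOOD_EGG_SKIP_KNOWN_CONTRIBUTORS
        value = env_value
    if force_score:  # CLI --force-score or MCP force_score=true
        value = False
    return value
```

Because the force flag can only turn skipping off, there is no CLI switch that forces skipping back on when the config or environment disables it.
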
## Important Rules

1 change: 1 addition & 0 deletions README.md
@@ -148,6 +148,7 @@ project. See [Methodology](https://github.com/2ndSetAI/good-egg/blob/main/docs/m
| **LOW** | Little to no prior contribution history -- review manually |
| **UNKNOWN** | Insufficient data to produce a meaningful score |
| **BOT** | Detected bot account (e.g. dependabot, renovate) |
| **EXISTING_CONTRIBUTOR** | Author already has merged PRs in this repo -- scoring skipped |

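A consumer of the trust level might branch on it as sketched below. The `TrustLevel` enum here is a hypothetical stand-in whose string values are assumed to match the names in the table (the action's `trust-level` output is compared against strings such as `'EXISTING_CONTRIBUTOR'`), and the policy in `needs_manual_review` is only an example, not Good Egg behaviour.

```python
from enum import Enum


class TrustLevel(str, Enum):
    """Hypothetical mirror of the trust levels listed in the table."""

    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"
    UNKNOWN = "UNKNOWN"
    BOT = "BOT"
    EXISTING_CONTRIBUTOR = "EXISTING_CONTRIBUTOR"


def needs_manual_review(level: TrustLevel) -> bool:
    """Example policy: review authors with little or no usable history."""
    # EXISTING_CONTRIBUTOR carries no score (the pipeline was skipped), so a
    # policy has to decide explicitly; here prior merged PRs count as enough.
    return level in (TrustLevel.LOW, TrustLevel.UNKNOWN)
```

Whatever policy you choose, handle `EXISTING_CONTRIBUTOR` explicitly rather than reading its numeric score, since no full score was computed.
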
## Configuration

9 changes: 8 additions & 1 deletion action.yml
@@ -25,20 +25,26 @@ inputs:
scoring-model:
description: 'Scoring model to use (v1 or v2)'
required: false
skip-known-contributors:
description: 'Skip scoring for authors with merged PRs in the repo (true/false)'
required: false

outputs:
score:
description: 'Normalized trust score (0.0 - 1.0)'
value: ${{ steps.score.outputs.score }}
trust-level:
description: 'Trust level (HIGH, MEDIUM, LOW, UNKNOWN, BOT)'
description: 'Trust level (HIGH, MEDIUM, LOW, UNKNOWN, BOT, EXISTING_CONTRIBUTOR)'
value: ${{ steps.score.outputs.trust-level }}
user:
description: 'GitHub username that was scored'
value: ${{ steps.score.outputs.user }}
scoring-model:
description: 'Scoring model that was used (v1 or v2)'
value: ${{ steps.score.outputs.scoring-model }}
skipped:
description: 'Whether scoring was skipped for an existing contributor (true/false)'
value: ${{ steps.score.outputs.skipped }}

runs:
using: 'composite'
@@ -69,6 +75,7 @@ runs:
INPUT_CHECK-RUN: ${{ inputs.check-run }}
INPUT_FAIL-ON-LOW: ${{ inputs.fail-on-low }}
INPUT_SCORING_MODEL: ${{ inputs.scoring-model }}
INPUT_SKIP_KNOWN_CONTRIBUTORS: ${{ inputs.skip-known-contributors }}
run: |
cd ${{ github.action_path }}
uv run python -m good_egg.action
13 changes: 13 additions & 0 deletions docs/configuration.md
@@ -45,6 +45,11 @@ works.
# Scoring model selection: v1 (default) or v2
scoring_model: v1

# Skip scoring for authors who already have merged PRs in the target repo.
# When true (the default), existing contributors get an EXISTING_CONTRIBUTOR
# trust level without the full scoring pipeline running.
skip_known_contributors: true

# Graph-based scoring algorithm parameters
graph_scoring:
alpha: 0.85 # Damping factor (0-1)
@@ -146,6 +151,13 @@ used when `scoring_model` is set to `v2`.
merge_rate_weight * merge_rate + account_age_weight *
log(account_age_days + 1))`.

### skip_known_contributors

When `true` (the default), Good Egg performs a lightweight pre-check before
running the full scoring pipeline. If the PR author already has merged pull
requests in the target repository, scoring is skipped and the trust level is
set to `EXISTING_CONTRIBUTOR`. Set to `false` to always run full scoring.

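The short-circuit described above can be pictured as follows. This is a sketch under assumptions: `Result` is a hypothetical stand-in for `TrustScore` reduced to the fields the docs mention (`trust_level`, `flags["scoring_skipped"]`, `scoring_metadata["context_repo_merged_pr_count"]`), `full_score` stands in for the full pipeline, and how Good Egg actually obtains the merged-PR count is not shown here.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Result:
    """Hypothetical stand-in for TrustScore, reduced to the fields used here."""

    trust_level: str
    flags: dict[str, bool] = field(default_factory=dict)
    scoring_metadata: dict[str, int] = field(default_factory=dict)


def score_with_precheck(
    merged_pr_count: int,
    skip_known_contributors: bool,
    full_score: Callable[[], Result],
) -> Result:
    """Return early for existing contributors; otherwise run the full scorer."""
    if skip_known_contributors and merged_pr_count > 0:
        # Known contributor: record why scoring was skipped and stop here.
        return Result(
            trust_level="EXISTING_CONTRIBUTOR",
            flags={"scoring_skipped": True},
            scoring_metadata={"context_repo_merged_pr_count": merged_pr_count},
        )
    # New contributor, or the pre-check is disabled: run the full pipeline.
    return full_score()
```

Passing the full scorer as a callable makes the cost saving explicit: when the pre-check matches, the expensive path is never invoked.
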
### graph_scoring

Controls the graph-based scoring algorithm. The `alpha` parameter is the
@@ -204,6 +216,7 @@ The following environment variables override individual config values:
| `GOOD_EGG_MEDIUM_TRUST` | `thresholds.medium_trust` | float |
| `GOOD_EGG_HALF_LIFE_DAYS` | `recency.half_life_days` | int |
| `GOOD_EGG_SCORING_MODEL` | `scoring_model` | str (`v1` or `v2`) |
| `GOOD_EGG_SKIP_KNOWN_CONTRIBUTORS` | `skip_known_contributors` | bool (`true`/`false`) |

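`load_config` and the action both treat `true`, `1`, or `yes` (case-insensitive) as true. Under a plain membership test, an empty string, which is what a composite action passes for an omitted input, counts as false. The sketch below uses a hypothetical `read_bool_override` helper, not Good Egg's code, that treats an unset, empty, or whitespace-only value as "no override" so the default survives; unlike the original, it also ignores surrounding whitespace.

```python
import os
from typing import Mapping, Optional

TRUTHY = ("true", "1", "yes")


def read_bool_override(env: Mapping[str, str], name: str) -> Optional[bool]:
    """Return the boolean override in env[name], or None if there is none."""
    raw = env.get(name)
    if raw is None or not raw.strip():
        # Unset, or empty as a composite action passes an omitted input.
        return None
    return raw.strip().lower() in TRUTHY


# Example: layer the environment on top of values loaded from YAML.
config_data = {"skip_known_contributors": True}
override = read_bool_override(os.environ, "GOOD_EGG_SKIP_KNOWN_CONTRIBUTORS")
if override is not None:
    config_data["skip_known_contributors"] = override
```
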
## Programmatic Configuration

32 changes: 29 additions & 3 deletions docs/github-action.md
@@ -38,15 +38,17 @@ This posts a trust score comment on each pull request:
| `check-run` | No | `false` | Create a check run with the trust score |
| `fail-on-low` | No | `false` | Fail the action if trust level is LOW |
| `scoring-model` | No | `v1` | Scoring model: `v1` (Good Egg) or `v2` (Better Egg) |
| `skip-known-contributors` | No | `true` | Skip scoring for authors with merged PRs in the repo |

## Outputs

| Output | Description |
|--------|-------------|
| `score` | Normalized trust score (0.0 - 1.0) |
| `trust-level` | Trust level: HIGH, MEDIUM, LOW, UNKNOWN, or BOT |
| `trust-level` | Trust level: HIGH, MEDIUM, LOW, UNKNOWN, BOT, or EXISTING_CONTRIBUTOR |
| `user` | GitHub username that was scored |
| `scoring-model` | Scoring model used: `v1` (Good Egg) or `v2` (Better Egg) |
| `skipped` | Whether scoring was skipped for an existing contributor (`true`/`false`) |

## Custom Configuration

@@ -107,9 +109,11 @@ jobs:
echo "::warning::Low trust PR author -- manual review required"

- name: Auto-approve high trust
if: steps.egg.outputs.trust-level == 'HIGH'
if: >-
steps.egg.outputs.trust-level == 'HIGH' ||
steps.egg.outputs.trust-level == 'EXISTING_CONTRIBUTOR'
run: |
echo "High trust author -- consider fast-tracking review"
echo "Trusted author -- consider fast-tracking review"
```

## Strict Mode
@@ -153,6 +157,28 @@ PRs and repository metadata. To stay within rate limits:
- The built-in cache (SQLite-backed) avoids refetching data that has not
changed. Cache TTLs are configurable.

## Skipping Known Contributors

By default, Good Egg performs a lightweight check before the full scoring
pipeline: if the PR author already has merged PRs in the target repository,
scoring is skipped and the trust level is reported as `EXISTING_CONTRIBUTOR`.
This avoids unnecessary API calls for established contributors.

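This diff does not show how the pre-check queries GitHub. One plausible approach, sketched here as an assumption, is the search API (`GET /search/issues`) with the `repo:`, `is:pr`, `is:merged`, and `author:` qualifiers, reading only `total_count` from the response. `merged_pr_search_url` and `is_known_contributor` are hypothetical names, and no request is sent.

```python
from urllib.parse import urlencode


def merged_pr_search_url(login: str, owner: str, repo: str) -> str:
    """Build a GitHub search URL whose total_count is the author's merged PRs."""
    query = f"repo:{owner}/{repo} is:pr is:merged author:{login}"
    # per_page=1 keeps the payload small; only total_count is needed.
    params = urlencode({"q": query, "per_page": 1})
    return f"https://api.github.com/search/issues?{params}"


def is_known_contributor(search_response: dict) -> bool:
    """True if the search reported at least one merged PR by the author."""
    return search_response.get("total_count", 0) > 0
```
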
To disable this behaviour and always run full scoring:

```yaml
jobs:
score:
runs-on: ubuntu-latest
steps:
- uses: 2ndSetAI/good-egg@v0
with:
github-token: ${{ secrets.GITHUB_TOKEN }}
skip-known-contributors: 'false'
```

You can check whether scoring was skipped via the `skipped` output.

## Using Better Egg (v2)

To use the v2 scoring model, set the `scoring-model` input:
36 changes: 32 additions & 4 deletions docs/library.md
@@ -34,9 +34,14 @@ async def main() -> None:
)
print(f"User: {result.user_login}")
print(f"Trust level: {result.trust_level}")
print(f"Score: {result.normalized_score:.2f}")
print(f"Merged PRs: {result.total_merged_prs}")
print(f"Unique repos: {result.unique_repos_contributed}")

if result.flags.get("scoring_skipped"):
pr_count = result.scoring_metadata.get("context_repo_merged_pr_count", 0)
print(f"Scoring skipped -- {pr_count} merged PRs in repo")
else:
print(f"Score: {result.normalized_score:.2f}")
print(f"Merged PRs: {result.total_merged_prs}")
print(f"Unique repos: {result.unique_repos_contributed}")

asyncio.run(main())
```
@@ -65,6 +70,29 @@ async def score_pr_author(
| `token` | `str \| None` | GitHub API token; falls back to `GITHUB_TOKEN` env var |
| `cache` | `object \| None` | `Cache` instance for response caching (see [Cache Usage](#cache-usage)) |

## Skipping Known Contributors

By default, `score_pr_author` checks whether the user already has merged PRs
in the target repository. If so, it returns immediately with a trust level of
`EXISTING_CONTRIBUTOR` without running the full scoring pipeline. To force
full scoring:

```python
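import asyncio

from good_egg import GoodEggConfig, score_pr_author


async def main() -> None:
    # Disable the pre-check so the full scoring pipeline always runs
    config = GoodEggConfig(skip_known_contributors=False)
    result = await score_pr_author(
        login="octocat",
        repo_owner="octocat",
        repo_name="Hello-World",
        config=config,
    )
    print(f"Trust level: {result.trust_level}")


asyncio.run(main())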
```

When scoring is skipped, `result.flags["scoring_skipped"]` is `True` and
`result.scoring_metadata["context_repo_merged_pr_count"]` contains the
number of merged PRs found.

## Custom Configuration

Pass a `GoodEggConfig` to customize scoring behaviour:
@@ -150,7 +178,7 @@ the following fields:
| `context_repo` | `str` | Repository used as scoring context |
| `raw_score` | `float` | Raw graph score before normalization |
| `normalized_score` | `float` | Normalized score (0.0 - 1.0) |
| `trust_level` | `TrustLevel` | HIGH, MEDIUM, LOW, UNKNOWN, or BOT |
| `trust_level` | `TrustLevel` | HIGH, MEDIUM, LOW, UNKNOWN, BOT, or EXISTING_CONTRIBUTOR |
| `percentile` | `float` | Percentile rank (0.0 - 1.0) |
| `account_age_days` | `int` | Age of the GitHub account in days |
| `total_merged_prs` | `int` | Total number of merged pull requests |
3 changes: 3 additions & 0 deletions docs/mcp-server.md
@@ -83,6 +83,7 @@ Returns the full trust score as JSON, including all fields from the
| `username` | `string` | Yes | GitHub username to score |
| `repo` | `string` | Yes | Target repository in `owner/repo` format |
| `scoring_model` | `string` | No | Scoring model: `v1` (Good Egg, default) or `v2` (Better Egg) |
| `force_score` | `boolean` | No | Force full scoring even for known contributors (default: `false`) |

**Returns:** Full `TrustScore` JSON with all fields (user_login,
context_repo, raw_score, normalized_score, trust_level, percentile,
@@ -102,6 +103,7 @@ Returns a compact summary suitable for quick checks.
| `username` | `string` | Yes | GitHub username to check |
| `repo` | `string` | Yes | Target repository in `owner/repo` format |
| `scoring_model` | `string` | No | Scoring model: `v1` (Good Egg, default) or `v2` (Better Egg) |
| `force_score` | `boolean` | No | Force full scoring even for known contributors (default: `false`) |

**Returns (v1):**

@@ -142,6 +144,7 @@ Returns an expanded breakdown with contributions, flags, and metadata.
| `username` | `string` | Yes | GitHub username to analyse |
| `repo` | `string` | Yes | Target repository in `owner/repo` format |
| `scoring_model` | `string` | No | Scoring model: `v1` (Good Egg, default) or `v2` (Better Egg) |
| `force_score` | `boolean` | No | Force full scoring even for known contributors (default: `false`) |

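A client calling these tools passes the parameters in the tables above as a JSON object. The helper below is a hypothetical client-side sketch, not part of Good Egg: it checks the `owner/repo` format and omits optional parameters so the server defaults (`v1`, `force_score: false`) apply.

```python
from typing import Optional


def build_tool_arguments(
    username: str,
    repo: str,
    scoring_model: Optional[str] = None,
    force_score: bool = False,
) -> dict:
    """Hypothetical client helper: build the JSON arguments for a scoring tool call."""
    owner, sep, name = repo.partition("/")
    if not owner or not sep or not name or "/" in name:
        raise ValueError(f"repo must be in owner/repo format, got {repo!r}")
    if scoring_model not in (None, "v1", "v2"):
        raise ValueError(f"scoring_model must be 'v1' or 'v2', got {scoring_model!r}")
    args: dict = {"username": username, "repo": repo}
    # Omit optional parameters so the server-side defaults apply.
    if scoring_model is not None:
        args["scoring_model"] = scoring_model
    if force_score:
        args["force_score"] = True
    return args
```
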
**Returns (v1):**

4 changes: 4 additions & 0 deletions examples/.good-egg.yml
@@ -1,6 +1,10 @@
# Good Egg configuration
# Copy this file to your repository root as .good-egg.yml

# Skip scoring for authors who already have merged PRs in the target repo.
# Set to false to always run full scoring.
# skip_known_contributors: true

# Graph-based scoring algorithm parameters
graph_scoring:
alpha: 0.85 # Damping factor (0-1)
11 changes: 8 additions & 3 deletions examples/library_usage.py
@@ -17,9 +17,14 @@ async def main() -> None:
)
print(f"User: {result.user_login}")
print(f"Trust level: {result.trust_level}")
print(f"Score: {result.normalized_score:.2f}")
print(f"Merged PRs: {result.total_merged_prs}")
print(f"Unique repos: {result.unique_repos_contributed}")

if result.flags.get("scoring_skipped"):
pr_count = result.scoring_metadata.get("context_repo_merged_pr_count", 0)
print(f"Scoring skipped -- {pr_count} merged PRs in repo")
else:
print(f"Score: {result.normalized_score:.2f}")
print(f"Merged PRs: {result.total_merged_prs}")
print(f"Unique repos: {result.unique_repos_contributed}")


if __name__ == "__main__":
30 changes: 22 additions & 8 deletions src/good_egg/action.py
@@ -14,7 +14,7 @@
from good_egg.formatter import format_check_run_summary, format_markdown_comment
from good_egg.github_client import GitHubClient
from good_egg.models import TrustLevel
from good_egg.scorer import TrustScorer
from good_egg.scorer import score_pr_author


async def run_action() -> None:
@@ -41,6 +41,10 @@ async def run_action() -> None:
should_comment = os.environ.get("INPUT_COMMENT", "true").lower() == "true"
should_check_run = os.environ.get("INPUT_CHECK-RUN", "false").lower() == "true"
fail_on_low = os.environ.get("INPUT_FAIL-ON-LOW", "false").lower() == "true"
skip_known_input = (
os.environ.get("INPUT_SKIP-KNOWN-CONTRIBUTORS")
or os.environ.get("INPUT_SKIP_KNOWN_CONTRIBUTORS")
)

if not token:
print("::error::GITHUB_TOKEN is required")
@@ -78,16 +82,25 @@ async def run_action() -> None:
)
if scoring_model_input and scoring_model_input in ("v1", "v2"):
config = config.model_copy(update={"scoring_model": scoring_model_input})
cache = Cache(ttls=config.cache_ttl.to_seconds())

async with GitHubClient(token=token, config=config, cache=cache) as client:
user_data = await client.get_user_contribution_data(
pr_author, context_repo=repository
# An omitted action input arrives as "" rather than None, so test truthiness
if skip_known_input:
config = config.model_copy(
update={"skip_known_contributors": skip_known_input.lower() in (
"true", "1", "yes",
)}
)
cache = Cache(ttls=config.cache_ttl.to_seconds())

scorer = TrustScorer(config)
score = scorer.score(user_data, repository)
score = await score_pr_author(
login=pr_author,
repo_owner=repo_owner,
repo_name=repo_name,
config=config,
token=token,
cache=cache,
)
skipped = score.flags.get("scoring_skipped", False)

async with GitHubClient(token=token, config=config, cache=cache) as client:
# Post/update PR comment
if should_comment:
comment_body = format_markdown_comment(score)
@@ -117,6 +130,7 @@ async def run_action() -> None:
_set_output("trust-level", score.trust_level.value)
_set_output("user", score.user_login)
_set_output("scoring-model", score.scoring_model)
_set_output("skipped", "true" if skipped else "false")

# Summary
pct = score.normalized_score * 100
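The body of `_set_output` is not part of this diff. A minimal sketch of what such a helper typically does uses GitHub Actions' `GITHUB_OUTPUT` file: each single-line output is appended as `name=value`, which is what makes `steps.score.outputs.skipped` available to `action.yml`. The name `set_output` and the stdout fallback are assumptions.

```python
import os


def set_output(name: str, value: str) -> None:
    """Append a single-line step output using the GITHUB_OUTPUT file protocol."""
    path = os.environ.get("GITHUB_OUTPUT")
    if not path:
        # Outside GitHub Actions there is no output file; echo for local runs.
        print(f"{name}={value}")
        return
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(f"{name}={value}\n")
```

Multi-line values need the delimiter (`name<<EOF`) form instead; the outputs added here (`skipped`, `trust-level`, and so on) are all single-line.
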
9 changes: 9 additions & 0 deletions src/good_egg/cli.py
@@ -32,6 +32,12 @@ def main() -> None:
default=None,
help="Scoring model (v1 or v2)",
)
@click.option(
"--force-score",
is_flag=True,
default=False,
help="Force full scoring even for known contributors",
)
def score(
username: str,
repo: str,
@@ -40,6 +46,7 @@ def score(
verbose: bool,
output_json: bool,
scoring_model: str | None,
force_score: bool,
) -> None:
"""Score a GitHub user's trustworthiness relative to a repository."""
if not token:
@@ -55,6 +62,8 @@ def score(
config = load_config(config_path)
if scoring_model is not None:
config = config.model_copy(update={"scoring_model": scoring_model})
if force_score:
config = config.model_copy(update={"skip_known_contributors": False})
cache = Cache(ttls=config.cache_ttl.to_seconds())

result = asyncio.run(
7 changes: 7 additions & 0 deletions src/good_egg/config.py
@@ -148,6 +148,7 @@ class V2Config(BaseModel):
class GoodEggConfig(BaseModel):
"""Top-level configuration composing all sub-configs."""
scoring_model: Literal["v1", "v2"] = "v1"
skip_known_contributors: bool = True
graph_scoring: GraphScoringConfig = Field(default_factory=GraphScoringConfig)
edge_weights: EdgeWeightConfig = Field(default_factory=EdgeWeightConfig)
recency: RecencyConfig = Field(default_factory=RecencyConfig)
@@ -213,4 +214,10 @@ def load_config(path: str | Path | None = None) -> GoodEggConfig:
if scoring_model is not None:
config_data["scoring_model"] = scoring_model

skip_known = os.environ.get("GOOD_EGG_SKIP_KNOWN_CONTRIBUTORS")
if skip_known is not None:
config_data["skip_known_contributors"] = skip_known.lower() in (
"true", "1", "yes",
)

return GoodEggConfig(**config_data)