Feat - Support self-hosted git sources #146

Open
momostallion wants to merge 8 commits into LobsterTrap:main from StateFarmIns:fix-gitlab-self-hosted-sources

Conversation

@momostallion momostallion commented May 20, 2026

Summary

  • Support git+https:// and git+ssh:// URLs for lola market add — enables fetching marketplace catalogs from private/self-hosted git repos (GitLab, Gitea, etc.) by cloning via git instead of urlopen, inheriting the user's existing SSH keys and credential helpers
  • Expand GitSourceHandler to accept self-hosted git instances — previously only matched github.com, gitlab.com, and bitbucket.org; now accepts any HTTP(S) URL with a valid host as a potential git source
  • Auto-detect marketplace YAML in cloned repos, or accept explicit path via URL fragment (e.g. git+https://host/repo#path/to/market.yml); includes path traversal protection
  • Support SCP-style git URLs (git@host:org/repo.git) for marketplace sources — no git+ prefix required
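
The fragment handling and traversal guard described above could look roughly like the following. This is a minimal sketch, not the PR's code: `split_git_source` and `resolve_fragment` are hypothetical names, and the real implementation in `_from_git_url` may differ.

```python
from pathlib import Path


def split_git_source(url: str) -> tuple[str, str]:
    """Split 'git+https://host/repo#path/to/market.yml' into (clone_url, fragment).

    Hypothetical helper: the optional 'git+' prefix is stripped so the result
    can be passed to git, and everything after the first '#' is the fragment.
    """
    base, _, fragment = url.partition("#")
    clone_url = base[len("git+"):] if base.startswith("git+") else base
    return clone_url, fragment


def resolve_fragment(repo_dir: Path, fragment: str) -> Path:
    """Resolve the fragment inside repo_dir and reject paths that escape it."""
    root = repo_dir.resolve()
    candidate = (root / fragment).resolve()
    # Covers '../' segments as well as absolute fragments such as '/etc/passwd'.
    if not candidate.is_relative_to(root):
        raise ValueError(f"Fragment escapes the repository: {fragment!r}")
    return candidate
```

Resolving both paths before comparing them means the check works on the final location rather than on the raw string, so `a/../../x` is rejected just like `../x`.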

Related Issues

Closes #125

Test Plan

  • pytest tests/test_marketplace_model.py — new tests for git-based marketplace fetch (git+https, git+ssh, SCP-style, fragment path, path traversal rejection)
  • pytest tests/test_sources.py — updated tests for self-hosted git source handling
  • pytest tests/test_marketplace_model.py::TestMarketplaceFromGitUrl::test_from_git_url_scp_style
  • Manual: lola market add my-market git+ssh://git@private-gitlab.example/org/marketplace.git clones and registers the marketplace
  • Manual: lola mod add git+https://private-gitlab.example/org/module.git works for self-hosted instances

Checklist

  • Tests pass (pytest)
  • Linting passes (ruff check src tests)
  • Type checking passes (ty check)

AI Disclosure

AI-assisted with GitHub Copilot (VS Code)

Summary by CodeRabbit

  • New Features

    • Support git-backed marketplace sources (git+https, git+ssh, SCP-style, and .git-suffixed HTTP(S))
    • Select a specific marketplace YAML via URL fragment
  • Behavior

    • Deterministic YAML discovery when no fragment is provided
    • Clearer errors for missing/ambiguous files and invalid schemes
    • Fragment path traversal is blocked; original git URL form preserved for updates
  • Tests

    • Added coverage for git sources, fragment selection, error paths, and self-hosted hosts

coderabbitai Bot commented May 20, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Detect git-style URLs (git+, SCP, or .git-suffixed HTTP(S)), shallow/sparse-clone repositories, choose a marketplace YAML via fragment or deterministic discovery (with traversal guard), parse it into a Marketplace, broaden HTTP(S) git routing, and add tests for success, failure, and security cases.

Changes

Git Repository URL Support for Marketplace Loading

| Layer / File(s) | Summary |
|---|---|
| Marketplace git loading implementation (src/lola/models.py) | Adds re import, _is_scp_style_git_url, updates Marketplace.from_url to route git+*, SCP-style, and .git-suffixed URLs to _from_git_url, expands invalid-scheme text, and implements _from_git_url (shallow/sparse clone, fragment handling with traversal guard, _pick_marketplace_yaml deterministic discovery, YAML parsing, cleanup). |
| Broadened Git URL detection (src/lola/parsers.py) | GitSourceHandler.can_handle() now treats any HTTP(S) URL with a non-empty host (netloc) as a potential git source. |
| Marketplace git tests (tests/test_marketplace_model.py) | Adds TestMarketplaceFromGitUrl with _mock_git_clone helper and tests for git+https/git+ssh/SCP success, HTTPS .git autodetection, fragment selection, clone failure, missing file, no YAML, ambiguous multiple YAMLs, fragment traversal blocking, and preservation of the git+ URL. |
| Git source handler tests (tests/test_sources.py) | Adds tests for self-hosted HTTPS/git hosts and a negative test for URLs lacking a valid host; removes an older arbitrary-HTTP negative assertion. |
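
The broadened host check summarized for src/lola/parsers.py can be illustrated with a small sketch. The function name `looks_like_http_git_source` is an assumption for illustration only; the actual `GitSourceHandler.can_handle()` body is not shown in this conversation.

```python
from urllib.parse import urlparse


def looks_like_http_git_source(url: str) -> bool:
    """Return True for any HTTP(S) URL that has a non-empty host (netloc).

    Hypothetical stand-in for the broadened check: no host allow-list,
    so self-hosted instances are accepted alongside the public forges.
    """
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

Under this check, `https://private-gitlab.example/org/module.git` is accepted, while a URL without a host such as `https:///no-host` is not.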

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant FromURL as Marketplace.from_url
  participant FromGit as Marketplace._from_git_url
  participant Git as subprocess.run_git
  participant Picker as Marketplace._pick_marketplace_yaml
  participant FS as Filesystem/YAML_Loader

  Caller->>FromURL: provide git+https:// / git+ssh:// / git@host:org/repo.git / https://... .git (`#fragment` optional)
  FromURL->>FromGit: delegate git-backed URL
  FromGit->>Git: git clone --depth 1 + sparse-checkout -> tempdir
  Git-->>FromGit: clone result (success/fail)
  alt fragment present
    FromGit->>FromGit: validate fragment (no ../) and resolve path
    FromGit->>FS: read YAML at fragment path
  else no fragment
    FromGit->>Picker: choose YAML (name.yml -> marketplace.yml -> single root YAML)
    Picker-->>FromGit: YAML path
    FromGit->>FS: read YAML at discovered path
  end
  FS-->>FromGit: parsed marketplace data
  FromGit->>FromGit: remove .git directory
  FromGit-->>Caller: Marketplace instance (marketplace.url retains original git+ prefix)
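
The discovery order in the diagram (name.yml, then marketplace.yml, then a single root YAML) might be sketched as follows. This is an illustration under assumptions: `pick_marketplace_yaml` is a hypothetical stand-in for `_pick_marketplace_yaml`, and the exception types and the handling of the `.yaml` extension are guesses, not taken from the PR.

```python
from pathlib import Path


def pick_marketplace_yaml(repo_dir: Path, name: str) -> Path:
    """Pick a marketplace YAML deterministically from the repository root."""
    # 1. Prefer '<name>.yml', then 'marketplace.yml'.
    for candidate in (f"{name}.yml", "marketplace.yml"):
        path = repo_dir / candidate
        if path.is_file():
            return path

    # 2. Otherwise accept a YAML file in the root only if it is the sole one.
    #    Sorting keeps the error message stable across filesystems.
    root_yaml = sorted(
        p for p in repo_dir.iterdir()
        if p.is_file() and p.suffix in {".yml", ".yaml"}
    )
    if not root_yaml:
        raise FileNotFoundError(f"No marketplace YAML found in {repo_dir}")
    if len(root_yaml) > 1:
        names = ", ".join(p.name for p in root_yaml)
        raise ValueError(f"Ambiguous marketplace YAML, pick one via #fragment: {names}")
    return root_yaml[0]
```

Failing on ambiguity instead of picking the first match keeps the result independent of directory listing order.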

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped to a repo, deep and wide,
cloned a branch where market secrets hide,
YAML found with careful stride,
fragments blocked from a sly tide,
now marketplaces bloom, git+ by my side.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 70.83% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main feature: support for self-hosted git sources is the primary change across multiple files. |
| Linked Issues check | ✅ Passed | All objectives from issue #125 are met: git repo linking [Marketplace.from_url], private repo auth via git [_from_git_url], and YAML auto-detection/fragment selection [_pick_marketplace_yaml]. |
| Out of Scope Changes check | ✅ Passed | All changes align with issue #125 objectives: git URL handling in models.py/parsers.py, YAML selection logic, comprehensive test coverage, and subprocess robustness. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/lola/models.py (1)

467-495: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

SCP-style git URLs still get rejected here.

The new branch only catches git+... inputs. A marketplace URL like git@gitlab.internal:org/marketplace.git still falls through to urlparse(), which makes git@gitlab.internal look like the scheme, so Line 492 raises instead of cloning. That leaves one of the advertised URL forms unsupported.

Suggested direction
-        if url.startswith("git+"):
+        if url.startswith("git+") or _is_scp_style_git_url(url):
             return cls._from_git_url(url, name)
-        git_url = url[4:]
+        git_url = url[4:] if url.startswith("git+") else url

You'd need a small helper to recognize git@host:repo syntax and a matching test.
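
A helper of the kind suggested here could be sketched as below. The regular expression is an assumption chosen for illustration and may not match the pattern the PR eventually used in `_is_scp_style_git_url`.

```python
import re

# Hypothetical pattern: 'user@host:path' with no URL scheme in front.
# Characters such as '+', '/' and ':' are not allowed before the '@',
# so 'git+ssh://git@host/...' and 'https://user@host/...' do not match.
_SCP_STYLE_RE = re.compile(r"[\w.-]+@[\w.-]+:\S+")


def is_scp_style_git_url(url: str) -> bool:
    """Return True for SCP-style git URLs such as 'git@host:org/repo.git'."""
    return _SCP_STYLE_RE.fullmatch(url) is not None
```

With such a helper, the routing condition in the suggested diff can test `url.startswith("git+") or is_scp_style_git_url(url)` before any scheme validation runs.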

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lola/models.py` around lines 467 - 495, The parser currently only handles
URLs starting with "git+" and treats scp-style git URLs (e.g.
git@host:org/repo.git) as an unknown scheme; add a small helper (e.g.
is_scp_git_url(url)) and call it alongside the existing git+ check so scp-style
URLs are routed to cls._from_git_url(url, name) (or normalized to a git+ssh://
form before calling _from_git_url). Update the branch in the code that checks
url.startswith("git+") to also detect scp-style syntax, and add a unit test
covering a git@host:org/repo.git marketplace URL to ensure cloning is attempted
rather than raising the "Marketplace URL must use ..." ValueError.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lola/models.py`:
- Around line 532-570: After cloning into repo_dir (the subprocess.run/clone_cmd
block) ensure .git is removed on all failure paths by moving the post-clone
processing (resolving yaml_file, calling cls._find_marketplace_yaml,
opening/parsing YAML, etc.) into a try/finally that always removes git_dir =
repo_dir / ".git" in the finally block (using shutil.rmtree(ignore_errors=True))
and re-raises any exception; this guarantees .git cleanup even if file_fragment
checks, yaml parsing, or file-not-found errors occur.
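
The try/finally shape requested in this comment might look like the sketch below. `with_git_cleanup` and its `load` callback are hypothetical names standing in for the post-clone processing; only `shutil.rmtree(..., ignore_errors=True)` is taken from the comment itself.

```python
import shutil
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def with_git_cleanup(repo_dir: Path, load: Callable[[Path], T]) -> T:
    """Run load(repo_dir) and remove repo_dir/.git on success and on failure."""
    try:
        # Post-clone work: fragment checks, YAML discovery, parsing, and so on.
        return load(repo_dir)
    finally:
        # Runs even when load() raises; the exception still propagates.
        shutil.rmtree(repo_dir / ".git", ignore_errors=True)
```

Because the cleanup sits in `finally`, a missing file or a YAML parse error no longer leaves the `.git` directory behind.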

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fc82af7b-7c70-42dc-b319-fc46be468e0a

📥 Commits

Reviewing files that changed from the base of the PR and between 11e45d5 and 4039531.

📒 Files selected for processing (4)
  • src/lola/models.py
  • src/lola/parsers.py
  • tests/test_marketplace_model.py
  • tests/test_sources.py

Comment thread src/lola/models.py Outdated
@momostallion momostallion force-pushed the fix-gitlab-self-hosted-sources branch from f82abd0 to c471c22 Compare May 20, 2026 19:42
Collaborator

@SecKatie SecKatie left a comment

Is it also possible for us to auto-detect non git+ prefixed urls like these?

  • git@github.com:LobsterTrap/lola.git
  • https://github.com/LobsterTrap/lola.git

This would fit with the lola mod add https://github.com/LobsterTrap/lola.git syntax. Maybe we can reuse the detector from that command to also detect this kind of url.

Comment thread src/lola/models.py

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/lola/parsers.py (1)

119-125: 💤 Low value

Broadened detection turns any HTTP(S) URL into a git source — consider the error UX.

With this change, GitSourceHandler.can_handle returns True for any HTTP(S) URL with a host that isn't a .zip/.tar* archive (since ZipUrlSourceHandler/TarUrlSourceHandler run first). That includes things like https://example.com/somefile, https://example.com/raw/market.yml, raw text files, HTML pages, etc., which will now be sent to git clone and fail with a generic Git clone failed: ... RuntimeError instead of UnsupportedSourceError.

That's likely intentional for self-hosted git support (you can't enumerate hosts), but two things worth confirming:

  1. The previous test_cannot_handle_arbitrary_http case (https://example.com/somefile) was removed rather than inverted — was that deliberate? It's the most visible behavior change of this PR for non-git HTTP URLs.
  2. The fallthrough error from a failed git clone on a non-git URL can be noisy (auth prompts, stderr from git). Consider whether fetch should detect "this clearly isn't a git repo" and re-raise as UnsupportedSourceError/SourceError with a clearer message, or whether the current behavior is acceptable since marketplace git+https:// / git+ssh:// schemes already give users an explicit opt-in path.

No change required if this is the desired UX; flagging so it's a conscious decision.
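
One way to detect "this clearly isn't a git repo" before cloning is a `git ls-remote` probe, sketched below. This is an illustration only: `is_git_remote` is a hypothetical helper, the injectable `run` parameter exists so the sketch can be exercised without network access, and how the caller maps a negative result to `UnsupportedSourceError` is left open.

```python
import os
import subprocess
from typing import Callable


def is_git_remote(
    url: str,
    timeout: float = 15.0,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Return True if 'git ls-remote' can list refs for url."""
    # Disable credential prompts so a non-git URL fails instead of blocking.
    env = {**os.environ, "GIT_TERMINAL_PROMPT": "0"}
    try:
        result = run(
            ["git", "ls-remote", "--heads", url],
            capture_output=True,
            text=True,
            stdin=subprocess.DEVNULL,
            env=env,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        # Timed out, or the git executable is not installed.
        return False
    return result.returncode == 0
```

The probe costs one extra round trip per source, which is the trade-off against the clearer error message.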

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lola/parsers.py` around lines 119 - 125, GitSourceHandler.can_handle was
broadened to accept any http(s) URL with a host, causing non-git HTTP resources
to be handled by git and produce noisy errors; either revert the test change
(restore test_cannot_handle_arbitrary_http) or add a clear gate in fetch/git
handling: before calling git clone (in the code path that uses GitSourceHandler
and the fetch/git helper), run a lightweight git probe (e.g., git ls-remote or
equivalent remote-reachability check) and if it clearly isn’t a git repo, raise
UnsupportedSourceError/SourceError with a concise message; update
GitSourceHandler.can_handle or the fetch logic accordingly to keep archive
handlers unchanged (ZipUrlSourceHandler, TarUrlSourceHandler) and adjust tests
to assert the new behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lola/models.py`:
- Around line 562-566: The git subprocess calls (clone_cmd, sparse_checkout_cmd,
and ls_tree_cmd) must be hardened to avoid hanging: add a reasonable timeout
argument (e.g., timeout=30 or configurable) to each subprocess.run call, set
stdin to subprocess.DEVNULL to prevent interactive prompts, and pass an
environment that includes GIT_TERMINAL_PROMPT=0 (preserving existing env) so git
will fail instead of prompting for credentials; keep capture_output=True and
text=True and optionally add check=True if you want exceptions on non-zero exit.
Update the subprocess.run invocations that execute clone_cmd,
sparse_checkout_cmd, and ls_tree_cmd accordingly.
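
The hardening described in this comment can be sketched as a single wrapper. `run_noninteractive` is a hypothetical name and the 30-second default simply mirrors the example value in the comment; the PR's actual calls may be structured differently.

```python
import os
import subprocess


def run_noninteractive(
    cmd: list[str], timeout: float = 30.0
) -> subprocess.CompletedProcess:
    """Run a command so that it cannot hang waiting for user input."""
    # Preserve the caller's environment and tell git to fail rather than prompt.
    env = {**os.environ, "GIT_TERMINAL_PROMPT": "0"}
    return subprocess.run(
        cmd,
        capture_output=True,
        text=True,
        stdin=subprocess.DEVNULL,  # no interactive input is available
        env=env,
        timeout=timeout,  # raises subprocess.TimeoutExpired when exceeded
    )
```

The clone, sparse-checkout, and ls-tree commands would each be passed through this wrapper, for example `run_noninteractive(["git", "clone", "--depth", "1", url, str(dest)])`, where `url` and `dest` are placeholders.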

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0da64b70-6431-455d-91cb-5cca29cd4061

📥 Commits

Reviewing files that changed from the base of the PR and between f82abd0 and 73c0931.

📒 Files selected for processing (4)
  • src/lola/models.py
  • src/lola/parsers.py
  • tests/test_marketplace_model.py
  • tests/test_sources.py

Comment thread src/lola/models.py
@momostallion momostallion requested a review from SecKatie May 21, 2026 18:07
Previously, GitSourceHandler.can_handle() only matched github.com,
gitlab.com, and bitbucket.org. This rejected self-hosted GitLab/GitHub/
Gitea instances.

Now accepts any HTTP(S) URL with a valid host as a potential git source.
Archive URLs (.zip, .tar) are still handled first by their dedicated
handlers in SOURCE_HANDLERS ordering, so non-git URLs fail cleanly at
the git clone step.

Adds support for fetching marketplace catalogs via git clone using
git+https://, git+ssh://, or git+user@host:path (SCP-style) URLs.
This enables authenticated access to private/self-hosted marketplace
repos by leveraging existing git credentials (SSH keys, credential
helpers, .netrc).

Auto-detects the marketplace YAML file in the repo, or accepts an
explicit path via URL fragment (e.g. git+https://host/repo#path/to/file.yml).
Includes path traversal protection on fragment input.
@momostallion momostallion force-pushed the fix-gitlab-self-hosted-sources branch from d8e38bd to e90e10d Compare May 23, 2026 14:14
@mrbrandao mrbrandao added the enhancement (New feature or request) and good first issue (Good for newcomers) labels May 25, 2026
Labels

  • enhancement: New feature or request
  • good first issue: Good for newcomers

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] - Add marketplace with link to Git repo instead of raw file

3 participants