Feat - Support self-hosted git sources#146
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughDetect git-style URLs (git+, SCP, or .git-suffixed HTTP(S)), shallow/sparse-clone repositories, choose a marketplace YAML via fragment or deterministic discovery (with traversal guard), parse it into a Marketplace, broaden HTTP(S) git routing, and add tests for success, failure, and security cases. ChangesGit Repository URL Support for Marketplace Loading
Sequence Diagram(s)sequenceDiagram
participant Caller
participant FromURL as Marketplace.from_url
participant FromGit as Marketplace._from_git_url
participant Git as subprocess.run_git
participant Picker as Marketplace._pick_marketplace_yaml
participant FS as Filesystem/YAML_Loader
Caller->>FromURL: provide git+https:// / git+ssh:// / git@host:org/repo.git / https://... .git (`#fragment` optional)
FromURL->>FromGit: delegate git-backed URL
FromGit->>Git: git clone --depth 1 + sparse-checkout -> tempdir
Git-->>FromGit: clone result (success/fail)
alt fragment present
FromGit->>FromGit: validate fragment (no ../) and resolve path
FromGit->>FS: read YAML at fragment path
else no fragment
FromGit->>Picker: choose YAML (name.yml -> marketplace.yml -> single root YAML)
Picker-->>FromGit: YAML path
FromGit->>FS: read YAML at discovered path
end
FS-->>FromGit: parsed marketplace data
FromGit->>FromGit: remove .git directory
FromGit-->>Caller: Marketplace instance (marketplace.url retains original git+ prefix)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/lola/models.py (1)
467-495:⚠️ Potential issue | 🟠 Major | ⚡ Quick winSCP-style git URLs still get rejected here.
The new branch only catches
git+...inputs. A marketplace URL likegit@gitlab.internal:org/marketplace.gitstill falls through tourlparse(), which makesgit@gitlab.internallook like the scheme, so Line 492 raises instead of cloning. That leaves one of the advertised URL forms unsupported.Suggested direction
- if url.startswith("git+"): + if url.startswith("git+") or _is_scp_style_git_url(url): return cls._from_git_url(url, name)- git_url = url[4:] + git_url = url[4:] if url.startswith("git+") else urlYou'd need a small helper to recognize
git@host:reposyntax and a matching test.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lola/models.py` around lines 467 - 495, The parser currently only handles URLs starting with "git+" and treats scp-style git URLs (e.g. git@host:org/repo.git) as an unknown scheme; add a small helper (e.g. is_scp_git_url(url)) and call it alongside the existing git+ check so scp-style URLs are routed to cls._from_git_url(url, name) (or normalized to a git+ssh:// form before calling _from_git_url). Update the branch in the code that checks url.startswith("git+") to also detect scp-style syntax, and add a unit test covering a git@host:org/repo.git marketplace URL to ensure cloning is attempted rather than raising the "Marketplace URL must use ..." ValueError.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lola/models.py`:
- Around line 532-570: After cloning into repo_dir (the subprocess.run/clone_cmd
block) ensure .git is removed on all failure paths by moving the post-clone
processing (resolving yaml_file, calling cls._find_marketplace_yaml,
opening/parsing YAML, etc.) into a try/finally that always removes git_dir =
repo_dir / ".git" in the finally block (using shutil.rmtree(ignore_errors=True))
and re-raises any exception; this guarantees .git cleanup even if file_fragment
checks, yaml parsing, or file-not-found errors occur.
---
Outside diff comments:
In `@src/lola/models.py`:
- Around line 467-495: The parser currently only handles URLs starting with
"git+" and treats scp-style git URLs (e.g. git@host:org/repo.git) as an unknown
scheme; add a small helper (e.g. is_scp_git_url(url)) and call it alongside the
existing git+ check so scp-style URLs are routed to cls._from_git_url(url, name)
(or normalized to a git+ssh:// form before calling _from_git_url). Update the
branch in the code that checks url.startswith("git+") to also detect scp-style
syntax, and add a unit test covering a git@host:org/repo.git marketplace URL to
ensure cloning is attempted rather than raising the "Marketplace URL must use
..." ValueError.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: fc82af7b-7c70-42dc-b319-fc46be468e0a
📒 Files selected for processing (4)
src/lola/models.pysrc/lola/parsers.pytests/test_marketplace_model.pytests/test_sources.py
f82abd0 to
c471c22
Compare
There was a problem hiding this comment.
Is it also possible for us to auto-detect non git+ prefixed urls like these?
git@github.com:LobsterTrap/lola.githttps://github.com/LobsterTrap/lola.git
This would fit with the lola mod add https://github.com/LobsterTrap/lola.git syntax. Maybe we can reuse the detector from that command to also detect this kind of url.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/lola/parsers.py (1)
119-125: 💤 Low valueBroadened detection turns any HTTP(S) URL into a git source — consider the error UX.
With this change,
GitSourceHandler.can_handlereturnsTruefor any HTTP(S) URL with a host that isn't a.zip/.tar*archive (sinceZipUrlSourceHandler/TarUrlSourceHandlerrun first). That includes things likehttps://example.com/somefile,https://example.com/raw/market.yml, raw text files, HTML pages, etc., which will now be sent togit cloneand fail with a genericGit clone failed: ...RuntimeErrorinstead ofUnsupportedSourceError.That's likely intentional for self-hosted git support (you can't enumerate hosts), but two things worth confirming:
- The previous
test_cannot_handle_arbitrary_httpcase (https://example.com/somefile) was removed rather than inverted — was that deliberate? It's the most visible behavior change of this PR for non-git HTTP URLs.- The fallthrough error from a failed
git cloneon a non-git URL can be noisy (auth prompts, stderr from git). Consider whetherfetchshould detect "this clearly isn't a git repo" and re-raise asUnsupportedSourceError/SourceErrorwith a clearer message, or whether the current behavior is acceptable since marketplacegit+https:///git+ssh://schemes already give users an explicit opt-in path.No change required if this is the desired UX; flagging so it's a conscious decision.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lola/parsers.py` around lines 119 - 125, GitSourceHandler.can_handle was broadened to accept any http(s) URL with a host, causing non-git HTTP resources to be handled by git and produce noisy errors; either revert the test change (restore test_cannot_handle_arbitrary_http) or add a clear gate in fetch/git handling: before calling git clone (in the code path that uses GitSourceHandler and the fetch/git helper), run a lightweight git probe (e.g., git ls-remote or equivalent remote-reachability check) and if it clearly isn’t a git repo, raise UnsupportedSourceError/SourceError with a concise message; update GitSourceHandler.can_handle or the fetch logic accordingly to keep archive handlers unchanged (ZipUrlSourceHandler, TarUrlSourceHandler) and adjust tests to assert the new behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lola/models.py`:
- Around line 562-566: The git subprocess calls (clone_cmd, sparse_checkout_cmd,
and ls_tree_cmd) must be hardened to avoid hanging: add a reasonable timeout
argument (e.g., timeout=30 or configurable) to each subprocess.run call, set
stdin to subprocess.DEVNULL to prevent interactive prompts, and pass an
environment that includes GIT_TERMINAL_PROMPT=0 (preserving existing env) so git
will fail instead of prompting for credentials; keep capture_output=True and
text=True and optionally add check=True if you want exceptions on non-zero exit.
Update the subprocess.run invocations that execute clone_cmd,
sparse_checkout_cmd, and ls_tree_cmd accordingly.
---
Nitpick comments:
In `@src/lola/parsers.py`:
- Around line 119-125: GitSourceHandler.can_handle was broadened to accept any
http(s) URL with a host, causing non-git HTTP resources to be handled by git and
produce noisy errors; either revert the test change (restore
test_cannot_handle_arbitrary_http) or add a clear gate in fetch/git handling:
before calling git clone (in the code path that uses GitSourceHandler and the
fetch/git helper), run a lightweight git probe (e.g., git ls-remote or
equivalent remote-reachability check) and if it clearly isn’t a git repo, raise
UnsupportedSourceError/SourceError with a concise message; update
GitSourceHandler.can_handle or the fetch logic accordingly to keep archive
handlers unchanged (ZipUrlSourceHandler, TarUrlSourceHandler) and adjust tests
to assert the new behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0da64b70-6431-455d-91cb-5cca29cd4061
📒 Files selected for processing (4)
src/lola/models.pysrc/lola/parsers.pytests/test_marketplace_model.pytests/test_sources.py
Previously, GitSourceHandler.can_handle() only matched github.com, gitlab.com, and bitbucket.org. This rejected self-hosted GitLab/GitHub/ Gitea instances. Now accepts any HTTP(S) URL with a valid host as a potential git source. Archive URLs (.zip, .tar) are still handled first by their dedicated handlers in SOURCE_HANDLERS ordering, so non-git URLs fail cleanly at the git clone step.
Adds support for fetching marketplace catalogs via git clone using git+https://, git+ssh://, or git+user@host:path (SCP-style) URLs. This enables authenticated access to private/self-hosted marketplace repos by leveraging existing git credentials (SSH keys, credential helpers, .netrc). Auto-detects the marketplace YAML file in the repo, or accepts an explicit path via URL fragment (e.g. git+https://host/repo#path/to/file.yml). Includes path traversal protection on fragment input.
d8e38bd to
e90e10d
Compare
Summary
git@host:org/repo.git) for marketplace sources — nogit+prefix requiredRelated Issues
Closes #125
Test Plan
Checklist
pytest)ruff check src tests)ty check)AI Disclosure
AI-assisted with GitHub Copilot (VS Code)
Summary by CodeRabbit
New Features
Behavior
Tests