Problem
When APM installs a virtual subdirectory package (e.g., github/awesome-copilot/skills/review-and-refactor) via Artifactory, it currently downloads the entire repository archive (e.g., 5.9MB for awesome-copilot), extracts it to a temp directory, then copies only the target subdirectory. This is wasteful for large repos where the needed subdirectory is a tiny fraction of the total.
Proposed Solution
JFrog Artifactory supports Archive Entry Download — fetching individual files from inside a zip archive without downloading the whole archive.
API Reference: https://docs.jfrog.com/artifactory/reference/archiveEntryDownload
URL Pattern
GET https://<host>/artifactory/<repo-key>/<path/to/archive>.zip!/<path/inside/archive>
Examples
GitHub archive via Artifactory:
https://<artifactory-host>/artifactory/<repo-key>/github/awesome-copilot/archive/refs/heads/main.zip!/awesome-copilot-main/skills/review-and-refactor/SKILL.md
GitLab archive via Artifactory:
https://<artifactory-host>/artifactory/<repo-key>/<owner>/<repo>/-/archive/main/<repo>-main.zip!/<repo>-main/.apm/agents/design-reviewer.agent.md
Archive Root Prefix Convention
Both GitHub and GitLab archives contain a root directory prefix: {repo}-{ref}/
| Source | Archive URL | Root prefix |
| --- | --- | --- |
| GitHub | .../github/awesome-copilot/archive/refs/heads/main.zip | awesome-copilot-main/ |
| GitLab | .../<owner>/<repo>/-/archive/main/<repo>-main.zip | <repo>-main/ |
The entry path must include this root prefix:
{archive_url}!/{repo}-{ref}/{path_inside_repo}
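A minimal sketch of this URL construction follows. The helper name `build_entry_url` and the example host and repo key are hypothetical and not part of APM. The sketch assumes the `{repo}-{ref}` convention holds for the given ref and does no URL encoding.

```python
def build_entry_url(archive_url: str, repo: str, ref: str, path_inside_repo: str) -> str:
    """Join an archive URL and a repo-relative path with Artifactory's '!/' separator."""
    root_prefix = f"{repo}-{ref}"  # assumed convention, not verified against the archive
    # Strip stray slashes so the entry path never starts or ends with '/'.
    entry_path = f"{root_prefix}/{path_inside_repo.strip('/')}"
    return f"{archive_url}!/{entry_path}"


url = build_entry_url(
    # Hypothetical host and repo key, for illustration only.
    "https://artifactory.example.com/artifactory/github-remote"
    "/github/awesome-copilot/archive/refs/heads/main.zip",
    "awesome-copilot",
    "main",
    "skills/review-and-refactor/SKILL.md",
)
print(url)
```

Keeping the prefix logic in one place would make it easy to swap inference for a discovered prefix later.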
Implementation Approach
Where to Change
File: src/apm_cli/deps/github_downloader.py
Method: _download_subdirectory_from_artifactory() (line ~1658)
Current Flow (full archive download)
1. Download full archive zip (potentially many MB)
2. Extract to temp directory
3. Find subdirectory inside extracted files
4. Copy subdirectory to target path
5. Clean up temp directory
Proposed Flow (entry-level download)
1. Construct archive URL (already done by build_artifactory_archive_url())
2. Infer root prefix from convention: "{repo}-{ref}"
3. For each file in subdirectory:
   GET {archive_url}!/{root_prefix}/{subdir}/{file}
4. Write files directly to target path
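The proposed flow could be sketched roughly as below. This is an illustration under assumptions, not APM's implementation. `download_entries` and its `fetch` callback are hypothetical names, the list of files is assumed to be known already, and `fetch` is assumed to return `None` when an entry cannot be downloaded.

```python
from pathlib import Path
from typing import Callable, Iterable, Optional


def download_entries(
    archive_url: str,
    root_prefix: str,
    subdir: str,
    files: Iterable[str],
    target_dir: Path,
    fetch: Callable[[str], Optional[bytes]],
) -> bool:
    """Fetch each file via its entry URL and write it under target_dir.

    Returns False as soon as one entry cannot be fetched, so the caller
    can fall back to the full archive download.
    """
    for rel_path in files:
        entry_url = f"{archive_url}!/{root_prefix}/{subdir}/{rel_path}"
        content = fetch(entry_url)
        if content is None:
            return False
        dest = target_dir / rel_path
        # Create nested directories inside the subdirectory as needed.
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_bytes(content)
    return True
```

A failed run can leave some files already written, so a caller would probably want to write into a temporary directory and move it into place only on success.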
Root Prefix Discovery
Option A — Infer from convention (preferred):
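The root prefix is normally {repo}-{ref}/, and both GitHub and GitLab follow this pattern for plain branch names such as main. Inferring it avoids any extra HTTP calls. Hosts may normalize some refs when naming the directory (GitHub, for example, drops a leading "v" from tag names and replaces slashes in branch names), so a failed lookup must fall back to the full archive download.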
root_prefix = f"{repo}-{ref}"
entry_url = f"{archive_url}!/{root_prefix}/{subdir_path}/{filename}"
Option B — Discovery via partial download:
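Download the first few bytes of the zip (for example with an HTTP Range request) and read the first local file header, whose entry name begins with the root directory. The zip central directory sits at the end of the archive, so reading it instead would mean fetching the tail. More robust than inference, but adds latency.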
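As a sketch of Option B: a zip file begins with the local file header of its first entry (the central directory lives at the end of the file), so the root directory can be read from the leading bytes alone. The function name `root_prefix_from_head` is hypothetical. How the bytes are obtained is also an assumption, for instance a `Range: bytes=0-1023` request, and whether Artifactory honors Range for these archives would need checking.

```python
import struct
from typing import Optional

# Fixed 30-byte part of a zip local file header (little-endian):
# signature, version, flags, method, mtime, mdate, crc32,
# compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<4s5H3I2H")


def root_prefix_from_head(head: bytes) -> Optional[str]:
    """Return the top-level directory of the first zip entry, or None if unknown."""
    if len(head) < LOCAL_HEADER.size:
        return None
    (sig, _ver, flags, _method, _mtime, _mdate,
     _crc, _csize, _usize, name_len, _extra_len) = LOCAL_HEADER.unpack_from(head)
    if sig != b"PK\x03\x04":
        return None  # not a zip local file header
    name_bytes = head[LOCAL_HEADER.size:LOCAL_HEADER.size + name_len]
    if len(name_bytes) < name_len:
        return None  # the caller fetched too few bytes
    # Bit 11 of the flags marks a UTF-8 name; otherwise the zip default is CP437.
    name = name_bytes.decode("utf-8" if flags & 0x800 else "cp437")
    root, sep, _rest = name.partition("/")
    return root if sep else None
```

Returning `None` for anything unexpected keeps the caller's fallback path simple: any non-answer means "use the full archive".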
File Listing Challenge
The archive entry API downloads individual files; it doesn't list directory contents. Options:
- Fetch a file list via Artifactory's File List API (this API lists items stored under a repository path, so whether it can enumerate entries inside a cached archive still needs to be verified):
  GET /api/storage/{repo-key}/{path}?list&deep=1
- Fetch a manifest file first (e.g., apm.yml or SKILL.md) to validate, then fall back to full archive for extraction.
- Hybrid approach: Use archive entry download for known files, fall back to full archive only if needed.
- Accept full archive for subdirectory packages but use entry download for virtual file packages (single .prompt.md, .agent.md files). This is the simplest and most common case.
Recommended Phased Approach
Phase 1: Virtual File Packages (Simplest)
_download_file_from_artifactory() currently downloads the full archive to extract one file. Replace this with a single entry download:
import requests

def _download_file_from_artifactory(self, host, prefix, owner, repo, file_path, ref, scheme="https"):
    # build_artifactory_archive_url() yields the candidate archive URLs iterated below.
    archive_urls = build_artifactory_archive_url(host, prefix, owner, repo, ref, scheme=scheme)
    root_prefix = f"{repo}-{ref}"  # inferred from the {repo}-{ref} convention
    headers = self._get_artifactory_headers()
    for archive_url in archive_urls:
        entry_url = f"{archive_url}!/{root_prefix}/{file_path}"
        try:
            resp = self._resilient_get(entry_url, headers=headers)
            if resp.status_code == 200:
                return resp.content
            # Any other status (e.g., 404) falls through to the next candidate URL.
        except requests.RequestException:
            continue
    # Fall back to full archive download
    return self._download_file_from_artifactory_full(...)
Savings: For a single .prompt.md file (~1KB), avoids downloading a multi-MB archive.
Phase 2: Subdirectory Packages (More Complex)
- Fetch the package manifest via entry download to validate (apm.yml, SKILL.md)
- If subdirectory has few files, fetch each via entry download
- For large subdirectories, fall back to full archive download
Heuristic: If the manifest lists fewer than N primitives (e.g., 20 files), use entry-level download. Otherwise full archive is more efficient.
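The heuristic could be expressed as a small predicate. This is only a sketch: the name `use_entry_download` is hypothetical, and the threshold of 20 is the placeholder value from the text rather than a measured cutoff.

```python
ENTRY_DOWNLOAD_MAX_FILES = 20  # the "N" from the heuristic; a placeholder to tune


def use_entry_download(primitive_count: int, max_files: int = ENTRY_DOWNLOAD_MAX_FILES) -> bool:
    """True when the manifest lists at least one but fewer than max_files primitives."""
    return 0 < primitive_count < max_files


print(use_entry_download(5))    # small skill directory: entry-level download
print(use_entry_download(120))  # large directory: full archive
```

An empty count is treated as "unknown" and routed to the full archive, which matches the fallback-on-doubt behavior described elsewhere in this proposal.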
Phase 3: Smart Caching
Cache archive metadata (file list, root prefix) so subsequent installs of different subdirectories from the same repo don't re-discover.
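One possible shape for such a cache is sketched below as an in-memory map keyed by host, owner, repo, and ref. All names here (`ArchiveMetadata`, `ArchiveMetadataCache`) are hypothetical, and persistence and invalidation are deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

CacheKey = Tuple[str, str, str, str]  # (host, owner, repo, ref)


@dataclass(frozen=True)
class ArchiveMetadata:
    root_prefix: str
    files: Tuple[str, ...]


class ArchiveMetadataCache:
    """In-memory cache of per-archive metadata."""

    def __init__(self) -> None:
        self._entries: Dict[CacheKey, ArchiveMetadata] = {}

    def get(self, host: str, owner: str, repo: str, ref: str) -> Optional[ArchiveMetadata]:
        return self._entries.get((host, owner, repo, ref))

    def put(self, host: str, owner: str, repo: str, ref: str, metadata: ArchiveMetadata) -> None:
        self._entries[(host, owner, repo, ref)] = metadata
```

Branch refs can move, so entries keyed by a branch name may go stale. Keying by commit SHA or adding an expiry would be safer if the cache outlives a single install run.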
Performance Impact
| Scenario | Current | Optimized |
| --- | --- | --- |
| Single virtual file (.prompt.md) from 6MB repo | 6MB download + unzip | ~1KB download |
| Skill subdirectory (5 files) from 6MB repo | 6MB download + unzip | ~5 small downloads (~50KB total) |
| Large subdirectory (100+ files) | 6MB download + unzip | Full archive (same as current) |
Edge Cases
| Case | Behavior |
| --- | --- |
| Root prefix doesn't follow {repo}-{ref} convention | Fall back to full archive download |
| Entry download returns 404 (file not in archive) | Fall back to full archive download |
| Artifactory instance doesn't support archive entry API | Graceful degradation to full archive |
| Archive is a tag (not branch) | Root prefix uses tag name: {repo}-{tag}/ |
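Most of these cases reduce to one rule: anything other than a successful entry download triggers the full archive path. A sketch of that rule follows, with hypothetical names (`fetch_entry_or_fallback`, `entry_get`, `full_archive_fallback`). It assumes `entry_get` returns a `(status, body)` pair and that transport errors surface as `OSError`; `requests.RequestException` is a subclass of `OSError`.

```python
from typing import Callable, Tuple


def fetch_entry_or_fallback(
    entry_get: Callable[[str], Tuple[int, bytes]],
    entry_url: str,
    full_archive_fallback: Callable[[], bytes],
) -> bytes:
    """Return the entry body on HTTP 200, otherwise use the full archive path."""
    try:
        status, body = entry_get(entry_url)
    except OSError:
        # Network failure or an instance that rejects the request outright.
        status, body = 0, b""
    if status == 200:
        return body
    # 404 (wrong prefix or missing file) and every other status degrade gracefully.
    return full_archive_fallback()
```

Because the fallback is a callable, the full archive is only downloaded when it is actually needed.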
Testing
- Unit tests: Mock Artifactory responses for entry download URL pattern
- Integration tests: Verify against real Artifactory instance with both GitHub and GitLab remote repos
- Fallback tests: Simulate entry download failure → verify full archive fallback works
- Root prefix tests: Verify prefix construction for branches, tags, and commit SHAs
Dependencies
- Requires Artifactory server to support archive entry download (standard feature, not an add-on)
- No client-side library changes needed — uses standard HTTP GET
- Backward compatible — falls back to full archive download on any failure