feat: Artifactory Archive Entry Download Optimization for subdirectory/file packages

## Problem

When APM installs a virtual subdirectory package (e.g., `github/awesome-copilot/skills/review-and-refactor`) via Artifactory, it currently downloads the **entire repository archive** (e.g., 5.9MB for `awesome-copilot`), extracts it to a temp directory, then copies only the target subdirectory. This is wasteful for large repos where the needed subdirectory is a tiny fraction of the total.

## Proposed Solution

JFrog Artifactory supports **Archive Entry Download** — fetching individual files from inside a zip archive without downloading the whole archive.

**API Reference:** https://docs.jfrog.com/artifactory/reference/archiveEntryDownload

### URL Pattern

```
GET https://<host>/artifactory/<repo-key>/<path/to/archive>.zip!/<path/inside/archive>
```

### Examples

**GitHub archive via Artifactory:**
```
https://<artifactory-host>/artifactory/<repo-key>/github/awesome-copilot/archive/refs/heads/main.zip!/awesome-copilot-main/skills/review-and-refactor/SKILL.md
```

**GitLab archive via Artifactory:**
```
https://<artifactory-host>/artifactory/<repo-key>/<owner>/<repo>/-/archive/main/<repo>-main.zip!/<repo>-main/.apm/agents/design-reviewer.agent.md
```

### Archive Root Prefix Convention

Both GitHub and GitLab archives contain a root directory prefix: `{repo}-{ref}/`

| Source | Archive URL | Root prefix |
|--------|------------|-------------|
| GitHub | `.../github/awesome-copilot/archive/refs/heads/main.zip` | `awesome-copilot-main/` |
| GitLab | `.../<owner>/<repo>/-/archive/main/<repo>-main.zip` | `<repo>-main/` |

The entry path must include this root prefix:
```
{archive_url}!/{repo}-{ref}/{path_inside_repo}
```
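
As a rough illustration of the pattern above, the entry URL could be assembled by a small helper. This is a minimal sketch: `build_entry_url` is a hypothetical name, not an existing APM function, and percent-encoding the in-archive path is an assumption about what Artifactory expects for paths with special characters.

```python
from urllib.parse import quote


def build_entry_url(archive_url: str, repo: str, ref: str, path_in_repo: str) -> str:
    """Join an archive URL and an in-archive path using the `!/` separator."""
    # Assumed convention from the table above: the archive root is "{repo}-{ref}".
    root_prefix = f"{repo}-{ref}"
    inner = f"{root_prefix}/{path_in_repo.strip('/')}"
    # Assumption: percent-encode unsafe characters but keep "/" separators intact.
    return f"{archive_url}!/{quote(inner, safe='/')}"
```

For example, calling it with the GitHub archive URL from the Examples section, `repo="awesome-copilot"`, `ref="main"`, and `path_in_repo="skills/review-and-refactor/SKILL.md"` yields the entry URL shown there.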

## Implementation Approach

### Where to Change

**File:** `src/apm_cli/deps/github_downloader.py`

**Method:** `_download_subdirectory_from_artifactory()` (line ~1658)

### Current Flow (full archive download)

```
1. Download full archive zip (potentially many MB)
2. Extract to temp directory
3. Find subdirectory inside extracted files
4. Copy subdirectory to target path
5. Clean up temp directory
```

### Proposed Flow (entry-level download)

```
1. Construct archive URL (already done by build_artifactory_archive_url())
2. Infer root prefix from convention: "{repo}-{ref}" (no trailing slash)
3. For each file in subdirectory:
   GET {archive_url}!/{root_prefix}/{subdir}/{file}
4. Write files directly to target path
```
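
Steps 3 and 4 might look roughly like the sketch below. It assumes the list of files in the subdirectory is already known (see the File Listing Challenge section) and takes the HTTP call as an injected `fetch` callable, so `download_entries` and its parameters are illustrative names rather than existing APM code.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, Optional


def download_entries(
    archive_url: str,
    root_prefix: str,
    subdir: str,
    files: Iterable[str],
    target_dir: Path,
    fetch: Callable[[str], Optional[bytes]],
) -> bool:
    """Fetch each file via a `!/` entry URL; return False if any entry is missing."""
    staged: Dict[str, bytes] = {}
    for rel in files:
        url = f"{archive_url}!/{root_prefix}/{subdir.strip('/')}/{rel}"
        body = fetch(url)  # hypothetical: returns None on 404 or network failure
        if body is None:
            return False  # caller is expected to fall back to the full archive
        staged[rel] = body

    # Write only after every entry was fetched, so a partial failure
    # leaves the target directory untouched.
    for rel, body in staged.items():
        dest = target_dir / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_bytes(body)
    return True
```

Staging the bodies in memory before writing is a design choice that suits small subdirectories; it keeps the fallback path simple because nothing has to be cleaned up.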

### Root Prefix Discovery

**Option A — Infer from convention (preferred):**
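The root prefix is expected to be `{repo}-{ref}/`. Both GitHub and GitLab follow this pattern for simple branch names, so inferring it avoids any extra HTTP calls. Refs that contain `/` or other special characters may be normalized differently in the archive; in that case the entry request fails and the full archive fallback applies (see Edge Cases).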

```python
root_prefix = f"{repo}-{ref}"
entry_url = f"{archive_url}!/{root_prefix}/{subdir_path}/{filename}"
```

**Option B — Discovery via partial download:**
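Download the first few bytes of the zip (for example with an HTTP `Range` request) and read the first local file header, whose entry name starts with the root prefix. Note that the zip central directory sits at the end of the archive, not the start, so reading it would require a ranged read of the tail instead. More robust than Option A but adds latency.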
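
A minimal sketch of Option B, assuming the leading bytes of the archive have already been fetched (whether Artifactory honors `Range` requests for these archives is an assumption to verify). It parses the fixed 30-byte ZIP local file header at offset 0 and returns the first path component of the first entry name. `root_prefix_from_zip_head` is a hypothetical helper name.

```python
import struct
from typing import Optional

# Fixed-size part of a ZIP local file header (30 bytes, little-endian).
_LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")


def root_prefix_from_zip_head(head: bytes) -> Optional[str]:
    """Return the first path component of the first entry, or None if unreadable."""
    if len(head) < _LOCAL_HEADER.size:
        return None
    fields = _LOCAL_HEADER.unpack_from(head)
    signature, flags, name_len = fields[0], fields[2], fields[9]
    if signature != b"PK\x03\x04":
        return None  # not a local file header
    start = _LOCAL_HEADER.size
    name_bytes = head[start:start + name_len]
    if len(name_bytes) < name_len:
        return None  # too few bytes were fetched to hold the entry name
    # General-purpose flag bit 11 marks UTF-8 names; otherwise the spec says CP437.
    encoding = "utf-8" if flags & 0x800 else "cp437"
    name = name_bytes.decode(encoding)
    return name.split("/", 1)[0] or None
```

A few hundred bytes should normally be enough, since only the header and the first entry name are needed.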

### File Listing Challenge

The archive entry API downloads individual files — it doesn't list directory contents. Options:

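1. **Fetch the full archive file list** via Artifactory's File List API:
   ```
   GET /api/storage/{repo-key}/{path}?list&deep=1
   ```
   This endpoint is documented for repository folders, so whether it can enumerate entries inside an archive needs to be confirmed before relying on it.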
2. **Fetch a manifest file first** (e.g., `apm.yml` or `SKILL.md`) to validate, then fall back to full archive for extraction.
3. **Hybrid approach:** Use archive entry download for known files, fall back to full archive only if needed.
4. **Accept full archive for subdirectory packages** but use entry download for **virtual file packages** (single `.prompt.md`, `.agent.md` files) — simplest and most common case.

## Recommended Phased Approach

### Phase 1: Virtual File Packages (Simplest)

`_download_file_from_artifactory()` currently downloads the full archive to extract one file. Replace that with a single entry download:

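```python
import requests


def _download_file_from_artifactory(self, host, prefix, owner, repo, file_path, ref, scheme="https"):
    """Fetch one file via archive entry download, falling back to the full archive."""
    archive_urls = build_artifactory_archive_url(host, prefix, owner, repo, ref, scheme=scheme)
    root_prefix = f"{repo}-{ref}"  # conventional archive root directory
    headers = self._get_artifactory_headers()

    for archive_url in archive_urls:
        entry_url = f"{archive_url}!/{root_prefix}/{file_path}"
        try:
            resp = self._resilient_get(entry_url, headers=headers)
        except requests.RequestException:
            continue  # network error: try the next candidate URL
        if resp.status_code == 200:
            return resp.content
        # Any other status (e.g. 404): try the next candidate URL

    # No candidate succeeded: fall back to the full archive download.
    # Assumes the current implementation is kept under this name and signature.
    return self._download_file_from_artifactory_full(
        host, prefix, owner, repo, file_path, ref, scheme=scheme
    )
```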

**Savings:** For a single `.prompt.md` file (~1KB), avoids downloading a multi-MB archive.

### Phase 2: Subdirectory Packages (More Complex)

1. Fetch the package manifest via entry download to validate (`apm.yml`, `SKILL.md`)
2. If subdirectory has few files, fetch each via entry download
3. For large subdirectories, fall back to full archive download

**Heuristic:** If the manifest lists fewer than N primitives (e.g., 20 files), use entry-level download. Otherwise full archive is more efficient.
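
The heuristic could be expressed as a small decision function. This is only a sketch: `plan_download` is a hypothetical name, the threshold of 20 is the example value from the text above, and how the file list is obtained from the manifest is left open.

```python
from typing import List, Sequence, Tuple


def plan_download(manifest_files: Sequence[str], max_entries: int = 20) -> Tuple[str, List[str]]:
    """Pick "entries" for small, known file lists and "archive" otherwise."""
    files = list(manifest_files)
    # An empty list means nothing is known about the subdirectory, so the
    # full archive is the only safe choice.
    if 0 < len(files) < max_entries:
        return "entries", files
    return "archive", []
```

Returning the chosen file list alongside the strategy lets the caller pass it straight to the entry-level download step.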

### Phase 3: Smart Caching

Cache archive metadata (file list, root prefix) so subsequent installs of different subdirectories from the same repo don't re-discover.
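
One possible shape for such a cache is an in-memory map keyed by archive URL. The class and field names below are illustrative assumptions, and persistence and invalidation are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ArchiveMetadata:
    """Facts discovered about one archive (hypothetical structure)."""
    root_prefix: str
    files: List[str] = field(default_factory=list)


class ArchiveMetadataCache:
    """In-memory cache keyed by archive URL."""

    def __init__(self) -> None:
        self._entries: Dict[str, ArchiveMetadata] = {}

    def get(self, archive_url: str) -> Optional[ArchiveMetadata]:
        return self._entries.get(archive_url)

    def put(self, archive_url: str, metadata: ArchiveMetadata) -> None:
        self._entries[archive_url] = metadata
```

Because a branch ref can move between installs, a cache like this is probably safest when scoped to a single install run or keyed by commit SHA.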

## Performance Impact

| Scenario | Current | Optimized |
|----------|---------|-----------|
| Single virtual file (`.prompt.md`) from 6MB repo | 6MB download + unzip | ~1KB download |
| Skill subdirectory (5 files) from 6MB repo | 6MB download + unzip | ~5 small downloads (~50KB total) |
| Large subdirectory (100+ files) | 6MB download + unzip | Full archive (same as current) |

## Edge Cases

| Case | Behavior |
|------|----------|
| Root prefix doesn't follow `{repo}-{ref}` convention | Fall back to full archive download |
| Entry download returns 404 (file not in archive) | Fall back to full archive download |
| Artifactory instance doesn't support archive entry API | Graceful degradation to full archive |
| Archive is a tag (not branch) | Root prefix uses tag name: `{repo}-{tag}/` |

## Testing

1. **Unit tests:** Mock Artifactory responses for entry download URL pattern
2. **Integration tests:** Verify against real Artifactory instance with both GitHub and GitLab remote repos
3. **Fallback tests:** Simulate entry download failure → verify full archive fallback works
4. **Root prefix tests:** Verify prefix construction for branches, tags, and commit SHAs
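
The fallback tests could be structured along these lines. The sketch uses `unittest.mock` against a simplified stand-in, `fetch_with_fallback`, which is a hypothetical function that mirrors the try-entry-then-fall-back logic and is not the real downloader method.

```python
from unittest.mock import Mock


def fetch_with_fallback(entry_urls, get, full_download):
    """Try each entry URL; call full_download() if none returns HTTP 200."""
    for url in entry_urls:
        resp = get(url)
        if resp.status_code == 200:
            return resp.content
    return full_download()


def test_falls_back_when_entry_missing():
    get = Mock(return_value=Mock(status_code=404, content=b""))
    full = Mock(return_value=b"from-full-archive")
    urls = ["https://example.invalid/a.zip!/repo-main/file.md"]
    assert fetch_with_fallback(urls, get, full) == b"from-full-archive"
    full.assert_called_once()


def test_uses_entry_when_available():
    get = Mock(return_value=Mock(status_code=200, content=b"entry-body"))
    full = Mock(return_value=b"from-full-archive")
    urls = ["https://example.invalid/a.zip!/repo-main/file.md"]
    assert fetch_with_fallback(urls, get, full) == b"entry-body"
    full.assert_not_called()
```

The same pattern would apply to the real method by patching `_resilient_get` and the full-archive path instead of passing callables in.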

## Dependencies

- Requires Artifactory server to support archive entry download (standard feature, not an add-on)
- No client-side library changes needed — uses standard HTTP GET
- Backward compatible — falls back to full archive download on any failure
