Description
Problem
During dereferencing, the RefSet holds every external file's full parsed ApiDOM tree
(ParseResultElement) in memory for the entire session. For large APIs with many external files
(e.g., DigitalOcean API with 1000+ files), this can consume 500 MB - 5 GB+ of memory.
Root cause
- `Reference.value` holds the entire parsed ApiDOM tree, not a fragment
- With `immutable: true` (the default), each file is stored twice: a mutable clone (`cloneDeep(parseResult)`) plus the immutable original
- No eviction policy — once parsed, a document stays in the RefSet until `refSet.clean()` at the very end
- No streaming — an entire file must be fully parsed before any JSON Pointer fragment can be extracted
- Recursive `$ref` resolution can pull in hundreds or thousands of transitive dependencies
Data flow
- Miss: `$ref` encountered → check RefSet → miss → fetch file → parse → `cloneDeep(parseResult)` → add to RefSet
- Hit: `$ref` encountered → check RefSet → hit → return cached `Reference`

After dereferencing completes, `refSet.clean()` releases everything at once.
Key files
- `packages/apidom-reference/src/ReferenceSet.ts` — RefSet container (no size limit, no eviction)
- `packages/apidom-reference/src/Reference.ts` — holds the full `ParseResultElement` in `value`
- `packages/apidom-reference/src/dereference/strategies/openapi-3-1/visitor.ts` — `toReference()` method (parse + clone + cache)
- `packages/apidom-reference/src/dereference/strategies/openapi-3-1/index.ts` — cleanup at the end
Existing infrastructure we can leverage
HTTP response cache (HTTPResolverAxios)
The HTTP resolver already has an in-memory cache for raw Buffer responses. If a parsed document
is evicted from RefSet and needed again:
- Re-fetch cost: zero (HTTP cache hit)
- Re-parse cost: CPU only (no network I/O)
- Memory trade-off: the `Buffer` (raw bytes, small) stays cached; the `ParseResultElement` (10-100x larger) gets evicted
This makes eviction-based strategies practical — re-parsing from cached buffers is cheap.
`consume: true` refractor option
Parsing referenced files can use `consume: true` internally to reduce peak memory during each file's refraction. This option is already available.
Remediation plan
Phase 1: Quick wins (low risk, high impact)
1.1 Use `consume: true` when parsing referenced files
In `toReference()`, pass `consume: true` to the parse/refract pipeline for external files, so that each file's refraction uses less peak memory.
1.2 Skip `cloneDeep` in mutable mode
When `immutable: false`, the mutable reference is still deep-cloned today. The parse result could be used directly, skipping the clone.
1.3 Add maxRefSetSize option
Allow users to cap how many parsed documents RefSet holds. When exceeded, evict
least-recently-used entries. Re-parse from HTTP cache or file system if needed again.
- Default: unlimited (backward compatible)
- Recommended for large APIs: `maxRefSetSize: 50` or similar
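A possible shape for the new option (hypothetical — neither `maxRefSetSize` nor this options path exists today):

```typescript
// Proposed, not yet implemented — option name and location are assumptions.
const options = {
  dereference: {
    maxRefSetSize: 50, // cap on parsed documents held; undefined = unlimited (current behavior)
  },
};
```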
Phase 2: LRU eviction on RefSet (medium risk, high impact)
2.1 LRU cache for RefSet
Replace the simple array in `ReferenceSet` with an LRU cache:
- Track access order
- When capacity is exceeded, evict the least-recently-used `Reference`
- Store eviction metadata (URI + depth) to know how to re-parse if needed
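A minimal sketch of such an LRU-capped RefSet, assuming a hypothetical `maxSize` constructor argument and simplified entry types:

```typescript
// Sketch of an LRU-capped RefSet; maxSize is a hypothetical option.
// Map iteration order is insertion order, so re-inserting on access keeps
// the least-recently-used entry at the front.
interface EvictionMeta {
  uri: string;
  depth: number; // resolution depth, so an evicted entry can be re-parsed correctly
}

class LRURefSet<V> {
  private entries = new Map<string, { value: V; depth: number }>();
  private evicted = new Map<string, EvictionMeta>();

  constructor(private maxSize: number) {}

  find(uri: string): V | undefined {
    const entry = this.entries.get(uri);
    if (entry === undefined) return undefined;
    this.entries.delete(uri); // refresh recency: move to the back
    this.entries.set(uri, entry);
    return entry.value;
  }

  add(uri: string, value: V, depth = 0): void {
    this.entries.delete(uri);
    this.entries.set(uri, { value, depth });
    if (this.entries.size > this.maxSize) {
      // evict the least-recently-used entry (front of insertion order)
      const [lruUri, lruEntry] = this.entries.entries().next().value as [
        string,
        { value: V; depth: number },
      ];
      this.entries.delete(lruUri);
      this.evicted.set(lruUri, { uri: lruUri, depth: lruEntry.depth });
    }
  }

  wasEvicted(uri: string): EvictionMeta | undefined {
    return this.evicted.get(uri);
  }
}
```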
2.2 Lazy re-parse on cache miss
When an evicted reference is needed again:
- Check HTTP cache for raw buffer → re-parse
- If not in HTTP cache, re-fetch from network/filesystem → parse
- Re-add to RefSet (may evict another entry)
Cost model:
- HTTP cache hit: ~10-100ms (parse only)
- HTTP cache miss: ~100-1000ms (fetch + parse)
- Memory saved per eviction: ~1-50 MB per document
Phase 3: Fragment-only retention (medium risk, highest impact)
3.1 After extracting JSON Pointer fragment, release full document
The dereference visitor uses JSON Pointer to extract a specific fragment from the parsed
document. After extraction:
- Keep only the fragment in memory
- Release the full `ParseResultElement`
- If another `$ref` points to a different fragment in the same file, re-parse from cache
This is the most aggressive optimization — reduces per-file memory from "entire document" to
"just the referenced fragment."
3.2 Fragment cache per URI
Cache at the fragment level: `Map<string, Map<string, Element>>`, where the outer key is the URI and the inner key is the JSON Pointer.
Phase 4: Immutable mode optimization (low risk)
4.1 Copy-on-write instead of upfront cloneDeep
Currently, `immutable: true` deep-clones every referenced document upfront. Instead:
- Store only the immutable original
- Clone lazily when the mutable version is actually modified
- Many referenced documents are never modified — saves the clone entirely
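A minimal copy-on-write sketch under these assumptions (the class is hypothetical, and a JSON round-trip stands in for `cloneDeep` — real ApiDOM elements would need a structural clone):

```typescript
// Sketch of copy-on-write for immutable mode (proposed, not current apidom
// behavior). The deep clone is deferred until the first actual write.
class CowReference<T extends object> {
  private mutable: T | undefined;

  constructor(private readonly original: T) {}

  // Reads never pay for a clone — most referenced documents are never modified.
  read(): T {
    return this.mutable ?? this.original;
  }

  // The deep clone happens lazily, on the first write only.
  write(mutate: (draft: T) => void): T {
    if (this.mutable === undefined) {
      this.mutable = JSON.parse(JSON.stringify(this.original)) as T; // stand-in for cloneDeep
    }
    mutate(this.mutable);
    return this.mutable;
  }

  get cloned(): boolean {
    return this.mutable !== undefined;
  }
}
```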
4.2 Structural sharing
For documents that are mostly read-only with small modifications, use structural sharing (clone
only the modified path, share the rest).
Estimated impact
| Phase | Memory reduction | Complexity | Risk |
|---|---|---|---|
| Phase 1 (consume + maxRefSetSize) | 20-40% | Low | Low |
| Phase 2 (LRU eviction) | 50-70% | Medium | Medium |
| Phase 3 (fragment-only) | 80-90% | High | Medium |
| Phase 4 (copy-on-write) | 50% of immutable overhead | Medium | Low |
For a 1000-file API (estimated):
| Configuration | Memory |
|---|---|
| Current (no limit, immutable) | 2-5 GB |
| + Phase 1 (consume + cap at 50) | 500 MB - 1 GB |
| + Phase 2 (LRU, 50 entries) | 200-500 MB |
| + Phase 3 (fragments only) | 50-200 MB |