Problem
Glimmer v0.3 is a post-hoc documentation ledger: scripts run, then emit nodes (if glimmer:). Nodes record what + content-hash (derivative.output-path/output-hash/provenance-mode) but delegate where/how-many-copies to datalad/git-annex — so storage state is invisible in the research object. A multi-day compute-expensive SST once sat only in /tmp, discoverable only by asking. An output you can't locate / that has one copy is not reproducible, so durability belongs inside the RO model.
Proposed (v0.3.1, additive)
- Typed storage fields on derivative + dataset nodes: stored-at [{remote,uuid}], copies, numcopies-required, verified, storage-class (irreplaceable|generated).
- Archive-on-emit: GlimmerGraph.derivative()/.dataset() annex-add + copy --to durable remotes + record whereis. Creating the node = durably storing it. Refuse /tmp/outside-annex paths.
- Enforcement via config+hooks (not scripts): annex.numcopies in .gitattributes (3 irreplaceable / 2 generated); pre-push hook +
glimmer validate CI; teardown gates.
- First-class reporting: generic
glimmer report over the graph + live git annex whereis — storage never hidden.
Commit-tracking bares
Durable copy := a --bare git+annex tracking full commit history. Object-only stores (sftp/rsync boxes that can't run git) are adjuncts, not commit-tracking copies. Tiers: GitHub (git+pointers) + cheap object store (adjunct) + >=2 commit-tracking bares.
Multi-tenant self-provisioning
Glimmer should let users provision their own tenants + bares at dataset init. ads-glimmer is the first consumer and will migrate onto this once it lands.
Reference
Full design: ads-glimmer/docs/data/INFORMATION-ARCHITECTURE.md
Problem
Glimmer v0.3 is a post-hoc documentation ledger: scripts run, then emit nodes (
if glimmer:). Nodes record what + content-hash (derivative.output-path/output-hash/provenance-mode) but delegate where/how-many-copies to datalad/git-annex — so storage state is invisible in the research object. A multi-day compute-expensive SST once sat only in /tmp, discoverable only by asking. An output you can't locate / that has one copy is not reproducible, so durability belongs inside the RO model.Proposed (v0.3.1, additive)
glimmer validateCI; teardown gates.glimmer reportover the graph + livegit annex whereis— storage never hidden.Commit-tracking bares
Durable copy := a --bare git+annex tracking full commit history. Object-only stores (sftp/rsync boxes that can't run git) are adjuncts, not commit-tracking copies. Tiers: GitHub (git+pointers) + cheap object store (adjunct) + >=2 commit-tracking bares.
Multi-tenant self-provisioning
Glimmer should let users provision their own tenants + bares at dataset init. ads-glimmer is the first consumer and will migrate onto this once it lands.
Reference
Full design: ads-glimmer/docs/data/INFORMATION-ARCHITECTURE.md