This document tracks substantial topics identified for future discussion and potential implementation. Topics are organized by implementation timeline to provide clear guidance for development planning.
These topics must be completed for v1 release. They represent essential functionality gaps that prevent production readiness.
Priority: Critical for v1 Release
Current Gap: Framework focuses on basic property-level CRDTs but requires additional algorithms and RDF structure support for production readiness.
CRDT Algorithms to Evaluate for v1:
- Counter algorithms (G-Counter, PN-Counter): For numeric aggregation and collaborative counting use cases
- Sequence algorithms (RGA, Fractional Indexing): For ordered collections and collaborative text editing
- Advanced set and map variants (LWW-Map, OR-Map): For specialized dictionary use cases
- Multi-Value Registers (MV-Register): For preserving concurrent writes
- Trees: Hierarchical data structures (taxonomies, organizational charts)
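To make the counter evaluation concrete, here is a minimal state-based PN-Counter sketch. It is illustrative only and not part of the framework: the class name, the use of plain replica id strings (standing in for installation identifiers), and the in-memory representation are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class PNCounter:
    """State-based PN-Counter: two grow-only maps keyed by replica id."""
    increments: dict = field(default_factory=dict)
    decrements: dict = field(default_factory=dict)

    def increment(self, replica: str, amount: int = 1) -> None:
        self.increments[replica] = self.increments.get(replica, 0) + amount

    def decrement(self, replica: str, amount: int = 1) -> None:
        self.decrements[replica] = self.decrements.get(replica, 0) + amount

    def value(self) -> int:
        return sum(self.increments.values()) - sum(self.decrements.values())

    def merge(self, other: "PNCounter") -> "PNCounter":
        # Taking the per-replica maximum makes merge commutative,
        # associative, and idempotent.
        def pointwise_max(a: dict, b: dict) -> dict:
            return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

        return PNCounter(
            pointwise_max(self.increments, other.increments),
            pointwise_max(self.decrements, other.decrements),
        )
```

Each replica only ever raises its own entries, so merging in any order converges to the same value. How such per-replica state would be represented in RDF is exactly the open question for v1.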
RDF Structures to Address Before v1:
- rdf:List: Position-based vs content-based conflict resolution for ordered lists
- rdf:Seq/rdf:Bag/rdf:Alt: Merge semantics for RDF container types
- Complex Blank Node Graphs: Interdependent blank nodes (building on solved context-based identification)
- Property Paths: Multi-hop relationships and their CRDT implications
- Reification Chains: Nested reified statements and metadata
Architecture Assessment: Current infrastructure (Hybrid Logical Clocks, blank node identification via context, merge contracts) should accommodate most extensions. Sequence algorithms may require positional metadata extensions.
Implementation Strategy: These extensions should integrate into the existing framework without fundamental architectural changes.
Priority: Critical for v1 Release
Core Issue: What happens when different versions of the framework try to collaborate in the same Pod?
Key Decisions Needed:
- Installation Version Declaration: How installations declare their framework version for compatibility checking
- Version Mismatch Handling: Graceful fallbacks when incompatible versions encounter each other
- Migration Path: Basic strategy for evolving formats (RDF reification, index structures, merge contracts) without breaking existing collaborations
v1 Scope: Define minimum viable compatibility strategy - not full migration automation, but clear policies for version conflicts and a path forward for format evolution.
Implementation Strategy: Build on existing installation document infrastructure to add version metadata and basic compatibility checks.
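As a starting point for discussion, the sketch below shows one possible compatibility policy. Everything in it is an assumption rather than a decision: that installations declare a `major.minor.patch` version, that a different major version means incompatible, and that a newer remote minor version leads to a read-only fallback.

```python
from enum import Enum


class Compatibility(Enum):
    COMPATIBLE = "compatible"
    READ_ONLY = "read-only"        # remote is newer: avoid writing formats we may not understand
    INCOMPATIBLE = "incompatible"


def parse_version(text: str) -> tuple:
    """Parse a 'major.minor.patch' string into a tuple of ints."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def check_compatibility(local: str, remote: str) -> Compatibility:
    """Hypothetical policy: compare the local version with one found in the Pod."""
    local_major, local_minor, _ = parse_version(local)
    remote_major, remote_minor, _ = parse_version(remote)
    if local_major != remote_major:
        return Compatibility.INCOMPATIBLE
    if remote_minor > local_minor:
        return Compatibility.READ_ONLY
    return Compatibility.COMPATIBLE
```

For example, `check_compatibility("1.2.0", "1.3.0")` yields `Compatibility.READ_ONLY` under this policy. Whether a read-only fallback is acceptable for an offline-first framework is one of the decisions still needed.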
Specific Stability Case: What happens when an application depends on a merge contract document hosted on the internet, and that document gets corrupted or updated incorrectly?
Current Behavior (as of v1):
- Merge contracts are loaded from IRIs referenced in `sync:isGovernedBy`
- Validation errors (via `ValidationResult` class) are thrown during merge contract loading
- Applications fail to start if referenced merge contracts are invalid
- Examples of validation errors that cause failure:
  - Class mapping missing `mc:appliesToClass`
  - Predicate rule missing `mc:predicate`
  - Unknown CRDT algorithm types
  - Cyclic imports in merge contract documents
Problem Scenarios:
- Corrupted Update: A previously valid merge contract at `https://w3id.org/solid-crdt-sync/mappings/recipe-v1` gets corrupted (e.g., `mc:appliesToClass` accidentally removed)
- Breaking Change: Contract author pushes breaking changes without version bump
- Network Issues: Intermittent failures fetching remote contracts
- Availability Problems: External contract server goes down
Impact:
- Applications that previously worked suddenly break
- Users cannot sync or access their data
- No clear recovery path for end users
- Developers cannot fix the problem (external contract)
Potential Solutions to Explore:
Option 1: Contract Caching and Fallback
- Cache validated contracts locally
- On validation failure, use cached version with warning
- Allow applications to continue operating with last-known-good contract
- Trade-off: May miss intentional updates
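A minimal sketch of the last-known-good fallback, assuming hypothetical `fetch` and `validate` callables (where `validate` returns a list of error strings) and a plain dict as the cache. None of these names come from the framework.

```python
class ContractUnavailable(Exception):
    """Raised when a contract is invalid or unreachable and no cached copy exists."""


def load_contract(iri, fetch, validate, cache):
    """Return (contract, used_fallback) for the given contract IRI."""
    try:
        contract = fetch(iri)
        errors = validate(contract)
    except Exception as exc:  # network or parse failure; a real loader would be narrower
        contract = None
        errors = [f"fetch failed: {exc}"]

    if not errors:
        cache[iri] = contract          # remember the last-known-good version
        return contract, False
    if iri in cache:
        return cache[iri], True        # caller should surface a warning
    raise ContractUnavailable(f"{iri}: {'; '.join(errors)}")
```

The `used_fallback` flag lets the application warn the user or developer instead of silently running on a stale contract, which addresses the trade-off above only partially: an intentional update that fails validation is still missed.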
Option 2: Contract Versioning Requirements
- Enforce versioned contract IRIs (e.g., `mappings/recipe-v1.0.0` instead of `mappings/recipe-v1`)
- Immutable contracts: breaking changes require new version
- Applications explicitly upgrade to new versions
- Trade-off: More complex version management
Option 3: Contract Pinning
- Applications can pin to specific contract hashes/snapshots
- Optional "latest" mode for development/testing
- Trade-off: Applications may miss bug fixes
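Pinning could be as simple as comparing a digest of the fetched contract with a value recorded by the application. The sketch below assumes SHA-256 over the raw response bytes and treats `None` as the "latest" mode; both choices and the function names are illustrative.

```python
import hashlib
from typing import Optional


def sha256_hex(content: bytes) -> str:
    """Hex digest of the raw contract bytes."""
    return hashlib.sha256(content).hexdigest()


def verify_pinned(content: bytes, pinned_digest: Optional[str]) -> bool:
    """Accept the contract if no pin is set ('latest' mode) or the digest matches."""
    if pinned_digest is None:
        return True
    return sha256_hex(content) == pinned_digest
```

Hashing raw bytes means that a purely cosmetic change to the served document (whitespace, prefix order) also breaks the pin. Hashing a canonical form of the RDF graph would avoid this at the cost of extra processing.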
Option 4: Graceful Degradation
- Parse contracts in "best-effort" mode
- Skip invalid class mappings but continue with valid ones
- Log validation warnings without blocking application startup
- Trade-off: May allow semantically incorrect merges
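Best-effort parsing might look like the following sketch, which assumes class mappings have already been parsed into dicts keyed by predicate name (a simplification of the real RDF structures) and checks only the `mc:appliesToClass` requirement.

```python
def partition_mappings(mappings):
    """Split class mappings into usable ones and warnings for the skipped ones."""
    valid, warnings = [], []
    for index, mapping in enumerate(mappings):
        if "mc:appliesToClass" not in mapping:
            warnings.append(f"mapping {index}: Class mapping missing appliesToClass")
            continue  # skip the invalid mapping but keep processing the rest
        valid.append(mapping)
    return valid, warnings
```

Resources whose class mapping was skipped would then have no merge rules at all, which is where the risk of semantically incorrect merges comes from.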
Option 5: Local Contract Override
- Applications can bundle backup contracts
- On fetch/validation failure, fall back to bundled version
- Trade-off: Duplication, potential staleness
Decision Deferred: This requires deeper analysis of real-world failure modes and user impact. Key questions:
- How often do remote contracts actually change?
- What's the acceptable failure rate for external dependencies?
- Should framework provide default strategies or let applications decide?
- How to balance safety with flexibility?
Related ValidationResult Class:
The ValidationResult class (in packages/locorda_core/lib/locorda_core.dart) provides structured error reporting for merge contract validation, including:
- Hierarchical error context (which document, which mapping, which rule)
- Detailed error messages with helpful suggestions
- Warnings vs. errors distinction
- Ability to accumulate multiple validation issues
This infrastructure is critical for any stability strategy - it provides the diagnostic information needed to:
- Detect when contracts are invalid
- Communicate problems clearly to developers
- Make informed decisions about fallback strategies
- Log detailed debugging information
Example Validation Errors:
- "Class mapping missing appliesToClass" - indicates malformed mapping that cannot be processed
- "Unknown CRDT type {IRI}" - references non-existent algorithm
- "Cyclic import in merge contract" - would cause infinite recursion
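The cyclic import check can be illustrated with a depth-first search. This sketch assumes the import relationships have already been collected into a dict from contract IRI to imported IRIs; the function name and data shape are hypothetical and do not reflect the actual validator.

```python
def find_import_cycle(imports, start):
    """Return the cycle as a list of IRIs (first and last equal), or None."""
    visiting = []   # current DFS path
    done = set()    # IRIs fully explored without finding a cycle

    def visit(iri):
        if iri in done:
            return None
        if iri in visiting:
            # Found a back edge: report the path from the first occurrence.
            return visiting[visiting.index(iri):] + [iri]
        visiting.append(iri)
        for child in imports.get(iri, []):
            cycle = visit(child)
            if cycle:
                return cycle
        visiting.pop()
        done.add(iri)
        return None

    return visit(start)
```

Returning the full path rather than a boolean supports the hierarchical error context described above, since the message can name every document involved in the cycle.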
v1 Scope: Document current validation behavior and failure modes. Implement basic caching to improve resilience. Defer comprehensive stability strategy to v2+ when we have production experience with contract evolution patterns.
Specific Compatibility Case: Evolution from MD5 to alternative hash algorithms for identified blank node canonical fragments.
Current Design (see proposed-changes/005-identified-blank-node-canonical-iri.md):
- Fragment format: `#lcrd-ibn-md5-{hash}`
- Algorithm identifier explicitly included for self-describing documents
- MD5 considered sufficient against accidental collisions in the current domain (it does not resist deliberately crafted collisions)
Future Evolution Scenario: If SHA-256 or other algorithms become necessary (stronger cryptographic properties, regulatory requirements, etc.):
Multi-Algorithm Coexistence Strategy:
- Single blank node, multiple mappings: Same blank node can have fragment mappings for multiple hash algorithms

      <doc> sync:hasBlankNodeMapping <doc#lcrd-ibn-md5-abc123...> , <doc#lcrd-ibn-sha256-def456...> .
      <doc#lcrd-ibn-md5-abc123...> sync:blankNode _:ingredient1 .
      <doc#lcrd-ibn-sha256-def456...> sync:blankNode _:ingredient1 .

- Matching semantics: During merge, if any fragment identifier matches, blank nodes are considered identical
  - Old installation recognizes only MD5, matches via `#lcrd-ibn-md5-abc123...`
  - New installation recognizes both, can match via either fragment
  - OR-Set semantics for `sync:hasBlankNodeMapping` preserves all mappings

- Transition phases:
  - Phase 1: All installations use MD5 only
  - Phase 2: New installations generate both MD5 + SHA-256, old installations continue working (match via MD5)
  - Phase 3: When all active installations upgraded, can deprecate MD5 generation
  - Phase 4: Eventually remove MD5 support entirely
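The "any fragment matches" rule can be sketched as a set intersection. The canonical input string is defined in the proposed change referenced above; here it is treated as an opaque string, and the helper names are assumptions.

```python
import hashlib

# Hash functions an installation supports, keyed by the name used in the fragment.
SUPPORTED = {"md5": hashlib.md5, "sha256": hashlib.sha256}


def fragments_for(canonical: str, algorithms) -> set:
    """Build lcrd-ibn-{algorithm}-{hash} fragments for one blank node."""
    data = canonical.encode("utf-8")
    return {
        f"lcrd-ibn-{name}-{SUPPORTED[name](data).hexdigest()}"
        for name in algorithms
    }


def same_blank_node(local_fragments, remote_fragments) -> bool:
    """Blank nodes are considered identical if any fragment identifier matches."""
    return not set(local_fragments).isdisjoint(remote_fragments)
```

An old installation calling `fragments_for(canonical, ["md5"])` and a new one calling it with `["md5", "sha256"]` still share the MD5 fragment, which is what keeps Phase 2 working. Once MD5 generation stops in Phase 3, that overlap disappears for newly created blank nodes.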
Benefits:
- ✅ No coordination required between installations during transition
- ✅ Graceful forward/backward compatibility
- ✅ No special migration logic needed
- ✅ Documents remain self-describing throughout evolution
Trade-offs:
- ⚠️ Storage overhead: duplicate mappings during transition period
- ⚠️ Computation overhead: new installations must compute multiple hashes during compatibility mode
Implementation Decisions Deferred to v2+:
- Specific triggers for algorithm migration (security requirements, performance needs)
- Compatibility mode detection and configuration
- Deprecation timeline for old algorithms
- Multiple algorithm generation policies
Priority: Critical for v1 Release (RESOLVED)
Status: Successfully implemented W3ID.org permanent identifier service integration.
Implemented Solution: W3ID.org Permanent Identifier Service
- Final IRIs: `w3id.org/solid-crdt-sync/vocab/` and `w3id.org/solid-crdt-sync/mappings/`
- Benefits Realized: Permanent identifiers with no maintenance burden, academic backing, designed specifically for this use case
- External Dependency: Managed through W3ID.org redirect service
Completed Implementation:
- ✅ Vocabulary IRIs: Permanent identifiers established for `crdt:`, `algo:`, `sync:`, `idx:`, and `mc:` vocabularies
- ✅ Mapping IRIs: Stable base for merge contract mappings (`core-v1.ttl`, etc.)
- ✅ Migration Completed: All RDF files, examples, and generated code updated to use W3ID.org IRIs
- ✅ Documentation Updated: Examples and specifications reflect final IRI decisions
Related: All vocabulary files in vocabularies/ directory and mapping files in mappings/ directory now use permanent W3ID.org identifiers.
These topics represent interesting research directions and framework improvements to explore after v1 is completed. Priority and timeline will be determined based on practical needs and research outcomes.
Status: Future Research (v2+)
Current Limitation: Framework provides offline-first functionality and user-controlled storage but lacks end-to-end encryption (E2EE), which is essential for true local-first privacy guarantees.
Technical Challenge: Implementing E2EE to achieve true local-first guarantees while maintaining RDF semantic interoperability and CRDT merge capabilities presents several design challenges:
- RDF Query Compatibility: Encrypted RDF cannot be semantically queried or reasoned over
- CRDT Merge Operations: Conflict resolution requires access to plaintext data structures
- Index Generation: Sharded indices require plaintext access for semantic grouping and performance optimization
- Cross-Application Interoperability: Encrypted data cannot be shared between applications without key sharing
Potential Approaches:
- Hybrid Architecture: Store encrypted application data with plaintext metadata for indexing and CRDT operations
- Client-Side Decryption: Decrypt data locally for CRDT operations, re-encrypt for storage
- Homomorphic Operations: Limited CRDT algorithms that support encrypted operations (research area)
- Layered Encryption: Different encryption levels for different data sensitivity levels
Related Work:
- ANUSII approaches to E2EE RDF data
- Academic research on encrypted CRDT operations
- Solid OIDC integration for key management
Architecture Impact: E2EE support would require significant extensions to the current 4-layer architecture, particularly affecting the indexing layer and merge contract semantics.
Status: Future Research (v2+)
Current Limitation: Framework focuses exclusively on RDF data synchronization but doesn't address binary files (images, documents, media) that applications often need to store and sync alongside structured data.
Use Case Scenarios:
- Photo Management App: Store image metadata as RDF while managing binary image files
- Document Collaboration: Sync PDF/Word documents with RDF annotations and version metadata
- Media Applications: Manage audio/video files with RDF-based playlists and metadata
Technical Challenges:
- Binary File Versioning: CRDTs work with structured data; binary files need different conflict resolution strategies
- Storage Efficiency: Large binary files require different sync strategies than small RDF documents
- Reference Integrity: Maintaining consistency between RDF references and binary file availability
- Bandwidth Management: Selective sync for large files vs always-sync for RDF metadata
Potential Approaches:
- Content-Addressed Storage: Use hash-based addressing for immutable binary files with RDF metadata
- Layered Sync Strategy: Fast RDF sync with optional binary file sync based on application needs
- External Storage Integration: RDF references to files stored in specialized binary storage services
- Version-Controlled Files: Git-LFS-like approach with RDF-tracked versions and metadata
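As an illustration of the content-addressed approach, the sketch below derives a key from the file bytes and stores each blob once. The `sha256-` key prefix, the in-memory store, and all names are assumptions; a real implementation would write to Pod or external storage and record the key in RDF metadata.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Derive an immutable key from the file content."""
    return "sha256-" + hashlib.sha256(data).hexdigest()


class BlobStore:
    """In-memory stand-in for a binary storage backend."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = content_address(data)
        self._blobs.setdefault(key, data)   # identical content is stored once
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]
        if content_address(data) != key:
            raise ValueError("stored blob does not match its address")
        return data
```

Because a key never refers to different content, the binary file itself needs no conflict resolution. An edit produces a new key, and only the RDF property pointing at it has to be merged.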
Architecture Considerations:
- Binary files likely need separate storage patterns from RDF sharding strategies
- Application-level policies for bandwidth and storage management
- Integration with existing file storage APIs (cloud storage, CDNs)
Related Standards:
- Linked Data Platform (LDP) binary resource handling
- IPFS content-addressing approaches
- Solid Protocol non-RDF resource management patterns
Status: Future Research
Current Limitation: Framework focuses on single-Pod CRDT synchronization but doesn't address applications that need to integrate data from multiple Pods, including Pods not owned by the user.
Use Case Scenario: Alice's Recipe Manager app wants to display:
- Alice's personal recipes from `https://alice.pod/data/recipes/`
- Bob's shared recipes from `https://bob.pod/data/recipes/`
- Carol's family recipes from `https://carol.pod/data/family-recipes/`
- Community recipes from `https://community-pod.org/recipes/`
Technical Integration Challenges:
Discovery and Connection Management:
- How do applications discover relevant Pods containing related data?
- Managing authentication/authorization across multiple independent Pods
- Handling different availability and connectivity states per Pod
- Coordinating sync processes across multiple concurrent Pod connections
Resource Identity and Semantic Relationships:
- IRIs are globally unique (no conflicts), but semantic relationships are complex
- Cross-Pod resource references: `owl:sameAs`, `schema:isVariantOf`, custom relationships
- Determining when resources from different Pods represent the same conceptual entity
- Handling conflicting semantic assertions across Pods (Alice says X, Bob says Y about same topic)
Index and Query Coordination:
- Should applications create separate indices per Pod or attempt federation?
- Cross-Pod search and discovery: querying multiple Pod indices efficiently
- Handling different indexing patterns and schema versions across Pods
- Performance implications of distributed query execution
Synchronization Architecture:
- Managing multiple independent sync processes without interference
- Batch operations and consistency across Pod boundaries
- Handling partial failures when some Pods are unavailable
- Cache coordination and invalidation across multiple data sources
User Experience Challenges:
- Presenting unified views of distributed data with clear source attribution
- Handling permission differences across Pods in consistent UI
- Conflict resolution when related data from different Pods disagrees
- Offline/online state management for multiple connection states
Schema Evolution Across Pods:
- Different Pods may use different framework versions or merge contracts
- Handling schema compatibility in federated scenarios
- Migration coordination when not all Pods upgrade simultaneously
- Graceful degradation when encountering incompatible schemas
Application Architecture Patterns:
Federated Query Pattern:
- Applications maintain separate sync state per Pod
- Cross-Pod queries executed as distributed operations
- UI aggregates results with clear source provenance
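A rough sketch of the federated query pattern, assuming each Pod is represented by a hypothetical callable that runs the query against that Pod's local sync state or index and returns a list of results. It tags every result with its source and tolerates unavailable Pods.

```python
def federated_query(sources, query):
    """Run a query against every Pod; return (results, failures)."""
    results = []
    failures = {}
    for pod_url, run in sources.items():
        try:
            for item in run(query):
                # Keep provenance so the UI can attribute each result.
                results.append({"source": pod_url, "item": item})
        except Exception as exc:  # a real implementation would catch narrower errors
            failures[pod_url] = str(exc)
    return results, failures
```

Returning failures alongside results lets the UI show partial data with a note such as "Bob's Pod is unavailable" rather than failing the whole view.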
Local Integration Pattern:
- Applications sync data from multiple Pods into unified local store
- Semantic relationship resolution happens locally
- Trade-offs between storage overhead and query performance
Hybrid Pattern:
- Critical data synced locally, secondary data queried on-demand
- Application-specific policies for what data to integrate vs. reference
Open Design Questions:
- Should the framework provide multi-Pod orchestration primitives?
- How to handle semantic conflicts across Pod boundaries?
- What's the role of the framework vs. application-specific integration logic?
- Should there be standard vocabularies for cross-Pod relationships?
- How to balance performance, consistency, and user experience?
Implementation Scope: This represents a major expansion beyond single-Pod CRDT synchronization into distributed application orchestration, semantic web integration, and multi-source data management. Likely requires significant framework extensions and new architectural patterns.
Related: Builds on all current framework concepts but extends them into distributed, multi-authority scenarios that go beyond the current single-Pod collaborative model.
Status: Future Research
Current Approach: Uses RDF Reification for semantic correctness but with significant overhead.
Alternative Approaches:
- Custom Compact Format: Define framework-specific tombstone representation
Trade-offs to Analyze:
- Semantic correctness vs storage efficiency
- Interoperability vs performance
- Standard RDF tooling compatibility vs custom processing requirements
Related: Current RDF Reification approach in CRDT-SPECIFICATION.md sections 3.2, 3.3
Status: Future Research
Problem: Framework tracks basic causality through Hybrid Logical Clocks but doesn't provide rich provenance information for auditing and compliance needs.
Core Questions:
- Provenance granularity: Should tracking focus on installations, users, processes, or individual operations?
- Storage trade-offs: How much provenance overhead is acceptable for different use cases?
- Privacy considerations: How to balance audit requirements with user privacy?
- Integration approach: Should provenance extend existing causality tracking or operate separately?
Use Cases to Consider:
- Compliance auditing requiring detailed change history
- Debugging collaborative workflows and conflict resolution
- Business process analysis and optimization
- Trust and transparency in multi-party collaboration
Design Considerations:
- Different provenance standards (PROV-O, custom vocabularies) have different capabilities
- Provenance information may need different retention policies than data
- Cross-installation provenance requires coordination and trust mechanisms
- Integration with existing Hybrid Logical Clock system for consistency
Related: Hybrid Logical Clock mechanics in ARCHITECTURE.md Section 5.2.3 and CRDT-SPECIFICATION.md.
Status: Future Research
Problem: Framework requires new data to be CRDT-managed from creation, but many users have existing Solid data.
Core Challenge: How to bring existing traditional Solid data into framework management without breaking existing workflows or data integrity?
Key Design Questions:
- Discovery approach: How to identify existing resources suitable for import?
- Data preservation: Should imports create copies, wrappers, or migrate in-place?
- Relationship handling: How to maintain references between imported and existing resources?
- User control: What level of granular selection and rollback should be provided?
- Schema compatibility: How to handle traditional RDF that doesn't map cleanly to CRDT semantics?
Potential Approaches:
- Wrapper strategy: Create managed documents that reference originals
- Migration strategy: Convert traditional resources to CRDT format
- Hybrid strategy: Copy frequently-edited data, reference read-only data
Related: Integration with Type Index discovery patterns in ARCHITECTURE.md sections 4.4 and 6.1.
Status: Future Research
Problem: Framework assumes access control is handled externally, but production systems may need proactive permission checking to improve user experience.
Core Questions:
- Integration depth: Should the framework check permissions before attempting operations, or handle failures gracefully?
- Permission discovery: How can applications efficiently determine what resources are accessible?
- Sync behavior: When encountering access restrictions, should sync skip resources, fail entirely, or provide partial results?
- Performance trade-offs: What's the cost of permission checking vs. handling failed operations?
Design Considerations:
- Multiple access control systems (WAC, ACP, custom) may need support
- Caching strategies for permission information to minimize overhead
- Graceful degradation when permissions change during sync operations
- Integration with existing error handling and retry mechanisms
Related: Error handling patterns in ERROR-HANDLING.md and sync workflow in ARCHITECTURE.md Chapter 7.
Status: Future Research
Problem: Framework performs CRDT merge operations without semantic validation, potentially allowing invalid data states.
Core Questions:
- Validation timing: Should validation happen before merge, after merge, or both?
- Failure handling: When validation fails, should operations be blocked, flagged, or logged?
- Validation scope: Should validation apply to individual properties, complete resources, or cross-resource relationships?
- Performance impact: How to balance data quality with sync performance?
Design Considerations:
- Different validation technologies (SHACL, custom rules, application logic) have different trade-offs
- Validation conflicts between installations may require consensus mechanisms
- Schema evolution must account for validation rule changes over time
- Integration with merge contract system for consistent validation policies
Open Approaches:
- Pre-merge validation to prevent invalid merges
- Post-merge validation with rollback capabilities
- Progressive validation during resource access
- Hybrid approaches with different strictness levels
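Post-merge validation with rollback can be sketched as follows, assuming hypothetical `merge` and `validate` callables, where `validate` returns a list of problem descriptions.

```python
def merge_with_validation(local, remote, merge, validate):
    """Return (state, problems); keep the pre-merge state if validation fails."""
    candidate = merge(local, remote)
    problems = validate(candidate)
    if problems:
        return local, problems      # rollback: the merged state is discarded
    return candidate, []
```

Note that rejecting a merge locally means this installation no longer converges with installations that accepted it, which is why validation conflicts between installations may require the consensus mechanisms mentioned above. Flagging or logging instead of blocking avoids that divergence.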
Related: Merge contract fundamentals in ARCHITECTURE.md Section 5.2 and error handling in ERROR-HANDLING.md.
Status: Future Research (Low Priority)
Current Approach: Framework uses only the Public Type Index, making all CRDT-managed resources discoverable by other applications.
Core Questions:
- Policy decisions: Should applications default to public or private Type Index registration for CRDT resources?
- User control: How should users control which resource types are registered privately vs. publicly?
- Setup UX: Should apps suggest privacy settings during setup, or should this be a user-driven decision?
- Discovery implications: How do applications handle mixed public/private resource scenarios?
Use Cases:
- Personal data not intended for collaboration (private journals, notes)
- Development/testing resources separate from production data
- Business data requiring access control but still needing CRDT capabilities
Design Considerations:
- Solid provides both Public and Private Type Index documents with different access controls
- Registration in Private Type Index requires the application to have privileged access
- Migration between public/private registration may be needed as resource usage evolves
- Mixed visibility scenarios require clear policies for collaboration boundaries
Related: Current Public Type Index usage in ARCHITECTURE.md section 4.2
v1 Critical Requirements (2 remaining topics):
- Extended CRDT Algorithm and RDF Structure Support
- Framework Version Compatibility Strategy
- Permanent IRI Strategy ✅ COMPLETED
v2+ Future Research (9 topics):
- Local-First Upgrade: End-to-End Encryption Support
- Non-RDF Binary File Support
- Multi-Pod Application Integration
- Custom Tombstone Format Optimization
- Provenance and Audit Trail Support
- Legacy Data Import
- Proactive Access Control Integration
- Data Validation Integration
- Private Type Index Support
When identifying new topics:
- Clearly describe the current limitation or opportunity
- Outline potential approaches or solutions
- Identify trade-offs and risks that need discussion
- Reference related sections in existing specifications
- Assign appropriate version target (v1/v2/v3+)
Topics graduate to active development when:
- Problem scope is well-defined
- Solution approaches are compared
- Implementation plan is developed
- Backwards compatibility is addressed
- Version timeline is confirmed