Implement consistent data protection encryption across all user data collections #4271

@smian1

Problem

Data protection encryption is inconsistently implemented across backend collections. Only 3 out of 8 collections containing sensitive user data have encryption support when users enable enhanced data protection.

This creates a security gap where some user data is protected while other equally sensitive data is stored in plaintext.

Current State

✅ Collections WITH Data Protection

| Collection | File | Encrypted Fields | Implementation |
|---|---|---|---|
| Memories | backend/database/memories.py | content | ✅ Full decorators + helpers |
| Conversations | backend/database/conversations.py | transcript_segments | ✅ Full decorators + helpers |
| Chat Messages | backend/database/chat.py | text | ✅ Full decorators + helpers |

❌ Collections MISSING Data Protection

| Collection | File | Sensitive Data | Impact |
|---|---|---|---|
| Action Items | backend/database/action_items.py | Task descriptions, due dates, completion status | ❌ High - personal tasks exposed |
| Daily Summaries | backend/database/daily_summaries.py | Headline, overview, highlights, people mentioned, memorable moments, sentiment analysis | ❌ Critical - comprehensive daily activity exposed |
| Knowledge Graph | backend/database/knowledge_graph.py | Node labels, aliases, relationships | ❌ High - personal knowledge/relationships exposed |
| Goals | backend/database/goals.py | Goal descriptions, progress tracking | ❌ Medium - personal aspirations exposed |
| Wrapped | backend/database/wrapped.py | Yearly recap, statistics, summaries | ❌ Medium - annual activity summary exposed |

Why This Matters

  1. User Trust: Users enabling "Enhanced Data Protection" expect ALL their data to be encrypted, not just some of it
  2. Regulatory Compliance: Incomplete encryption could violate data protection regulations (GDPR, CCPA)
  3. Security Best Practices: Inconsistent security patterns enlarge the attack surface and make it unclear which data is actually protected
  4. Feature Parity: Newer features (added Aug-Dec 2025) were implemented AFTER the data protection system existed but didn't adopt it

Technical Details

The existing implementation in memories.py, conversations.py, and chat.py uses a decorator-based pattern:

from typing import Any, Dict, Optional

from .helpers import set_data_protection_level, prepare_for_write, prepare_for_read

# Encryption helpers (bodies elided here; see the existing collections for the real logic)
def _encrypt_data(data: Dict[str, Any], uid: str) -> Dict[str, Any]:
    # Encrypt sensitive fields
    ...

def _decrypt_data(data: Dict[str, Any], uid: str) -> Dict[str, Any]:
    # Decrypt sensitive fields
    ...

def _prepare_data_for_write(data: Dict[str, Any], uid: str, level: str) -> Dict[str, Any]:
    # Only encrypt when the user has enabled enhanced protection
    if level == 'enhanced':
        return _encrypt_data(data, uid)
    return data

def _prepare_data_for_read(data: Optional[Dict[str, Any]], uid: str) -> Optional[Dict[str, Any]]:
    if not data:
        return None
    # The level stored on the record decides whether it needs decrypting
    level = data.get('data_protection_level')
    if level == 'enhanced':
        return _decrypt_data(data, uid)
    return data

# Apply decorators to CRUD operations
@set_data_protection_level(data_arg_name='data')
@prepare_for_write(data_arg_name='data', prepare_func=_prepare_data_for_write)
def create_item(uid: str, data: dict):
    # ... create logic
    ...

@prepare_for_read(decrypt_func=_prepare_data_for_read)
def get_items(uid: str):
    # ... read logic
    ...

Required Changes

All 5 collections listed above as missing data protection need the same implementation as the 3 that already have it.

Implementation Checklist (for each collection)

  • Add encryption helper functions (_encrypt_*, _decrypt_*)
  • Add _prepare_*_for_write and _prepare_*_for_read functions
  • Apply @set_data_protection_level decorator to create/update operations
  • Apply @prepare_for_write decorator to create/update operations
  • Apply @prepare_for_read decorator to read operations
  • Identify which fields contain sensitive data that should be encrypted
  • Add data_protection_level field to data model
  • Test encryption/decryption round-trip
  • Test migration path for existing data (users upgrading protection level)
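
The last two checklist items (round-trip and migration testing) could be exercised with something like the sketch below. It assumes records without a `data_protection_level` field are legacy plaintext, that the non-enhanced level is called `'standard'`, and that a single `description` field is sensitive; none of these are confirmed by this issue. `migrate_records` is a hypothetical helper, not the migration code in memories.py, and base64 again stands in for real encryption.

```python
import base64
from typing import Any, Dict, List

SENSITIVE_FIELDS = ("description",)  # assumed field list, for illustration only


def _toy_encrypt(text: str, uid: str) -> str:
    # Placeholder only: base64 is an encoding, NOT encryption.
    return base64.b64encode(f"{uid}:{text}".encode()).decode()


def _toy_decrypt(token: str, uid: str) -> str:
    decoded = base64.b64decode(token.encode()).decode()
    return decoded[len(uid) + 1:]  # strip the "<uid>:" prefix added on write


def migrate_records(records: List[Dict[str, Any]], uid: str, target_level: str) -> List[Dict[str, Any]]:
    """Return records converted to target_level; records already at that level pass through."""
    migrated = []
    for record in records:
        # Assumption: a record with no level field is legacy plaintext.
        current = record.get("data_protection_level", "standard")
        if current == target_level:
            migrated.append(record)
            continue
        updated = dict(record)  # never mutate the caller's record
        transform = _toy_encrypt if target_level == "enhanced" else _toy_decrypt
        for field in SENSITIVE_FIELDS:
            if field in updated:
                updated[field] = transform(updated[field], uid)
        updated["data_protection_level"] = target_level
        migrated.append(updated)
    return migrated


legacy = [{"id": "1", "description": "Renew passport"}]
upgraded = migrate_records(legacy, "u1", "enhanced")
downgraded = migrate_records(upgraded, "u1", "standard")
```

Checking the current level before transforming makes the migration safe to re-run, which matters if a batch job is interrupted partway through a user's records.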

Collections to Update

  1. backend/database/action_items.py - Encrypt task descriptions and related sensitive fields
  2. backend/database/daily_summaries.py - Encrypt headline, overview, highlights, people_mentioned, memorable_moments
  3. backend/database/knowledge_graph.py - Encrypt node labels, aliases, edge labels
  4. backend/database/goals.py - Encrypt goal descriptions and progress notes
  5. backend/database/wrapped.py - Encrypt yearly summary data and statistics
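
Several of these collections hold more than one sensitive field, and some fields are likely lists or nested objects rather than plain strings. One possible approach, sketched below for daily summaries, is to JSON-serialize each sensitive field before encrypting it so the original type survives the round trip. The field names come from the list above, but their types, the `date` field, and the `encrypt_summary` / `decrypt_summary` helpers are assumptions, and base64 is only a placeholder for the real encryption utilities.

```python
import base64
import json
from typing import Any, Dict

# Field names taken from the daily summaries entry above; their types are assumed.
SUMMARY_FIELDS = ("headline", "overview", "highlights", "people_mentioned", "memorable_moments")


def _toy_encrypt(text: str, uid: str) -> str:
    # Placeholder only: base64 is an encoding, NOT encryption.
    return base64.b64encode(f"{uid}:{text}".encode()).decode()


def _toy_decrypt(token: str, uid: str) -> str:
    decoded = base64.b64decode(token.encode()).decode()
    return decoded[len(uid) + 1:]  # strip the "<uid>:" prefix added on write


def encrypt_summary(summary: Dict[str, Any], uid: str) -> Dict[str, Any]:
    """Serialize each sensitive field to JSON, then transform it; other fields are untouched."""
    out = dict(summary)
    for field in SUMMARY_FIELDS:
        if out.get(field) is not None:
            out[field] = _toy_encrypt(json.dumps(out[field]), uid)
    return out


def decrypt_summary(summary: Dict[str, Any], uid: str) -> Dict[str, Any]:
    """Reverse encrypt_summary, restoring the original Python types via json.loads."""
    out = dict(summary)
    for field in SUMMARY_FIELDS:
        if out.get(field) is not None:
            out[field] = json.loads(_toy_decrypt(out[field], uid))
    return out


summary = {
    "date": "2025-12-01",  # assumed non-sensitive field, left in plaintext for querying
    "headline": "A productive Monday",
    "highlights": ["Shipped the release", "Lunch with Sam"],
    "people_mentioned": ["Sam"],
}
stored = encrypt_summary(summary, "u1")
restored = decrypt_summary(stored, "u1")
```

A trade-off of this design is that encrypted fields can no longer be filtered or sorted in the database, so any field used in queries (such as a date or completion flag) needs a deliberate decision about whether it stays in plaintext.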

References

  • Existing implementation: backend/database/memories.py (lines 17-56)
  • Helper decorators: backend/database/helpers.py
  • Encryption utilities: backend/utils/encryption.py
  • Data protection migration: backend/database/memories.py (lines 289-342)

Impact Analysis

Without this fix:

  • Users with enhanced data protection enabled have incomplete protection
  • 62.5% of sensitive user data collections lack encryption (5 out of 8)
  • Recent features bypass security infrastructure that was already in place

With this fix:

  • Consistent security across all user data
  • Users get the protection they expect when enabling enhanced mode
  • Future features have clear patterns to follow
  • Lower risk of violating data protection regulations (GDPR, CCPA)

Priority: High - This is a security gap affecting user privacy and data protection compliance.
