
Auto-detect username from export files instead of requiring manual directory renaming #1

@ethanriverpage

Description


Current behavior

Right now, users have to manually rename each export directory to platform-username-date (e.g., YYYYMMDD or YYYY-MM-DD) before processing. This is tedious and unnecessary, since the username is already present in the export files.

Required naming today:

google-john.doe-20250526/
instagram-jane_doe-2023-02-20/
snapchat-user123-2023-03-10/
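For reference, here is a minimal sketch of the directory-name fallback. It is not the project's extract_username_from_export_dir(). The function and pattern names are hypothetical, and the sketch assumes only the three components shown in the examples, with the date in either YYYYMMDD or YYYY-MM-DD form.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for "platform-username-date" directory names.
# The date may be YYYYMMDD or YYYY-MM-DD, matching the examples above.
# Because the date is anchored at the end, usernames that contain
# hyphens still parse.
_EXPORT_DIR_RE = re.compile(
    r"^(?P<platform>[a-z]+)-(?P<username>.+)-(?P<date>\d{8}|\d{4}-\d{2}-\d{2})$"
)


class ExportDirInfo(NamedTuple):
    platform: str
    username: str
    date: str


def parse_export_dir_name(name: str) -> Optional[ExportDirInfo]:
    """Return (platform, username, date), or None if the name doesn't match."""
    match = _EXPORT_DIR_RE.match(name.rstrip("/"))
    if match is None:
        return None
    return ExportDirInfo(match["platform"], match["username"], match["date"])
```

For example, parse_export_dir_name("instagram-jane_doe-2023-02-20") yields the username jane_doe, and a name like google-mary-jane-20250526 still yields mary-jane.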

What we already have

Some processors already do this:

  • Google Chat - reads user_info.json
  • Snapchat Messages - falls back to chat_history.json (checks IsSender field)

Everything else just parses the directory name.
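The Snapchat Messages fallback mentioned above can be sketched roughly as follows. This is not the code in processors/snapchat_messages/preprocess.py, and the helper names are hypothetical. The sketch also assumes that chat_history.json maps each conversation to a list of message objects carrying From and IsSender keys. That layout is our assumption; the issue does not state it.

```python
import json
from pathlib import Path
from typing import Optional


def username_from_chat_history(chat_history: dict) -> Optional[str]:
    """Return the "From" value of the first message where IsSender is true.

    ASSUMPTION: each value in chat_history is a list of message dicts
    with "From" and "IsSender" keys.
    """
    for messages in chat_history.values():
        if not isinstance(messages, list):
            continue
        for message in messages:
            # IsSender marks messages written by the account owner.
            if isinstance(message, dict) and message.get("IsSender") is True:
                sender = message.get("From")
                if isinstance(sender, str) and sender:
                    return sender
    return None


def detect_snapchat_username(chat_history_path: Path) -> Optional[str]:
    """Load chat_history.json and apply the IsSender heuristic."""
    try:
        data = json.loads(chat_history_path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return None
    return username_from_chat_history(data) if isinstance(data, dict) else None
```

Splitting parsing from file loading means Snapchat Memories could reuse username_from_chat_history when a chat_history.json is available.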

Proposed solution

Build a unified username detection system that:

  1. Tries to auto-detect from export metadata first
  2. Falls back to directory name parsing if that fails
  3. Prompts user for manual input if all detection methods fail
  4. Works consistently across all processors
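One way to express steps 1-3 is an ordered list of named detectors, each returning a username or None. This is a minimal sketch: detect_username, Detector, and the prompt callback are hypothetical names, not existing functions in common/utils.py.

```python
import logging
from pathlib import Path
from typing import Callable, Optional, Sequence, Tuple

logger = logging.getLogger(__name__)

# A detector takes the export directory and returns a username or None.
Detector = Callable[[Path], Optional[str]]


def detect_username(
    export_dir: Path,
    detectors: Sequence[Tuple[str, Detector]],
    prompt: Optional[Callable[[Path], Optional[str]]] = None,
) -> Optional[str]:
    """Try each named detector in order, then fall back to an optional prompt."""
    for name, detector in detectors:
        try:
            username = detector(export_dir)
        except Exception:  # one broken detector shouldn't stop the chain
            logger.debug("Detector %s raised", name, exc_info=True)
            continue
        if username:
            logger.info("Username %r detected via %s", username, name)
            return username
    if prompt is not None:
        username = prompt(export_dir)
        if username:
            logger.info("Username %r entered manually", username)
            return username
    return None
```

Because each detector is named, the log line records which method succeeded. A processor would pass its platform-specific detectors first and the directory-name parser last.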

Detection sources by platform

| Platform | Detection method | Source files (in priority order) |
|---|---|---|
| Google (all services) | 1. Parse email from profile JSON; 2. Extract from account HTML filenames; 3. Parse directory name | 1. `Profile/Profile.json` → `emails[0].value`; 2. `Google Account/{username}.ChangeHistory.html` → filename; 3. Directory: `google-{username}-{date}` |
| Google Chat | Already implemented via `user_info.json` | `Takeout/Google Chat/Users/{user}/user_info.json` |
| Google Photos | Share detection with other Google services | Use Google account detection (see above) |
| Google Voice | Share detection with other Google services | Use Google account detection (see above) |
| Instagram Messages | Parse from personal info HTML | `personal_information/personal_information/personal_information.html` |
| Instagram Public Media | Parse from personal info HTML | `personal_information/personal_information/personal_information.html` |
| Instagram Old Format | 1. Parse from JSON data; 2. Extract from filename | 1. `{username}__{userid}.json.xz` → decompress and read `username` field; 2. Filename pattern: `{username}__{userid}.json.xz` |
| Snapchat Memories | Share with Snapchat Messages or store in downloader metadata | `chat_history.json` (shared) or downloader-stored metadata |
| Snapchat Messages | Already partially implemented | `chat_history.json` → find messages where `IsSender: true` |
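Here is a sketch of the first two Google methods, under stated assumptions. The directory passed in is assumed to be the one containing Profile/ and Google Account/ (e.g. the Takeout folder). The username is assumed to be the local part of the email, inferred from directory names like google-john.doe-20250526. All function names are hypothetical. The directory-name fallback (method 3) would come after these in the detector chain.

```python
import json
from pathlib import Path
from typing import Iterable, Optional

CHANGE_HISTORY_SUFFIX = ".ChangeHistory.html"


def username_from_google_profile(profile: dict) -> Optional[str]:
    """Read emails[0].value from Profile.json and keep the part before "@".

    ASSUMPTION: the username is the email's local part.
    """
    if not isinstance(profile, dict):
        return None
    emails = profile.get("emails")
    if isinstance(emails, list) and emails and isinstance(emails[0], dict):
        value = emails[0].get("value")
        if isinstance(value, str) and "@" in value:
            return value.split("@", 1)[0]
    return None


def username_from_google_account_files(filenames: Iterable[str]) -> Optional[str]:
    """Extract {username} from '{username}.ChangeHistory.html' filenames."""
    for name in sorted(filenames):
        if name.endswith(CHANGE_HISTORY_SUFFIX):
            return name[: -len(CHANGE_HISTORY_SUFFIX)]
    return None


def detect_google_username(takeout_dir: Path) -> Optional[str]:
    """Try Profile.json first, then the Google Account HTML filenames."""
    profile_path = takeout_dir / "Profile" / "Profile.json"
    if profile_path.is_file():
        try:
            profile = json.loads(profile_path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            profile = {}
        username = username_from_google_profile(profile)
        if username:
            return username
    account_dir = takeout_dir / "Google Account"
    if account_dir.is_dir():
        return username_from_google_account_files(p.name for p in account_dir.iterdir())
    return None  # partial export: leave it to the directory-name fallback
```

One way to share the username across services is to call detect_google_username once per export and pass the result to the Chat, Photos, and Voice processors.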

Implementation ideas

  • Add platform-specific detection functions to common/utils.py
  • Keep backward compatibility with directory-name parsing (don't break existing workflows)
  • Google exports: Try detection methods in order of reliability:
    1. Profile/Profile.json (most reliable, but not always present in partial exports)
    2. Google Account/*.html filenames
    3. Directory name (fallback for partial exports like Photos-only)
    • Once detected, share username across all Google services in that export
  • Instagram old format: Prefer reading from JSON data over filename parsing
  • Snapchat: Share username between Memories and Messages when both present
  • Manual input prompt: If all detection methods fail, prompt the user to enter a username before processing starts
    • This is better UX than using "unknown" as the username
    • Allows processing to continue without manual directory renaming
    • Could cache/remember usernames for repeated processing
  • Log which detection method worked for debugging
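For the Instagram old format, preferring the JSON data over the filename might look like this. The sketch assumes the decompressed JSON is an object with a top-level username key; the issue doesn't say where the field lives. The function name is hypothetical, and filename should be the base name of the .json.xz file.

```python
import json
import lzma
from typing import Optional

OLD_FORMAT_SUFFIX = ".json.xz"


def username_from_old_instagram_export(filename: str, compressed: bytes) -> Optional[str]:
    """Prefer the JSON "username" field and fall back to the filename.

    ASSUMPTION: the decompressed JSON is an object with a top-level
    "username" key.
    """
    try:
        data = json.loads(lzma.decompress(compressed))
    except (lzma.LZMAError, ValueError):  # bad xz stream or bad JSON/encoding
        data = None
    if isinstance(data, dict):
        username = data.get("username")
        if isinstance(username, str) and username:
            return username
    # Filename pattern: {username}__{userid}.json.xz
    if filename.endswith(OLD_FORMAT_SUFFIX) and "__" in filename:
        return filename[: -len(OLD_FORMAT_SUFFIX)].rsplit("__", 1)[0]
    return None
```

The code splits on the last "__" so that a username containing a double underscore is preserved, since the user ID is always the final component.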

Why this matters

  • Better UX - just extract and run, no manual renaming
  • Fewer errors - no more typos in directory names
  • Accurate - username comes directly from the export
  • Less friction - one less thing users have to remember
  • Graceful degradation - even if auto-detection fails, user can still provide username interactively

Notes

  • Some Google exports are partial (e.g., only Google Chat or Photos) and won't have a Profile/ directory
  • For partial Google exports, directory name may be the only reliable source
  • All detection methods should be tried in order with graceful fallbacks
  • Manual prompt should only appear if running interactively (not in batch/scripted mode)
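Here is a sketch of the interactive-only prompt with a simple on-disk cache. It makes three assumptions: interactivity is inferred from sys.stdin.isatty() unless overridden, the cache is a JSON file keyed by the resolved export path, and prompt_for_username and its parameters are hypothetical names.

```python
import json
import sys
from pathlib import Path
from typing import Callable, Optional


def prompt_for_username(
    export_dir: Path,
    cache_file: Path,
    input_fn: Callable[[str], str] = input,
    interactive: Optional[bool] = None,
) -> Optional[str]:
    """Ask for a username only when running interactively, and remember answers."""
    key = str(export_dir.resolve())
    try:
        cache = json.loads(cache_file.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        cache = {}
    if not isinstance(cache, dict):
        cache = {}
    if key in cache:
        return cache[key]  # previously entered for this export
    if interactive is None:
        interactive = sys.stdin.isatty()
    if not interactive:
        return None  # batch/scripted mode: never block on input()
    username = input_fn(f"Could not detect a username for {export_dir.name}. Enter one: ").strip()
    if not username:
        return None
    cache[key] = username
    try:
        cache_file.write_text(json.dumps(cache, indent=2), encoding="utf-8")
    except OSError:
        pass  # caching is best-effort
    return username
```

In non-interactive mode the function returns None rather than waiting for input. The caller can then decide whether to fail or continue with another fallback.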

Files to update

  • common/utils.py - main detection logic
  • processors/google_chat/preprocess.py - already has a working example
  • processors/snapchat_messages/preprocess.py - fallback detection example
  • All processors currently calling extract_username_from_export_dir()

Metadata

Assignees: No one assigned

Labels: enhancement (New feature or request)

Projects: No projects

Milestone: No milestone

Relationships: None yet

Development: No branches or pull requests
