Current behavior
Right now users have to manually rename their export directories to platform-username-YYYYMMDD before processing. This is annoying and unnecessary since the username is already in the export files.
Required naming today:
google-john.doe-20250526/
instagram-jane_doe-2023-02-20/
snapchat-user123-2023-03-10/
What we already have
Some processors already do this:
- Google Chat - reads user_info.json
- Snapchat Messages - falls back to chat_history.json (checks IsSender field)
Everything else just parses the directory name.
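For reference, directory-name parsing can be sketched roughly as below. This is not the existing extract_username_from_export_dir() implementation; parse_export_dir_name and ExportDirName are hypothetical names, and accepting both YYYYMMDD and YYYY-MM-DD dates is an assumption based on the example directory names above.

```python
import re
from typing import NamedTuple, Optional


class ExportDirName(NamedTuple):
    platform: str
    username: str
    date: str


# Assumption: the date suffix is either YYYYMMDD or YYYY-MM-DD, since both
# forms show up in the example directory names.
_EXPORT_DIR_RE = re.compile(
    r"(?P<platform>[a-z]+)-(?P<username>.+)-(?P<date>\d{8}|\d{4}-\d{2}-\d{2})"
)


def parse_export_dir_name(name: str) -> Optional[ExportDirName]:
    """Split 'platform-username-date' into its parts, or return None."""
    match = _EXPORT_DIR_RE.fullmatch(name.rstrip("/"))
    if match is None:
        return None
    return ExportDirName(match["platform"], match["username"], match["date"])
```

A name that does not follow the pattern (for example a freshly extracted Takeout directory) returns None, which is the case the metadata-based detection is meant to cover.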
Proposed solution
Build a unified username detection system that:
- Tries to auto-detect from export metadata first
- Falls back to directory name parsing if that fails
- Prompts user for manual input if all detection methods fail
- Works consistently across all processors
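One possible shape for the fallback chain, as a minimal sketch: the function name detect_username, the (name, callable) detector pairs, and the optional prompt callback are all hypothetical and not part of the current codebase.

```python
from pathlib import Path
from typing import Callable, Iterable, Optional, Tuple

# A detector takes the export directory and returns a username or None.
Detector = Callable[[Path], Optional[str]]


def detect_username(
    export_dir: Path,
    detectors: Iterable[Tuple[str, Detector]],
    prompt: Optional[Callable[[], Optional[str]]] = None,
) -> Optional[Tuple[str, str]]:
    """Try each detector in order; return (username, method_name) or None."""
    for method_name, detector in detectors:
        try:
            username = detector(export_dir)
        except (OSError, ValueError, KeyError):
            # A missing or malformed file just means "try the next source".
            username = None
        if username:
            return username, method_name
    # Last resort: ask the user, if the caller supplied a prompt callback.
    if prompt is not None:
        answer = prompt()
        if answer and answer.strip():
            return answer.strip(), "manual"
    return None
```

Each processor would pass its own ordered list, for example metadata detection first and directory-name parsing second, so the order of fallbacks stays the same everywhere.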
Detection sources by platform
| Platform | Detection Method | Source Files (in priority order) |
| --- | --- | --- |
| Google (all services) | 1. Parse email from profile JSON 2. Extract from account HTML filenames 3. Parse directory name | 1. Profile/Profile.json → emails[0].value 2. Google Account/{username}.ChangeHistory.html → filename 3. Directory: google-{username}-{date} |
| Google Chat | Already implemented via user_info.json | Takeout/Google Chat/Users/{user}/user_info.json |
| Google Photos | Share detection with other Google services | Use Google account detection (see above) |
| Google Voice | Share detection with other Google services | Use Google account detection (see above) |
| Instagram Messages | Parse from personal info HTML | personal_information/personal_information/personal_information.html |
| Instagram Public Media | Parse from personal info HTML | personal_information/personal_information/personal_information.html |
| Instagram Old Format | 1. Parse from JSON data 2. Extract from filename | 1. {username}__{userid}.json.xz → decompress and read username field 2. Filename pattern: {username}__{userid}.json.xz |
| Snapchat Memories | Share with Snapchat Messages or store in downloader metadata | chat_history.json (shared) or downloader-stored metadata |
| Snapchat Messages | Already partially implemented | chat_history.json → find messages where IsSender: true |
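As an illustration of the Snapchat Messages row, the sketch below scans chat_history.json for messages with IsSender set to true. It assumes the file is a JSON object mapping each conversation to a list of message objects, and that the sender's username sits in a From field; neither detail is stated in this issue, so both should be checked against a real export. The function name is hypothetical.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Optional


def username_from_snapchat_chat_history(path: Path) -> Optional[str]:
    """Return the most common sender among messages flagged IsSender: true."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return None
    if not isinstance(data, dict):
        return None

    senders: Counter = Counter()
    for messages in data.values():
        if not isinstance(messages, list):
            continue
        for message in messages:
            # Assumption: the sender's username is stored under "From".
            if isinstance(message, dict) and message.get("IsSender") is True:
                sender = message.get("From")
                if isinstance(sender, str) and sender:
                    senders[sender] += 1

    if not senders:
        return None
    return senders.most_common(1)[0][0]
```

Counting senders instead of returning the first hit guards against a single malformed message deciding the result.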
Implementation ideas
- Add platform-specific detection functions to common/utils.py
- Keep backward compat with directory name parsing (don't break existing workflows)
- Google exports: Try detection methods in order of reliability:
  - Profile/Profile.json (most reliable, but not always present in partial exports)
  - Google Account/*.html filenames
  - Directory name (fallback for partial exports like Photos-only)
  - Once detected, share username across all Google services in that export
- Instagram old format: Prefer reading from JSON data over filename parsing
- Snapchat: Share username between Memories and Messages when both present
- Manual input prompt: If all detection methods fail, prompt user to enter username before processing starts
  - This is better UX than using "unknown" as the username
  - Allows processing to continue without manual directory renaming
  - Could cache/remember usernames for repeated processing
- Log which detection method worked for debugging
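A rough sketch of the Google ordering with logging of the method that succeeded. The helper names are hypothetical, root is assumed to be the directory that directly contains Profile/ and Google Account/, and treating the part of the email before the @ as the username is an assumption. The directory-name fallback is left to the caller.

```python
import json
import logging
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)

_CHANGE_HISTORY_SUFFIX = ".ChangeHistory.html"


def _from_profile_json(root: Path) -> Optional[str]:
    """Read emails[0].value from Profile/Profile.json."""
    try:
        text = (root / "Profile" / "Profile.json").read_text(encoding="utf-8")
        email = json.loads(text)["emails"][0]["value"]
    except (OSError, ValueError, KeyError, IndexError, TypeError):
        return None
    if not isinstance(email, str) or "@" not in email:
        return None
    # Assumption: the username is the part of the address before the "@".
    return email.split("@", 1)[0] or None


def _from_account_html(root: Path) -> Optional[str]:
    """Take the username from Google Account/{username}.ChangeHistory.html."""
    account_dir = root / "Google Account"
    if not account_dir.is_dir():
        return None
    for path in sorted(account_dir.glob("*" + _CHANGE_HISTORY_SUFFIX)):
        username = path.name[: -len(_CHANGE_HISTORY_SUFFIX)]
        if username:
            return username
    return None


def detect_google_username(root: Path) -> Optional[str]:
    """Try the sources in order of reliability and log which one worked."""
    methods = (
        ("Profile/Profile.json", _from_profile_json),
        ("Google Account HTML filename", _from_account_html),
    )
    for method_name, detector in methods:
        username = detector(root)
        if username:
            logger.info("Google username detected via %s", method_name)
            return username
    return None
```

Running this once per export and passing the result to each Google processor would cover the "share username across all Google services" idea.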
Why this matters
- Better UX - just extract and run, no manual renaming
- Fewer errors - no more typos in directory names
- Accurate - username comes directly from the export
- Less friction - one less thing users have to remember
- Graceful degradation - even if auto-detection fails, user can still provide username interactively
Notes
- Some Google exports are partial (e.g., only Google Chat or Photos) and won't have a Profile/ directory
- For partial Google exports, directory name may be the only reliable source
- All detection methods should be tried in order with graceful fallbacks
- Manual prompt should only appear if running interactively (not in batch/scripted mode)
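The interactive-only prompt could look something like the sketch below. prompt_for_username is a hypothetical name, and checking sys.stdin.isatty() is just one way to guess whether a human is at the terminal; an explicit batch-mode flag may be the better choice.

```python
import sys
from typing import Callable, Optional


def prompt_for_username(
    platform: str,
    *,
    is_interactive: Optional[bool] = None,
    ask: Callable[[str], str] = input,
) -> Optional[str]:
    """Ask for a username, but only when running interactively."""
    if is_interactive is None:
        # stdin may be None (e.g. some GUI launchers) or redirected in scripts.
        is_interactive = sys.stdin is not None and sys.stdin.isatty()
    if not is_interactive:
        return None
    answer = ask(f"Could not detect the {platform} username. Enter it: ").strip()
    return answer or None
```

In batch or scripted runs this returns None without blocking, so the caller can fail with a clear error or skip the export instead.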
Files to update
- common/utils.py - main detection logic
- processors/google_chat/preprocess.py - already has a working example
- processors/snapchat_messages/preprocess.py - fallback detection example
- All processors currently calling extract_username_from_export_dir()