
F3-Nation/f3-data-migration-toolkit


F3 Region Data Integrator

F3 National Administrators: Please refer to the National Administrator Guide for security and coordination best practices.

This project contains a suite of modular Python scripts designed to harvest, clean, and consolidate local F3 region data (like WordPress backblasts or legacy Google Sheets) into a clean, standardized format suitable for the F3 National Database.

Modularity

Because different F3 regions use different tools, these scripts are built modularly. You don't have to use the entire pipeline.

  • If your region only has a WordPress XML export and nothing else, you can run just convert.py.
  • If your region only manages data via Google Sheets, you can primarily rely on generate_user_reports.py and extract_missing_qs.py.

Use the tools that fit your region's historical tracking methods.

Directory Structure

import/ (Your Source Data)

This is where you place your raw data.

  • user_master.csv: The authoritative master list of users from the F3 National Database. It can be generated automatically by running fetch_master_users.py (step 0 below) or obtained from a National Admin.
  • locations.csv: Your region's AOs (Area of Operations).
  • F3region.wordpress.com...xml (or similar): Your WordPress XML export containing backblasts.
  • legacy_pax_directory.csv / legacy_master_directory.csv: Exports of your old Google Sheets user data.
  • legacy_q_schedule.csv: Export of your old Q schedule spreadsheet.
  • manual_aliases.json: (Optional) A manually created file to override logic for difficult name matches. See samples/ for an example.
  • aliases.json & display_aliases.json: (Auto-generated by build_alias_map.py) Mapping dictionaries to link old aliases to canonical F3 National IDs.
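The alias files above work as lookup tables, with the manual file taking priority over the generated one. The following is a minimal sketch of that precedence. It assumes that both aliases.json and manual_aliases.json are flat JSON objects of the form {"alias": "canonical_id"}, which is a hypothetical schema; the real format of manual_aliases.json is shown in samples/. The function names are illustrative and do not come from the toolkit.

```python
import json
from pathlib import Path
from typing import Optional


def load_json_object(path: Path) -> dict[str, str]:
    """Read a JSON object from disk; a missing file yields an empty dict."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def merge_alias_maps(auto: dict[str, str], manual: dict[str, str]) -> dict[str, str]:
    """Combine generated and manual aliases. Manual entries are applied last, so they win.

    Keys are normalised to lower case with surrounding whitespace removed.
    """
    merged: dict[str, str] = {}
    for source in (auto, manual):
        for alias, user_id in source.items():
            merged[alias.strip().lower()] = user_id
    return merged


def resolve(name: str, alias_map: dict[str, str]) -> Optional[str]:
    """Return the canonical ID for a name, or None when the name is unknown."""
    return alias_map.get(name.strip().lower())
```

For example, a script could call merge_alias_maps(load_json_object(Path("import/aliases.json")), load_json_object(Path("import/manual_aliases.json"))) once and then call resolve() for each name it encounters.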

output/ (The Final Generated Data)

This is where the scripts will drop the cleaned, formatted data ready for national integration.

  • {REGION_NAME}_wordpress_backblasts.csv: The master backblast repository linking Dates, AOs, Qs, and PAX attendees from WordPress exports.
  • {REGION_NAME}_missing_users.csv: Users who could not be matched to a National account and were assigned a temporary TMP_ID_X placeholder instead.
  • {REGION_NAME}_qschedule_nobackblast.csv: Q schedule events that have no corresponding backblast in the National DB or WordPress extracts.
  • {REGION_NAME}_qschedule_nonworkoutevents.csv: Social events, Q-Sources, or other non-workout events separated from the main deficiency list.
  • event_overview.md: An automated summary report of your migration data health.
  • my_users.csv: A unified master roster of every PAX extracted from the legacy files and the WordPress export.
  • my_users_output.csv: The official output received from the National BulkUserCreate script containing definitive database IDs.
  • users_alias.csv & users_downrange.csv: Audit logs that document how the scripts matched each old regional alias to an F3 National account.
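The two Q schedule reports amount to a keyed comparison between the schedule and the backblasts that already exist. The sketch below shows that idea. It is not the logic in extract_missing_qs.py. It assumes that each schedule row and each backblast reduces to a (date, AO) pair, and that non-workout events can be recognized by keywords. The keyword list, the QEntry fields, and the function name are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical keywords for spotting non-workout events; tune per region.
NON_WORKOUT_KEYWORDS = ("social", "q-source", "qsource", "happy hour")


@dataclass(frozen=True)
class QEntry:
    date: str        # ISO date, e.g. "2023-05-06"
    ao: str          # AO name as written in the Q schedule
    q: str           # F3 name of the scheduled Q
    event: str = ""  # free-text event label, if the sheet has one


def split_q_schedule(
    schedule: list[QEntry], backblast_keys: set[tuple[str, str]]
) -> tuple[list[QEntry], list[QEntry]]:
    """Return (no_backblast, non_workout).

    backblast_keys holds (date, lower-cased AO) pairs that already have a
    backblast, whether from the National DB or the WordPress extract.
    """
    no_backblast: list[QEntry] = []
    non_workout: list[QEntry] = []
    for entry in schedule:
        label = f"{entry.ao} {entry.event}".lower()
        if any(word in label for word in NON_WORKOUT_KEYWORDS):
            non_workout.append(entry)  # kept out of the deficiency list
        elif (entry.date, entry.ao.strip().lower()) not in backblast_keys:
            no_backblast.append(entry)  # scheduled Q with no matching backblast
    return no_backblast, non_workout
```

This sketch classifies non-workout events before it checks for backblasts, so a social gets its own report even when someone wrote it up.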

samples/ (Dummy Data Templates)

This folder contains structurally accurate but anonymized examples of the input files, plus a template of the output layout.

  • samples/input/: Review these files if you need to know exactly how to format your region's raw data for the scripts to successfully parse them.
  • samples/output/: An empty structure showing where the final merged data files will be placed.

Execution Pipeline

To run the full suite, execute these processes in the following exact order:

0. (Optional) python fetch_master_users.py

Purpose: Connects directly to the F3 National PostgreSQL database and downloads the latest global roster into import/user_master.csv, so you don't have to download the CSV manually. Note: This requires a .env file that contains the database credentials.

Important

This script connects to the production environment and requires specific database credentials. Most regions will not have direct access to run this script. If you are a Regional Admin, you should work with a National Admin to either have them run this script for you or provide you with an updated user_master.csv export from the National Database.
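If you do have credentials, the .env file is usually a list of KEY=VALUE lines. The sketch below parses such a file using only the standard library and builds a PostgreSQL connection URI from it. The key names DB_HOST, DB_PORT, DB_NAME, DB_USER, and DB_PASSWORD are hypothetical placeholders. Check fetch_master_users.py for the names it actually reads.

```python
from urllib.parse import quote


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments; strips matching quotes."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]  # remove surrounding single or double quotes
        values[key.strip()] = value
    return values


def postgres_uri(env: dict[str, str]) -> str:
    """Build a postgresql:// URI from hypothetical DB_* keys; credentials are URL-escaped."""
    user = quote(env["DB_USER"], safe="")
    password = quote(env["DB_PASSWORD"], safe="")
    port = env.get("DB_PORT", "5432")  # PostgreSQL's default port
    return f"postgresql://{user}:{password}@{env['DB_HOST']}:{port}/{env['DB_NAME']}"
```

libpq-based drivers such as psycopg2 accept a connection URI in this form. Escaping the password matters because characters such as @ or / would otherwise break the URI.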

1. python build_alias_map.py

Purpose: Scans all legacy files and the WordPress XML export for user names that are not recognized. It then applies a series of matching rules (exact email matches, first/last name matches, and heuristic regex clean-up of the name) to map these stray aliases back to authoritative users in user_master.csv.
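The matching sequence described above can be sketched as a tiered lookup. This is not the code in build_alias_map.py. It assumes that each master user is a dict with hypothetical user_id, email, first_name, last_name, and f3_name keys. The function tries an exact email match first, then a "First Last" name match, and finally compares a regex-scrubbed version of the alias against scrubbed F3 names.

```python
import re
from typing import Optional


def scrub(alias: str) -> str:
    """Normalise an alias: lower-case, drop parentheticals, turn punctuation into spaces."""
    alias = re.sub(r"\(.*?\)", " ", alias.lower())  # "Sparky (FNG)" -> "sparky  "
    alias = re.sub(r"[^a-z0-9]+", " ", alias)         # "tin-man!!" -> "tin man "
    return " ".join(alias.split())


def match_user(alias: str, email: Optional[str], users: list[dict[str, str]]) -> Optional[str]:
    """Return a master user_id for a stray alias, or None if every tier fails."""
    # Tier 1: exact (case-insensitive) email match.
    target = (email or "").strip().lower()
    if target:
        for user in users:
            if user.get("email", "").strip().lower() == target:
                return user["user_id"]

    # Tier 2: the alias looks like "First Last" and matches a real name.
    parts = tuple(alias.strip().lower().split())
    if len(parts) == 2:
        for user in users:
            if (user.get("first_name", "").lower(), user.get("last_name", "").lower()) == parts:
                return user["user_id"]

    # Tier 3: compare the regex-scrubbed alias against scrubbed F3 names.
    cleaned = scrub(alias)
    if cleaned:
        for user in users:
            if scrub(user.get("f3_name", "")) == cleaned:
                return user["user_id"]
    return None
```

The tiers run from most reliable to least reliable. A name that falls through all three is the kind of case that would end up with a TMP_ID placeholder or need an entry in manual_aliases.json.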

2. python generate_user_reports.py

Purpose: Cross-references your legacy user directories, WordPress authors, and PAXminer users to produce a single unified user list formatted to the National Guidelines bulk import schema. Outputs:

  • output/my_users.csv
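Conceptually, this step merges several partial rosters into one row per person. The sketch below shows one way to do that. It assumes each source yields dicts with hypothetical f3_name and email keys. It deduplicates on the lower-cased email, falling back to the F3 name when no email is present. Earlier sources take priority, and later sources only fill in blank fields. The COLUMNS list is a placeholder, not the official National bulk import schema.

```python
import csv
import io

# Placeholder columns; replace with the National Guidelines bulk-import schema.
COLUMNS = ["f3_name", "first_name", "last_name", "email"]


def dedupe_key(row: dict[str, str]) -> tuple[str, str]:
    """Prefer lower-cased email as identity; fall back to lower-cased F3 name."""
    email = row.get("email", "").strip().lower()
    if email:
        return ("email", email)
    return ("name", row.get("f3_name", "").strip().lower())


def unify(*sources: list[dict[str, str]]) -> list[dict[str, str]]:
    """Merge rosters in priority order; later sources only fill blank fields."""
    merged: dict[tuple[str, str], dict[str, str]] = {}
    for source in sources:
        for row in source:
            key = dedupe_key(row)
            if not key[1]:
                continue  # no email and no F3 name: nothing to identify the PAX by
            current = merged.setdefault(key, {col: "" for col in COLUMNS})
            for col in COLUMNS:
                value = row.get(col, "").strip()
                if value and not current[col]:
                    current[col] = value
    return list(merged.values())


def to_csv(rows: list[dict[str, str]]) -> str:
    """Serialise rows with a header line using the placeholder column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

One limitation of this approach is that a PAX who has an email in one source but not in another produces two rows. A fuller implementation would need a second pass to reconcile those rows.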

3. Run the National BulkUserCreate Script

Purpose: Run the official F3-Nation/database-helpers user creation script against your newly generated my_users.csv to insert your region's PAX into the National Database.

Command: python import_users.py my_users.csv

Action Needed: Move the resulting my_users_output.csv into your output/ directory so the data conversion scripts can read the new database IDs.
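After my_users_output.csv is in output/, the downstream scripts need a lookup from names to IDs. The following is a minimal sketch of building one. It assumes the file has columns named f3_name and user_id, which are hypothetical names, so check the header of your actual my_users_output.csv. It also reports names that map to conflicting IDs rather than silently overwriting them.

```python
import csv
from pathlib import Path
from typing import Iterable


def build_id_lookup(
    lines: Iterable[str], name_col: str = "f3_name", id_col: str = "user_id"
) -> tuple[dict[str, str], set[str]]:
    """Map lower-cased F3 names to database IDs and report names with conflicting IDs."""
    lookup: dict[str, str] = {}
    conflicts: set[str] = set()
    for row in csv.DictReader(lines):
        name = row[name_col].strip().lower()
        user_id = row[id_col].strip()
        if name in lookup and lookup[name] != user_id:
            conflicts.add(name)  # same name, different IDs: needs a human decision
        lookup[name] = user_id
    return lookup, conflicts


def load_output_ids(path: Path = Path("output/my_users_output.csv")):
    """Read the BulkUserCreate output file from the output/ directory."""
    with path.open(newline="", encoding="utf-8") as handle:
        return build_id_lookup(handle)
```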

4. Configure Region (config.py)

Purpose: Set your region's name in config.py so that the output files are correctly prefixed (for example, {REGION_NAME}_wordpress_backblasts.csv).
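A configuration like this can be as small as a single constant. In the sketch below, the name REGION_NAME mirrors the {REGION_NAME} placeholder used throughout this README, but it is an assumption; check the shipped config.py for the exact variable name. The output_path helper is hypothetical and only illustrates how the prefix becomes the file names listed under output/.

```python
# config.py (sketch)
REGION_NAME = "F3Demo"  # hypothetical value; use your region's name


def output_path(suffix: str, region: str = REGION_NAME) -> str:
    """Build a region-prefixed output path, e.g. output/F3Demo_missing_users.csv."""
    return f"output/{region}_{suffix}.csv"
```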

5. python convert.py & python extract_missing_qs.py

Purpose: Parses the WordPress XML export and the legacy Q schedule. convert.py translates the WordPress XML backblasts into output/{REGION_NAME}_wordpress_backblasts.csv. extract_missing_qs.py parses the legacy Q schedule, cross-references it against the National DB, and writes scheduled Qs that have no matching backblast to output/{REGION_NAME}_qschedule_nobackblast.csv. Outputs:

  • output/{REGION_NAME}_wordpress_backblasts.csv
  • output/{REGION_NAME}_qschedule_nobackblast.csv
  • output/{REGION_NAME}_qschedule_nonworkoutevents.csv
  • output/{REGION_NAME}_missing_users.csv
  • output/event_overview.md
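On the WordPress side, this step reads the WXR export format, which is RSS with additional WordPress namespaces. The sketch below is not convert.py. It uses xml.etree.ElementTree to extract the date, title, categories, and tags from each published post. It assumes that the AO is stored as a category and the PAX as tags, which is common on F3 sites but not guaranteed. The wp namespace URI changes with the export version, so the code matches tags by their local name instead of hard-coding a namespace.

```python
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    """Drop the '{namespace-uri}' prefix ElementTree adds to namespaced tags."""
    return tag.rsplit("}", 1)[-1]


def _child_text(item: ET.Element, name: str) -> str:
    """Text of the first direct child whose local tag name matches, else ''."""
    for child in item:
        if _local(child.tag) == name:
            return (child.text or "").strip()
    return ""


def parse_backblasts(xml_text: str) -> list[dict]:
    """Extract date, title, AO and PAX from published posts in a WXR export.

    Assumes AO = the first 'category' term and PAX = the 'post_tag' terms.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.iter("item"):
        # Skip pages, attachments, drafts, etc.
        if _child_text(item, "post_type") != "post" or _child_text(item, "status") != "publish":
            continue
        terms = item.findall("category")
        aos = [t.text.strip() for t in terms if t.get("domain") == "category" and t.text]
        pax = [t.text.strip() for t in terms if t.get("domain") == "post_tag" and t.text]
        rows.append({
            "date": _child_text(item, "post_date")[:10],  # "2023-05-06 05:30:00" -> "2023-05-06"
            "title": _child_text(item, "title"),
            "ao": aos[0] if aos else "",
            "pax": pax,
        })
    return rows
```

Many regions record the Q and PAX in the post body (content:encoded) rather than in tags. Those exports would need region-specific parsing of the body text. For very large exports, ET.parse() on the file path avoids loading the whole document into a single string.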
