The Smart Data Manager project demonstrates a Python-based ETL pipeline for managing and processing business data.
This script extracts data from a SQL Server database, cleans and transforms it, and loads aggregated summary tables back into the database.
These summary tables can be used for reporting, analytics, or integration with Power Apps and Power BI.
- Extract: Reads data from multiple SQL Server tables:
ProgramsProjectsProgressTeamsTeam_MembersMembers
- Transform: Cleans and standardizes data:
- Converts dates and numeric fields
- Handles missing columns safely
- Aggregates data for program, team, and member-level summaries
- Load: Writes processed summary tables to SQL Server:
Program_Summary_ReportTeam_Performance_ReportMember_Progress_Report
- Logging: Tracks ETL process in
etl_pipeline.log.
- Python 3.13+
- Libraries:
pandasSQLAlchemypyodbc
- Access to SQL Server with the following environment variables:
SQL_SERVERSQL_DBSQL_USERSQL_PASSWORDODBC_DRIVER(e.g., "ODBC Driver 17 for SQL Server")
- Clone the repository
git clone <your-repo-url> cd <repo-folder> ## Data Flow
flowchart TD
A[SQL Server Tables] --> B[ETL Script - Python: Extract, Transform, Load]
B --> C[Summary Tables in SQL Server]
C --> D[Power BI / Power Apps / Reports]