This repository contains an Ordinary Least Squares (OLS) analysis of the relationship between economic growth and human development in BRIC countries (Brazil, Russia, India, and China) from 2000 to 2019.
The project analyzes two causal chains:
- Chain A: Economic Growth → Human Development
  - Variables: GDP growth, HDI, education expenditure, health expenditure
  - Lagged variables: 5-year lags for GDP growth and HDI
- Chain B: Human Development → Economic Growth
  - Variables: GDP growth, HDI, gross capital formation (gcf)
  - Lagged variables: 5-year lags for GDP growth and HDI
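The 5-year lags for both chains are created in utils.py. The sketch below shows one way to build such lags with pandas. It assumes a long-format panel with hypothetical columns `country`, `year`, `gdp_growth` and `hdi`, which may not match the names used in utils.py. The shift is done within each country, so one country's early years never borrow values from another country.

```python
from typing import Iterable

import pandas as pd


def add_lags(df: pd.DataFrame, cols: Iterable[str], lag: int = 5,
             group_col: str = "country", time_col: str = "year") -> pd.DataFrame:
    """Return a copy of df with a `<col>_lag<lag>` column for each col.

    Rows are sorted by country and year and shifted within each country,
    so each country's first `lag` years get NaN. Assumes one row per
    country-year with no missing years (hypothetical column names).
    """
    out = df.sort_values([group_col, time_col]).copy()
    for col in cols:
        out[f"{col}_lag{lag}"] = out.groupby(group_col)[col].shift(lag)
    return out


if __name__ == "__main__":
    # Synthetic toy panel (not real BRIC data): two countries, 2000-2007
    years = list(range(2000, 2008))
    panel = pd.DataFrame({
        "country": ["Brazil"] * 8 + ["India"] * 8,
        "year": years * 2,
        "gdp_growth": [float(i) for i in range(16)],
        "hdi": [0.60 + 0.01 * i for i in range(16)],
    })
    lagged = add_lags(panel, ["gdp_growth", "hdi"], lag=5)
    print(lagged[["country", "year", "gdp_growth", "gdp_growth_lag5"]])
```

With annual data for 2000 to 2019 and no gaps, a 5-year lag leaves 15 usable observations per country.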
.
├── data/ # Data files
│ ├── raw/ # Raw data files
│ └── processed/ # Processed data files
├── src/ # Source code
│ ├── s00_main.py # Main execution script
│ ├── s01_load_data.py # Loads and combines BRIC data from Excel
│ ├── s02_clean_data.py # Cleans and preprocesses data
│ ├── s03_visualize.py # Visualization functions
│ ├── s04_ols.py # OLS regression analysis
│ ├── s05_ols_diagnostics.py # OLS model diagnostics
│ ├── s06_classify_cycles.py # Development cycle classification
│ ├── s07_tables.py # Table generation functions
│ └── utils.py # Utility functions (lag creation)
├── figures/ # Generated figures
│ ├── descriptive/ # Descriptive analysis plots
│ └── diagnostics/ # OLS diagnostics plots
├── tables/ # Generated tables
│ ├── descriptive_stats.md
│ ├── trend_analysis.md
│ ├── regression_summary.md
│ ├── regression_coefficients.md
│ ├── durbin_watson_tests.md
│ ├── cycle_distribution.md
│ ├── cycle_analysis.md
│ └── performance_comparison.md
├── outputs/ # Analysis outputs
│ └── regression_r2_summary.csv
├── results/ # Analysis results
│ └── cycle_analysis.md
└── requirements.txt # Python dependencies
The project requires the following Python packages:
- pandas: For data manipulation and analysis
- openpyxl: For Excel file handling
- statsmodels: For statistical models and tests
- matplotlib: For creating visualizations
- seaborn: For enhanced visualizations
- scipy: For statistical functions
- Create a virtual environment:
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt

Run the main script to perform the full analysis:
python s00_main.py

This will execute the following steps in sequence:
- Load BRIC data from Excel files (s01_load_data.py)
- Clean and preprocess the data (s02_clean_data.py)
- Perform descriptive analysis and generate visualizations (s03_visualize.py)
- Create lagged variables (utils.py)
- Run OLS regression analysis (s04_ols.py)
- Generate OLS diagnostics (s05_ols_diagnostics.py)
- Classify development cycles (s06_classify_cycles.py)
- Generate analysis tables (s07_tables.py)
The script provides real-time progress updates with standardized logging messages for each step.
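A sequential pipeline with standardized progress messages can be sketched with Python's standard `logging` module. The step names below mirror the list above, but the `run_pipeline` function, the placeholder step functions and the message format are hypothetical; the actual structure of s00_main.py is not documented here.

```python
import logging
from typing import Callable, List, Tuple

logger = logging.getLogger("bric_pipeline")


def run_pipeline(steps: List[Tuple[str, Callable[[], None]]]) -> List[str]:
    """Run (name, function) steps in order, logging a start and finish message.

    A failing step is logged with its traceback and re-raised, so later
    steps never run on incomplete inputs. Returns completed step names.
    """
    completed: List[str] = []
    total = len(steps)
    for i, (name, func) in enumerate(steps, start=1):
        logger.info("[%d/%d] Starting: %s", i, total, name)
        try:
            func()
        except Exception:
            logger.exception("[%d/%d] Failed: %s", i, total, name)
            raise
        logger.info("[%d/%d] Finished: %s", i, total, name)
        completed.append(name)
    return completed


def _placeholder() -> None:
    """Hypothetical stand-in for a real step function from s01-s07 or utils.py."""


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
    step_names = [
        "Load BRIC data", "Clean and preprocess data",
        "Descriptive analysis and visualizations", "Create lagged variables",
        "OLS regression", "OLS diagnostics",
        "Classify development cycles", "Generate analysis tables",
    ]
    run_pipeline([(name, _placeholder) for name in step_names])
```

Stopping at the first failure is a deliberate choice in this sketch: every step consumes the previous step's output.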
Data loading (s01_load_data.py):
- Loads data from Excel files for each BRIC country
- Combines data for both chains (A and B)
- Standardizes column names and formats
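The loading step above can be sketched as follows. This assumes that each country's data sits on its own worksheet, so that `pd.read_excel(path, sheet_name=None)` returns a dict of DataFrames keyed by sheet name. The workbook layout, the file path in the usage comment and the column-naming rule are all hypothetical and not taken from s01_load_data.py.

```python
import re
from typing import Dict

import pandas as pd


def standardize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Lower-case column names and collapse non-alphanumeric runs to '_'."""
    out = df.copy()
    out.columns = [
        re.sub(r"[^0-9a-z]+", "_", str(c).strip().lower()).strip("_")
        for c in out.columns
    ]
    return out


def combine_countries(sheets: Dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Stack per-country frames into one long panel with a 'country' column."""
    frames = []
    for country, df in sheets.items():
        df = standardize_columns(df)
        df.insert(0, "country", country)  # sheet name used as the country label
        frames.append(df)
    return pd.concat(frames, ignore_index=True)


# Usage with a real workbook (hypothetical path and layout):
# sheets = pd.read_excel("data/raw/bric_chain_a.xlsx", sheet_name=None)
# panel = combine_countries(sheets)
```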
Data cleaning (s02_clean_data.py):
- Removes missing values
- Handles outliers
- Separates data by chain type
- Removes unused variables (e.g., gcf from Chain A)
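The cleaning steps above can be sketched as a per-chain filter. The README does not say how outliers are handled, so the IQR-based clipping here is a swapped-in assumption. The `chain` column used to separate the two chains and the variable names in the usage comments are hypothetical too. With only 15 to 20 annual observations per country, IQR bounds can be tight, so the multiplier `k` matters.

```python
from typing import List

import pandas as pd


def clip_outliers_iqr(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to those bounds (an assumed rule)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


def clean_chain(df: pd.DataFrame, chain: str, keep_cols: List[str]) -> pd.DataFrame:
    """Select one chain, keep only its variables, drop missing rows, clip outliers.

    Selecting `keep_cols` also removes unused variables (e.g. gcf for Chain A).
    Outliers are clipped within each country.
    """
    cols = ["country", "year", *keep_cols]
    out = df.loc[df["chain"] == chain, cols].dropna().copy()
    for col in keep_cols:
        out[col] = out.groupby("country")[col].transform(clip_outliers_iqr)
    return out.reset_index(drop=True)


# Hypothetical variable lists for each chain:
# chain_a = clean_chain(raw, "A", ["gdp_growth", "hdi", "educ_exp", "health_exp"])
# chain_b = clean_chain(raw, "B", ["gdp_growth", "hdi", "gcf"])
```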
Descriptive analysis and visualization (s03_visualize.py):
- Generates descriptive statistics
- Creates time series plots
- Produces correlation heatmaps
- Generates distribution plots
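Two of the descriptive outputs above, per-country summary statistics and a correlation heatmap, can be sketched with pandas and seaborn. The function names, the chosen statistics and the non-interactive `Agg` backend are assumptions made for illustration, not the contents of s03_visualize.py.

```python
import os
from typing import List

import matplotlib
matplotlib.use("Agg")  # write figures to files; no display needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def describe_by_country(df: pd.DataFrame, cols: List[str]) -> pd.DataFrame:
    """Mean, standard deviation, min and max of each variable, per country."""
    return df.groupby("country")[cols].agg(["mean", "std", "min", "max"])


def save_corr_heatmap(df: pd.DataFrame, cols: List[str], path: str) -> pd.DataFrame:
    """Save a heatmap of the Pearson correlation matrix of `cols`; return the matrix."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    corr = df[cols].corr()
    fig, ax = plt.subplots(figsize=(6, 5))
    sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
    ax.set_title("Correlation matrix")
    fig.tight_layout()
    fig.savefig(path, dpi=150)
    plt.close(fig)
    return corr


# Hypothetical usage:
# save_corr_heatmap(chain_a, ["gdp_growth", "hdi", "educ_exp", "health_exp"],
#                   "figures/descriptive/chain_a_corr.png")
```

Fixing `vmin=-1` and `vmax=1` keeps colors comparable across heatmaps for different chains and countries.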
OLS regression (s04_ols.py):
- Performs OLS regression for both chains
- Handles lagged and non-lagged variables
- Generates regression outputs and plots
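A sketch of lagged and non-lagged OLS using the statsmodels formula interface. The formulas, the column names and the choice to fit one model per country (rather than a pooled model) are assumptions for illustration. The regressors actually used in s04_ols.py may differ. The R² table returned by `r2_summary` is the kind of data that could be saved with `to_csv("outputs/regression_r2_summary.csv", index=False)`.

```python
from typing import Dict

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical specifications; s04_ols.py may use different regressors.
SPECS = {
    "chain_a": "hdi ~ gdp_growth + educ_exp + health_exp",
    "chain_a_lagged": "hdi ~ gdp_growth_lag5 + educ_exp + health_exp",
    "chain_b": "gdp_growth ~ hdi + gcf",
    "chain_b_lagged": "gdp_growth ~ hdi_lag5 + gcf",
}


def fit_by_country(df: pd.DataFrame, formula: str) -> Dict[str, object]:
    """Fit one OLS model per country; rows with NaN (e.g. lost to lagging) are dropped."""
    return {
        country: smf.ols(formula, data=group, missing="drop").fit()
        for country, group in df.groupby("country")
    }


def r2_summary(results: Dict[str, object]) -> pd.DataFrame:
    """Tabulate R-squared, adjusted R-squared and sample size per fitted model."""
    return pd.DataFrame([
        {"model": name, "r2": res.rsquared, "adj_r2": res.rsquared_adj,
         "nobs": int(res.nobs)}
        for name, res in results.items()
    ])
```

Adjusted R² is worth reporting next to R² here. With about 15 observations per country after lagging, each extra regressor noticeably inflates plain R².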
OLS diagnostics (s05_ols_diagnostics.py):
- Performs VIF analysis
- Generates QQ plots
- Creates residuals vs fitted plots
- Calculates Cook's distance
Development cycle classification (s06_classify_cycles.py):
- Analyzes development cycles
- Classifies countries into development typologies
- Generates development cycle plots
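The README does not define the development typologies. A widely used four-way scheme for these two chains labels each country-period as a virtuous cycle, a vicious cycle, or one of two lopsided cases, depending on whether growth and human development are both strong, both weak, or mixed. The sketch below assumes that scheme. It also assumes the thresholds (sample medians by default), the labels and a hypothetical `hdi_change` column, which could be computed for example as `df.groupby("country")["hdi"].diff()`. s06_classify_cycles.py may use different criteria.

```python
from typing import Optional

import pandas as pd

# Hypothetical labels: (strong growth?, strong human development?) -> label
CYCLE_LABELS = {
    (True, True): "virtuous",
    (False, False): "vicious",
    (False, True): "HD-lopsided",  # human development strong, growth weak
    (True, False): "EG-lopsided",  # growth strong, human development weak
}


def classify_cycles(df: pd.DataFrame,
                    growth_col: str = "gdp_growth",
                    hd_col: str = "hdi_change",
                    growth_threshold: Optional[float] = None,
                    hd_threshold: Optional[float] = None) -> pd.Series:
    """Label each row by comparing growth and HDI change with thresholds.

    Thresholds default to the sample medians; a value equal to its threshold
    counts as "strong". Drop rows with NaN first: a comparison with NaN is
    False, so such rows would otherwise be labeled "weak".
    """
    g_thr = df[growth_col].median() if growth_threshold is None else growth_threshold
    h_thr = df[hd_col].median() if hd_threshold is None else hd_threshold
    strong_growth = df[growth_col] >= g_thr
    strong_hd = df[hd_col] >= h_thr
    labels = [CYCLE_LABELS[(bool(g), bool(h))] for g, h in zip(strong_growth, strong_hd)]
    return pd.Series(labels, index=df.index, name="cycle")
```

`classify_cycles(df).value_counts()` then gives the kind of counts a cycle distribution table would report.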
Table generation (s07_tables.py):
- Generates descriptive statistics tables
- Creates trend analysis tables
- Produces regression summary tables
- Generates cycle analysis tables
- Creates performance comparison tables
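The tables listed above are stored as Markdown files in tables/. pandas provides `DataFrame.to_markdown`, but that method requires the optional `tabulate` package, which is not in the dependency list above. The sketch below writes GitHub-style Markdown tables without extra dependencies. The function names and the three-decimal float format are assumptions.

```python
import os

import pandas as pd


def to_markdown_table(df: pd.DataFrame, float_fmt: str = "{:.3f}") -> str:
    """Render a DataFrame, index included, as a GitHub-flavored Markdown table."""
    def cell(value) -> str:
        return float_fmt.format(value) if isinstance(value, float) else str(value)

    header = [str(df.index.name or "")] + [str(c) for c in df.columns]
    lines = [
        "| " + " | ".join(header) + " |",
        "|" + "|".join(["---"] * len(header)) + "|",
    ]
    # itertuples keeps per-column types, so int columns are not shown as floats
    for row in df.itertuples(index=True, name=None):
        lines.append("| " + " | ".join(cell(v) for v in row) + " |")
    return "\n".join(lines) + "\n"


def write_markdown_table(df: pd.DataFrame, path: str) -> None:
    """Write the Markdown rendering of df to path, creating parent directories."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_markdown_table(df))
```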
The analysis generates:
- Processed data in Excel format:
  - data/processed/bric_regression_data.xlsx, with separate sheets for each chain and country
- Visualizations in the figures directory:
  - Descriptive analysis plots
  - OLS diagnostic plots
  - Development cycle plots
- Analysis tables in the tables directory:
  - Descriptive statistics
  - Trend analysis
  - Regression summaries
  - Cycle analysis
  - Performance comparisons
- Analysis results:
  - Regression summaries in outputs/
  - Cycle analysis in results/
  - R² summary in outputs/
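A workbook with one sheet per chain and country, like data/processed/bric_regression_data.xlsx, can be written with `pd.ExcelWriter` and the openpyxl engine. This sketch assumes the processed frames are held in a dict keyed by hypothetical `(chain, country)` tuples. The sheet naming scheme is also an assumption, since the README does not document the workbook's actual sheet names.

```python
import os
from typing import Dict, Tuple

import pandas as pd


def write_processed_workbook(frames: Dict[Tuple[str, str], pd.DataFrame], path: str) -> None:
    """Write one sheet per (chain, country) pair to a single .xlsx file.

    Sheet names look like 'chain_A_Brazil' (a hypothetical scheme); Excel
    caps sheet names at 31 characters, so longer names are truncated.
    """
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        for (chain, country), df in frames.items():
            sheet_name = f"chain_{chain}_{country}"[:31]
            df.to_excel(writer, sheet_name=sheet_name, index=False)
```

Reading the file back with `pd.read_excel(path, sheet_name=None)` returns a dict keyed by sheet name, which mirrors how the loading sketch consumes per-country sheets.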
Feel free to submit issues and enhancement requests.
This project is open source and available under the MIT License.