I tested whether economic distress or draft policy drove U.S. military enlistment (1947–1962). No variable in the first model was statistically significant, likely because of the small sample (n=16) and multicollinearity, yet that model explained 76% of the variation (Model 1). The Random Forest ranked GNP and draft inductions as the key predictors, which suggests that conscription and overall economic output, not economic hardship, were the primary drivers.
What are the major drivers for armed forces enlistment?
My hypothesis is that army enlistments increase as unemployment rises, GNP falls, and draft induction rates increase, because people who cannot find work still need some sort of livelihood.
For this project I conducted an analysis in R using the Longley data set, which I enriched with additional draft induction data from https://www.sss.gov/history-and-records/induction-statistics/.
I conducted multiple log-log regression analyses, as well as a random forest analysis, to check my hypothesis.
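A minimal sketch of the log-log specification, assuming R's built-in `longley` data frame and its column names (`Armed.Forces`, `GNP`, `Unemployed`, `Population`). The induction series is not part of `longley`, so it is left out here, and the exact regressor set of the first model is an assumption.

```r
# Built-in Longley data: 16 annual observations, 1947 to 1962.
data(longley)

# Log-log specification: each slope is read as an elasticity.
# Assumption: this regressor set only approximates the first model;
# the induction series would have to be merged in as an extra column.
fit_full <- lm(
  log(Armed.Forces) ~ log(GNP) + log(Unemployed) + log(Population),
  data = longley
)

summary(fit_full)  # coefficients, p-values, and R-squared
```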
- The first model fits the data well (R-squared = 0.7596); however, none of the individual variables was statistically significant.
- Following this result, I ran a VIF analysis to check for multicollinearity among the predictors. GNP and population returned values of around 80 and 70, respectively, far above the common rule-of-thumb threshold of 5 that signals problematic multicollinearity.
- With this new information in mind, I ran the model again, this time dropping the population variable.
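The VIF check can be sketched in base R without extra packages. This assumes the built-in `longley` columns and a three-predictor set (the induction series is omitted), so the values will not match the figures quoted above exactly. The `car` package's `vif()` function reports the same quantity for a fitted `lm`.

```r
data(longley)

# Logged predictors (assumed set; the induction series is omitted).
X <- data.frame(
  log_gnp   = log(longley$GNP),
  log_unemp = log(longley$Unemployed),
  log_pop   = log(longley$Population)
)

# VIF_j = 1 / (1 - R2_j), where R2_j comes from regressing
# predictor j on all of the other predictors.
vif <- sapply(names(X), function(v) {
  aux <- lm(reformulate(setdiff(names(X), v), response = v), data = X)
  1 / (1 - summary(aux)$r.squared)
})

round(vif, 1)
```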
- The resulting model had a slightly lower R-squared value of 0.722; however, the Unemployed and GNP coefficients stand out more clearly once the collinear population term is removed.
- A 1% increase in GNP was associated with a 0.86% increase in Armed Forces enlistment, and a 1% increase in unemployment was associated with a 0.56% decrease in Armed Forces enlistment.
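The elasticity reading follows from the log-log form. As a sketch, writing AF for Armed Forces and leaving any remaining regressors in the ellipsis (the exact regressor set is an assumption):

```latex
\ln(AF) = \beta_0 + \beta_1 \ln(\mathrm{GNP}) + \beta_2 \ln(\mathrm{Unemployed}) + \dots + \varepsilon

\beta_1 = \frac{\partial \ln(AF)}{\partial \ln(\mathrm{GNP})}
        \approx \frac{\%\Delta AF}{\%\Delta \mathrm{GNP}}

1.01^{0.86} \approx 1.0086 \;\; (+0.86\%), \qquad
1.01^{-0.56} \approx 0.9944 \;\; (-0.56\%)
```

For a 1% change, the exact multiplicative effect and the coefficient itself agree to two decimal places, which is why the coefficients can be read directly as percentage changes.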
- Q-Q Residuals: The residuals are approximately normally distributed, which supports the validity of the hypothesis tests.
- Residuals vs Fitted: The trend line through the residuals is relatively flat, suggesting the log-log specification was appropriate for this dataset.
- Residuals vs. Leverage: The majority of the data points are within the Cook’s distance contours, except for one outlier, which is likely influencing the model results. Since this is a personal project, I will leave it be; moving forward, it should be addressed to improve model performance.
- Scale-Location: The plot shows a downward trend, hinting at possible heteroskedasticity (non-constant residual variance) within the model.
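The four diagnostic plots discussed above are the defaults produced by `plot()` on an `lm` object. A minimal sketch, assuming the built-in `longley` data and a reduced regressor set without the induction series:

```r
data(longley)

# Reduced log-log model (assumed regressor set; inductions omitted).
fit <- lm(log(Armed.Forces) ~ log(GNP) + log(Unemployed), data = longley)

# The four default lm diagnostic plots in a 2x2 grid:
# Residuals vs Fitted, Q-Q, Scale-Location, Residuals vs Leverage.
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)

# Cook's distance per observation; 4/n is a common rule-of-thumb cutoff
# for flagging influential points.
cd <- cooks.distance(fit)
cd[cd > 4 / nrow(longley)]
```

Listing the flagged observations by year makes it easier to decide whether the influential point reflects a data issue or a genuine historical shock.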
- The Random Forest model explained 74% of the variation within the dataset, slightly more than the reduced log-log regression model (72%).
- Key variables for the Random Forest model: GNP and Inductions, with GNP being the most important.
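A hedged sketch of the Random Forest step, assuming the `randomForest` package is installed and using only the built-in `longley` predictors (the induction series is not in `longley`, so the importance ranking here will differ from the one reported above). The seed and `ntree` value are illustrative choices.

```r
data(longley)

# Assumption: the randomForest package is available. The induction
# series is not in `longley`, so only built-in predictors are used.
if (requireNamespace("randomForest", quietly = TRUE)) {
  set.seed(42)  # forests are random; fix the seed for repeatability
  rf <- randomForest::randomForest(
    Armed.Forces ~ GNP + Unemployed + Population,
    data = longley, ntree = 500, importance = TRUE
  )
  print(rf)                            # includes "% Var explained"
  print(randomForest::importance(rf))  # %IncMSE and IncNodePurity
}
```

The "% Var explained" printed by `randomForest` is an out-of-bag estimate, so if the 74% figure comes from that output it is not a strict like-for-like comparison with an in-sample R-squared.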
This project is limited by the small sample size (n=16). Moving forward, for better predictions and analysis, I would use cross-national panel data covering this time period to check whether these results are robust.
- The log-log regression model was able to explain 72% of the variation within the dataset, with unemployment rate and GNP being the most important factors within that model.
- A 1% increase in GNP was associated with a 0.86% increase in Armed Forces enlistment, and a 1% increase in unemployment was associated with a 0.56% decrease in Armed Forces enlistment.
- The Random Forest model explained 74% of the variation within the dataset (slightly better than the log-log model).
- This model also supported the part of the hypothesis that economic factors (GNP) and policy factors (draft induction rates) play a crucial role in army enlistments.
- However, I expected unemployment to play a larger role. Its low importance could have several explanations: the unemployed may have been concentrated outside the range of those fit to serve in the military, or they may have lacked access to military recruitment sites (for example, because of a lack of transportation).