A data-driven exploration of the relationship between SAD severity and family history of behavioral problems.
This project was developed as part of the Data Analysis and Interpretation Specialization from Wesleyan University on Coursera. It explores the relationship between Social Anxiety Disorder (SAD) and a family history of behavioral problems using data from the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC). The analysis applies statistical hypothesis testing (one-way ANOVA, a chi-square test of independence, Pearson correlation, and moderation analysis with gender as the moderator) to assess these relationships.
- Source: National Epidemiologic Survey on Alcohol and Related Conditions (NESARC)
- File: `source/NESARC Dataset.csv`
- Size: 252.61 MB (tracked with Git LFS)
- Key Variables:

| Variable | Type | Description |
|---|---|---|
| `SAD_score` | Numerical | Composite score derived from SAD-related survey responses. |
| `SAD_spectrum` | Categorical | Severity categories: Low (≤ 2), Medium (> 2 and ≤ 5), High (> 5). |
| `behavior_problems_count` | Numerical | Number of relatives with behavioral problems. |
| `relatives_with_problems` | Binary (Y/N) | Whether at least one relative has behavioral problems. |
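Because the CSV is stored with Git LFS, a clone made without `git lfs pull` contains only a small text pointer file instead of the 252.61 MB dataset. The sketch below shows one way to guard against that before loading; it assumes pandas is installed, and the helper names `is_lfs_pointer` and `load_nesarc` are hypothetical, not part of the project.

```python
from pathlib import Path

import pandas as pd

# Git LFS pointer files begin with this spec line instead of real CSV content.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """Return True if `path` is a Git LFS pointer rather than the actual data."""
    with open(path, "rb") as fh:
        return fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def load_nesarc(path: str = "source/NESARC Dataset.csv") -> pd.DataFrame:
    """Load the raw NESARC CSV, failing loudly if only the LFS pointer is present."""
    csv_path = Path(path)
    if is_lfs_pointer(csv_path):
        raise RuntimeError(f"{csv_path} is a Git LFS pointer; run `git lfs pull` first.")
    # low_memory=False parses the file in one pass, avoiding mixed dtype
    # inference across chunks in wide survey files.
    return pd.read_csv(csv_path, low_memory=False)
```

Raising early gives a clearer message than letting `read_csv` parse the three-line pointer file as if it were data.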
- Primary: Determine if family history of behavioral problems correlates with higher SAD severity.
- Secondary: Assess if gender moderates this relationship.
- Cleaning: Removed missing values and standardized variables.
- Feature Engineering:
  - Created `SAD_score` from symptom-related survey responses.
  - Binned `SAD_score` into Low/Medium/High categories.
  - Derived `relatives_with_problems` from `behavior_problems_count`.
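The cleaning, binning, and derivation steps can be sketched in pandas as follows. This assumes `SAD_score` and `behavior_problems_count` already exist as columns (how `SAD_score` is built from the raw NESARC items is not reproduced here) and reads the Medium band as 2 < score ≤ 5; the function name `engineer_features` is hypothetical.

```python
import numpy as np
import pandas as pd


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Drop incomplete rows, bin SAD_score, and flag family history."""
    out = df.dropna(subset=["SAD_score", "behavior_problems_count"]).copy()

    # Right-closed bins: (-inf, 2] -> Low, (2, 5] -> Medium, (5, inf) -> High.
    out["SAD_spectrum"] = pd.cut(
        out["SAD_score"],
        bins=[-np.inf, 2, 5, np.inf],
        labels=["Low", "Medium", "High"],
    )

    # "Y" if at least one relative has behavioral problems, otherwise "N".
    out["relatives_with_problems"] = np.where(
        out["behavior_problems_count"] >= 1, "Y", "N"
    )
    return out
```

`pd.cut` closes intervals on the right by default, which matches the Low (≤ 2) and High (> 5) boundaries in the variable table.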
A one-way ANOVA found significant differences in `behavior_problems_count` across `SAD_spectrum` severity groups:
| Source | Sum Sq | df | F | p-value |
|---|---|---|---|---|
| SAD_spectrum | 125.78 | 2 | 22.24 | 2.48e-10 |
| Residual | 10919.14 | 3862 | – | – |
Post-hoc Tukey's HSD: All group pairs showed significant differences (p < 0.05).
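A one-way ANOVA followed by Tukey's HSD can be run with statsmodels roughly as below. The sketch assumes a DataFrame with `SAD_spectrum` and `behavior_problems_count` columns as produced by the Feature Engineering step described above; `one_way_anova` is a hypothetical helper, and the notebook's exact code may differ.

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm
from statsmodels.stats.multicomp import pairwise_tukeyhsd


def one_way_anova(df: pd.DataFrame):
    """Return the ANOVA table and Tukey HSD results for counts by SAD group."""
    model = smf.ols("behavior_problems_count ~ C(SAD_spectrum)", data=df).fit()
    # Columns: sum_sq, df, F, PR(>F); rows: the factor and the residual.
    table = anova_lm(model, typ=2)

    # All pairwise group-mean comparisons with family-wise alpha = 0.05.
    tukey = pairwise_tukeyhsd(
        endog=df["behavior_problems_count"],
        groups=df["SAD_spectrum"].astype(str),
        alpha=0.05,
    )
    return table, tukey
```

Usage: `table, tukey = one_way_anova(df)`, then `print(table)` and `print(tukey.summary())`; the `reject` column of the summary marks which pairs differ significantly.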
Chi-square test of independence:
- χ² = 34.56 (p = 3.13e-08)
- Cramér's V = 0.095 (small effect size)
Conclusion: Significant association between family history and SAD severity.
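A chi-square test with Cramér's V can be computed from a cross-tabulation as sketched below, assuming the two categorical columns are `SAD_spectrum` and `relatives_with_problems`; the helper name `chi_square_cramers_v` is hypothetical. Cramér's V here is $V = \sqrt{\chi^2 / (n \cdot (\min(r, c) - 1))}$ for an $r \times c$ table.

```python
import math

import pandas as pd
from scipy.stats import chi2_contingency


def chi_square_cramers_v(df: pd.DataFrame, row: str, col: str):
    """Chi-square test of independence plus Cramér's V for two categorical columns."""
    table = pd.crosstab(df[row], df[col])
    # Yates' continuity correction is applied only when dof == 1,
    # so a 3x2 table (dof = 2) is tested without it.
    chi2, p, dof, _expected = chi2_contingency(table)
    n = table.to_numpy().sum()
    k = min(table.shape) - 1
    v = math.sqrt(chi2 / (n * k))
    return chi2, p, dof, v
```

As a rough consistency check, if the chi-square test used the same n = 3865 implied by the ANOVA degrees of freedom (2 + 3862 + 1) and a 3 × 2 table, then V = √(34.56 / 3865) ≈ 0.095, matching the value reported above.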
Pearson correlation:
- r = 0.08 (p = 3.82e-07)
- r² = 0.0067 (0.67% variance explained)
Conclusion: Weak but statistically significant correlation.
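The correlation and its r² can be reproduced with SciPy as sketched below. The pairing of `SAD_score` with `behavior_problems_count` is an assumption, since the text does not name the two variables; `pearson_summary` is a hypothetical helper.

```python
import pandas as pd
from scipy.stats import pearsonr


def pearson_summary(df: pd.DataFrame, x: str = "SAD_score",
                    y: str = "behavior_problems_count"):
    """Return Pearson r, its two-sided p-value, and r^2 (share of variance explained)."""
    r, p = pearsonr(df[x], df[y])
    return r, p, r ** 2
```

Note that √0.0067 ≈ 0.082, so the reported r = 0.08 is a rounded value of the same estimate.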
An ANOVA including a `SAD_spectrum` × gender interaction term tested whether gender moderates the SAD-behavior relationship.
Key Findings:
- Interaction term p = 0.187 → Gender does not significantly moderate the relationship.
- The main effect of `SAD_spectrum` remains significant.
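A moderation test of this kind can be expressed as a two-way ANOVA whose formula crosses the two factors. In the sketch below the column name `gender` is hypothetical (NESARC stores sex under its own coded variable), the helper name `moderation_anova` is illustrative, and Type II sums of squares are an assumption since the text does not say which type the notebook used.

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm


def moderation_anova(df: pd.DataFrame, moderator: str = "gender") -> pd.DataFrame:
    """Two-way ANOVA with SAD group, the moderator, and their interaction."""
    # `a * b` in the formula expands to a + b + a:b.
    formula = f"behavior_problems_count ~ C(SAD_spectrum) * C({moderator})"
    model = smf.ols(formula, data=df).fit()
    # The 'C(SAD_spectrum):C(<moderator>)' row holds the interaction test.
    return anova_lm(model, typ=2)
```

A non-significant p-value on the interaction row, as reported above (p = 0.187), indicates no evidence that the SAD-behavior relationship differs by gender.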
```
NESARC_research/
├── source/               # Raw data (tracked via Git LFS)
│   └── NESARC Dataset.csv
├── .gitattributes        # Git LFS configuration
├── DMV.ipynb             # Jupyter Notebook (full analysis)
├── LICENCE               # MIT Licence
├── README.md             # Project documentation
└── requirements.txt      # Python dependencies
```
```bash
git clone https://github.com/PogloLopez/nesarc_research.git
cd nesarc_research
git lfs install                   # Set up Git LFS
git lfs pull                      # Download dataset
python -m venv .venv
source .venv/bin/activate         # Mac/Linux
.\.venv\Scripts\activate          # Windows
pip install -r requirements.txt
```

Launch Jupyter Notebook:

```bash
jupyter notebook DMV.ipynb
```

This project is licensed under the MIT License. See LICENSE for details.
💡 For questions or collaborations, contact Pablo López or connect on LinkedIn.