Interpretable Statistical Modeling of Student Depression Risk
This project compares two logistic regression models for predicting depression risk in ~27,900 students: an unpenalized GLM and a LASSO-penalized model. The study emphasizes interpretability, cross-validated performance, and feature-level analysis to identify robust psychological and lifestyle predictors.
Model Performance Summary:
• Accuracy: ~84.7%
• AUC: ~0.92
• LASSO → Higher Sensitivity
• GLM → Higher Specificity
• Stable 5-fold CV performance
Cross-Validation Stability Analysis:
This 5-fold cross-validation analysis evaluates model stability and predictive performance.
• Accuracy remains consistently high (~84.7%)
• AUC demonstrates strong discrimination (~0.92)
• LASSO shows higher sensitivity
• GLM shows higher specificity
Consistent performance across folds suggests limited overfitting and that the models should generalize to similar student populations.
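To make the four reported metrics and the notion of fold stability concrete, here is a minimal sketch using only the Python standard library. The function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made for illustration; they are not the project's code or data.

```python
from statistics import mean, stdev


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, sensitivity, and specificity at a fixed probability threshold.

    Assumes both classes are present; otherwise a denominator is zero.
    """
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1 and pred == 0:
            fn += 1
        elif y == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # share of depressed students flagged
        "specificity": tn / (tn + fp),  # share of non-depressed students cleared
    }


def auc(y_true, y_prob):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking. Quadratic in sample size,
    so suitable for illustration only.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize_folds(values):
    """Mean and sample standard deviation of one metric across CV folds."""
    return mean(values), stdev(values)


# Toy labels and predicted probabilities (made up for this example).
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
metrics = classification_metrics(y_true, y_prob)
toy_auc = auc(y_true, y_prob)
```

Applying `summarize_folds` to the five per-fold values of a metric gives a mean and a spread; a small standard deviation relative to the mean is what "stable" refers to here.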
Figure: Cross-validated comparison of GLM and LASSO across Accuracy, Sensitivity, Specificity, and AUC.
Figure: Top 10 predictors selected by the LASSO logistic regression model, ranked by absolute coefficient magnitude.
Top Predictors of Depression Risk
LASSO feature selection identified the predictors most strongly associated with depression risk:
1. Suicidal thoughts
2. Financial stress (Multiple category)
3. Dietary habits
4. Academic pressure
These predictors were supported consistently across the interpretability analyses, which strengthens confidence in the model's conclusions.
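Ranking predictors by absolute coefficient magnitude can be sketched as follows. The helper name `top_predictors` and the placeholder feature names and coefficient values are hypothetical and do not reflect the project's fitted model. The comparison is only meaningful when predictors are on a comparable scale, for example after standardization.

```python
def top_predictors(coefficients, k=10):
    """Return up to k (name, coefficient) pairs, largest |coefficient| first.

    Coefficients that LASSO shrank exactly to zero are dropped, since
    those features were not selected.
    """
    selected = [(name, b) for name, b in coefficients.items() if b != 0.0]
    return sorted(selected, key=lambda item: abs(item[1]), reverse=True)[:k]


# Placeholder coefficients (illustrative only, not the study's estimates).
example = {
    "feature_a": 0.8,
    "feature_b": -1.2,
    "feature_c": 0.0,  # removed by the penalty
    "feature_d": 0.3,
}
ranked = top_predictors(example, k=3)
```

The sign is kept in the output so that a large negative coefficient (lower risk) is not confused with a large positive one (higher risk).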
GLM Odds Ratio Interpretation
The GLM provides odds ratios (OR) with confidence intervals, so each risk factor's association with depression can be interpreted directly.
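• OR > 1 → Increased odds of depression
• OR < 1 → Decreased odds of depression (protective association)
• A confidence interval that excludes 1 indicates a statistically significant association at the corresponding level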
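The reading rules above follow from how an odds ratio is obtained from a logistic regression coefficient. The sketch below assumes a Wald interval with z = 1.96 (roughly 95%); the function names and the interval type are assumptions, since the source does not state how its intervals were computed.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald confidence interval from a log-odds coefficient.

    beta: estimated coefficient on the log-odds scale
    se:   its standard error
    z:    normal quantile (1.96 gives an approximately 95% interval)
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def interpret(odds_ratio, lower, upper):
    """Classify a predictor by where its interval sits relative to 1."""
    if lower > 1:
        return "increased risk"
    if upper < 1:
        return "protective"
    return "inconclusive"  # interval contains 1


# A coefficient of 0 corresponds to an odds ratio of exactly 1 (no association).
null_or, null_lo, null_hi = odds_ratio_ci(0.0, 0.1)
```

For example, `interpret(*odds_ratio_ci(0.7, 0.1))` classifies a predictor whose whole interval lies above 1, while a coefficient that is small relative to its standard error yields an interval that straddles 1.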
Figure: GLM-derived odds ratios highlighting statistically significant risk-increasing and protective factors for depression.
