Interpretable Statistical Modeling of Student Depression Risk

This project compares two logistic regression models for predicting depression risk in ~27,900 students: a standard (unpenalized) GLM and a LASSO-penalized model. The study emphasizes interpretability, cross-validated performance, and feature-level analysis to identify robust psychological and lifestyle predictors.

Model Performance Summary:

• Accuracy: ~84.7%
• AUC: ~0.92
• LASSO → Higher Sensitivity
• GLM → Higher Specificity
• Stable 5-fold CV performance
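
For readers less familiar with the metrics listed above, the following is a minimal sketch of how they can be computed from true labels, predicted labels, and predicted probabilities. It is not the project's own code: the function names and the toy labels and scores are hypothetical and chosen only for illustration, and the 0.5 classification threshold is an assumption.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 = depressed."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity (true positive rate), specificity (true negative rate).

    Assumes both classes are present; otherwise a division by zero occurs.
    """
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc(y_true, scores):
    """AUC as the probability that a random positive case outscores
    a random negative case, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data for illustration only (not taken from the study).
toy_true = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
toy_pred = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed threshold of 0.5

print(classification_metrics(toy_true, toy_pred))
print(auc(toy_true, toy_scores))
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds. This is why two models can have a similar AUC while one favors sensitivity and the other specificity.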

Cross-Validation Stability Analysis:

This 5-fold cross-validation analysis evaluates model stability and predictive performance.

• Accuracy remains consistently high (~84.7%)
• AUC demonstrates strong discrimination (~0.92)
• LASSO shows higher sensitivity
• GLM shows higher specificity

Performance that stays stable across folds suggests limited overfitting and good generalization to held-out students from the same dataset.
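
As an illustration of the stability check described above, here is a minimal sketch of how 5-fold splits can be built and how the spread of a metric across folds can be summarized. The helper names `stratified_folds` and `summarize` are hypothetical, the use of stratification is an assumption (the write-up does not say how folds were drawn), and the labels are placeholders rather than study data.

```python
import random
import statistics


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal indices round-robin so each fold gets a near-equal share of this class.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def summarize(per_fold_values):
    """Mean and sample standard deviation of one metric across folds."""
    return statistics.mean(per_fold_values), statistics.stdev(per_fold_values)


# Placeholder labels: 60 positive and 40 negative cases (not study data).
labels = [1] * 60 + [0] * 40
for fold_id, test_idx in enumerate(stratified_folds(labels, k=5, seed=0)):
    train_idx = [i for i in range(len(labels)) if i not in set(test_idx)]
    # A model would be fit on train_idx and scored on test_idx here.
    print(fold_id, len(train_idx), len(test_idx))
```

Passing the five per-fold values of a metric to `summarize` gives a mean and a standard deviation. A small standard deviation relative to the mean is what "stable across folds" means in practice.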

Figure: Cross-validated comparison of GLM and LASSO across Accuracy, Sensitivity, Specificity, and AUC.

Figure: Top 10 predictors selected by the LASSO logistic regression model, ranked by absolute coefficient magnitude.

Top Predictors of Depression Risk

Feature selection using LASSO identified the most influential predictors associated with depression risk:

1. Suicidal thoughts
2. Financial stress (Multiple category)
3. Dietary habits
4. Academic pressure

These predictors were consistent across the interpretability analyses, which supports the reliability of the model's conclusions.
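
Ranking predictors by absolute coefficient magnitude can be sketched as follows. This is an illustration under stated assumptions, not the project's code: `top_predictors` is a hypothetical helper, and the feature names and coefficient values are placeholders, not fitted results. The ranking is only meaningful if the features were put on a comparable scale (for example, standardized) before fitting.

```python
def top_predictors(coefs, n=10):
    """Rank predictors kept by LASSO (non-zero coefficients) by absolute size."""
    kept = {name: b for name, b in coefs.items() if b != 0.0}
    ranked = sorted(kept.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:n]


# Placeholder coefficients for illustration only (not values from the study).
example_coefs = {
    "feature_a": 0.8,
    "feature_b": -1.2,
    "feature_c": 0.0,  # shrunk to exactly zero, so LASSO dropped it
    "feature_d": 0.3,
}

for name, beta in top_predictors(example_coefs, n=3):
    direction = "raises" if beta > 0 else "lowers"
    print(f"{name}: {beta:+.2f} ({direction} the log-odds)")
```

Sorting on the absolute value keeps strongly protective predictors (large negative coefficients) near the top alongside strong risk factors.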

GLM Odds Ratio Interpretation

The GLM provides interpretable odds ratios with confidence intervals, allowing clear statistical interpretation of depression risk factors.

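• OR > 1 → Increased depression risk
• OR < 1 → Protective effect
• Confidence intervals indicate statistical reliability: an interval that excludes 1 marks an effect that is statistically distinguishable from no association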
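
The reading rules above follow from how an odds ratio is obtained from a logistic regression coefficient: the odds ratio is exp(coefficient), and an approximate 95% interval is exp(coefficient ± 1.96 × standard error). The sketch below assumes Wald-type intervals, which may differ from the interval method used in the project. The function names are hypothetical and the inputs are placeholders.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald confidence interval from a logit coefficient
    and its standard error (z=1.96 gives an approximate 95% interval)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def interpret(beta, se, z=1.96):
    """Classify an effect using the rules listed above."""
    _, lower, upper = odds_ratio_ci(beta, se, z)
    if lower > 1:
        return "risk-increasing"
    if upper < 1:
        return "protective"
    return "inconclusive"  # interval includes 1


# Placeholder coefficient and standard error (not values from the study).
odds_ratio, lower, upper = odds_ratio_ci(beta=0.5, se=0.1)
print(f"OR = {odds_ratio:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]: {interpret(0.5, 0.1)}")
```

Because the interval is computed on the log-odds scale and then exponentiated, it is not symmetric around the odds ratio itself.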

Figure: GLM-derived odds ratios highlighting statistically significant risk-increasing and protective factors for depression.