
AUTHOR(S):

Adebayo O. P., Ahmed I., Oyeleke K. T.

 

TITLE

Predicting Healthcare Utilization: A Random Forest Regression Analysis of Medicaid Patient Visits


ABSTRACT

Predicting healthcare utilization remains challenging despite advances in machine learning, particularly for Medicaid populations with complex healthcare needs. Understanding the determinants of healthcare visits is crucial for resource allocation and policy planning. This study conducted a retrospective analysis of Medicaid claims data from 1986 (n = 996) to predict healthcare visit frequency using Random Forest regression. The dataset included demographic, socioeconomic, health status, and healthcare access variables. We used stratified sampling to ensure representativeness and 10-fold cross-validation for robust model evaluation. Variable importance analysis identified key determinants, and model performance was compared against linear regression and a baseline mean predictor. The Random Forest model overfit substantially: R² fell from 0.678 on the training set to 0.004 on the test set, indicating limited generalizability. The linear model outperformed Random Forest (test R² = 0.093 vs 0.004), achieving a 0.9% improvement over the baseline mean predictor. Variable importance analysis identified exposure to healthcare services (importance = 3.31), income (1.89), and primary health status (1.27) as the strongest predictors. A reduced model using only the top five features improved on the full Random Forest (test R² = 0.037 vs 0.004), though it remained below the linear model, suggesting that feature selection partly mitigated overfitting. The correlation between predicted and actual visits was 0.247. While machine learning identified meaningful determinants of healthcare utilization, the limited predictive performance highlights the difficulty of modeling complex healthcare behavior. The findings emphasize the value of variable importance analysis, rather than predictive accuracy, for understanding healthcare utilization patterns in Medicaid populations. For this application, feature selection and simpler models may provide more reliable insights than complex ensemble methods.
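The evaluation protocol summarized in the abstract (stratified split, 10-fold cross-validation, comparison against linear regression and a mean-predictor baseline, variable importance, and a reduced top-five model) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic stand-ins with hypothetical predictors `x0` to `x9`, stratifying on quartile bins of the outcome is one assumed way to stratify a continuous target, and features are ranked with scikit-learn's impurity-based `feature_importances_`, whose scale will not match the importance values reported above.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# --- Hypothetical stand-in data (NOT the study's Medicaid data) ---
rng = np.random.default_rng(42)
n, p = 996, 10
feature_names = [f"x{i}" for i in range(p)]  # hypothetical predictor names
X = rng.normal(size=(n, p))
# Skewed count outcome ("visits") with a weak signal in x0..x2 plus Poisson noise.
y = rng.poisson(np.exp(0.6 + 0.35 * X[:, 0] + 0.2 * X[:, 1] + 0.1 * X[:, 2])).astype(float)

# --- Stratified train/test split for a continuous outcome ---
# Assumption: stratify on quartile bins of y so both splits share its skewed shape.
edges = np.unique(np.quantile(y, [0.25, 0.5, 0.75]))
strata = np.digitize(y, edges)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=strata, random_state=42
)

models = {
    "baseline": DummyRegressor(strategy="mean"),  # always predicts the training mean
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=42),
}
cv = KFold(n_splits=10, shuffle=True, random_state=42)  # 10-fold CV on training data

results = {}
for name, model in models.items():
    # Mean R^2 across the 10 folds (cross_val_score fits fresh clones of the model).
    cv_r2 = cross_val_score(model, X_tr, y_tr, cv=cv, scoring="r2").mean()
    model.fit(X_tr, y_tr)
    train_r2 = r2_score(y_tr, model.predict(X_tr))
    pred = model.predict(X_te)
    test_r2 = r2_score(y_te, pred)
    # Pearson r is undefined for the constant baseline prediction, so skip it there.
    corr = np.corrcoef(y_te, pred)[0, 1] if name != "baseline" else float("nan")
    results[name] = {"train": train_r2, "cv": cv_r2, "test": test_r2, "r": corr}
    print(f"{name:>13}: train R2={train_r2:.3f}  CV R2={cv_r2:.3f}  "
          f"test R2={test_r2:.3f}  r={corr:.3f}")

# --- Variable importance and a reduced top-5 model ---
rf = models["random_forest"]
# Rank features using the forest fitted on training data only, so the
# test set plays no part in feature selection.
top5 = np.argsort(rf.feature_importances_)[::-1][:5]
print("top-5 features:", [feature_names[i] for i in top5])

rf_small = RandomForestRegressor(n_estimators=100, random_state=42)
rf_small.fit(X_tr[:, top5], y_tr)
reduced_pred = rf_small.predict(X_te[:, top5])
reduced_test_r2 = r2_score(y_te, reduced_pred)
reduced_corr = np.corrcoef(y_te, reduced_pred)[0, 1]
print(f"reduced RF: test R2={reduced_test_r2:.3f}  r={reduced_corr:.3f}")
```

Comparing the forest's training and test R² makes any overfitting gap directly visible, and the mean-predictor baseline anchors what "no predictive skill" looks like (its training R² is 0 by construction).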

KEYWORDS

Healthcare Utilization, Random Forest Regression, Medicaid Analytics, Predictive Modeling, Variable Importance Analysis, Machine Learning in Healthcare

 

Cite this paper

Adebayo O. P., Ahmed I., Oyeleke K. T. (2025). Predicting Healthcare Utilization: A Random Forest Regression Analysis of Medicaid Patient Visits. International Journal of Biology and Biomedicine, 10, 47-57.

 

Copyright © 2025. The author(s) retain the copyright of this article.
This article is published under the terms of the Creative Commons Attribution License 4.0.