Survival analysis as a classification problem
In this paper, we explore a method for treating survival analysis as a classification problem. The method uses a "stacking" idea that collects the features and outcomes of the survival data in a large data frame, which is then treated as the input to a classification problem. In this framework, various statistical learning algorithms (including logistic regression, random forests, gradient boosting machines and neural networks) can be applied to estimate the parameters and make predictions. For stacking with logistic regression, we show through both theoretical analysis and simulation studies that this approach is approximately equivalent to the Cox proportional hazards model. For stacking with other machine learning algorithms, we show through simulation studies that our method can outperform the Cox proportional hazards model in terms of estimated survival curves. This idea is not new, but we believe that it should be better known by statisticians and other data scientists.
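The abstract does not spell out how the stacked data frame is built, so the following is only a minimal sketch of one common construction, under stated assumptions: for each distinct event time, every subject still at risk contributes one row holding its features plus a one-hot indicator of that event time, with label 1 if the subject had the event at that time and 0 otherwise. The function name `stack_survival_data`, its arguments, and the toy data are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def stack_survival_data(
    times: Sequence[float],
    events: Sequence[int],
    features: Sequence[Sequence[float]],
) -> Tuple[List[List[float]], List[int], List[float]]:
    """Stack survival data into rows suitable for a binary classifier.

    Assumed construction (hypothetical, for illustration only):
    one block of rows per distinct event time, one row per subject
    still at risk at that time.
    """
    # Distinct times at which at least one event (not censoring) occurred.
    event_times = sorted({t for t, e in zip(times, events) if e == 1})

    rows: List[List[float]] = []
    labels: List[int] = []
    for k, t in enumerate(event_times):
        # One-hot indicator identifying which risk set this row belongs to.
        indicator = [1.0 if j == k else 0.0 for j in range(len(event_times))]
        for time_i, event_i, x_i in zip(times, events, features):
            if time_i >= t:  # subject is still at risk at time t
                rows.append(list(x_i) + indicator)
                # Label is 1 only for a subject whose event occurs exactly at t.
                labels.append(1 if (time_i == t and event_i == 1) else 0)
    return rows, labels, event_times


# Toy data: four subjects, one feature each; the second subject is censored.
times = [2.0, 3.0, 3.0, 5.0]
events = [1, 0, 1, 1]
features = [[0.5], [1.2], [-0.3], [0.8]]

rows, labels, event_times = stack_survival_data(times, events, features)
```

The resulting `rows` and `labels` can be passed to any binary classifier. With logistic regression, the coefficients on the time indicators play a role similar to a baseline hazard, which is consistent with the approximate equivalence to the Cox model that the abstract describes.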