Machine Learning and Data Science approach towards trend and predictors analysis of CDC Mortality Data for the USA
Mortality is an active area of research in every country, with conclusions drawn from the available data and conditions. Domain knowledge is valuable but not strictly mandatory (though some is still required) for deriving conclusions from data intuition using machine learning and data science practices. The purpose of this project was to derive conclusions from the statistics of the provided dataset and to predict the dataset's label(s) using supervised or unsupervised learning algorithms. Based on a sample, the study drew conclusions about life expectancy regardless of gender, along with its central tendencies; marital status also affected how frequently deaths occurred within each group. The study further found that, because the data consists largely of categorical and numerical features and some class labels may occur far more often than others, anomaly detection or under-sampling could be a viable solution. Overall, the study shows that machine learning predictions are not as viable for this data as they might first appear.