Medical Diagnosis Models Of Stroke Prediction Using Electronic Health Records
Abstract
Stroke is a serious, life-threatening medical condition that occurs when the blood supply to
part of the brain is cut off. It has many sequelae, including lasting brain damage, long-term
disability, and death. Despite advances in medical technology, early detection and prediction
of stroke remain difficult because of its complex nature and the multitude of risk factors
involved.
This thesis presents a novel approach to stroke prediction that applies machine learning
techniques to electronic health records (EHR) in order to identify the early symptoms of
stroke. Its main focus is on data preprocessing techniques that suit EHR data and improve the
performance of machine learning models on this type of data. The research combines domain
knowledge of stroke with several preprocessing techniques: using medical domain knowledge to
extract the most important features from the dataset, encoding categorical variables,
categorizing the data, and handling missing values, among others. These techniques are crucial
for preparing the data for effective modeling and for ensuring the reliability of the results.
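As an illustration only, the preprocessing steps named above can be sketched in a few lines of plain Python. The field names (`age`, `smoking_status`, `bmi`), the toy records, the age cut-offs, and the choice of median imputation are all assumptions made for this sketch; the abstract does not specify the dataset schema or the exact strategies used.

```python
from statistics import median

# Hypothetical toy records; field names and values are illustrative only.
records = [
    {"age": 67, "smoking_status": "formerly smoked", "bmi": 36.6},
    {"age": 45, "smoking_status": "never smoked", "bmi": None},
    {"age": 80, "smoking_status": "smokes", "bmi": 28.0},
    {"age": 12, "smoking_status": "never smoked", "bmi": 19.4},
]


def impute_median(rows, field):
    """Handle missing values: replace None with the median of observed values."""
    observed = [r[field] for r in rows if r[field] is not None]
    fill = median(observed)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]


def age_group(age):
    """Categorize a numeric value; the cut-offs are illustrative, not clinical."""
    if age < 18:
        return "child"
    if age < 65:
        return "adult"
    return "senior"


def one_hot(rows, field):
    """Encode a categorical field as one 0/1 indicator per observed category."""
    categories = sorted({r[field] for r in rows})
    encoded = []
    for r in rows:
        row = {k: v for k, v in r.items() if k != field}
        for c in categories:
            row[f"{field}={c}"] = 1 if r[field] == c else 0
        encoded.append(row)
    return encoded


def preprocess(rows):
    rows = impute_median(rows, "bmi")
    rows = [{**r, "age_group": age_group(r["age"])} for r in rows]
    rows = one_hot(rows, "smoking_status")
    return one_hot(rows, "age_group")


processed = preprocess(records)
```

Each step returns new rows rather than modifying the input, so the steps can be reordered or swapped for other strategies when comparing preprocessing variants.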
To assess the effectiveness of the preprocessing techniques, a variety of machine learning
models are evaluated on the preprocessed data. These models span different categories,
including tree-based models such as Decision Trees and Random Forests, distance-based models
such as K-Nearest Neighbors (KNN), and probabilistic models such as Naive Bayes. Each model’s
performance is assessed using accuracy, precision, recall, and F1-score. This comprehensive
evaluation shows how the preprocessing techniques affect the performance of different types of
models.
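For reference, the four metrics follow directly from the counts of true and false positives and negatives. The sketch below is a minimal illustration for a binary task; the function name `classification_metrics` and the label vectors are hypothetical and are not results from this thesis.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard each ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels for illustration: 1 = stroke, 0 = no stroke.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 0, 1, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

Because stroke cases are typically a small minority in EHR data, accuracy alone can look high even when many positive cases are missed, which is why precision, recall, and F1-score are reported alongside it.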
The findings provide insight into the role of data preprocessing in stroke prediction and can
guide future research in this area. The approach is expected to generalize to other medical
conditions, supporting the development of robust and reliable predictive models in healthcare.
Future work will explore more advanced preprocessing techniques and machine learning models to
further improve stroke prediction.