Heart Failure Prediction using AzureML

In this project, we demonstrate how to use the Azure ML Python SDK to train a model that predicts mortality due to heart failure, using the Azure AutoML and HyperDrive services. After training, we deploy the best model and evaluate the model endpoint by consuming it.

This trained and deployed predictive model can potentially impact clinical practice, becoming a new supporting tool for physicians when assessing the increased risk of mortality among heart failure patients.

Table of Contents

Project Set Up and Installation
Dataset
Automated ML
- Results
- Improvements for AutoML
Hyperparameter Tuning
- Results
- Improvements for Hyperparameter Tuning
Automated ML and Hyperparameter Tuning Comparison
Model Deployment
Screen Recording
Future Improvements
Standout Suggestions
Citation

Project Set Up and Installation

To set up this project, we require access to Azure ML Studio. The application flow for the project design is as follows:

Create an Azure ML workspace with a compute instance.
Create an Azure ML compute cluster.
Upload the Heart Failure prediction dataset to Azure ML Studio from this repository.
Import the notebooks and scripts attached in this repository to the Notebooks section in Azure ML Studio.
All instructions to run the cells are detailed in the notebooks.

Dataset

Overview

The Heart Failure Prediction dataset is used for assessing the severity of patients with heart failure. It contains the medical records of 299 heart failure patients collected at the Faisalabad Institute of Cardiology and at the Allied Hospital in Faisalabad (Punjab, Pakistan), during April–December 2015. The patients, all aged 40 years or above, comprise 105 women and 194 men, all of whom have previously had heart failure.

The dataset contains 13 features, which report clinical, body, and lifestyle information, and is used as the training data for predicting heart failure risk. The dataset is imbalanced: 203 patients survived (death event = 0), while 96 died (death event = 1).
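As a quick check of the class balance, the shares of each class can be derived from the two counts quoted above. This is only a small arithmetic sketch; the variable names are illustrative.

```python
# Class counts taken from the dataset description above.
survived = 203  # death event = 0
died = 96       # death event = 1
total = survived + died

negative_share = 100 * survived / total
positive_share = 100 * died / total

print(f"total patients: {total}")
print(f"negative class: {negative_share:.2f}%")
print(f"positive class: {positive_share:.2f}%")
```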

Additional information about this dataset can be found in the original dataset curators' publication.

Task

The task here is to predict mortality due to heart failure. Heart failure is a common event caused by cardiovascular diseases (CVDs), and it occurs when the heart cannot pump enough blood to meet the needs of the body. The main causes of heart failure include diabetes, high blood pressure, and other heart conditions or diseases. By applying machine learning to this analysis, we obtain a predictive model that assesses the severity of patients with heart failure.

The objective of the task is to train a binary classification model that predicts the target column DEATH_EVENT, which indicates whether a heart failure patient will survive until the end of the follow-up period. The prediction is based on the information provided by the 11 clinical features (or risk factors). The time feature is dropped before training, since we cannot get a time value for new patients after deployment. The predictor variables in the dataset are as follows:

Age: Age of the patient (years)
Anaemia: Decrease of red blood cells or hemoglobin (1 if the patient has the condition, 0 otherwise)
Creatinine Phosphokinase: Level of the CPK enzyme in the blood (mcg/L)
Diabetes: Whether the patient suffers from diabetes (1 or 0)
Ejection Fraction: Percentage of blood leaving the heart at each contraction (percentage)
High Blood Pressure: Whether the patient has hypertension (1 or 0)
Platelets: Platelets in the blood (kiloplatelets/mL)
Serum Creatinine: Level of serum creatinine in the blood (mg/dL)
Serum Sodium: Level of serum sodium in the blood (mEq/L)
Sex: Woman or man (binary)
Smoking: Whether the patient smokes or not
Time: Follow-up period (days); dropped before training

Target variable - Death Event: If the patient died during the follow-up period

Death Event = 1 for patients who died and Death Event = 0 for patients who survived
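The preparation step described above (dropping the time column and separating the label) could look roughly like the following standard-library sketch. The column names are assumptions modeled on the sample request shown later in this README, and the two rows are illustrative only, not a substitute for the real file.

```python
import csv
import io

# Illustrative rows only; column names are assumed, not confirmed by the dataset file.
SAMPLE_CSV = """age,anaemia,creatinine_phosphokinase,diabetes,ejection_fraction,high_blood_pressure,platelets,serum_creatinine,serum_sodium,sex,smoking,time,DEATH_EVENT
75,0,582,0,20,1,265000,1.9,130,1,0,4,1
50,1,120,0,45,0,250000,1.0,138,0,0,200,0
"""

def split_features_and_label(csv_text, label="DEATH_EVENT", dropped=("time",)):
    """Return (features, labels), removing the label and any dropped columns."""
    features, labels = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        labels.append(int(row.pop(label)))
        for name in dropped:
            row.pop(name, None)  # time is unavailable for new patients
        features.append({name: float(value) for name, value in row.items()})
    return features, labels

X, y = split_features_and_label(SAMPLE_CSV)
print(len(X[0]), "predictors per row; labels:", y)
```

With the time column removed, each row keeps the 11 clinical predictors mentioned in the task description.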

Access

The data for this project can be accessed in our workspace through the following steps:

Download the data from the UCI Machine Learning Repository or use the dataset uploaded to this GitHub repository.
Register the dataset using either the Azure ML SDK or Azure ML Studio, from a web URL or from local files.
For this project, we registered the dataset in our workspace from a web URL using the Azure ML SDK and retrieved the data from the CSV file using the TabularDatasetFactory class.
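The registration step above can be sketched as follows, assuming the Azure ML SDK v1 is installed. The function name, dataset name, and description are hypothetical placeholders, and the notebooks in this repository may do this differently.

```python
def register_heart_failure_dataset(workspace, csv_url, name="heart-failure-records"):
    """Create a TabularDataset from a CSV URL and register it in the workspace."""
    # Imported lazily so this sketch can be loaded without the Azure ML SDK installed.
    from azureml.data.dataset_factory import TabularDatasetFactory

    dataset = TabularDatasetFactory.from_delimited_files(path=csv_url)
    return dataset.register(
        workspace=workspace,
        name=name,  # hypothetical dataset name
        description="Heart failure clinical records",
        create_new_version=True,
    )

# Possible usage inside a notebook (requires a configured workspace):
# from azureml.core import Workspace
# ws = Workspace.from_config()
# dataset = register_heart_failure_dataset(ws, "<web URL of the CSV file>")
# df = dataset.to_pandas_dataframe()
```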

Automated ML

We used the following configuration for AutoML.

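from azureml.train.automl import AutoMLConfig

# Run-level settings: time budget, parallelism, and the metric to optimize.
automl_settings = {
    "experiment_timeout_minutes": 30,
    "max_concurrent_iterations": 5,
    "primary_metric": "AUC_weighted",
}

# compute_target and dataset are assumed to be defined earlier in the notebook.
automl_config = AutoMLConfig(
    compute_target=compute_target,
    task="classification",
    training_data=dataset,
    label_column_name="DEATH_EVENT",
    n_cross_validations=5,
    debug_log="automl_errors.log",
    **automl_settings
)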

As shown in the code snippet above, the AutoML settings are:

The task for this machine learning problem is classification.
The primary_metric is AUC_weighted, which is more appropriate than accuracy since the dataset is moderately imbalanced (67.89% negative examples and 32.11% positive examples).
n_cross_validations is set to 5 folds rather than 3, which gave better performance.
An experiment_timeout_minutes of 30 is specified to constrain resource usage.
max_concurrent_iterations, the number of iterations executed in parallel during training, is set to 5 so that the process completes faster.
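To illustrate why AUC is preferred over accuracy on imbalanced data, here is a minimal pure-Python sketch of the binary ROC AUC, computed as the probability that a randomly chosen positive example is scored above a randomly chosen negative one. It is not Azure's implementation of AUC_weighted, and the labels and scores in the first example are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))

# Hypothetical labels and predicted probabilities of DEATH_EVENT = 1.
labels = [0, 0, 0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90]
print(roc_auc(labels, scores))

# A constant predictor on the 203/96 class split scores 0.5 on AUC,
# even though always predicting "survived" would reach about 68% accuracy.
constant_auc = roc_auc([0] * 203 + [1] * 96, [0.0] * 299)
print(constant_auc)
```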

Results

The best model is VotingEnsemble, with an AUC value of 0.9229042081949059.

Model hyper-parameters used for VotingEnsemble are shown below:

The parameters for the model VotingEnsemble are described in the table below:

Improvements for AutoML

Increase the experiment timeout to allow more models to be explored.
Remove some features from our dataset that are collinear or not important in making the decision.

AutoML Run Widget provides information about logs recorded in Run

AutoML experiment in Completed state with some model details

Best Model Run Id

Best model is VotingEnsemble with an AUC value of 0.92290

Hyperparameter Tuning

We use scikit-learn's built-in Support Vector Machine (SVM) classifier, since it can generate non-linear decision boundaries and can achieve high accuracy. It is also more robust to outliers than Logistic Regression. The algorithm is used with the Azure ML HyperDrive service for hyperparameter tuning.

The hyperparameters tuned are the inverse regularization strength (--C) and the kernel type (--kernel). The kernel is chosen from [linear, rbf, poly, sigmoid], and C is sampled with loguniform(0.5, 1.0). Because loguniform(a, b) returns exp(uniform(a, b)), the bounds [0.5, 1.0] apply to the logarithm of C, so C itself ranges from about 1.65 to 2.72. We used random parameter sampling, which samples over the discrete kernel types and returns a C value whose logarithm is uniformly distributed. Random sampling can serve as a benchmark for refining the search space to improve results.

Parameter search space and Hyperdrive configuration.

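from azureml.train.hyperdrive import HyperDriveConfig, PrimaryMetricGoal, RandomParameterSampling
from azureml.train.hyperdrive import choice, loguniform

# Random sampling over a discrete kernel choice and a continuous C.
# loguniform(a, b) yields exp(uniform(a, b)), so C lies between e^0.5 and e^1.0.
param_sampling = RandomParameterSampling({
    "--kernel": choice("linear", "rbf", "poly", "sigmoid"),
    "--C": loguniform(0.5, 1.0)
})

# estimator and early_termination_policy are assumed to be defined earlier in the notebook.
hyperdrive_run_config = HyperDriveConfig(
    run_config=estimator,
    hyperparameter_sampling=param_sampling,
    policy=early_termination_policy,
    primary_metric_name="AUC_weighted",
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=20,
    max_concurrent_runs=5
)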
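To make the sampling behaviour concrete, the following standard-library sketch emulates the search space: a uniform choice over the kernel names and a C value drawn as exp(uniform(0.5, 1.0)). It only imitates what the SDK's parameter expressions describe and is not SDK code; the fixed seed and helper name are illustrative.

```python
import math
import random

def sample_loguniform(low, high, rng):
    """Draw exp(u) with u ~ Uniform(low, high), so log(value) is uniform."""
    return math.exp(rng.uniform(low, high))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
kernels = ["linear", "rbf", "poly", "sigmoid"]
samples = [
    {"--kernel": rng.choice(kernels), "--C": sample_loguniform(0.5, 1.0, rng)}
    for _ in range(20)  # mirrors max_total_runs=20
]

c_values = [s["--C"] for s in samples]
print(f"C bounds: [{math.exp(0.5):.3f}, {math.exp(1.0):.3f}]")
print(f"sampled C range: {min(c_values):.3f} to {max(c_values):.3f}")
```

Every sampled C falls between roughly 1.65 and 2.72, which is why a tuned C above 1.0 is possible with these bounds.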

We applied a bandit early termination policy to evaluate our benchmark metric (AUC_weighted). The policy is based on a slack factor: it avoids premature termination of the first 5 runs, and subsequently terminates runs whose primary metric falls outside of the top 10%. This stops runs once their AUC_weighted starts degrading as the iteration count increases, thereby improving computational efficiency.
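The slack-factor rule can be sketched in plain Python. For a metric that is maximized, the Azure ML documentation describes the bandit policy as cancelling a run whose metric is below best / (1 + slack_factor). The slack factor of 0.1 and the AUC values below are assumptions for illustration, not values confirmed by this project.

```python
def bandit_should_terminate(run_metric, best_metric, slack_factor):
    """Return True when a run falls outside the slack of the best run (maximize goal)."""
    threshold = best_metric / (1.0 + slack_factor)
    return run_metric < threshold

# Hypothetical AUC_weighted values reported by runs at the same evaluation interval.
best_auc = 0.83
slack_factor = 0.1  # assumed value
for auc in (0.82, 0.76, 0.70):
    print(auc, bandit_should_terminate(auc, best_auc, slack_factor))
```

With these numbers the cut-off is about 0.755, so only the run reporting 0.70 would be stopped.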

Results

The SVM model achieved an AUC value of 0.8333333333333334 with the following parameters:

Improvements for Hyperparameter Tuning

Perform more feature engineering during the data preparation phase.
Tune additional hyperparameters, which can increase model performance.
Increase max_total_runs to try many more combinations of hyperparameters, though this could increase cost and training duration.

Hyperdrive Run Widget provides information about logs recorded in the Run

Hyperdrive experiment in Completed state with AUC value for each iteration

Best model: After successfully running the experiment, the best model has a sigmoid kernel and a C value of 2.521

Automated ML and Hyperparameter Tuning Comparison

As shown in the diagram, the AutoML VotingEnsemble model performed better, with an AUC value of 0.9226 compared to 0.8167 for the Support Vector Machine tuned through HyperDrive. We therefore deploy the AutoML model.

Model Deployment

The following steps are required to deploy a model using Azure SDK:

Register the dataset using the SDK.
Find the best model using AutoML.
Use the environment of AutoML's best_run or create a custom environment.
Use the score.py file generated when the model is trained for deployment and evaluation. The scoring script describes the input data the model endpoint accepts.
Deploy the model using any of the deployment choices: ACI, AKS, or local. For our project, we deploy the model as a web service using Azure Container Instances with cpu_cores = 1, memory_gb = 1, and Application Insights enabled.
For inferencing, pass the sample test data in JSON format to the model endpoint to test the web service. The score.py file processes this data to complete the REST API call.
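The deployment steps above can be sketched with the Azure ML SDK v1 as follows. The function name and service name are hypothetical, and the model and environment are assumed to come from the AutoML best run; the notebook in this repository may differ in detail.

```python
def deploy_to_aci(workspace, model, environment, service_name="heart-failure-service"):
    """Deploy a registered model to Azure Container Instances and wait for it."""
    # Imported lazily so the sketch can be loaded without the Azure ML SDK installed.
    from azureml.core.model import InferenceConfig, Model
    from azureml.core.webservice import AciWebservice

    # score.py is the scoring script mentioned in the steps above.
    inference_config = InferenceConfig(entry_script="score.py", environment=environment)
    deployment_config = AciWebservice.deploy_configuration(
        cpu_cores=1,
        memory_gb=1,
        enable_app_insights=True,
    )
    service = Model.deploy(
        workspace=workspace,
        name=service_name,  # hypothetical service name
        models=[model],
        inference_config=inference_config,
        deployment_config=deployment_config,
    )
    service.wait_for_deployment(show_output=True)
    return service
```

After a successful deployment, the returned service object exposes the scoring URI used to query the endpoint.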

Deployed model

Successful model deployment using ACI (Azure Container Instance) with Application Insights enabled

Sample input data to query the endpoint

data = {
    "data":
    [
        {
            'Age':75,
            'anaemia':0,
            'creatinine_phosphokinase':582,
            'diabetes':0,
            'ejection_fraction':20,
            'high_blood_pressure':1,
            'platelets':265000,
            'serum_creatinine':1.9,
            'serum_sodium':130,
            'sex':1,
            'smoking':0
        }
    ]
}

Response from web service: When we make an API call to our endpoint with the sample data, we see the inference output of the model
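A request like the one described above could be assembled with the standard library as in this sketch. The scoring URI is a placeholder, the helper name is hypothetical, and the Authorization header is only needed if key authentication is enabled on the service. The field names are copied from the sample input shown above.

```python
import json
import urllib.request

def build_scoring_request(scoring_uri, payload, api_key=None):
    """Build (but do not send) a POST request carrying the JSON payload."""
    headers = {"Content-Type": "application/json"}
    if api_key:  # only needed when key authentication is enabled
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")

payload = {
    "data": [
        {
            "Age": 75, "anaemia": 0, "creatinine_phosphokinase": 582,
            "diabetes": 0, "ejection_fraction": 20, "high_blood_pressure": 1,
            "platelets": 265000, "serum_creatinine": 1.9, "serum_sodium": 130,
            "sex": 1, "smoking": 0,
        }
    ]
}

# Placeholder URI; replace it with the scoring URI of the deployed service.
request = build_scoring_request("http://example.invalid/score", payload)
print(request.get_method(), request.full_url)
# To send it against a live endpoint: urllib.request.urlopen(request).read()
```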

Screen Recording

https://youtu.be/m4giyTylWzU

Future Improvements

A better-performing AutoML model may be found if the experiment timeout is increased.
Addressing the dataset imbalance by applying the Synthetic Minority Oversampling Technique (SMOTE) may improve the performance of the HyperDrive model.
Converting the model into platform-supported formats such as ONNX or TFLite would help optimize inference (model scoring) and achieve scalability.

Standout Suggestions

Application Insights was enabled during model deployment in order to log useful data about the requests sent to the web service.

Citation

Davide Chicco, Giuseppe Jurman: “Machine learning can predict survival of patients with heart failure from serum creatinine and ejection fraction alone”. BMC Medical Informatics and Decision Making 20, 16 (2020).
