In this project, we use a dataset external to the Azure ML ecosystem to train and deploy models with the AutoML and HyperDrive services.
Specifically, we demonstrate how to use the Azure ML Python SDK to train a model that predicts mortality due to heart failure using Azure AutoML and HyperDrive. After training, we deploy the best model and evaluate the model endpoint by consuming it.
This trained and deployed predictive model could potentially impact clinical practice, becoming a new supporting tool for physicians when assessing the risk of mortality among heart failure patients.
To set up this project, we require access to Azure ML Studio. The application flow for the project design is as follows:
The Heart Failure Prediction dataset is used to assess the severity of heart failure in patients. It contains the medical records of 299 heart failure patients collected at the Faisalabad Institute of Cardiology and at the Allied Hospital in Faisalabad (Punjab, Pakistan) during April–December 2015. The patients, all aged 40 years or older, comprise 105 women and 194 men, all of whom had previously experienced heart failure.
The dataset contains 13 columns (12 features plus the `DEATH_EVENT` target) reporting clinical, body, and lifestyle information, and it is used as the training data for predicting heart failure risk. The classes are imbalanced: 203 patients survived (`DEATH_EVENT = 0`), while 96 patients died (`DEATH_EVENT = 1`).
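The class counts above mean that a trivial model that always predicts survival would already be roughly 68% accurate, which is why accuracy alone can be misleading on this dataset. A minimal sketch of that arithmetic, using only the counts stated in this README:

```python
# Class counts reported for the Heart Failure Prediction dataset
class_counts = {0: 203, 1: 96}  # 0 = survived, 1 = died


def class_proportions(counts):
    """Return the fraction of samples in each class."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def majority_baseline_accuracy(counts):
    """Accuracy of a model that always predicts the most frequent class."""
    return max(counts.values()) / sum(counts.values())


proportions = class_proportions(class_counts)
print({k: round(v, 3) for k, v in proportions.items()})  # {0: 0.679, 1: 0.321}
print(round(majority_baseline_accuracy(class_counts), 3))  # 0.679
```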
Additional information about this dataset can be found in the dataset curators' original publication (Chicco and Jurman, 2020).
The task here is to predict mortality due to heart failure. Heart failure is a common event caused by cardiovascular diseases (CVDs); it occurs when the heart cannot pump enough blood to meet the needs of the body. The main causes of heart failure include diabetes, high blood pressure, and other heart conditions or diseases. By applying machine learning to this problem, we obtain a predictive model that assesses the severity of heart failure in patients.
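The objective of the task is to train a binary classification model that predicts the target column `DEATH_EVENT`, which indicates whether a heart failure patient died before the end of the follow-up period. The prediction is based on the 11 clinical features (or risk factors) listed below. The `time` feature (the follow-up period) is dropped before training, since no time value is available for new patients after deployment. The predictor variables are as follows:

- `age`: age of the patient (years)
- `anaemia`: decrease of red blood cells or hemoglobin (boolean)
- `creatinine_phosphokinase`: level of the CPK enzyme in the blood (mcg/L)
- `diabetes`: whether the patient has diabetes (boolean)
- `ejection_fraction`: percentage of blood leaving the heart at each contraction (%)
- `high_blood_pressure`: whether the patient has hypertension (boolean)
- `platelets`: platelets in the blood (kiloplatelets/mL)
- `serum_creatinine`: level of serum creatinine in the blood (mg/dL)
- `serum_sodium`: level of serum sodium in the blood (mEq/L)
- `sex`: woman or man (binary)
- `smoking`: whether the patient smokes (boolean)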
Target variable: `DEATH_EVENT`, whether the patient died during the follow-up period (`DEATH_EVENT = 1` for patients who died and `DEATH_EVENT = 0` for patients who survived).
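Because `time` is unavailable for new patients, it has to be removed together with the target column before training. A minimal standard-library sketch of that split, assuming the UCI CSV column names (`time`, `DEATH_EVENT`); the two sample rows are hypothetical toy records with only a few columns, not rows from the real file:

```python
import csv
import io

TARGET = "DEATH_EVENT"
DROPPED = {"time"}  # follow-up time is unknown for new patients, so it is excluded


def split_features_target(csv_text):
    """Parse CSV text into (features, labels), dropping the columns in DROPPED."""
    reader = csv.DictReader(io.StringIO(csv_text))
    features, labels = [], []
    for row in reader:
        labels.append(int(row.pop(TARGET)))
        for col in DROPPED:
            row.pop(col, None)
        features.append({k: float(v) for k, v in row.items()})
    return features, labels


# Hypothetical toy rows (subset of columns) for illustration only
sample = (
    "age,ejection_fraction,serum_creatinine,time,DEATH_EVENT\n"
    "75,20,1.9,4,1\n"
    "60,35,1.0,120,0\n"
)
X, y = split_features_target(sample)
print(X[0])  # {'age': 75.0, 'ejection_fraction': 20.0, 'serum_creatinine': 1.9}
print(y)     # [1, 0]
```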
The data for this project can be accessed in our workspace through the following steps:

1. Download the data from the UCI Machine Learning Repository, or use the copy of the dataset uploaded to this GitHub repository.
2. Register the dataset in the workspace using either the Azure ML SDK or Azure ML Studio, from a web URL or from local files.

For this project, we registered the dataset in our workspace from a web URL using the Azure ML SDK, and loaded the data from the CSV file with the `TabularDatasetFactory` class.
We used the following configuration for AutoML:
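```python
from azureml.train.automl import AutoMLConfig

# `compute_target` (the compute cluster) and `dataset` (the registered
# TabularDataset) are assumed to be created earlier in the notebook.
automl_settings = {
    "experiment_timeout_minutes": 30,  # stop the whole experiment after 30 minutes
    "max_concurrent_iterations": 5,    # up to 5 pipelines trained in parallel
    "primary_metric": "AUC_weighted",  # metric used to rank the models
}

automl_config = AutoMLConfig(
    compute_target=compute_target,
    task="classification",
    training_data=dataset,
    label_column_name="DEATH_EVENT",
    n_cross_validations=5,             # 5-fold cross-validation
    debug_log="automl_errors.log",
    **automl_settings,
)
```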
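As shown in the code snippet above, the AutoML settings are:

- `experiment_timeout_minutes = 30`: the experiment ends after 30 minutes.
- `max_concurrent_iterations = 5`: up to 5 iterations run in parallel.
- `primary_metric = 'AUC_weighted'`: the metric AutoML optimizes and uses to rank models.
- `task = "classification"` with `label_column_name = "DEATH_EVENT"`: a binary classification of the target column.
- `n_cross_validations = 5`: each model is evaluated with 5-fold cross-validation.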
The best model is a VotingEnsemble with a weighted AUC of 0.9229.
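Azure ML describes `AUC_weighted` as the arithmetic mean of the per-class AUC scores, weighted by the number of true instances in each class. For a binary task the two per-class AUCs coincide, so the weighted value equals the ordinary AUC. A pure-Python sketch of that definition (AUC computed as the probability that a random positive outranks a random negative); this is an illustration, not Azure's implementation, and the labels and scores are hypothetical:

```python
def binary_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def weighted_auc(labels, scores):
    """Average of one-vs-rest AUCs for each class, weighted by class support."""
    auc_pos = binary_auc(labels, scores)
    # Treat class 0 as the positive class: flip labels and use 1 - score.
    auc_neg = binary_auc([1 - l for l in labels], [1 - s for s in scores])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return (n_pos * auc_pos + n_neg * auc_neg) / len(labels)


# Hypothetical labels and predicted probabilities of DEATH_EVENT = 1
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
print(round(weighted_auc(y_true, y_score), 4))  # 0.8889
```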
The hyperparameters of the VotingEnsemble model are shown in the tables below, for the `StandardScalerWrapper` preprocessing step and the `GradientBoostingClassifier` estimator:
StandardScalerWrapper

| Parameter | Value |
| --- | --- |
| class_name | StandardScaler |
| copy | True |
| module_name | sklearn.preprocessing._data |
| with_mean | True |
| with_std | False |
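With `with_mean=True` and `with_std=False`, this scaler only centers each feature at zero and leaves its spread unchanged (tree-based models such as gradient boosting are unaffected by such constant shifts anyway). A small pure-Python sketch of what that transform does, for illustration only:

```python
def center_columns(rows):
    """Mimic StandardScaler(with_mean=True, with_std=False): subtract each column mean, no division."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    return centered, means


centered, means = center_columns([[1.0, 10.0], [3.0, 30.0]])
print(means)     # [2.0, 20.0]
print(centered)  # [[-1.0, -10.0], [1.0, 10.0]]
```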
GradientBoostingClassifier

| Parameter | Value |
| --- | --- |
| ccp_alpha | 0.0 |
| criterion | mse |
| init | None |
| learning_rate | 0.021544346900318822 |
| loss | deviance |
| max_depth | 8 |
| max_features | 0.5 |
| max_leaf_nodes | None |
| min_impurity_decrease | 0.0 |
| min_impurity_split | None |
| min_samples_leaf | 0.01 |
| min_samples_split | 0.38473684210526315 |
| min_weight_fraction_leaf | 0.0 |
| n_estimators | 400 |
| n_iter_no_change | None |
| presort | deprecated |
| random_state | None |
| subsample | 0.43157894736842106 |
| tol | 0.0001 |
| validation_fraction | 0.1 |
| verbose | 0 |
| warm_start | False |
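Azure ML describes its VotingEnsemble for classification as soft voting: a weighted average of the class probabilities predicted by its member pipelines. A minimal sketch of that combination rule; the member probabilities, the weights, and the 0.5 decision threshold are hypothetical values chosen for illustration:

```python
def soft_vote(probabilities, weights):
    """Weighted average of each model's predicted probability of the positive class."""
    total = sum(weights)
    return sum(p * w for p, w in zip(probabilities, weights)) / total


# Hypothetical per-model probabilities of DEATH_EVENT = 1 for one patient,
# and hypothetical ensemble weights
probs = [0.80, 0.40, 0.60]
weights = [0.5, 0.25, 0.25]
p = soft_vote(probs, weights)
print(round(p, 3), int(p >= 0.5))  # 0.65 1
```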
The AutoML Run Widget provides information about the logs recorded in the run.
AutoML experiment in the Completed state, with some model details.
Best model run ID.
The best model is a VotingEnsemble with an AUC value of 0.92290.
We use scikit-learn's Support Vector Machine (SVM) classifier because it can generate non-linear decision boundaries and can achieve high accuracy. It can also be more robust to outliers than logistic regression. This algorithm is used with the Azure ML HyperDrive service for hyperparameter tuning.
The hyperparameters tuned are the regularization parameter `--C` (the inverse of the regularization strength) and the kernel type `--kernel`. The kernel is chosen from `linear`, `rbf`, `poly`, and `sigmoid`, and C is drawn with `loguniform(0.5, 1.0)`. We used random parameter sampling, which picks kernel types from the discrete set and draws C as exp(uniform(0.5, 1.0)), so that the logarithm of C is uniformly distributed on [0.5, 1.0] and C itself lies between e^0.5 ≈ 1.65 and e^1.0 ≈ 2.72. Random sampling can also serve as a baseline for refining the search space to improve results.
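Because HyperDrive's `loguniform(min_value, max_value)` returns `exp(uniform(min_value, max_value))`, the bounds 0.5 and 1.0 apply to log C rather than to C. The sketch below reproduces that sampling rule with Python's standard library (it is not the HyperDrive implementation) and checks that the best C reported for the SVM, about 2.52, falls inside the implied range:

```python
import math
import random


def loguniform_sample(min_value, max_value, rng):
    """Return exp(u) with u ~ Uniform(min_value, max_value), mirroring HyperDrive's loguniform."""
    return math.exp(rng.uniform(min_value, max_value))


rng = random.Random(42)  # fixed seed so the sketch is reproducible
samples = [loguniform_sample(0.5, 1.0, rng) for _ in range(1000)]

low, high = math.exp(0.5), math.exp(1.0)
print(f"C range: [{low:.4f}, {high:.4f}]")  # C range: [1.6487, 2.7183]
best_c = 2.521868105479297  # best C found by HyperDrive (from the results table)
print(low <= best_c <= high)  # True
```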
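Parameter search space and HyperDrive configuration:

```python
from azureml.train.hyperdrive import HyperDriveConfig, PrimaryMetricGoal, RandomParameterSampling
from azureml.train.hyperdrive.parameter_expressions import choice, loguniform

# Kernel: random choice of 4 kernels. C: exp(uniform(0.5, 1.0)), i.e. C in [e^0.5, e^1.0].
param_sampling = RandomParameterSampling({
    "--kernel": choice("linear", "rbf", "poly", "sigmoid"),
    "--C": loguniform(0.5, 1.0),
})

# `estimator` is the run configuration for the training script (which must log
# "AUC_weighted"), and `early_termination_policy` is the bandit policy described
# below; both are assumed to be defined earlier in the notebook.
hyperdrive_run_config = HyperDriveConfig(
    run_config=estimator,
    hyperparameter_sampling=param_sampling,
    policy=early_termination_policy,
    primary_metric_name="AUC_weighted",
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=20,      # at most 20 hyperparameter combinations
    max_concurrent_runs=5,  # up to 5 runs in parallel
)
```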
We applied a bandit early termination policy based on our primary metric (AUC_weighted). The policy uses a slack factor: after an initial evaluation delay that prevents the first 5 evaluation intervals from being terminated prematurely, it terminates any run whose primary metric is not within the slack factor (10%) of the best-performing run so far. This stops poorly performing runs early, so compute is spent on more promising hyperparameter combinations, improving computational efficiency.
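For a metric that is maximized, Azure ML's bandit policy stops a run whose metric falls below `best / (1 + slack_factor)`, where `best` is the metric of the best run so far. The sketch below applies that rule at a single evaluation interval; the slack factor of 0.1 and the run metrics are hypothetical, and evaluation intervals and the evaluation delay are left out for brevity:

```python
def bandit_threshold(best_metric, slack_factor):
    """Lowest metric a run may report before a maximize-goal bandit policy stops it."""
    return best_metric / (1 + slack_factor)


def runs_to_terminate(metrics, slack_factor):
    """Given {run_id: latest primary metric}, return the runs below the slack threshold."""
    best = max(metrics.values())
    cutoff = bandit_threshold(best, slack_factor)
    return sorted(run for run, m in metrics.items() if m < cutoff)


# Hypothetical AUC_weighted values at one evaluation interval, slack_factor = 0.1
latest = {"run_1": 0.83, "run_2": 0.78, "run_3": 0.74, "run_4": 0.70}
print(round(bandit_threshold(0.83, 0.1), 4))  # 0.7545
print(runs_to_terminate(latest, 0.1))         # ['run_3', 'run_4']
```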
The SVM model achieved an AUC value of 0.8333 with the following hyperparameters:
| Hyperparameter | Value |
| --- | --- |
| Inverse regularization strength (C) | 2.521868105479297 |
| Kernel | sigmoid |
The HyperDrive Run Widget provides information about the logs recorded in the run.
HyperDrive experiment in the Completed state, with the AUC value for each iteration.
Best model: after the experiment completed successfully, the best model uses the sigmoid kernel with a C value of 2.521.
| Metric | AutoML | HyperDrive |
| --- | --- | --- |
| AUC_weighted | 0.92290 | 0.83333 |
| Best model | VotingEnsemble | SVM |
| Duration | 39.16 minutes | 91.21 minutes |
As shown in the table above, the AutoML VotingEnsemble model performed better, with an AUC of 0.9229 compared to 0.8333 for the Support Vector Machine tuned with HyperDrive. We therefore deploy the AutoML model.
To deploy the model with the Azure ML SDK, we deploy the best AutoML model as a web service on Azure Container Instances (ACI), with a deployment configuration of `cpu_cores = 1` and `memory_gb = 1` and Application Insights enabled.
Successful model deployment using ACI (Azure Container Instance) with Application Insights enabled.
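Sample input data to query the endpoint:

```python
# One patient record with the 11 predictor features; `time` is not sent
# because it was dropped before training.
data = {
    "data": [
        {
            "age": 75,
            "anaemia": 0,
            "creatinine_phosphokinase": 582,
            "diabetes": 0,
            "ejection_fraction": 20,
            "high_blood_pressure": 1,
            "platelets": 265000,
            "serum_creatinine": 1.9,
            "serum_sodium": 130,
            "sex": 1,
            "smoking": 0,
        }
    ]
}
```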
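The sample payload is sent to the web service as JSON in an HTTP POST request to the service's scoring URI. A minimal standard-library sketch of building that request; the URI is a hypothetical placeholder, the payload is shortened, and the `Bearer` authorization header is an assumption that only applies when key-based authentication is enabled on the service:

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, payload, api_key=None):
    """Create a POST request carrying the JSON payload to a deployed scoring endpoint."""
    headers = {"Content-Type": "application/json"}
    if api_key:  # only needed when key-based authentication is enabled
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


# Hypothetical placeholder URI and shortened payload; use your service's scoring URI
request = build_scoring_request("http://example.invalid/score", {"data": [{"age": 75}]})

# To actually call the deployed service:
# with urllib.request.urlopen(request) as resp:
#     print(resp.read().decode("utf-8"))
```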
Response from the web service: when we send the sample data to the endpoint through an API call, the service returns the model's inference output.
Increasing the experiment timeout could allow AutoML to explore more pipelines and potentially find a better-performing model.
Addressing the class imbalance with the Synthetic Minority Oversampling Technique (SMOTE) could improve the performance of the HyperDrive model.
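SMOTE creates synthetic minority-class samples by picking a minority sample, choosing one of its k nearest minority-class neighbours, and interpolating a new point at a random position on the segment between them. A minimal pure-Python sketch of that idea follows (in practice one would more likely use the `SMOTE` class from the imbalanced-learn package, and apply it only to the training folds so that synthetic points do not leak into evaluation data); the triangle of points is a hypothetical toy example:

```python
import random


def smote(minority, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by interpolating toward random near neighbours."""
    if rng is None:
        rng = random.Random(0)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding the point itself)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour, in [0, 1)
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic


# Hypothetical 2-feature minority samples forming a triangle
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote(minority, 20, k=2, rng=random.Random(1))
print(len(new_points))  # 20
```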
Converting the model into platform-supported formats such as ONNX or TFLite could help optimize inference (model scoring) and improve scalability.
We enabled Application Insights during model deployment to log useful data about the requests sent to the web service.
Davide Chicco, Giuseppe Jurman. “Machine learning can predict survival of patients with heart failure from serum creatinine and ejection fraction alone.” BMC Medical Informatics and Decision Making 20, 16 (2020).