Predicting Demand in Primary Health Care Centers in Lebanon: Insight from Syrian Refugees Crisis
Lebanon is a middle-income Middle Eastern country hosting around 1.5 million Syrian refugees, one of the largest per-capita concentrations of refugees in the world. This enormous influx of displaced people remains a daunting challenge for the national health services of a country whose own population is around 4 million. The problem is exacerbated by a lack of national funds that would allow municipalities to balance their services between host and refugee communities alike, prompting among the Lebanese population a sense of being disadvantaged by the presence of refugees.
Our manuscript addresses the following question: can we analyse the spatiotemporal surge in demand recorded by primary health care centers, using data provided by the Lebanese Ministry of Public Health, in light of the peaks in events emanating from the Syrian war, and further model it to yield reasonably accurate predictions that assist policy makers in their readiness to act on surges in demand within prescribed proximity to the locations in Syria where the peaks have taken place?
To this end, we analyse data from the Lebanese Ministry of Public Health (MoPH), representing primary health care demand, and augment it with data from the Syrian Violations Data Center (VDC), a leading repository documenting casualties in the Syrian war. The objectives of this study are: (i) to analyse the surge in demand on primary health care centers using MoPH data in reference to the peaks of the Syrian war derived from the VDC; (ii) to produce both pointwise and probabilistic forecasts of the demand using a suite of statistical and machine learning models; (iii) to improve the recall of surges in demand using utility-based regression for capturing rare events (in our context, instances when demand surges due to large-scale events in Syria); and (iv) to reveal the rules and interactions between major input features that led these models to their predictions, using machine learning interpretability techniques.
All text entries in this dataset were in Arabic; we translated them using dictionaries that we developed in our capacity as native speakers. The translation dictionaries can be found in translations.py
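To illustrate the dictionary-based translation step, here is a minimal sketch. The entries and the `translate` helper below are hypothetical examples of the approach, not the actual contents of translations.py:

```python
# Hypothetical illustration of dictionary-based translation;
# the real mappings live in translations.py.
health_terms = {
    "صحة": "health",
    "مركز": "center",
    "لقاح": "vaccine",
}

def translate(term, dictionary=health_terms):
    """Translate an Arabic term, keeping the original when unknown."""
    return dictionary.get(term, term)
```

Falling back to the original term keeps untranslated entries visible for later review rather than silently dropping them.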
We trained a suite of machine learning models with cross-validation, using 10 folds and 10 repeats.
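The repeated cross-validation scheme can be sketched as follows. This is a minimal, standard-library illustration of 10-fold, 10-repeat splitting; the actual experiments presumably rely on a library implementation such as scikit-learn's `RepeatedKFold`:

```python
import random

def repeated_kfold(n_samples, n_splits=10, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated k-fold CV.

    Each repeat reshuffles the sample indices, then partitions them
    into n_splits disjoint test folds.
    """
    rng = random.Random(seed)
    for _ in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        # Distribute any remainder across the first folds.
        sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
                 for i in range(n_splits)]
        start = 0
        for size in sizes:
            test_idx = idx[start:start + size]
            train_idx = idx[:start] + idx[start + size:]
            yield train_idx, test_idx
            start += size

splits = list(repeated_kfold(100))  # 10 folds x 10 repeats = 100 splits
```

Each of the 100 splits holds out a disjoint tenth of the data, so every sample is tested exactly once per repeat.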
| dataset | best_model | r2 | rmse | mse | mae |
|---|---|---|---|---|---|
| all_columns | linear_svr | 0.825409 | 37.4547 | 1402.855 | 25.91299 |
| all_columns_minus_weather | linear_svr | 0.824745 | 37.52578 | 1408.184 | 25.84897 |
| all_columns_minus_weather_minus_lags | ada_boost | 0.40074 | 69.3909 | 4815.098 | 46.28747 |
| all_columns_minus_weather_minus_vdc | linear_svr | 0.823036 | 37.70834 | 1421.919 | 25.84334 |
| all_columns_minus_weather_minus_distance | linear_svr | 0.824562 | 37.54541 | 1409.658 | 25.87975 |
| all_columns_minus_weather_minus_civilians | linear_svr | 0.822958 | 37.71668 | 1422.548 | 25.86042 |
| all_columns_minus_weather_minus_lags_minus_distance | gradient_boost | 0.50776 | 62.89025 | 3955.183 | 43.24637 |
| all_columns_minus_weather_minus_lags_minus_civilians | ada_boost | 0.386509 | 70.20998 | 4929.441 | 47.02291 |
| univariate | linear_svr | 0.818153 | 38.22504 | 1461.153 | 26.03276 |
Ours is an imbalanced regression problem, where rare events (mainly high demand values) are poorly represented in the data; they confuse the machine learning models and degrade performance. We used Branco's approach from the paper "SMOGN: a Pre-processing Approach for Imbalanced Regression" and forked their repository. We used their R code and called the appropriate helper functions from our Python code using Python's rpy2 module.
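The intuition behind this pre-processing step can be sketched as follows. This is a deliberately simplified, pure-Python illustration of augmenting rare high-demand cases with noisy copies; it is not Branco et al.'s actual SMOGN implementation (which we call through rpy2), and the function and parameter names are ours:

```python
import random

def oversample_rare(X, y, threshold, n_copies=2, noise_sd=1.0, seed=0):
    """Naive SMOGN-style augmentation (illustrative only).

    For each sample whose target exceeds `threshold` (a rare,
    high-demand case), append n_copies jittered with Gaussian noise,
    so rare events carry more weight during training.
    """
    rng = random.Random(seed)
    X_out, y_out = [list(xi) for xi in X], list(y)
    for xi, yi in zip(X, y):
        if yi > threshold:
            for _ in range(n_copies):
                X_out.append([v + rng.gauss(0, noise_sd) for v in xi])
                y_out.append(yi + rng.gauss(0, noise_sd))
    return X_out, y_out
```

The real SMOGN combines such noise-based generation with SMOTE-like interpolation and a relevance function over the target; this sketch only conveys why oversampling helps the model see surge cases more often.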
We produced probabilistic forecasts in both UBR (utility-based regression) and non-UBR modes:
- Ubr
- Non Ubr
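Probabilistic forecasts are typically scored per predicted quantile. A minimal sketch of the pinball (quantile) loss commonly used for this purpose, given here as general background rather than as the exact scoring rule of our experiments:

```python
def pinball_loss(y_true, y_quantile_pred, tau):
    """Average pinball loss for predictions of the tau-th quantile.

    Under-predictions are weighted by tau and over-predictions by
    (1 - tau), so minimizing this loss recovers the tau-th quantile.
    """
    total = 0.0
    for y, q in zip(y_true, y_quantile_pred):
        total += tau * (y - q) if y >= q else (1 - tau) * (q - y)
    return total / len(y_true)
```

For example, under-shooting the 0.9 quantile is penalized nine times more heavily than over-shooting it by the same amount, which pushes the predicted quantile upward.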
The experimental design was implemented in both Python and R. Both code and data are provided in formats suitable for Python and R environments.
In order to replicate the experiments in CodeUbr, you will need a working installation of R. Check https://www.r-project.org/ if you need to download and install it.
You must have R 3.6.x installed.
In your R installation you also need to install the following additional R packages: DMwR, performanceEstimation, UBL, operators, class, fields, ROCR, Hmisc, and uba.
All of these packages, with the exception of the uba package, can be installed directly from the CRAN repository as any "normal" R package. Essentially, you need to issue the following commands within R:
```r
install.packages(c("DMwR", "performanceEstimation", "UBL", "operators", "class", "fields", "ROCR"))
install.packages("Hmisc")
```
Before you install the uba package, you need to have the latest version of Rtools. Check https://cran.r-project.org/bin/windows/Rtools/
Additionally, you will need to install uba package from a tar.gz file that you can download from http://www.dcc.fc.up.pt/~rpribeiro/uba/.
To install this package, issue the following command within R:
```r
install.packages("uba_0.7.7.tar.gz", repos = NULL, dependencies = TRUE)
```
Other than R, in order to run the remaining experiments in CodeUbr, as well as the experiments in Code, you need the following Python modules: