Predicting Infection of Organization Endpoints by Cybersecurity Threats using Ensemble Machine Learning
The proliferation of the malware industry is the result of the large volume of personal and confidential information shared over individual and public networks. This has widened the range of organizations vulnerable to malware-driven cybercrime. Such organized and distributed cyber-attacks can compromise the confidentiality, integrity and availability of any organization’s valuable data and resources. Endpoints (desktops, laptops, mobile devices, servers, etc.) are more vulnerable and are hence the main targets of cyber criminals. The aim of this study is to determine the probability of such endpoints being affected by cybersecurity threats, based on certain characteristics of the particular endpoint. Using the machine learning techniques applied in this study, namely missing data analysis and imputation (Multiple Imputation) and ensemble learning algorithms (Bagging and Boosting), it is possible to predict which devices or systems in an organization are likely to be infected by malware, ransomware or other such threats. Based on such findings, proactive measures can be taken and cybersecurity strategies devised that help organizations prevent losses to the tune of millions of dollars and become cyber resilient.
Vishakha Bhattacharjee (MS in Business Analytics, Columbia University)
Piyush Beri (MBA in Business Analytics, Symbiosis Centre for Management & Human Resource Development)