Scoring Model for Data Breach Forecasting and Prevention

Machine Learning for prediction accuracy

Situation

We live in the “era of data”. Every day, companies handle an overwhelming number of interactions with data sources, and the more data is produced, the greater the risk that it will be used maliciously. Our Client, a European start-up, had the idea of developing a solution for forecasting spear-phishing attacks on personal data for a wide range of organizations located in various countries. This type of solution is highly sought after nowadays, especially when combined with an ML approach for high prediction accuracy. Knowing Unicsoft's expertise in this domain and having a successful track record with us, the Client chose us as the solution developer.

Solution

First, we defined the main data sources. The primary one was VCDB (the VERIS Community Database, a catalog of data security incidents recorded using the VERIS framework). The second source was the FT500 data set for 2016. After merging these two data sets, we obtained a set for predicting the probability of a data breach for a particular company. The key ML models were a GLM and a Random Forest (RF), with RF prevailing because of its higher precision. Moreover, after implementing Monte Carlo simulation methods, we added prediction of incident density within a particular timeframe. As an additional feature, we predicted the expected loss in USD in case of an attack for various industries.
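To illustrate the Monte Carlo step, here is a minimal Python sketch of how incident counts and losses over a timeframe might be simulated. It is not the project's actual code. It assumes that incidents arrive as a Poisson process and that the loss per incident is lognormally distributed. The function name simulate and all parameter values (rate_per_year, loss_mu, loss_sigma) are hypothetical. In a real pipeline the incident rate would presumably be derived from the fitted model, not hard-coded.

```python
import random
import statistics


def simulate(rate_per_year, years, loss_mu, loss_sigma, n_runs=10_000, seed=42):
    """Monte Carlo simulation of incident counts and total losses.

    Assumptions (illustrative, not taken from the case study):
    - incidents follow a Poisson process, so the gaps between
      incidents are exponentially distributed;
    - the loss per incident in USD is lognormally distributed.
    """
    rng = random.Random(seed)
    counts, losses = [], []
    for _ in range(n_runs):
        elapsed = rng.expovariate(rate_per_year)  # time of the first incident
        n_incidents, total_loss = 0, 0.0
        while elapsed <= years:
            n_incidents += 1
            total_loss += rng.lognormvariate(loss_mu, loss_sigma)
            elapsed += rng.expovariate(rate_per_year)  # gap to the next incident
        counts.append(n_incidents)
        losses.append(total_loss)
    return counts, losses


# Hypothetical inputs: 0.5 incidents per year over a 2-year window.
counts, losses = simulate(rate_per_year=0.5, years=2.0, loss_mu=13.0, loss_sigma=1.0)

mean_count = statistics.mean(counts)                        # incident density
p_at_least_one = sum(c > 0 for c in counts) / len(counts)   # breach probability
expected_loss = statistics.mean(losses)                     # expected loss in USD

print(f"mean incidents in window: {mean_count:.2f}")
print(f"P(at least one incident): {p_at_least_one:.2f}")
print(f"expected loss (USD):      {expected_loss:,.0f}")
```

Averaging over many simulated windows gives an estimate of incident density, and the same runs yield an expected-loss figure. Running the sketch with different hypothetical rates per industry would give a per-industry comparison.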
 

Result

The solution was received positively by various investors right after its demonstration and earned a lot of encouragement for further development into a more complex product with extended functionality. With model precision of 83% and the implementation of Monte Carlo simulation techniques, the solution shows considerable potential and versatility.

TECHNOLOGY & TOOLS: MLlib, Python, R, Machine Learning

Platform: Web