Organizations have been leveraging internal data for years. With the explosion of digital and social media, they realized that by tapping all the data that streams into their businesses, they can significantly enhance the value they get from data. The insights from analytics and the ability of machines to crunch voluminous data have led to ‘data-driven’ decisions, whether the goal is setting strategy, enhancing customer experience, increasing revenue, developing efficient systems and processes, or managing risk.
Predictive Analytics – What’s the Problem?
Analytics itself has evolved. Predictive and prescriptive analytics have become mainstream. In the banking sector, scoring models help firms determine whether to extend a loan, how much credit to offer, and at what interest rate, based on an applicant’s profile, credit history, and repayment capacity, among many other parameters. Similarly, an ecommerce firm captures a consumer’s intent, the past purchases of the buyer and of similar buyers, past click-through behavior, and the SKUs at its disposal to show product matches that help the buyer complete the purchase.
The insights so generated are based on past data, and therein lies the problem. The models essentially work on the belief that the future is likely to play out as the past did: what was true yesterday will be true tomorrow.
Let’s take the case of a loan applicant who happens to enjoy an excellent credit score, positive net worth, and sufficient repayment capacity. The bank’s scoring model assumes that a good customer in the past is likely to be a good customer in the future. In all probability, the bank will extend the loan to the applicant.
However, in the real world, macro variables bring in a whole lot of uncertainty.
Improved Decision Making with External Data
It seems intuitively obvious that macroeconomic conditions have a bearing on outcomes. Let’s come back to the loan applicant, who happens to be a manufacturer of spares for the tractor industry. The applicant’s sales fluctuate with the end-use markets and the demographics of its customers. A study of the demographic data shows a steady migration of the young workforce from rural regions to cities. Will such a migration have an impact on the overall sales of tractors? Does it provide a leading indicator for farm equipment sales, with a positive bias (assuming a negative correlation between the availability of manpower and the sale of tractors)? If this hypothesis is true, the bank’s scoring models for long-term loans (4–7 years) could improve significantly by considering not only the micro factors of the applicant but also the leading macro factors.
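One way to probe the migration hypothesis is to check whether a macro series correlates with tractor sales a few periods later. The sketch below is a minimal illustration in Python, not a method used by any particular bank: the function names are hypothetical, and the two series are made-up numbers that exist only to show the mechanics.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (var_x * var_y) ** 0.5


def lagged_correlation(indicator, target, lag):
    """Correlate indicator[t] with target[t + lag] (lag in periods, >= 0)."""
    if len(indicator) != len(target):
        raise ValueError("series must have the same length")
    if lag < 0 or lag > len(indicator) - 2:
        raise ValueError("lag leaves fewer than two overlapping points")
    n = len(indicator) - lag
    return pearson(indicator[:n], target[lag:lag + n])


# Hypothetical yearly data, for illustration only (not real statistics).
rural_labour_index = [100, 98, 95, 91, 90, 86, 84, 80]  # manpower availability
tractor_sales = [50, 51, 50, 53, 55, 58, 59, 63]        # units sold, thousands

for lag in range(0, 4):
    r = lagged_correlation(rural_labour_index, tractor_sales, lag)
    print(f"lag {lag} years: correlation {r:.2f}")
```

A strongly negative correlation at some positive lag would be consistent with the hypothesis, though correlation alone does not establish that migration drives sales. With real data, the candidate indicator would still need to be validated out of sample before it enters a scoring model.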
Taking the example further, a loan applicant who is a tyre manufacturer could benefit from data on vehicle registrations in a given region. The tyre manufacturer knows that the average life span of a tractor tyre is 5–6 years. If x tractors were registered (across manufacturers) in that region 5 years ago, the manufacturer can, with some additional confidence, build a predictive sales model (PSM) to estimate market demand. The forecast can then be adjusted to produce various scenarios of the PSM under varying macroeconomic conditions.
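The tyre example can be sketched as a simple replacement-demand calculation. This is only an illustration under stated assumptions: the function names are hypothetical, the registrations are made-up numbers, each tractor is assumed to need four tyres, half of the tyres are assumed to be replaced after 5 years and half after 6, and the scenario multipliers are arbitrary placeholders for macroeconomic conditions.

```python
def replacement_demand(registrations, year, life_weights=None, tyres_per_tractor=4):
    """Estimate tyres due for replacement in `year`.

    registrations: dict mapping registration year -> tractors registered.
    life_weights: dict mapping tyre age in years -> share replaced at that age
                  (assumed 50/50 split between 5 and 6 years by default).
    """
    if life_weights is None:
        life_weights = {5: 0.5, 6: 0.5}  # assumption, not from the article
    tractors_due = sum(
        share * registrations.get(year - age, 0)
        for age, share in life_weights.items()
    )
    return tractors_due * tyres_per_tractor


def scenario_forecasts(base_demand, scenarios):
    """Scale the base forecast by one multiplier per macro scenario."""
    return {name: base_demand * factor for name, factor in scenarios.items()}


# Hypothetical registrations in one region (all manufacturers combined).
registrations = {2018: 12_000, 2019: 13_500, 2020: 11_000}

# Hypothetical macro scenarios expressed as demand multipliers.
scenarios = {"weak monsoon": 0.85, "baseline": 1.0, "strong farm income": 1.1}

base = replacement_demand(registrations, year=2025)
for name, demand in scenario_forecasts(base, scenarios).items():
    print(f"{name}: about {demand:,.0f} tyres")
```

In practice, the multipliers would themselves come from a model of the macro variables rather than being set by hand, and the replacement split would be estimated from observed tyre wear.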
George Box once said, "All models are wrong, but some are useful."
Incorporating external data into your model can make it more useful.