Prediction versus explanation

Neil Cantle, principal at Milliman, poses the question ‘What do you want most from big data?’ and offers some answers

Modern business is highly complex, and the globally interconnected nature of “everything” means trends can emerge quickly and disappear just as fast. Businesses used to be able to rely upon persistent trends in consumer behaviours and supplier dynamics, for example, but this is no longer the case, and the pace of change is only increasing.

Most businesses have to manage a network of interconnected parts that helps them to build their products or services and reach their customers. The ability to anticipate the behaviours of key elements of this network, such as suppliers and customers, is rapidly moving from “advantageous” to “necessary”. Resilience requires the ability to adapt, and good indicators of what might happen in the future aid preparedness. Big data has an important role to play here.

So, what is big data all about? Well, it depends. To some, it is simply about applying new processing techniques that enable you to run queries over very large datasets. This can be a useful thing to do, but the key point is: “What questions does your analysis seek to answer?”

There are two main responses. The first is “prediction” – trying to find reliable similarities between the behaviours of some subset of factors in your “big” dataset and the outcome you want to predict. This can be useful if the relationships uncovered happen to make sense and persist over time. But it is always possible to find some variables somewhere that, for a period of time at least, behave similarly to the one you are interested in, despite there being no real relationship between them whatsoever. Eventually that apparent relationship disappears and your “predictions” are suddenly not very good, but you don’t know why.
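This effect is easy to demonstrate with a toy simulation (a hypothetical sketch, not anything from the article): scan a large set of purely random series for the one that best “predicts” a target over one period, then check how that same series performs on fresh data.

```python
import numpy as np

rng = np.random.default_rng(0)

# A target series and 1,000 entirely unrelated random series (our "big" data).
target = rng.normal(size=200)
candidates = rng.normal(size=(1000, 200))

# Correlate each candidate with the target over the first 100 observations
# and pick the one that looks like the strongest "predictor".
in_sample = np.array(
    [np.corrcoef(c[:100], target[:100])[0, 1] for c in candidates]
)
best = np.argmax(np.abs(in_sample))

# The winner looks impressive in sample purely by chance...
print(f"in-sample correlation:     {in_sample[best]:+.2f}")

# ...but with no real relationship, it fades on the next 100 observations.
out_sample = np.corrcoef(candidates[best, 100:], target[100:])[0, 1]
print(f"out-of-sample correlation: {out_sample:+.2f}")
```

With enough candidate variables, something will always correlate well with the outcome for a while; only a genuine underlying relationship survives out of sample.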

The second type of analysis is arguably more satisfying – seeking “explanation”, not just “prediction”. Studying large sets of information to learn about the underlying mechanism driving the outputs you see brings insight and meaning, helping you to understand “why” things are related, not just that they move in apparently similar ways. Of course, this second path is not as easy.

The study of complex systems tells us that historical development is one important factor in understanding the possible future paths they might follow. In a complex, dynamic environment, past data can be of only limited help in identifying emerging trends, because those trends are not yet sufficiently developed in the data, and small discrepancies in our understanding soon compound into large ones.

However, those new trends are things that people might have imagined or begun to suspect. This knowledge is often captured in an unstructured form (media, discussions, images, and so on). New techniques are already beginning to blend the power of data-driven analytics with expert judgments about future developments – machines effectively learning with the benefit of human wisdom, failings and imagination, as well as theories developed from vast amounts of historical data. This creates an interesting relationship: machines learning from us and providing insights that we, in turn, learn from, generating new information to feed back into the machines.
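One simple way such blending can work – a minimal sketch of my own, not a technique the article specifies – is a Bayesian update, where an expert’s prior judgment about a quantity is combined with sparse data to produce a posterior view. All the numbers below are hypothetical.

```python
import numpy as np

# Expert judgment, expressed as a normal prior: growth of about 5%,
# held with moderate confidence (standard deviation 2%).
prior_mean, prior_var = 0.05, 0.02**2

# Sparse recent observations (hypothetical), with assumed noise of 3%.
data = np.array([0.08, 0.07, 0.09])
data_var = 0.03**2

# Conjugate normal-normal update: precisions (inverse variances) add,
# and the posterior mean is a precision-weighted blend of both sources.
n = len(data)
post_var = 1.0 / (1.0 / prior_var + n / data_var)
post_mean = post_var * (prior_mean / prior_var + data.sum() / data_var)

print(f"expert prior: {prior_mean:.3f}")
print(f"data average: {data.mean():.3f}")
print(f"blended view: {post_mean:.3f}")
```

The posterior sits between the expert’s prior and the raw data average, leaning towards whichever source is more precise – a small-scale version of machines learning “with the benefit of human wisdom” rather than from historical data alone.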

The era of big data is taking us into a new paradigm of decision-making and learning, and there is no going back. It will be interesting to see where it takes us, but hopefully it will be more about insight and explanation than black-box predictions.