“Computers are useless. They can only give you answers,” said artist Pablo Picasso back in 1968.
Forty-five years on, his point is still valid. Advances in information technology mean that computers can now answer more complex questions faster, but they still rely on humans to supply the questions to answer and the tasks to perform.
These days, the people asking the most interesting questions of big data are data scientists. They are highly trained, super-smart people, typically from a maths or statistics background and usually with a track record in university-based research to doctoral level and beyond. They also have the computer programming skills required to build massively complex analytical models of both structured and unstructured data.
They are a different breed, however, from those specialists who focus primarily on the technology aspects of big data: the experts in implementing Hadoop and NoSQL platforms for analysis, for example, or in writing Hive queries to run against big data stores.
In fact, data science has historically been a pretty arcane field, largely confined to university laboratories and commercial R&D departments. But, suddenly, data scientists are in huge demand among all sorts of commercial organisations, says Gavin Badcock, director at recruitment company Esynergy Solutions. These include banks, telcos, retailers, manufacturers and life sciences companies.
Working as a freelance contractor, an experienced data scientist can command a day rate of between £650 and £900. Daily rates of £1,000, meanwhile, “are not unheard of in this market”, Mr Badcock adds, and permanent salaries range between £100,000 and £170,000.
A lesser role, such as a data analyst responsible for cleansing and structuring data and for writing programs in languages such as R or Python to produce reports and graphs for business users, might fetch £350 to £400 a day, or a permanent salary of £40,000 to £70,000.
Those wages need to be viewed, however, within the context of the service that data scientists perform. “If you think of the value these people have the potential to create for the companies they work for, these rates of pay are actually a drop in the ocean, they’re nothing really,” Mr Badcock says.
“Data scientists have the skills and expertise to explore customer behaviour patterns, for example, so that companies can invent new products or launch new services that they know their customers will buy and they can be confident of selling those products in greater volumes, making loads more money in the process.
“In a sense, data scientists are the new inventors. Employers who recognise that fact are happy to pay high salaries. They see it as a sensible investment to make.”
But there’s more to the job satisfaction that comes from being a data scientist than getting a fat pay cheque. “For me, it’s about trying to understand customers and helping our clients give customers what they want,” says Giles Pavey, chief data scientist at customer science specialist DunnHumby, the company best known for its work with supermarket giant Tesco on its Clubcard loyalty programme.
“I can look anyone in the eye and say that the work I do is absolutely not about manipulating people; it’s about better understanding and meeting their needs,” he says.
Mr Pavey heads a team of 30 data scientists at DunnHumby, the vast majority of whom have PhDs in maths, statistics, physics or computer science and are comfortable coding in several different computer languages.
It’s an environment where intellectual rigour and a strong commitment to problem-solving prevail. “I love the ‘hackathon’ culture in data science, where a team will work all day and all night on a difficult problem, and solving it gives an enormous sense of achievement,” he says. “The work is intellectually stimulating but, at the same time, you’re working on real-life problems. You’re a customer yourself. You understand the context.”
The problems that data scientist Fran Bennett prefers to work on, meanwhile, are social and environmental ones. After working in data science roles at internet companies Google and Ask.com, she teamed up last year with a friend, Bruce Durling, to establish data science consultancy Mastodon C.
“We wanted to focus on areas we feel are generally underserved by big data expertise so, wherever we get the chance, we like to work with charities and the public sector,” says Ms Bennett. Another area of expertise is sustainability data, and the company conducts all its data processing in zero-carbon data centres as a point of principle.
“There are plenty of data science jobs in internet companies and investment banks, but we wanted to do something more entrepreneurial that also makes a meaningful contribution to issues we think are important,” she adds.
In a recent project, conducted in collaboration with another start-up, Open Health Care UK, and science writer Ben Goldacre, Mastodon C analysed a vast dataset relating to the prescription by UK GPs of cholesterol-lowering statin drugs.
By better understanding the regional patterns in which branded or generic statins were prescribed, the research was able to show where in the country GPs were prescribing more expensive statins unnecessarily, and how much excess cost that meant for the NHS in 2011-12: around half its total statins bill of £400 million.
The skill of the data scientist lies in their ability to develop the techniques that make patterns in big data stand out and then explore these patterns more deeply to gain new insights.
In that respect it’s a bit like detective work, says Michael Natusch, head of data science for Europe, the Middle East and Africa at Pivotal, the big data spin-off company originally formed as a collaboration between technology vendors EMC and VMware. A former Capgemini consultant, Mr Natusch founded his own data science consultancy, Cumulus Analytics, before joining Pivotal earlier this year.
There’s a real pleasure, he says, in playing around with algorithms to see how the same techniques might be applied to different use cases in different industries. In a recent Pivotal project, for example, Mr Natusch and his team took a technique that the company’s bioinformatics team had originally developed to model human genomic data and applied it to a new project that aims to improve the in-car navigation systems being developed by a German automotive brand.
But the best feeling by far, he says, comes from stumbling across the unexpected secrets that big data can hold, preferably something that no other data scientist has unearthed before. “That’s a real thrill,” he says. “And when it happens, I honestly think I have the best job in the world.”