Predictive coding technology can slash workloads and costs through computerised sifting of the ever-increasing number of documents and other information sources.
We live in the era of information overload. Every day, we generate more and more data, much of it meaningless. With the click of a button, you can e-mail 50 people with a 50-megabyte attachment. We rack up data on our PCs, laptops, smartphones and tablets. According to IBM, we create 2.5 quintillion bytes of data every day – and 90 per cent of the data in the world was actually created in the last two years. But we rarely bother to delete any of it.
In our day-to-day lives, this does not really matter. Who cares how much useless information we are storing? But for corporates faced with a lawsuit, it suddenly becomes a problem. Within that mushrooming expanse of data, there will be files that must be handed over to the opposing party through the legal process of disclosure, and finding these is an increasingly daunting task.
The problem is often compounded in the corporate world by regulatory rules that prevent many files from being deleted. Investment firms, for example, are obliged to retain large amounts of data, including mobile phone calls made by traders. Indeed, much of the data stored these days is not in traditional document form at all, but is video, audio and text, increasingly held on social media.
In terms of disclosure, the data being generated on Facebook, Twitter, Instagram, LinkedIn and others is the next big challenge.
Chris Dale, founder of the e-Disclosure Information Project, says: “Lawyers have just got their heads round the fact that e-mail and Word files are discoverable, but they have not yet applied their minds to all the non-traditional data sources. Even if they are thinking about social media, they are only looking at their duty to disclose it, but are not seeing its potential value as evidence.”
According to Mr Dale, this type of data could be important to the litigation strategy, for example if a witness claimed not to have been at a certain place, but a photo uploaded to Facebook from their smartphone suggests otherwise.
“It might be unlikely to turn the case, but it could be useful, for example in undermining the credibility of a witness,” he says.
Technology created the problem of suffocating data, but it also holds the solution. An e-disclosure technique known as predictive coding can reduce the disclosure pile from what could be millions of files down to a manageable number.
As Jonathan Maas, senior director at e-disclosure consultancy Huron Legal, explains, predictive coding technology is like an “eager puppy”. It first completes a series of “training runs” on smallish samples of documents – not more than 2,000 – in which a lawyer will tell it what is important and what to discard. When it is ready, the lawyer then throws it a bone and off the puppy bounces to perform the same trick across the entire data set.
The end result is a manageable parcel of files, neatly tied with a metaphorical bow and presented to the lawyers to be reviewed. Contrary to misconception, predictive coding does not mean handing any documents to a litigation opponent without a lawyer having eyes on them first.
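For readers who want to see the "training run" idea in miniature, here is a rough sketch in Python. It is not how any commercial e-disclosure product works; it assumes a simple naive Bayes word-count model, and the function names and sample documents are invented for illustration. A reviewer codes a small sample as relevant or not, and the model then scores every document in the full set.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def train(labelled_docs):
    """Build word counts from (text, is_relevant) pairs coded by a reviewer.

    Assumes the sample contains at least one relevant and one irrelevant document.
    """
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in labelled_docs:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def relevance_score(model, text):
    """Return log-odds of relevance; a positive score means 'probably relevant'."""
    word_counts, doc_counts, vocab = model
    score = math.log(doc_counts[True] / doc_counts[False])
    totals = {label: sum(word_counts[label].values()) for label in (True, False)}
    for word in tokenize(text):
        if word not in vocab:
            continue  # words never seen in training carry no evidence
        # Add-one smoothing so an unseen word/label pair never gives zero
        p_rel = (word_counts[True][word] + 1) / (totals[True] + len(vocab))
        p_irr = (word_counts[False][word] + 1) / (totals[False] + len(vocab))
        score += math.log(p_rel / p_irr)
    return score


# Hypothetical training sample coded by a lawyer
training = [
    ("pricing agreement with supplier signed", True),
    ("revised supplier contract terms attached", True),
    ("lunch menu for friday", False),
    ("office party photos attached", False),
]
model = train(training)

# Hypothetical full data set: only positively scored files go to the review pile
full_set = [
    "draft contract pricing for new supplier",
    "friday party lunch plans",
]
review_pile = [doc for doc in full_set if relevance_score(model, doc) > 0]
```

In this toy run only the supplier contract document lands in the review pile, and a lawyer would still read it before anything is disclosed.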
For the lawyers, it does involve a leap of faith because the initial sifting has been done by computer, rather than the traditional team of exhausted fee-earners in crumpled suits. But lawyers have a duty to keep their costs proportionate, which means examining every document by hand is simply not an option now that the volume of data has been supersized.
As Vince Neicho, litigation support manager at City law firm Allen & Overy, puts it: “Using technology means that you will miss documents, but then so will a fatigued lawyer sitting in a room.”
Litigation may have been the driver behind this new technology, but there is growing recognition it could be a handy tool in many other fields; basically any task that involves pulling information from very large amounts of data.
Mr Maas explains: “The tech could be used in a number of theatres of war; for example, investigations by regulatory authorities. It could help in internal investigations – say, for insider dealing or IT theft. Or it could simply be used for information governance generally, for example where an organisation may have countless copies of the same thing. At the very least, you could use it to identify all the duplicates, storing the master document in a clearly labelled way, and get rid of all the rest.
“One of the big by-products of litigation is that you always end up with a spanking clean filing system with all your data in order.”
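The duplicate hunt Mr Maas describes can be illustrated with a short Python sketch. It assumes that "duplicate" means byte-for-byte identical content, detected by comparing SHA-256 hashes; real tools may also catch near-duplicates, which this does not. The file paths and contents are made up.

```python
import hashlib
from collections import defaultdict


def find_duplicates(files):
    """Group identical files.

    files: mapping of path -> file content as bytes.
    Returns {master_path: [paths of exact copies]}; the master is simply
    the alphabetically first path, an arbitrary choice for this sketch.
    """
    by_digest = defaultdict(list)
    for path in sorted(files):
        digest = hashlib.sha256(files[path]).hexdigest()
        by_digest[digest].append(path)
    return {paths[0]: paths[1:] for paths in by_digest.values() if len(paths) > 1}


# Hypothetical file store
files = {
    "a/report.doc": b"Q3 figures",
    "b/report_copy.doc": b"Q3 figures",
    "c/notes.txt": b"meeting notes",
}
duplicates = find_duplicates(files)
```

Here `duplicates` maps `a/report.doc` to its single copy, `b/report_copy.doc`, which could then be archived or deleted subject to any retention rules.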
In the mergers and acquisitions field, if you have purchased a company, you will normally acquire a large amount of its data, often uncategorised and unsorted. E-disclosure technology can be deployed to find any intellectual property value tucked away in that mass and also the business risks that might be lurking within.
In situations where you know something is not quite right, but you do not know exactly what you are looking for, the latest clustering technology, which can be based on concepts rather than keywords, can provide the solution.
“Say you have bought a company that operates in Russia, which is a high-risk area in terms of the Bribery Act,” says Mr Dale. “You have 15 salesmen out there and it’s a good idea to find out what they are up to. Or say secrets are leaking out of your organisation or you start to think that something doesn’t smell right in a branch office. This is where clustering can help.”
Clustering is a way of grouping documents together according to their content, to create a high-level visual map of brightly coloured clusters. These can then be dismissed or, if something looks out of place, investigated further until you reach document level.
Mr Dale gives the concept of “Labrador” as an example. The technology would separate data into clusters about a place in north-east Canada, a dog and the Spanish word for worker. If it finds a lot of things about dogs, it will then group these again, say into dog, canine, Labrador, poodle and so forth.
“That’s an example of it telling you it has found a lot of documents that are similar documents, derived from the text within them and from the metadata, not from words you have fed it,” he explains.
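A heavily simplified Python sketch gives a flavour of how documents can group themselves without being fed keywords. It assumes plain word overlap (cosine similarity between word counts) and a single greedy pass, which is far cruder than the concept-based clustering described above; the threshold and sample documents are illustrative only.

```python
import math
import re
from collections import Counter


def vectorise(text):
    """Turn a document into a bag of lower-cased word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0 = nothing shared)."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster(docs, threshold=0.3):
    """Greedy one-pass clustering.

    Each document joins the first cluster whose founding document is
    similar enough; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of document indices.
    """
    clusters = []
    for i, doc in enumerate(docs):
        vec = vectorise(doc)
        for members in clusters:
            if cosine(vec, vectorise(docs[members[0]])) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical documents: "Labrador" appears in two different contexts
docs = [
    "Labrador puppy training and dog food",
    "ferry timetable for the Labrador coast of Canada",
    "dog food order for the poodle and the puppy",
    "Canada coast weather and ferry delays",
]
groups = cluster(docs)
```

With these four documents the dog-related texts (0 and 2) fall into one group and the Canadian coast texts (1 and 3) into another, even though "Labrador" appears on both sides.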
The more technology forms a part of our work and personal lives, the faster the data mass will expand and multiply, at least until we learn to store it in a more organised way and delete duplicated or obsolete files. But the ability of e-disclosure technology to handle those volumes is growing too, boosted by ever-increasing processing power. Having begun in the litigation context, it is about to spread its wings far wider.
As Mr Neicho concludes: “This problem is not going to go away, so we need to deal with it.”
THE RIGHT TO BE FORGOTTEN
Next month sees the deadline for European Union member states to reach agreement on a measure that could affect businesses across Europe: the General Data Protection Regulation.
The new regulation is a much-needed update, given that Europe is currently operating under a set of rules created in 1995 when there were only around 23,500 websites on the internet, and social media and cloud computing did not even exist. The final version of the rules looks set to be agreed in December, following many years of consultation, and will become law in two years’ time.
It will apply not just to EU-based companies, but any business – including US corporates – that “touches” the data of an EU citizen.
Focus is very much on the data privacy rights of the citizen. But some are concerned about the potential impact on business, particularly relating to the so-called “right to be forgotten” contained in article 17 of the rules, allowing EU individuals to demand the erasure of their personal data.
David Moseley from Veritas explains: “The regulation needed to happen. It will harmonise how we work together, but organisations need to improve their information management, governance and discovery of data or their IT department will become the bottleneck if still using manual processes to provide requested information.
“You need to develop systems to remove personally identifiable information, and streamline your retention and classification policies. A company could be inundated with requests under the proposed ruling, and if you’re dealing with legacy archives and fragmented locations, the IT department could easily be buried.”
This is where e-disclosure technology, with its ability to search through a galaxy of data at warp speed, could make all the difference.
Mr Moseley adds: “E-discovery tools will become a critical business competitive edge. It is about having an automated workflow. If you have 100 people asking for the same thing, why have a manual process?
“IT departments are being expected to do more with less. Unless you bring in the e-tools, unfortunately you will suffer the consequences.”