Teaching computers to see the world like us

Building computers with enough intelligence to perform tasks as complex as those carried out by the human body is, unsurprisingly, difficult. As we now strive to bring the advantages of computer vision to modern IT systems, we face some tough challenges in teaching computers to actually “see” the world around us.

Over and above near-perfect levels of speech recognition and so-called natural language understanding, we have also fine-tuned optical character recognition (OCR) tools and text-to-speech technologies. But despite massive progress in image processing and object recognition, computer vision remains a comparatively wild frontier in terms of tech intelligence.

Fei-Fei Li, computer scientist and director of Stanford Vision Lab, sums up the complicated nature of human vision and its intersection point with computer vision: “Just like to hear is not the same as to listen, to take pictures is not the same as to see.” Getting computers to really “see” what they are looking at is a lot more complex than capturing an image, in any scenario.

Building eyes in our machines

To appreciate how computer vision will now develop, we first need to know how computers process visual information and understand how this differs from the core functions of the human eye and human brain. David Talaga, senior product manager at data integration company Talend, points out that computer vision systems don’t learn in the same way as humans. He says the process is much more data intensive.

“A human infant, for example, might only need to see four or five dogs to be able to recognise in the future that a newly sighted animal is also a dog. Training a computer to recognise a dog in an image, and eradicate false positives, is likely to require large data sets of hundreds of thousands of images,” says Mr Talaga.

There’s only so much we can do to bring down this reliance on data if we want to maintain high levels of accuracy in computer vision systems and keep errors to an absolute minimum. Deeper still, we have to ensure data sets have the highest possible levels of integrity, so that they have not been compromised and do not fall short of appropriate data governance processes.

“One advantage of the deep-learning through exhaustive data ingestion method is that if we expect that a human child will make mistakes – for example, it may mistake a fox for a dog – then we wouldn’t expect a highly trained computer vision system to make such a basic error. In fact, there are plenty of applications where such an error would prove actively dangerous. Consider an autonomous vehicle which misclassifies a cyclist as a motorcyclist and makes incorrect assumptions about how that road user will behave as a result,” says Mr Talaga.

Human eye limitations

As we now seek to engineer vision technology and computer vision algorithms into our lives, we need to understand our own limitations to decide where we should apply these technologies. Luka Crnkovic-Friis, chief executive of operational artificial intelligence (AI) platform provider Peltarion, says humans are good generalists when it comes to vision, but aren’t great when it comes to specialised visual tasks.

“For example, it takes many years of training for a radiologist to be able to identify a tumour in an MRI scan accurately. Currently, many hospitals are sitting on image backlogs. In these understaffed facilities, scans may not be analysed for weeks. Tumours can actually move in this time, rendering treatment ineffective and causing unnecessary exposure to radiation. Deep neural networks, on the other hand, can be trained on a specialised data set and can exceed human accuracy while being orders of magnitude faster,” says Mr Crnkovic-Friis.

He points out that computer vision can be applied to machinery, water pipes or any type of hardware to detect signs of wear and tear. We have even seen the use of satellite images to identify areas of crop disease before they spread.

“Computer vision is already present everywhere in our lives, from driving your car, to using your favourite search engine, or looking to buy something from a retailer online,” says Mr Crnkovic-Friis.

“Generally speaking, the typical application of this technology is tasked with similarity matching (if you’ve indicated that you like a particular sock, a computer vision system will find similar socks and display them), classification tasks (when you search your photo library for ‘dog’, it finds all images with dogs) and semantic segmentation (an image taken from the camera of a car needs to separate out the road, other cars and so on).”
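To make the first of those tasks concrete, here is a minimal sketch of similarity matching in Python. It assumes each product image has already been reduced to a short list of numbers (a feature vector) by some vision model; the vectors, the product names and the helper functions below are all invented for illustration and are not taken from any system mentioned in this article. Cosine similarity is one common way of comparing such vectors, used here as an assumption rather than as the method any particular retailer relies on.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, catalogue, top_k=2):
    """Rank catalogue items by similarity to the query vector, best first."""
    ranked = sorted(
        catalogue.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]


# Hypothetical feature vectors; a real system would get these from a trained model.
catalogue = {
    "striped_sock": [0.9, 0.1, 0.3],
    "plain_sock": [0.8, 0.2, 0.1],
    "wool_hat": [0.1, 0.9, 0.4],
}

# Hypothetical vector for the sock a shopper has just "liked".
liked = [0.85, 0.15, 0.25]

print(most_similar(liked, catalogue))
```

With these made-up numbers the two socks rank ahead of the hat, which is the behaviour the sock example describes. Classification and semantic segmentation follow the same broad idea of turning pixels into numbers, but assign a label to the whole image or to each pixel respectively.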

Computer vision you already use

Jabe Wilson, consulting director of text and data analytics at scientific and medical information company Elsevier, says computer vision has been used for many years in a wide range of use cases. Computers, Dr Wilson notes, have been reading handwritten UK postcode characters for some years now.

In addition, he points to the use of systems to identify people in CCTV records, OCR systems that read number plates to help locate where you parked in a car park, as well as technologies for scanning a house key and cutting a new one in a hardware store vending machine. Deep-learning computer vision techniques can also be used to represent non-visual data in image form, for example, when tagging music on a streaming service or building a map to predict the weather.

In the immediate future, we can look forward to new smart product offerings as AI works in tandem with computer vision to help automate the mundane tasks in our lives. Computers will never really see like we do, but their ability to classify the world around us into definable shapes and textures is becoming more advanced every day.