
About

The Yale Ledger is a student-led magazine showcasing content from around the Yale community.


Computer Vision, Then And Now

Artificial intelligence is one of the most widely discussed and debated technologies of the 21st century. Within that rapidly evolving field, the most exciting developments are happening under the umbrella of "computer vision."

When a human being sees something, the raw light entering the eyes constitutes only the first layer of meaning. The brain quickly turns that data into individual objects and scenes, allowing us to label and concretely identify what we are seeing. Through this process, a person can determine where they are, what object they are interacting with, who they are talking to, and so on. This highly complex act of interpretation happens instinctively, without much conscious thought.

Computer vision is a branch of scientific research that seeks to give computers this same ability to interpret and derive meaning from videos or images.
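To make that goal concrete, here is a toy sketch (not any real vision system) of the starting point: to a computer, an image is only a grid of numbers, and "deriving meaning" begins with simple operations on those numbers, such as finding where a bright object sits by thresholding pixel values.

```python
# Toy illustration: an image is just a grid of brightness values.
# We "find" an object by locating pixels brighter than a threshold
# and reporting their bounding box. Real systems go far beyond this,
# but they all begin with raw numbers like these.

image = [  # 0 = black, 9 = white
    [0, 0, 0, 0, 0, 0],
    [0, 0, 8, 9, 0, 0],
    [0, 0, 9, 9, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

def find_object(img, threshold=5):
    """Return (top, left, bottom, right) of pixels brighter than threshold."""
    coords = [(r, c) for r, row in enumerate(img)
              for c, v in enumerate(row) if v > threshold]
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(rows), min(cols), max(rows), max(cols))

print(find_object(image))  # -> (1, 2, 2, 3)
```

Everything the field has achieved since, from face recognition to self-driving cars, builds on increasingly sophisticated versions of this pixels-to-meaning step.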

Research into computer vision began in earnest in the 1960s, as part of a larger effort to develop machines that thought and reasoned like human beings. Researchers were incredibly optimistic during this period, making grand claims that artificial intelligence would equal any human being within a decade or so. Unfortunately, they underestimated the difficulty of the task, and despite several breakthroughs, interest and funding largely dried up by the mid-1970s due to a lack of tangible results.

It was not all grim, though. The 1970s saw the introduction of optical character recognition (OCR) technology, which can distinguish printed or handwritten text characters inside digital images, such as scanned documents. It soon found its way into a variety of applications: recognizing license plates, processing paperwork, and automatically translating text, among others. In 1979, Japanese researcher Kunihiko Fukushima proposed an artificial neural network (initially intended for handwriting recognition) that would become the inspiration for the neural networks behind today's computer vision systems.
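The core idea behind that early OCR can be sketched in a few lines (this is a deliberately toy version: each character has a known pixel pattern, and an unknown glyph gets the label of the closest match; real OCR engines must also handle fonts, noise, and page layout):

```python
# Toy OCR by template matching: label each glyph with the character
# whose reference bitmap differs from it in the fewest pixels.

GLYPHS = {  # 3x5 bitmaps (1 = ink, 0 = blank) for a few characters
    "H": [[1,0,1],[1,0,1],[1,1,1],[1,0,1],[1,0,1]],
    "I": [[1,1,1],[0,1,0],[0,1,0],[0,1,0],[1,1,1]],
    "T": [[1,1,1],[0,1,0],[0,1,0],[0,1,0],[0,1,0]],
}

def read_glyph(bitmap):
    """Return the character whose template differs in the fewest pixels."""
    def diff(a, b):
        return sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return min(GLYPHS, key=lambda ch: diff(bitmap, GLYPHS[ch]))

def read_line(glyph_bitmaps):
    """'Scan' a sequence of glyph bitmaps into a string."""
    return "".join(read_glyph(g) for g in glyph_bitmaps)

print(read_line([GLYPHS["H"], GLYPHS["I"]]))  # -> HI
```

Because the match is "closest," not "exact," a glyph with a few flipped pixels still gets the right label, which is what made OCR usable on imperfect scans.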

Apart from a brief revival during the 1990s, interest in artificial intelligence (and computer vision along with it) entered a fallow period, until the 1997 victory of the chess-playing computer Deep Blue over Grandmaster Garry Kasparov reignited hopes of a reasoning machine.

Progress in computer vision carried on from there. In 2009, researchers released ImageNet, a massive online database containing millions of labeled images across a wide variety of subjects, to help train vision-based neural networks.

Computer vision is now one of the most rapidly evolving fields of computing. For those curious about its current state (and that of artificial intelligence in general), the YouTube channel Two Minute Papers is an excellent way to keep up with the latest developments. Watching the almost miraculous demonstrations on the channel, it becomes hard to remember that this is not magic but very real technology. Feats now possible include generating convincing images from text descriptions, converting an image into a variety of different art styles, filling in the missing portions of an incomplete image, and creating 3D models of objects from a single photo. Such feats are possible only because computers can now recognize and categorize the different objects they see.

While these examples are certainly entertaining, the real game-changer will come when we rely on such systems in our day-to-day lives. Case in point: self-driving cars. Early versions of these vehicles are already being developed by almost every tech company and carmaker you can think of. They use sensors that constantly scan the road, keeping the car in its lane without human input and automatically steering or braking to avoid imminent collisions.

Car accidents are one of the most common causes of accidental death in modern society. Humans are easily distracted and prone to taking unnecessary risks, such as driving while inebriated or sleepy. A computer driver can react to situations faster, and it never gets tired or distracted. Automated driving could save thousands of lives by making one of the major causes of unnecessary death in our age far less likely.

Many cynics and naysayers are already bemoaning this coming revolution in AI-generated imagery. They complain that such technology will make photographic and video evidence inadmissible in court, and that impersonation and identity theft will be easier than ever. While some of these fears are valid (the rise of deepfakes certainly shows that such technology won't only be used for good), they overlook the fact that law enforcement can just as easily use it to protect and serve. While deepfakes are now a common sight online, deepfake-detecting technology has also been developed in response.

Like any other technology, artificial intelligence and computer vision are not inherently good or bad. It is up to us, the public, to encourage positive and productive uses of the technology while calling out and punishing destructive ones. Managed properly, it could be a game changer on par with the emergence of the internet.

