The Benefits of Peripheral Vision for Machines | MIT News
Maybe computer vision and human vision have more in common than meets the eye?
Research from MIT suggests that a certain type of robust computer vision model perceives visual representations much as humans do when using peripheral vision. These models, known as adversarially robust models, are designed to overcome subtle bits of noise that have been added to image data.
The way these models learn to transform images is similar to some elements involved in human peripheral processing, the researchers found. But because machines do not have a visual periphery, little work on computer vision models has focused on peripheral processing, says senior author Arturo Deza, a postdoctoral fellow at the Center for Brains, Minds, and Machines.
“It seems like peripheral vision, and the textural representations that take place there, have been shown to be pretty useful for human vision. So our thought was, OK, maybe there might be uses in machines, too,” says lead author Anne Harrington, a graduate student in the Department of Electrical Engineering and Computer Science.
The results suggest that designing a machine learning model to include some form of peripheral processing could allow the model to automatically learn visual representations that are robust to certain subtle manipulations in the image data. This work could also help shed light on the goals of peripheral processing in humans, which are still not well understood, Deza adds.
The research will be presented at the International Conference on Learning Representations.
Humans and computer vision systems both have what is known as foveal vision, which is used to scrutinize objects in fine detail. Humans also possess peripheral vision, which is used to organize a broad spatial scene. Typical computer vision approaches attempt to model foveal vision (which is how a machine recognizes objects) and tend to ignore peripheral vision, Deza says.
But foveal computer vision systems are vulnerable to adversarial noise added to image data by an attacker. In an adversarial attack, a malicious agent subtly modifies an image so that every pixel is altered ever so slightly; a human wouldn’t notice the difference, but the noise is enough to fool a machine. For example, an image might look like a car to a human, but if it has been corrupted by adversarial noise, a computer vision model may confidently misclassify it as, say, a cake, which could have serious implications for an autonomous vehicle.
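The article does not name a specific attack. One well-known example of how such imperceptible noise can be constructed is the fast gradient sign method (FGSM); below is a minimal sketch on a hypothetical toy linear classifier (the classifier, the image size, and the step size `epsilon` are all illustrative assumptions, not details from the research):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def fgsm_perturb(x, W, true_label, epsilon=0.01):
    """FGSM sketch: nudge every pixel of x by +/- epsilon in the
    direction that increases the classifier's loss on the true label."""
    p = softmax(W @ x)
    # gradient of cross-entropy loss w.r.t. the logits, then the input
    grad_logits = p.copy()
    grad_logits[true_label] -= 1.0
    grad_x = W.T @ grad_logits
    # each pixel moves by at most epsilon -- imperceptible to a human
    return np.clip(x + epsilon * np.sign(grad_x), 0.0, 1.0)

rng = np.random.default_rng(0)
x = rng.random(64)                  # a tiny 8x8 "image", flattened
W = rng.standard_normal((10, 64))   # toy 10-class linear classifier
x_adv = fgsm_perturb(x, W, true_label=3, epsilon=0.01)
# x_adv looks identical to x to a human, yet can change the model's label
```

Even though no pixel moves by more than `epsilon`, the perturbation is aligned with the model’s loss gradient, which is what makes it so much more damaging than random noise of the same magnitude.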
To overcome this vulnerability, researchers conduct what is known as adversarial training, where they create images that have been manipulated with adversarial noise, feed them to the neural network, and then correct its mistakes by relabeling the data and retraining the model.
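The loop described above can be sketched in a few lines. This is a hedged illustration on a toy logistic-regression model, not the training setup used in the research: at each step the inputs are adversarially perturbed (here with an FGSM-style step), and the model is then updated against the perturbed inputs with their original, correct labels:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adversarial_train(X, y, epochs=200, lr=0.1, epsilon=0.05, seed=0):
    """Adversarial-training sketch for a toy logistic-regression model.

    Each epoch: (1) craft a worst-case perturbation of every input,
    (2) take a gradient step on the perturbed inputs with the ORIGINAL
    labels, so the model learns to ignore the noise.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(X.shape[1]) * 0.01
    for _ in range(epochs):
        # step 1: perturb inputs in the direction that raises the loss
        grad_x = np.outer(sigmoid(X @ w) - y, w)      # dLoss/dX per row
        X_adv = X + epsilon * np.sign(grad_x)
        # step 2: train on the perturbed batch, labels unchanged
        grad_w = X_adv.T @ (sigmoid(X_adv @ w) - y) / len(y)
        w -= lr * grad_w
    return w

# two well-separated 2-D clusters as stand-in "images"
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])
w = adversarial_train(X, y)
acc = float(((sigmoid(X @ w) > 0.5) == y).mean())
```

The key design point is that the labels are never changed to match the perturbed inputs; the model is forced to produce the clean answer on noisy data, which is what yields robustness.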
“Just doing this extra process of relabeling and training seems to give a lot of perceptual alignment with human processing,” says Deza.
He and Harrington wondered whether these adversarially trained networks are robust because they encode representations of objects that are similar to human peripheral vision. So they designed a series of psychophysical human experiments to test their hypothesis.
They started with a set of images and used three different computer vision models to synthesize representations of those images from noise: a “normal” machine learning model, one that had been trained to be adversarially robust, and one that had been specifically designed to account for some aspects of human peripheral processing, known as Texforms.
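Synthesizing a representation “from noise” typically means starting with random pixels and optimizing them until the model’s internal features match those of the target image. The sketch below illustrates that idea under strong simplifying assumptions: the “model” is a fixed random linear feature map rather than a trained network, and the optimizer is plain projected gradient descent:

```python
import numpy as np

def synthesize(target, feature_map, steps=500, lr=0.2, seed=0):
    """Synthesize an image from noise so that its features under a model
    match the target image's features (a minimal sketch of model-metamer
    synthesis; feature_map is a stand-in for a trained network).
    """
    rng = np.random.default_rng(seed)
    target_feats = feature_map @ target
    x = rng.random(target.size)          # start from pure pixel noise
    for _ in range(steps):
        # gradient of || F x - F target ||^2 with respect to the pixels
        grad = 2.0 * feature_map.T @ (feature_map @ x - target_feats)
        x = np.clip(x - lr * grad, 0.0, 1.0)   # keep pixels in range
    return x

rng = np.random.default_rng(42)
target = rng.random(64)                  # flattened 8x8 target "image"
F = rng.standard_normal((16, 64)) / 8.0  # under-complete feature map
x_syn = synthesize(target, F)
err = float(np.linalg.norm(F @ x_syn - F @ target))  # feature mismatch
```

Because the feature map is under-complete (16 features for 64 pixels), many different pixel arrays share the same features; the synthesized image matches the target in feature space without matching it pixel for pixel, which is exactly what makes such stimuli useful for probing what a model, or the periphery, preserves and discards.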
The team used these generated images in a series of experiments where participants were asked to distinguish between the original images and the representations synthesized by each model. Some experiments also had humans differentiate between pairs of randomly synthesized images from the same models.
Participants kept their eyes focused on the center of a screen while images were flashed on the far sides of the screen, at different locations in their periphery. In one experiment, participants had to identify the odd image in a series of images that were each flashed for only a few milliseconds, while in the other they had to match an image presented at their fovea with two candidate model images placed in their periphery.
When the synthesized images were shown in the far periphery, participants were largely unable to tell the original apart from the synthesized version for the adversarially robust model or the Texform model. This was not the case for the standard machine learning model.
But perhaps the most striking result is that the pattern of errors humans make (depending on where the stimuli land in the periphery) is strongly aligned across all experimental conditions that used stimuli derived from the Texform model and the adversarially robust model. These results suggest that adversarially robust models do capture some aspects of human peripheral processing, Deza says.
The researchers also ran machine learning experiments and computed image quality assessment metrics to study the similarity between the images synthesized by each model. They found that those generated by the adversarially robust model and the Texform model were the most similar, suggesting that these models compute similar image transformations.
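The article does not say which image quality metrics were used. A standard, simple example of such a metric is peak signal-to-noise ratio (PSNR); the sketch below shows how a metric of this kind can rank which pairs of images are more similar (the toy images here are random arrays, purely for illustration):

```python
import numpy as np

def psnr(a, b, max_val=1.0):
    """Peak signal-to-noise ratio, a standard image quality metric.
    Higher values mean the two images are more similar; identical
    images score infinity."""
    mse = np.mean((a - b) ** 2)
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

rng = np.random.default_rng(0)
img = rng.random((8, 8))
near = np.clip(img + rng.normal(0, 0.01, img.shape), 0, 1)  # slight noise
far = rng.random((8, 8))                                    # unrelated image
# psnr(img, near) is much higher than psnr(img, far)
```

Metrics like this score only pixel-level agreement; perceptually oriented metrics additionally try to weight the differences humans actually notice, which is closer to what a comparison between model-synthesized images calls for.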
“We are shedding light on this alignment of how humans and machines make the same kinds of mistakes, and why,” Deza says. “Why does adversarial robustness occur? Is there a biological equivalent of adversarial robustness in machines that we haven’t discovered yet in the brain?”
Deza hopes these results will inspire further work in this area and encourage computer vision researchers to consider building more biologically inspired models.
These results could be used to design a computer vision system with some sort of emulated visual periphery that could make it automatically robust to adversarial noise. The work could also inform the development of machines capable of creating more accurate visual representations using some aspect of human peripheral processing.
“We could even learn more about human vision by trying to extract certain properties from artificial neural networks,” adds Harrington.
Previous work has shown how to isolate “robust” parts of images, where training models on those images makes them less susceptible to adversarial failures. These robust images look like scrambled versions of the real images, explains Thomas Wallis, professor of perception at the Institute of Psychology and Centre for Cognitive Science at the Technical University of Darmstadt.
“Why do these robust images look the way they do? Harrington and Deza use careful human behavioral experiments to show that people’s ability to see the difference between these images and the original photographs in the periphery is qualitatively similar to that for images generated from biologically inspired models of peripheral information processing in humans,” says Wallis, who was not involved in this research. “Harrington and Deza propose that the same mechanism of learning to ignore some visual input changes in the periphery may be why robust images look the way they do, and why training on robust images reduces adversarial susceptibility. This intriguing hypothesis is worth further investigation, and could represent another example of synergy between research in biological and machine intelligence.”
This work was supported, in part, by the MIT Center for Brains, Minds, and Machines and Lockheed Martin Corporation.