AI Program Taught Itself How To 'Cheat' Its Human Creators

When most people think about the potential risks of artificial intelligence and machine learning, their minds immediately jump to "the Terminator" - a future where robots, according to a dystopian vision once articulated by Elon Musk, would march down suburban streets, gunning down every human in their path.

But in reality, while AI does have the potential to sow chaos and discord, the manner in which this might happen is much more pedestrian, and far less exciting than a real-life "Skynet". If anything, risks could arise from AI networks that can create fake images and videos - known in the industry as "deepfakes" - that are indistinguishable from the real thing.


Who could forget this video of President Obama? This never happened - it was produced by AI software - but it's almost indistinguishable from a genuine video.

Well, in the latest vision of AI's capabilities in the not-so-distant future, a columnist at TechCrunch highlighted a study that was presented at a prominent industry conference back in 2017. In the study, researchers explained how a Generative Adversarial Network - a type of machine learning model in which two neural networks are trained against each other - defied the intentions of its programmers and started spitting out synthetically engineered maps after being instructed to match aerial photographs with their corresponding street maps.


The intention of the study was to create a tool that could more quickly adapt satellite images into Google's street maps. But instead of learning how to transform aerial images into maps, the machine-learning agent learned how to encode the features of the aerial image onto the visual data of the street map.

The intention was for the agent to be able to interpret the features of either type of map and match them to the correct features of the other. But what the agent was actually being graded on (among other things) was how close an aerial map was to the original, and the clarity of the street map.
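
To see why that kind of grading leaves a loophole, here is a minimal Python sketch. It assumes the reconstruction part of the grade is a simple average pixel difference, which may not be exactly what the study used, and it ignores the "clarity of the street map" part entirely. The names `round_trip_score`, `to_street` and `to_aerial` are hypothetical, and images are simplified to flat lists of brightness values.

```python
def mean_abs_error(a, b):
    """Average absolute difference between two equal-length lists of pixel values."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def round_trip_score(aerial, to_street, to_aerial):
    """Hypothetical grade: error after aerial -> street map -> aerial (lower is better)."""
    street = to_street(aerial)
    rebuilt = to_aerial(street)
    return mean_abs_error(aerial, rebuilt)


aerial = [10, 20, 30, 40]

# An "honest" converter that throws away fine detail, as a real street map would.
honest = round_trip_score(aerial, lambda img: [v // 64 * 64 for v in img], list)

# A "cheating" converter that smuggles every pixel through unchanged.
cheat = round_trip_score(aerial, list, list)

print(honest, cheat)
```

The cheating converter earns the better score even though its "street map" is nothing of the sort, because this score only looks at the final reconstruction and never checks the image in the middle.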

So it didn’t learn how to make one from the other. It learned how to subtly encode the features of one into the noise patterns of the other. The details of the aerial map are secretly written into the actual visual data of the street map: thousands of tiny changes in color that the human eye wouldn’t notice, but that the computer can easily detect.
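
For a sense of how little an image has to change to carry hidden data, here is a minimal Python sketch of a classic hand-written technique, least-significant-bit steganography. This is not the encoding the agent came up with, which was learned rather than designed; it is only an illustration of the general idea. The function names `hide` and `reveal` are hypothetical, and images are simplified to flat lists of 8-bit values.

```python
def hide(cover, secret):
    """Return a copy of `cover` with each secret byte spread over the lowest
    bit of 8 consecutive cover values. No cover value changes by more than 1."""
    if len(cover) < 8 * len(secret):
        raise ValueError("cover needs 8 values per secret byte")
    stego = list(cover)
    for i, byte in enumerate(secret):
        for bit in range(8):
            pos = 8 * i + bit
            # Clear the lowest bit, then write one bit of the secret into it.
            stego[pos] = (stego[pos] & ~1) | ((byte >> bit) & 1)
    return stego


def reveal(stego, n_bytes):
    """Rebuild `n_bytes` secret bytes from the lowest bits of `stego`."""
    secret = []
    for i in range(n_bytes):
        byte = 0
        for bit in range(8):
            byte |= (stego[8 * i + bit] & 1) << bit
        secret.append(byte)
    return secret


street_map = [200] * 32          # a flat, featureless "street map"
aerial = [17, 130, 255, 0]       # four "aerial" pixel values to smuggle

stego = hide(street_map, aerial)
biggest_change = max(abs(a - b) for a, b in zip(street_map, stego))

print(reveal(stego, len(aerial)) == aerial, biggest_change)
```

The hidden values come back exactly, while every value in the doctored street map stays within one brightness step of the original, a difference a human viewer would not be expected to notice.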

In fact, the computer is so good at slipping these details into the street maps that it had learned to encode any aerial map into any street map! It doesn’t even have to pay attention to the “real” street map - all the data needed for reconstructing the aerial photo can be superimposed harmlessly on a completely different street map, as the researchers confirmed.

The agent's actions represented an inadvertent breakthrough in the capacity for machines to create and fake images.

This practice of encoding data into images isn’t new; it’s an established science called steganography, and it’s used all the time to, say, watermark images or add metadata like camera settings. But a computer creating its own steganographic method to evade having to actually learn to perform the task at hand is rather new. (Well, the research came out last year, so it isn’t new new, but it’s pretty novel.)

Instead of finding a way to complete a task that was beyond its abilities, the machine learning agent developed its own way to cheat.

One could easily take this as a step in the “the machines are getting smarter” narrative, but the truth is it’s almost the opposite. The machine, not smart enough to do the actual difficult job of converting these sophisticated image types to each other, found a way to cheat that humans are bad at detecting. This could be avoided with more stringent evaluation of the agent’s results, and no doubt the researchers went on to do that.

And if even these sophisticated researchers nearly failed to detect this, what does that say about our ability to differentiate genuine images from those that were fabricated by a computer simulation?