Natural Sciences

To Catch a Photoshopper

Teaching computers to spot altered images

On July 9, 2008, the Los Angeles Times, Chicago Tribune and other major papers ran an alarming picture on their front pages: four Iranian test missiles streaking skyward, an image that had been obtained from one of Iran’s media websites.

But there was a problem—the picture apparently had one too many streaking missiles.

Soon after the image had been widely published, the Associated Press distributed a nearly identical photo from the same news source—but in this picture, one of the missiles is still on the ground, attached to a launcher. A French news agency retracted the first image, calling it “apparently digitally altered” and saying the fourth missile appeared to have been “added in digital retouch to cover a grounded missile that may have failed during the test.”

Had the photo been Photoshopped? Experts said it appeared so.

Given the sheer number of digital images in circulation today, some say it’s unrealistic to expect news organizations, governments and others to catch every fraudulent photo. That’s why Robert Macy (right) is teaching computers to spot image alterations.

The math and computer science major helped develop software that predicts the likelihood that a digital image has been changed by adding, removing or copying content.

Macy had been looking for experience with “machine learning”—programming computers to learn from data and make predictions. Daniel Lowd, an associate professor in computer and information science who specializes in machine learning, added him to a team working on an image inspection project in the summer of 2016.

Using mathematical methods called algorithms, scientists can analyze an image’s pixels—the dots that are the smallest controllable elements of a picture on a computer screen. They can define the characteristics of a manipulated pixel with numerical values, and then run calculations that compare pixels and predict the probability that an image has been altered.
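The pixel-scoring idea can be sketched in a few lines of Python. This is an illustrative toy, not the team's actual method: real forensic features (noise residuals, compression artifacts, lighting models) are far more sophisticated, and the score used here, a pixel's deviation from its local neighborhood mean, along with the threshold, are invented for the example.

```python
# Toy sketch: assign each pixel a numerical score (its deviation from the
# mean of its 3x3 neighborhood) and flag pixels scoring above a threshold.
# The feature and threshold are made up; real detectors use richer statistics.

def pixel_scores(image):
    """image: 2D list of grayscale values (0-255). Returns same-shape scores."""
    h, w = len(image), len(image[0])
    scores = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # mean of the 3x3 neighborhood, clipped at the image borders
            neighbors = [image[ny][nx]
                         for ny in range(max(0, y - 1), min(h, y + 2))
                         for nx in range(max(0, x - 1), min(w, x + 2))]
            mean = sum(neighbors) / len(neighbors)
            scores[y][x] = abs(image[y][x] - mean)
    return scores

def flag_suspicious(image, threshold=30.0):
    """Return the set of (row, col) pixels whose score exceeds the threshold."""
    return {(y, x)
            for y, row in enumerate(pixel_scores(image))
            for x, s in enumerate(row)
            if s > threshold}
```

Run on a flat gray image with one anomalous bright pixel, `flag_suspicious` singles out that pixel; on a real photograph the comparison would be against learned models of what untouched pixels look like, not a simple local mean.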

Macy’s team works with other institutions that analyze images for specific changes, such as the addition of content; the software collects all of their assessments into a “unified” prediction.

“We combine our partners’ methods to come up with a better prediction than any of the supporting predictions individually,” he said.
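One simple way to combine several detectors' outputs into a single number, shown here purely as an illustration, is a weighted average of their probabilities. The article doesn't say how the team actually unifies the predictions; a real system would likely learn the weights from data, whereas the weights below are arbitrary.

```python
# Hypothetical sketch of a "unified" prediction: average the probability
# reported by each partner method. Weights are illustrative placeholders;
# a deployed system would presumably learn them from labeled examples.

def unified_prediction(detector_probs, weights=None):
    """detector_probs: probabilities (0-1), one per partner method.
    Returns a single combined probability via a weighted average."""
    if weights is None:
        weights = [1.0] * len(detector_probs)  # equal trust in every method
    total = sum(weights)
    return sum(p * w for p, w in zip(detector_probs, weights)) / total
```

For example, three detectors reporting 0.9, 0.6, and 0.7 combine to about 0.73, a steadier estimate than any single method's guess, which is the intuition behind Macy's remark.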

The tool developed by Macy’s team can be adjusted to ignore changes that aren’t important. A news organization would presumably be interested in any alterations to an image, Macy said, but a research publication might want to be more selective. An image for science might include labels and other manipulations that help with understanding; the computer can be programmed to disregard these changes and focus on areas that have been changed in an attempt to deceive.
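The "ignore unimportant changes" idea amounts to a per-user policy over alteration types. The sketch below invents the type names and the policy mechanism to make the concept concrete; the article does not describe how the team's tool is actually configured.

```python
# Hypothetical sketch: each detected alteration carries a type label and a
# probability, and a user-supplied policy decides which types count.
# All type names here are invented for illustration.

# A science journal might tolerate helpful annotations...
BENIGN_FOR_SCIENCE = {"annotation", "scale_bar", "crop"}

def filter_alterations(alterations, ignored_types=()):
    """alterations: list of (type, probability) pairs from the detectors.
    Returns only the alterations the user's policy still cares about."""
    ignored = set(ignored_types)
    return [(t, p) for t, p in alterations if t not in ignored]
```

A newsroom would pass an empty ignore list and see every change; a research publication could pass `BENIGN_FOR_SCIENCE` so that labels and scale bars drop out while splices and clones remain flagged.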

The big benefit of the software, of course, is speed—the computer runs the analyses with the press of a button. Some organizations work with millions of images, Macy said, and it would be infeasible for a person to have an active role in authenticating each of them.

“Some users have too many images coming in too fast to process, others have databases going back 30 years—it would be impossible for an expert to review them all,” said Macy, who graduated in June. “These automated methods help prevent more forgeries from slipping through the cracks.”

—Matt Cooper


Gallery of photos at top of page, from left to right:

1. Robert Macy helped design software that predicts the probability that content in a digital image has been added, removed or duplicated. Consider this image of a sea lion.

2. In this version, a second sea lion has been added by copying the first. The image is run through more than a dozen computer analyses that compare the picture’s individual points, or pixels, through mathematical calculations.

3. Macy’s software produces a “heat map” that predicts which areas of the image have been manipulated and in what manner, with darker colors indicating a higher degree of confidence that an alteration has occurred. The heat map shows two sea lion shapes because it detected that one had probably been copied to make the other. The fainter shading in the upper-right corner was an incorrect, lower-confidence prediction that this area had also been manipulated. The complete analysis takes about an hour.