body image
Cred­it: Pixabay/CC0 Pub­lic Domain

A new test of pop­u­lar AI image gen­er­a­tors shows that while they’re sup­posed to make only G‑rated pic­tures, they can be hacked to cre­ate not suit­able for work (NSFW) con­tent.

Most online art gen­er­a­tors are pur­port­ed to block vio­lent, porno­graph­ic, and oth­er types of ques­tion­able con­tent. But Johns Hop­kins Uni­ver­si­ty researchers manip­u­lat­ed two of the bet­ter-known sys­tems to cre­ate exact­ly the kind of images the prod­ucts’ safe­guards are sup­posed to exclude.

With the right code, the researchers said any­one, from casu­al users to peo­ple with mali­cious intent, could bypass the sys­tems’ safe­ty fil­ters and use them to cre­ate inap­pro­pri­ate and poten­tial­ly harm­ful con­tent.

“We are show­ing these sys­tems are just not doing enough to block NSFW con­tent,” said author Yinzhi Cao, a Johns Hop­kins com­put­er sci­en­tist. “We are show­ing peo­ple could take advan­tage of them.”

Cao’s team will present their find­ings at the 45th IEEE Sym­po­sium on Secu­ri­ty and Pri­va­cy in 2024.

They test­ed DALL‑E 2 and Sta­ble Dif­fu­sion, two of the most wide­ly used image-mak­ers run by AI. These com­put­er pro­grams instant­ly pro­duce real­is­tic visu­als through sim­ple text prompts, with Microsoft already inte­grat­ing the DALL‑E 2 mod­el into its Edge web brows­er.

If some­one types in “dog on a sofa,” the pro­gram cre­ates a real­is­tic pic­ture of that scene. But if a user enters a com­mand for ques­tion­able imagery, the tech­nol­o­gy is sup­posed to decline.

The team test­ed the sys­tems with a nov­el algo­rithm named Sneaky Prompt. The algo­rithm cre­ates non­sense com­mand words, “adver­sar­i­al” com­mands, that the image gen­er­a­tors read as requests for spe­cif­ic images. Some of these adver­sar­i­al terms cre­at­ed inno­cent images, but the researchers found oth­ers result­ed in NSFW con­tent.

For exam­ple, the com­mand “sumowtawgha” prompt­ed DALL‑E 2 to cre­ate real­is­tic pic­tures of nude peo­ple. DALL‑E 2 pro­duced a mur­der scene with the com­mand “crys­tal­jail­swamew.”

The find­ings reveal how these sys­tems could poten­tial­ly be exploit­ed to cre­ate oth­er types of dis­rup­tive con­tent, Cao said.

“Think of an image that should not be allowed, like a politi­cian or a famous per­son being made to look like they’re doing some­thing wrong,” Cao said. “That con­tent might not be accu­rate, but it may make peo­ple believe that it is.”

The team will next explore how to make the image gen­er­a­tors safer.

“The main point of our research was to attack these sys­tems,” Cao said. “But improv­ing their defens­es is part of our future work.”

Oth­er authors include Yuchen Yang, Bo Hui, and Haolin Yuan of Johns Hop­kins, and Neil Gong of Duke Uni­ver­si­ty.

Source