Warning: the chances that your face photo is being used by image generation AI like 'Stable Diffusion' are far from low

Many Internet users have seen or heard the advice, 'Once something is published on the Internet, it cannot easily be deleted, so do not carelessly upload photos of your own face.' It has been pointed out that the popularity of image generation AI such as Stable Diffusion and DALL·E has made the Internet's privacy problems even more serious and complicated.

AI Is Probably Using Your Images and It's Not Easy to Opt Out

In September 2022, a user of image generation AI discovered that photographs from their own medical records were included in an image dataset used for AI training. The person asked the company that created the dataset to delete the photos, but the company responded that the dataset does not contain the images themselves and is merely a collection of links to images. The details of this incident are summarized in the article below.

Image generation AI user discovers 'photos of their own medical records' in an AI training dataset - GIGAZINE

Regarding this issue, the overseas media outlet Motherboard contacted LAION, which maintains 'LAION-5B', a dataset for AI training containing more than 5 billion images. An engineer at the organization responded that the venue was 'for the purpose of discussing the safety of the dataset, and not prepared to withstand citation by journalists.'

Speaking to Motherboard, Tiffany Li, an attorney and assistant professor of law at the University of New Hampshire School of Law, said, 'Most of these large datasets collect their imagery from other datasets, so it is hard to figure out who collected an image in the first place, who put it into a dataset first, and who published it first. When it comes to legal issues, you don't know who to sue, and it is also difficult to punish those who have done wrong.'

Even before the problem that privacy violations are difficult for victims to resolve, image generation AI has the problem that the person whose privacy is violated may not notice the violation in the first place. There are services such as ' Have I Been Trained? ' for checking whether your image is included in LAION-5B, but most people would never think to use such a service to search training datasets for their own face photos.

'In general, most people don't have access to the datasets used for AI, and they don't know whether their images are being used there,' Li said. 'Ordinary people wouldn't go out of their way to comb through every machine learning dataset out there to see whether their photos were used, so they wouldn't know they were being harmed. The real problem is that you might never notice it.'

In particular, as in the case where the dataset contained photographs of medical records that a patient with a genetic disorder had provided to their doctor, it is quite possible that the person never even imagined that a photo of their face would be uploaded to the Internet in the first place.

According to a survey by Zach Marshall, an associate professor in the Department of Community Rehabilitation and Disability Studies at the University of Calgary in Canada, at least one image from roughly 70% of case reports published in medical journals could be found via Google image search. 'Clinicians don't even know that they have to warn their patients; the issue simply hasn't been addressed,' Marshall said.

Further complicating this issue is that datasets like LAION-5B are not collections of images themselves, but compilations of links to images on the Internet together with associated text data. For this reason, the companies creating the datasets claim that 'it's the Internet that's at fault, not us.' However, Motherboard points out that if AI generates new images using face photos that leaked onto the Internet without permission, the privacy infringement becomes even more serious.
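The 'links, not images' structure described above can be illustrated with a minimal sketch. The field names and URLs here are hypothetical (the real LAION-5B metadata is richer and distributed differently); the point is only that deleting a record from such a dataset removes the link, not the image hosted elsewhere:

```python
# Minimal illustration of a link-based image dataset: each record
# stores a URL and caption text, never the image bytes themselves.
# Field names and URLs are illustrative, not the actual LAION-5B schema.

dataset = [
    {"url": "https://example.com/photos/face_001.jpg",
     "text": "portrait photo of a person smiling"},
    {"url": "https://example.org/clinic/record_scan.png",
     "text": "clinical photograph from a patient record"},
]

def remove_entry(rows, url):
    """Drop the record pointing at `url`. Only the link disappears;
    the image itself remains online at the hosting website."""
    return [row for row in rows if row["url"] != url]

cleaned = remove_entry(dataset, "https://example.org/clinic/record_scan.png")
print(len(dataset), len(cleaned))  # the hosted image is untouched either way
```

This is why a deletion request to the dataset maker alone does not get a leaked photo off the Internet: the hosting site must also be found and asked to take it down.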

Because it is very difficult for individuals to find their own face photo in a dataset, identify the website hosting it, and then ask the site administrator to delete it, Li commented, 'The developers of AI and machine learning tools, and the people who actually create the datasets, should bear the responsibility, not the individuals whose photos and data are used.'

According to Li, in order to prevent such problems, the US Federal Trade Commission is promoting a framework called 'algorithmic destruction', under which companies and organizations can be ordered to destroy algorithms and AI models built using personal information that was maliciously or illegally collected.

in Software, Web Service, Posted by log1l_ks