When you post photos of yourself or friends and family online, you may not imagine they could be used to develop facial-recognition systems that can identify individuals offline. A new site hopes to raise awareness of this issue by offering a rare window into how a fraction of our pictures are used.
Training a facial-recognition system to identify people requires a slew of photos of faces — photos that are often gathered from the internet. It’s usually impossible to figure out if images you’ve uploaded are among them.
Unveiled in January, Exposing.ai lets you know whether photos you’ve posted to the image-sharing site Flickr have been used to advance this controversial application of artificial intelligence by allowing you to search more than 3.6 million photos in six facial-recognition image datasets. It’s a small number compared with the millions of photos spread across countless facial datasets, but plenty of people will still be surprised to find their photos, and faces, included.
“It’s easiest to understand when it becomes more personal,” said Adam Harvey, an artist and researcher who created the site with fellow artist and programmer Jules LaPlace, in collaboration with the non-profit Surveillance Technology Oversight Project (STOP). “Sometimes it helps to have visual proof.”
The tip of the iceberg
To use the site, you type in your Flickr username, the URL of a specific Flickr photo, or a hashtag (such as “#wedding”). If matching photos are found, Exposing.ai shows you a thumbnail of each, along with the month and year it was posted to your Flickr account and the number of images in each dataset.
A search of this author’s Flickr username turned up nothing. Searches for some common hashtags, however, yielded plenty of results for unknown people: “#wedding” returned more than 103,000 photos used in facial-recognition datasets, while “#birthday” and “#party” each yielded tens of thousands of images, with children’s faces in many of the first results.
As Harvey is quick to point out, Exposing.ai examines just a smidgen of the facial data that’s in use, as many companies don’t publicly reveal how they obtained the data used to train their facial-recognition systems. “It’s the tip of the iceberg,” he said.
For years, researchers and companies have turned to the internet to collect and annotate photos of all kinds of objects, including many, many faces, in hopes of making computers better able to make sense of the world around them. This frequently means using images from Flickr that carry Creative Commons licenses (copyright licenses that clearly state the terms under which such images and videos can be used and shared by others), as well as pulling images from Google Image search or scraping public Instagram accounts (some methods legitimate, some perhaps not).
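Flickr exposes license metadata through its public API, which is one way dataset builders can find Creative Commons-licensed photos in bulk. As an illustration (not drawn from the article), here is a minimal sketch of how a collector might construct a search query for CC-licensed photos carrying a given hashtag. The `flickr.photos.search` method and its `license` parameter are part of Flickr's documented API, but the license IDs shown and the placeholder API key are assumptions.

```python
from urllib.parse import urlencode

# Flickr's REST endpoint for its public API.
FLICKR_API = "https://api.flickr.com/services/rest/"

# Assumed Creative Commons license IDs (e.g. 4 = CC BY, 5 = CC BY-SA);
# the authoritative list comes from flickr.photos.licenses.getInfo.
CC_LICENSES = "4,5"

def build_search_url(api_key: str, tag: str) -> str:
    """Build a flickr.photos.search request URL restricted to
    Creative Commons-licensed photos carrying the given tag."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,        # placeholder; a real key is required
        "tags": tag.lstrip("#"),   # Flickr stores tags without the '#'
        "license": CC_LICENSES,
        "format": "json",
        "nojsoncallback": 1,
    }
    return FLICKR_API + "?" + urlencode(params)

url = build_search_url("YOUR_API_KEY", "#wedding")
print(url)
```

The URL is only built here, not fetched; an actual collector would page through the JSON results and download each photo's source file, which is why license terms matter at scale.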
Many of the resulting datasets are destined for academic work, such as training or testing a facial-recognition algorithm. But facial recognition is increasingly moving out of the labs and into the domain of big business, as companies such as Microsoft, Amazon, Facebook, and Google stake their futures on AI. Facial recognition software is becoming pervasive in its usage — by police, at airports, and even on smartphones and doorbells.
Amid a broader reckoning over the use of individuals’ online data, datasets for training facial-recognition software have become a flash point for privacy concerns and a future where surveillance may be more commonplace. Facial-recognition systems themselves are also increasingly scrutinized for concerns about their accuracy and underlying racial bias due, in part, to the data on which they were trained.