IBM used millions of online photos without people’s permission to train its facial recognition systems, NBC reported.
Sources told NBC that photographers had asked their permission before taking their pictures, but had not specified that the images would be used to train algorithms.
IBM did not originally collect the photos; former Flickr owner Yahoo had put the collection together for research purposes. They were part of YFCC100M, a dataset consisting of 99.2 million photos and 0.8 million videos from Flickr. The images were protected under a Creative Commons license.
According to The Verge, the license did not permit the images to be used in facial recognition programmes that profile people by ethnicity. It is difficult for people to know whether their photos were used in the research, as IBM keeps the dataset private.
The company had announced in June 2018 that it would create a facial analysis dataset to help remove bias. It said that the software worked accurately only on white people’s faces, and that it was working to create more diverse datasets.