A screenshot of illustrated portraits shared on IBM’s Diversity in Faces dataset website.
Earlier this week, Flickr started taking heat across the web after it was specifically mentioned in a report from NBC News that took a deep dive into the ‘dirty little secret’ of using Creative Commons images to help train facial recognition algorithms.
The report mentioned multiple datasets used to help companies train machine learning algorithms to better comprehend diversity in facial recognition programs, but one dataset in particular was emphasized and elaborated on: IBM’s ‘Diversity in Faces’ set, which was derived from a collection of more than 100 million Creative Commons images gathered by Yahoo and released for research purposes back in 2014.
Almost immediately, users around the web started raining down critical comments. Others, such as Flickr’s own Don MacAskill, chimed in as well to help clarify the situation.
After the dust settled from the initial publishing of the report and the subsequent commentary across social media, one thing became clear: the issue isn’t that Flickr is handing over your photos for free to corporations looking to train their artificial intelligence algorithms. It’s that users are sharing their photos under various Creative Commons licenses without fully comprehending what those licenses entail, a concern Flickr specifically referenced in its recent announcement that it would save all Creative Commons photos on its servers.
After all, IBM didn’t sneakily pull private photos off of Flickr to use, and Flickr didn’t just hand over millions of protected photos, despite the impression NBC News’ article might give. The photos IBM used to build up its database were the same photos any one of us can find when searching for public, Creative Commons photos on Flickr.
Don MacAskill, SmugMug Chief Executive and head of Flickr, shared his take on the situation in a conversation with Olivia Solon, the author of the NBC News article, explaining that no ‘scraping’ of Flickr images was done, as the photos were opt-in Creative Commons licensed photos. Below is MacAskill’s first response, but the entire thread is worth the read.
Photos were not “scraped … from @Flickr”. IBM is very clear that their dataset was not “scraped” but originates from opt-in @CreativeCommons licensed photos supplied in the @Flickr public research dataset. Factually incorrect. Your article needs corrections. /cc @NBCNews
— Don MacAskill (@DonMacAskill) March 12, 2019
Ryan Merkley, CEO of Creative Commons, even chimed in on the conversation with an official response on Creative Commons’ blog. In the post, Merkley addresses the concerns of Flickr users; he even went so far as to contact IBM ‘to understand their use of the images, and to share the concerns of our community.’
In it, Merkley writes (emphasis ours):
While we do not have all the facts regarding the IBM dataset, we are aware that fair use allows all types of content to be used freely, and that all types of content are collected and used every day to train and develop AI. CC licenses were designed to address a specific constraint, which they do very well: unlocking restrictive copyright. But copyright is not a good tool to protect individual privacy, to address research ethics in AI development, or to regulate the use of surveillance tools employed online. Those issues rightly belong in the public policy space, and good solutions will consider both the law and the community norms of CC licenses and content shared online in general.
The overarching theme that stands out in this ongoing debate is that it’s not always clear to users, especially those who aren’t as ingrained in the online world of photography, what Creative Commons licenses cover and what fair use actually is. Flickr doesn’t shy away from explanations, and it links out to them at various stages throughout the upload process and in its FAQ, but even the Creative Commons website lacks clear definitions, something it’s already addressing with new FAQ pages that it will continue to update.
Ultimately, the current copyright system, which is intended to prevent other people from profiting from creative works, wasn’t necessarily designed to protect your images from this type of use. Those images don’t end up in devices, nor is anyone directly profiting from your creations, so existing rules don’t necessarily offer any protection, whatever rights you assert. The cost of your camera or smartphone getting that bit smarter might just be that your photos are the ones being used to train it.