Microsoft has taken down a facial recognition database, the latest move by a US company to deprive Chinese authorities of access to tools that are integral to the creation of the government's vast security apparatus.
According to the FT, the database, known as MS Celeb, was first published in 2016. At the time, it was described by the company as the largest publicly available facial recognition data set in the world, containing more than 10 million images of nearly 100,000 individuals.
Those whose images were included in the database did not give their permission for the images to be used. Instead, their images were scraped off the web and off search engines.
Two other data sets have also been taken down since the FT exposed how they were being used by the Chinese: the Duke MTMC surveillance data set, built by Duke University researchers, and a Stanford University data set called Brainwash.
Going by citations in AI papers, Microsoft's MS Celeb data set has been used by several commercial organisations, including IBM, Panasonic, Alibaba, Nvidia, Hitachi, Sensetime and Megvii. Both Sensetime and Megvii are Chinese suppliers of equipment to officials in China's northwestern Xinjiang province, where they feed a growing state security apparatus and where minorities, mostly Uighurs and other Muslims, are being tracked and held in internment camps.
Despite its name, "MS Celeb" includes thousands of photos of non-celebrities. For example, journalists Kim Zetter and Adrian Chen; Shoshana Zuboff, the author of Surveillance Capitalism; and Julie Brill, the former FTC commissioner responsible for protecting consumer privacy, were all included in the database. Many were unaware that they were included.
"I am in no sense a public person, there is no way in which I’ve ceded my right to privacy," said Adam Greenfield, a technology writer and urbanist who was included in the data set.
"It’s indicative of Microsoft’s inability to hold their own researchers to integrity and probity that this was not torpedoed before it left the building," he said. "To me, it is indicative of a profound misunderstanding of what privacy is."
Microsoft said it wasn't aware of any violations of the GDPR, Europe's sweeping data privacy law, related to the database. But others argued that since it was no longer being used for strictly research purposes, there could be violations.