Google has made a library of thousands of AI-manipulated videos publicly accessible, hoping that researchers will use it to develop tools for detecting deceitful content.
‘Deepfakes’ are photos, videos and audio clips created or manipulated by deep-learning models. There is a growing number of open-source deepfake generation methods available online, meaning that little technical expertise is required to generate them.
Deepfakes frequently combine different sources to create composite images or videos, such as by transposing a celebrity’s face onto the body of a pornographic actor. In May 2019, a manipulated video intended to humiliate US House of Representatives Speaker Nancy Pelosi was shared by President Donald Trump and many major news outlets. While this video is not considered a true deepfake, having been created through a simple manual edit (true deepfakes tend to look authentic to the extent that they are difficult to identify as fake), the incident raised serious questions about how deepfakes could be used for nefarious purposes. For instance, deepfakes could be deployed to manipulate voters in the run-up to the 2020 US presidential election.
Now, Google has announced the release of a large dataset of deepfake videos. Google staff created the dataset by working with 28 “paid and consenting actors” to record hundreds of videos of the actors speaking, making facial expressions and performing common tasks. The researchers then used open-source deepfake generation models to create approximately 3,000 deepfakes based on these videos.
A Google blog post explained: “While many [deepfakes] are likely intended to be humorous, others could be harmful to individuals and society. Google considers these issues seriously.”
The dataset – which is free for use by the research community – has been incorporated into the FaceForensics benchmark effort, which is run by the Technical University of Munich and the University of Naples Federico II and supported by Google. Google hopes that researchers will use this library of real and deepfake videos to train automated deepfake detection tools.
The library of deepfakes will be updated as the technology evolves.
“We firmly believe in supporting a thriving research community around mitigating potential harms from misuses of synthetic media and today’s release of our deepfake dataset in the FaceForensics benchmark is an important step in that direction,” Google wrote.
While the rapid rise of deep-learning models for video and audio manipulation has been acknowledged as a serious cause for concern, given their use in creating disinformation, deepfake tools also have socially beneficial applications, such as generating training data for medical imaging or converting text into realistic-sounding synthesised speech.