However, even its inventors acknowledge that it may be abused.
Microsoft Research Asia has introduced VASA-1, a new experimental artificial intelligence tool that can generate a lifelike talking face from a single person in real time. Given a still photo or a drawing of a person plus an existing audio clip, it produces facial expressions, head motions, and lip movements that match the accompanying speech or song. The researchers have posted numerous examples on the project page, and the results look convincing enough that they could pass as real.
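To make that input-output relationship concrete, here is a minimal, purely hypothetical sketch of what such an interface might look like. Microsoft has not released VASA-1 or any API, so every name below (`TalkingFaceRequest`, `generate_talking_head`) is invented for illustration, and the stub simply returns copies of the portrait rather than performing any real animation.

```python
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class TalkingFaceRequest:
    """Inputs a VASA-style generator would consume (hypothetical names)."""
    portrait: np.ndarray      # single RGB still image, shape (H, W, 3)
    audio: np.ndarray         # mono waveform driving the lip sync, shape (num_samples,)
    sample_rate: int = 16000  # audio sample rate in Hz


def generate_talking_head(request: TalkingFaceRequest, fps: int = 25) -> List[np.ndarray]:
    """Placeholder for a model that would return video frames with matching
    lip movements, facial expression, and head motion.

    VASA-1 itself has not been released, so this stub only illustrates the
    shape of the interface, not the actual model.
    """
    duration_s = len(request.audio) / request.sample_rate
    num_frames = int(duration_s * fps)
    # A real model would condition each frame on the audio window around it;
    # here we just return copies of the input portrait as stand-ins.
    return [request.portrait.copy() for _ in range(num_frames)]
```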
Although the lip and head motions in the examples can still look slightly robotic and out of sync on closer inspection, it is clear that the technology could be abused to quickly and easily create deepfake videos of real people. The researchers are aware of this risk and have decided not to release “an online demo, API, product, additional implementation details, or any related offerings” until they are certain their technology “will be used responsibly and in accordance with proper regulations.” They did not say, however, whether they intend to build in specific safeguards to stop bad actors from using it for malicious ends, such as deepfake pornography or disinformation campaigns.
Despite its potential for misuse, the researchers believe the technology offers substantial benefits. They say it could improve educational equity and accessibility for people who struggle with communication, for instance by giving individuals with disabilities access to an avatar that can speak on their behalf. It could also provide companionship and therapeutic support to those who need it, suggesting that VASA-1 might be used in programmes that offer access to artificial intelligence characters people can talk to.
According to the paper published alongside the announcement, VASA-1 was trained on the VoxCeleb2 dataset, which contains “over 1 million utterances for 6,112 celebrities” extracted from videos uploaded to YouTube. Although the tool was trained on real faces, it also works on artistic images such as the Mona Lisa, which the researchers amusingly paired with an audio clip of Anne Hathaway’s viral rendition of Lil Wayne’s Paparazzi. The result is delightful enough to be worth watching, even if you are sceptical about what technology like this can do.
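For a sense of that dataset’s scale, the short sketch below counts speakers and utterances in a local copy of VoxCeleb2. The directory layout and file extension are assumptions about a typical extraction of the audio release, not anything specified in the VASA-1 paper.

```python
from collections import Counter
from pathlib import Path

# Assumed local layout: one directory per speaker ID, one subdirectory per
# source YouTube video, and one .m4a file per utterance.
VOXCELEB2_ROOT = Path("VoxCeleb2/dev/aac")

utterances_per_speaker = Counter()
for audio_file in VOXCELEB2_ROOT.glob("id*/*/*.m4a"):
    speaker_id = audio_file.parts[-3]  # e.g. "id00012"
    utterances_per_speaker[speaker_id] += 1

print(f"speakers:   {len(utterances_per_speaker)}")
print(f"utterances: {sum(utterances_per_speaker.values())}")
```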