The model can also edit existing sound files.
NVIDIA has introduced a new experimental generative AI model that it describes as “a Swiss Army knife for sound.” The model, called Foundational Generative Audio Transformer Opus 1, or Fugatto, takes instructions from text prompts and uses them to create new audio or to edit existing music, speech, and sound files. According to NVIDIA, it was developed by a team of AI researchers from around the world, which strengthened the model’s multi-accent and multilingual capabilities.
“We wanted to create a model that understands and generates sound like humans do,” said Rafael Valle, a manager of applied audio research at NVIDIA and one of the researchers behind the project. As part of the announcement, the company listed several real-world scenarios in which Fugatto could be useful. Music producers, it suggested, could use the technology to quickly prototype a song idea and then easily edit it to try out different styles, voices, and instruments.
People could use it to generate material for language learning tools in a voice of their choice. Video game developers could use it to create variations of pre-recorded assets to fit changes in the game driven by players’ choices and actions. The researchers also found that, with some fine-tuning, the model can perform tasks that were not part of its pre-training. It was able to combine instructions it had been trained on separately, for example generating speech that sounds angry in a particular accent, or the sound of birds singing during a rainstorm. The model can also produce sounds that change over time, such as the pounding of a rainstorm as it moves across the land.
NVIDIA has not said whether it will make Fugatto available to the public. The model is not the first generative AI technology that can create sounds from text prompts: Meta previously released an open-source AI toolkit that generates sounds from text descriptions, and Google offers its own text-to-music AI, MusicLM, through the company’s AI Test Kitchen website.