MMAudio generates synchronized audio given video and/or text inputs. Our key innovation is multimodal joint training, which allows training on a wide range of audio-visual and audio-text datasets.