The Metaverse, as Meta CEO Mark Zuckerberg envisions it, will be a fully immersive virtual experience that rivals the real thing, at least from the waist up. But visuals are only part of the overall Metaverse experience.
“Achieving spatially correct audio is key to delivering a realistic sense of presence in the metaverse,” Zuckerberg wrote in a blog post on Friday. “Whether you’re attending a concert or just chatting with friends around a virtual table, having a realistic idea of where the sound is coming from makes you feel like you’re actually there.”
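The directional cue Zuckerberg describes, knowing where a sound comes from, rests on two classic effects: the nearer ear hears a sound slightly louder and slightly earlier. Below is a minimal sketch of those two cues using plain equal-power panning and a fixed maximum ear delay; it is illustrative only (real spatial audio pipelines use measured head-related transfer functions, and none of these parameters come from Meta's work):

```python
import numpy as np

def pan_binaural(mono, azimuth_deg, sr):
    """Place a mono signal left/right using two crude directional cues:
    an interaural level difference and an interaural time difference.
    (Toy approximation, not a production HRTF-based renderer.)"""
    # Equal-power level panning: -90 deg = hard left, +90 deg = hard right.
    theta = (azimuth_deg + 90) / 180 * (np.pi / 2)
    left, right = np.cos(theta) * mono, np.sin(theta) * mono
    # Time difference: delay the far ear by up to ~0.6 ms.
    itd = int(abs(azimuth_deg) / 90 * 0.0006 * sr)
    if itd and azimuth_deg > 0:    # source on the right: left ear lags
        left = np.concatenate([np.zeros(itd), left[:-itd]])
    elif itd and azimuth_deg < 0:  # source on the left: right ear lags
        right = np.concatenate([np.zeros(itd), right[:-itd]])
    return np.stack([left, right])

sr = 16000
mono = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # one second of A440
hard_right = pan_binaural(mono, 90, sr)  # all energy lands in the right ear
```

Panned fully to one side, the opposite channel carries essentially no signal, which is exactly the level cue the brain uses to localize the source.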
That concert, the blog notes, will sound very different performed in a full-size concert hall than in a college auditorium, thanks to the differences between their physical spaces and acoustics. As such, Meta’s AI and Reality Lab (MAIR, formerly FAIR) is collaborating with researchers from UT Austin to develop a trio of open-source audio “comprehension tasks” that will help developers build more immersive AR and VR experiences with more realistic sound.
The first is MAIR’s Visual Acoustic Matching model, which can adapt a sample audio clip to any given environment using just a picture of the space. Want to hear what the NY Philharmonic would sound like inside San Francisco’s Boom Boom Room? Now you can. Previous simulation models could recreate a room’s acoustics based on its layout, but only if the precise geometry and material properties were already known, or from audio sampled in the space, neither of which produced particularly accurate results.
MAIR’s solution is the Visual Acoustic Matching model, dubbed AViTAR, which “learns acoustic matching from in-the-wild web videos, despite their lack of acoustically mismatched audio and unlabeled data,” per the post.
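In classical audio processing, a room’s acoustic signature is transferred onto a recording by convolving the “dry” signal with that room’s impulse response; what AViTAR learns is, roughly, how to produce that transformation from a single photo instead of a measured response. A toy sketch of the conventional step (the impulse response here is invented for illustration):

```python
import numpy as np

def apply_room_acoustics(dry, rir):
    """Convolve a dry signal with a room impulse response (RIR):
    the textbook way to impose a room's acoustics on audio."""
    return np.convolve(dry, rir)

sr = 16000
dry = np.zeros(sr)
dry[0] = 1.0  # a single click
# Hypothetical RIR: the direct sound plus two decaying reflections.
rir = np.zeros(sr // 2)
rir[0], rir[2000], rir[6000] = 1.0, 0.5, 0.25
wet = apply_room_acoustics(dry, rir)  # the click now "echoes" in the room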
“One future use case we’re interested in is reliving past memories,” Zuckerberg wrote, banking on nostalgia. “Imagine being able to put on a pair of AR glasses and see an object with the option to play a memory associated with it, like picking up a tutu and seeing a hologram of your child’s ballet recital. The audio strips away reverberation and makes the memory sound just like the time you experienced it, sitting in your exact seat in the audience.”
MAIR’s Visually-Informed Dereverberation model (VIDA), on the other hand, will strip the echo effect from playing an instrument in a large open space like a subway station or a cathedral. You’ll hear just the violin, not its reverberation bouncing off distant surfaces. Specifically, it “learns to remove reverberation based on both the observed sounds and the visual stream, which reveals cues about room geometry, materials, and speaker locations,” the post explains. This technology could be used to more effectively isolate vocals and spoken commands, making them easier for both humans and machines to understand.
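When the room’s impulse response is known exactly, dereverberation reduces to deconvolution; VIDA’s harder job is suppressing reverberation when that response must be inferred from sight and sound. A sketch of the idealized known-response case (the signal and the reflections are made up for the example):

```python
import numpy as np

def dereverberate(wet, rir, eps=1e-8):
    """Inverse-filter a known room impulse response (RIR) out of a
    reverberant signal via regularized spectral division."""
    n = len(wet)
    H = np.fft.rfft(rir, n)
    W = np.fft.rfft(wet, n)
    # Wiener-style regularization keeps near-zero bins of H from blowing up.
    return np.fft.irfft(W * np.conj(H) / (np.abs(H) ** 2 + eps), n)

rng = np.random.default_rng(0)
dry = rng.standard_normal(8000)                # stand-in for the violin
rir = np.zeros(2000)
rir[0], rir[500], rir[1500] = 1.0, 0.5, 0.25   # invented reflections
wet = np.convolve(dry, rir)                    # the reverberant recording
dry_est = dereverberate(wet, rir)[:len(dry)]   # recovered "just the violin"
```

The recovered signal matches the dry one almost exactly here; the catch, and the reason VIDA resorts to learning from video, is that in the real world nobody hands you the impulse response.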
VisualVoice does the same as VIDA, but for voices. It uses both visual and audio cues to learn to separate voices from background noise during its self-supervised training sessions. Meta anticipates this model seeing heavy use in machine-understanding applications and in improving accessibility. Think more accurate subtitles, Siri understanding your request even when the room isn’t dead silent, or the acoustics of a virtual chat room shifting as the people speaking move around the digital space. Again, just ignore the lack of legs.
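Separation models of this kind typically work by predicting a mask over a time-frequency representation of the mixture, keeping the cells that belong to the target voice and zeroing the rest; the visual stream helps decide which cells those are. A deliberately crude, non-learned stand-in below uses a fixed frequency mask on synthetic tones (all frequencies invented, and nothing here reflects VisualVoice’s actual architecture):

```python
import numpy as np

def separate_by_mask(mixture, cutoff_hz, sr):
    """Zero out spectral bins above a cutoff: a hand-made 'mask' where a
    learned separator would predict one per time-frequency cell."""
    spec = np.fft.rfft(mixture)
    freqs = np.fft.rfftfreq(len(mixture), 1 / sr)
    spec[freqs > cutoff_hz] = 0
    return np.fft.irfft(spec, len(mixture))

sr = 16000
t = np.arange(sr) / sr
voice = np.sin(2 * np.pi * 200 * t)   # pretend this is the speaker
noise = np.sin(2 * np.pi * 3000 * t)  # pretend this is background hum
recovered = separate_by_mask(voice + noise, 1000, sr)
```

Because the two sources occupy disjoint frequencies, a fixed mask recovers the “voice” cleanly; real speech and noise overlap heavily, which is exactly why the mask has to be learned rather than hand-picked.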
“We envision a future where people can put on AR glasses and relive a holographic memory that looks and sounds exactly the way they experienced it from their vantage point, or feel immersed in not just the graphics but also the sounds as they play games in a virtual world,” Zuckerberg wrote, noting that AViTAR and VIDA can only apply their tasks to the one image they were trained on and will need a lot more development before public release. “These models are bringing us even closer to the multimodal immersive experiences we want to build in the future.”