Researchers at Disney Research and ETH Zurich have developed a system that automatically detects the link between images and the sounds they could potentially make. The system could be used in applications such as film sound effects and audio feedback for the hearing impaired.
EurekAlert! reported the news on its website, explaining how the research team drew on collections of videos to tackle this challenging task. Disney Research is a network of research laboratories supporting The Walt Disney Company, and its purpose is to pursue scientific and technological innovation to advance the company’s broad media and entertainment efforts.
“Videos with audio tracks provide us with a natural way to learn correlations between sounds and images,” said Jean-Charles Bazin, an associate research scientist at Disney Research.
“Video cameras equipped with microphones capture synchronised audio and visual information. In principle, every video frame is a possible training example”, the website writes.
However, a video rarely contains just one sound, and the difficulty often lies in deciding which sounds are relevant to the image and which are not.
“Sounds associated with a video image can be highly ambiguous,” said Markus Gross, vice president for Disney Research. “By figuring out a way to filter out these extraneous sounds, our research team has taken a big step toward an array of new applications for computer vision”.
“If we have a video collection of cars, the videos that contain actual car engine sounds will have audio features that recur across multiple videos. On the other hand, the uncorrelated sounds that some videos might contain generally won’t share any redundant features with other videos, and thus can be filtered out”, Bazin said.
Once the video frames containing the uncorrelated sounds are filtered out, the algorithm can learn which sound is associated with an image. Subsequent tests showed that, when presented with an image, the system was often able to suggest a suitable sound, the website writes.
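The recurrence idea Bazin describes can be illustrated with a small sketch. This is not the Disney Research implementation, whose details the article does not give. It assumes each clip's audio has already been turned into a numeric feature vector, and it uses a swapped-in heuristic: a clip is kept only if its audio is similar, by cosine similarity, to its `k` most similar clips elsewhere in the collection. The function names, `k` and `threshold` are hypothetical choices for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def filter_uncorrelated(features, k=2, threshold=0.8):
    """Return indices of clips whose audio features recur across the collection.

    Hypothetical heuristic: a clip's score is the mean cosine similarity to its
    k most similar *other* clips. Clips whose sound is not shared by other videos
    score low and are filtered out.
    """
    kept = []
    for i, fi in enumerate(features):
        # Similarities to every other clip, highest first.
        sims = sorted(
            (cosine(fi, fj) for j, fj in enumerate(features) if j != i),
            reverse=True,
        )
        top = sims[:k]
        score = sum(top) / len(top) if top else 0.0
        if score >= threshold:
            kept.append(i)
    return kept


# Three clips with similar (engine-like) audio features and one unrelated clip.
clips = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [1.0, 0.0, 0.1], [0.0, 0.0, 1.0]]
print(filter_uncorrelated(clips))  # [0, 1, 2]
```

In this toy collection the fourth clip shares no features with the others, so it is dropped, mirroring the article's point that uncorrelated sounds lack redundant features across videos.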
A user study showed that the system consistently returned better results than one trained on the unfiltered original video collection. By combining creativity and innovation, the research continues Disney’s rich legacy of inventing new ways to tell great stories and building the technology required for the future of entertainment, the website writes.
This article was first published at: https://www.eurekalert.org/pub_releases/2016-11/dr-cgm111416.php