Artificial intelligence system learns concepts shared across video, audio, and text
MIT researchers developed a machine-learning technique that learns to represent data in a way that captures concepts shared between visual and audio modalities. Their model can identify where a certain action is taking place in a video and label it.