Workshop

The Second Perception Test Challenge

Viorica Patraucean

Project Page [ Contact: viorica@google.com ]

Abstract

Following the successful 2023 edition, we organise the second Perception Test Challenge to benchmark multimodal perception models on the Perception Test (blog, GitHub), a diagnostic benchmark created by Google DeepMind to comprehensively probe the abilities of multimodal models across:
* video, audio, and text modalities
* four skill areas: Memory, Abstraction, Physics, Semantics
* four types of reasoning: Descriptive, Explanatory, Predictive, Counterfactual
* six computational tasks: multiple-choice video-QA, grounded video-QA, object tracking, point tracking, action localisation, sound localisation
