28 June 2019
The 2019 international ActivityNet challenge hosted a number of teams from Monash University, MIT, the University of Maryland, Facebook AI Research (FAIR) and Baidu VIS, who competed in seven diverse tasks that aim to push the limits of semantic visual understanding of videos.
Three out of these seven tasks are based on the ActivityNet dataset, which focuses on machine learning to advance models for video understanding.
Dr Xiaojun Chang from the Faculty of Information Technology specialises in developing structured machine learning models for computer vision and multimedia tasks. He investigates how to explore the information contained in videos and develop advanced artificial intelligence systems for video analysis.
As the group leader of a team of ten students from Carnegie Mellon University and colleagues from ByteDance AI Lab, Dr Chang achieved second place in the Kinetics task of the 2019 ActivityNet challenge.
The Kinetics task is intended to evaluate the ability of algorithms to recognise activities in trimmed video sequences. Each video contains a single activity and all the clips have a standard duration of ten seconds. The task uses the Kinetics dataset, a large-scale benchmark for trimmed action classification.
To solve this challenging problem, Dr Chang and his team implemented three different models. The team used temporal segment networks to model long-term temporal information by evenly sampling a fixed number of clips from the entire video. A non-local neural network was then used to encode the long-term temporal information. Finally, a temporal shift module was incorporated to perform efficient temporal modelling by shifting the feature map along the temporal dimension.
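Two of the ideas described above, evenly sampling clips across a video and shifting features along the temporal dimension, can be illustrated with a short sketch. This is not the team's implementation: the function names, the choice of each segment's centre frame, the `fold_div` parameter and the plain-list feature layout are all assumptions made for illustration only.

```python
def sample_segment_indices(num_frames, num_segments):
    """Split a video into equal-length segments and pick each segment's centre frame.

    Illustrative only: the centre-frame choice is an assumption.
    """
    if num_frames <= 0 or num_segments <= 0:
        raise ValueError("num_frames and num_segments must be positive")
    segment_length = num_frames / num_segments
    return [
        min(num_frames - 1, int(segment_length * (i + 0.5)))
        for i in range(num_segments)
    ]


def temporal_shift(features, fold_div=8):
    """Shift a fraction of the channels one step along the time axis.

    `features` is a list of T frames, each a list of C channel values.
    The first C // fold_div channels take their value from the next frame,
    the following C // fold_div channels take it from the previous frame,
    and the remaining channels are left unchanged. Positions with no
    source frame are filled with zeros. `fold_div` is a hypothetical
    parameter name used for this sketch.
    """
    num_frames = len(features)
    num_channels = len(features[0])
    fold = num_channels // fold_div
    shifted = [[0.0] * num_channels for _ in range(num_frames)]
    for t in range(num_frames):
        for c in range(num_channels):
            if c < fold:
                source = t + 1  # value comes from the following frame
            elif c < 2 * fold:
                source = t - 1  # value comes from the preceding frame
            else:
                source = t      # channel is not shifted
            if 0 <= source < num_frames:
                shifted[t][c] = features[source][c]
    return shifted


if __name__ == "__main__":
    # A ten-second clip at an assumed 30 frames per second, split into 3 segments.
    print(sample_segment_indices(300, 3))

    # Three frames with four channels each; one channel shifts in each direction.
    toy_features = [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]]
    for frame in temporal_shift(toy_features, fold_div=4):
        print(frame)
```

Because the shift only moves existing values between neighbouring frames, it lets each frame see information from its neighbours without adding any extra parameters, which is the sense in which the module is described as efficient.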
Through this Kinetics challenge, Dr Chang and his team learned representative features for large-scale video classification tasks, explored in-depth temporal-spatial techniques and gained a valuable understanding of better methods for fusing the outputs of different modules.
This is the fourth annual International Challenge on Activity Recognition, which was first hosted during the 2016 Conference on Computer Vision and Pattern Recognition (CVPR). It focuses on the recognition of daily life, high-level, goal-oriented activities from user-generated videos.
Well done to Dr Xiaojun Chang and his team on this fantastic result.