Unsupervised Activity Segmentation by Joint Representation Learning and Online Clustering

05/27/2021
by Sateesh Kumar, et al.

We present a novel approach for unsupervised activity segmentation that uses video frame clustering as a pretext task and performs representation learning and online clustering simultaneously. This contrasts with prior works, where representation learning and clustering are often performed sequentially. We leverage temporal information in videos through temporal optimal transport and a temporal coherence loss. In particular, we add a temporal regularization term to the standard optimal transport module; the term preserves the temporal order of the activity and yields the temporal optimal transport module, which computes pseudo-label cluster assignments. The temporal coherence loss, in turn, encourages neighboring video frames to be mapped to nearby points in the embedding space and distant video frames to be mapped to points farther apart. The combination of these two components results in effective representations for unsupervised activity segmentation. Furthermore, previous methods require storing learned features for the entire dataset before clustering them offline, whereas our approach processes one mini-batch at a time in an online manner. Extensive evaluations on three public datasets, i.e., 50-Salads, YouTube Instructions, and Breakfast, and on our own dataset, i.e., Desktop Assembly, show that our approach performs on par with or better than previous methods for unsupervised activity segmentation while requiring significantly less memory.
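To make the temporal optimal transport idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a Sinkhorn-style solver with uniform marginals and a Gaussian-shaped prior that favors assignments near the diagonal of the frame-by-cluster matrix, so that early frames prefer early clusters. The function names (`temporal_prior`, `temporal_sinkhorn`) and the parameters `rho`, `eps`, `sigma`, and `n_iters` are hypothetical and chosen only for illustration.

```python
import numpy as np


def temporal_prior(n_frames, n_clusters, sigma=0.3):
    """Prior that peaks where normalized frame and cluster positions agree.

    Assumption: a Gaussian-shaped prior around the diagonal; the exact form
    used in the paper may differ.
    """
    frame_pos = (np.arange(n_frames) + 0.5) / n_frames
    cluster_pos = (np.arange(n_clusters) + 0.5) / n_clusters
    # Distance of each (frame, cluster) cell from the diagonal.
    dist = np.abs(frame_pos[:, None] - cluster_pos[None, :]) / np.sqrt(2.0)
    return np.exp(-(dist ** 2) / (2.0 * sigma ** 2))


def temporal_sinkhorn(scores, rho=0.1, eps=0.05, n_iters=50, sigma=0.3):
    """Soft cluster assignments from frame-to-cluster similarity scores.

    scores: array of shape (n_frames, n_clusters).
    rho:    weight of the temporal prior (hypothetical parameter).
    eps:    entropic smoothing temperature (hypothetical parameter).
    Returns an array of the same shape whose rows each sum to 1.
    """
    n_frames, n_clusters = scores.shape
    prior = temporal_prior(n_frames, n_clusters, sigma)

    # Combine similarity with the log-prior, then exponentiate stably.
    logits = (scores + rho * np.log(prior)) / eps
    q = np.exp(logits - logits.max())

    # Uniform target marginals: equal mass per frame and per cluster.
    row_target = np.full(n_frames, 1.0 / n_frames)
    col_target = np.full(n_clusters, 1.0 / n_clusters)

    # Sinkhorn iterations: alternately rescale columns and rows.
    for _ in range(n_iters):
        q *= (col_target / q.sum(axis=0))[None, :]
        q *= (row_target / q.sum(axis=1))[:, None]

    # Rescale so each row is a distribution over clusters.
    return q * n_frames


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demo_scores = 0.1 * rng.standard_normal((12, 3))
    assignments = temporal_sinkhorn(demo_scores)
    print(assignments.argmax(axis=1))
```

In an online setting, a routine of this kind would be called once per mini-batch of frames, and the resulting soft assignments would serve as pseudo-labels for the representation learner. With uninformative (all-zero) scores the prior alone decides the outcome, so the hard labels follow the temporal order of the frames.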


research
05/31/2023

Permutation-Aware Action Segmentation via Unsupervised Frame-to-Segment Alignment

This paper presents a novel transformer-based framework for unsupervised...
research
03/31/2021

Learning by Aligning Videos in Time

We present a self-supervised approach for learning video representations...
research
10/20/2019

Differentiable Deep Clustering with Cluster Size Constraints

Clustering is a fundamental unsupervised learning approach. Many cluster...
research
06/30/2022

Timestamp-Supervised Action Segmentation with Graph Convolutional Networks

We introduce a novel approach for temporal activity segmentation with ti...
research
01/29/2020

Joint Visual-Temporal Embedding for Unsupervised Learning of Actions in Untrimmed Sequences

Understanding the structure of complex activities in videos is one of th...
research
04/08/2021

Few-Shot Action Recognition with Compromised Metric via Optimal Transport

Although vital to computer vision systems, few-shot action recognition i...
research
02/05/2022

Unsupervised Learning on 3D Point Clouds by Clustering and Contrasting

Learning from unlabeled or partially labeled data to alleviate human lab...
