Multimodal and self-supervised representation learning for automatic gesture recognition in surgical robotics

10/31/2020
by Aniruddha Tamhane, et al.

Self-supervised, multi-modal learning has been successful in building holistic representations of complex scenarios. Such representations consolidate information from multiple modalities and can serve multiple, versatile uses. In surgical robotics, this approach can both develop a generalised machine understanding of the surgical process and reduce the dependency on high-quality expert annotations, which are generally difficult to obtain. We develop a self-supervised, multi-modal representation learning paradigm that learns representations for surgical gestures from video and kinematics. We use an encoder-decoder network configuration that encodes representations from surgical videos and decodes them to yield kinematics. We quantitatively demonstrate the efficacy of our learnt representations for gesture recognition (with accuracy of at least 69.6%), transfer learning across multiple tasks (with accuracy of at least 44.6%) and surgeon skill classification (with accuracy of at least 76.8%). Further, we qualitatively demonstrate that our self-supervised representations cluster according to semantically meaningful properties (surgeon skill and gestures).
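
The encoder-decoder idea in the abstract can be illustrated with a small sketch. This is not the authors' network: it assumes pre-extracted per-frame video features, a single tanh encoder layer, a linear decoder and a mean-squared-error objective, and it runs on synthetic data. All names (`init_params`, `encode`, `decode`, `train_step`) and all dimensions are hypothetical. Kinematics serve as the regression target, so no gesture labels are needed to learn the representation.

```python
import numpy as np

def init_params(d_video, d_latent, d_kin, seed=0):
    # Small random weights for a one-layer encoder and a linear decoder (assumed sizes).
    rng = np.random.default_rng(seed)
    return {
        "W_enc": rng.normal(0.0, 0.1, (d_video, d_latent)),
        "W_dec": rng.normal(0.0, 0.1, (d_latent, d_kin)),
    }

def encode(params, video_feats):
    # Map video features to a low-dimensional representation.
    return np.tanh(video_feats @ params["W_enc"])

def decode(params, z):
    # Predict kinematics from the representation.
    return z @ params["W_dec"]

def train_step(params, video_feats, kinematics, lr=0.1):
    # One gradient-descent step on the mean squared kinematics error.
    z = encode(params, video_feats)
    err = decode(params, z) - kinematics
    loss = float(np.mean(err ** 2))          # loss before the update
    grad_pred = 2.0 * err / err.size         # d(loss)/d(prediction)
    grad_w_dec = z.T @ grad_pred
    grad_z = grad_pred @ params["W_dec"].T
    grad_pre = grad_z * (1.0 - z ** 2)       # derivative of tanh
    grad_w_enc = video_feats.T @ grad_pre
    params["W_enc"] -= lr * grad_w_enc
    params["W_dec"] -= lr * grad_w_dec
    return loss

# Synthetic stand-ins: 256 frames, 32-d video features, 6-d kinematics.
rng = np.random.default_rng(1)
video_feats = rng.normal(size=(256, 32))
kinematics = np.tanh(video_feats @ (0.1 * rng.normal(size=(32, 6))))

params = init_params(d_video=32, d_latent=8, d_kin=6)
losses = [train_step(params, video_feats, kinematics) for _ in range(300)]

# The encoder output is the learnt representation used for downstream tasks.
representations = encode(params, video_feats)
```

After training, `representations` holds one 8-dimensional vector per frame. In a setup like the one described, such vectors would then feed a separate classifier for gestures or surgeon skill.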


