LapTool-Net: A Contextual Detector of Surgical Tools in Laparoscopic Videos Based on Recurrent Convolutional Neural Networks

by Babak Namazi, et al.

We propose a new multilabel classifier, called LapTool-Net, to detect the presence of surgical tools in each frame of a laparoscopic video. The novelty of LapTool-Net is its exploitation of the correlations among the usage of different tools and between the tools and the surgical tasks, namely the context of the tools' usage. Toward this goal, the pattern in the co-occurrence of the tools is used to design a decision policy for a multilabel classifier based on a Recurrent Convolutional Neural Network (RCNN) architecture, which extracts spatial and temporal features simultaneously. In contrast to previous multilabel classification methods, the RCNN and the decision model are trained end-to-end using a multitask learning scheme. To overcome the high class imbalance and avoid overfitting caused by the lack of variety in the training data, a high down-sampling rate is chosen based on the more frequent tool combinations. Furthermore, in the post-processing step, the predictions for all the frames of a video are corrected by a bi-directional RNN designed to model the long-term order of the tasks. LapTool-Net was trained using a publicly available dataset of laparoscopic cholecystectomy. The results show that LapTool-Net significantly outperforms existing methods, even while using fewer training samples and a shallower architecture.
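The abstract does not spell out how tool combinations are enumerated or how the down-sampling is carried out, so the following is only a minimal sketch under stated assumptions. It treats each frame's multi-hot tool vector as one combination class and caps the number of frames kept per combination. The function names, the `cap` parameter, and the toy labels are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def combination_ids(labels):
    """Map each multi-hot label vector to the index of its tool combination."""
    combos = sorted({tuple(v) for v in labels})
    index = {c: i for i, c in enumerate(combos)}
    return [index[tuple(v)] for v in labels], combos


def downsample_by_combination(labels, cap, seed=0):
    """Keep at most `cap` frames per tool combination (assumed policy).

    Returns the sorted indices of the frames that are kept.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for i, v in enumerate(labels):
        groups[tuple(v)].append(i)
    kept = []
    for idxs in groups.values():
        # Frequent combinations are randomly thinned; rare ones are kept whole.
        if len(idxs) > cap:
            idxs = rng.sample(idxs, cap)
        kept.extend(idxs)
    return sorted(kept)


# Toy data: 3 hypothetical tools, 10 frames, one dominant combination.
labels = [[1, 0, 0]] * 6 + [[1, 1, 0]] * 3 + [[0, 0, 1]] * 1
ids, combos = combination_ids(labels)
kept = downsample_by_combination(labels, cap=2)
```

On the toy data, the dominant combination is cut from six frames to two, while the combination that appears only once is left untouched. In this sketch, that is the intended effect of sampling more aggressively from frequent combinations.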




Monitoring tool usage in cataract surgery videos using boosted convolutional and recurrent neural networks

With an estimated 19 million operations performed annually, cataract sur...

SurgeonAssist-Net: Towards Context-Aware Head-Mounted Display-Based Augmented Reality for Surgical Guidance

We present SurgeonAssist-Net: a lightweight framework making action-and-...

End-to-End Video Captioning with Multitask Reinforcement Learning

Although end-to-end (E2E) learning has led to promising performance on a...

Learning Long-Term Style-Preserving Blind Video Temporal Consistency

When trying to independently apply image-trained algorithms to successiv...

BSUV-Net 2.0: Spatio-Temporal Data Augmentations for Video-Agnostic Supervised Background Subtraction

Background subtraction (BGS) is a fundamental video processing task whic...

Synthesising Rare Cataract Surgery Samples with Guided Diffusion Models

Cataract surgery is a frequently performed procedure that demands automa...
