Predicting Themes within Complex Unstructured Texts: A Case Study on Safeguarding Reports

by Aleksandra Edwards, et al.

Text and sentence classification typically requires large amounts of labelled training data. Acquiring labelled datasets at that scale can be expensive or infeasible, especially in highly specialised domains where documents are hard to obtain. Research on supervised classification with small amounts of training data is limited. In this paper, we examine combinations of state-of-the-art deep learning and classification methods and provide insight into which combinations fit the needs of small, domain-specific, and terminologically rich corpora. We focus on a real-world scenario: a collection of safeguarding reports comprising learning experiences and reflections on tackling serious incidents involving children and vulnerable adults. The relatively small number of available reports and their highly domain-specific terminology make automated approaches difficult to apply. We address the problem of automatically identifying the main themes in a safeguarding report using supervised classification approaches. Our results show the potential of deep learning models to simulate subject-expert behaviour, even for complex tasks with limited labelled data.


A Sentence-level Hierarchical BERT Model for Document Classification with Limited Labelled Data

Training deep learning models with limited labelled data is an attractiv...

Denoising Adversarial Autoencoders: Classifying Skin Lesions Using Limited Labelled Training Data

We propose a novel deep learning model for classifying medical images in...

Analysis of Railway Accidents' Narratives Using Deep Learning

Automatic understanding of domain specific texts in order to extract use...

SPaR.txt, a cheap Shallow Parsing approach for Regulatory texts

Automated Compliance Checking (ACC) systems aim to semantically parse bu...

Informed Machine Learning, Centrality, CNN, Relevant Document Detection, Repatriation of Indigenous Human Remains

Among the pressing issues facing Australian and other First Nations peop...

Using Deep Learning For Title-Based Semantic Subject Indexing To Reach Competitive Performance to Full-Text

For (semi-)automated subject indexing systems in digital libraries, it i...

Beyond Supervised Classification: Extreme Minimal Supervision with the Graph 1-Laplacian

We consider the task of classifying when an extremely reduced amount of ...
