Deep Learning for Technical Document Classification

06/27/2021
by Shuo Jiang, et al.

In large technology companies, the need to manage and organize technical documents created by engineers and managers to support decision making has grown dramatically in recent years, driving demand for more scalable, accurate, and automated document classification. Prior studies have focused primarily on text-only classification and on small-scale databases. This paper describes a novel multimodal deep learning architecture, called TechDoc, for technical document classification, which uses both natural language text and descriptive images to train hierarchical classifiers. The architecture combines convolutional neural networks and recurrent neural networks through an integrated training process. We applied the architecture to a large multimodal technical document database and trained the model to classify documents according to the hierarchical International Patent Classification system. Our results show that the trained neural network achieves higher classification accuracy than models using a single modality and than several earlier text classification methods. The trained model can potentially be scaled to millions of real-world technical documents containing both text and figures, which is useful for data and knowledge management in large technology companies and organizations.
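To make the hierarchical target concrete, here is a minimal sketch of how an International Patent Classification (IPC) symbol breaks down into nested labels and how predictions could be scored level by level. It is not taken from the paper: the abstract does not say which IPC levels TechDoc predicts, so the three levels used here (section, class, subclass) are an assumption, and the helper names `ipc_levels` and `per_level_accuracy` are hypothetical.

```python
import re

# Assumption: only the top three IPC levels are used. A symbol such as
# "G06F 17/30" starts with a section letter (A-H), two class digits,
# and a subclass letter; the group part after that is ignored here.
_IPC_PREFIX = re.compile(r"^([A-H])(\d{2})([A-Z])")


def ipc_levels(symbol):
    """Return (section, class, subclass) labels for one IPC symbol."""
    match = _IPC_PREFIX.match(symbol.strip().upper())
    if match is None:
        raise ValueError(f"not a recognizable IPC symbol: {symbol!r}")
    section, digits, letter = match.groups()
    ipc_class = section + digits
    return section, ipc_class, ipc_class + letter


def per_level_accuracy(true_symbols, predicted_symbols):
    """Fraction of documents predicted correctly at each hierarchy level."""
    if not true_symbols or len(true_symbols) != len(predicted_symbols):
        raise ValueError("need two non-empty lists of equal length")
    names = ("section", "class", "subclass")
    hits = [0, 0, 0]
    for truth, pred in zip(true_symbols, predicted_symbols):
        pairs = zip(ipc_levels(truth), ipc_levels(pred))
        for i, (t, p) in enumerate(pairs):
            hits[i] += int(t == p)
    total = len(true_symbols)
    return {name: hit / total for name, hit in zip(names, hits)}
```

Scoring each level separately shows where a hierarchical classifier goes wrong: a prediction of `G06N` for a `G06F` document is still correct at the section and class levels and only misses the subclass.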

research
09/24/2017

HDLTex: Hierarchical Deep Learning for Text Classification

The continually increasing number of documents produced each year necess...
research
12/19/2019

A Framework for Explainable Text Classification in Legal Document Review

Companies regularly spend millions of dollars producing electronically-s...
research
07/15/2019

Multimodal deep networks for text and image-based document classification

Classification of document images is a critical step for archival of old...
research
01/06/2021

On-Device Document Classification using multimodal features

From small screenshots to large videos, documents take up a bulk of spac...
research
05/14/2016

Rationale-Augmented Convolutional Neural Networks for Text Classification

We present a new Convolutional Neural Network (CNN) model for text class...
research
09/25/2017

DOC: Deep Open Classification of Text Documents

Traditional supervised learning makes the closed-world assumption that t...
research
04/13/2005

Learning from Web: Review of Approaches

Knowledge discovery is defined as non-trivial extraction of implicit, pr...
