Building a Semantic Role Labelling System for Vietnamese

by Thai Hoang Pham, et al.

Semantic role labelling (SRL) is a natural language processing task that detects and classifies the semantic arguments associated with the predicates of a sentence. It is an important step towards understanding the meaning of natural language. SRL systems exist for well-studied languages such as English, Chinese and Japanese, but no such system exists for Vietnamese. In this paper, we present the first SRL system for Vietnamese, with encouraging accuracy. We first demonstrate that a straightforward application of SRL techniques developed for English does not yield good accuracy for Vietnamese. We then introduce a new algorithm for extracting candidate syntactic constituents, which is much more accurate than the node-mapping algorithm commonly used in the identification step. Finally, in the classification step, we propose novel and useful features for SRL in addition to the common linguistic features. Our SRL system achieves an F_1 score of 73.53% on the Vietnamese PropBank corpus. The system, including software and corpus, is available as an open-source project, and we believe it is a good baseline for the development of future Vietnamese SRL systems.
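The abstract reports an F_1 score but does not say how the scoring is done. As a rough illustration only, the sketch below computes precision, recall and F_1 under the assumption that a predicted argument counts as correct when its predicate, span and role label all match a gold argument exactly. The tuple encoding, the function name `precision_recall_f1` and the toy role labels are hypothetical and are not taken from the authors' software.

```python
from typing import Set, Tuple

# Hypothetical encoding of one labelled argument (an assumption, not the
# authors' format): (predicate_index, start_token, end_token, role_label)
Argument = Tuple[int, int, int, str]


def precision_recall_f1(
    gold: Set[Argument], predicted: Set[Argument]
) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F_1 over labelled argument spans."""
    correct = len(gold & predicted)  # arguments matching in span and label
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F_1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data: three gold arguments for predicate 1; the system finds all three
# spans but assigns the wrong role to one of them.
gold = {(1, 0, 0, "Arg0"), (1, 2, 3, "Arg1"), (1, 4, 5, "ArgM-TMP")}
predicted = {(1, 0, 0, "Arg0"), (1, 2, 3, "Arg2"), (1, 4, 5, "ArgM-TMP")}

p, r, f1 = precision_recall_f1(gold, predicted)
print(f"precision={p:.4f} recall={r:.4f} F1={f1:.4f}")
```

Under this scoring, a span that is identified correctly but given the wrong role counts against both precision and recall, which is why errors in either the identification step or the classification step lower the final F_1.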




