FastKASSIM: A Fast Tree Kernel-Based Syntactic Similarity Metric

03/15/2022
by   Maximillian Chen, et al.

Syntax is a fundamental component of language, yet few metrics have been employed to capture syntactic similarity or coherence at the utterance- and document-level. The existing standard document-level syntactic similarity metric is computationally expensive and performs inconsistently when faced with syntactically dissimilar documents. To address these challenges, we present FastKASSIM, a metric for utterance- and document-level syntactic similarity which pairs and averages the most similar constituency parse trees between a pair of documents based on tree kernels. FastKASSIM is more robust to syntactic dissimilarities and differences in length, and runs up to 5.2 times faster than our baseline method over the documents in the r/ChangeMyView corpus.
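
The pairing-and-averaging idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the kernel here is a simple stand-in that counts shared grammar productions, the symmetric averaging over both directions is an assumption, and the function names (`productions`, `normalized_kernel`, `doc_similarity`) and the nested-tuple tree encoding are hypothetical choices made only for this example.

```python
from collections import Counter
from math import sqrt

# A parse tree is a nested tuple: (label, child, child, ...).
# Leaves are one-element tuples holding a POS tag, e.g. ("NN",).
# This encoding is an assumption made for the sketch.

def productions(tree):
    """Yield one (parent_label, child_labels) production per internal node."""
    label, *children = tree
    if children:
        yield (label, tuple(child[0] for child in children))
        for child in children:
            yield from productions(child)

def kernel(t1, t2):
    """Stand-in tree kernel: count matching productions between two trees."""
    p1, p2 = Counter(productions(t1)), Counter(productions(t2))
    return sum(count * p2[prod] for prod, count in p1.items())

def normalized_kernel(t1, t2):
    """Scale the kernel into [0, 1] so trees of different sizes are comparable."""
    denom = sqrt(kernel(t1, t1) * kernel(t2, t2))
    return kernel(t1, t2) / denom if denom else 0.0

def doc_similarity(doc_a, doc_b):
    """Pair each tree with its most similar tree in the other document,
    average those best-match scores, and average the two directions
    (the symmetric step is an assumption of this sketch)."""
    if not doc_a or not doc_b:
        return 0.0

    def directed(src, dst):
        return sum(max(normalized_kernel(s, d) for d in dst) for s in src) / len(src)

    return (directed(doc_a, doc_b) + directed(doc_b, doc_a)) / 2

# Two toy sentence trees that share two of their three productions.
tree_a = ("S", ("NP", ("DT",), ("NN",)), ("VP", ("VBZ",)))
tree_b = ("S", ("NP", ("PRP",)), ("VP", ("VBZ",)))

score = doc_similarity([tree_a, tree_b], [tree_b])
```

In practice the trees would come from a parser and the kernel would be one of the established tree kernels; the sketch only shows how best-match pairing and averaging combine sentence-level scores into a document-level score.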

