FastKASSIM: A Fast Tree Kernel-Based Syntactic Similarity Metric
Syntax is a fundamental component of language, yet few metrics have been employed to capture syntactic similarity or coherence at the utterance and document levels. The existing standard document-level syntactic similarity metric is computationally expensive and performs inconsistently when faced with syntactically dissimilar documents. To address these challenges, we present FastKASSIM, a metric for utterance- and document-level syntactic similarity which pairs and averages the most similar dependency parse trees between a pair of documents based on tree kernels. FastKASSIM is more robust to syntactic dissimilarities and differences in length, and runs up to 5.2 times faster than our baseline method over the documents in the r/ChangeMyView corpus.