SemRe-Rank: Incorporating Semantic Relatedness to Improve Automatic Term Extraction Using Personalized PageRank
Automatic Term Extraction deals with the extraction of terminology from a domain specific corpus, and has long been an established research area in data and knowledge acquisition. ATE remains a challenging task as it is known that no existing methods can consistently outperforms others in all domains. This work adopts a different strategy towards this problem as we propose to 'enhance' existing ATE methods instead of 'replace' them. We introduce SemRe-Rank, a generic method based on the concept of incorporating semantic relatedness - an often overlooked venue - into an existing ATE method to further improve its performance. SemRe-Rank applies a personalized PageRank process to a semantic relatedness graph of words to compute their 'semantic importance' scores, which are then used to revise the scores of term candidates computed by a base ATE algorithm. Extensively evaluated with 13 state-of-the-art ATE methods on four datasets of diverse nature, it is shown to have achieved widespread improvement over all methods and across all datasets. The best performing variants of SemRe-Rank have achieved, on some datasets, an improvement of 0.15 (on a scale of 0 1.0) in terms of the precision in the top ranked K term candidates, and an improvement of 0.28 in terms of overall F1.
READ FULL TEXT