Novel Keyword Extraction and Language Detection Approaches

09/24/2020
by Malgorzata Pikies, et al.

Fuzzy string matching and language classification are important tools in Natural Language Processing pipelines, and this paper provides advances in both areas. We propose a fast, novel approach to string tokenisation for fuzzy language matching and experimentally demonstrate an 83.6% decrease in processing time with an estimated improvement in recall of 3.1% [...] a 2.6% [...] are subdivided into multiple words, without needing to scan character by character. So far, little work has considered using metadata to enhance language classification algorithms. We provide observational data and find that the Accept-Language header is 14% more likely to match the classification than the IP address.
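To make the metadata comparison concrete, the following is a minimal sketch of how an Accept-Language header could be reduced to a primary language and checked against a classifier's output. It is not the authors' implementation: the function names (`parse_accept_language`, `primary_language`, `header_matches`) are hypothetical, and the classifier's result is assumed to be a bare ISO 639-1 code such as "en".

```python
from typing import List, Optional, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept-Language value into (tag, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, *params = [p.strip() for p in part.split(";")]
        q = 1.0  # a missing q parameter means full preference
        for param in params:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((tag.lower(), q))
    # sorted() is stable, so header order breaks ties between equal weights
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def primary_language(header: str) -> Optional[str]:
    """Return the base language of the most preferred tag, or None."""
    for tag, q in parse_accept_language(header):
        if tag != "*" and q > 0:
            return tag.split("-")[0]
    return None


def header_matches(header: str, classified: str) -> bool:
    """Hypothetical check: does the header agree with the classifier?"""
    return primary_language(header) == classified.lower()
```

For instance, `header_matches("en-GB,en;q=0.9", "en")` compares the base language "en" from the header with the assumed classifier output. Counting such agreements over a set of requests, and doing the same for an IP-derived language, would give the kind of match rates the abstract compares.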
