Novel Keyword Extraction and Language Detection Approaches
Fuzzy string matching and language classification are important tools in Natural Language Processing pipelines; this paper provides advances in both areas. We propose a fast, novel approach to string tokenisation for fuzzy language matching and experimentally demonstrate an 83.6 processing time with an estimated improvement in recall of 3.1 a 2.6 are subdivided into multiple words, without needing to scan character-to-character. So far, there has been little work on using metadata to enhance language classification algorithms. We provide observational data and find the Accept-Language header is 14 match the classification than the IP address.