A Machine Learning Approach To Prevent Malicious Calls Over Telephony Networks
Malicious calls, i.e., telephony spam and scams, are a long-standing problem that causes billions of dollars of financial loss worldwide every year. This work presents the first machine learning-based solution that does not rely on any particular assumptions about the underlying telephony network infrastructure. The main challenge of this decade-long problem is that it is unclear how to construct effective features without access to the telephony networks' infrastructure. We solve this problem by combining several innovations. We first develop a TouchPal user interface on top of a mobile app that allows users to tag malicious calls. This allows us to maintain a large-scale call log database. We then conduct a measurement study over three months of call logs, comprising 9 billion records. Based on the results, we design 29 features so that machine learning algorithms can be used to predict malicious calls. We extensively evaluate different state-of-the-art machine learning approaches using the proposed features, and the results show that the best approach can reduce up to 90% of malicious calls while maintaining a precision over 99.99%. The results also show that the models are efficient to implement without incurring a significant latency overhead. We also conduct an ablation analysis, which reveals that using 10 of the 29 features achieves performance comparable to using all features.
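An evaluation of this kind typically reports two numbers: the share of malicious calls that the classifier flags, and the precision of those flags. The following is a minimal sketch of how they could be computed, not the authors' code. The function name `blocked_share_and_precision` and the label encoding (1 = malicious, 0 = benign) are assumptions made for illustration, and the paper may define its metrics differently.

```python
from typing import Sequence, Tuple


def blocked_share_and_precision(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (share of malicious calls flagged, precision of the flags).

    Assumed encoding for this sketch: 1 = malicious, 0 = benign.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malicious, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malicious, missed

    # Share of malicious calls that were flagged (recall on the malicious class).
    blocked_share = tp / (tp + fn) if (tp + fn) else 0.0
    # Share of flagged calls that were truly malicious.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return blocked_share, precision


if __name__ == "__main__":
    # Toy labels, invented purely for illustration.
    truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    flags = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    print(blocked_share_and_precision(truth, flags))
```

On real call traffic, benign calls vastly outnumber malicious ones, so even a small false-positive rate can affect many legitimate calls. That is why a very high precision target is paired with the blocking rate.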