LeakSemantic: Identifying Abnormal Sensitive Network Transmissions in Mobile Applications
Mobile applications (apps) often transmit sensitive data over the network for various purposes. Some transmissions are needed to fulfill the app's functionality. However, transmissions to malicious receivers may lead to privacy leakage, and they tend to behave stealthily to evade detection. The problem is twofold: how does one unveil sensitive transmissions in mobile apps, and, given a sensitive transmission, how does one determine whether it is legitimate? In this paper, we propose LeakSemantic, a framework that automatically locates abnormal sensitive network transmissions in mobile apps. LeakSemantic consists of a hybrid program analysis component and a machine learning component. The program analysis component combines static and dynamic analysis to precisely identify sensitive transmissions. Compared to existing taint analysis approaches, LeakSemantic achieves better accuracy with fewer false positives and is able to collect runtime data, such as network traffic, for each transmission. Based on features derived from the runtime data, machine learning classifiers are built to further differentiate between legal and illegal disclosures. Experiments show that LeakSemantic achieves 91% accuracy on 2279 sensitive connections from 1404 apps.