Privacy-Preserving Categorization of Mobile Applications Based on Large-scale Usage Data

2019 
Abstract: Categorization of mobile applications (apps) according to their functionalities is essential for app stores to manage a huge number of apps efficiently and securely. The problem with existing methods is that apps are uploaded from untrusted sources, and the static features extracted for categorization can easily be masked by obfuscation or encryption. To solve this problem and improve categorization accuracy, we propose extracting features from usage data generated by apps running on mobile devices. Usage data, such as the average running time or the number of active users of an app, is hard for untrusted developers to manipulate, and different types of apps generate different usage patterns. Based on this observation, we propose a new privacy-preserving method for categorizing mobile apps by learning patterns from large-scale usage data. First, the usage data collected from different users is anonymized by shuffling. Then we formalize the usage data as time series, and extract and cluster the usage data for each app based on Dynamic Time Warping. We use Shape Features to segment the clustered time series and transform them into feature vectors. Finally, we adopt five machine learning methods to train and test the categorization models on 3,086 apps. The results show that SVM performs best. When we exclude apps with fewer than 50,000 usage data flows, the categorization performance (F1-score) of our method improves to over 96%, which is significantly better than previous methods.
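The clustering step in the abstract relies on Dynamic Time Warping (DTW), which compares two time series while tolerating shifts and stretches along the time axis. The following is a minimal sketch of the textbook DTW distance, not the authors' implementation: the function name `dtw_distance`, the absolute-difference local cost, and the sample usage series are all assumptions made for illustration, and the abstract does not state which DTW variant or constraints were used.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook DTW distance between two numeric sequences.

    Assumption: the local cost is the absolute difference between
    samples; the paper's actual cost function is not given in the abstract.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b repeats a sample
                cost[i][j - 1],      # b advances, a repeats a sample
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical usage series (e.g. active users per time slot) for two apps.
app_a = [1, 2, 3]
app_b = [1, 2, 2, 3]  # same shape, stretched in time
print(dtw_distance(app_a, app_b))
```

Because DTW lets one sample align with several samples in the other series, the stretched series above has zero distance to the original, which is why a DTW-based measure can group apps with similar usage shapes even when their timing differs.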