Clone or Relative?: Understanding the Origins of Similar Android Apps

2016 
Since it is not hard to repackage an Android app, there are many cloned apps, which we call clones in this work. As previous studies have reported, clones are generated for bad purposes by malicious parties, e.g., adding malicious functions, injecting/replacing advertising modules, and piracy. Besides such clones, there are legitimate, similar apps, which we call "relatives" in this work. These relatives are not clones but are similar in nature; i.e., they are generated by the same app-building service or by the same developer using a same template. Given these observations, this paper aims to answer the following two research questions: (RQ1) How can we distinguish between clones and relatives? (RQ2) What is the breakdown of clones and relatives in the official and third-party marketplaces? To answer the first research question, we developed a scalable framework called APPraiser that systematically extracts similar apps and classifies them into clones and relatives. We note that our key algorithms, which leverage sparseness of the data, have the time complexity of O(n) in practice. To answer the second research question, we applied the APPraiser framework to the over 1.3 millions of apps collected from official and third-party marketplaces. Our analysis revealed the following findings: In the official marketplace, 79% of similar apps were attributed to relatives while, in the third-party marketplace, 50% of similar apps were attributed to clones. The majority of relatives are apps developed by prolific developers in both marketplaces. We also found that in the third-party market, of the clones that were originally published in the official market, 76% of them are malware.To the best of our knowledge, this is the first work that clarified the breakdown of "similar" Android apps, and quantified their origins using a huge dataset equivalent to the size of official market.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    8
    References
    9
    Citations
    NaN
    KQI
    []