An Exploratory Study of Interface Redundancy in Code Repositories

2016 
An important property of software repositories is their level of cross-project redundancy. For instance, much has been done to assess how much code cloning happens across software corpora. In this paper we study a much less targeted type of replication: Interface Redundancy (IR). IR refers to the level of repetition of whole method interfaces - return type, method name, and parameters types - across a code corpus. Such type of redundancy is important because if two non-trivial methods ever share the same interface it is very likely that they implement analogous functions, even though their code, structure, or vocabulary might be diverse. A certain level of IR is a requirement for approaches that rely on the recurrence of interfaces to fulfill a given task (e.g., interface-driven code search - IDCS). In this paper we report on an experiment to measure IR in a large-scale Java repository. Our target corpus contains more than 380,000 methods from 99 Java projects extracted randomly from an open source repository. Results are promising as they show that the chances of an interface from a non-trivial method to repeat itself across a large repository is around 25% (i.e., approximately 1/4 of such interfaces are redundant). Also, more than 80% of the target projects contained IR (with the average percentage of redundant interfaces for these projects being above 30%). As additional analyses we investigated the distribution of the different types of redundant interfaces (e.g., intra-vs inter-project), characterized the redundant interfaces and show that such a knowledge can help improve IDCS, and provided evidence that only a very small part of IR refers to method cloning (around 0.002%).
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    32
    References
    4
    Citations
    NaN
    KQI
    []