Mining GitHub: Why Commit Stops -- Exploring the Relationship between Developer's Commit Pattern and File Version Evolution

2013 
Using the freeware in GitHub, we are often confused by a phenomenon: the new version of GitHub freeware usually was released in an indefinite frequency, and developers often committed nothing for a long time. This evolution phenomenon interferes with our own development plan and architecture design. Why do these updates happen at that time? Can we predict GitHub software version evolution by developers' activities? This paper aims to explore the developer commit patterns in GitHub, and try to mine the relationship between these patterns (if exists) and code evolution. First, we define four metrics to measure commit activity and code evolution: the changes in each commit, the time between two commits, the author of each changes, and the source code dependency. Then, we adopt visualization techniques to explore developers' commit activity and code evolution. Visual techniques are used to describe the progress of the given project and the authors' contributions. To analyze the commit logs in GitHub software repository automatically, Commits Analysis Tool (CAT) is designed and implemented. Finally, eight open source projects in GitHub are analyzed using CAT, and we find that: 1) the file changes in the previous versions may affect the file depend on it in the next version, 2) the average days around "huge commit" is 3 times of that around normal commit. Using these two patterns and developer's commit model, we can predict when his next commit comes and which file may be changed in that commit. Such information is valuable for project planning of both GitHub projects and other projects which use GitHub freeware to develop software.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    8
    References
    17
    Citations
    NaN
    KQI
    []