Generation of overlapping clusters constructing suitable graph for crime report analysis

2021 
Abstract Cybercrime is a kind of criminal activity generally committed by cybercriminals or hackers. Crime activities are growing explosively all over the world which motivates the law enforcement agencies for systematic analysis of crimes. In many cases, crime information is stored as online text reports in an unstructured way and one report describes several different criminal activities. Analysis of these crime reports for identifying patterns and trends in crime and devising solutions to crime detection and prevention strategies are very challenging tasks. In this paper, the crime reports are preprocessed and relations among named entity pairs are extracted to give the structured form to the reports. Each extracted relation is converted to an n -dimensional real-valued vector based on the concept of Word2Vec model of Natural Language Processing. Then a novel agglomerative graph partitioning algorithm using various graph centrality measures is applied to partition the extracted relations. All the extracted relations of a report which are in a single partition are replaced by the representative of that partition and thus each report is described by a set of distinct types of relations. Next, a graph for the set of reports is constructed in such a way that nodes are corresponding to the tuple of relations that describes the reports, and an edge between a pair of nodes is drawn only if the corresponding pair of relations are of a similar type of two different reports. The constructed graph is a disconnected graph with each connected component is a clique. These cliques are easily identified in linear time of the number of edges in the graph and each clique provides a cluster of reports. As each report is described by a set of relations of different types, so obtained clusters are overlapping clusters. The degree of membership of a report in a cluster is also identified in the paper. The proposed method is experimented, and compared with some state-of-the-art partition-based and overlapping clustering algorithms to demonstrate its effectiveness in the domain of crime corpora.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    62
    References
    0
    Citations
    NaN
    KQI
    []