Jitendra K. Harlalka

North Carolina State University

Author Statistics

Papers

Citation

H-Index

i-10 index

Co-Authors

Nagiza F. Samatova

North Carolina State University

Stephen Ranshous

Pacific Northwest National Laboratory

Kanchana Padmanabhan

North Carolina State University

Gonzalo A. Bello

North Carolina State University

Steve Harenberg

SAS Institute (United States)

Ramona G. Seay

North Carolina State University

Alok Choudhary

University of Petroleum and Energy Studies

L. Gjeltema

North Carolina State University

Rada Chirkova

North Carolina State University

Fredrick H. M. Semazzi

North Carolina State University

Cooperative Institutions

North Carolina State University

Oak Ridge National Laboratory

North Central State College

Pacific Northwest National Laboratory

Northwestern University

Maharshi Dayanand University

Twin Cities Orthopedics

University of Minnesota

Battelle

South Carolina State University

Research Field

Community detection in large‐scale networks: a survey and empirical evaluation

Wiley Interdisciplinary Reviews Computational Statistics (2014)

Steve Harenberg Gonzalo A. Bello L. Gjeltema Stephen Ranshous Jitendra K. Harlalka

Community detection is a common problem in graph data analytics that consists of finding groups of densely connected nodes with few connections to nodes outside of the group. In particular, identifying communities in large‐scale networks is an important task in many scientific domains. In this review, we evaluated eight state‐of‐the‐art and five traditional algorithms for overlapping and disjoint community detection on large‐scale real‐world networks with known ground‐truth communities. These 13 algorithms were empirically compared using goodness metrics that measure the structural properties of the identified communities, as well as performance metrics that evaluate these communities against the ground‐truth. Our results show that these two types of metrics are not equivalent. That is, an algorithm may perform well in terms of goodness metrics, but poorly in terms of performance metrics, or vice versa.

WIREs Comput Stat 2014, 6:426–439. doi: 10.1002/wics.1319

This article is categorized under: Algorithms and Computational Methods > Algorithms; Statistical Learning and Exploratory Methods of the Data Sciences > Clustering and Classification; Data: Types and Structure > Graph and Network Data
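
The abstract's distinction between goodness metrics and performance metrics can be illustrated with a minimal Python sketch. The abstract does not list the metrics used in the review, so conductance (a structural goodness score) and F1 against a ground-truth set (a performance score) are assumed here purely for illustration. The toy graph, the `found` community, and the `truth` community are all hypothetical.

```python
# Toy undirected graph (hypothetical): two triangles joined by the edge 2-3.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}

def conductance(adj, community):
    """Goodness metric: edges leaving the community divided by the
    smaller of the two edge volumes (lower means better separated)."""
    community = set(community)
    cut = sum(1 for u in community for v in adj[u] if v not in community)
    vol_in = sum(len(adj[u]) for u in community)
    vol_out = sum(len(adj[u]) for u in adj if u not in community)
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0

def f1_score(found, truth):
    """Performance metric: harmonic mean of precision and recall of the
    detected node set against a ground-truth node set."""
    found, truth = set(found), set(truth)
    overlap = len(found & truth)
    if overlap == 0:
        return 0.0
    precision = overlap / len(found)
    recall = overlap / len(truth)
    return 2 * precision * recall / (precision + recall)

found = {0, 1, 2}      # community an algorithm might report (assumed)
truth = {0, 1, 2, 3}   # hypothetical ground-truth community

goodness = conductance(adj, found)
performance = f1_score(found, truth)
print(f"conductance={goodness:.3f}  F1={performance:.3f}")
```

In this toy case the detected community is structurally well separated (a single edge leaves it) yet still misses a ground-truth member, which is the kind of disagreement between the two metric families that the abstract reports.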

Disjoint sets

Clustering coefficient

Ground truth

10.1002/wics.1319

Citations (222)

Memory-efficient Query-driven Community Detection with Application to Complex Disease Associations

Steve Harenberg Ramona G. Seay Stephen Ranshous Kanchana Padmanabhan Jitendra K. Harlalka

Community detection in real-world graphs presents a number of challenges. First, even if the number of detected communities grows linearly with the graph size, it becomes impossible to manually inspect each community for value added to the application knowledge base. Mining for communities with query nodes as knowledge priors could allow for filtering out irrelevant information and for enriching end users' knowledge associated with the problem of interest, such as discovery of genes functionally associated with the Alzheimer's disease (AD) biomarker genes. Second, the data-intensive nature of community enumeration challenges current approaches that often assume that the input graph and the detected communities fit in memory. As computer systems scale, DRAM memory sizes are not expected to increase linearly, while technologies such as SSD memories have the potential to provide much higher capacities at a lower power-cost point, and have a much lower latency than disks. Out-of-core algorithms and/or database-inspired indexing could provide an opportunity for different design optimizations for query-driven community detection algorithms tuned for emerging architectures. Therefore, this work addresses the need for query-driven and memory-efficient community detection. Using maximal cliques as the community definition, due to their high signal-to-noise ratio, we propose and systematically compare two contrasting methods: index-based and out-of-core. Both methods improve peak memory efficiency by as much as 1000X compared to the state-of-the-art. However, the index-based method, which also has a 10-to-100-fold run time reduction, outperforms the out-of-core algorithm in most cases. The achieved scalability enables the discovery of diseases that are known to be, or are likely to be, associated with Alzheimer's when the genome-scale network is mined with AD biomarker genes as knowledge priors.
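
The idea of using maximal cliques as communities and a query node as a knowledge prior can be sketched in a few lines of Python. This is not the index-based or out-of-core method from the paper, whose details the abstract does not give. It is a plain in-memory Bron-Kerbosch enumeration with pivoting, seeded so that only maximal cliques containing the query node are reported. The function name `maximal_cliques_with` and the toy graph are hypothetical.

```python
def maximal_cliques_with(adj, query):
    """Return all maximal cliques of the graph that contain `query`.

    `adj` maps each node to the set of its neighbours. Seeding the search
    with R = {query} and P = neighbours(query) restricts the enumeration
    to the query node's neighbourhood.
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend it, x: excluded nodes
        if not p and not x:
            cliques.append(frozenset(r))  # nothing can extend r: maximal
            return
        # Pivot on the node with the most neighbours among the candidates
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand({query}, set(adj[query]), set())
    return cliques

# Toy undirected graph (hypothetical): two triangles joined by the edge 2-3.
graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
result = maximal_cliques_with(graph, 2)
print(sorted(sorted(c) for c in result))
```

Because every node that could extend a clique containing the query must be a neighbour of the query, the search never touches the rest of the graph. That locality is what makes query-driven mining a natural filter, though this sketch still assumes the whole adjacency structure fits in memory.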

DRAM

10.1137/1.9781611973440.115

Citations (3)

Response-Guided Community Detection: Application to Climate Index Discovery

Lecture notes in computer science (2015)

Gonzalo A. Bello Michael Angus Navya Pedemane Jitendra K. Harlalka Fredrick H. M. Semazzi

Predictability

Representation

Climate science

10.1007/978-3-319-23525-7_45

Citations (14)