logo
    Fast Maximum Likelihood Estimation via Equilibrium Expectation for Large Network Data
    26
    Citation
    87
    Reference
    10
    Related Paper
    Citation Trend
    Abstract:
    Abstract A major line of contemporary research on complex networks is based on the development of statistical models that specify the local motifs associated with macro-structural properties observed in actual networks. This statistical approach becomes increasingly problematic as network size increases. In the context of current research on efficient estimation of models for large network data sets, we propose a fast algorithm for maximum likelihood estimation (MLE) that affords a significant increase in the size of networks amenable to direct empirical analysis. The algorithm we propose in this paper relies on properties of Markov chains at equilibrium, and for this reason it is called equilibrium expectation (EE). We demonstrate the performance of the EE algorithm in the context of exponential random graph models (ERGMs) a family of statistical models commonly used in empirical research based on network data observed at a single period in time. Thus far, the lack of efficient computational strategies has limited the empirical scope of ERGMs to relatively small networks with a few thousand nodes. The approach we propose allows a dramatic increase in the size of networks that may be analyzed using ERGMs. This is illustrated in an analysis of several biological networks and one social network with 104,103 nodes.
    Keywords:
    Network Analysis
    Macro
    Random graphs and networks are of great importance in any fields including mathematics, computer science, statistics, biology and sociology. This research aims to develop statistical theory and methods of statistical inference for random graphs in novel directions. A major strand of the research is the development of conditional goodness-of-fit tests for random graph models and for random block graph models. On the theoretical side, this entails proving a new conditional central limit theorem for a certain graph statistics, which are closely related to the number of two-stars and the number of triangles, and where the conditioning is on the number of edges in the graph. A second strand of the research is to develop composite likelihood methods for estimation of the parameters in exponential random graph models. Composite likelihood methods based on edge data have previously been widely used. A novel contribution of the thesis is the development of composite likelihood methods based on more complicated data structures. The goals of this PhD thesis also include testing the numerical performance of the novel methods in extensive simulation studies and through applications to real graphical data sets.
    Statistical Inference
    Graphical model
    Citations (0)
    Statistical inference for exponential-family models of random graphs with dependent edges is challenging. We stress the importance of additional structure and show that additional structure facilitates statistical inference. A simple example of a random graph with additional structure is a random graph with neighborhoods and local dependence within neighborhoods. We develop the first concentration and consistency results for maximum likelihood and $M$-estimators of a wide range of canonical and curved exponential-family models of random graphs with local dependence. All results are nonasymptotic and applicable to random graphs with finite populations of nodes, although asymptotic consistency results can be obtained as well. In addition, we show that additional structure can facilitate subgraph-to-graph estimation, and present concentration results for subgraph-to-graph estimators. As an application, we consider popular curved exponential-family models of random graphs, with local dependence induced by transitivity and parameter vectors whose dimensions depend on the number of nodes.
    Statistical Inference
    Random geometric graph
    Citations (15)
    Three sets of data in the pajek program were taken as an example,with different data densities and sizes. The Markov random graph model and two exponential random graph models based on Snijders's new specification were used to make comparative analysis on the three sets of data.The results show that the decay problem occurs in the Markov random graph model and that the degradation becomes more serious with increasing size and density of the network.The results also indicate that the exponential random models have higher goodness of fit compared with the Markov random model,therefore,the Markov random model can be replaced by the improved exponential random model.
    Random geometric graph
    Goodness of fit
    Citations (0)
    A flexible approach to modeling network data is based on exponential-family random graph models. We consider here exponential-family random graph models with additional structure in the form of local dependence, which have important conceptual and statistical advantages over models without additional structure. An open problem is how to estimate such models from large random graphs. We pave the ground for massive-scale estimation of such models by exploiting model structure for the purpose of parallel computing. The main idea is that we can first decompose random graphs into subgraphs with local dependence and then perform parallel computing on subgraphs. We hence propose a two-step likelihood-based approach. The first step estimates the local structure underlying random graphs. The second step estimates parameters given the estimated local structure of random graphs. Both steps can be implemented in parallel, which enables massive-scale estimation. We demonstrate the advantages of the two-step likelihood-based approach by simulations and an application to a large Amazon product network.
    Citations (2)
    Patent Citation Analysis has been gaining considerable traction over the past few decades. In this paper, we collect extensive information on patents and citations and provide a perspective of citation network analysis of patents from a statistical viewpoint. We identify and analyze the most cited patents, the most innovative and the highly cited companies along with the structural properties of the network by providing in-depth descriptive analysis. Furthermore, we employ Exponential Random Graph Models (ERGMs) to analyze the citation networks. ERGMs enables understanding the social perspectives of a patent citation network which has not been studied earlier. We demonstrate that social properties such as homophily (the inclination to cite patents from the same country or in the same language) and transitivity (the inclination to cite references’ references) together with the technicalities of the patents ( e.g., language, categories), has a significant effect on citations. We also provide an in-depth analysis of citations for sectors in patents and how it is affected by the size of the same. Overall, our paper delves into European patents with the aim of providing new insights and serves as an account for fitting ERGMs on large networks and analyzing them. ERGMs help us model network mechanisms directly, instead of acting as a proxy for unspecified dependence and relationships among the observations.
    Network Analysis
    Homophily
    Social Network Analysis
    Citation analysis
    Statistical inference for exponential-family models of random graphs with dependent edges is challenging. We stress the importance of additional structure and show that additional structure facilitates statistical inference. A simple example of a random graph with additional structure is a random graph with neighborhoods and local dependence within neighborhoods. We develop the first concentration and consistency results for maximum likelihood and $M$-estimators of a wide range of canonical and curved exponential-family models of random graphs with local dependence. All results are non-asymptotic and applicable to random graphs with finite populations of nodes, although asymptotic consistency results can be obtained as well. In addition, we show that additional structure can facilitate subgraph-to-graph estimation, and present concentration results for subgraph-to-graph estimators. As an application, we consider popular curved exponential-family models of random graphs, with local dependence induced by transitivity and parameter vectors whose dimensions depend on the number of nodes.
    Statistical Inference
    Random geometric graph
    Citations (0)
    We consider the challenging problem of statistical inference for exponential-family random graph models based on a single observation of a random graph with complex dependence. To facilitate statistical inference, we consider random graphs with additional structure in the form of block structure. We have shown elsewhere that when the block structure is known, it facilitates consistency results for $M$-estimators of canonical and curved exponential-family random graph models with complex dependence, such as transitivity. In practice, the block structure is known in some applications (e.g., multilevel networks), but is unknown in others. When the block structure is unknown, the first and foremost question is whether it can be recovered with high probability based on a single observation of a random graph with complex dependence. The main consistency results of the paper show that it is possible to do so under weak dependence and smoothness conditions. These results confirm that exponential-family random graph models with block structure constitute a promising direction of statistical network analysis.
    Stochastic block model
    Random geometric graph
    Statistical Inference
    Citations (14)
    We describe the R package hergm that implements hierarchical exponential-family random graph models with local dependence. Hierarchical exponential-family random graph models with local dependence tend to be superior to conventional exponential-family random graph models with global dependence in terms of goodness-of-fit. The advantage of hierarchical exponential-family random graph models is rooted in the local dependence induced by them. We discuss the notion of local dependence and the construction of models with local dependence along with model estimation, goodness-of-fit, and simulation. Simulation results and three applications are presented.
    Goodness of fit
    Hierarchical database model
    Citations (32)
    The exponential family of random graphs are among the most widely studied network models. We show that any exponential random graph model may alternatively be viewed as a lattice gas model with a finite Banach space norm. The system may then be treated using cluster expansion methods from statistical mechanics. In particular, we derive a convergent power series expansion for the limiting free energy in the case of small parameters. Since the free energy is the generating function for the expectations of other random variables, this characterizes the structure and behavior of the limiting network in this parameter region.
    Cluster expansion
    Limiting
    Lattice (music)
    Statistical Mechanics