    High-Dimensional Density Ratio Estimation with Extensions to Approximate Likelihood Computation
    Abstract:
The ratio between two probability density functions is an important component of various tasks, including selection bias correction, novelty detection and classification. Recently, several estimators of this ratio have been proposed. Most of these methods fail if the sample space is high-dimensional, and hence require a dimension reduction step, the result of which can be a significant loss of information. Here we propose a simple-to-implement, fully nonparametric density ratio estimator that expands the ratio in terms of the eigenfunctions of a kernel-based operator; these functions reflect the underlying geometry of the data (e.g., submanifold structure), often leading to better estimates without an explicit dimension reduction step. We show how our general framework can be extended to address another important problem, the estimation of a likelihood function in situations where that function cannot be well-approximated by an analytical form. One is often faced with this situation when performing statistical inference with data from the sciences, due to the complexity of the data and of the processes that generated those data. We emphasize applications where using existing likelihood-free methods of inference would be challenging due to the high dimensionality of the sample space, but where our spectral series method yields a reasonable estimate of the likelihood function. We provide theoretical guarantees and illustrate the effectiveness of our proposed method with numerical experiments.
    Keywords:
    Intrinsic dimension
    Density estimation
    Empirical likelihood
Kernel (statistics)
    Statistical Inference
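The abstract above expands the density ratio in the eigenfunctions of a kernel-based operator. The sketch below, in Python with NumPy, is one minimal way to realize that idea: eigenvectors of a Gaussian Gram matrix on the denominator sample stand in for the eigenfunctions (extended to new points via the Nyström formula), and the expansion coefficients are empirical means over the numerator sample. The Gaussian kernel, the fixed bandwidth, the number of terms, and all function names are illustrative assumptions, not the paper's exact estimator or its tuning procedure, which would choose bandwidth and series length by data splitting.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth):
    # pairwise squared distances between rows of A and rows of B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def spectral_series_ratio(x_num, x_den, bandwidth=1.0, n_terms=10):
    """Return a callable estimating f_num / f_den via a spectral series.

    Eigenfunctions are approximated by eigenvectors of the Gram matrix on
    the denominator sample (Nystroem extension); expansion coefficients are
    empirical means of those eigenfunctions over the numerator sample.
    """
    n = x_den.shape[0]
    K = gaussian_kernel(x_den, x_den, bandwidth)
    eigvals, eigvecs = np.linalg.eigh(K)           # ascending order
    idx = np.argsort(eigvals)[::-1][:n_terms]      # keep the leading terms
    lam, V = eigvals[idx], eigvecs[:, idx]

    def eigenfunctions(x):
        # Nystroem extension, normalized w.r.t. the denominator sample
        Kx = gaussian_kernel(x, x_den, bandwidth)
        return np.sqrt(n) * Kx @ V / lam

    # coefficients: empirical means of the eigenfunctions under f_num
    coeffs = eigenfunctions(x_num).mean(axis=0)

    def ratio(x):
        return np.clip(eigenfunctions(x) @ coeffs, 0.0, None)
    return ratio

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_f = rng.normal(0.5, 1.0, size=(500, 2))      # numerator sample
    x_g = rng.normal(0.0, 1.0, size=(500, 2))      # denominator sample
    beta = spectral_series_ratio(x_f, x_g, bandwidth=1.0, n_terms=15)
    print(beta(np.zeros((1, 2))))                  # ratio near the origin
```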
In finite mixture models, apart from the underlying mixing measure, the true kernel density function of each subpopulation in the data is, in many scenarios, unknown. Perhaps the most popular approach is to choose kernel functions that we empirically believe our data are generated from and use these kernels to fit our models. Nevertheless, as long as the chosen kernel and the true kernel differ, statistical inference of the mixing measure under this setting will be highly unstable. To overcome this challenge, we propose flexible and efficient robust estimators of the mixing measure in these models, inspired by the idea of the minimum Hellinger distance estimator, model selection criteria, and the superefficiency phenomenon. We demonstrate that our estimators consistently recover the true number of components and achieve the optimal convergence rates of parameter estimation under both well- and mis-specified kernel settings for any fixed bandwidth. These desirable asymptotic properties are illustrated via careful simulation studies with both synthetic and real data.
    Kernel density estimation
Kernel (statistics)
    Statistical Inference
    Hellinger distance
    Citations (2)
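As a rough illustration of the minimum-Hellinger-distance idea in the abstract above, the hypothetical sketch below fits the weights and means of a Gaussian location mixture with a fixed bandwidth by minimizing the Hellinger distance between the fitted mixture and a kernel density estimate of the data on a grid. The number of components is fixed here, whereas the paper also recovers it via model-selection ideas; the grid approximation, the optimizer, and all function names are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gaussian_kde, norm

def fit_mixture_min_hellinger(data, n_components=2, bandwidth=0.5):
    """Fit weights and means of a Gaussian location mixture by minimizing
    the Hellinger distance to a kernel density estimate of the data."""
    grid = np.linspace(data.min() - 3, data.max() + 3, 400)
    dx = grid[1] - grid[0]
    f_n = gaussian_kde(data)(grid)                 # nonparametric target

    def mixture_density(params):
        logits, means = params[:n_components], params[n_components:]
        w = np.exp(logits) / np.exp(logits).sum()  # weights on the simplex
        comps = norm.pdf(grid[:, None], loc=means, scale=bandwidth)
        return comps @ w

    def hellinger_sq(params):
        f_theta = mixture_density(params)
        # H^2(f, g) = 1 - integral sqrt(f g) dx, approximated on the grid
        return 1.0 - np.sum(np.sqrt(f_theta * f_n)) * dx

    x0 = np.concatenate([np.zeros(n_components),
                         np.quantile(data, np.linspace(0.2, 0.8, n_components))])
    res = minimize(hellinger_sq, x0, method="Nelder-Mead")
    logits, means = res.x[:n_components], res.x[n_components:]
    return np.exp(logits) / np.exp(logits).sum(), means

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # the fixed bandwidth deliberately mis-specifies the component scale
    data = np.concatenate([rng.normal(-2, 1, 300), rng.normal(2, 1, 700)])
    weights, means = fit_mixture_min_hellinger(data, n_components=2)
    print(weights, means)
```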
In this article, we discuss two specific classes of models, Gaussian Mixture Copula models and Mixtures of Factor Analyzers, and the advantages of doing inference with gradient descent using automatic differentiation. Gaussian mixture models are a popular class of clustering methods that offers a principled statistical approach to clustering. However, the underlying assumption that every mixing component is normally distributed can often be too rigid for many real-life datasets. To relax the assumption about the normality of mixing components, a new class of parametric mixture models based on copula functions, Gaussian Mixture Copula Models, was introduced. Estimating the parameters of the Gaussian Mixture Copula Model (GMCM) through maximum likelihood has been intractable due to the positive semi-definite constraints on the variance-covariance matrices. Previous attempts were limited to maximizing a proxy-likelihood, which can be maximized using the EM algorithm. These existing methods, even though easier to implement, do not guarantee convergence or a monotonic increase of the GMCM likelihood. In this paper, we use automatic differentiation tools to maximize the exact likelihood of the GMCM, while avoiding any constraint equations or Lagrange multipliers. We show how our method leads to a monotonic increase in likelihood and converges to a (local) optimum value of the likelihood. We also show how automatic differentiation can be used for inference with Mixtures of Factor Analyzers and the advantages of doing so, and we discuss how this method retains the same properties, namely a monotonic increase in likelihood and convergence to a local optimum. Note that our work is also applicable to special cases of these two models, e.g., simple copula models, the factor analyzer model, etc.
    Marginal likelihood
    Citations (0)
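The constraint-avoidance trick behind the GMCM approach described above can be illustrated on a plain Gaussian mixture: weights are parameterized through a softmax and each covariance through an unconstrained Cholesky factor with an exponentiated diagonal, so the exact likelihood can be maximized by a generic gradient-based optimizer with no positive semi-definiteness constraints or Lagrange multipliers. The sketch below uses SciPy's numerical gradients as a stand-in for automatic differentiation and omits the copula layer; all names and settings are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import multivariate_normal

def unpack(params, k, d):
    """Map an unconstrained vector to mixture weights, means and covariances.

    Weights go through a softmax; each covariance is built as L @ L.T from a
    lower-triangular factor whose diagonal is exponentiated, so no explicit
    positive semi-definiteness constraint is ever needed.
    """
    n_chol = d * (d + 1) // 2
    logits = params[:k]
    means = params[k:k + k * d].reshape(k, d)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    covs = []
    tril = np.tril_indices(d)
    for j in range(k):
        start = k + k * d + j * n_chol
        L = np.zeros((d, d))
        L[tril] = params[start:start + n_chol]
        L[np.diag_indices(d)] = np.exp(np.diag(L))   # force a positive diagonal
        covs.append(L @ L.T)
    return weights, means, covs

def negative_log_likelihood(params, X, k):
    weights, means, covs = unpack(params, k, X.shape[1])
    dens = np.column_stack([w * multivariate_normal.pdf(X, mean=m, cov=c)
                            for w, m, c in zip(weights, means, covs)])
    return -np.log(dens.sum(axis=1) + 1e-300).sum()

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    X = np.vstack([rng.normal(-2, 1, (200, 2)), rng.normal(2, 1, (300, 2))])
    k, d = 2, 2
    x0 = np.concatenate([np.zeros(k),                       # equal weights
                         X[rng.choice(len(X), k)].ravel(),  # means at data points
                         np.zeros(k * d * (d + 1) // 2)])   # identity covariances
    res = minimize(negative_log_likelihood, x0, args=(X, k), method="L-BFGS-B")
    print("fitted weights:", unpack(res.x, k, d)[0])
```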
Kernel estimation of a probability density function supported on the unit interval has proved difficult, because of the well-known boundary bias issues a conventional kernel density estimator necessarily faces in this situation. Transforming the variable of interest into a variable whose density has unconstrained support, estimating that density, and obtaining an estimate of the density of the original variable through back-transformation seems a natural way to get rid of the boundary problems. In practice, however, a simple and efficient implementation of this methodology is far from immediate, and the few attempts found in the literature have been reported not to perform well. In this paper, the main reasons for this failure are identified and an easy way to correct them is suggested. It turns out that combining the transformation idea with local likelihood density estimation produces viable density estimators, mostly free from boundary issues. Their asymptotic properties are derived, and a practical cross-validation bandwidth selection rule is devised. Extensive simulations demonstrate the excellent performance of these estimators compared to their main competitors for a wide range of density shapes. In fact, they turn out to be the best choice overall. Finally, they are used to successfully estimate a density of non-standard shape supported on $[0,1]$ from a small real data sample.
    Kernel density estimation
    Density estimation
Kernel (statistics)
    Citations (0)
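The transformation idea in the abstract above can be sketched with a probit map: data on (0,1) are sent to the real line with the standard normal quantile function, a conventional KDE is fitted there, and the estimate is mapped back with the change-of-variables Jacobian. The paper pairs the transformation with local likelihood density estimation and a cross-validated bandwidth; the plain Gaussian KDE, the clipping constant, and the function name below are simplifying assumptions.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

def transformation_kde_unit_interval(data, eps=1e-6):
    """Density estimate on [0,1] via the probit transformation.

    Data are mapped to the real line with the standard normal quantile
    function, a conventional KDE is fitted there, and the estimate is
    mapped back with the change-of-variables Jacobian.
    """
    s = norm.ppf(np.clip(data, eps, 1 - eps))    # transformed sample
    kde = gaussian_kde(s)

    def density(x):
        x = np.clip(np.asarray(x, dtype=float), eps, 1 - eps)
        t = norm.ppf(x)
        return kde(t) / norm.pdf(t)              # back-transformation
    return density

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    data = rng.beta(0.7, 2.0, size=300)          # density unbounded near 0
    f_hat = transformation_kde_unit_interval(data)
    print(f_hat([0.05, 0.5, 0.95]))
```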
We propose a robust inferential procedure for assessing uncertainties of parameter estimation in high-dimensional linear models, where the dimension $p$ can grow exponentially fast with the sample size $n$. Our method combines the de-biasing technique with the composite quantile function to construct an estimator that is asymptotically normal. Hence it can be used to construct valid confidence intervals and conduct hypothesis tests. Our estimator is robust and does not require the existence of first or second moment of the noise distribution. It also preserves efficiency in the sense that the worst case efficiency loss is less than 30% compared to the square-loss-based de-biased Lasso estimator. In many cases our estimator is close to or better than the latter, especially when the noise is heavy-tailed. Our de-biasing procedure does not require solving the $L_1$-penalized composite quantile regression. Instead, it allows for any first-stage estimator with the desired convergence rate and empirical sparsity. The paper also provides new proof techniques for developing theoretical guarantees of inferential procedures with non-smooth loss functions. To establish the main results, we exploit the local curvature of the conditional expectation of the composite quantile loss and apply empirical process theories to control the difference between empirical quantities and their conditional expectations. Our results are established under weaker assumptions compared to existing work on inference for high-dimensional quantile regression. Furthermore, we consider a high-dimensional simultaneous test for the regression parameters by applying the Gaussian approximation and multiplier bootstrap theories. We also study distributed learning and exploit the divide-and-conquer estimator to reduce computational complexity when the sample size is massive. Finally, we provide empirical results to verify the theory.
    Quantile regression
    Quantile
    Regression testing
    Citations (18)
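To make the ingredients of the abstract above concrete, the sketch below defines the composite quantile (check) loss and shows the one-step de-biasing correction for the square-loss de-biased Lasso, which is the comparator named in the abstract rather than the paper's own estimator. A pseudo-inverse stands in for a proper node-wise precision-matrix estimate, and sklearn's Lasso for the first-stage estimator; everything here is an illustrative assumption, not the paper's procedure.

```python
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import Lasso

def composite_quantile_loss(residuals, intercepts, taus):
    """Composite quantile loss: sum of check losses over several levels."""
    loss = 0.0
    for b, tau in zip(intercepts, taus):
        u = residuals - b
        loss += np.sum(u * (tau - (u < 0)))      # check loss rho_tau(u)
    return loss

def debiased_lasso_ci(X, y, j, alpha=0.1, level=0.95):
    """De-biased Lasso point estimate and confidence interval for beta_j.

    Square-loss comparator only: the paper replaces the square loss by the
    composite quantile loss above and uses a node-wise precision estimate
    instead of the pseudo-inverse used here for simplicity.
    """
    n, p = X.shape
    beta = Lasso(alpha=alpha).fit(X, y).coef_    # first-stage estimator
    resid = y - X @ beta
    sigma_hat = np.cov(X, rowvar=False)
    theta = np.linalg.pinv(sigma_hat)            # crude precision estimate
    beta_d = beta + theta @ X.T @ resid / n      # one-step de-biasing
    var_j = (resid @ resid / n) * (theta @ sigma_hat @ theta.T)[j, j] / n
    z = norm.ppf(0.5 + level / 2)
    half = z * np.sqrt(var_j)
    return beta_d[j], (beta_d[j] - half, beta_d[j] + half)

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    n, p = 200, 50
    X = rng.normal(size=(n, p))
    beta_true = np.zeros(p)
    beta_true[:3] = [1.0, -2.0, 0.5]
    y = X @ beta_true + rng.standard_t(df=3, size=n)   # heavy-tailed noise
    est, ci = debiased_lasso_ci(X, y, j=0)
    print(f"beta_0 estimate {est:.3f}, 95% CI {ci}")
    taus = np.array([0.25, 0.5, 0.75])
    resid = y - X @ Lasso(alpha=0.1).fit(X, y).coef_
    print("composite quantile loss:",
          composite_quantile_loss(resid, np.quantile(resid, taus), taus))
```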
Several interesting generative learning algorithms involve a complex probability distribution over many random variables, involving intractable normalization constants or latent variable normalization. Some of them may not even have an analytic expression for the unnormalized probability function and no tractable approximation. This makes it difficult to estimate the quality of these models once they have been trained, or to monitor their quality (e.g. for early stopping) while training. A previously proposed method is based on constructing a non-parametric density estimator of the model's probability function from samples generated by the model. We revisit this idea, propose a more efficient estimator, and prove that it provides a lower bound on the true test log-likelihood, and an unbiased estimator as the number of generated samples goes to infinity, although one that incorporates the effect of poor mixing. We further propose a biased variant of the estimator that can be used reliably with a finite number of samples for the purpose of model comparison.
Lower bound
    Generative model
    Citations (16)
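The basic idea being revisited above, building a non-parametric density estimator from model samples and scoring held-out data under it, can be sketched with a Gaussian Parzen-window estimator. This is only the simple variant, not the paper's more efficient estimator or its biased finite-sample version; the sampler and function names are placeholders.

```python
import numpy as np
from scipy.stats import gaussian_kde

def sample_based_log_likelihood(model_sampler, test_data, n_samples=5000):
    """Score held-out data under a density estimate built from model samples.

    A Gaussian kernel (Parzen) density estimator is fitted to samples drawn
    from the generative model, and the held-out log-likelihood under that
    estimator serves as a proxy for the model's intractable log-likelihood.
    """
    samples = model_sampler(n_samples)             # (n_samples, d)
    kde = gaussian_kde(samples.T)                  # gaussian_kde expects (d, n)
    return kde.logpdf(test_data.T).mean()

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    # stand-in "generative model": a 2-D Gaussian we can sample from but
    # pretend we cannot evaluate analytically
    model_sampler = lambda n: rng.normal(0.0, 1.0, size=(n, 2))
    test_data = rng.normal(0.0, 1.0, size=(1000, 2))
    print(sample_based_log_likelihood(model_sampler, test_data))
```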
Although there are several model determination methods with desirable properties, most of them require that sufficient information on the statistical nature of the model be available or that the error term be relatively simple, such as i.i.d. normal. Certainly, this is an important limitation which rules out many cases of practical interest. In this paper we study model determination when only limited information on the model is available or when the statistical nature of the error term is not simple. Our procedure is based on a likelihood function, which is usually not available in the situations of interest here. Thus, we first discuss how to obtain a likelihood function in this setting. We derive a semi-parametric likelihood in the framework of the refined generalized method of moments (RGMM) studied in Kim (2001). The derived likelihood is a correct specification for the true likelihood to second order. Based on the derived likelihood, we study the model selection problem conditional on data. It is shown that the derived criterion is consistent. The criterion is also relatively simple to apply. A simple Monte Carlo study shows that its performance is reasonably good in nonstandard situations.
    Information Criteria
    Parametric model
    Likelihood principle
    Marginal likelihood
    Citations (0)
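In the same spirit as the moment-based, likelihood-free model determination described above (though not the RGMM construction itself, whose exact form is not reproduced here), the sketch below scores candidate regressor subsets with a GMM objective, using all covariates as instruments, plus a BIC-type log(n) penalty: a model that omits a relevant regressor pays through violated moment conditions, an unnecessarily large one through the penalty. The specific criterion, instruments, and names are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

def gmm_criterion(y, X, subset, ridge=1e-8):
    """GMM objective for a regressor subset plus a BIC-type penalty.

    All columns of X serve as instruments, so omitting a relevant regressor
    shows up as a violated overidentifying restriction; the log(n) penalty
    discourages unnecessarily large models.
    """
    n = len(y)
    Z, Xs = X, X[:, list(subset)]
    W = np.linalg.inv(Z.T @ Z / n + ridge * np.eye(Z.shape[1]))
    A = Z.T @ Xs / n
    b = Z.T @ y / n
    beta = np.linalg.solve(A.T @ W @ A, A.T @ W @ b)   # GMM estimate
    gbar = Z.T @ (y - Xs @ beta) / n                   # sample moments
    J = n * gbar @ W @ gbar
    return J + len(subset) * np.log(n)

if __name__ == "__main__":
    rng = np.random.default_rng(6)
    n, p = 400, 4
    X = rng.normal(size=(n, p))
    # skewed, non-normal errors: the setting where a full likelihood is unknown
    y = 1.5 * X[:, 0] - 2.0 * X[:, 1] + (rng.exponential(1.0, n) - 1.0)
    candidates = [s for r in range(1, p + 1) for s in combinations(range(p), r)]
    best = min(candidates, key=lambda s: gmm_criterion(y, X, s))
    print("selected regressors:", best)    # expect (0, 1)
```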
The estimation of a density profile from experimental data points is a challenging problem, usually tackled by plotting a histogram. Prior assumptions on the nature of the density, from its smoothness to the specification of its form, allow the design of more accurate estimation procedures, such as maximum likelihood. Our aim is to construct a procedure that makes no explicit assumptions, but still provides an accurate estimate of the density. We introduce the self-consistent estimate: given the power spectrum of a candidate density, an estimation procedure is constructed on the assumption, to be released a posteriori, that the candidate is correct. The self-consistent estimate is defined as a prior candidate density that precisely reproduces itself. Our main result is to derive the exact expression of the self-consistent estimate for any given data set and to study its properties. Applications of the method require neither priors on the form of the density nor the subjective choice of parameters. A cut-off frequency, akin to a bin size or a kernel bandwidth, emerges naturally from the derivation. We apply the self-consistent estimate to artificial data generated from various distributions and show that it reaches the theoretical limit for the scaling of the square error with the size of the data set.
    Kernel density estimation
    Density estimation
    Smoothness
Kernel (statistics)
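The cut-off frequency mentioned above suggests a frequency-domain reading of the method. The sketch below implements only the simplest version of that idea: the empirical characteristic function is evaluated on a grid, set to zero beyond a hard cut-off, and inverted numerically. The actual self-consistent estimate replaces the hard cut-off with a filter derived from the fixed-point condition; the grids, the cut-off value, and the function name are arbitrary assumptions for illustration.

```python
import numpy as np

def ecf_cutoff_density(data, t_max, n_freq=256, x_grid=None):
    """Density estimate from a truncated empirical characteristic function.

    The empirical characteristic function is evaluated on a frequency grid
    restricted to |t| <= t_max and inverted by numerical integration; this
    hard cut-off is the crude analogue of the self-consistent filter.
    """
    if x_grid is None:
        x_grid = np.linspace(data.min() - 1, data.max() + 1, 400)
    t = np.linspace(-t_max, t_max, n_freq)
    ecf = np.exp(1j * t[:, None] * data[None, :]).mean(axis=1)   # phi_hat(t)
    dt = t[1] - t[0]
    # inverse Fourier transform: f(x) = (1/2pi) * integral e^{-itx} phi(t) dt
    f = (np.exp(-1j * np.outer(x_grid, t)) @ ecf).real * dt / (2 * np.pi)
    return x_grid, np.clip(f, 0.0, None)

if __name__ == "__main__":
    rng = np.random.default_rng(7)
    data = np.concatenate([rng.normal(-1, 0.5, 400), rng.normal(1.5, 0.7, 600)])
    x, f = ecf_cutoff_density(data, t_max=6.0)
    print(f"estimated density integrates to about {f.sum() * (x[1] - x[0]):.3f}")
```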
    We consider, in the modern setting of high-dimensional statistics, the classic problem of optimizing the objective function in regression using M-estimates when the error distribution is assumed to be known. We propose an algorithm to compute this optimal objective function that takes into account the dimensionality of the problem. Although optimality is achieved under assumptions on the design matrix that will not always be satisfied, our analysis reveals generally interesting families of dimension-dependent objective functions.
    High dimensional
Matrix (mathematics)
    Intrinsic dimension
    Design matrix
    Citations (87)
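For context on the objective being optimized above, the sketch below shows a generic regression M-estimate with a Huber loss on heavy-tailed data. The paper's contribution is to compute an optimal, dimension-dependent objective when the error distribution is known; the Huber loss, the optimizer, and all names here are placeholders, not the dimension-dependent loss the analysis derives.

```python
import numpy as np
from scipy.optimize import minimize

def huber_loss(u, delta=1.345):
    """Huber rho function: quadratic near zero, linear in the tails."""
    return np.where(np.abs(u) <= delta,
                    0.5 * u ** 2,
                    delta * (np.abs(u) - 0.5 * delta))

def m_estimate(X, y, rho=huber_loss):
    """Regression M-estimate: minimize sum_i rho(y_i - x_i' beta)."""
    p = X.shape[1]
    objective = lambda beta: rho(y - X @ beta).sum()
    return minimize(objective, np.zeros(p), method="BFGS").x

if __name__ == "__main__":
    rng = np.random.default_rng(8)
    n, p = 500, 20               # moderate p/n ratio, the regime of interest
    X = rng.normal(size=(n, p))
    beta_true = rng.normal(size=p)
    y = X @ beta_true + rng.standard_t(df=2, size=n)   # heavy-tailed errors
    beta_hat = m_estimate(X, y)
    print("estimation error:", np.linalg.norm(beta_hat - beta_true))
```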