    Nonparametric Estimation of Repeated Densities with Heterogeneous Sample Sizes
    10
    Related Paper
    Abstract:
    We consider the estimation of densities in multiple subpopulations, where the available sample size varies greatly across subpopulations. This problem occurs in epidemiology, for example, where different diseases may share similar pathogenic mechanisms but differ in their prevalence. Without specifying a parametric form, our proposed method pools information from the population and estimates the density in each subpopulation in a data-driven fashion. Drawing from functional data analysis, low-dimensional approximating density families in the form of exponential families are constructed from the principal modes of variation in the log-densities. Subpopulation densities are subsequently fitted in the approximating families based on likelihood principles and shrinkage. The approximating families become more flexible as the number of components increases and can approximate arbitrary infinite-dimensional densities. We also derive convergence results for the density estimates with discrete observations. The proposed methods are shown to be interpretable and efficient in simulations as well as in applications to electronic medical records and rainfall data.
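One way to write down the approximating family that the abstract describes is sketched below. The notation is an assumption made here for illustration and is not taken from the paper: $\mu$ stands for the mean log-density across subpopulations, $\varphi_1, \dots, \varphi_K$ for the leading principal modes of variation of the log-densities, and $\theta_i = (\theta_{i1}, \dots, \theta_{iK})$ for the coordinates of subpopulation $i$.

```latex
f_i(x) \approx \exp\Big\{ \mu(x) + \sum_{k=1}^{K} \theta_{ik}\,\varphi_k(x) - B(\theta_i) \Big\},
\qquad
B(\theta_i) = \log \int \exp\Big\{ \mu(x) + \sum_{k=1}^{K} \theta_{ik}\,\varphi_k(x) \Big\}\,dx .
```

Under this reading, $B(\theta_i)$ is the normalizing term that makes each density integrate to one, the family is an exponential family with sufficient statistics $\varphi_k(x)$, and increasing $K$ enlarges the family. The coordinates $\theta_i$ would then be estimated from the observations of subpopulation $i$ by likelihood with shrinkage, as the abstract states.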
    Keywords:
    Density estimation
    Sample (material)
    Abstract: The paper deals with the approximation of nonlinear estimation within the Bayesian framework. The underlying idea is to project the true posterior density orthogonally onto a prespecified approximation family. The difficulty is that the true posterior density is typically not available when estimation is implemented recursively. It is shown that there exists a Bayes-closed description of the posterior density which is recursively computable without complete knowledge of the true posterior. We study the mutual relationship between the equivalence class, composed of densities matching the current description, and a prespecified parametric family. It is proved that if the approximation family is of the mixture type, the equivalence classes can be made orthogonal to this family. The approximating density given by the orthogonal projection of the true posterior density then minimizes the Kullback-Leibler distance between the two densities. By contrast, if the approximation family is of the exponential type, the analogous result holds at most locally. To give a sensible definition of the orthogonal projection, we introduce a Riemannian geometry on the family of probability distributions. Since differential-geometric concepts and tools are not common knowledge among control engineers, we include the necessary preliminary information.
    Density estimation
    Citations (89)
    A common source of difficulty in data analysis is deciding whether the collected data should be analyzed with parametric or nonparametric tests. This choice should not be underrated, because each type of test rests on assumptions that must be satisfied before it is applied. This article describes in detail the difference between parametric and nonparametric tests, when to apply each, and the advantages of one over the other.
    Parametric model
    Citations (16)
    Classification of Vegetable Oils by Principal Component Analysis of FTIR Spectra. David A. Rusak, Leah M. Brown, and Scott D. Martin. Department of Chemistry, University of Scranton, Scranton, PA 18510. J. Chem. Educ. 2003, 80, 5, 541. Published online May 1, 2003. https://doi.org/10.1021/ed080p541. Subjects: Infrared light, Lipids, Mathematical methods, Plant derived food, Principal component analysis.
    Chemometrics
    Plot (graphics)
    Citations (53)
    Kernel density estimation
    Density estimation
    Kernel (algebra)
    Regression function
    We wish to estimate the probability density $g(y)$ that produced an observed random sample of vectors $y_1, y_2, \dots, y_n$. Estimates of $g(y)$ are traditionally constructed in two quite different ways: by maximum likelihood fitting within some parametric family such as the normal or by nonparametric methods such as kernel density estimation. These two methods can be combined by putting an exponential family "through" a kernel estimator. These are the specially designed exponential families mentioned in the title. Poisson regression methods play a major role in calculations concerning such families.
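The idea of putting an exponential family "through" a kernel estimator and fitting it by Poisson regression can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the data are binned, a Gaussian kernel estimate `g0` serves as the carrier, the sufficient statistic is taken to be simply $t(y) = y$, and the bin counts are modeled as Poisson with log-mean `offset + b0 + b1 * y`. The function names, bandwidth, and bin count are hypothetical choices.

```python
import math
import random


def gaussian_kde(sample, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def g0(y):
        return norm * sum(math.exp(-0.5 * ((y - x) / bandwidth) ** 2) for x in sample)

    return g0


def fit_tilted_density(sample, lo, hi, n_bins=40, bandwidth=0.5, n_iter=25):
    """Fit g(y) = g0(y) * exp(b0 + b1 * y) to binned data by Poisson regression.

    Assumes lo <= min(sample) and hi >= max(sample).
    Returns bin centers, bin counts, fitted Poisson means, and (b0, b1).
    """
    n = len(sample)
    width = (hi - lo) / n_bins
    centers = [lo + (k + 0.5) * width for k in range(n_bins)]

    # Histogram counts; a point equal to hi goes into the last bin.
    counts = [0] * n_bins
    for x in sample:
        counts[min(int((x - lo) / width), n_bins - 1)] += 1

    # Offset: log of the expected count per bin under the carrier density.
    g0 = gaussian_kde(sample, bandwidth)
    offset = [math.log(n * width * g0(c)) for c in centers]

    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        mu = [math.exp(o + b0 + b1 * c) for o, c in zip(offset, centers)]
        # Score vector of the Poisson log-likelihood.
        s0 = sum(s - m for s, m in zip(counts, mu))
        s1 = sum((s - m) * c for s, m, c in zip(counts, mu, centers))
        # Fisher information matrix (2 x 2, symmetric).
        i00 = sum(mu)
        i01 = sum(m * c for m, c in zip(mu, centers))
        i11 = sum(m * c * c for m, c in zip(mu, centers))
        det = i00 * i11 - i01 * i01
        # Newton-Raphson step: solve information * delta = score.
        b0 += (i11 * s0 - i01 * s1) / det
        b1 += (i00 * s1 - i01 * s0) / det

    mu = [math.exp(o + b0 + b1 * c) for o, c in zip(offset, centers)]
    return centers, counts, mu, (b0, b1)


# Usage on simulated data (seeded so the run is reproducible).
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
centers, counts, mu, beta = fit_tilted_density(
    sample, min(sample) - 1.0, max(sample) + 1.0
)
print("fitted (b0, b1):", beta)
```

At convergence the Poisson score equations force the fitted means to reproduce the total count and the binned first moment of the data, which is the moment-matching property that makes this kind of exponential tilting of a kernel estimate attractive. A richer sufficient statistic, such as $(y, y^2)$, would follow the same pattern with a larger linear system in each Newton step.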
    Kernel density estimation
    Density estimation
    Kernel (algebra)
    Kernel regression
    Citations (178)