logo
    PAC-learnability of probabilistic deterministic finite state automata in terms of variation distance
    0
    Citation
    0
    Reference
    20
    Related Paper
    Keywords:
    Variation (astronomy)
    Learnability
    Divergence (linguistics)
    Total variation
    We study graphical modeling in the case of string-valued random variables. Whereas a weighted finite-state transducer can model the probabilistic relationship between two strings, we are interested in building up joint models of three or more strings. This is needed for inflectional paradigms in morphology, cognate modeling or language reconstruction, and multiple-string alignment. We propose a Markov Random Field in which each factor (potential function) is a weighted finite-state machine, typically a transducer that evaluates the relationship between just two of the strings. The full joint distribution is then a product of these factors. Though decoding is actually undecidable in general, we can still do efficient joint inference using approximate belief propagation; the necessary computations and messages are all finite-state. We demonstrate the methods by jointly predicting morphological forms.
    Graphical model
    Factor graph
    Undecidable problem
    Belief Propagation
    Markov random field
    Probabilistic automaton
    Citations (46)
    Model based probabilistic classification is heavily used in data mining and machine learning. For computational learning these models may need approximation steps however. One popular approximation in classification is to model the class conditional densities by factorization, which in the independence case is usually called the ’Naive Bayes’ classifier. In general probabilistic independence cannot model all distributions exactly, and not much has been published on how much a discrete distribution can differ from the independence assumption. In this dissertation the approximation quality of factorizations is analyzed in two articles.A specific class of factorizations is the factorizations represented by graphical models. Several challenges arise from the use of statistical methods for learning graphical models from data. Examples of problems include the increase in the number of graphical model structures as a function of the number of nodes, and the equivalence of statistical models determined by different graphical models. In one article an algorithm for learning graphical models is presented. In the final article an algorithm for clustering parts of DNA strings is developed, and a graphical representation for the remaining DNA part is learned.
    Graphical model
    Conditional independence
    Probabilistic classification
    Independence
    Citations (1)
    We leverage proof techniques from discrete Fourier analysis and an existing result in coding theory to derive new bounds for the problem of non-interactive simulation of binary random variables. Previous bounds in the literature were derived by applying data processing inequalities concerning maximal correlation or hypercontractivity. We show that our bounds are sharp in some regimes. Indeed, for a specific instance of the problem parameters, our main result resolves an open problem posed by E. Mossel in 2017. As by-products of our analyses, various new properties of the average distance and distance enumerator of binary block codes are established.
    Leverage (statistics)
    Binary data
    Information Theory
    Coding theory
    Citations (10)
    This paper addresses the problem of pattern classification in the symbolic dynamic domain, where the patterns of interest are represented by probabilistic finite state automata (PFSA) with possibly dissimilar algebraic structures. A combination of Dirichlet and multinomial distributions is used to model the uncertainties due to the finite length approximation of symbol strings. The classifier algorithm follows the structure of a Bayes model and has been validated on a simulation test bed.
    Probabilistic automaton
    Multinomial distribution
    Domain Adaptation
    Citations (1)