Improving Optimization in Models With Continuous Symmetry Breaking

2018 
Many loss functions in representation learning are invariant under a continuous symmetry transformation. As an example, consider word embeddings (Mikolov et al., 2013), where the loss remains unchanged if we simultaneously rotate all word and context embedding vectors. We show that representation learning models with a continuous symmetry and a quadratic Markovian time series prior possess so-called Goldstone modes. These are low-cost deviations from the optimum that slow down the convergence of gradient descent. We use tools from gauge theory in physics to design an optimization algorithm that solves the slow convergence problem. Our algorithm leads to a fast decay of Goldstone modes, to orders of magnitude faster convergence, and to more interpretable representations, as we show for dynamic extensions of matrix factorization and word embedding models. We present an example application, translating modern words into historic language using a shared representation space.
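For intuition on the continuous symmetry mentioned in the abstract, the following sketch checks numerically that a toy matrix-factorization loss is unchanged when word and context embeddings are rotated by the same orthogonal matrix. This is an illustrative example, not code from the paper; the loss, dimensions, and variable names are assumptions made here for demonstration.

```python
import numpy as np

# Toy setup (illustrative, not from the paper): factorize a co-occurrence
# matrix X ≈ U @ V.T, where rows of U play the role of "word" embeddings
# and rows of V play the role of "context" embeddings.
rng = np.random.default_rng(0)
n_words, n_contexts, dim = 50, 40, 8
X = rng.standard_normal((n_words, n_contexts))
U = rng.standard_normal((n_words, dim))
V = rng.standard_normal((n_contexts, dim))

def loss(U, V):
    """Squared reconstruction error of the factorization."""
    return np.sum((X - U @ V.T) ** 2)

# Draw a random orthogonal matrix R. Rotating both embedding sets by R
# leaves U @ V.T, and hence the loss, unchanged, because
# (U R)(V R)^T = U R R^T V^T = U V^T. This is the continuous symmetry
# whose flat directions give rise to Goldstone modes.
R, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
print(np.isclose(loss(U, V), loss(U @ R, V @ R)))  # True
```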