Michael I. Jordan

University of California, Berkeley

Author Statistics

Papers

Citation

H-Index

i10-index

Co-Authors

Martin J. Wainwright

University of California, Berkeley

118

Tianyi Lin

Columbia University

Peter L. Bartlett

University of California, Berkeley

Tamara Broderick

Massachusetts Institute of Technology

Chi Jin

Princeton University

Nhat Ho

The University of Texas at Austin

Benjamin Recht

University of California, Berkeley

Aaditya Ramdas

Carnegie Mellon University

Anastasios N. Angelopoulos

University of California, Berkeley

Ion Stoica

University of California, Berkeley

Cooperative Institutions

University of California, Berkeley

371

University of Otago

204

University of Asia Pacific

203

British Council

203

University of Papua New Guinea

202

Berkeley College

164

Stanford University

121

Google (United States)

110

Massachusetts Institute of Technology

Carnegie Mellon University

Research Field

Fast Algorithms for Computational Optimal Transport and Wasserstein Barycenter

arXiv (Cornell University) (2019)

Wenshuo Guo, Nhat Ho, Michael I. Jordan

We provide theoretical complexity analysis for new algorithms to compute the optimal transport (OT) distance between two discrete probability distributions, and demonstrate their favorable practical performance over state-of-the-art primal-dual algorithms and their capability in solving other large-scale problems, such as the Wasserstein barycenter problem for multiple probability distributions. First, we introduce the accelerated primal-dual randomized coordinate descent (APDRCD) algorithm for computing the OT distance. We provide its complexity upper bound $\widetilde{\mathcal{O}}(\frac{n^{5/2}}{\varepsilon})$, where $n$ stands for the number of atoms of these probability measures and $\varepsilon > 0$ is the desired accuracy. This complexity bound matches the best known complexities of primal-dual algorithms for the OT problems, including the adaptive primal-dual accelerated gradient descent (APDAGD) and the adaptive primal-dual accelerated mirror descent (APDAMD) algorithms. Then, we demonstrate the better performance of the APDRCD algorithm over the APDAGD and APDAMD algorithms through extensive experimental studies, and further improve its practical performance by proposing a greedy version of it, which we refer to as accelerated primal-dual greedy coordinate descent (APDGCD). Finally, we generalize the APDRCD and APDGCD algorithms to distributed algorithms for computing the Wasserstein barycenter for multiple probability distributions.

Coordinate Descent

Descent (aeronautics)

10.48550/arxiv.1905.09952

Citations (3)

Regression on manifolds using kernel dimension reduction

Jens Nilsson, Fei Sha, Michael I. Jordan

We study the problem of discovering a manifold that best preserves information relevant to a nonlinear regression. Solving this problem involves extending and uniting two threads of research. On the one hand, the literature on sufficient dimension reduction has focused on methods for finding the best linear subspace for nonlinear regression; we extend this to manifolds. On the other hand, the literature on manifold learning has focused on unsupervised dimensionality reduction; we extend this to the supervised setting. Our approach to solving the problem involves combining the machinery of kernel dimension reduction with Laplacian eigenmaps. Specifically, we optimize cross-covariance operators in kernel feature spaces that are induced by the normalized graph Laplacian. The result is a highly flexible method in which no strong assumptions are made on the regression function or on the distribution of the covariates. We illustrate our methodology on the analysis of global temperature data and image manifolds.

Kernel (algebra)

Intrinsic dimension

Manifold (fluid mechanics)

Manifold alignment

Sufficient dimension reduction

10.1145/1273496.1273584

Citations (104)

Accelerating Inexact HyperGradient Descent for Bilevel Optimization

arXiv (Cornell University) (2023)

Haikuo Yang, Luo Luo, Chris Junchi Li, Michael I. Jordan

We present a method for solving general nonconvex-strongly-convex bilevel optimization problems. Our method, the Restarted Accelerated HyperGradient Descent (RAHGD) method, finds an $\epsilon$-first-order stationary point of the objective with $\tilde{\mathcal{O}}(\kappa^{3.25}\epsilon^{-1.75})$ oracle complexity, where $\kappa$ is the condition number of the lower-level objective and $\epsilon$ is the desired accuracy. We also propose a perturbed variant of RAHGD for finding an $\big(\epsilon,\mathcal{O}(\kappa^{2.5}\sqrt{\epsilon}\,)\big)$-second-order stationary point within the same order of oracle complexity. Our results achieve the best-known theoretical guarantees for finding stationary points in bilevel optimization and also improve upon the existing upper complexity bound for finding second-order stationary points in nonconvex-strongly-concave minimax optimization problems, setting a new state-of-the-art benchmark. Empirical studies are conducted to validate the theoretical results in this paper.

Stationary point

Benchmark (surveying)

Bilevel optimization

Descent (aeronautics)

10.48550/arxiv.2307.00126

Citations (0)

An Instance-Dependent Analysis for the Cooperative Multi-Player Multi-Armed Bandit

arXiv (Cornell University) (2021)

Aldo Pacchiano, Peter L. Bartlett, Michael I. Jordan

We study the problem of information sharing and cooperation in Multi-Player Multi-Armed bandits. We propose the first algorithm that achieves logarithmic regret for this problem when the collision reward is unknown. Our results are based on two innovations. First, we show that a simple modification to a successive elimination strategy can be used to allow the players to estimate their suboptimality gaps, up to constant factors, in the absence of collisions. Second, we leverage the first result to design a communication protocol that successfully uses the small reward of collisions to coordinate among players, while preserving meaningful instance-dependent logarithmic regret guarantees.

Leverage (statistics)

Constant (computer programming)

Multi-armed bandit

10.48550/arxiv.2111.04873

Citations (0)
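
The successive elimination strategy that the abstract above builds on can be sketched as follows. This is a minimal single-player textbook version, not the paper's modified strategy or its multi-player communication protocol. The function name `successive_elimination`, the `pull` callback, the confidence radius, and the `delta` default are illustrative assumptions, and rewards are assumed to lie in [0, 1].

```python
import math
import random


def successive_elimination(pull, n_arms, horizon, delta=0.05):
    """Return the arms still active once at least `horizon` pulls were made.

    `pull(arm)` is a hypothetical callback returning a reward in [0, 1].
    """
    active = list(range(n_arms))
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    t = 0
    while t < horizon and len(active) > 1:
        # Pull every active arm once per round.
        for arm in active:
            sums[arm] += pull(arm)
            counts[arm] += 1
            t += 1
        # Every active arm has been pulled the same number of times.
        n = counts[active[0]]
        # Hoeffding-style confidence radius (an assumed, standard choice).
        radius = math.sqrt(math.log(4.0 * n_arms * n * n / delta) / (2.0 * n))
        means = {a: sums[a] / counts[a] for a in active}
        best = max(means.values())
        # Keep an arm only if its upper bound reaches the best lower bound.
        active = [a for a in active if means[a] + radius >= best - radius]
    return active


# Usage with simulated Bernoulli arms (the means are made up for illustration).
rng = random.Random(0)
true_means = [0.2, 0.5, 0.9]
survivors = successive_elimination(
    lambda a: 1.0 if rng.random() < true_means[a] else 0.0,
    n_arms=3,
    horizon=20000,
)
print(survivors)
```

The number of rounds an arm survives grows as its gap to the best arm shrinks, which is why elimination times can serve as rough gap estimates.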

No-Regret Learning in Partially-Informed Auctions

arXiv (Cornell University) (2022)

Wenshuo Guo, Michael I. Jordan, Ellen Vitercik

Auctions with partially-revealed information about items are broadly employed in real-world applications, but the underlying mechanisms have limited theoretical support. In this work, we study a machine learning formulation of these types of mechanisms, presenting algorithms that are no-regret from the buyer's perspective. Specifically, a buyer who wishes to maximize his utility interacts repeatedly with a platform over a series of $T$ rounds. In each round, a new item is drawn from an unknown distribution and the platform publishes a price together with incomplete, "masked" information about the item. The buyer then decides whether to purchase the item. We formalize this problem as an online learning task where the goal is to have low regret with respect to a myopic oracle that has perfect knowledge of the distribution over items and the seller's masking function. When the distribution over items is known to the buyer and the mask is a SimHash function mapping $\mathbb{R}^d$ to $\{0,1\}^{\ell}$, our algorithm has regret $\tilde O((Td\ell)^{1/2})$. In a fully agnostic setting when the mask is an arbitrary function mapping to a set of size $n$ and the prices are stochastic, our algorithm has regret $\tilde O((Tn)^{1/2})$.

10.48550/arxiv.2202.10606

Citations (0)
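
The SimHash mask mentioned in the abstract above, a map from $\mathbb{R}^d$ to $\{0,1\}^{\ell}$, is commonly built from the signs of random projections. The sketch below illustrates that general construction only; it is not the paper's learning algorithm, and the names `make_simhash`, `d`, `ell`, and `seed` are illustrative assumptions.

```python
import random


def make_simhash(d, ell, seed=0):
    """Return a function mapping a length-d vector to a tuple of ell bits."""
    rng = random.Random(seed)
    # One random Gaussian hyperplane normal per output bit.
    planes = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(ell)]

    def simhash(x):
        if len(x) != d:
            raise ValueError("expected a vector of length %d" % d)
        # Bit j records which side of hyperplane j the vector falls on.
        return tuple(
            1 if sum(w * xi for w, xi in zip(plane, x)) >= 0 else 0
            for plane in planes
        )

    return simhash


# Usage: the item vector is made up for illustration.
mask = make_simhash(d=4, ell=8, seed=1)
item = [0.5, -1.2, 3.0, 0.1]
bits = mask(item)
scaled = mask([2 * v for v in item])  # positive rescaling keeps every sign
print(bits, bits == scaled)
```

Because each bit depends only on the sign of a projection, the mask reveals the direction of the item vector but not its magnitude.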

A Minimal Intervention Principle for Coordinated Movement

Neural Information Processing Systems (2002)

Emanuel Todorov, Michael I. Jordan

Behavioral goals are achieved reliably and repeatedly with movements rarely reproducible in their detail. Here we offer an explanation: we show that not only are variability and goal achievement compatible, but indeed that allowing variability in redundant dimensions is the optimal control strategy in the face of uncertainty. The optimal feedback control laws for typical motor tasks obey a minimal intervention principle: deviations from the average trajectory are only corrected when they interfere with the task goals. The resulting behavior exhibits task-constrained variability, as well as synergetic coupling among actuators—which is another unexplained empirical phenomenon.

Phenomenon

Motor Control

Citations (94)

Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems

arXiv (Cornell University) (2021)

Chris Junchi Li, Michael I. Jordan

Motivated by the problem of online canonical correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSGD) algorithm for minimizing the expectation of a stochastic function over a generic Riemannian manifold. SSGD generalizes the idea of projected stochastic gradient descent and allows the use of scaled stochastic gradients instead of stochastic gradients. In the special case of a spherical constraint, which arises in generalized eigenvector problems, we establish a nonasymptotic finite-sample bound of $\sqrt{1/T}$, and show that this rate is minimax optimal, up to a polylogarithmic factor of relevant parameters. On the asymptotic side, a novel trajectory-averaging argument allows us to achieve local asymptotic normality with a rate that matches that of Ruppert-Polyak-Juditsky averaging. We bring these ideas together in an application to online canonical correlation analysis, deriving, for the first time in the literature, an optimal one-time-scale algorithm with an explicit rate of local asymptotic convergence to normality. Numerical studies of canonical correlation analysis are also provided for synthetic data.

Stochastic Gradient Descent

10.48550/arxiv.2112.14738

Citations (0)

Large-Scale System Problems Detection by Mining Console Logs

Wei Xu, Ling Huang, Armando Fox, David A. Patterson, Michael I. Jordan

Surprisingly, console logs rarely help operators detect problems in large-scale datacenter services, for they often consist of the voluminous intermixing of messages from many software components written by independent developers. We propose a general methodology to mine this rich source of information to automatically detect system runtime problems. We first parse console logs by combining source code analysis with information retrieval to create composite features. We then analyze these features using machine learning to detect operational problems. We show that our method enables analyses that are impossible with previous methods because of its superior ability to create sophisticated features. We also show how to distill the results of our analysis to an operator-friendly one-page decision tree showing the critical messages associated with the detected problems. We validate our approach using the Darkstar online game server and the Hadoop File System, where we detect numerous real problems with high accuracy and few false positives. In the Hadoop case, we are able to analyze 24 million lines of console logs in 3 minutes. Our methodology works on textual console logs of any size and requires no changes to the service software, no human input, and no knowledge of the software’s internals.

Citations (17)

Towards Optimal Statistical Watermarking

arXiv (Cornell University) (2023)

Baihe Huang, Banghua Zhu, Hanlin Zhu, Jason D. Lee, Jiantao Jiao

We study statistical watermarking by formulating it as a hypothesis testing problem, a general framework which subsumes all previous statistical watermarking methods. Key to our formulation is a coupling of the output tokens and the rejection region, realized by pseudo-random generators in practice, that allows non-trivial trade-offs between the Type I error and Type II error. We characterize the Uniformly Most Powerful (UMP) watermark in the general hypothesis testing setting and the minimax Type II error in the model-agnostic setting. In the common scenario where the output is a sequence of $n$ tokens, we establish nearly matching upper and lower bounds on the number of i.i.d. tokens required to guarantee small Type I and Type II errors. Our rate of $\Theta(h^{-1} \log (1/h))$ with respect to the average entropy per token $h$ highlights potentials for improvement from the rate of $h^{-2}$ in the previous works. Moreover, we formulate the robust watermarking problem where the user is allowed to perform a class of perturbations on the generated texts, and characterize the optimal Type II error of robust UMP tests via a linear programming problem. To the best of our knowledge, this is the first systematic statistical treatment on the watermarking problem with near-optimal rates in the i.i.d. setting, which might be of interest for future works.

10.48550/arxiv.2312.07930

Citations (0)

The Handbook of Brain Theory and Neural Networks

Michael I. Jordan, Yair Weiss

Citations (55)