Adam: A Method for Stochastic Optimization
51,836 Citations · 0 References · 10 Related Papers
Abstract:
We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirement, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.

We study time-consistency of optimization problems, where we say that an optimization problem is time-consistent if its optimal solution, or the optimal policy for choosing actions, does not depend on when the optimization problem is solved. Time-consistency is a minimal requirement on an optimization problem for the decisions made based on its solution to be rational. We show that the return we can gain by taking "optimal" actions selected by solving a time-inconsistent optimization problem can be surely dominated by the return we could gain by taking "suboptimal" actions. We establish sufficient conditions on the objective function and on the constraints for an optimization problem to be time-consistent. We also show when the sufficient conditions are necessary. Our results are relevant in stochastic settings, particularly when the objective function is a risk measure other than expectation or when there is a constraint on a risk measure.
Time consistency
Dynamic risk measure
Constrained optimization
Citations (6)
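A minimal one-dimensional sketch of the Adam update rule described in the headline abstract, using the paper's default decay rates; the quadratic test function, learning rate, and step count here are illustrative choices of my own:

```python
import math

def adam(grad, x, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8, steps=10000):
    """Minimize a scalar function via Adam, given its gradient."""
    m, v = 0.0, 0.0  # estimates of the first and second raw moments
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g       # biased first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g   # biased second-moment estimate
        m_hat = m / (1 - beta1 ** t)          # bias-corrected estimates
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = adam(lambda x: 2 * (x - 3), x=0.0, lr=0.1, steps=2000)
```

Note how the effective step size is bounded by roughly `lr` regardless of the raw gradient magnitude, which is what makes the update invariant to diagonal rescaling of the gradients.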
Stochastic bilevel optimization generalizes classic stochastic optimization from the minimization of a single objective to the minimization of an objective function that depends on the solution of another optimization problem. Recently, stochastic bilevel optimization has regained popularity in emerging machine learning applications such as hyper-parameter optimization and model-agnostic meta-learning. To solve this class of stochastic optimization problems, existing methods require either double-loop or two-timescale updates, which are sometimes less efficient. This paper develops a new optimization method for a class of stochastic bilevel problems that we term the Single-Timescale stochAstic BiLevEl optimization (STABLE) method. STABLE runs in a single-loop fashion and uses a single-timescale update with a fixed batch size. To achieve an $ε$-stationary point of the bilevel problem, STABLE requires ${\cal O}(ε^{-2})$ samples in total; and to achieve an $ε$-optimal solution in the strongly convex case, STABLE requires ${\cal O}(ε^{-1})$ samples. To the best of our knowledge, this is the first bilevel optimization algorithm to achieve the same order of sample complexity as stochastic gradient descent for single-level stochastic optimization.
Bilevel optimization
Minification
Citations (3)
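This is not the STABLE algorithm itself, but the single-loop structure such methods exploit can be illustrated on a toy quadratic bilevel problem where the lower-level solution and its sensitivity are known in closed form; the problem, step sizes, and iteration count below are my own choices:

```python
def bilevel_single_loop(x=0.0, y=0.0, alpha=0.05, beta=0.05, steps=5000):
    """Single-loop alternating updates for the toy bilevel problem
        min_x  f(x, y*(x)) = (x - 1)^2 + y*(x)^2
        s.t.   y*(x) = argmin_y (y - x)^2,
    whose exact solution is x* = 0.5 with y* = x*."""
    for _ in range(steps):
        # one stochastic-gradient-style step on the lower-level objective
        y -= beta * 2 * (y - x)
        # hypergradient of f: df/dx + (df/dy) * (dy*/dx), with dy*/dx = 1 here
        x -= alpha * (2 * (x - 1) + 2 * y * 1)
    return x, y

x_star, y_star = bilevel_single_loop()
```

The key point the abstract makes is that both variables are updated on the same timescale in one loop, instead of solving the lower-level problem to completion (a double loop) before each upper-level step.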
Optimization instances comprise the input and output data stemming from optimization problems in general. Typically, an optimization problem consists of an objective function to be optimized (either minimized or maximized) and a set of constraints. Objective and constraints are jointly a set of equations in the optimization model. Such equations combine decision variables and known parameters, which are usually indexed over a set domain. When this combination is linear, we are facing a classical Linear Programming (LP) problem. An optimization instance is related to an optimization model. We refer to that model as the Symbolic Model Specification (SMS), containing all the set, variable, and parameter symbols and relations. A whole instance is thus composed of the SMS, the elements in each set, the data values for all the parameters, and, eventually, the optimal decisions resulting from the optimization solution. This data article contains several optimization instances from a real-world optimization problem relating to investment planning in energy-efficient technologies at the building level.
Robust Optimization
Citations (2)
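The split this abstract draws between a Symbolic Model Specification and instance data can be sketched with a toy investment-planning instance; every name and number below is invented for illustration, and the tiny instance is solved by brute force rather than an LP solver:

```python
from itertools import combinations

# Symbolic Model Specification (SMS): sets, parameters, variables, relations.
sms = {
    "sets": ["TECHNOLOGIES"],
    "parameters": ["cost[t]", "saving[t]", "budget"],
    "variables": ["invest[t] in {0, 1}"],
    "objective": "maximize sum(saving[t] * invest[t])",
    "constraints": ["sum(cost[t] * invest[t]) <= budget"],
}

# One concrete instance: set elements plus data values for every parameter.
instance = {
    "TECHNOLOGIES": ["insulation", "led", "heat_pump"],
    "cost": {"insulation": 40, "led": 10, "heat_pump": 80},
    "saving": {"insulation": 12, "led": 3, "heat_pump": 20},
    "budget": 100,
}

def evaluate(invest, inst):
    """Objective value and feasibility of a candidate decision."""
    total_cost = sum(inst["cost"][t] for t in invest)
    total_saving = sum(inst["saving"][t] for t in invest)
    return total_saving, total_cost <= inst["budget"]

# Brute-force the tiny instance; the optimal decision completes the instance.
techs = instance["TECHNOLOGIES"]
best = max(
    (s for r in range(len(techs) + 1) for s in combinations(techs, r)
     if evaluate(s, instance)[1]),
    key=lambda s: evaluate(s, instance)[0],
)
```

The SMS stays fixed across instances; only the set elements, parameter values, and (after solving) the optimal decisions change.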
Collocation (remote sensing)
Sparse grid
Citations (17)
We present a robust optimization approach with multiple ranges and chance constraints. The first part of the dissertation focuses on the case where the uncertainty in each objective coefficient is described using multiple ranges. This setting arises when the uncertain coefficients, such as cash flows, depend on an underlying random variable, such as the effectiveness of a new drug. Traditional one-range robust optimization would require wide ranges and lead to conservative results. In our approach, the decision-maker limits the number of coefficients that fall within each range and that deviate from the nominal value of their range. We show how to develop tractable reformulations of this mixed-integer problem and apply our approach to an R&D project selection problem; in particular, it finds the optimal solution more often. We show how to use the multi-range robust optimization approach to obtain a robust project selection problem. While this approach can imitate the scenario settings of stochastic optimization, it is significantly faster, since it does not carry the burden of many scenarios. We also develop a robust approach to price optimization in the presence of other retailers. The last part of the dissertation connects robust optimization with chance constraints and shows that the Bernstein approximation of robust binary optimization problems leads to robust counterparts with the same structure as the deterministic models, but with modified objective coefficients that depend on a single new parameter introduced in the approximation.
Robust Optimization
Robustness
Citations (6)
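The idea of limiting how many coefficients may deviate, in the spirit of budgeted (Bertsimas-Sim-style) robustness, reduces for a fixed decision to charging only the largest allowed deviations; the cash-flow numbers below are invented:

```python
def worst_case_cost(nominal, deviation, gamma):
    """Worst-case total cost when at most `gamma` coefficients
    move from their nominal value up to nominal + deviation."""
    worst = sorted(deviation, reverse=True)[:gamma]  # gamma largest deviations
    return sum(nominal) + sum(worst)

# three uncertain cash flows with their maximal upward deviations
nominal = [10, 20, 30]
deviation = [5, 1, 4]
wc = worst_case_cost(nominal, deviation, gamma=2)  # budget of 2 deviations
```

With a one-range model every coefficient would deviate simultaneously; the budget `gamma` is what keeps the worst case from being overly conservative.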
The main purpose of this paper is to discuss numerical optimization procedures, based on duality theory, for stochastic extremal problems in which the distribution function is only partially known. We formulate such problems as minimax problems in which the "inner" problem involves optimization with respect to probability measures. The latter problem is solved using generalized linear programming techniques. Then we state the dual problem to the initial stochastic optimization problem. Numerical procedures that avoid the difficulties associated with solving the "inner" problem are proposed.
Duality (order theory)
Citations (85)
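When the unknown distribution is supported on finitely many scenarios, the "inner" maximization over probability measures described in this abstract is a linear program over the probability simplex; with no further constraints on the measure, its optimum sits at a vertex, i.e. a point mass on the worst scenario. A toy minimax sketch under that simplification (scenarios and losses are my own):

```python
def inner_worst_expectation(losses):
    """max over probability vectors p of sum(p_i * loss_i):
    an LP over the simplex, attained at a vertex, i.e. a
    point mass on the worst scenario."""
    return max(losses)

def minimax(decisions, scenarios, loss):
    """Choose the decision minimizing the worst-case expectation."""
    return min(
        decisions,
        key=lambda d: inner_worst_expectation([loss(d, s) for s in scenarios]),
    )

# toy: pick x in {0, 1, 2} to minimize worst-case |x - s| over scenarios s
best = minimax([0, 1, 2], [0, 2], lambda d, s: abs(d - s))
```

Constraints on the measure (moment bounds, partial distributional knowledge) turn the inner problem into a nontrivial LP, which is where the generalized linear programming techniques of the paper come in.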
Minification
Limiting
Robust Optimization
Citations (8)
Limiting
Discrete optimization
Citations (16)
Because optimization under uncertainty is closer to real-world environments, it has recently become a growing research area. This paper thoroughly reviews ant colony optimization algorithms and their applications to the class of stochastic combinatorial optimization problems under uncertainty. First, it introduces a conceptual classification model for combinatorial problems under uncertainty and a general definition of the stochastic combinatorial optimization problem. It then points out the main difference between stochastic and deterministic combinatorial optimization problems, namely the computation of the objective function under uncertainty, and summarizes the current solutions for this problem. Finally, it proposes several possible research directions and expectations for the development of this area.
Extremal optimization
Robust Optimization
Citations (0)
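A minimal, generic ant-colony sketch on a toy traveling-salesman instance shows the pattern the review discusses: because the objective is stochastic, each tour is evaluated by sample averaging rather than a single deterministic computation. This is not any specific algorithm from the review; the instance, noise model, and parameters are invented:

```python
import random

random.seed(0)

# symmetric distance matrix for a tiny 4-city tour problem
DIST = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 3],
        [10, 4, 3, 0]]
N = len(DIST)

def noisy_tour_length(tour, samples=20):
    """Stochastic objective: travel times fluctuate, so the expected
    tour length is estimated by averaging noisy samples."""
    total = 0.0
    for _ in range(samples):
        total += sum(DIST[tour[i]][tour[(i + 1) % N]] * random.uniform(0.9, 1.1)
                     for i in range(N))
    return total / samples

def ant_colony(iterations=50, n_ants=10, rho=0.1, q=10.0):
    pher = [[1.0] * N for _ in range(N)]  # pheromone on every edge
    best_tour, best_len = None, float("inf")
    for _ in range(iterations):
        for _ in range(n_ants):
            # build a tour city by city, weighting by pheromone / distance
            tour = [0]
            while len(tour) < N:
                cur = tour[-1]
                cand = [c for c in range(N) if c not in tour]
                weights = [pher[cur][c] / DIST[cur][c] for c in cand]
                tour.append(random.choices(cand, weights=weights)[0])
            length = noisy_tour_length(tour)
            if length < best_len:
                best_tour, best_len = tour, length
        # evaporate pheromone, then reinforce the best tour found so far
        for i in range(N):
            for j in range(N):
                pher[i][j] *= (1 - rho)
        for i in range(N):
            a, b = best_tour[i], best_tour[(i + 1) % N]
            pher[a][b] += q / best_len
            pher[b][a] += q / best_len
    return best_tour, best_len

tour, length = ant_colony()
```

The sample-averaging step is exactly the "computation of the objective function under uncertainty" the review identifies as the crux separating stochastic from deterministic combinatorial optimization.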