Adam: A Method for Stochastic Optimization
51,836 Citations · 0 References · 10 Related Papers
Abstract:
We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirement, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.

We study time-consistency of optimization problems, where we say that an optimization problem is time-consistent if its optimal solution, or the optimal policy for choosing actions, does not depend on when the optimization problem is solved. Time-consistency is a minimal requirement on an optimization problem for the decisions made based on its solution to be rational. We show that the return we can gain by taking "optimal" actions selected by solving a time-inconsistent optimization problem can be surely dominated by the return we could gain by taking "suboptimal" actions. We establish sufficient conditions on the objective function and on the constraints for an optimization problem to be time-consistent. We also show when the sufficient conditions are necessary. Our results are relevant in stochastic settings, particularly when the objective function is a risk measure other than expectation or when there is a constraint on a risk measure.
Time consistency
Dynamic risk measure
Constrained optimization
Citations (6)
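A minimal one-dimensional sketch of the Adam update rule described in the headline abstract, using the paper's default decay rates; the quadratic test function, learning rate, and step count here are illustrative choices of my own:

```python
import math

def adam(grad, x, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8, steps=10000):
    """Minimize a scalar function via Adam, given its gradient."""
    m, v = 0.0, 0.0  # estimates of the first and second raw moments
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g       # biased first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g   # biased second-moment estimate
        m_hat = m / (1 - beta1 ** t)          # bias-corrected estimates
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = adam(lambda x: 2 * (x - 3), x=0.0, lr=0.1, steps=2000)
```

Note how the effective step size is bounded by roughly `lr` regardless of the raw gradient magnitude, which is what makes the update invariant to diagonal rescaling of the gradients.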
Stochastic bilevel optimization generalizes classic stochastic optimization from the minimization of a single objective to the minimization of an objective function that depends on the solution of another optimization problem. Recently, stochastic bilevel optimization has regained popularity in emerging machine learning applications such as hyper-parameter optimization and model-agnostic meta-learning. To solve this class of stochastic optimization problems, existing methods require either double-loop or two-timescale updates, which are sometimes less efficient. This paper develops a new optimization method for a class of stochastic bilevel problems that we term the Single-Timescale stochAstic BiLevEl optimization (STABLE) method. STABLE runs in a single-loop fashion and uses a single-timescale update with a fixed batch size. To achieve an $ε$-stationary point of the bilevel problem, STABLE requires ${\cal O}(ε^{-2})$ samples in total; and to achieve an $ε$-optimal solution in the strongly convex case, STABLE requires ${\cal O}(ε^{-1})$ samples. To the best of our knowledge, this is the first bilevel optimization algorithm to achieve the same order of sample complexity as stochastic gradient descent for single-level stochastic optimization.
Bilevel optimization
Minification
Citations (3)
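This is not the STABLE algorithm itself, but the single-loop structure such methods exploit can be illustrated on a toy quadratic bilevel problem where the lower-level solution and its sensitivity are known in closed form; the problem, step sizes, and iteration count below are my own choices:

```python
def bilevel_single_loop(x=0.0, y=0.0, alpha=0.05, beta=0.05, steps=5000):
    """Single-loop alternating updates for the toy bilevel problem
        min_x  f(x, y*(x)) = (x - 1)^2 + y*(x)^2
        s.t.   y*(x) = argmin_y (y - x)^2,
    whose exact solution is x* = 0.5 with y* = x*."""
    for _ in range(steps):
        # one stochastic-gradient-style step on the lower-level objective
        y -= beta * 2 * (y - x)
        # hypergradient of f: df/dx + (df/dy) * (dy*/dx), with dy*/dx = 1 here
        x -= alpha * (2 * (x - 1) + 2 * y * 1)
    return x, y

x_star, y_star = bilevel_single_loop()
```

The key point the abstract makes is that both variables are updated on the same timescale in one loop, instead of solving the lower-level problem to completion (a double loop) before each upper-level step.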
Optimization instances comprise the input and output data stemming from optimization problems in general. Typically, an optimization problem consists of an objective function to be optimized (either minimized or maximized) and a set of constraints. Objective and constraints are jointly a set of equations in the optimization model. Such equations combine decision variables and known parameters, which are usually indexed over a set domain. When this combination is linear, we are facing a classical Linear Programming (LP) problem. An optimization instance is related to an optimization model. We refer to that model as the Symbolic Model Specification (SMS), containing all the set, variable, and parameter symbols and relations. A whole instance is thus composed of the SMS, the elements in each set, the data values for all the parameters, and, eventually, the optimal decisions resulting from the optimization solution. This data article contains several optimization instances from a real-world optimization problem relating to investment planning in energy-efficient technologies at the building level.
Robust Optimization
Citations (2)
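The split this abstract draws between a Symbolic Model Specification and instance data can be sketched with a toy investment-planning instance; every name and number below is invented for illustration, and the tiny instance is solved by brute force rather than an LP solver:

```python
from itertools import combinations

# Symbolic Model Specification (SMS): sets, parameters, variables, relations.
sms = {
    "sets": ["TECHNOLOGIES"],
    "parameters": ["cost[t]", "saving[t]", "budget"],
    "variables": ["invest[t] in {0, 1}"],
    "objective": "maximize sum(saving[t] * invest[t])",
    "constraints": ["sum(cost[t] * invest[t]) <= budget"],
}

# One concrete instance: set elements plus data values for every parameter.
instance = {
    "TECHNOLOGIES": ["insulation", "led", "heat_pump"],
    "cost": {"insulation": 40, "led": 10, "heat_pump": 80},
    "saving": {"insulation": 12, "led": 3, "heat_pump": 20},
    "budget": 100,
}

def evaluate(invest, inst):
    """Objective value and feasibility of a candidate decision."""
    total_cost = sum(inst["cost"][t] for t in invest)
    total_saving = sum(inst["saving"][t] for t in invest)
    return total_saving, total_cost <= inst["budget"]

# Brute-force the tiny instance; the optimal decision completes the instance.
techs = instance["TECHNOLOGIES"]
best = max(
    (s for r in range(len(techs) + 1) for s in combinations(techs, r)
     if evaluate(s, instance)[1]),
    key=lambda s: evaluate(s, instance)[0],
)
```

The SMS stays fixed across instances; only the set elements, parameter values, and (after solving) the optimal decisions change.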
Collocation (remote sensing)
Sparse grid
Citations (17)
We present a robust optimization approach with multiple ranges and chance constraints. The first part of the dissertation focuses on the case where the uncertainty in each objective coefficient is described using multiple ranges. This setting arises when the uncertain coefficients, such as cash flows, depend on an underlying random variable, such as the effectiveness of a new drug. Traditional one-range robust optimization would require wide ranges and lead to conservative results. In our approach, the decision-maker limits the number of coefficients that fall within each range and that deviate from the nominal value of their range. We show how to develop tractable reformulations of this mixed-integer problem and apply our approach to an R&D project selection problem; in particular, it finds the optimal solution more often. We show how to use the multi-range robust optimization approach to obtain a robust project selection problem. While this approach can imitate the scenario settings of stochastic optimization, it is significantly faster, since it does not carry the burden of many scenarios. We also develop a robust approach to price optimization in the presence of other retailers. The last part of the dissertation connects robust optimization with chance constraints and shows that the Bernstein approximation of robust binary optimization problems leads to robust counterparts with the same structure as the deterministic models, but with modified objective coefficients that depend on a single new parameter introduced in the approximation.
Robust Optimization
Robustness
Citations (6)
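The idea of limiting how many coefficients may deviate, in the spirit of budgeted (Bertsimas-Sim-style) robustness, reduces for a fixed decision to charging only the largest allowed deviations; the cash-flow numbers below are invented:

```python
def worst_case_cost(nominal, deviation, gamma):
    """Worst-case total cost when at most `gamma` coefficients
    move from their nominal value up to nominal + deviation."""
    worst = sorted(deviation, reverse=True)[:gamma]  # gamma largest deviations
    return sum(nominal) + sum(worst)

# three uncertain cash flows with their maximal upward deviations
nominal = [10, 20, 30]
deviation = [5, 1, 4]
wc = worst_case_cost(nominal, deviation, gamma=2)  # budget of 2 deviations
```

With a one-range model every coefficient would deviate simultaneously; the budget `gamma` is what keeps the worst case from being overly conservative.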
The main purpose of this paper is to discuss numerical optimization procedures, based on duality theory, for stochastic extremal problems in which the distribution function is only partially known. We formulate such problems as minimax problems in which the "inner" problem involves optimization with respect to probability measures. The latter problem is solved using generalized linear programming techniques. Then we state the dual problem to the initial stochastic optimization problem. Numerical procedures that avoid the difficulties associated with solving the "inner" problem are proposed.
Duality (order theory)
Citations (85)
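When the unknown distribution is supported on finitely many scenarios, the "inner" maximization over probability measures described in this abstract is a linear program over the probability simplex; with no further constraints on the measure, its optimum sits at a vertex, i.e. a point mass on the worst scenario. A toy minimax sketch under that simplification (scenarios and losses are my own):

```python
def inner_worst_expectation(losses):
    """max over probability vectors p of sum(p_i * loss_i):
    an LP over the simplex, attained at a vertex, i.e. a
    point mass on the worst scenario."""
    return max(losses)

def minimax(decisions, scenarios, loss):
    """Choose the decision minimizing the worst-case expectation."""
    return min(
        decisions,
        key=lambda d: inner_worst_expectation([loss(d, s) for s in scenarios]),
    )

# toy: pick x in {0, 1, 2} to minimize worst-case |x - s| over scenarios s
best = minimax([0, 1, 2], [0, 2], lambda d, s: abs(d - s))
```

Constraints on the measure (moment bounds, partial distributional knowledge) turn the inner problem into a nontrivial LP, which is where the generalized linear programming techniques of the paper come in.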
Minification
Limiting
Robust Optimization
Citations (8)
Limiting
Discrete optimization
Citations (16)
Because optimization under uncertainty is closer to real-world environments, it has recently become a growing research area. This paper thoroughly reviews ant colony optimization algorithms and their applications to the class of stochastic combinatorial optimization problems under uncertainty. First, it introduces a conceptual classification model for combinatorial problems under uncertainty and a general definition of the stochastic combinatorial optimization problem. It then points out the main difference between stochastic and deterministic combinatorial optimization problems, namely the computation of the objective function under uncertainty, and summarizes the current solutions for this problem. Finally, it proposes several possible research directions and expectations for the development of this area.
Extremal optimization
Robust Optimization
Citations (0)
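A minimal, generic ant-colony sketch on a toy traveling-salesman instance shows the pattern the review discusses: because the objective is stochastic, each tour is evaluated by sample averaging rather than a single deterministic computation. This is not any specific algorithm from the review; the instance, noise model, and parameters are invented:

```python
import random

random.seed(0)

# symmetric distance matrix for a tiny 4-city tour problem
DIST = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 3],
        [10, 4, 3, 0]]
N = len(DIST)

def noisy_tour_length(tour, samples=20):
    """Stochastic objective: travel times fluctuate, so the expected
    tour length is estimated by averaging noisy samples."""
    total = 0.0
    for _ in range(samples):
        total += sum(DIST[tour[i]][tour[(i + 1) % N]] * random.uniform(0.9, 1.1)
                     for i in range(N))
    return total / samples

def ant_colony(iterations=50, n_ants=10, rho=0.1, q=10.0):
    pher = [[1.0] * N for _ in range(N)]  # pheromone on every edge
    best_tour, best_len = None, float("inf")
    for _ in range(iterations):
        for _ in range(n_ants):
            # build a tour city by city, weighting by pheromone / distance
            tour = [0]
            while len(tour) < N:
                cur = tour[-1]
                cand = [c for c in range(N) if c not in tour]
                weights = [pher[cur][c] / DIST[cur][c] for c in cand]
                tour.append(random.choices(cand, weights=weights)[0])
            length = noisy_tour_length(tour)
            if length < best_len:
                best_tour, best_len = tour, length
        # evaporate pheromone, then reinforce the best tour found so far
        for i in range(N):
            for j in range(N):
                pher[i][j] *= (1 - rho)
        for i in range(N):
            a, b = best_tour[i], best_tour[(i + 1) % N]
            pher[a][b] += q / best_len
            pher[b][a] += q / best_len
    return best_tour, best_len

tour, length = ant_colony()
```

The sample-averaging step is exactly the "computation of the objective function under uncertainty" the review identifies as the crux separating stochastic from deterministic combinatorial optimization.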