We consider decomposition approaches for the solution of multistage stochastic programs that appear in financial applications. In particular, we discuss the performance of two algorithms that we test on the mean-variance portfolio optimization problem. The first algorithm is based on a regularized version of Benders decomposition, and we discuss its extension to the quadratic case. The second algorithm is an augmented Lagrangian method. Our results indicate that the algorithm based on regularized Benders decomposition is more efficient, which is in line with similar studies performed in the linear setting.
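For concreteness, a regularized (proximal) Benders master problem with a quadratic mean-variance objective can be sketched as below; this is a generic illustration, and the symbols (asset covariance $\Sigma$, expected returns $\mu$, risk-aversion weight $\lambda$, proximal weight $\rho$, incumbent $\hat{x}_k$, and optimality-cut coefficients $\alpha_j$, $\beta_j$) are not taken from the paper:

\[
\min_{x \in X,\; \eta}\;\; \frac{\lambda}{2}\, x^{\top} \Sigma x \;-\; \mu^{\top} x \;+\; \eta \;+\; \frac{\rho}{2}\,\lVert x - \hat{x}_k \rVert^{2}
\quad \text{s.t.} \quad \eta \;\ge\; \alpha_j + \beta_j^{\top} x, \qquad j = 1,\dots,k.
\]

The proximal term keeps successive master solutions close to the incumbent, which is what distinguishes the regularized variant from classical Benders decomposition.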
Global Optimisation Using Back-Tracking and Stochastic Methods (MEng thesis, Department of Computing, Karol Pysniak). In this thesis we propose and analyse new back-tracking and stochastic methods applied to global optimization algorithms. Back-tracking methods in global optimization use the information gathered at the points that have already been visited in the domain. In this work we introduce an information model based on Gaussian Processes into the Multiple Start Search algorithm and add an integral term to the Stochastic Gradient Descent algorithm to steer its trajectory away from the explored regions. Stochastic methods in global optimization introduce random processes that preserve the convergence properties of the algorithms; here we propose adding a stochastic term to the candidate-point ranking in Hyperbolic Cross Points and introduce new cooling functions for the Stochastic Gradient Descent algorithm, proving that these functions do not alter the mathematical properties of the original algorithm. The performance of the proposed methods is evaluated on three test functions: the Pintér, Michalewicz, and Rastrigin functions. These functions are chosen because they are extremely challenging for global optimization algorithms, yet can be solved easily by a human. All of the methods are also compared against each other by tracking their performance on the protein folding problem based on the AB off-lattice model.
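A minimal Python sketch of the second idea, steering gradient descent away from regions that have already been explored; the Gaussian-weighted repulsion term below is a stand-in for the integral term described in the thesis, and all names and parameter values are illustrative:

    import numpy as np

    def sgd_with_memory(grad, x0, n_iters=1000, lr=0.01, repulsion=0.1, sigma=1.0):
        # Gradient descent whose iterates are pushed away from previously visited
        # points; the Gaussian-weighted sum below stands in for the integral term.
        x = np.asarray(x0, dtype=float)          # x0: 1-D numpy array
        visited = [x.copy()]
        for _ in range(n_iters):
            diffs = x - np.array(visited)        # vectors from visited points to x
            weights = np.exp(-np.sum(diffs**2, axis=1) / (2 * sigma**2))
            push = (weights[:, None] * diffs).sum(axis=0)
            x = x - lr * grad(x) + lr * repulsion * push
            visited.append(x.copy())
        return x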
We consider the problem of minimizing a strongly convex function that depends on an uncertain parameter $\theta$. The uncertainty in the objective function means that the optimum, $x^*(\theta)$, is also a function of $\theta$. We propose an efficient method to compute $x^*(\theta)$ and its statistics. We use a chaos expansion of $x^*(\theta)$ along a truncated basis and study first-order methods that compute the optimal coefficients. We establish the convergence rate of the method as the number of basis functions, and hence the dimensionality of the optimization problem, is increased. We give the first non-asymptotic rates for the gradient descent and accelerated gradient descent methods. Our analysis exploits convexity and does not rely on a diminishing step-size strategy. As a result, it is much faster than the state of the art both in theory and in our preliminary numerical experiments. A surprising side-effect of our analysis is that the proposed method also acts as a variance reduction technique for the problem of estimating $x^*(\theta)$.
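A minimal sketch of the approach for a scalar decision variable: approximate $x^*(\theta) \approx \sum_k c_k \phi_k(\theta)$ with a truncated basis and fit the coefficients by stochastic gradient steps over samples of $\theta$. The monomial basis, fixed step size, and function names below are illustrative assumptions, not the paper's exact construction:

    import numpy as np

    def basis(theta, K):
        # Truncated basis phi_0, ..., phi_{K-1}; a simple monomial basis for illustration.
        return np.array([theta**k for k in range(K)])

    def fit_coefficients(grad_f, sample_theta, K=5, n_iters=5000, lr=0.01):
        # Minimize E_theta[ f(sum_k c_k * phi_k(theta), theta) ] over the coefficients c.
        c = np.zeros(K)
        for _ in range(n_iters):
            theta = sample_theta()
            phi = basis(theta, K)
            x = c @ phi                          # surrogate x(theta) at the sampled theta
            c -= lr * grad_f(x, theta) * phi     # chain rule: d f / d c_k = f'(x) * phi_k
        return c

Given fitted coefficients, statistics of $x^*(\theta)$ such as its mean and variance can then be estimated cheaply by evaluating the expansion at fresh samples of $\theta$.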
Composite optimization models consist of the minimization of the sum of a smooth (not necessarily convex) function and a nonsmooth convex function. Such models arise in many applications where, in addition to the composite nature of the objective function, a hierarchy of models is readily available. It is common to take advantage of this hierarchy by first solving a low-fidelity model and then using the solution as a starting point for a high-fidelity model. We adopt an optimization point of view and show how to take advantage of the availability of a hierarchy of models in a consistent manner. We do not use the low-fidelity model just for the computation of promising starting points but also for the computation of search directions. We establish the convergence and convergence rate of the proposed algorithm. Our numerical experiments on large-scale image restoration problems and the transition path problem suggest that, for certain classes of problems, the proposed algorithm is significantly faster than the state of the art.
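As a rough illustration of using the cheap model for search directions rather than only for a warm start, the sketch below applies a first-order correction so that the low-fidelity gradient matches the high-fidelity one at the current outer iterate; grad_high, grad_low, the fixed step size, and the loop structure are assumptions for illustration, not the paper's algorithm (which also handles the nonsmooth term):

    import numpy as np

    def hierarchical_descent(grad_high, grad_low, x0, n_outer=20, n_inner=10, lr=0.1):
        x = np.asarray(x0, dtype=float)
        for _ in range(n_outer):
            shift = grad_high(x) - grad_low(x)       # one expensive evaluation per outer step
            for _ in range(n_inner):
                x = x - lr * (grad_low(x) + shift)   # cheap corrected inner steps
        return x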
The intuitive connection to robustness and convincing empirical evidence have made the flatness of the loss surface an attractive measure of generalizability for neural networks. Yet it suffers from various problems such as computational difficulties, reparametrization issues, and a growing concern that it may only be an epiphenomenon of optimization methods. We provide empirical evidence that, under the cross-entropy loss, once a neural network reaches a non-trivial training error, flatness correlates well (as measured by the Pearson correlation coefficient) with the classification margins, which allows us to better reason about the concerns surrounding flatness. Our results lead to the practical recommendation that, when assessing generalizability, one should consider a margin-based measure instead, as it is computationally more efficient, provides further insight, and is highly correlated with flatness. We also use our insight to replace the misleading folklore that small-batch methods generalize better because they are able to escape sharp minima. Instead, we argue that large-batch methods have not had enough time to maximize margins and hence generalize worse.
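For reference, a minimal way to compute the classification margins referred to above from a network's logits (the true-class logit minus the largest competing logit); the helper below is a generic sketch, not the paper's code, and a summary of these margins can then be correlated with a flatness measure via the Pearson correlation coefficient (e.g. np.corrcoef):

    import numpy as np

    def classification_margins(logits, labels):
        # logits: (n_examples, n_classes) array; labels: integer class indices.
        logits = np.asarray(logits, dtype=float)
        labels = np.asarray(labels)
        idx = np.arange(len(labels))
        true_logit = logits[idx, labels]
        competitors = logits.copy()
        competitors[idx, labels] = -np.inf
        return true_logit - competitors.max(axis=1)   # positive iff correctly classified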
The operation of pump systems in water distribution systems (WDS) is commonly the most expensive task for utilities, with up to 70% of the operating cost of a pump system attributed to electricity consumption. Optimisation of pump scheduling could save 10-20% by improving efficiency or shifting consumption to periods with low tariffs. Due to the complexity of the optimal control problem, heuristic methods which cannot guarantee optimality are often applied. To facilitate the use of mathematical optimisation, this paper investigates formulations of WDS components. We show that linear approximations outperform non-linear approximations, while maintaining comparable levels of accuracy.
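As a toy illustration of the kind of component approximation compared in the paper, the sketch below fits a linear function to a nonlinear head-loss curve of Hazen-Williams form over an operating range; the resistance coefficient, exponent, and flow range are made-up values, not taken from the paper:

    import numpy as np

    r, q = 2.5, np.linspace(0.0, 1.0, 50)     # hypothetical resistance and flow range
    h = r * q**1.852                          # nonlinear head-loss curve h(q)
    a, b = np.polyfit(q, h, 1)                # least-squares linear approximation a*q + b
    max_err = np.max(np.abs(h - (a * q + b)))
    print(f"h(q) ~ {a:.3f}*q + {b:.3f}, max abs error {max_err:.3f}")

A linear surrogate of this kind is what allows the scheduling problem to be posed as an LP or MILP and solved to guaranteed optimality, at the price of the approximation error reported above.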