We consider decision-making problems in Markov decision processes where both the rewards and the transition probabilities vary in an arbitrary (e.g., nonstationary) fashion. We propose an online Q-learning style algorithm and give a guarantee on its performance evaluated in retrospect against alternative policies. Unlike previous works, the guarantee depends critically on the variability of the uncertainty in the transition probabilities, but holds regardless of arbitrary changes in rewards and transition probabilities over time. Besides its intrinsic computational efficiency, this approach requires neither prior knowledge nor estimation of the transition probabilities.
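As a point of reference, a standard Q-learning style update can be sketched in a few lines; the step size, discount factor, and toy dimensions below are illustrative assumptions, and the sketch omits the mechanisms the paper adds to handle arbitrarily varying rewards and transitions:

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """One Q-learning style step. In the arbitrarily varying setting, the
    reward r and the successor state s_next may be generated by reward and
    transition processes that change over time; note that the update itself
    never estimates the transition probabilities."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    return Q

# Illustrative use with 5 states and 2 actions.
Q = np.zeros((5, 2))
Q = q_update(Q, s=0, a=1, r=1.0, s_next=3)
```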
Outage scheduling aims to define, over a horizon of several months to years, when the different components that need maintenance should be taken out of operation. Its objective is to minimize the expected operating cost while satisfying reliability-related constraints. We propose a distributed scenario-based chance-constrained optimization formulation for this problem. To tackle the tractability issues arising in large networks, we use machine learning to build a proxy that predicts the outcomes of power-system operation processes in this context. On the IEEE-RTS79 and IEEE-RTS96 networks, our solution obtains cheaper and more reliable plans than the alternative candidates.
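A minimal sketch of the scenario-based chance-constraint check, assuming a learned proxy is available as a callable; `plan_is_acceptable`, `proxy`, and `eps` are illustrative names, not the paper's API:

```python
def plan_is_acceptable(plan, scenarios, proxy, eps=0.05):
    """Scenario approximation of a chance constraint: accept the outage
    plan if the learned proxy predicts reliable operation in at least a
    (1 - eps) fraction of the sampled scenarios. proxy(plan, scenario)
    returns True when operation is predicted to be reliable."""
    n_ok = sum(proxy(plan, sc) for sc in scenarios)
    return n_ok >= (1 - eps) * len(scenarios)

# Toy usage with a trivial stand-in proxy.
scenarios = range(100)
proxy = lambda plan, sc: sc % 10 != 0   # "reliable" in 90% of scenarios
print(plan_is_acceptable("plan-A", scenarios, proxy, eps=0.05))  # False
```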
In the context of Multiple Instance Learning (MIL), we analyze the Single Instance (SI) learning objective. We show that when the data is unbalanced and the family of classifiers is sufficiently rich, the SI method is a useful learning algorithm. In particular, we show that a larger data imbalance, a quality typically perceived as negative, in fact implies better resilience of the algorithm to the statistical dependencies among the objects in bags. In addition, our results shed new light on some known issues with the SI method in the setting of linear classifiers, and we show that these issues are significantly less likely to occur in the setting of neural networks. We demonstrate our results on a synthetic dataset, and on the COCO dataset for the problem of patch classification with weak image-level labels derived from captions.
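The SI objective itself is simple enough to sketch directly; the use of scikit-learn's logistic regression is an illustrative choice of classifier family, not the paper's setup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_single_instance(bags, bag_labels):
    """Single Instance (SI) objective: every instance simply inherits its
    bag's label, and an ordinary single-instance classifier is trained on
    the flattened data. A linear classifier is used here for concreteness;
    the analysis also covers richer families such as neural networks."""
    X = np.vstack(bags)
    y = np.concatenate([np.full(len(b), lbl) for b, lbl in zip(bags, bag_labels)])
    return LogisticRegression(max_iter=1000).fit(X, y)

# Two toy bags of 2-D instances, one positive and one negative.
bags = [np.random.randn(4, 2) + 1.0, np.random.randn(6, 2) - 1.0]
clf = train_single_instance(bags, bag_labels=[1, 0])
```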
We study the neural-linear bandit model for solving sequential decision-making problems with high-dimensional side information. Neural-linear bandits leverage the representation power of deep neural networks and combine it with efficient exploration mechanisms, designed for linear contextual bandits, on top of the last hidden layer. Since the representation is optimized during learning, information regarding exploration with the old features is lost. Here, we propose the first limited-memory neural-linear bandit that is resilient to this phenomenon, which we term catastrophic forgetting. We evaluate our method on a variety of real-world datasets, including regression, classification, and sentiment-analysis tasks, and observe that our algorithm is resilient to catastrophic forgetting and achieves superior performance.
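A minimal neural-linear sketch, assuming fixed last-layer features: linear Thompson sampling runs on top of a feature map produced by a network. The limited-memory mechanism the paper proposes is omitted, and all names are illustrative:

```python
import numpy as np

class NeuralLinearTS:
    """Linear Thompson sampling on top of last-layer features phi(x).
    In the full method the features come from a deep network that keeps
    being optimized, which is what makes old exploration statistics
    stale (the catastrophic forgetting discussed above)."""

    def __init__(self, dim, lam=1.0, noise=0.5):
        self.A = lam * np.eye(dim)   # posterior precision
        self.b = np.zeros(dim)
        self.noise = noise

    def choose(self, arm_features):  # arm_features: (n_arms, dim)
        cov = np.linalg.inv(self.A)
        theta = np.random.multivariate_normal(cov @ self.b, self.noise**2 * cov)
        return int(np.argmax(arm_features @ theta))

    def update(self, phi, reward):
        self.A += np.outer(phi, phi)
        self.b += reward * phi

# Illustrative use with 5 arms and 8-dimensional features.
bandit = NeuralLinearTS(dim=8)
arm = bandit.choose(np.random.randn(5, 8))
```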
Inspired by cognitive radio networks, we consider a setting where multiple users share several channels, modeled as a multi-user multi-armed bandit (MAB) problem. The characteristics of each channel are unknown and differ across users. Each user can choose between the channels, but her success depends on the particular channel chosen as well as on the selections of the other users: if two users select the same channel, their messages collide and neither manages to send any data. Our setting is fully distributed, so there is no central control. As in many communication systems, the users cannot set up a direct communication protocol, so information exchange must be kept to a minimum. We develop an algorithm for learning a stable configuration for the multi-user MAB problem, and we provide both convergence guarantees and experiments inspired by real communication networks, including comparisons to state-of-the-art algorithms.
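The collision model can be sketched in a few lines; the round structure and names below are illustrative and do not describe the learning algorithm itself:

```python
import numpy as np
from collections import Counter

def play_round(choices, success_prob):
    """One round of the multi-user channel model: if two or more users
    pick the same channel, their transmissions collide and all of them
    receive reward 0; otherwise user u succeeds on channel c with her
    own (unknown to her) probability success_prob[u, c]."""
    load = Counter(choices)
    rng = np.random.default_rng()
    return [0.0 if load[c] > 1 else float(rng.random() < success_prob[u, c])
            for u, c in enumerate(choices)]

# 3 users, 4 channels, user-dependent channel qualities.
success_prob = np.random.default_rng(0).random((3, 4))
print(play_round([0, 0, 2], success_prob))  # users 0 and 1 collide
```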
Trust region policy optimization (TRPO) is a popular and empirically successful policy search algorithm in Reinforcement Learning (RL), in which a surrogate problem that restricts consecutive policies to be 'close' to one another is iteratively solved. Nevertheless, TRPO has been considered a heuristic algorithm inspired by Conservative Policy Iteration (CPI). We show that the adaptive scaling mechanism used in TRPO is in fact the natural "RL version" of traditional trust-region methods from convex analysis. We first analyze TRPO in the planning setting, in which we have access to the model and the entire state space. Then, we consider sample-based TRPO and establish an $\tilde O(1/\sqrt{N})$ convergence rate to the global optimum. Importantly, the adaptive scaling mechanism allows us to analyze TRPO in regularized MDPs, for which we prove fast rates of $\tilde O(1/N)$, much like results in convex optimization. This is the first result in RL showing that better rates are attainable when the instantaneous cost or reward is regularized.
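In the tabular planning setting, KL-based trust regions lead to a per-state multiplicative (mirror-descent style) update, which can be sketched as follows; the step size `eta` and the toy numbers are illustrative assumptions:

```python
import numpy as np

def trust_region_step(pi_s, q_s, eta):
    """One per-state update: pi'(a|s) is proportional to
    pi(a|s) * exp(eta * Q(s, a)). The KL term in the surrogate is what
    produces this multiplicative form; eta plays the role of the
    adaptive scaling / step size."""
    logits = np.log(pi_s) + eta * q_s
    logits -= logits.max()          # numerical stability
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum()

pi_s = np.array([0.5, 0.3, 0.2])    # current policy at state s
q_s = np.array([1.0, 0.0, -1.0])    # Q-values at state s
print(trust_region_step(pi_s, q_s, eta=0.5))
```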
Low-Density Parity-Check (LDPC) codes are one of the most powerful classes of error-control codes known to date. These codes have been considered for many recent digital communication applications. In this dissertation, we propose stochastic decoding of state-of-the-art LDPC codes and demonstrate that it is a competitive alternative to practical LDPC decoding algorithms.

In stochastic decoding, probabilities are represented as streams of random bits using Bernoulli sequences, in which the information is contained in the statistics of the bit stream. This representation results in low-hardware-complexity processing nodes that perform computationally intensive operations. However, stochastic decoding is prone to the acute problem of latching. This problem is caused by correlated bit streams within cycles of the code's factor graph, and it significantly deteriorates the performance of stochastic LDPC decoders.

We propose edge memories, tracking forecast memories, and majority-based tracking forecast memories to address the latching problem. These units efficiently extract the evolving statistics of stochastic bit streams and rerandomize them to disrupt latching. To the best of our knowledge, these methods are the first successful methods for stochastic decoding of state-of-the-art LDPC codes.

We present novel decoder architectures and report on several hardware implementations. The most advanced reported implementation is a stochastic decoder that decodes the (2048,1723) LDPC code from the IEEE 802.3an standard. To the best of our knowledge, this decoder is the most silicon-area-efficient and, with a maximum core throughput of 61.3 Gb/s, is one of the fastest fully parallel soft-decision LDPC decoders reported in the literature. We demonstrate the performance of this decoder in low bit-error-rate regimes.

In addition to stochastic LDPC decoding, we propose the novel application of the stochastic approach to joint decoding of LDPC codes and partial-response channels, which are considered in practical magnetic-recording applications. Finally, we investigate the application of the stochastic approach to decoding linear block codes with high-density parity-check matrices on factor graphs. We consider Reed-Solomon, Bose-Chaudhuri-Hocquenghem, and block turbo codes.
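The bit-stream representation underlying stochastic decoding can be illustrated with a toy two-input check node; the stream length and probabilities below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(42)

def bernoulli_stream(p, n):
    # A probability p is represented by a random bit stream whose
    # statistics (fraction of ones) encode p.
    return rng.random(n) < p

# A degree-3 check node reduces to a bitwise XOR of its two incoming
# streams: the output stream's mean approximates the probability that
# the parity of the inputs is 1, i.e. p1*(1-p2) + p2*(1-p1). Latching
# arises when streams like these become correlated inside graph cycles.
s1 = bernoulli_stream(0.8, 100_000)
s2 = bernoulli_stream(0.6, 100_000)
print((s1 ^ s2).mean())   # approx. 0.8*0.4 + 0.6*0.2 = 0.44
```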
The Multi-Armed Bandits (MAB) framework highlights the trade-off between acquiring new knowledge (Exploration) and leveraging available knowledge (Exploitation). In the classical MAB problem, a decision maker must choose an arm at each time step, upon which she receives a reward. The decision maker's objective is to maximize her cumulative expected reward over the time horizon. The MAB problem has been studied extensively, particularly under the assumption that the arms' reward distributions are stationary, or quasi-stationary, over time. We consider a variant of the MAB framework, which we term Rotting Bandits, where each arm's expected reward decays as a function of the number of times it has been pulled. We are motivated by many real-world scenarios such as online advertising, content recommendation, crowdsourcing, and more. We present algorithms, derive theoretical guarantees, and demonstrate the algorithms in simulations.
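A minimal sketch of the rotting reward model, assuming an exponential decay profile for concreteness (the framework allows general non-increasing decay functions):

```python
import numpy as np

def rotting_pull(base_reward, n_pulls, decay=0.8):
    """Rotting-bandit reward model: the expected reward of an arm is a
    non-increasing function of how many times it has been pulled. The
    exponential decay used here is only one example profile."""
    mean = base_reward * decay ** n_pulls
    return mean + np.random.default_rng().normal(scale=0.1)  # noisy observation

# Pulling the same arm repeatedly yields a decaying expected reward.
print([round(0.9 * 0.8 ** n, 3) for n in range(5)])  # 0.9, 0.72, 0.576, ...
```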
We study classification problems where features are corrupted by noise and where the magnitude of the noise in each feature is influenced by the resources allocated to its acquisition. This is the case, for example, when multiple sensors share a common resource (power, bandwidth, attention, etc.). We develop a method for computing the optimal resource allocation for a variety of scenarios and derive theoretical bounds concerning the benefit that may arise from non-uniform allocation. We further demonstrate the effectiveness of the developed method in simulations.
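A toy version of the acquisition model, assuming the noise standard deviation shrinks as the inverse square root of the allocated resource; the proportional allocation rule is a placeholder for illustration, not the optimal allocation derived in the paper:

```python
import numpy as np

def observe(x, allocation, base_sigma=1.0, rng=None):
    """Noisy acquisition: feature i is observed with additive Gaussian
    noise of std base_sigma / sqrt(allocation[i]); spending more of the
    shared resource on a feature makes its measurement cleaner."""
    rng = rng or np.random.default_rng()
    return x + rng.normal(scale=base_sigma / np.sqrt(allocation))

def proportional_allocation(importance, budget):
    # Placeholder rule: allocate in proportion to an assumed-known
    # importance score while respecting the total budget.
    w = np.abs(importance)
    return budget * w / w.sum()

x = np.array([1.0, -2.0, 0.5])
alloc = proportional_allocation(np.array([3.0, 1.0, 1.0]), budget=10.0)
print(observe(x, alloc))
```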