Paper Name | Author(s) |
---|---|
Modelling Sparsity, Heterogeneity, Reciprocity And Community Structure In Temporal Interaction Data
The authors propose a novel class of network models for temporal dyadic interaction data. |
Xenia Miscouridou, Francois Caron, Yee Whye Teh |
The Lingering Of Gradients: How To Reuse Gradients Over Time
Classically, the time complexity of a first-order method is estimated by its number of gradient computations. In this paper, the authors study a more refined complexity by taking into account the "lingering" of gradients: once a gradient is computed at $x_k$, the additional time to compute gradients at $x_{k+1}, x_{k+2}, \dots$ may be reduced. The authors show how this improves the running time of gradient descent and SVRG. |
Zeyuan Allen-Zhu, David Simchi-Levi, Xinshang Wang |
Quadratic Decomposable Submodular Function Minimization
The authors introduce a new convex optimization problem, termed quadratic decomposable submodular function minimization. |
Pan Li, Niao He, Olgica Milenkovic |
On The Global Convergence Of Gradient Descent For Over-parameterized Models Using Optimal Transport
Many tasks in machine learning and signal processing can be solved by minimizing a convex function of a measure. For these problems, the authors study a simple minimization method: the unknown measure is discretized into a mixture of particles and a continuous-time gradient descent is performed on their weights and positions. |
Lénaïc Chizat, Francis Bach |
Leveraging The Exact Likelihood Of Deep Latent Variable Models
Deep latent variable models (DLVMs) combine the approximation abilities of deep neural networks and the statistical foundations of generative models. The purpose of this work is to study the general properties of the exact likelihood of these models and to show how they can be leveraged in practice. |
Pierre-Alexandre Mattei, Jes Frellsen |
DVAE#: Discrete Variational Autoencoders With Relaxed Boltzmann Priors
Boltzmann machines are powerful distributions that have been shown to be an effective prior over binary latent variables in variational autoencoders (VAEs). The authors propose two approaches for relaxing Boltzmann machines to continuous distributions that permit training with importance-weighted bounds. |
Arash Vahdat, Evgeny Andriyash, William Macready |
Amortized Inference Regularization
The variational autoencoder (VAE) is a popular model for density estimation and representation learning. |
Rui Shu, Hung Bui, Shengjia Zhao, Mykel J Kochenderfer, Stefano Ermon |
GumBolt: Extending Gumbel Trick To Boltzmann Priors
Boltzmann machines (BMs) are appealing candidates for powerful priors in variational autoencoders (VAEs), as they are capable of capturing nontrivial and multi-modal distributions over discrete variables. Here, the authors propose the GumBolt, a model that extends the Gumbel trick to BM priors in VAEs. |
Amir H Khoshaman, Mohammad Amin |
Constrained Generation Of Semantically Valid Graphs Via Regularizing Variational Autoencoders
Deep generative models have achieved remarkable success in various data domains, including images, time series, and natural languages. In this work, the authors propose a regularization framework for variational autoencoders as a step toward semantic validity. |
Tengfei Ma, Jie Chen, Cao Xiao |
Invertibility Of Convolutional Generative Networks From Partial Measurements
In this work, the authors present new theoretical results on convolutional generative neural networks, in particular their invertibility (i.e., the recovery of input latent code given the network output). |
Fangchang Ma, Ulas Ayaz, Sertac Karaman |
Glow: Generative Flow With Invertible 1x1 Convolutions
Flow-based generative models are conceptually attractive due to tractability of the exact log-likelihood, tractability of exact latent-variable inference, and parallelizability of both training and synthesis. In this paper the authors propose Glow, a simple type of generative flow using an invertible 1x1 convolution. |
Durk Kingma, Prafulla Dhariwal |
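The invertible 1x1 convolution in the summary above is, at each pixel, a linear map over channels with a square weight matrix, so its inverse and log-determinant are cheap. A minimal NumPy sketch (variable names are illustrative, not from the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

# A 1x1 convolution with a square weight matrix W is a per-pixel
# linear map over channels. An orthogonal init keeps it invertible.
c, h, w = 4, 8, 8
W = np.linalg.qr(rng.standard_normal((c, c)))[0]

x = rng.standard_normal((c, h, w))
y = np.einsum('ij,jhw->ihw', W, x)                     # forward channel mixing
x_rec = np.einsum('ij,jhw->ihw', np.linalg.inv(W), y)  # exact inverse

# The log-determinant needed for the exact log-likelihood is
# h*w*log|det W|, since the same W acts independently at each pixel.
log_det = h * w * np.log(abs(np.linalg.det(W)))
```

For the orthogonal `W` above, `log_det` is zero; training would move `W` away from orthogonality while the determinant term keeps the likelihood exact.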
Multimodal Generative Models For Scalable Weakly-Supervised Learning
Multiple modalities often co-occur when describing natural phenomena. Learning a joint representation of these modalities should yield deeper and more useful representations. Previous generative approaches to multi-modal input either do not learn a joint distribution or require additional computation to handle missing data. |
Mike Wu, Noah Goodman |
IntroVAE: Introspective Variational Autoencoders For Photographic Image Synthesis
The authors present a novel introspective variational autoencoder (IntroVAE) model for synthesizing high-resolution photographic images. |
Huaibo Huang, Zhihang Li, Ran He, Zhenan Sun, Tieniu Tan |
Towards Text Generation With Adversarially Learned Neural Outlines
Recent progress in deep generative models has been fueled by two paradigms -- autoregressive and adversarial models. The authors propose a combination of both approaches with the goal of learning generative models of text. |
Sandeep Subramanian, Sai Rajeswar Mudumba, Alessandro Sordoni, Adam Trischler, Aaron Courville, Chris Pal |
Unsupervised Image-to-Image Translation Using Domain-Specific Variational Information Bound
In this work, the authors propose an unsupervised image-to-image translation framework which maximizes a domain-specific variational information bound and learns the target domain-invariant representation of the two domains. |
Hadi Kazemi, Sobhan Soleymani, Fariborz Taherkhani, Seyed Iranmanesh, Nasser Nasrabadi |
Adversarial Scene Editing: Automatic Object Removal From Weak Supervision
While great progress has been made recently in automatic image manipulation, it has been limited to object-centric images like faces or structured scene datasets. In this work, the authors take a step towards general scene-level image editing by developing an automatic, interaction-free object removal model. |
Rakshith R Shetty, Mario Fritz, Bernt Schiele |
Generalizing Point Embeddings Using The Wasserstein Space Of Elliptical Distributions
Embedding complex objects as vectors in low dimensional spaces is a longstanding problem in machine learning. The authors propose in this work an extension of that approach, which consists in embedding objects as elliptical probability distributions, namely distributions whose densities have elliptical level sets. |
Boris Muzellec, Marco Cuturi |
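The elliptical embeddings above are compared with the 2-Wasserstein distance, which has a closed form between Gaussians (the Bures metric). A hedged sketch of the diagonal-covariance special case, where the matrix square roots reduce to element-wise ones (the general case needs full matrix square roots):

```python
import numpy as np

def w2_diag_gaussians(m1, d1, m2, d2):
    """2-Wasserstein distance between Gaussians with diagonal covariances.

    m1, m2: mean vectors; d1, d2: covariance diagonals (non-negative).
    """
    mean_part = np.sum((m1 - m2) ** 2)
    # Bures term tr(C1 + C2 - 2(C1^1/2 C2 C1^1/2)^1/2), diagonal case:
    cov_part = np.sum((np.sqrt(d1) - np.sqrt(d2)) ** 2)
    return np.sqrt(mean_part + cov_part)

d_same = w2_diag_gaussians(np.zeros(2), np.ones(2), np.zeros(2), np.ones(2))
d_shift = w2_diag_gaussians(np.zeros(1), np.ones(1), np.ones(1), np.ones(1))
```

Here `d_same` is 0 (identical distributions) and `d_shift` is 1 (a unit mean shift with equal covariances), matching the point-embedding Euclidean distance as a special case.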
Banach Wasserstein GAN
Wasserstein Generative Adversarial Networks (WGANs) can be used to generate realistic samples from complicated image distributions. |
Jonas Adler, Sebastian Lunz |
A Convex Duality Framework For GANs | Farzan Farnia, David Tse |
On The Convergence And Robustness Of Training GANs With Regularized Optimal Transport
Generative Adversarial Networks (GANs) are one of the most practical methods for learning data distributions. The authors establish a theoretical convergence guarantee to stationarity for a proposed class of GAN optimization algorithms. |
Maziar Sanjabi, Jimmy Ba, Meisam Razaviyayn, Jason Lee |
On Gradient Regularizers For MMD GANs
The authors propose a principled method for gradient-based regularization of the critic of GAN-like models trained by adversarially optimizing the kernel of a Maximum Mean Discrepancy (MMD). |
Michael Arbel, Dougal Sutherland, Mikołaj Bińkowski, Arthur Gretton |
PacGAN: The Power Of Two Samples In Generative Adversarial Networks
Generative adversarial networks (GANs) are a technique for learning generative models of complex data distributions from samples. The authors study a principled approach to handling mode collapse, which the authors call packing. |
Zinan Lin, Ashish Khetan, Giulia Fanti, Sewoong Oh |
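The "packing" idea above can be sketched in a few lines: the discriminator judges m samples jointly instead of one at a time, which exposes the reduced diversity of a mode-collapsed generator. Names below are illustrative, not from the paper's code:

```python
import numpy as np

def pack(samples, m):
    """Group a batch of n*m samples into n packed inputs of m samples each."""
    n = samples.shape[0] // m
    return samples[: n * m].reshape(n, -1)  # shape (n, m * sample_dim)

batch = np.random.randn(8, 3)   # 8 samples of dimension 3
packed = pack(batch, m=2)       # 4 packed inputs, each seeing 2 samples
```

A packed discriminator is then an ordinary binary classifier on these concatenated inputs; generator training is otherwise unchanged.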
Are GANs Created Equal? A Large-Scale Study
Generative adversarial networks (GANs) are a powerful subclass of generative models. The authors conduct a neutral, multi-faceted, large-scale empirical study of state-of-the-art models and evaluation measures. |
Mario Lucic, Karol Kurach, Marcin Michalski, Sylvain Gelly, Olivier Bousquet |
Disconnected Manifold Learning For Generative Adversarial Networks
Natural images may lie on a union of disjoint manifolds rather than one globally connected manifold, and this can cause several difficulties for the training of common Generative Adversarial Networks (GANs). The authors show how using a collection of generators can address this problem, providing new insights into the success of such multi-generator GANs. |
Mahyar Khayatkhoei, Maneesh K. Singh, Ahmed Elgammal |
Hessian-based Analysis Of Large Batch Training And Robustness To Adversaries
Large batch size training of neural networks has been shown to incur accuracy loss when trained with current methods. Here, the authors study large batch size training through the lens of the Hessian operator and robust optimization. |
Zhewei Yao, Amir Gholami, Qi Lei, Kurt Keutzer, Michael W Mahoney |
Fast And Effective Robustness Certification
The authors present a new method and system, called DeepZ, for certifying neural network robustness based on abstract interpretation. |
Gagandeep Singh, Timon Gehr, Matthew Mirman, Markus Püschel, Martin Vechev |
Graphical Generative Adversarial Networks
The authors propose Graphical Generative Adversarial Networks (Graphical-GAN) to model structured data. |
Chongxuan LI, Max Welling, Jun Zhu, Bo Zhang |
Deep Defense: Training DNNs With Improved Adversarial Robustness
Despite the efficacy on a variety of computer vision tasks, deep neural networks (DNNs) are vulnerable to adversarial attacks, limiting their applications in security-critical systems. To address this problem, the authors propose a training recipe named "deep defense". |
Ziang Yan, Yiwen Guo, Changshui Zhang |
Learning To Repair Software Vulnerabilities With Generative Adversarial Networks | Jacob Harer, Onur Ozdemir, Tomo Lazovich, Christopher Reale, Rebecca Russell, Louis Kim, Peter Chin |
Memory Replay GANs: Learning To Generate New Categories Without Forgetting
Previous works on sequential learning address the problem of forgetting in discriminative models. Addressing this problem, the authors propose Memory Replay GANs (MeRGANs), a conditional GAN framework that integrates a memory replay generator. |
Chenshen Wu, Luis Herranz, Xialei Liu, Yaxing Wang, Joost Van De Weijer, Bogdan Raducanu |
Unsupervised Attention-guided Image-to-Image Translation
Current unsupervised image-to-image translation techniques struggle to focus their attention on individual objects without altering the background or the way multiple objects interact within a scene. |
Youssef Alami Mejjati, Christian Richardt, James Tompkin, Darren Cosker, Kwang In Kim |
Conditional Adversarial Domain Adaptation
Adversarial learning has been embedded into deep networks to learn disentangled and transferable representations for domain adaptation. |
Mingsheng Long, Zhangjie CAO, Jianmin Wang, Michael I. Jordan |
Video-to-Video Synthesis
The authors study the problem of video-to-video synthesis. |
Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Nikolai Yakovenko, Andrew Tao, Jan Kautz, Bryan Catanzaro |
Generalized Zero-Shot Learning With Deep Calibration Network | Shichen Liu, Mingsheng Long, Jianmin Wang, Michael I. Jordan |
Low-shot Learning Via Covariance-Preserving Adversarial Augmentation Networks
Deep neural networks suffer from over-fitting and catastrophic forgetting when trained with small data. In this work, the authors propose Covariance-Preserving Adversarial Augmentation Networks to overcome existing limits of low-shot learning. |
Hang Gao, Zheng Shou, Alireza Zareian, Hanwang Zhang, Shih-Fu Chang |
Trading Robust Representations For Sample Complexity Through Self-supervised Visual Experience
Learning in small sample regimes is among the most remarkable features of the human perceptual system. The authors explore the idea of allowing artificial systems to learn representations of visual stimuli through weak supervision prior to downstream supervised tasks. |
Andrea Tacchetti, Stephen Voinea, Georgios Evangelopoulos |
TADAM: Task Dependent Adaptive Metric For Improved Few-shot Learning
Few-shot learning has become essential for producing models that generalize from few examples. The authors further propose a simple and effective way of conditioning a learner on the task sample set, resulting in learning a task-dependent metric space. |
Boris Oreshkin, Pau Rodríguez López, Alexandre Lacoste |
FishNet: A Versatile Backbone For Image, Region, And Pixel Level Prediction
The basic principles in designing convolutional neural network (CNN) structures for predicting objects on different levels, e.g., image-level, region-level, and pixel-level, are diverging. |
Shuyang Sun, Jiangmiao Pang, Jianping Shi, Shuai Yi, Wanli Ouyang |
Batch-Instance Normalization For Adaptively Style-Invariant Neural Networks
Real-world image recognition is often challenged by the variability of visual styles including object textures, lighting conditions, filter effects, etc. Extending this idea to general visual recognition problems, the authors present Batch-Instance Normalization (BIN) to explicitly normalize unnecessary styles from images. |
Hyeonseob Nam, Hyo-Eun Kim |
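The mechanism summarized above can be sketched directly: a per-channel gate interpolates between batch-normalized and instance-normalized activations, letting the network keep informative styles and normalize away nuisance ones. A minimal NumPy sketch with a fixed gate (in the paper the gate is learned):

```python
import numpy as np

def batch_instance_norm(x, rho, eps=1e-5):
    """x: (N, C, H, W); rho: (C,) gate in [0, 1] mixing BN and IN."""
    bn = (x - x.mean(axis=(0, 2, 3), keepdims=True)) / np.sqrt(
        x.var(axis=(0, 2, 3), keepdims=True) + eps)   # batch normalization
    inn = (x - x.mean(axis=(2, 3), keepdims=True)) / np.sqrt(
        x.var(axis=(2, 3), keepdims=True) + eps)      # instance normalization
    g = rho[None, :, None, None]
    return g * bn + (1 - g) * inn

x = np.random.randn(4, 3, 8, 8)
out = batch_instance_norm(x, rho=np.full(3, 0.5))
```

With `rho = 1` the layer reduces to batch norm, with `rho = 0` to instance norm; the learned gate picks a point in between per channel.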
A^2-Nets: Double Attention Networks
Learning to capture long-range relations is fundamental to image/video recognition. In this work, the authors propose the “double attention block”, a novel component that aggregates and propagates informative global features from the entire spatio-temporal space of input images/videos, enabling subsequent convolution layers to access features from the entire space efficiently. |
Yunpeng Chen, Yannis Kalantidis, Jianshu Li, Shuicheng Yan, Jiashi Feng |
Pelee: A Real-Time Object Detection System On Mobile Devices | Robert Wang, Tanner Bohn, Charles Ling |
PointCNN: Convolution On X-Transformed Points
The authors present a simple and general framework for feature learning from point clouds. |
Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Xinhan Di, Baoquan Chen |
Deep Neural Networks With Box Convolutions
Box filters computed using integral images have been part of the computer vision toolset for a long time. |
Egor Burkov, Victor Lempitsky |
An Intriguing Failing Of Convolutional Neural Networks And The CoordConv Solution
Few ideas have enjoyed as large an impact on deep learning as convolution. For any problem involving pixels or spatial representations, common intuition holds that convolutional neural networks may be appropriate. |
Rosanne Liu, Joel Lehman, Piero Molino, Felipe Petroski Such, Eric Frank, Alex Sergeev, Jason Yosinski |
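The CoordConv fix is small enough to sketch: concatenate normalized (i, j) coordinate channels to the input before a standard convolution, giving filters direct access to position. The function name below is illustrative:

```python
import numpy as np

def add_coord_channels(x):
    """Append two coordinate channels: (N, C, H, W) -> (N, C + 2, H, W)."""
    n, _, h, w = x.shape
    ii = np.linspace(-1, 1, h)[None, None, :, None]  # row coordinate in [-1, 1]
    jj = np.linspace(-1, 1, w)[None, None, None, :]  # column coordinate
    ii = np.broadcast_to(ii, (n, 1, h, w))
    jj = np.broadcast_to(jj, (n, 1, h, w))
    return np.concatenate([x, ii, jj], axis=1)

x = np.random.randn(2, 3, 4, 5)
y = add_coord_channels(x)   # shape (2, 5, 4, 5)
```

Because the extra channels are constants, the layer stays translation-equivariant only if the network learns to ignore them; when position matters (e.g. coordinate regression), the filters can exploit them directly.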
3D Steerable CNNs: Learning Rotationally Equivariant Features In Volumetric Data
The authors present a convolutional network that is equivariant to rigid body motions. |
Maurice Weiler, Wouter Boomsma, Mario Geiger, Max Welling, Taco Cohen |
Moonshine: Distilling With Cheap Convolutions
Many engineers wish to deploy modern neural networks in memory-limited settings; but the development of flexible methods for reducing memory use is in its infancy, and there is little knowledge of the resulting cost-benefit. The authors propose structural model distillation for memory reduction using a strategy that produces a student architecture that is a simple transformation of the teacher architecture: no redesign is needed, and the same hyperparameters can be used. |
Elliot J. Crowley, Gavin Gray, Amos Storkey |
Kalman Normalization: Normalizing Internal Representations Across Network Layers
As an indispensable component, Batch Normalization (BN) has successfully improved the training of deep neural networks (DNNs) with mini-batches, by normalizing the distribution of the internal representation for each hidden layer. |
Guangrun Wang, Jiefeng Peng, Ping Luo, Xinjiang Wang, Liang Lin |
SplineNets: Continuous Neural Decision Graphs
The authors present SplineNets, a practical and novel approach for using conditioning in convolutional neural networks (CNNs). |
Cem Keskin, Shahram Izadi |
CapProNet: Deep Feature Learning Via Orthogonal Projections Onto Capsule Subspaces
In this paper, the authors formalize the idea behind capsule nets of using a capsule vector rather than a neuron activation to predict the label of samples. To this end, the authors propose to learn a group of capsule subspaces onto which an input feature vector is projected. |
Liheng Zhang, Marzieh Edraki, Guo-Jun Qi |
Which Neural Net Architectures Give Rise To Exploding And Vanishing Gradients?
The authors give a rigorous analysis of the statistical behavior of gradients in a randomly initialized fully connected network N with ReLU activations. |
Boris Hanin |
Exact Natural Gradient In Deep Linear Networks And Its Application To The Nonlinear Case
Stochastic gradient descent (SGD) remains the method of choice for deep learning, despite the limitations arising for ill-behaved objective functions. In cases where it could be estimated, the natural gradient has proven very effective at mitigating the catastrophic effects of pathological curvature in the objective function, but little is known theoretically about its convergence properties, and it has yet to find a practical implementation that would scale to very deep and large networks. |
Alberto Bernacchia, Mate Lengyel, Guillaume Hennequin |
Fast Approximate Natural Gradient Descent In A Kronecker Factored Eigenbasis
Optimization algorithms that leverage gradient covariance information, such as variants of natural gradient descent (Amari, 1998), offer the prospect of yielding more effective descent directions. In the present work the authors draw inspiration from both to propose a novel approximation that is provably better than KFAC and amenable to cheap partial updates. |
Thomas George, César Laurent, Xavier Bouthillier, Nicolas Ballas, Pascal Vincent |
Post: Device Placement With Cross-Entropy Minimization And Proximal Policy Optimization | Yuanxiang Gao, Li Chen, Baochun Li |
Paraphrasing Complex Network: Network Compression Via Factor Transfer | Jangho Kim, Seonguk Park, Nojun Kwak |
Learning Compressed Transforms With Low Displacement Rank
The low displacement rank (LDR) framework for structured matrices represents a matrix through two displacement operators and a low-rank residual. |
Anna Thomas, Albert Gu, Tri Dao, Atri Rudra, Chris Ré |
Knowledge Distillation By On-the-Fly Native Ensemble
Knowledge distillation is effective for training small and generalisable network models that meet low-memory and fast-running requirements. In this work, the authors present an On-the-fly Native Ensemble (ONE) learning strategy for one-stage online distillation. |
Xu Lan, Xiatian Zhu, Shaogang Gong |
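The on-the-fly ensemble idea above can be sketched as follows: several classifier branches share a backbone, a learned gate combines their logits into a teacher, and each branch is additionally trained toward the teacher's softened distribution. The gate and temperature below are illustrative assumptions, not the paper's exact values:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def one_teacher(branch_logits, gate, T=3.0):
    """Gated ensemble of branch logits, softened by temperature T."""
    teacher = gate @ branch_logits    # weighted sum over branches
    return softmax(teacher / T)       # soft targets for distillation

logits = np.array([[2.0, 0.5, -1.0],
                   [1.5, 1.0, -0.5]])            # two branches, three classes
p_teacher = one_teacher(logits, gate=np.array([0.6, 0.4]))
```

Each branch's loss would then add a KL term toward `p_teacher`, so teacher and students are trained together in a single stage rather than sequentially.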
Scalable Methods For 8-bit Training Of Neural Networks
Quantized Neural Networks (QNNs) are often used to improve network efficiency during the inference phase. As QNNs require batch-normalization to be trained at high precision, the authors introduce Range Batch-Normalization (BN), which has significantly higher tolerance to quantization noise and improved computational complexity. |
Ron Banner, Itay Hubara, Elad Hoffer, Daniel Soudry |
Training Deep Models Faster With Robust, Approximate Importance Sampling
In theory, importance sampling speeds up stochastic gradient algorithms for supervised learning by prioritizing training examples. The authors propose a robust, approximate importance sampling procedure (RAIS) for stochastic gradient descent. |
Tyler Johnson, Carlos Guestrin |
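The general importance-sampling recipe behind the summary above: draw examples with probability proportional to (approximate) per-example gradient norms and reweight by 1/(n·p_i), which keeps the gradient estimate unbiased. The proxy norms here are assumed inputs; RAIS's contribution is estimating them robustly:

```python
import numpy as np

def sample_and_weights(grad_norms, batch, rng):
    """Sample a minibatch by importance and return unbiasing weights."""
    p = grad_norms / grad_norms.sum()   # sampling distribution over examples
    idx = rng.choice(len(p), size=batch, p=p)
    w = 1.0 / (len(p) * p[idx])         # weights that keep E[grad] unchanged
    return idx, w

rng = np.random.default_rng(0)
idx, w = sample_and_weights(np.array([1.0, 3.0, 6.0]), batch=2, rng=rng)
```

Examples with large (estimated) gradient norms are visited more often but down-weighted, so the expected update matches uniform sampling while its variance is reduced.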
Collaborative Learning For Deep Neural Networks
The authors introduce collaborative learning in which multiple classifier heads of the same network are simultaneously trained on the same training data to improve generalization and robustness to label noise with no extra inference cost. |
Guocong Song, Wei Chai |
A Linear Speedup Analysis Of Distributed Deep Learning With Sparse And Quantized Communication
The large communication overhead has imposed a bottleneck on the performance of distributed Stochastic Gradient Descent (SGD) for training deep neural networks. In this paper, the authors study the convergence rate of distributed SGD for non-convex optimization with two communication reducing strategies: sparse parameter averaging and gradient quantization. |
Peng Jiang, Gagan Agrawal |
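One of the two communication-reduction strategies above, gradient quantization, is often built as unbiased randomized rounding to a few levels. The particular scheme below (scale by max-norm, round up or down at random) is a common construction used for illustration, not necessarily the paper's exact quantizer:

```python
import numpy as np

def stochastic_quantize(g, levels=4, rng=None):
    """Unbiased stochastic quantization of a gradient vector to `levels` levels."""
    rng = rng or np.random.default_rng()
    scale = np.abs(g).max()
    if scale == 0:
        return g
    q = np.abs(g) / scale * levels                  # map magnitudes to [0, levels]
    low = np.floor(q)
    q = low + (rng.random(g.shape) < (q - low))     # randomized rounding (unbiased)
    return np.sign(g) * q * scale / levels

g = np.array([0.1, -0.4, 0.8])
gq = stochastic_quantize(g, rng=np.random.default_rng(0))
```

Only the quantized values (plus the scalar `scale`) need to be communicated, trading a bounded increase in gradient variance for far fewer bits per coordinate.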
Bayesian Distributed Stochastic Gradient Descent
The authors introduce Bayesian distributed stochastic gradient descent (BDSGD), a high-throughput algorithm for training deep neural networks on parallel clusters. |
Michael Teng, Frank Wood |
Regularizing By The Variance Of The Activations' Sample-Variances
Normalization techniques play an important role in supporting efficient and often more effective training of deep neural networks. |
Etai Littwin, Lior Wolf |
BML: A High-performance, Low-cost Gradient Synchronization Algorithm For DML Training
In distributed machine learning (DML), the network performance between machines significantly impacts the speed of iterative training. In this paper the authors propose BML, a new gradient synchronization algorithm with higher network performance and lower network cost than the current practice. |
Songtao Wang, Dan Li, Yang Cheng, Jinkun Geng, Yanshu Wang, Shuai Wang, Shu-Tao Xia, Jianping Wu |
L4: Practical Loss-based Stepsize Adaptation For Deep Learning
The authors propose a stepsize adaptation scheme for stochastic gradient descent. It operates directly with the loss function and rescales the gradient in order to make fixed predicted progress on the loss. The authors demonstrate its capabilities by conclusively improving the performance of the Adam and Momentum optimizers. The enhanced optimizers with default hyperparameters consistently outperform their constant-stepsize counterparts, even the best ones, without a measurable increase in computational cost. The performance is validated on multiple architectures, including dense nets, CNNs, ResNets, and the recurrent Differentiable Neural Computer, on classical datasets such as MNIST, Fashion-MNIST, CIFAR10, and others. |
Michal Rolinek, Georg Martius |
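The "fixed predicted progress" rule above can be sketched with a linearization of the loss: choose the stepsize so the predicted decrease equals a fraction alpha of the gap to an estimated minimal loss. How the minimal loss is tracked, and the inner optimizer producing the update direction, are simplified assumptions here, not the paper's exact procedure:

```python
import numpy as np

def l4_stepsize(loss, l_min, grad, update_dir, alpha=0.15, eps=1e-12):
    """Stepsize making linearized progress equal alpha * (loss - l_min).

    Linear model: loss decreases by eta * <grad, update_dir>, so solve
    eta * <grad, update_dir> = alpha * (loss - l_min) for eta.
    """
    return alpha * (loss - l_min) / (np.dot(grad, update_dir) + eps)

g = np.array([2.0, 0.0])
eta = l4_stepsize(loss=1.0, l_min=0.0, grad=g, update_dir=g, alpha=0.15)
```

With plain gradient descent (`update_dir = grad`), the rule reduces to alpha times the loss gap divided by the squared gradient norm, so steps shrink automatically as the loss approaches its estimated minimum.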
Synaptic Strength For Convolutional Neural Network
Convolutional Neural Networks (CNNs) are both computation- and memory-intensive, which has hindered their deployment on mobile devices. Inspired by the relevant concept in the neuroscience literature, the authors propose Synaptic Pruning: a data-driven method to prune connections between input and output feature maps with a newly proposed class of parameters called Synaptic Strength. |
CHEN LIN, Zhao Zhong, Wu Wei, Junjie Yan |
ChannelNets: Compact And Efficient Convolutional Neural Networks Via Channel-Wise Convolutions
Convolutional neural networks (CNNs) have shown great capability of solving various artificial intelligence tasks. In this work, the authors propose to compress deep models by using channel-wise convolutions, which replace dense connections among feature maps with sparse ones in CNNs. |
Hongyang Gao, Zhengyang Wang, Shuiwang Ji |
Frequency-Domain Dynamic Pruning For Convolutional Neural Networks
Deep convolutional neural networks have demonstrated their power in a variety of applications. Considering that there is spatial redundancy within most filters in a CNN, the authors propose a frequency-domain dynamic pruning scheme to exploit the spatial correlations. |
Zhenhua Liu, Jizheng Xu, Xiulian Peng, Ruiqin Xiong |
TETRIS: TilE-matching The TRemendous Irregular Sparsity
Compressing neural networks by pruning weights with small magnitudes can significantly reduce the computation and storage cost. |
Yu Ji, Ling Liang, Lei Deng, Youyang Zhang, Youhui Zhang, Yuan Xie |
Heterogeneous Bitwidth Binarization In Convolutional Neural Networks
Recent work has shown that fast, compact low-bitwidth neural networks can be surprisingly accurate. However, modern hardware allows efficient designs where each arithmetic instruction can have a custom bitwidth, motivating heterogeneous binarization, where every parameter in the network may have a different bitwidth. In this paper, the authors show that it is feasible and useful to select bitwidths at the parameter granularity during training. |
Joshua Fromm, Shwetak Patel, Matthai Philipose |
HitNet: Hybrid Ternary Recurrent Neural Network | Peiqi Wang, Xinfeng Xie, Lei Deng, Guoqi Li, Dongsheng Wang, Yuan Xie |
A General Method For Amortizing Variational Filtering
The authors introduce the variational filtering EM algorithm, a simple, general-purpose method for performing variational inference in dynamical latent variable models using information from only past and present variables. |
Joe Marino, Milan Cvitkovic, Yisong Yue |
Multiple Instance Learning For Efficient Sequential Data Classification On Resource-constrained Devices
The authors study the problem of fast and efficient classification of sequential data (such as time-series) on tiny devices, which is critical for various IoT-related applications like audio keyword detection or gesture detection. |
Don Dennis, Chirag Pabbaraju, Harsha Simhadri, Prateek Jain |
Navigating With Graph Representations For Fast And Scalable Decoding Of Neural Language Models
Neural language models (NLMs) have recently gained renewed interest by achieving state-of-the-art performance across many natural language processing (NLP) tasks. However, NLMs are very computationally demanding, largely due to the cost of the decoding process, which consists of a softmax layer over a large vocabulary. The authors observe that in the decoding of many NLP tasks, only the probabilities of the top-K hypotheses need to be calculated precisely, and K is often much smaller than the vocabulary size. This paper proposes a novel softmax-layer approximation algorithm, called Fast Graph Decoder (FGD), which quickly identifies, for a given context, a set of K words that are most likely to occur according to an NLM. |
Minjia Zhang, Wenhan Wang, Xiaodong Liu, Jianfeng Gao, Yuxiong He |
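The payoff of the top-K observation above is easy to illustrate: once a candidate set containing the K most likely words is found (FGD finds it with a graph-based nearest-neighbor search; here the set is simply given), the softmax only needs to be evaluated over those candidates rather than the full vocabulary:

```python
import numpy as np

def topk_softmax(logits, candidates):
    """Softmax restricted to a candidate index set (the full vocabulary is skipped)."""
    z = np.exp(logits[candidates] - logits[candidates].max())
    return candidates, z / z.sum()

logits = np.array([0.1, 3.0, -1.0, 2.5, 0.0])   # toy 5-word vocabulary
idx, p = topk_softmax(logits, candidates=np.array([1, 3]))
```

The cost drops from O(vocabulary size) to O(K) per decoding step, plus the cost of the candidate search, which is where the graph representation in the title comes in.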
Representer Point Selection For Explaining Deep Neural Networks
The authors propose to explain the predictions of a deep neural network by pointing to the set of what they call representer points in the training set for a given test-point prediction. |
Chih-Kuan Yeh, Joon Kim, Ian En-Hsu Yen, Pradeep Ravikumar |
Interpreting Neural Network Judgments Via Minimal, Stable, And Symbolic Corrections
The authors present a new algorithm to generate minimal, stable, and symbolic corrections to an input that will cause a neural network with ReLU activations to change its output. |
Xin Zhang, Armando Solar-Lezama, Rishabh Singh |
DropMax: Adaptive Variational Softmax
The authors propose DropMax, a stochastic version of softmax classifier which at each iteration drops non-target classes according to dropout probabilities adaptively decided for each instance. |
Hae Beom Lee, Juho Lee, Saehoon Kim, Eunho Yang, Sung Ju Hwang |
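The stochastic softmax described above can be sketched in a few lines: non-target classes are randomly masked (kept with probability rho) before normalization, while the target class is always retained. In the paper rho is decided adaptively per instance; here it is a fixed assumption:

```python
import numpy as np

def dropmax(logits, target, rho, rng):
    """Softmax over a random subset of classes that always includes the target."""
    keep = rng.random(logits.shape) < rho   # Bernoulli retain mask per class
    keep[target] = True                     # never drop the target class
    z = np.where(keep, np.exp(logits - logits.max()), 0.0)
    return z / z.sum()

rng = np.random.default_rng(0)
p = dropmax(np.array([1.0, 2.0, 0.5, -1.0]), target=1, rho=0.5, rng=rng)
```

Dropping non-target classes at each iteration effectively trains an ensemble of classifiers over subsets of classes, which is the source of the regularization effect.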
Out-of-Distribution Detection Using Multiple Semantic Label Representations
Deep Neural Networks are powerful models that have attained remarkable results on a variety of tasks. The authors propose to use multiple semantic dense representations instead of a sparse representation as the target label. |
Gabi Shalev, Yossi Adi, Yossi Keshet |
End-to-end Symmetry Preserving Inter-atomic Potential Energy Model For Finite And Extended Systems
Machine learning models are changing the paradigm of molecular modeling, which is a fundamental tool for material science, chemistry, and computational biology. Here the authors develop Deep Potential - Smooth Edition (DeepPot-SE), an end-to-end machine learning-based PES model, which is able to efficiently represent the PES for a wide variety of systems with the accuracy of ab initio quantum mechanics models. |
Linfeng Zhang, Jiequn Han, Han Wang, Wissam Saidi, Roberto Car, Weinan E |
Deep Predictive Coding Network With Local Recurrent Processing For Object Recognition
Inspired by "predictive coding" - a theory in neuroscience, the authors develop a bi-directional and dynamic neural network with local recurrent processing, namely predictive coding network (PCN). Feedback and feedforward connections enable adjacent layers to interact locally and recurrently to refine representations towards minimization of layer-wise prediction errors. |
Kuan Han, Haiguang Wen, Yizhen Zhang, Di Fu, Eugenio Culurciello, Zhongming Liu |
SLAYER: Spike Layer Error Reassignment In Time
Configuring deep Spiking Neural Networks (SNNs) is an exciting research avenue for low power spike event based computation. In this paper, the authors introduce a new general backpropagation mechanism for learning synaptic weights and axonal delays which overcomes the problem of non-differentiability of the spike function and uses a temporal credit assignment policy for backpropagating error to preceding layers. |
Sumit Bam Shrestha, Garrick Orchard |
DeepPINK: Reproducible Feature Selection In Deep Neural Networks
Deep learning has become increasingly popular in both supervised and unsupervised machine learning thanks to its outstanding empirical performance. In this paper, the authors describe a method to increase the interpretability and reproducibility of DNNs by incorporating the idea of feature selection with controlled error rate. |
Yang Lu, Yingying Fan, Jinchi Lv, William Stafford Noble |
Learning Long-range Spatial Dependencies With Horizontal Gated Recurrent Units
Progress in deep learning has spawned great successes in many engineering applications. The authors introduce a visual challenge, Pathfinder, and describe a novel recurrent neural network architecture called the horizontal gated recurrent unit (hGRU) to learn intrinsic horizontal connections -- both within and across feature columns. |
Drew Linsley, Junkyung Kim, Vijay Veerabadran, Charles Windolf, Thomas Serre |
Neural Interaction Transparency (NIT): Disentangling Learned Interactions For Improved Interpretability
Neural networks are known to model statistical interactions, but they entangle the interactions at intermediate hidden layers for shared representation learning. |
Michael Tsang, Hanpeng Liu, Sanjay Purushotham, Pavankumar Murali, Yan Liu |
A Bridging Framework For Model Optimization And Deep Propagation
Optimizing task-related mathematical models is one of the most fundamental methodologies in the statistics and learning areas. However, generically designed schematic iterations may struggle to capture complex data distributions in real-world applications. |
Risheng Liu, Shichao Cheng, Xiaokun Liu, Long Ma, Xin Fan, Zhongxuan Luo |
The Importance Of Sampling In Meta-Reinforcement Learning
The authors interpret meta-reinforcement learning as the problem of learning how to quickly find a good sampling distribution in a new environment. |
Bradly Stadie, Ge Yang, Rein Houthooft, Peter Chen, Yan Duan, Yuhuai Wu, Pieter Abbeel, Ilya Sutskever |
Latent Alignment And Variational Attention
Neural attention has become central to many state-of-the-art models in natural language processing and related domains. The authors further propose methods for reducing the variance of gradients to make these approaches computationally feasible. |
Yuntian Deng, Yoon Kim, Justin Chiu, Demi Guo, Alexander Rush |
Variational Memory Encoder-Decoder
Introducing variability while maintaining coherence is a core task in learning to generate utterances in conversation. The authors empirically compare the proposed model against other recent approaches on various conversational datasets. |
Hung Le, Truyen Tran, Thin Nguyen, Svetha Venkatesh |
Relational Recurrent Neural Networks
Memory-based neural networks model temporal data by leveraging an ability to remember information for long periods. |
Adam Santoro, Ryan Faulkner, David Raposo, Jack Rae, Mike Chrzanowski, Theophane Weber, Daan Wierstra, Oriol Vinyals, Razvan Pascanu, Timothy Lillicrap |
Learning To Reason With Third Order Tensor Products
The authors combine Recurrent Neural Networks with Tensor Product Representations to learn combinatorial representations of sequential data. |
Imanol Schlag, Jürgen Schmidhuber |
Reversible Recurrent Neural Networks
Recurrent neural networks (RNNs) provide state-of-the-art performance in processing sequential data but are memory intensive to train, limiting the flexibility of RNN models which can be trained. |
Matthew MacKay, Paul Vicol, Jimmy Ba, Roger Grosse |
Breaking The Activation Function Bottleneck Through Adaptive Parameterization
Standard neural network architectures are non-linear only by virtue of a simple element-wise activation function, making them both brittle and excessively large. The authors present an adaptive LSTM that advances the state of the art for the Penn Treebank and Wikitext-2 word-modeling tasks while using fewer parameters and converging in half as many iterations. |
Sebastian Flennerhag, Hujun Yin, John Keane, Mark Elliot |
BRITS: Bidirectional Recurrent Imputation For Time Series
Time series are widely used as signals in many classification/regression tasks. In this paper, the authors propose BRITS, a novel method based on recurrent neural networks for missing value imputation in time series data. |
Wei Cao, Dong Wei, Jian Li, Hao Zhou, Lei Li, Yitan Li |
Complex Gated Recurrent Neural Networks | Moritz Wolter, Angela Yao |
Middle-Out Decoding
Despite being virtually ubiquitous, sequence-to-sequence models are challenged by their lack of diversity and inability to be externally controlled. To address this issue, the authors propose a novel middle-out decoder architecture that begins from an initial middle-word and simultaneously expands the sequence in both directions. |
Shikib Mehri, Leonid Sigal |
Recurrently Controlled Recurrent Networks
Recurrent neural networks (RNNs) such as long short-term memory and gated recurrent units are pivotal building blocks across a broad spectrum of sequence modeling problems. This paper proposes a recurrently controlled recurrent network (RCRN) for expressive and powerful sequence encoding. |
Yi Tay, Anh Tuan Luu, Siu Cheung Hui |
Tree-to-tree Neural Networks For Program Translation | Xinyun Chen, Chang Liu, Dawn Song |
HOUDINI: Lifelong Learning As Program Synthesis
The authors present a neurosymbolic framework for the lifelong learning of algorithmic tasks that mix perception and procedural reasoning. |
Lazar Valkov, Dipak Chaudhari, Akash Srivastava, Charles Sutton, Swarat Chaudhuri |
Neural Guided Constraint Logic Programming For Program Synthesis
Synthesizing programs using example input/outputs is a classic problem in artificial intelligence.The authors present a method for solving Programming By Example (PBE) problems by using a neural model to guide the search of a constraint logic programming system called miniKanren. |
Lisa Zhang, Gregory Rosenblatt, Ethan Fetaya, Renjie Liao, William Byrd, Matthew Might, Raquel Urtasun, Richard Zemel |
Embedding Logical Queries On Knowledge Graphs
Learning low-dimensional embeddings of knowledge graphs is a powerful approach used to predict unobserved or missing edges between entities. Here the authors introduce a framework to efficiently make predictions about conjunctive logical queries -- a flexible but tractable subset of first-order logic -- on incomplete knowledge graphs. |
Will Hamilton, Payal Bajaj, Marinka Zitnik, Dan Jurafsky, Jure Leskovec |
Expanding Holographic Embeddings For Knowledge Completion
Neural models operating over structured spaces such as knowledge graphs require a continuous embedding of the discrete elements of this space (such as entities) as well as the relationships between them. The authors propose a new family of embeddings for knowledge graphs that interpolate between a method with high model complexity and one, namely Holographic embeddings (HolE), with low dimensionality and high training efficiency. |
Yexiang Xue, Yang Yuan, Zhitian Xu, Ashish Sabharwal |
On The Dimensionality Of Word Embedding
In this paper, the authors provide a theoretical understanding of word embedding and its dimensionality. Motivated by the unitary-invariance of word embedding, the authors propose the Pairwise Inner Product (PIP) loss, a novel metric on the dissimilarity between word embeddings. |
Zi Yin, Yuanyuan Shen |
Flexible Neural Representation For Physics Prediction
Humans have a remarkable capacity to understand the physical dynamics of objects in their environment, flexibly capturing complex structures and interactions at multiple levels of detail. Inspired by this ability, the authors propose a hierarchical particle-based object representation that covers a wide variety of types of three-dimensional objects, including both arbitrary rigid geometrical shapes and deformable materials. |
Damian Mrowca, Chengxu Zhuang, Elias Wang, Nick Haber, Li Fei-Fei, Josh Tenenbaum, Daniel Yamins |
Content Preserving Text Generation With Attribute Controls
In this work, the authors address the problem of modifying textual attributes of sentences. To ensure that the model generates content compatible sentences, the authors introduce a reconstruction loss which interpolates between auto-encoding and back-translation loss components. |
Lajanugen Logeswaran, Honglak Lee, Samy Bengio |
Recurrent Relational Networks
This paper is concerned with learning to solve tasks that require a chain of interdependent steps of relational inference, like answering complex questions about the relationships between objects, or solving puzzles where the smaller elements of a solution mutually constrain each other. The authors introduce the recurrent relational network, a general purpose module that operates on a graph representation of objects. As a generalization of Santoro et al. |
Rasmus Palm, Ulrich Paquet, Ole Winther |
GLoMo: Unsupervised Learning Of Transferable Relational Graphs
Modern deep transfer learning approaches have mainly focused on learning generic feature vectors from one task that are transferable to other tasks, such as word embeddings in language and pretrained convolutional features in vision. However, these approaches usually transfer unary features and largely ignore more structured graphical representations. |
Zhilin Yang, Jake Zhao, Bhuwan Dhingra, Kaiming He, William Cohen, Russ Salakhutdinov, Yann LeCun |
Predictive Uncertainty Estimation Via Prior Networks
Estimating how uncertain an AI system is in its predictions is important to improve the safety of such systems. This work proposes a new framework for modeling predictive uncertainty called Prior Networks (PNs), which explicitly models distributional uncertainty. |
Andrey Malinin, Mark Gales |
Adversarial Multiple Source Domain Adaptation
The authors propose new generalization bounds and algorithms under both classification and regression settings for unsupervised multiple source domain adaptation. |
Han Zhao, Shanghang Zhang, Guanhang Wu, José M. F. Moura, Joao P Costeira, Geoffrey Gordon |
Adversarial Examples That Fool Both Computer Vision And Time-Limited Humans
Machine learning models are vulnerable to adversarial examples: small changes to images can cause computer vision models to make mistakes such as identifying a school bus as an ostrich. |
Gamaleldin Elsayed, Shreya Shankar, Brian Cheung, Nicolas Papernot, Alexey Kurakin, Ian Goodfellow, Jascha Sohl-Dickstein |
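The perturbations described in the entry above are typically constructed with gradient-based attacks; the fast gradient sign method (FGSM) is the classic example (standard background for this line of work, not this paper's contribution). A minimal pure-Python sketch, with the model's input gradient assumed to be given:

```python
def fgsm_perturb(x, grad_wrt_x, eps):
    """Fast Gradient Sign Method: move every input coordinate by eps
    in the direction that increases the loss (the sign of the gradient)."""
    sign = lambda v: (v > 0) - (v < 0)
    return [xi + eps * sign(g) for xi, g in zip(x, grad_wrt_x)]

# e.g. a flat two-pixel "image" with loss gradient [1.0, -2.0] and budget eps = 0.1
adv = fgsm_perturb([0.0, 0.0], [1.0, -2.0], 0.1)
```

The key property is that the perturbation is bounded in max-norm by eps, which is why such changes can stay nearly imperceptible to humans.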
A Simple Cache Model For Image Recognition
Training large-scale image recognition models is computationally expensive. The authors propose to extract this extra class-relevant information using a simple key-value cache memory to improve the classification performance of the model at test time. |
Emin Orhan |
Co-teaching: Robust Training Of Deep Neural Networks With Extremely Noisy Labels
Deep learning with noisy labels is practically challenging, as the capacity of deep models is so high that they can totally memorize these noisy labels sooner or later during training. Therefore, in this paper, the authors propose a new deep learning paradigm called "Co-teaching" for combating noisy labels. |
Bo Han, Quanming Yao, Xingrui Yu, Gang Niu, Miao Xu, Weihua Hu, Ivor Tsang, Masashi Sugiyama |
Improved Network Robustness With Adversary Critic
Ideally, what confuses a neural network should be confusing to humans. To address this gap in perception, the authors propose a novel approach for learning a robust classifier. |
Alexander Matyasko, Lap-Pui Chau |
Unsupervised Learning Of Object Landmarks Through Conditional Image Generation
The authors propose a method for learning landmark detectors for visual objects (such as the eyes and the nose in a face) without any manual supervision. |
Tomas Jakab, Ankush Gupta, Hakan Bilen, Andrea Vedaldi |
Multi-Task Learning As Multi-Objective Optimization
In multi-task learning, multiple tasks are solved jointly, sharing inductive bias between them. The authors therefore propose an upper bound for the multi-objective loss and show that it can be optimized efficiently. |
Ozan Sener, Vladlen Koltun |
Deep Anomaly Detection Using Geometric Transformations
The authors consider the problem of anomaly detection in images, and present a new detection technique. |
Izhak Golan, Ran El-Yaniv |
Practical Deep Stereo (PDS): Toward Applications-friendly Deep Stereo Matching
End-to-end deep-learning networks recently demonstrated extremely good performance for stereo matching. |
Stepan Tulyakov, Anton Ivanov, François Fleuret |
VideoCapsuleNet: A Simplified Network For Action Detection
The recent advances in Deep Convolutional Neural Networks (DCNNs) have shown extremely good results for video human action classification, however, action detection is still a challenging problem. In this work, the authors present a more elegant solution for action detection based on the recently developed capsule network. |
Kevin Duarte, Yogesh Rawat, Mubarak Shah |
With Friends Like These, Who Needs Adversaries?
The vulnerability of deep image classification networks to adversarial attack is now well known, but less well understood. |
Saumya Jetley, Nick Lord, Philip Torr |
Multi-Task Zipping Via Layer-wise Neuron Sharing
Future mobile devices are anticipated to perceive, understand and react to the world on their own by running multiple correlated deep neural networks on-device. The authors propose Multi-Task Zipping (MTZ), a framework to automatically merge correlated, pre-trained deep neural networks for cross-model compression. |
Xiaoxi He, Zimu Zhou, Lothar Thiele |
Learning Versatile Filters For Efficient Convolutional Neural Networks
This paper introduces versatile filters to construct efficient convolutional neural networks. |
Yunhe Wang, Chang Xu, Chunjing XU, Chao Xu, Dacheng Tao |
Evolutionary Stochastic Gradient Descent For Optimization Of Deep Neural Networks
The authors propose a population-based Evolutionary Stochastic Gradient Descent (ESGD) framework for optimizing deep neural networks. |
Xiaodong Cui, Wei Zhang, Zoltán Tüske, Michael Picheny |
Structure-Aware Convolutional Neural Networks
Convolutional neural networks (CNNs) are inherently subject to invariable filters that can only aggregate local inputs with the same topological structures. |
Jianlong Chang, Jie Gu, Lingfeng Wang, GAOFENG MENG, SHIMING XIANG, Chunhong Pan |
Global Gated Mixture Of Second-order Pooling For Improving Deep Convolutional Neural Networks
In most existing deep convolutional neural networks (CNNs) for classification, global average (first-order) pooling (GAP) has become a standard module to summarize activations of the last convolution layer as the final representation for prediction. |
Qilong Wang, Zilin Gao, Jiangtao Xie, Wangmeng Zuo, Peihua Li |
Hybrid Macro/Micro Level Backpropagation For Training Deep Spiking Neural Networks
Spiking neural networks (SNNs) are positioned to enable spatio-temporal information processing and ultra-low power event-driven neuromorphic hardware. The authors present a hybrid macro/micro level backpropagation (HM2-BP) algorithm for training multi-layer SNNs. |
Yingyezhe Jin, Wenrui Zhang, Peng Li |
Deep Neural Nets With Interpolating Function As Output Activation
The authors replace the output layer of deep neural nets, typically the softmax function, by a novel interpolating function, and propose end-to-end training and testing algorithms for this new architecture. |
Bao Wang, Xiyang Luo, Zhen Li, Wei Zhu, Zuoqiang Shi, Stanley Osher |
Neural Edit Operations For Biological Sequences
The evolution of biological sequences, such as proteins or DNAs, is driven by the three basic edit operations: substitution, insertion, and deletion. Motivated by the recent progress of neural network models for biological tasks, the authors implement two neural network architectures that can treat such edit operations. |
Satoshi Koide, Keisuke Kawano, Takuro Kutsuna |
Improved Expressivity Through Dendritic Neural Networks
A typical biological neuron, such as a pyramidal neuron of the neocortex, receives thousands of afferent synaptic inputs on its dendrite tree and sends the efferent axonal output downstream.In typical artificial neural networks, dendrite trees are modeled as linear structures that funnel weighted synaptic inputs to the cell bodies. |
Xundong Wu, Xiangwen Liu, Wei Li, Qing Wu |
Neural Proximal Gradient Descent For Compressive Imaging
Recovering high-resolution images from limited sensory data typically leads to a serious ill-posed inverse problem, demanding inversion algorithms that effectively capture the prior information. |
Morteza Mardani, Qingyun Sun, David Donoho, Vardan Papyan, Hatef Monajemi, Shreyas Vasanawala, John Pauly |
Sigsoftmax: Reanalysis Of The Softmax Bottleneck
Softmax is an output activation function for modeling categorical probability distributions in many applications of deep learning. However, a recent study revealed that softmax can be a bottleneck of representational capacity of neural networks in language modeling (the softmax bottleneck). |
Sekitoshi Kanai, Yasuhiro Fujiwara, Yuki Yamanaka, Shuichi Adachi |
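As a rough illustration of the entry above (our reading of the abstract, not the paper's code), sigsoftmax can be sketched as weighting each exponential by a sigmoid before normalizing, which breaks the pure log-linearity behind the softmax bottleneck. The exact form assumed here, exp(z_i)·sigmoid(z_i) renormalized, should be checked against the paper:

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def softmax(z):
    # standard softmax, with max-subtraction for numerical stability
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [x / s for x in e]

def sigsoftmax(z):
    # assumed form: weight exp(z_i) by sigmoid(z_i), then renormalize
    m = max(z)
    w = [math.exp(v - m) * sigmoid(v) for v in z]
    s = sum(w)
    return [x / s for x in w]
```

Like softmax, the result is a valid probability distribution, but the log-probabilities are no longer a linear function of the logits.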
Visualizing The Loss Landscape Of Neural Nets
The authors explore the structure of neural loss functions, and the effect of loss landscapes on generalization, using a range of visualization methods. |
Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, Tom Goldstein |
Clebsch–Gordan Nets: A Fully Fourier Space Spherical Convolutional Neural Network
Recent work by Cohen et al. has achieved state-of-the-art results for learning spherical images in a rotation invariant way by using ideas from group representation theory and noncommutative harmonic analysis. |
Risi Kondor, Zhen Lin, Shubhendu Trivedi |
Adaptive Sampling Towards Fast Graph Representation Learning
Graph Convolutional Networks (GCNs) have become a crucial tool for learning representations of graph vertices. |
Wenbing Huang, Tong Zhang, Yu Rong, Junzhou Huang |
NAIS-Net: Stable Deep Networks From Non-Autonomous Differential Equations
This paper introduces Non-Autonomous Input-Output Stable Network (NAIS-Net), a very deep architecture where each stacked processing block is derived from a time-invariant non-autonomous dynamical system. |
Marco Ciccone, Marco Gallieri, Jonathan Masci, Christian Osendorfer, Faustino Gomez |
Scaling Provable Adversarial Defenses
Recent work has developed methods for learning deep network classifiers that are provably robust to norm-bounded adversarial perturbation; however, these methods are currently only possible for relatively small feedforward networks. First, the authors present a technique for extending these training procedures to much more general networks, with skip connections (such as ResNets) and general nonlinearities; the approach is fully modular, and can be implemented automatically analogously to automatic differentiation. |
Eric Wong, Frank Schmidt, Jan Hendrik Metzen, J. Zico Kolter |
Lipschitz Regularity Of Deep Neural Networks: Analysis And Efficient Estimation
Deep neural networks are notorious for being sensitive to small well-chosen perturbations, and estimating the regularity of such architectures is of utmost importance for safe and robust practical applications. Second, for sequential neural networks, the authors propose an improved algorithm named SeqLip that takes advantage of the linear computation graph to split the computation per pair of consecutive layers. |
Aladin Virmaux, Kevin Scaman |
Training DNNs With Hybrid Block Floating Point
The wide adoption of DNNs has given birth to unrelenting computing requirements, forcing datacenter operators to adopt domain-specific accelerators to train them. The authors identify block floating point (BFP) as a promising alternative representation since it exhibits wide dynamic range and enables the majority of DNN operations to be performed with fixed-point logic. |
Mario Drumond, Tao LIN, Martin Jaggi, Babak Falsafi |
Mesh-TensorFlow: Deep Learning For Supercomputers
Batch-splitting (data-parallelism) is the dominant distributed Deep Neural Network (DNN) training strategy, due to its universal applicability and its amenability to Single-Program-Multiple-Data (SPMD) programming. Unfortunately, efficient model-parallel algorithms tend to be complicated to discover, describe, and to implement, particularly on large clusters. |
Noam Shazeer, Youlong Cheng, Niki Parmar, Dustin Tran, Ashish Vaswani, Penporn Koanantakool, Peter Hawkins, HyoukJoong Lee, Mingsheng Hong, Cliff Young, Ryan Sepassi, Blake Hechtman |
Thwarting Adversarial Examples: An $L_0$-Robust Sparse Fourier Transform
The authors give a new algorithm for approximating the Discrete Fourier transform of an approximately sparse signal that is robust to worst-case $L_0$ corruptions. |
Mitali Bafna, Jack Murtagh, Nikhil Vyas |
Bayesian Adversarial Learning
Deep neural networks have been known to be vulnerable to adversarial attacks, raising lots of security concerns in practical deployment. In this work, a novel robust training framework, Bayesian Robust Learning, is proposed to alleviate this issue, in which a distribution is put on the adversarial data-generating distribution to account for the uncertainty of the adversarial data-generating process. |
Lincoln Ye, Zhanxing Zhu |
Dendritic Cortical Microcircuits Approximate The Backpropagation Algorithm
Deep learning has seen remarkable developments over the last years, many of them inspired by neuroscience. Here, the authors introduce a multilayer neuronal network model with simplified dendritic compartments in which error-driven synaptic plasticity adapts the network towards a global desired output. |
João Sacramento, Rui Ponte Costa, Yoshua Bengio, Walter Senn |
Learning A Latent Manifold Of Odor Representations From Neural Responses In Piriform Cortex
A major difficulty in studying the neural mechanisms underlying olfactory perception is the lack of obvious structure in the relationship between odorants and the neural activity patterns they elicit. |
Anqi Wu, Stan Pashkovski, Sandeep Datta, Jonathan W Pillow |
Size-Noise Tradeoffs In Generative Networks
This paper investigates the ability of generative networks to convert their input noise distributions into other distributions. Firstly, the authors demonstrate a construction that allows ReLU networks to increase the dimensionality of their noise distribution by implementing a "space-filling" function based on iterated tent maps. |
Bolton Bailey, Matus Telgarsky |
On Neuronal Capacity
The authors define the capacity of a learning machine to be the logarithm of the number (or volume) of the functions it can implement. |
Pierre Baldi, Roman Vershynin |
Learning Overparameterized Neural Networks Via Stochastic Gradient Descent On Structured Data
Neural networks have many successful applications, while much less theoretical understanding has been gained. Towards bridging this gap, the authors study the problem of learning a two-layer overparameterized ReLU neural network for multi-class classification via stochastic gradient descent (SGD) from random initialization. |
Yuanzhi Li, Yingyu Liang |
Deep, Complex, Invertible Networks For Inversion Of Transmission Effects In Multimode Optical Fibres
The authors use complex-weighted, deep networks to invert the effects of multimode optical fibre distortion of a coherent input image. The authors generated experimental data based on collections of optical fibre responses to greyscale input images generated with coherent light, by measuring only image amplitude (not amplitude and phase as is typical) at the output of 1 m and 10 m long, 105 µm diameter multimode fibre. |
Oisín Moran, Piergiorgio Caramazza, Daniele Faccio, Rod Murray-Smith |
Learning Towards Minimum Hyperspherical Energy
Neural networks are a powerful class of nonlinear functions that can be trained end-to-end on various applications. While the over-parametrized nature of many neural networks provides the ability to fit complex functions and strong representation power for handling challenging tasks, it also leads to highly correlated neurons that can hurt the generalization ability and incur unnecessary computation cost. |
Weiyang Liu, Rongmei Lin, Zhen Liu, Lixin Liu, Zhiding Yu, Bo Dai, Le Song |
Soft-Gated Warping-GAN For Pose-Guided Person Image Synthesis
Despite remarkable advances in image synthesis research, existing works often fail in manipulating images under the context of large geometric transformations. Synthesizing person images conditioned on arbitrary poses is one of the most representative examples where the generation quality largely relies on the capability of identifying and modeling arbitrary transformations on different body parts. |
Haoye Dong, Xiaodan Liang, Ke Gong, Hanjiang Lai, Jia Zhu, Jian Yin |
Deep Attentive Tracking Via Reciprocative Learning
Visual attention, derived from cognitive neuroscience, facilitates human perception on the most pertinent subset of the sensory data. In this paper, the authors propose a reciprocative learning algorithm to exploit visual attention for training deep classifiers. |
Shi Pu, Yibing Song, Chao Ma, Honggang Zhang, Ming-Hsuan Yang |
Robot Learning In Homes: Improving Generalization And Reducing Dataset Bias
Data-driven approaches to solving robotic tasks have gained a lot of traction in recent years. However, most existing policies are trained on large-scale datasets collected in curated lab settings. |
Abhinav Gupta, Adithyavairavan Murali, Dhiraj Prakashchand Gandhi, Lerrel Pinto |
A Flexible Model For Training Action Localization With Varying Levels Of Supervision
Spatio-temporal action detection in videos is typically addressed in a fully-supervised setup with manual annotation of training videos required at every frame. In this work the authors propose a unifying framework that can handle and combine varying types of less demanding weak supervision. |
Guilhem Chéron, Jean-Baptiste Alayrac, Ivan Laptev, Cordelia Schmid |
Bilinear Attention Networks
Attention networks in multimodal learning provide an efficient way to utilize given visual information selectively. In this paper, the authors propose bilinear attention networks (BAN) that find bilinear attention distributions to utilize given vision-language information seamlessly. |
Jin-Hwa Kim, Jaehyun Jun, Byoung-Tak Zhang |
MacNet: Transferring Knowledge From Machine Comprehension To Sequence-to-Sequence Models
Machine Comprehension (MC) is one of the core problems in natural language processing, requiring both understanding of the natural language and knowledge about the world. The authors propose MacNet: a novel encoder-decoder supplementary architecture to the widely used attention-based sequence-to-sequence models. |
Boyuan Pan, Yazheng Yang, Hao Li, Zhou Zhao, Yueting Zhuang, Deng Cai, Xiaofei He |
Diffusion Maps For Textual Network Embedding
Textual network embedding leverages rich text information associated with the network to learn low-dimensional vectorial representations of vertices. Rather than using typical natural language processing (NLP) approaches, recent research exploits the relationship of texts on the same edge to graphically embed text. |
Xinyuan Zhang, Yitong Li, Dinghan Shen, Lawrence Carin |
FRAGE: Frequency-Agnostic Word Representation
Continuous word representation (aka word embedding) is a basic building block in many neural network-based models used in natural language processing tasks. |
Chengyue Gong, Di He, Xu Tan, Tao Qin, Liwei Wang, Tie-Yan Liu |
A Retrieve-and-Edit Framework For Predicting Structured Outputs | Tatsunori Hashimoto, Kelvin Guu, Yonatan Oren, Percy Liang |
Unsupervised Text Style Transfer Using Language Models As Discriminators
Binary classifiers are employed as discriminators in GAN-based unsupervised style transfer models to ensure that transferred sentences are similar to sentences in the target domain. In this paper, the authors propose a technique of using a target domain language model as the discriminator to provide richer, token-level feedback during the learning process. |
Zichao Yang, Zhiting Hu, Chris Dyer, Eric Xing, Taylor Berg-Kirkpatrick |
Unsupervised Cross-Modal Alignment Of Speech And Text Embedding Spaces
Recent research has shown that word embedding spaces learned from text corpora of different languages can be aligned without any parallel data supervision. The proposed framework learns the individual speech and text embedding spaces, and attempts to align the two spaces via adversarial training, followed by a refinement procedure. |
Yu-An Chung, Wei-Hung Weng, Schrasing Tong, Jim Glass |
GradiVeQ: Vector Quantization For Bandwidth-Efficient Gradient Aggregation In Distributed CNN Training
Data parallelism can boost the training speed of convolutional neural networks (CNN), but could suffer from significant communication costs caused by gradient aggregation. In this paper, the authors empirically demonstrate the strong linear correlations between CNN gradients, and propose a gradient vector quantization technique, named GradiVeQ, to exploit these correlations through principal component analysis (PCA) for substantial gradient dimension reduction. |
Mingchao Yu, Zhifeng Lin, Krishna Narra, Songze Li, Youjie Li, Nam Sung Kim, Alex Schwing, Murali Annavaram, Salman Avestimehr |
Gradient Sparsification For Communication-Efficient Distributed Optimization
Modern large-scale machine learning applications require stochastic optimization algorithms to be implemented on distributed computational architectures. |
Jianqiao Wangni, Jialei Wang, Ji Liu, Tong Zhang |
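A common recipe in this line of work, sketched here under our own simplifying assumptions rather than as the authors' exact scheme, is to keep each gradient coordinate with some probability and rescale the survivors so the sparsified vector stays unbiased in expectation:

```python
import random

def sparsify(grad, keep_probs, rng):
    """Randomly zero out gradient coordinates; rescale each kept entry
    by 1/p so that the sparsified gradient is unbiased in expectation."""
    out = []
    for g, p in zip(grad, keep_probs):
        out.append(g / p if rng.random() < p else 0.0)
    return out

rng = random.Random(0)
sparse_grad = sparsify([1.0, 2.0, 3.0], [0.5, 0.5, 0.5], rng)
```

On average only sum(keep_probs) coordinates need to be communicated, trading extra gradient variance for reduced bandwidth.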
Bayesian Multi-domain Learning For Cancer Subtype Discovery From Next-generation Sequencing Count Data
Precision medicine aims for personalized prognosis and therapeutics by utilizing recent genome-scale high-throughput profiling techniques, including next-generation sequencing (NGS). Second, compared to the number of involved molecules and system complexity, the number of available samples for studying complex diseases, such as cancer, is often limited, especially considering disease heterogeneity. |
Ehsan Hajiramezanali, Siamak Zamani Dadaneh, Alireza Karbalayghareh, Mingyuan Zhou, Xiaoning Qian |
Parsimonious Quantile Regression Of Financial Asset Tail Dynamics Via Sequential Learning
The authors propose a parsimonious quantile regression framework to learn the dynamic tail behaviors of financial asset returns. |
Xing Yan, Weizhong Zhang, Lin Ma, Wei Liu, Qi Wu |
Global Geometry Of Multichannel Sparse Blind Deconvolution On The Sphere | Yanjun Li, Yoram Bresler |
Phase Retrieval Under A Generative Prior
The authors introduce a novel deep-learning inspired formulation of the phase retrieval problem, which asks to recover a signal $y_0 \in \mathbb{R}^n$ from $m$ quadratic observations, under structural assumptions on the underlying signal. |
Paul Hand, Oscar Leong, Vlad Voroninski |
Theoretical Linear Convergence Of Unfolded ISTA And Its Practical Weights And Thresholds
In recent years, unfolding iterative algorithms as neural networks has become an empirical success in solving sparse recovery problems. In this work, the authors study unfolded ISTA (Iterative Shrinkage Thresholding Algorithm) for sparse signal recovery. |
Xiaohan Chen, Jialin Liu, Zhangyang Wang, Wotao Yin |
Modern Neural Networks Generalize On Small Data Sets
In this paper, the authors use a linear program to empirically decompose fitted neural networks into ensembles of low-bias sub-networks. The authors then demonstrate this in practice by applying large neural networks, with hundreds of parameters per training observation, to a collection of 116 real-world data sets from the UCI Machine Learning Repository. |
Matthew Olson, Adi Wyner, Richard Berk |
Co-regularized Alignment For Unsupervised Domain Adaptation | Abhishek Kumar, Prasanna Sattigeri, Kahini Wadhawan, Leonid Karlinsky, Rogerio Feris, Bill Freeman, Gregory Wornell |
Neural Networks Trained To Solve Differential Equations Learn General Representations
The authors introduce a technique based on the singular vector canonical correlation analysis (SVCCA) for measuring the generality of neural network layers across a continuously-parametrized set of tasks. |
Martin Magill, Faisal Qureshi, Hendrick De Haan |
Spectral Filtering For General Linear Dynamical Systems
The authors give a polynomial-time algorithm for learning latent-state linear dynamical systems without system identification, and without assumptions on the spectral radius of the system's transition matrix. |
Elad Hazan, HOLDEN LEE, Karan Singh, Cyril Zhang, Yi Zhang |
Where Do You Think You're Going?: Inferring Beliefs About Dynamics From Behavior
Inferring intent from observed behavior has been studied extensively within the frameworks of Bayesian inverse planning and inverse reinforcement learning. The authors demonstrate in simulation and in a user study with 12 participants that this approach enables us to more accurately model human intent, and can be used in a variety of applications, including offering assistance in a shared autonomy framework and inferring human preferences. |
Sid Reddy, Anca Dragan, Sergey Levine |
Point Process Latent Variable Models Of Larval Zebrafish Behavior
A fundamental goal of systems neuroscience is to understand how neural activity gives rise to natural behavior. The authors develop a new class of probabilistic models to tackle this challenge in the study of larval zebrafish, an important model organism for neuroscience. |
Anuj Sharma, Scott Linderman, Robert Johnson, Florian Engert |
Provably Correct Automatic Sub-Differentiation For Qualified Programs
The Cheap Gradient Principle (Griewank, 2008), which states that the computational cost of computing a $d$-dimensional vector of partial derivatives of a scalar function is nearly the same (often within a factor of $5$) as that of simply computing the scalar function itself, is of central importance in optimization; it allows us to quickly obtain (high-dimensional) gradients of scalar loss functions which are subsequently used in black box gradient-based optimization procedures. |
Sham Kakade, Jason Lee |
Neural Ordinary Differential Equations
The authors introduce a new family of deep neural network models. |
Ricky Chen, Yulia Rubanova, Jesse Bettencourt, David Duvenaud |
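The core idea of the entry above is that a hidden state is defined as the solution of an ODE dh/dt = f(h, t) parameterized by a network; the paper uses adaptive black-box solvers, but a minimal sketch with a fixed-step Euler integrator (our simplification) already shows the mechanics:

```python
import math

def ode_block(h0, f, t0=0.0, t1=1.0, steps=1000):
    """Integrate dh/dt = f(h, t) from t0 to t1 with fixed-step Euler,
    returning the final state (the 'output' of the ODE block)."""
    h, t = h0, t0
    dt = (t1 - t0) / steps
    for _ in range(steps):
        h = h + dt * f(h, t)
        t += dt
    return h

# toy dynamics dh/dt = -h, whose exact solution is h0 * exp(-(t1 - t0))
approx = ode_block(1.0, lambda h, t: -h)
```

In the paper f is a neural network and gradients are obtained with the adjoint method rather than by backpropagating through the solver steps.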
Information Constraints On Auto-Encoding Variational Bayes
Parameterizing the approximate posterior of a generative model with neural networks has become a common theme in recent machine learning research. The authors propose a framework for learning representations that relies on Auto-Encoding Variational Bayes and whose search space is constrained via kernel-based measures of independence. |
Romain Lopez, Jeff Regier, Michael I. Jordan, Nir Yosef |
Robustness Of Conditional GANs To Noisy Labels
The authors study the problem of learning conditional generators from noisy labeled samples, where the labels are corrupted by random noise. |
Kiran Thekumparampil, Ashish Khetan, Zinan Lin, Sewoong Oh |
Bias And Generalization In Deep Generative Models: An Empirical Study
In high dimensional settings, density estimation algorithms rely crucially on their inductive bias. In this paper the authors propose a framework to systematically investigate bias and generalization in deep generative models of images by probing the learning algorithm with carefully designed training datasets. |
Shengjia Zhao, Hongyu Ren, Arianna Yuan, Jiaming Song, Noah Goodman, Stefano Ermon |
Probabilistic Neural Programmed Networks For Scene Generation
In this paper, the authors address the text-to-scene image generation problem. The authors propose PNP-Net, a variational auto-encoder framework that addresses these three challenges: it flexibly composes images with a dynamic network structure, learns a set of distribution transformers that can compose distributions based on semantics, and decodes samples from these distributions into realistic images. |
Zhiwei Deng, Jiacheng Chen, YIFANG FU, Greg Mori |
Step Size Matters In Deep Learning
Training a neural network with the gradient descent algorithm gives rise to a discrete-time nonlinear dynamical system. To elucidate the effects of the step size on the training of neural networks, the authors study the gradient descent algorithm as a discrete-time dynamical system, and by analyzing the Lyapunov stability of different solutions, the authors show the relationship between the step size of the algorithm and the solutions that can be obtained with this algorithm. |
Kamil Nar, Shankar Sastry |
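The dynamical-systems view can be illustrated on the simplest possible case (a toy quadratic, not the paper's neural-network analysis): for f(x) = a·x²/2, gradient descent is the linear map x_{k+1} = (1 − ηa)·x_k, whose fixed point x* = 0 is stable precisely when the step size satisfies η < 2/a.

```python
# Gradient descent on the quadratic f(x) = a*x^2/2 is the discrete-time
# dynamical system x_{k+1} = x_k - eta*a*x_k = (1 - eta*a)*x_k.
# Its fixed point x* = 0 is Lyapunov stable iff |1 - eta*a| < 1,
# i.e. iff the step size satisfies eta < 2/a.
def run_gd(a, eta, x0=1.0, steps=100):
    x = x0
    for _ in range(steps):
        x = x - eta * a * x
    return x

a = 4.0                               # curvature (toy value)
small = abs(run_gd(a, eta=0.4))       # eta < 2/a = 0.5: converges to 0
large = abs(run_gd(a, eta=0.6))       # eta > 2/a: diverges
```

With a = 4 the threshold is η = 0.5: step size 0.4 drives the iterate to zero, while 0.6 makes it blow up.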
Neural Tangent Kernel: Convergence And Generalization In Neural Networks
At initialization, artificial neural networks (ANNs) are equivalent to Gaussian processes in the infinite-width limit, thus connecting them to kernel methods. The authors prove that the evolution of an ANN during training can also be described by a kernel: during gradient descent on the parameters of an ANN, the network function (which maps input vectors to output vectors) follows the so-called kernel gradient associated with a new object, which the authors call the Neural Tangent Kernel (NTK). |
Arthur Jacot-Guillarmod, Clement Hongler, Franck Gabriel |
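At finite width the NTK can be computed empirically as the Gram matrix of parameter gradients, Θ(x, x') = ⟨∇_θ f(x), ∇_θ f(x')⟩ (a finite-width sketch with a hypothetical tiny network and finite-difference gradients; the paper's results concern the infinite-width limit, where this kernel becomes deterministic and constant during training).

```python
import numpy as np

rng = np.random.default_rng(0)
# parameters of a hypothetical 2 -> 8 -> 1 tanh network, flattened
theta0 = np.concatenate([rng.normal(size=16) / np.sqrt(2),   # W1 (8x2)
                         rng.normal(size=8) / np.sqrt(8)])   # w2 (8)

def f(theta, x):
    W1, w2 = theta[:16].reshape(8, 2), theta[16:]
    return w2 @ np.tanh(W1 @ x)

def grad_theta(x, eps=1e-6):
    # central finite differences over all 24 parameters
    g = np.zeros_like(theta0)
    for i in range(theta0.size):
        tp, tm = theta0.copy(), theta0.copy()
        tp[i] += eps
        tm[i] -= eps
        g[i] = (f(tp, x) - f(tm, x)) / (2 * eps)
    return g

x1, x2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
g1, g2 = grad_theta(x1), grad_theta(x2)
# empirical NTK entries: inner products of parameter gradients
ntk_11, ntk_12, ntk_22 = g1 @ g1, g1 @ g2, g2 @ g2
```

Being a Gram matrix, the empirical NTK is symmetric and positive semi-definite by construction.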
How Does Batch Normalization Help Optimization? | Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, Aleksander Madry |
Towards Robust Detection Of Adversarial Examples
Although recent progress is substantial, deep learning methods can be vulnerable to maliciously generated adversarial examples. In this paper, the authors present a novel training procedure and a thresholding test strategy towards robust detection of adversarial examples. |
Tianyu Pang, Chao Du, Yinpeng Dong, Jun Zhu |
Training Neural Networks Using Features Replay | Zhouyuan Huo, Bin Gu, Heng Huang |
Faster Neural Networks Straight From JPEG
The simple, elegant approach of training convolutional neural networks (CNNs) directly from RGB pixels has enjoyed overwhelming empirical success. But can more performance be squeezed out of networks by using different input representations? |
Lionel Gueguen, Alex Sergeev, Ben Kadlec, Rosanne Liu, Jason Yosinski |
Hierarchical Graph Representation Learning With Differentiable Pooling
Recently, graph neural networks (GNNs) have revolutionized the field of graph representation learning through effectively learned node embeddings, and achieved state-of-the-art results in tasks such as node classification and link prediction. |
Rex Ying, Jiaxuan You, Christopher Morris, Xiang Ren, Will Hamilton, Jure Leskovec |
Bayesian Model-Agnostic Meta-Learning
Due to the inherent model uncertainty, learning to infer a Bayesian posterior from a few-shot dataset is an important step towards robust meta-learning. In this paper, the authors propose a novel Bayesian model-agnostic meta-learning method. |
Jaesik Yoon, Taesup Kim, Ousmane Dia, Sungwoong Kim, Yoshua Bengio, Sungjin Ahn |
Evolved Policy Gradients
The authors propose a metalearning approach for learning gradient-based reinforcement learning (RL) algorithms. |
Rein Houthooft, Richard Chen, Phillip Isola, Bradly Stadie, Filip Wolski, OpenAI Jonathan Ho, Pieter Abbeel |
BourGAN: Generative Networks With Metric Embeddings
This paper addresses the mode collapse for generative adversarial networks (GANs). |
Chang Xiao, Peilin Zhong, Changxi Zheng |
Scalable End-to-End Autonomous Vehicle Testing Via Rare-event Simulation
While recent developments in autonomous vehicle (AV) technology highlight substantial progress, the field lacks tools for rigorous and scalable testing. The authors implement a simulation framework that can test an entire modern autonomous driving system, including, in particular, systems that employ deep-learning perception and control algorithms. |
Matthew O'Kelly, Aman Sinha, Hong Namkoong, Russ Tedrake, John Duchi |
Generalisation Of Structural Knowledge In The Hippocampal-entorhinal System
A central problem in understanding intelligence is the concept of generalisation. The authors propose that to generalise structural knowledge, the representations of the structure of the world, i.e. |
James Whittington, Timothy Muller, Shirely Mark, Caswell Barry, Tim Behrens |
Task-Driven Convolutional Recurrent Models Of The Visual System
Feed-forward convolutional neural networks (CNNs) are currently state-of-the-art for object classification tasks such as ImageNet. The authors extended these design principles in an automated search over thousands of model architectures, which identified novel local recurrent cells and long-range feedback connections useful for object recognition. |
Aran Nayebi, Daniel Bear, Jonas Kubilius, Kohitij Kar, Surya Ganguli, David Sussillo, James J DiCarlo, Daniel Yamins |
Neural-Symbolic VQA: Disentangling Reasoning From Vision And Language Understanding
The authors marry two powerful ideas: deep representation learning for visual recognition and language understanding, and symbolic program execution for reasoning. |
Kexin Yi, Jiajun Wu, Chuang Gan, Antonio Torralba, Pushmeet Kohli, Josh Tenenbaum |
Extracting Relationships By Multi-Domain Matching | Yitong Li, Michael Murias, Geraldine Dawson, David Carlson |
Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding
Learning long-term dependencies in extended temporal sequences requires credit assignment to events far back in the past. However, humans are often reminded of past memories or mental states which are associated with the current mental state. The authors consider the hypothesis that such memory associations between past and present could be used for credit assignment through arbitrarily long sequences, propagating the credit assigned to the current state to the associated past state. |
Rosemary Ke, Anirudh Goyal ALIAS PARTH GOYAL, Olexa Bilaniuk, Jonathan Binas, Mike Mozer, Chris Pal, Yoshua Bengio |
Beauty-in-averageness And Its Contextual Modulations: A Bayesian Statistical Account
Understanding how humans perceive the likability of high-dimensional objects such as faces is an important problem in both cognitive science and AI/ML. This statistical coding cost account explains both BiA (beauty-in-averageness), where facial blends generally have higher likelihood than "parent faces", and UiA (ugliness-in-averageness), when the preceding context or task restricts face representation to a task-relevant subset of features, thus redefining statistical typicality and encoding cost within that subspace. |
Chaitanya Ryali, Angela J Yu |
Stimulus Domain Transfer In Recurrent Models For Large Scale Cortical Population Prediction On Video
To better understand the representations in visual cortex, the authors need to generate better predictions of neural activity in awake animals presented with their ecological input: natural video. |
Fabee Sinz, Alexander Ecker, Paul Fahey, Edgar Walker, Erick Cobos, Emmanouil Froudarakis, Dimitri Yatsenko, Zachary Pitkow, Jacob Reimer, Andreas Tolias |
A Probabilistic Population Code Based On Neural Samples
Sensory processing is often characterized as implementing probabilistic inference: networks of neurons compute posterior beliefs over unobserved causes given the sensory inputs. |
Sabyasachi Shivkumar, Richard Lange, Ankani Chattoraj, Ralf Haefner |
CpSGD: Communication-efficient And Differentially-private Distributed SGD
Distributed stochastic gradient descent is an important subroutine in distributed learning. Several recent works have focused on reducing the communication cost or introducing privacy guarantees, but none of the proposed communication-efficient methods are known to be privacy-preserving, and none of the known privacy mechanisms are known to be communication-efficient. |
Naman Agarwal, Ananda Theertha Suresh, Felix Xinnan Yu, Sanjiv Kumar, Brendan McMahan |
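The communication-saving ingredient can be sketched as unbiased stochastic quantization of each gradient coordinate to a few bits (a generic sketch only; the paper's cpSGD additionally adds binomial noise to obtain differential privacy, which is not shown here).

```python
import numpy as np

def stochastic_quantize(g, bits=2, rng=np.random.default_rng(0)):
    # unbiased rounding of each coordinate to one of 2**bits levels:
    # round up with probability equal to the fractional part, so that
    # E[Q(g)] = g while only `bits` bits per coordinate are transmitted
    lo, hi = g.min(), g.max()
    levels = 2 ** bits - 1
    scaled = (g - lo) / (hi - lo) * levels    # map to [0, levels]
    floor = np.floor(scaled)
    round_up = rng.random(g.shape) < (scaled - floor)
    return lo + (floor + round_up) / levels * (hi - lo)

g = np.linspace(-1.0, 1.0, 100000)            # toy gradient vector
q = stochastic_quantize(g)                    # 2 bits per coordinate
bias = abs(float((q - g).mean()))             # ~0 since E[Q(g)] = g
```

Each coordinate lands on one of four quantization levels, yet the averaged quantization error is close to zero because the rounding is unbiased.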
Bounded-Loss Private Prediction Markets | Rafael Frongillo, Bo Waggoner |
Ex Ante Coordination And Collusion In Zero-sum Multi-player Extensive-form Games
Recent milestones in equilibrium computation, such as the success of Libratus, show that it is possible to compute strong solutions to two-player zero-sum games in theory and practice. The reasons for this are closely related to the fact that the team can be represented as a single player with imperfect recall. |
Gabriele Farina, Andrea Celli, Nicola Gatti, Tuomas Sandholm |
Model-Agnostic Private Learning
The authors design differentially private learning algorithms that are agnostic to the learning model, assuming access to a limited amount of unlabeled public data. |
Raef Bassily, Abhradeep Guha Thakurta, Om Thakkar |
Adversarially Robust Generalization Requires More Data
Machine learning models are often susceptible to adversarial perturbations of their inputs. To better understand this phenomenon, the authors study adversarially robust learning from the viewpoint of generalization. |
Ludwig Schmidt, Shibani Santurkar, Dimitris Tsipras, Kunal Talwar, Aleksander Madry |
Probabilistic Pose Graph Optimization Via Bingham Distributions And Tempered Geodesic MCMC
The authors introduce Tempered Geodesic Markov Chain Monte Carlo (TG-MCMC) algorithm for initializing pose graph optimization problems, arising in various scenarios such as SFM (structure from motion) or SLAM (simultaneous localization and mapping). |
Tolga Birdal, Umut Simsekli, Mustafa Onur Eken, Slobodan Ilic |
A Statistical Recurrent Model On The Manifold Of Symmetric Positive Definite Matrices
In a number of disciplines, the data (e.g., graphs, manifolds) to be analyzed are non-Euclidean in nature. In this work, the authors study the setting where the data (or measurements) are ordered, longitudinal or temporal in nature and live on a Riemannian manifold -- this setting is common in a variety of problems in statistical machine learning, vision and medical imaging. |
Rudrasis Chakraborty, Chun-Hao Yang, Xingjian Zhen, Monami Banerjee, Derek Archer, David Vaillancourt, Vikas Singh, Baba Vemuri |
Designing By Training: Acceleration Neural Network For Fast High-Dimensional Convolution
The high-dimensional convolution is widely used in various disciplines but has a serious performance problem due to its high computational complexity. Over the decades, fast algorithms for the Gaussian convolution have been designed by hand. |
Longquan Dai, Liang Tang, Yuan Xie, Jinhui Tang |
Greedy Hash: Towards Fast Optimization For Accurate Hash Coding In CNN | Shupeng Su, Chao Zhang, Kai Han, Yonghong Tian |
Generative Probabilistic Novelty Detection With Adversarial Autoencoders
Novelty detection is the problem of identifying whether a new data point is considered to be an inlier or an outlier. The authors assume that training data is available to describe only the inlier distribution. |
Stanislav Pidhorskyi, Ranya Almohsen, Gianfranco Doretto |
Joint Sub-bands Learning With Clique Structures For Wavelet Domain Super-Resolution
Convolutional neural networks (CNNs) have recently achieved great success in single-image super-resolution (SISR). To solve these problems, the authors propose the Super-Resolution CliqueNet (SRCliqueNet) to reconstruct the high resolution (HR) image with better textural details in the wavelet domain. |
Zhisheng Zhong, Tiancheng Shen, Yibo Yang, Zhouchen Lin, Chao Zhang |
Compact Generalized Non-local Network
The non-local module is designed for capturing long-range spatio-temporal dependencies in images and videos. |
Kaiyu Yue, Ming Sun, Yuchen Yuan, Feng Zhou, Errui Ding, Fuxin Xu |
BinGAN: Learning Compact Binary Descriptors With A Regularized GAN
In this paper, the authors propose a novel regularization method for Generative Adversarial Networks that allows the model to learn discriminative yet compact binary representations of image patches (image descriptors). |
Maciej Zieba, Piotr Semberecki, Tarek El-Gaaly, Tomasz Trzcinski |
RenderNet: A Deep Convolutional Network For Differentiable Rendering From 3D Shapes
Traditional computer graphics rendering pipelines are designed for procedurally generating 2D images from 3D shapes with high performance. |
Thu H Nguyen-Phuoc, Chuan Li, Stephen Balaban, Yongliang Yang |
LF-Net: Learning Local Features From Images
The authors present a novel deep architecture and a training strategy to learn a local feature pipeline from scratch, using collections of images without the need for human supervision. |
Yuki Ono, Eduard Trulls, Pascal Fua, Kwang Yi |
Unsupervised Learning Of Shape And Pose With Differentiable Point Clouds
The authors address the problem of learning accurate 3D shape and camera pose from a collection of unlabeled category-specific images. |
Eldar Insafutdinov, Alexey Dosovitskiy |
Modelling And Unsupervised Learning Of Symmetric Deformable Object Categories
The authors propose a new approach to model and learn, without manual supervision, the symmetries of natural objects, such as faces or flowers, given only images as input. |
James Thewlis, Hakan Bilen, Andrea Vedaldi |
Multi-View Silhouette And Depth Decomposition For High Resolution 3D Object Representation
The authors consider the problem of scaling deep generative shape models to high-resolution. Drawing motivation from the canonical view representation of objects, the authors introduce a novel method for the fast up-sampling of 3D objects in voxel space through networks that perform super-resolution on the six orthographic depth projections. |
Edward Smith, Scott Fujimoto, David Meger |
Image Inpainting Via Generative Multi-column Convolutional Neural Networks
In this paper, the authors propose a generative multi-column network for image inpainting. |
Yi Wang, Xin Tao, Xiaojuan Qi, Xiaoyong Shen, Jiaya Jia |
Beyond Grids: Learning Graph Representations For Visual Recognition
The authors propose learning graph representations from 2D feature maps for visual recognition. |
Yin Li, Abhinav Gupta |
Foreground Clustering For Joint Segmentation And Localization In Videos And Images
This paper presents a novel framework in which video/image segmentation and localization are cast into a single optimization problem that integrates information from low level appearance cues with that of high level localization cues in a very weakly supervised manner. |
Abhishek Sharma |
LinkNet: Relational Embedding For Scene Graph
Objects and their relationships are critical contents for image understanding. In this paper, the authors present a novel method that improves scene graph generation by explicitly modeling inter-dependency among the entire object instances. |
Sanghyun Woo, Dahun Kim, Donghyeon Cho, In So Kweon |
Context-aware Synthesis And Placement Of Object Instances
Learning to insert an object instance into an image in a semantically coherent manner is a challenging and interesting problem. In this paper, the authors propose an end-to-end trainable neural network for the task of inserting an object instance mask of a specified class into the semantic label map of an image. |
Donghoon Lee, Ming-Yu Liu, Ming-Hsuan Yang, Sifei Liu, Jinwei Gu, Jan Kautz |
Geometry-Aware Recurrent Neural Networks For Active Visual Recognition
The authors present recurrent geometry-aware neural networks that integrate visual information across multiple views of a scene into 3D latent feature tensors, while maintaining a one-to-one mapping between 3D physical locations in the world scene and latent feature locations. |
Ricson Cheng, Ziyan Wang, Katerina Fragkiadaki |
See And Think: Disentangling Semantic Scene Completion
Semantic scene completion predicts volumetric occupancy and object category of a 3D scene, which helps intelligent agents to understand and interact with the surroundings. In this work, the authors propose a disentangled framework, sequentially carrying out 2D semantic segmentation, 2D-3D reprojection and 3D semantic scene completion. |
Shice Liu, YU HU, Yiming Zeng, Qiankun Tang, Beibei Jin, Yinhe Han, Xiaowei Li |
Active Matting
Image matting is an ill-posed problem. To address this, the authors propose an active matting method with recurrent reinforcement learning. |
Xin Yang, Ke Xu, Shaozhe Chen, Shengfeng He, Baocai Yin Yin, Rynson Lau |
A Unified Feature Disentangler For Multi-Domain Image Translation And Manipulation
The authors present a novel and unified deep learning framework which is capable of learning domain-invariant representation from data across multiple domains. |
Alexander H. Liu, Yen-Cheng Liu, Yu-Ying Yeh, Yu-Chiang Frank Wang |
Turbo Learning For CaptionBot And DrawingBot
In this paper, the authors study the problems of both image captioning and text-to-image generation, and present a novel turbo learning approach to jointly training an image-to-text generator (a.k.a. CaptionBot) and a text-to-image generator (a.k.a. DrawingBot). |
Qiuyuan Huang, Pengchuan Zhang, Dapeng Wu, Lei Zhang |
Dialog-based Interactive Image Retrieval
Existing methods for interactive image retrieval have demonstrated the merit of integrating user feedback, improving retrieval results. In this paper, the authors introduce a new approach to interactive image search that enables users to provide feedback via natural language, allowing for more natural and effective interaction. |
Xiaoxiao Guo, Hui Wu, Yu Cheng, Steven Rennie, Gerald Tesauro, Rogerio Feris |
Hybrid Retrieval-Generation Reinforced Agent For Medical Image Report Generation
Generating long and coherent reports to describe medical images poses challenges to bridging visual patterns with informative human linguistic descriptions. |
Yuan Li, Xiaodan Liang, Zhiting Hu, Eric Xing |
Sequential Context Encoding For Duplicate Removal
Duplicate removal is a critical step to obtain a reasonable number of predictions in prevalent proposal-based object detection frameworks. In this work, the authors design a new two-stage framework to effectively select the appropriate proposal candidate for each object. |
Lu Qi, Shu Liu, Jianping Shi, Jiaya Jia |
Hybrid Knowledge Routed Modules For Large-scale Object Detection
The dominant object detection approaches treat the recognition of each region separately and overlook crucial semantic correlations between objects in one scene. Particularly, the authors present Hybrid Knowledge Routed Modules (HKRM) that incorporate the reasoning routed by two kinds of knowledge forms: an explicit knowledge module for structured constraints that are summarized with linguistic knowledge. |
ChenHan Jiang, Hang Xu, Xiaodan Liang, Liang Lin |
SNIPER: Efficient Multi-Scale Training
The authors present SNIPER, an algorithm for performing efficient multi-scale training in instance level visual recognition tasks. |
Bharat Singh, Mahyar Najibi, Larry Davis |
Revisiting Multi-Task Learning With ROCK: A Deep Residual Auxiliary Block For Visual Detection | Taylor Mordan, Nicolas THOME, Gilles Henaff, Matthieu Cord |
MetaAnchor: Learning To Detect Objects With Customized Anchors
The authors propose a novel and flexible anchor mechanism named MetaAnchor for object detection frameworks. |
Tong Yang, Xiangyu Zhang, Zeming Li, Wenqiang Zhang, Jian Sun |
Learning Hierarchical Semantic Image Manipulation Through Structured Representations
Understanding, reasoning about, and manipulating semantic concepts of images has been a fundamental research problem for decades. In this work, the authors present a novel hierarchical framework for semantic image manipulation. |
Seunghoon Hong, Xinchen Yan, Honglak Lee, Thomas Huang |
Cooperative Holistic Scene Understanding: Unifying 3D Object, Layout, And Camera Pose Estimation
Holistic 3D indoor scene understanding refers to jointly recovering the i) object bounding boxes, ii) room layout, and iii) camera pose, all in 3D. In this paper, the authors propose an end-to-end model that simultaneously solves all three tasks in real-time given only a single RGB image. |
Siyuan Huang, Siyuan Qi, Yinxue Xiao, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu |
3D-Aware Scene Manipulation Via Inverse Graphics
The authors aim to obtain an interpretable, expressive, and disentangled scene representation that contains comprehensive structural and textural information for each object. |
Shunyu Yao, Harry Hsu, Jun-Yan Zhu, Jiajun Wu, Antonio Torralba, Bill Freeman, Josh Tenenbaum |
FD-GAN: Pose-guided Feature Distilling GAN For Robust Person Re-identification | Yixiao Ge, Zhuowan Li, Haiyu Zhao, Guojun Yin, Shuai Yi, Xiaogang Wang, Hongsheng Li |
Sequence-to-Segment Networks For Segment Detection
Detecting segments of interest from an input sequence is a challenging problem which often requires not only good knowledge of individual target segments, but also contextual understanding of the entire input sequence and the relationships between the target segments. To address this problem, the authors propose the Sequence-to-Segment Network (S$^2$N), a novel end-to-end sequential encoder-decoder architecture. |
Zijun Wei, Boyu Wang, Minh Hoai Nguyen, Jianming Zhang, Zhe Lin, Xiaohui Shen, Radomir Mech, Dimitris Samaras |
Stacked Semantics-Guided Attention Model For Fine-Grained Zero-Shot Learning
Zero-Shot Learning (ZSL) is generally achieved via aligning the semantic relationships between the visual features and the corresponding class semantic descriptions. However, using the global features to represent fine-grained images may lead to sub-optimal results since they neglect the discriminative differences of local regions. |
Yunlong Yu, Zhong Ji, Yanwei Fu, Jichang Guo, Yanwei Pang, Zhongfei (Mark) Zhang |
DeepExposure: Learning To Expose Photos With Asynchronously Reinforced Adversarial Learning
Accurate exposure is key to capturing high-quality photos in computational photography, especially for mobile phones that are limited by the sizes of their camera modules. Based on these sub-images, a local exposure for each sub-image is learned sequentially by a policy network, while the reward of learning is designed globally to strike a balance among the overall exposures. |
Runsheng Yu, Wenyu Liu, Yasen Zhang, Zhi Qu, Deli Zhao, Bo Zhang |
Self-Erasing Network For Integral Object Attention
Recently, adversarial erasing for weakly-supervised object attention has been deeply studied due to its capability in localizing integral object regions. To tackle this issue and promote the quality of object attention, the authors introduce a simple yet effective Self-Erasing Network (SeeNet) to prohibit attentions from spreading to unexpected background regions. |
Andrew Hou, PengTao Jiang, Yunchao Wei, Ming-Ming Cheng |
Searching For Efficient Multi-Scale Architectures For Dense Image Prediction
The design of neural network architectures is an important component for achieving state-of-the-art performance with machine learning systems across a broad array of tasks. |
Maxwell Collins, Yukun Zhu, George Papandreou, Barret Zoph, Florian Schroff, Bo Chen, Jon Shlens |
DifNet: Semantic Segmentation By Diffusion Networks
Deep Neural Networks (DNNs) have recently shown state-of-the-art performance on semantic segmentation tasks; however, they still suffer from poor boundary localization and spatially fragmented predictions. The proposed DifNet consistently produces improvements over the baseline models with the same depth and with an equivalent number of parameters, and also achieves promising performance on the Pascal VOC and Pascal Context datasets. |
Peng Jiang, Fanglin Gu, Yunhai Wang, Changhe Tu, Baoquan Chen |
Learning A High Fidelity Pose Invariant Model For High-resolution Face Frontalization
Face frontalization refers to the process of synthesizing the frontal view of a face from a given profile.This paper proposes a High Fidelity Pose Invariant Model (HF-PIM) to produce photographic and identity-preserving results. |
Jie Cao, Yibo Hu, Hongwen Zhang, Ran He, Zhenan Sun |
Attention In Convolutional LSTM For Gesture Recognition
Convolutional long short-term memory (LSTM) networks have been widely used for action/gesture recognition, and different attention mechanisms have also been embedded into the LSTM or the convolutional LSTM (ConvLSTM) networks. |
Liang Zhang, Guangming Zhu, Lin Mei, Peiyi Shen, Syed Afaq Ali Shah, Mohammed Bennamoun |
Partially-Supervised Image Captioning
Image captioning models are becoming increasingly successful at describing the content of images in restricted domains. The authors propose a novel algorithm for training sequence models, such as recurrent neural networks, on partially-specified sequences, which the authors represent using finite state automata. |
Peter Anderson, Stephen Gould, Mark Johnson |
Learning To Specialize With Knowledge Distillation For Visual Question Answering
Visual Question Answering (VQA) is a notoriously challenging problem because it involves various heterogeneous tasks defined by questions within a unified framework. The authors present a principled algorithm to learn specialized models with knowledge distillation under a multiple choice learning (MCL) framework, where training examples are assigned dynamically to a subset of models for updating network parameters. |
Jonghwan Mun, Kimin Lee, Jinwoo Shin, Bohyung Han |
Chain Of Reasoning For Visual Question Answering
Reasoning plays an essential role in Visual Question Answering (VQA). This paper proposes a novel reasoning model for addressing these problems. |
Chenfei Wu, Jinlai Liu, Xiaojie Wang, Xuan Dong |
Learning Conditioned Graph Structures For Interpretable Visual Question Answering
Visual Question Answering is a challenging problem requiring a combination of concepts from Computer Vision and Natural Language Processing. Nonetheless, very few methods rely on higher-level image representations, which can capture semantic and spatial relationships. |
Will Norcliffe-Brown, Stathis Vafeias, Sarah Parisot |
Out Of The Box: Reasoning With Graph Convolution Nets For Factual Visual Question Answering | Medhini Narasimhan, Svetlana Lazebnik, Alex Schwing |
Overcoming Language Priors In Visual Question Answering With Adversarial Regularization
Modern Visual Question Answering (VQA) models have been shown to rely heavily on superficial correlations between question and answer words learned during training -- e.g., overwhelmingly reporting the type of room as kitchen or the sport being played as tennis, irrespective of the image. |
Sainandan Ramakrishnan, Aishwarya Agrawal, Stefan Lee |
Non-Local Recurrent Network For Image Restoration
Many classic methods have shown non-local self-similarity in natural images to be an effective prior for image restoration. In this paper, the authors propose a non-local recurrent network (NLRN) as the first attempt to incorporate non-local operations into a recurrent neural network (RNN) for image restoration. |
Ding Liu, Bihan Wen, Yuchen Fan, Chen Change Loy, Thomas Huang |
Neural Nearest Neighbors Networks
Non-local methods exploiting the self-similarity of natural signals have been well studied, for example in image analysis and restoration. To overcome this, the authors propose a continuous deterministic relaxation of KNN selection that maintains differentiability w.r.t. |
Tobias Plötz, Stefan Roth |
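The relaxation idea can be sketched by replacing the hard argmin of nearest-neighbour selection with a temperature-controlled softmax over distances (a minimal one-neighbour sketch on hypothetical toy data; the paper's block generalizes this to differentiable k-neighbour selection inside a network).

```python
import numpy as np

def soft_nn(query, keys, values, t=0.5):
    # differentiable relaxation of nearest-neighbour selection:
    # softmax over negative squared distances with temperature t
    d = np.sum((keys - query) ** 2, axis=1)   # squared distances
    w = np.exp(-d / t)
    w = w / w.sum()                           # soft selection weights
    return float(w @ values)                  # weighted "neighbour"

keys = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
values = np.array([10.0, 20.0, 30.0])
query = np.array([0.9, 1.1])

# hard selection: plain argmin over distances (not differentiable)
hard = float(values[np.argmin(np.sum((keys - query) ** 2, axis=1))])
soft = soft_nn(query, keys, values, t=0.5)    # close to hard = 20.0
```

As the temperature t → 0 the soft selection recovers the hard nearest neighbour, while for t > 0 the output is differentiable in the query and keys.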
Training Deep Learning Based Denoisers Without Ground Truth Data
Recently developed deep-learning-based denoisers often outperform state-of-the-art conventional denoisers such as BM3D. In this article, the authors propose a method based on Stein's unbiased risk estimator (SURE) for training deep neural network denoisers using only noisy images. |
Shakarim Soltanayev, Se Young Chun |
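The training signal can be sketched with Monte-Carlo SURE (a sketch using a hypothetical linear shrinkage denoiser in place of a neural network; the divergence term is estimated with a random probe): SURE = ||f(y) − y||²/N − σ² + (2σ²/N)·div f(y) is an unbiased estimate of the MSE against the unknown clean image.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, N = 0.5, 10000
x = np.zeros(N)                      # unknown clean signal (toy)
y = x + sigma * rng.normal(size=N)   # observed noisy signal

def denoise(y, alpha=0.3):
    # hypothetical linear shrinkage denoiser standing in for a network
    return alpha * y

def sure(y, f, sigma, eps=1e-4):
    # Monte-Carlo SURE: unbiased MSE estimate from the noisy image alone;
    # the divergence of f is probed with a random direction b
    b = rng.normal(size=y.size)
    div = b @ (f(y + eps * b) - f(y)) / eps
    return np.mean((f(y) - y) ** 2) - sigma ** 2 + 2 * sigma ** 2 * div / y.size

est = sure(y, denoise, sigma)                 # uses only y
true_mse = np.mean((denoise(y) - x) ** 2)     # uses the hidden x
```

For this toy denoiser the SURE value closely tracks the true MSE even though the clean signal x never enters its computation, which is what makes it usable as a training loss.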
Adversarial Regularizers In Inverse Problems
Inverse Problems in medical imaging and computer vision are traditionally solved using purely model-based methods. The authors propose a new framework for applying data-driven approaches to inverse problems, using a neural network as a regularization functional. |
Sebastian Lunz, Carola Schoenlieb, Ozan Öktem |
Densely Connected Attention Propagation For Reading Comprehension
The authors propose DecaProp (Densely Connected Attention Propagation), a new densely connected neural architecture for reading comprehension (RC). |
Yi Tay, Anh Tuan Luu, Siu Cheung Hui, Jian Su |
Layer-Wise Coordination Between Encoder And Decoder For Neural Machine Translation
Neural Machine Translation (NMT) has achieved remarkable progress with the rapid evolution of model structures. In this paper, the authors propose the concept of layer-wise coordination for NMT, which explicitly coordinates the learning of hidden representations of the encoder and decoder together layer by layer, gradually from low level to high level. |
Tianyu He, Xu Tan, Yingce Xia, Di He, Tao Qin, Zhibo Chen, Tie-Yan Liu |
E-SNLI: Natural Language Inference With Natural Language Explanations
In order for machine learning to garner widespread public adoption, models must be able to provide interpretable and robust explanations for their decisions, as well as learn from human-provided explanations at train time. The authors further implement models that incorporate these explanations into their training process and output them at test time. |
Oana-Maria Camburu, Tim Rocktäschel, Thomas Lukasiewicz, Phil Blunsom |
The Challenge Of Realistic Music Generation: Modelling Raw Audio At Scale
Realistic music generation is a challenging task. When building generative models of music that are learnt from data, typically high-level representations such as scores or MIDI are used that abstract away the idiosyncrasies of a particular performance. |
Sander Dieleman, Aaron Van Den Oord, Karen Simonyan |
Fully Neural Network Based Speech Recognition On Mobile And Embedded Devices
Real-time automatic speech recognition (ASR) on mobile and embedded devices has been of great interest for many years. The authors present real-time speech recognition on smartphones or embedded systems by employing recurrent neural network (RNN) based acoustic models, RNN based language models, and beam-search decoding. |
Jinhwan Park, Yoonho Boo, Iksoo Choi, Sungho Shin, Wonyong Sung |
Transfer Learning From Speaker Verification To Multispeaker Text-To-Speech Synthesis
The authors describe a neural network-based system for text-to-speech (TTS) synthesis that is able to generate speech audio in the voice of many different speakers, including those unseen during training. |
Ye Jia, Yu Zhang, Ron Weiss, Quan Wang, Jonathan Shen, Fei Ren, ZF Chen, Patrick Nguyen, Ruoming Pang, Ignacio Lopez Moreno, Yonghui Wu |
SING: Symbol-to-Instrument Neural Generator
Recent progress in deep learning for audio synthesis opens the way to models that directly produce the waveform, shifting away from the traditional paradigm of relying on vocoders or MIDI synthesizers for speech or music generation. In this work, the authors study the more computationally efficient alternative of generating the waveform frame-by-frame with large strides. The authors present a lightweight neural audio synthesizer for the original task of generating musical notes given desired instrument, pitch and velocity. |
Alexandre Defossez, Neil Zeghidour, Nicolas Usunier, Leon Bottou, Francis Bach |
Neural Voice Cloning With A Few Samples | Sercan Arik, Jitong Chen, Kainan Peng, Wei Ping, Yanqi Zhou |
GroupReduce: Block-Wise Low-Rank Approximation For Neural Language Model Shrinking | Patrick Chen, Si Si, Yang Li, Ciprian Chelba, Cho-Jui Hsieh |
Dialog-to-Action: Conversational Question Answering Over A Large-Scale Knowledge Base
The authors present an approach to map utterances in conversation to logical forms, which will be executed on a large-scale knowledge base. |
Daya Guo, Duyu Tang, Nan Duan, Ming Zhou, Jian Yin |
Generating Informative And Diverse Conversational Responses Via Adversarial Information Maximization
Responses generated by neural conversational models tend to lack informativeness and diversity. The authors present Adversarial Information Maximization (AIM), an adversarial learning framework that addresses these two related but distinct problems. |
Yizhe Zhang, Michel Galley, Jianfeng Gao, Zhe Gan, Xiujun Li, Chris Brockett, Bill Dolan |
Answerer In Questioner's Mind: Information Theoretic Approach To Goal-Oriented Visual Dialog | Sang-Woo Lee, Yu-Jung Heo, Byoung-Tak Zhang |
Trajectory Convolution For Action Recognition
How to leverage the temporal dimension is a key question in video analysis. In this work, the authors propose a new CNN architecture, TrajectoryNet, which incorporates trajectory convolution, a new operation for integrating features along the temporal dimension, to replace the existing temporal convolution. |
Yue Zhao, Yuanjun Xiong, Dahua Lin |
Video Prediction Via Selective Sampling
(1) Video prediction can be approached as a stochastic process. (2) Decoupling combined loss functions into dedicated sub-networks encourages them to work in a collaborative way. |
Jingwei Xu, Bingbing Ni, Xiaokang Yang |
Unsupervised Learning Of Artistic Styles With Archetypal Style Analysis
The authors introduce an unsupervised learning approach to automatically discover, summarize, and manipulate artistic styles from large collections of paintings. |
Daan Wynen, Cordelia Schmid, Julien Mairal |
Attacks Meet Interpretability: Attribute-steered Detection Of Adversarial Samples
Adversarial sample attacks perturb benign inputs to induce DNN misbehaviors. Therefore, the authors propose a novel adversarial sample detection technique for face recognition models, based on interpretability. |
Guanhong Tao, Shiqing Ma, Yingqi Liu, Xiangyu Zhang |
Speaker-Follower Models For Vision-and-Language Navigation
Navigation guided by natural language instructions presents a challenging reasoning problem for instruction followers. |
Daniel Fried, Ronghang Hu, Volkan Cirik, Anna Rohrbach, Jacob Andreas, LP Morency, Taylor Berg-Kirkpatrick, Kate Saenko, Dan Klein, Trevor Darrell |
Neural Code Comprehension: A Learnable Representation Of Code Semantics
With the recent success of embeddings in natural language processing, research has been conducted into applying similar methods to code analysis. Most works attempt to process the code directly or use a syntactic tree representation, treating it like sentences written in a natural language. |
Tal Ben-Nun, Alice Shoshana Jakobovits, Torsten Hoefler |
MiME: Multilevel Medical Embedding Of Electronic Health Records For Predictive Healthcare
Deep learning models exhibit state-of-the-art performance for many predictive healthcare tasks using electronic health records (EHR) data. The authors propose Multilevel Medical Embedding (MiME) which learns the multilevel embedding of EHR data while jointly performing auxiliary prediction tasks that rely on this inherent EHR structure without the need for external labels. |
Edward Choi, Cao Xiao, Walter Stewart, Jimeng Sun |
Distilled Wasserstein Learning For Word Embedding And Topic Modeling
The authors propose a novel Wasserstein method with a distillation mechanism, yielding joint learning of word embeddings and topics. |
Hongteng Xu, Wenlin Wang, Wei Liu, Lawrence Carin |
Learning To Optimize Tensor Programs
The authors introduce a learning-based framework to optimize tensor programs for deep learning workloads. |
Tianqi Chen, Lianmin Zheng, Eddie Yan, Ziheng Jiang, Thierry Moreau, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy |
Learning To Solve SMT Formulas
The authors present a new approach for learning to solve SMT formulas. |
Mislav Balunovic, Pavol Bielik, Martin Vechev |
Data Center Cooling Using Model-predictive Control
Despite impressive recent advances in reinforcement learning (RL), its deployment in real-world physical systems is often complicated by unexpected events, limited data, and the potential for expensive failures. In this paper, the authors describe an application of RL “in the wild” to the task of regulating temperatures and airflow inside a large-scale data center (DC). |
Nevena Lazic, Craig Boutilier, Tyler Lu, Eehern Wong, Binz Roy, MK Ryu, Greg Imwalle |
Bayesian Inference Of Temporal Task Specifications From Demonstrations
When observing task demonstrations, human apprentices are able to identify whether a given task is executed correctly long before they gain expertise in actually performing that task. Inspired by this, the authors present Bayesian specification inference, a probabilistic model for inferring task specification as a temporal logic formula. |
Ankit Shah, Pritish Kamath, Julie A Shah, Shen Li |
Training Deep Neural Networks With 8-bit Floating Point Numbers
The state-of-the-art hardware platforms for training deep neural networks are moving from traditional single precision (32-bit) computations towards 16 bits of precision - in large part due to the high energy efficiency and smaller bit storage associated with using reduced-precision representations. |
Naigang Wang, Jungwook Choi, Daniel Brand, Chia-Yu Chen, Kailash Gopalakrishnan |
Snap ML: A Hierarchical Framework For Machine Learning
The authors describe a new software framework for fast training of generalized linear models. |
Celestine Dünner, Thomas Parnell, Dimitrios Sarigiannis, Nikolas Ioannou, Andreea Anghel, Gummadi Ravi, Madhusudanan Kandasamy , Haris Pozidis |
Learning Filter Widths Of Spectral Decompositions With Wavelets
Time series classification using deep neural networks, such as convolutional neural networks (CNNs), operates on the spectral decomposition of the time series computed in a preprocessing step. The authors propose the wavelet deconvolution (WD) layer as an efficient alternative to this preprocessing step that eliminates a significant number of hyperparameters. |
Haidar Khan, Bulent Yener |
A Likelihood-Free Inference Framework For Population Genetic Data Using Exchangeable Neural Networks
An explosion of high-throughput DNA sequencing in the past decade has led to a surge of interest in population-scale inference with whole-genome data. Recent work in population genetics has centered on designing inference methods for relatively simple model classes, and few scalable general-purpose inference techniques exist for more realistic, complex models. |
Jeffrey Chan, Valerio Perrone, Jeffrey Spence, Paul Jenkins, Sara Mathieson, Yun Song |
Latent Gaussian Activity Propagation: Using Smoothness And Structure To Separate And Localize Sounds In Large Noisy Environments
The authors present an approach for simultaneously separating and localizing multiple sound sources using recorded microphone data. |
Daniel Johnson, Daniel Gorelik, Ross E Mawhorter, Kyle Suver, Weiqing Gu, Steven Xing, Cody Gabriel, Peter Sankhagowit |
Inferring Latent Velocities From Weather Radar Data Using Gaussian Processes
Archived data from the US network of weather radars hold detailed information about bird migration over the last 25 years, including very high-resolution partial measurements of velocity. This paper presents a Gaussian process (GP) model to reconstruct high-resolution full velocity fields across the entire US. |
Rico Angell, Dan Sheldon |
Bayesian Nonparametric Spectral Estimation
Spectral estimation (SE) aims to identify how the energy of a signal (e.g., a time series) is distributed across different frequencies. In this context, the authors propose a joint probabilistic model for signals, observations and spectra, where SE is addressed as an inference problem. |
Felipe Tobar |
Learning A Warping Distance From Unlabeled Time Series Using Sequence Autoencoders
Measuring similarities between unlabeled time series trajectories is an important problem in many domains such as medicine, economics, and vision. In this paper, the authors propose an end-to-end framework, autowarp, that optimizes and learns a good metric given unlabeled trajectories. |
Abubakar Abid, James Zou |
Precision And Recall For Time Series
Classical anomaly detection is principally concerned with point-based anomalies, those anomalies that occur at a single point in time. Motivated by this observation, the authors present a new mathematical model to evaluate the accuracy of time series classification algorithms. |
Nesime Tatbul, Tae Jun Lee, Stan Zdonik, Mejbah Alam, Justin Gottschlich |
Deep Generative Markov State Models
The authors propose a deep generative Markov State Model (DeepGenMSM) learning framework for inference of metastable dynamical systems and prediction of trajectories. |
Hao Wu, Andreas Mardt, Luca Pasquali, Frank Noe |
Doubly Robust Bayesian Inference For Non-Stationary Streaming Data With $\beta$-Divergences
The authors present the very first robust Bayesian Online Changepoint Detection algorithm through General Bayesian Inference (GBI) with $\beta$-divergences. |
Jeremias Knoblauch, Jack E Jewson, Theo Damoulas |
Regularization Learning Networks: Deep Learning For Tabular Datasets
Despite their impressive performance, Deep Neural Networks (DNNs) typically underperform Gradient Boosting Trees (GBTs) on many tabular-dataset learning tasks. The authors propose that applying a different regularization coefficient to each weight might boost the performance of DNNs by allowing them to make more use of the more relevant inputs. |
Ira Shavitt, Eran Segal |
Generative Modeling For Protein Structures
Analyzing the structure and function of proteins is a key part of understanding biology at the molecular and cellular level. In addition, a major engineering challenge is to design new proteins in a principled and methodical way. |
Namrata Anand, Possu Huang |
Geometry Based Data Generation
The authors propose a new type of generative model for high-dimensional data that learns a manifold geometry of the data, rather than density, and can generate points evenly along this manifold. |
Ofir Lindenbaum, Jay Stanley, Guy Wolf, Smita Krishnaswamy |
Learning Concave Conditional Likelihood Models For Improved Analysis Of Tandem Mass Spectra
The most widely used technology to identify the proteins present in a complex biological sample is tandem mass spectrometry, which quickly produces a large collection of spectra representative of the peptides (i.e., protein subsequences) present in the original sample. |
John T Halloran, David M Rocke |
Generalizing Tree Probability Estimation Via Bayesian Networks
Probability estimation is one of the fundamental tasks in statistics and machine learning. |
Cheng Zhang, Frederick A Matsen IV |
Learning Temporal Point Processes Via Reinforcement Learning
Social goods, such as healthcare, smart cities, and information networks, often produce ordered event data in continuous time. To alleviate the risk of model misspecification in MLE, the authors propose to generate samples from the generative model and monitor the quality of the samples in the process of training until the samples and the real data are indistinguishable. |
Shuang Li, Benjamin Xiao, Shixiang Zhu, Nan Du, Yao Xie, Le Song |
Exponentially Weighted Imitation Learning For Batched Historical Data
The authors consider deep policy learning with only batched historical trajectories. To solve this problem, the authors propose a monotonic advantage reweighted imitation learning strategy that is applicable to problems with complex nonlinear function approximation and works well with hybrid (discrete and continuous) action space. |
Qing Wang, Jiechao Xiong, Lei Han, Peng Sun, Han Liu, Tong Zhang |
Evidential Deep Learning To Quantify Classification Uncertainty
Deterministic neural nets have been shown to learn effective predictors on a wide range of machine learning problems. Unlike Bayesian neural nets, which indirectly infer prediction uncertainty through weight uncertainties, the authors propose to model prediction uncertainty explicitly using the theory of subjective logic. |
Murat Sensoy, Lance Kaplan, Melih Kandemir |
Explanations Based On The Missing: Towards Contrastive Explanations With Pertinent Negatives
In this paper the authors propose a novel method that provides contrastive explanations justifying the classification of an input by a black box classifier such as a deep neural network. |
Amit Dhurandhar, Pin-Yu Chen, Ronny Luss, Chun-Chen Tu, Paishun Ting, Karthikeyan Shanmugam, Payel Das |
Towards Robust Interpretability With Self-Explaining Neural Networks
Most recent work on interpretability of complex machine learning models has focused on estimating a-posteriori explanations for previously trained models around specific predictions. The authors propose three desiderata for explanations in general -- explicitness, faithfulness, and stability -- and show that existing methods do not satisfy them. |
David Alvarez Melis, Tommi Jaakkola |
Model Agnostic Supervised Local Explanations
Model interpretability is an increasingly important component of practical machine learning. One of the main challenges in interpretability is designing explanation systems that can capture aspects of each of these explanation types, in order to develop a more thorough understanding of the model. |
Gregory Plumb, Denali Molitor, Ameet Talwalkar |
To Trust Or Not To Trust A Classifier
The authors propose a new score, called the {\it trust score}, which measures the agreement between the classifier and a modified nearest-neighbor classifier on the testing example. |
Heinrich Jiang, Been Kim, Melody Guan, Maya Gupta |
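As a rough illustration of the idea (not the authors' exact construction, which additionally filters low-density points and uses a modified nearest-neighbor classifier), a trust score can be sketched as the ratio between the distance from a test point to the nearest training point of any other class and the distance to the nearest training point of the predicted class:

```python
import numpy as np

def trust_score(x, train_X, train_y, predicted_class):
    """Illustrative trust-score sketch.

    Ratio of (distance to nearest training point of any OTHER class) over
    (distance to nearest training point of the PREDICTED class). Higher
    values suggest the classifier agrees with its neighborhood.
    """
    dists = np.linalg.norm(train_X - x, axis=1)
    same = dists[train_y == predicted_class]
    other = dists[train_y != predicted_class]
    d_same = same.min() if same.size else np.inf
    d_other = other.min() if other.size else np.inf
    return d_other / d_same
```

A score well above 1 indicates the predicted class is also the class of the query's nearest training points; scores below 1 flag disagreement and thus predictions that may deserve less trust.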
Predict Responsibly: Improving Fairness And Accuracy By Learning To Defer
In many machine learning applications, there are multiple decision-makers involved, both automated and human. The authors propose a learning algorithm which accounts for potential biases held by external decision-makers in a system. |
David Madras, Toni Pitassi, Richard Zemel |
Hunting For Discriminatory Proxies In Linear Regression Models
A machine learning model may exhibit discrimination when used to make decisions involving people. In this paper the authors formulate a definition of proxy use for the setting of linear regression and present algorithms for detecting proxies. |
Samuel Yeom, Anupam Datta, Matt Fredrikson |
Empirical Risk Minimization Under Fairness Constraints
The authors address the problem of algorithmic fairness: ensuring that sensitive information does not unfairly influence the outcome of a classifier. The authors present an approach based on empirical risk minimization, which incorporates a fairness constraint into the learning problem. |
Michele Donini, Luca Oneto, Shai Ben-David, John Shawe-Taylor, Massimiliano Pontil |
Approximation Algorithms For Stochastic Clustering
The authors consider stochastic settings for clustering, and develop provably-good (approximation) algorithms for a number of these notions. |
David Harris, Shi Li, Aravind Srinivasan, Khoa Trinh, Thomas Pensyl |
Re-evaluating Evaluation
Progress in machine learning is measured by careful evaluation on problems of outstanding common interest. Deliberate or accidental cherry picking is increasingly likely, and designing well-balanced evaluation suites requires increasing effort. |
David Balduzzi, Karl Tuyls, Julien Perolat, Thore Graepel |
Does Mitigating ML's Impact Disparity Require Treatment Disparity?
Following precedent in employment discrimination law, two notions of disparity are widely discussed in papers on fairness and ML. In this paper, the authors show that: (i) when sensitive and (nominally) nonsensitive features are correlated, DLPs will indirectly implement treatment disparity, undermining the policy desiderata they are designed to address; (ii) when group membership is partly revealed by other features, DLPs induce within-class discrimination; and (iii) in general, DLPs provide suboptimal trade-offs between accuracy and impact parity. |
Zachary Lipton, Julian McAuley, Alexandra Chouldechova |
Enhancing The Accuracy And Fairness Of Human Decision Making
Societies often rely on human experts to take a wide variety of decisions affecting their members, from jail-or-release decisions taken by judges and stop-and-frisk decisions taken by police officers to accept-or-reject decisions taken by academics. |
Isabel Valera, Adish Singla, Manuel Gomez Rodriguez |
The Price Of Fair PCA: One Extra Dimension
The authors investigate whether the standard dimensionality reduction technique of PCA inadvertently produces data representations with different fidelity for two different populations. |
Samira Samadi, Tao (Uthaipon) Tantipongpipat, Jamie Morgenstern, Mohit Singh, Santosh Vempala |
Practical Methods For Graph Two-Sample Testing
Hypothesis testing for graphs has been an important tool in applied research fields for more than two decades, and still remains a challenging problem as one often needs to draw inference from few replicates of large graphs. |
Debarghya Ghoshdastidar, Ulrike Von Luxburg |
Topkapi: Parallel And Fast Sketches For Finding Top-K Frequent Elements
Identifying the top-K frequent items is one of the most common and important operations in large data processing systems. As a result, several solutions have been proposed to solve this problem approximately. |
Ankush Mandal, He Jiang, ANSHUMALI Shrivastava, Vivek Sarkar |
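For context, the exact (non-sketch) baseline that such approximate solutions compete with is a full frequency count, whose memory grows linearly in the number of distinct items (the cost that sketch-based methods like Topkapi avoid). A minimal illustration of this exact baseline, not the paper's algorithm:

```python
from collections import Counter

def topk_frequent_exact(stream, k):
    """Exact top-K frequent items via a full frequency table.

    Memory grows with the number of distinct items in the stream,
    which is precisely what bounded-memory sketches avoid.
    """
    return [item for item, count in Counter(stream).most_common(k)]
```

Sketch-based solutions trade this exactness for constant memory and mergeability across parallel workers.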
KDGAN: Knowledge Distillation With Generative Adversarial Networks
The authors propose a three-player game named KDGAN consisting of a classifier, a teacher, and a discriminator. |
Xiaojie Wang, Rui Zhang, Yu Sun, Jianzhong Qi |
Modeling Dynamic Missingness Of Implicit Feedback For Recommendation
Implicit feedback is widely used in collaborative filtering methods for recommendation. To model and exploit the dynamics of missingness, the authors propose a latent variable named "user intent" to govern the temporal changes of item missingness, and a hidden Markov model to represent such a process. |
Menghan Wang, Mingming Gong, Xiaolin Zheng, Kun Zhang |
Gamma-Poisson Dynamic Matrix Factorization Embedded With Metadata Influence
A conjugate Gamma-Poisson model for Dynamic Matrix Factorization incorporated with metadata influence (mGDMF for short) is proposed to effectively and efficiently model massive, sparse and dynamic data in recommendations. |
Trong Dinh Thac Do, Longbing Cao |
Non-metric Similarity Graphs For Maximum Inner Product Search
In this paper the authors address the problem of Maximum Inner Product Search (MIPS) that is currently the computational bottleneck in a large number of machine learning applications. The authors propose to solve the MIPS problem with the usage of similarity graphs, i.e., graphs where each vertex is connected to the vertices that are the most similar in terms of some similarity function. |
Stanislav Morozov, Artem Babenko |
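The MIPS objective itself is simple to state; the following exhaustive scan (an illustrative baseline, not the graph-based method of the paper) defines the answer that similarity-graph search approximates with far fewer inner-product evaluations:

```python
import numpy as np

def mips_bruteforce(query, database, k=1):
    """Exact Maximum Inner Product Search.

    Scores every database vector by its inner product with the query
    and returns the indices and scores of the k largest products.
    """
    scores = database @ query          # inner product with every item
    top = np.argsort(-scores)[:k]      # indices of the k largest products
    return top, scores[top]
```

Graph-based methods replace the full scan with a greedy walk over a precomputed neighbor graph, evaluating only a small fraction of the inner products.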
Norm-Ranging LSH For Maximum Inner Product Search
Neyshabur and Srebro proposed SIMPLE-LSH, which is the state-of-the-art hashing-based algorithm for maximum inner product search (MIPS). |
Xiao Yan, Jinfeng Li, Xinyan Dai, Hongzhi Chen, James Cheng |
A Dual Framework For Low-rank Tensor Completion
One of the popular approaches for low-rank tensor completion is to use the latent trace norm regularization. The experiments illustrate the efficacy of the proposed algorithm on several real-world datasets across applications. |
Madhav Nimishakavi, Pratik Kumar Jawanpuria, Bamdev Mishra |
Low-Rank Tucker Decomposition Of Large Tensors Using TensorSketch
The authors propose two randomized algorithms for low-rank Tucker decomposition of tensors. |
Osman Malik, Stephen Becker |
Dropping Symmetry For Fast Symmetric Nonnegative Matrix Factorization
Symmetric nonnegative matrix factorization (NMF)---a special but important class of the general NMF---is demonstrated to be useful for data analysis and in particular for various clustering tasks. Unfortunately, designing fast algorithms for Symmetric NMF is not as easy as for the nonsymmetric counterpart, the latter admitting the splitting property that allows efficient alternating-type algorithms. |
Zhihui Zhu, Xiao Li, Kai Liu, Qiuwei Li |
Semidefinite Relaxations For Certifying Robustness To Adversarial Examples
Despite their impressive performance on diverse tasks, neural networks fail catastrophically in the presence of adversarial inputs—imperceptibly but adversarially perturbed versions of natural inputs. In this paper, the authors propose a new semidefinite relaxation for certifying robustness that applies to arbitrary ReLU networks. |
Aditi Raghunathan, Jacob Steinhardt, Percy Liang |
Privacy Amplification By Subsampling: Tight Analyses Via Couplings And Divergences
Differential privacy comes equipped with multiple analytical tools for the design of private data analyses. |
Borja Balle, Gilles Barthe, Marco Gaboardi |
Differentially Private Testing Of Identity And Closeness Of Discrete Distributions
The authors study the fundamental problems of identity testing (goodness of fit), and closeness testing (two sample test) of distributions over $k$ elements, under differential privacy. |
Jayadev Acharya, Ziteng Sun, Huanyu Zhang |
Differentially Private K-Means With Constant Multiplicative Error
The authors design new differentially private algorithms for the Euclidean k-means problem, both in the centralized model and in the local model of differential privacy. |
Uri Stemmer, Haim Kaplan |
Local Differential Privacy For Evolving Data
There are now several large scale deployments of differential privacy used to collect statistical information about users. |
Matthew Joseph, Aaron Roth, Jonathan Ullman, Bo Waggoner |
Adversarial Attacks On Stochastic Bandits
The authors study adversarial attacks that manipulate the reward signals to control the actions chosen by a stochastic multi-armed bandit algorithm. |
Kwang-Sung Jun, Lihong Li, Yuzhe Ma, Jerry Zhu |
Distributed Learning Without Distress: Privacy-Preserving Empirical Risk Minimization
Distributed learning allows a group of independent data owners to collaboratively learn a model over their data sets without exposing their private data. The authors present a distributed learning approach that combines differential privacy with secure multi-party computation. |
Bargav Jayaraman, Lingxiao Wang, David Evans, Quanquan Gu |
A Spectral View Of Adversarially Robust Features
Given the apparent difficulty of learning models that are robust to adversarial perturbations, the authors propose tackling the simpler problem of developing adversarially robust features. |
Shivam Garg, Vatsal Sharan, Brian Zhang, Gregory Valiant |
Efficient Formal Safety Analysis Of Neural Networks
Neural networks are increasingly deployed in real-world safety-critical domains such as autonomous driving, aircraft collision avoidance, and malware detection. In this paper, the authors present a new efficient approach for rigorously checking different safety properties of neural networks that significantly outperforms existing approaches by multiple orders of magnitude. |
Shiqi Wang, Kexin Pei, Justin Whitehouse, Junfeng Yang, Suman Jana |
Contamination Attacks And Mitigation In Multi-Party Machine Learning
Machine learning is data hungry; the more data a model has access to in training, the more likely it is to perform well at inference time. |
Jamie Hayes, Olga Ohrimenko |
Explaining Deep Learning Models -- A Bayesian Non-parametric Approach
Understanding and interpreting how machine learning (ML) models make decisions has been a big challenge. While recent research has proposed various technical approaches to provide some clues as to how an ML model makes individual predictions, they cannot provide users with an ability to inspect a model as a complete entity. |
Wenbo Guo, Sui Huang, Yunzhe Tao, Xinyu (X.Y.) Xing, Lin Lin |
Data-Driven Clustering Via Parameterized Lloyd's Families
The design of algorithms for clustering points in metric spaces is a long-studied area of research. |
Maria-Florina Balcan, Travis Dick, Colin White |
Manifold Structured Prediction
Structured prediction provides a general framework to deal with supervised problems where the outputs have semantically rich structure. Specifically, the authors study a structured prediction approach to manifold-valued regression. |
Alessandro Rudi, Carlo Ciliberto, GianMaria Marconi, Lorenzo Rosasco |
Loss Surfaces, Mode Connectivity, And Fast Ensembling Of DNNs
The loss functions of deep neural networks are complex and their geometric properties are not well understood. The authors introduce a training procedure to discover these high-accuracy pathways between modes. |
Timur Garipov, Pavel Izmailov, Dmitrii Podoprikhin, Dmitry Vetrov, Andrew Wilson |
Masking: A New Perspective Of Noisy Supervision
It is important to learn various types of classifiers given training data with noisy labels. In this paper, the authors propose a human-assisted approach called 'Masking' that conveys human cognition of invalid class transitions and naturally speculates the structure of the noise transition matrix. |
Bo Han, Jiangchao Yao, Gang Niu, Mingyuan Zhou, Ivor Tsang, Ya Zhang, Masashi Sugiyama |
Supervising Unsupervised Learning
The authors introduce a framework to transfer knowledge acquired from a repository of (heterogeneous) supervised datasets to new unsupervised datasets. |
Vikas Garg, Adam Kalai |
One-Shot Unsupervised Cross Domain Translation | Sagie Benaim, Lior Wolf |
Neural Architecture Search With Bayesian Optimisation And Optimal Transport
Bayesian Optimisation (BO) refers to a class of methods for global optimisation of a function f which is only accessible via point evaluations. |
Kirthevasan Kandasamy, Willie Neiswanger, Jeff Schneider, Barnabas Poczos, Eric Xing |
Adapted Deep Embeddings: A Synthesis Of Methods For K-Shot Inductive Transfer Learning | Tyler Scott, Karl Ridgeway, Mike Mozer |
Completing State Representations Using Spectral Learning
A central problem in dynamical system modeling is state discovery—that is, finding a compact summary of the past that captures the information needed to predict the future. Predictive State Representations (PSRs) enable clever spectral methods for state discovery; however, while consistent in the limit of infinite data, these methods often suffer from poor performance in the low data regime. |
Nan Jiang, Alex Kulesza, Satinder Singh |
Data-Efficient Hierarchical Reinforcement Learning
Hierarchical reinforcement learning (HRL) is a promising approach to extend traditional reinforcement learning (RL) methods to solve more complex tasks. Yet, the majority of current HRL methods require careful task-specific design and on-policy training, making them difficult to apply in real-world scenarios. |
Ofir Nachum, Shixiang (Shane) Gu, Honglak Lee, Sergey Levine |
The Cluster Description Problem - Complexity Results, Formulations And Approximations
Consider the situation where you are given an existing $k$-way clustering $\pi$. |
Ian Davidson, Antoine Gourru, S Ravi |
Balanced Policy Evaluation And Learning
The authors present a new approach to the problems of evaluating and learning personalized decision policies from observational data of past contexts, decisions, and outcomes. |
Nathan Kallus |
Exponentiated Strongly Rayleigh Distributions
Strongly Rayleigh (SR) measures are discrete probability distributions over the subsets of a ground set. The authors introduce in this paper Exponentiated Strongly Rayleigh (ESR) measures, which sharpen (or smoothen) the negative dependence property of SR measures via a single parameter (the exponent) that can be intuitively understood as an inverse temperature. |
Zelda Mariet, Suvrit Sra, Stefanie Jegelka |
Parsimonious Bayesian Deep Networks
Combining Bayesian nonparametrics and a forward model selection strategy, the authors construct parsimonious Bayesian deep networks (PBDNs) that infer capacity-regularized network architectures from the data and require neither cross-validation nor fine-tuning when training the model. |
Mingyuan Zhou |
Stein Variational Gradient Descent As Moment Matching
Stein variational gradient descent (SVGD) is a non-parametric inference algorithm that evolves a set of particles to fit a given distribution of interest. |
Qiang Liu, Dilin Wang |
Hamiltonian Variational Auto-Encoder
Variational Auto-Encoders (VAE) have become very popular techniques to perform inference and learning in latent variable models as they allow us to leverage the rich representational power of neural networks to obtain flexible approximations of the posterior of latent variables as well as tight evidence lower bounds (ELBO). |
Anthony L Caterini, Arnaud Doucet, Dino Sejdinovic |
Predictive Approximate Bayesian Computation Via Saddle Points
Approximate Bayesian computation (ABC) is an important methodology for Bayesian inference when the likelihood function is intractable. In this paper, the authors introduce an optimization-based ABC framework that addresses these deficiencies. |
Yingxiang Yang, Bo Dai, Negar Kiyavash, Niao He |
Importance Weighting And Variational Inference
Recent work used importance sampling ideas for better variational bounds on likelihoods. |
Justin Domke, Dan Sheldon |
Orthogonally Decoupled Variational Gaussian Processes
Gaussian processes (GPs) provide a powerful non-parametric framework for reasoning over functions. Despite the appealing theory, their superlinear computational and memory complexities have presented a long-standing challenge. |
Hugh Salimbeni, Ching-An Cheng, Byron Boots, Marc Deisenroth |
ATOMO: Communication-efficient Learning Via Atomic Sparsification
Distributed model training suffers from communication overheads due to frequent gradient updates transmitted between compute nodes. To mitigate these overheads, several studies propose the use of sparsified stochastic gradients. |
Hongyi Wang, Scott Sievert, Shengchao Liu, Micky Charles, Dimitrios Papailiopoulos, Stephen Wright |
Sparsified SGD With Memory
Huge-scale machine learning problems are nowadays tackled by distributed optimization algorithms. Various recent works have proposed to use quantization or sparsification techniques to reduce the amount of data that needs to be communicated, for instance by only sending the most significant entries of the stochastic gradient (top-k sparsification). |
Sebastian Stich, Jean-Baptiste Cordonnier, Martin Jaggi |
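Top-k sparsification combined with a "memory" that feeds dropped entries back into later steps, as described above, can be sketched in a few lines (an illustrative single-worker simplification with names of my own choosing, not the authors' exact algorithm):

```python
import numpy as np

def topk_sparsify(g, k):
    """Zero out all but the k largest-magnitude entries of g."""
    out = np.zeros_like(g)
    idx = np.argpartition(np.abs(g), -k)[-k:]  # indices of k largest |g_i|
    out[idx] = g[idx]
    return out

def sparsified_step(grad, memory, k):
    """One communication step with error feedback.

    Adds back previously dropped mass, transmits only the top-k entries,
    and keeps the untransmitted remainder as the new memory.
    """
    corrected = grad + memory
    sent = topk_sparsify(corrected, k)
    return sent, corrected - sent  # new memory = what was not transmitted
```

The memory term is the key design choice: without it, small but persistent gradient coordinates would never be transmitted; with it, every coordinate is eventually sent.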
SEGA: Variance Reduction Via Gradient Sketching
The authors propose a novel randomized first order optimization method---SEGA (SkEtched GrAdient method)---which progressively throughout its iterations builds a variance-reduced estimate of the gradient from random linear measurements (sketches) of the gradient provided at each iteration by an oracle. |
Filip Hanzely, Konstantin Mishchenko, Peter Richtarik |
Non-monotone Submodular Maximization In Exponentially Fewer Iterations
In this paper the authors consider parallelization for applications whose objective can be expressed as maximizing a non-monotone submodular function under a cardinality constraint. |
Eric Balkanski, Adam Breuer, Yaron Singer |
Stochastic Primal-Dual Method For Empirical Risk Minimization With O(1) Per-Iteration Complexity
Regularized empirical risk minimization problem with linear predictor appears frequently in machine learning. In this paper, the authors propose a new stochastic primal-dual method to solve this class of problems. |
Conghui Tan, Tong Zhang, Shiqian Ma, Ji Liu |
Rest-Katyusha: Exploiting The Solution's Structure Via Scheduled Restart Schemes
The authors propose a structure-adaptive variant of the state-of-the-art stochastic variance-reduced gradient algorithm Katyusha for regularized empirical risk minimization. |
Junqi Tang, Mohammad Golbabaee, Francis Bach, Mike E Davies |
Inexact Trust-region Algorithms On Riemannian Manifolds
The authors consider an inexact variant of the popular Riemannian trust-region algorithm for structured big-data minimization problems. The proposed algorithm approximates the gradient and the Hessian in addition to the solution of a trust-region sub-problem. |
Hiroyuki Kasai, Bamdev Mishra |
On Markov Chain Gradient Descent
Stochastic gradient methods are the workhorse (algorithms) of large-scale optimization problems in machine learning, signal processing, and other computational sciences and engineering. To obtain these results, the authors introduce a new technique that varies the mixing levels of the Markov chains. |
Tao Sun, Yuejiao Sun, Wotao Yin |
Gradient Descent Meets Shift-and-Invert Preconditioning For Eigenvector Computation
Shift-and-invert preconditioning, as a classic acceleration technique for the leading eigenvector computation, has received much attention again recently, owing to fast least-squares solvers for efficiently approximating matrix inversions in power iterations. The authors present a novel convergence analysis for the constant step-size setting that achieves a rate of $\tilde{O}(\sqrt{\frac{\lambda_{1}}{\lambda_{1}-\lambda_{p+1}}})$, where $\lambda_{i}$ denotes the $i$-th largest eigenvalue of the given real symmetric matrix and $p$ is the multiplicity of $\lambda_{1}$. |
Zhiqiang Xu |
Global Non-convex Optimization With Discretized Diffusions
An Euler discretization of the Langevin diffusion is known to converge to the global minimizers of certain convex and non-convex optimization problems. This allows us to design diffusions suitable for globally optimizing convex and non-convex functions not covered by the existing Langevin theory. |
Murat A Erdogdu, Lester Mackey, Ohad Shamir |
A Theory On The Absence Of Spurious Solutions For Nonconvex And Nonsmooth Optimization
The authors study the set of continuous functions that admit no spurious local optima (i.e., local optima that are not globally optimal). |
Cedric Josz, Yi Ouyang, Richard Zhang, Javad Lavaei, Somayeh Sojoudi |
The Limit Points Of (Optimistic) Gradient Descent In Min-Max Optimization
Motivated by applications in Optimization, Game Theory, and the training of Generative Adversarial Networks, the convergence properties of first order methods in min-max problems have received extensive study. |
Constantinos Daskalakis, Ioannis Panageas |
Porcupine Neural Networks: Approximating Neural Network Landscapes
Neural networks have been used prominently in several machine learning and statistics applications. In particular, for two-layer neural networks the authors introduce Porcupine Neural Networks (PNNs) whose weight vectors are constrained to lie over a finite set of lines. |
Soheil Feizi, Hamid Javadi, Jesse Zhang, David Tse |
Adding One Neuron Can Eliminate All Bad Local Minima
One of the main difficulties in analyzing neural networks is the non-convexity of the loss function, which may have many bad local minima. In this paper, the authors study the landscape of neural networks for binary classification tasks. |
SHIYU LIANG, Ruoyu Sun, Jason Lee, R. Srikant |
Improving Explorability In Variational Inference With Annealed Variational Objectives | Chin-Wei Huang, Shawn Tan, Alexandre Lacoste, Aaron Courville |
Sequential Attend, Infer, Repeat: Generative Modelling Of Moving Objects
The authors present Sequential Attend, Infer, Repeat (SQAIR), an interpretable deep generative model for image sequences. It can reliably discover and track objects through the sequence; it can also conditionally generate future frames, thereby simulating expected motion of objects. |
Adam Kosiorek, Hyunjik Kim, Yee Whye Teh, Ingmar Posner |
Delta-encoder: An Effective Sample Synthesis Method For Few-shot Object Recognition
Learning to classify new categories based on just one or a few examples is a long-standing challenge in modern computer vision. In this work, the authors propose a simple yet effective method for few-shot (and one-shot) object recognition. |
Eli Schwartz, Leonid Karlinsky, Joseph Shtok, Sivan Harary, Mattias Marder, Abhishek Kumar, Rogerio Feris, Raja Giryes, Alex Bronstein |
Joint Active Feature Acquisition And Classification With Variable-Size Set Encoding | Hajin Shim, Sung Ju Hwang, Eunho Yang |
PCA Of High Dimensional Random Walks With Comparison To Neural Network Training | Joseph Antognini, Jascha Sohl-Dickstein |
Insights On Representational Similarity In Neural Networks With Canonical Correlation | Ari Morcos, Maithra Raghu, Samy Bengio |
Adversarial Vulnerability For Any Classifier
Despite achieving impressive performance, state-of-the-art classifiers remain highly vulnerable to small, imperceptible, adversarial perturbations. In this paper, the authors study the phenomenon of adversarial perturbations under the assumption that the data is generated with a smooth generative model. |
Alhussein Fawzi, Hamza Fawzi, Omar Fawzi |
Sanity Checks For Saliency Maps
Saliency methods have emerged as a popular tool to highlight features in an input deemed relevant for the prediction of a learned model. Several saliency methods have been proposed, often guided by visual appeal on image data. |
Julius Adebayo, Justin Gilmer, Michael Muelly, Ian Goodfellow, Moritz Hardt, Been Kim |
MetaGAN: An Adversarial Approach To Few-Shot Learning
In this paper, the authors propose a conceptually simple and general framework called MetaGAN for few-shot learning problems. |
Ruixiang ZHANG, Tong Che, Zoubin Ghahramani, Yoshua Bengio, Yangqiu Song |
Deep Generative Models With Learnable Knowledge Constraints
The broad set of deep generative models (DGMs) has achieved remarkable advances. |
Zhiting Hu, Zichao Yang , Russ Salakhutdinov, LIANHUI Qin, Xiaodan Liang, Haoye Dong, Eric Xing |
Learning Attractor Dynamics For Generative Memory
A central challenge faced by memory systems is the robust retrieval of a stored pattern in the presence of interference due to other stored patterns and noise. |
Yan Wu, Greg Wayne, Karol Gregor, Timothy Lillicrap |
Fast Deep Reinforcement Learning Using Online Adjustments From The Past
The authors propose Ephemeral Value Adjustments (EVA): a means of allowing deep reinforcement learning agents to rapidly adapt to experience in their replay buffer. EVA shifts the value predicted by a neural network with an estimate of the value function found by prioritised sweeping over experience tuples from the replay buffer near the current state. |
Steven Hansen, Alexander Pritzel, Pablo Sprechmann, Andre Barreto, Charles Blundell |
Blockwise Parallel Decoding For Deep Autoregressive Models
Deep autoregressive sequence-to-sequence models have demonstrated impressive performance across a wide variety of tasks in recent years. To overcome this limitation, the authors propose a novel blockwise parallel decoding scheme in which the authors make predictions for multiple time steps in parallel, then back off to the longest prefix validated by a scoring model. |
Mitchell Stern, Noam Shazeer, Jakob Uszkoreit |
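The propose-then-verify loop of blockwise parallel decoding can be illustrated with a toy sketch; `propose` and `verify` are hypothetical stand-ins for the parallel prediction heads and the greedy scoring model, not the paper's actual interfaces:

```python
def blockwise_decode(propose, verify, prefix, block_size, max_len):
    # propose(seq): candidate tokens for the next block_size positions.
    # verify(seq): the single next token the scoring model would emit greedily.
    out = list(prefix)
    while len(out) < max_len:
        candidates = propose(out)[:block_size]
        accepted = 0
        for tok in candidates:
            # Accept tok only if the scorer agrees with it at this position.
            if verify(out + candidates[:accepted]) != tok:
                break
            accepted += 1
        if accepted == 0:
            out.append(verify(out))  # fall back to the scorer's own token
        else:
            out.extend(candidates[:accepted])
    return out[:max_len]

# Toy models: the "true" next token is simply the current sequence length.
propose = lambda seq: [len(seq) + i for i in range(4)]
verify = lambda seq: len(seq)
decoded = blockwise_decode(propose, verify, [], block_size=4, max_len=6)
```

When the proposals agree with the scorer, up to `block_size` tokens are accepted per step; in the worst case the scheme degrades gracefully to one token per step.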
Automatic Program Synthesis Of Long Programs With A Learned Garbage Collector | Amit Zohar, Lior Wolf |
The Global Anchor Method For Quantifying Linguistic Shifts And Domain Adaptation
Language is dynamic, constantly evolving and adapting with respect to time, domain or topic. In this paper, the authors introduce the global anchor method for detecting corpus-level language shifts. |
Zi Yin, Vin Sachidananda, Balaji Prabhakar |
End-to-End Differentiable Physics For Learning And Control
The authors present a differentiable physics engine that can be integrated as a module in deep neural networks for end-to-end learning. |
Filipe De Avila Belbute-Peres, Kevin Smith, Kelsey Allen, Josh Tenenbaum, J. Zico Kolter |
Neural Arithmetic Logic Units
Neural networks can learn to represent and manipulate numerical information, but they seldom generalize well outside of the range of numerical values encountered during training. |
Andrew Trask, Felix Hill, Scott Reed, Jack Rae, Chris Dyer, Phil Blunsom |
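A minimal numpy sketch of the neural accumulator (NAC) cell, the building block of a NALU, shows how effective weights biased toward {-1, 0, 1} yield addition-like operations that extrapolate beyond the training range; the saturated parameter values below are illustrative:

```python
import numpy as np

def nac_forward(x, W_hat, M_hat):
    # Effective weights tanh(W_hat) * sigmoid(M_hat) saturate toward
    # {-1, 0, 1}, so the unit expresses near-exact sums and differences
    # rather than arbitrary linear maps.
    W = np.tanh(W_hat) * (1.0 / (1.0 + np.exp(-M_hat)))
    return x @ W.T

# With saturated parameters the unit computes a near-exact sum:
W_hat = np.array([[10.0, 10.0]])
M_hat = np.array([[10.0, 10.0]])
x = np.array([[3.0, 4.0]])
y = nac_forward(x, W_hat, M_hat)  # close to [[7.0]]
```

Because the computation is an (approximately) exact sum rather than a learned nonlinearity, the same parameters work for inputs far outside any training interval.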
Reinforced Continual Learning
Most artificial intelligence models are limited in their ability to solve new tasks faster, without forgetting previously acquired knowledge. In this work, a novel approach to continual learning is proposed, which searches for the best neural architecture for each incoming task via carefully designed reinforcement learning strategies. |
Ju Xu, Zhanxing Zhu |
Poison Frogs! Targeted Clean-Label Poisoning Attacks On Neural Networks | Ali Shafahi, Ronny Huang, Mahyar Najibi, Octavian Suciu, Christoph Studer, Tudor Dumitras, Tom Goldstein |
Generalizing To Unseen Domains Via Adversarial Data Augmentation | Riccardo Volpi, Hong Namkoong, Ozan Sener, John Duchi, Vittorio Murino, Silvio Savarese |
On The Local Hessian In Back-propagation
Back-propagation (BP) is the foundation for successfully training deep neural networks. Meanwhile, BP often works well when combined with "design tricks" like orthogonal initialization, batch normalization and skip connections. |
Huishuai Zhang, Wei Chen, Tie-Yan Liu |
Lipschitz-Margin Training: Scalable Certification Of Perturbation Invariance For Deep Neural Networks
High sensitivity of neural networks against malicious perturbations on inputs causes security concerns. From the relationship between the Lipschitz constants and prediction margins, the authors present a computationally efficient calculation technique to lower-bound the size of adversarial perturbations that can deceive networks, and that is widely applicable to various complicated networks. |
Yusuke Tsuzuku, Issei Sato, Masashi Sugiyama |
Tangent: Automatic Differentiation Using Source-code Transformation For Dynamically Typed Array Programming
The need to efficiently calculate first- and higher-order derivatives of increasingly complex models expressed in Python has stressed or exceeded the capabilities of available tools. The authors implement and demonstrate these ideas in the Tangent software library for Python, the first AD framework for a dynamic language that uses source-code transformation (SCT). |
Bart Van Merrienboer, Dan Moldovan, Alexander Wiltschko |
Simple, Distributed, And Accelerated Probabilistic Programming
The authors describe a simple, low-level approach for embedding probabilistic programming in a deep learning ecosystem. |
Dustin Tran, Matthew Hoffman, Dave Moore, Christopher Suter, Srinivas Vasudevan, Alexey Radul, Matthew Johnson, Rif A. Saurous |
Power-law Efficient Neural Codes Provide General Link Between Perceptual Bias And Discriminability | Michael Morais, Jonathan W Pillow |
DropBlock: A Regularization Method For Convolutional Networks
Deep neural networks often work well when they are over-parameterized and trained with a massive amount of noise and regularization, such as weight decay and dropout. In this paper, the authors introduce DropBlock, a form of structured dropout in which units in a contiguous region of a feature map are dropped together. |
Golnaz Ghiasi, Tsung-Yi Lin, Quoc V Le |
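A minimal single-channel sketch of DropBlock: sample block centers with a small probability gamma (following the paper's approximate formula), zero out a square around each center, and rescale the survivors. The masking and rescaling details below are illustrative, not the reference implementation:

```python
import numpy as np

def dropblock(x, block_size, drop_prob, rng):
    # x: a single (H, W) feature map. Centers are sampled Bernoulli(gamma)
    # inside the region where a full block fits, then each center zeroes
    # a block_size x block_size square; survivors are rescaled as in
    # inverted dropout.
    h, w = x.shape
    gamma = (drop_prob / block_size ** 2) * (h * w) / (
        (h - block_size + 1) * (w - block_size + 1))
    half = block_size // 2
    valid = np.zeros((h, w), dtype=bool)
    valid[half:h - half, half:w - half] = True  # centers must fit inside
    centers = (rng.random((h, w)) < gamma) & valid
    mask = np.ones((h, w))
    for i, j in zip(*np.nonzero(centers)):
        mask[i - half:i - half + block_size, j - half:j - half + block_size] = 0.0
    keep = mask.mean()
    return x * mask / max(keep, 1e-8)

rng = np.random.default_rng(0)
x = np.ones((8, 8))
out = dropblock(x, block_size=3, drop_prob=0.1, rng=rng)
```

Setting `block_size=1` recovers ordinary dropout; larger blocks remove spatially correlated activations that standard dropout leaves intact.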
Learning Sparse Neural Networks Via Sensitivity-driven Regularization
The ever-increasing number of parameters in deep neural networks poses challenges for memory-limited applications. The authors define a sensitivity measure for network parameters (i.e., their relevance to the network output) and introduce a regularization term that gradually lowers the absolute value of parameters with low sensitivity. |
Enzo Tartaglione, Skjalg Lepsøy, Attilio Fiandrotti, Gianluca Francini |
Critical Initialisation For Deep Signal Propagation In Noisy Rectifier Neural Networks | Arnu Pretorius, Elan Van Biljon, Steve Kroon, Herman Kamper |
The Streaming Rollout Of Deep Networks - Towards Fully Model-parallel Execution
Deep neural networks, and in particular recurrent networks, are promising candidates to control autonomous agents that interact in real-time with the physical world. In this study, the authors present a theoretical framework to describe rollouts and the level of model-parallelization they induce, and demonstrate differences in solving specific tasks. |
Volker Fischer, Jan Koehler, Thomas Pfeil |
The Spectrum Of The Fisher Information Matrix Of A Single-Hidden-Layer Neural Network
An important factor contributing to the success of deep learning has been the remarkable ability to optimize large neural networks using simple first-order optimization algorithms like stochastic gradient descent. In this work, the authors extend a recently-developed framework for studying spectra of nonlinear random matrices to characterize an important measure of curvature, namely the eigenvalues of the Fisher information matrix. |
Jeffrey Pennington, Pratik Worah |
Learning Optimal Reserve Price Against Non-myopic Bidders
The authors consider the problem of learning optimal reserve price in repeated auctions against non-myopic bidders, who may bid strategically in order to gain in future rounds even if the single-round auctions are truthful. The authors introduce algorithms that obtain small regret against non-myopic bidders either when the market is large, i.e., no bidder appears in a constant fraction of the rounds, or when the bidders are impatient, i.e., they discount future utility by some factor mildly bounded away from one. |
Jinyan Liu, Zhiyi Huang, Xiangning Wang |
Beyond Log-concavity: Provable Guarantees For Sampling Multi-modal Distributions Using Simulated Tempering Langevin Monte Carlo
A key task in Bayesian machine learning is sampling from distributions that are only specified up to a partition function (i.e., constant of proportionality). |
HOLDEN LEE, Andrej Risteski, Rong Ge |
On Binary Classification In Extreme Regions
In pattern recognition, a random label $Y$ is to be predicted based upon observing a random vector $X$ valued in $\mathbb{R}^d$ with $d>1$ by means of a classification rule with minimum probability of error. Beyond theoretical results, numerical experiments are presented in order to illustrate the relevance of the approach developed. |
Hamid JALALZAI, Stephan Clémençon, Anne Sabourin |
PAC-Bayes Bounds For Stable Algorithms With Instance-dependent Priors
PAC-Bayes bounds have been proposed to get risk estimates based on a training sample. |
Omar Rivasplata, Csaba Szepesvari, John Shawe-Taylor, Emilio Parrado-Hernandez, Shiliang Sun |
Improved Algorithms For Collaborative PAC Learning
The authors study a recent model of collaborative PAC learning where $k$ players with $k$ different tasks collaborate to learn a single classifier that works for all tasks. |
Huy Nguyen, Lydia Zakynthinou |
Data Amplification: A Unified And Competitive Approach To Property Estimation
Estimating properties of discrete distributions is a fundamental problem in statistical learning. The authors design the first unified, linear-time, competitive property estimator that, for a wide class of properties and for all underlying distributions, uses just $2n$ samples to achieve the performance attained by the empirical estimator with $n\sqrt{\log n}$ samples. |
Yi HAO, Alon Orlitsky, Ananda Theertha Suresh, Yihong Wu |
Mean-field Theory Of Graph Neural Networks In Graph Partitioning
A theoretical performance analysis of the graph neural network (GNN) is presented. |
Tatsuro Kawamoto, Masashi Tsubaki, Tomoyuki Obuchi |
Statistical Mechanics Of Low-rank Tensor Decomposition
Often, large, high-dimensional datasets collected across multiple modalities can be organized as a higher-order tensor. |
Jonathan Kadmon, Surya Ganguli |
Plug-in Estimation In High-Dimensional Linear Inverse Problems: A Rigorous Analysis
Estimating a vector $\mathbf{x}$ from noisy linear measurements $\mathbf{Ax+w}$ often requires the use of prior knowledge or structural constraints on $\mathbf{x}$ for accurate reconstruction. Several recent works have considered combining linear least-squares estimation with a generic or plug-in "denoiser" function that can be designed in a modular manner based on the prior knowledge about $\mathbf{x}$. |
Alyson Fletcher, Parthe Pandit, Sundeep Rangan, Subrata Sarkar, Phil Schniter |
The Description Length Of Deep Learning Models
Deep learning models often have more parameters than observations, and still perform well. This is sometimes described as a paradox. |
Léonard Blier, Yann Ollivier |
Deepcode: Feedback Codes Via Deep Learning
The design of codes for communicating reliably over a statistically well defined channel is an important endeavor involving deep mathematical research and wide-ranging practical applications. |
Hyeji Kim, Yihan Jiang, Sreeram Kannan, Sewoong Oh, Pramod Viswanath |
Binary Rating Estimation With Graph Side Information | Kwangjun Ahn, Kangwook Lee, Hyunseung Cha, Changho Suh |
Optimization Of Smooth Functions With Noisy Observations: Local Minimax Rates
The authors consider the problem of global optimization of an unknown non-convex smooth function with noisy zeroth-order feedback. The authors propose a local minimax framework to study the fundamental difficulty of optimizing smooth functions with adaptive function evaluations. |
Yining Wang, Sivaraman Balakrishnan, Aarti Singh |
Parameters As Interacting Particles: Long Time Convergence And Asymptotic Error Scaling Of Neural Networks
The performance of neural networks on high-dimensional data distributions suggests that it may be possible to parameterize a representation of a given high-dimensional function with controllably small errors, potentially outperforming standard interpolation methods. |
Grant Rotskoff, Eric Vanden-Eijnden |
Towards Understanding Acceleration Tradeoff Between Momentum And Asynchrony In Nonconvex Stochastic Optimization
Asynchronous momentum stochastic gradient descent algorithms (Async-MSGD) have been widely used in distributed machine learning, e.g., training large collaborative filtering systems and deep neural networks. Therefore, the authors propose to analyze the algorithm through a simpler but nontrivial nonconvex problem: streaming PCA. |
Tianyi Liu, Shiyang Li, Jianping Shi, Enlu Zhou, Tuo Zhao |
Asymptotic Optimality Of Adaptive Importance Sampling | François Portier, Bernard Delyon |
Inference Aided Reinforcement Learning For Incentive Mechanism Design In Crowdsourcing | Zehong Hu, Yitao Liang, Jie Zhang, Zhao Li, Yang Liu |
A Game-Theoretic Approach To Recommendation Systems With Strategic Content Providers
The authors introduce a game-theoretic approach to the study of recommendation systems with strategic content providers. |
Omer Ben-Porat, Moshe Tennenholtz |
A Mathematical Model For Optimal Decisions In A Representative Democracy
Direct democracy, where each voter casts one vote, fails when the average voter competence falls below 50%. Representative democracy, where voters choose representatives to vote, can be an elixir in such situations. |
Malik Magdon-Ismail, Lirong Xia |
Universal Growth In Production Economies
The authors study a simple variant of the von Neumann model of an expanding economy, in which multiple producers make goods according to their production function. |
Simina Branzei, Ruta Mehta, Noam Nisan |
Convex Elicitation Of Continuous Properties
A property or statistic of a distribution is said to be elicitable if it can be expressed as the minimizer of some loss function in expectation. Recent work shows that continuous real-valued properties are elicitable if and only if they are identifiable, meaning the set of distributions with the same property value can be described by linear constraints. |
Jessie Finocchiaro, Rafael Frongillo |
Contextual Pricing For Lipschitz Buyers
The authors investigate the problem of learning a Lipschitz function from binary feedback. The problem is motivated by \textit{contextual dynamic pricing}, where a firm must sell a stream of differentiated products to a collection of buyers with non-linear valuations for the items and observes only whether the item was sold or not at the posted price. |
Jieming Mao, Renato Leme, Jon Schneider |
Learning In Games With Lossy Feedback
The authors consider a game-theoretical multi-agent learning problem where the feedback information can be lost during the learning process and rewards are given by a broad class of games known as variationally stable games. The authors propose a simple variant of the classical online gradient descent algorithm, called reweighted online gradient descent (ROGD), and show that in variationally stable games, if each agent adopts ROGD, then almost sure convergence to the set of Nash equilibria is guaranteed, even when the feedback loss is asynchronous and arbitrarily correlated among agents. |
Zhengyuan Zhou, Panayotis Mertikopoulos, Susan Athey, Nicholas Bambos, Peter W Glynn, Yinyu Ye |
Multiplicative Weights Updates With Constant Step-Size In Graphical Constant-Sum Games | Yun Kuen Cheung |
Solving Large Sequential Games With The Excessive Gap Technique
There has been tremendous recent progress on equilibrium-finding algorithms for zero-sum imperfect-information extensive-form games, but there has been a puzzling gap between theory and practice. Experiments with first-order methods have only been conducted on small- and medium-sized games because those methods are complicated to implement in this setting, and because CFR variants have been enhanced extensively for over a decade, they perform well in practice. |
Christian Kroer, Gabriele Farina, Tuomas Sandholm |
Practical Exact Algorithm For Trembling-hand Equilibrium Refinements In Games
Nash equilibrium strategies have the known weakness that they do not prescribe rational play in situations that are reached with zero probability according to the strategies themselves, for example, if players have made mistakes. In this paper, the authors design an exact polynomial-time algorithm for finding trembling-hand equilibria in zero-sum extensive-form games. |
Gabriele Farina, Nicola Gatti, Tuomas Sandholm |
Improving Online Algorithms Via ML Predictions
In this work the authors study the problem of using machine-learned predictions to improve performance of online algorithms. |
Manish Purohit, Zoya Svitkina, Ravi Kumar |
Variance-Reduced Stochastic Gradient Descent On Streaming Data
The authors present an algorithm STRSAGA for efficiently maintaining a machine learning model over data points that arrive over time, quickly updating the model as new training data is observed. |
Ellango Jothimurugesan, Ashraf Tahmasbi, Phillip Gibbons, Srikanta Tirthapura |
Regret Bounds For Robust Adaptive Control Of The Linear Quadratic Regulator
The authors consider adaptive control of the Linear Quadratic Regulator (LQR), where an unknown linear system is controlled subject to quadratic costs. Leveraging recent developments in the estimation of linear systems and in robust controller synthesis, the authors present the first provably polynomial-time algorithm that achieves sub-linear regret on this problem. |
Sarah Dean, Horia Mania, Nikolai Matni, Benjamin Recht, Stephen Tu |
PAC-learning In The Presence Of Adversaries
The existence of evasion attacks during the test phase of machine learning algorithms represents a significant challenge to both their deployment and understanding. |
Daniel Cullina, Arjun Nitin Bhagoji, Prateek Mittal |
Tight Bounds For Collaborative PAC Learning Via Multiplicative Weights
The authors study the collaborative PAC learning problem recently proposed in Blum et al. [BHPQ17], in which $k$ players want to learn a target function collaboratively, such that the learned function approximates the target function well on all players' distributions simultaneously. |
Jiecao Chen, Qin Zhang, Yuan Zhou |
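The multiplicative-weights idea in the title can be illustrated with a generic update that up-weights the players whose current classifier performs worst, so later rounds focus on the hardest distributions. This is a generic sketch of multiplicative weights, not the paper's exact scheme:

```python
import math

def mw_update(weights, losses, eta=1.0):
    # Generic multiplicative-weights step: a player with higher loss
    # under the current classifier gets exponentially larger weight,
    # then all weights are renormalized to sum to one.
    new = [w * math.exp(eta * l) for w, l in zip(weights, losses)]
    total = sum(new)
    return [w / total for w in new]

w = mw_update([0.5, 0.5], losses=[1.0, 0.0])
# player 0 (higher loss) now carries more weight than player 1
```

Iterating this update drives the sampling distribution toward the players the current hypothesis serves worst, which is the mechanism behind the tight sample-complexity bounds.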
Understanding Weight Normalized Deep Neural Networks With Rectified Linear Units
This paper presents a general framework for norm-based capacity control for $L_{p,q}$ weight normalized deep neural networks. |
Yixi Xu, Xiao Wang |
Generalization Bounds For Uniformly Stable Algorithms
Uniform stability of a learning algorithm is a classical notion of algorithmic stability introduced to derive high-probability bounds on the generalization error (Bousquet and Elisseeff, 2002). |
Vitaly Feldman, Jan Vondrak |
Minimax Statistical Learning With Wasserstein Distances
As opposed to standard empirical risk minimization (ERM), distributionally robust optimization aims to minimize the worst-case risk over a larger ambiguity set containing the original empirical distribution of the training data. In this work, the authors describe a minimax framework for statistical learning with ambiguity sets given by balls in Wasserstein space. |
Jaeho Lee, Maxim Raginsky |
Differentially Private Uniformly Most Powerful Tests For Binomial Data
The authors derive uniformly most powerful (UMP) tests for simple and one-sided hypotheses for a population proportion within the framework of Differential Privacy (DP), optimizing finite sample performance. |
Jordan Awan, Aleksandra Slavković |
Sketching Method For Large Scale Combinatorial Inference
The authors present computationally efficient algorithms to test various combinatorial structures of large-scale graphical models. |
Will Wei Sun, Junwei Lu, Han Liu |
An Improved Analysis Of Alternating Minimization For Structured Multi-Response Regression
Multi-response linear models aggregate a set of vanilla linear models by assuming correlated noise across them, which has an unknown covariance structure. In this work, the authors present a resampling-free analysis for the alternating minimization algorithm applied to the multi-response regression. |
Sheng Chen, Arindam Banerjee |
MixLasso: Generalized Mixed Regression Via Convex Atomic-Norm Regularization
The authors consider a generalization of mixed regression where the response is an additive combination of several mixture components. In this work, the authors study a novel convex estimator \emph{MixLasso} for the estimation of generalized mixed regression, based on an atomic norm specifically constructed to regularize the number of mixture components. |
Ian En-Hsu Yen, Wei-Cheng Lee, Kai Zhong, Sung-En Chang, Pradeep Ravikumar, Shou-De Lin |
A Theory-Based Evaluation Of Nearest Neighbor Models Put Into Practice
In the $k$-nearest neighborhood model ($k$-NN), the authors are given a set of points $P$ and answer queries $q$ by returning the $k$ nearest neighbors of $q$ in $P$ according to some metric. Many $k$-NN algorithms have been published and implemented, but often the relation between parameters and the accuracy of the computed $k$-NN is not explicit. |
Hendrik Fichtenberger, Dennis Rohde |
Sharp Bounds For Generalized Uniformity Testing
The authors study the problem of generalized uniformity testing of a discrete probability distribution: Given samples from a probability distribution p over an unknown size discrete domain Ω, the authors want to distinguish, with probability at least 2/3, between the case that p is uniform on some subset of Ω versus ε-far, in total variation distance, from any such uniform distribution. |
Ilias Diakonikolas, Daniel M. Kane, Alistair Stewart |
Testing For Families Of Distributions Via The Fourier Transform
The authors study the general problem of testing whether an unknown discrete distribution belongs to a specified family of distributions. |
Alistair Stewart, Ilias Diakonikolas, Clement Canonne |
High Dimensional Linear Regression Using Lattice Basis Reduction
The authors consider a high-dimensional linear regression problem where the goal is to efficiently recover an unknown vector $\beta^*$ from $n$ noisy linear observations $Y = X\beta^* + W \in \mathbb{R}^n$, for known $X \in \mathbb{R}^{n \times p}$ and unknown $W \in \mathbb{R}^n$. The authors propose a new polynomial-time algorithm for this task which is based on the seminal Lenstra-Lenstra-Lovasz (LLL) lattice basis reduction algorithm. |
Ilias Zadik, David Gamarnik |
$\ell_1$-regression With Heavy-tailed Distributions
In this paper, the authors consider the problem of linear regression with heavy-tailed distributions. To address the challenge that both the input and output could be heavy-tailed, the authors propose a truncated minimization problem and demonstrate that it enjoys an $O(\sqrt{d/n})$ excess risk, where $d$ is the dimensionality and $n$ is the number of samples. |
Lijun Zhang, Zhi-Hua Zhou |
Constant Regret, Generalized Mixability, And Mirror Descent
The authors consider the setting of prediction with expert advice; a learner makes predictions by aggregating those of a group of experts. |
Zakaria Mhammedi, Robert Williamson |
Stochastic Composite Mirror Descent: Optimal Bounds With High Probabilities
The authors study stochastic composite mirror descent, a class of scalable algorithms able to exploit the geometry and composite structure of a problem. |
Yunwen Lei, Ke Tang |
Uniform Convergence Of Gradients For Non-Convex Learning And Optimization
The authors investigate 1) the rate at which refined properties of the empirical risk---in particular, gradients---converge to their population counterparts in standard non-convex learning tasks, and 2) the consequences of this convergence for optimization. The authors propose vector-valued Rademacher complexities as a simple, composable, and user-friendly tool to derive dimension-free uniform convergence bounds for gradients in non-convex learning problems. |
Dylan Foster, Ayush Sekhari, Karthik Sridharan |
Fast Rates Of ERM And Stochastic Approximation: Adaptive To Error Bound Conditions
Error bound conditions (EBC) are properties that characterize the growth of an objective function when a point is moved away from the optimal set. |
Mingrui Liu, Xiaoxuan Zhang, Lijun Zhang, Jing Rong, Tianbao Yang |
Nearly Tight Sample Complexity Bounds For Learning Mixtures Of Gaussians Via Sample Compression Schemes
The authors prove that Θ(k d^2 / ε^2) samples are necessary and sufficient for learning a mixture of k Gaussians in R^d, up to error ε in total variation distance. |
Hassan Ashtiani, Shai Ben-David, Nick Harvey, Christopher Liaw, Abbas Mehrabian, Yaniv Plan |
Early Stopping For Nonparametric Testing
Early stopping of iterative algorithms is an algorithmic regularization method to avoid over-fitting in estimation and classification. |
Meimei Liu, Guang Cheng |
Chaining Mutual Information And Tightening Generalization Bounds
Bounding the generalization error of learning algorithms has a long history, which yet falls short in explaining various generalization successes including those of deep learning. In this paper, the authors introduce a technique to combine chaining and mutual information methods, to obtain a generalization bound that is both algorithm-dependent and that exploits the dependencies between the hypotheses. |
Amir Asadi, Emmanuel Abbe, Sergio Verdu |
Dimensionality Reduction Has Quantifiable Imperfections: Two Geometric Bounds
In this paper, the authors investigate dimensionality reduction (DR) maps in an information retrieval setting from a quantitative topology point of view. The authors therefore propose a new measure based on the Wasserstein distance that comes with a similar theoretical guarantee. |
Kry Lui, Gavin Weiguang Ding, Ruitong Huang, Robert McCann |
Minimax Estimation Of Neural Net Distance
An important class of distance metrics proposed for training generative adversarial networks (GANs) is the integral probability metric (IPM), in which the neural net distance captures the practical GAN training via two neural networks. |
Kaiyi Ji, Yingbin Liang |
Quantifying Learning Guarantees For Convex But Inconsistent Surrogates
The authors study consistency properties of machine learning methods based on minimizing convex surrogates. |
Kirill Struminsky, Simon Lacoste-Julien, Anton Osokin |
Learning Signed Determinantal Point Processes Through The Principal Minor Assignment Problem
Symmetric determinantal point processes (DPP) are a class of probabilistic models that encode the random selection of items that have a repulsive behavior. |
Victor-Emmanuel Brunel |
Data-dependent PAC-Bayes Priors Via Differential Privacy
The Probably Approximately Correct (PAC) Bayes framework (McAllester, 1999) can incorporate knowledge about the learning algorithm and (data) distribution through the use of distribution-dependent priors, yielding tighter generalization bounds on data-dependent posteriors. Using this flexibility, however, is difficult, especially when the data distribution is presumed to be unknown. |
Gintare Karolina Dziugaite, Dan Roy |
Computationally And Statistically Efficient Learning Of Causal Bayes Nets Using Path Queries
Causal discovery from empirical data is a fundamental problem in many scientific domains. In this paper, the authors first propose a polynomial-time algorithm for learning the exact correctly-oriented structure of the transitive reduction of any causal Bayesian network with high probability, by using interventional path queries. |
Kevin Bello, Jean Honorio |
PAC-Bayes Tree: Weighted Subtrees With Guarantees
The authors present a weighted-majority classification approach over subtrees of a fixed tree, which provably achieves excess-risk of the same order as the best tree-pruning. |
Stan D Nguyen, Samory Kpotufe |
A Loss Framework For Calibrated Anomaly Detection
Given samples from a probability distribution, anomaly detection is the problem of determining if a given point lies in a low-density region. |
Aditya Menon, Robert Williamson |
On Oracle-Efficient PAC RL With Rich Observations
The authors study the computational tractability of PAC reinforcement learning with rich observations. |
Christoph Dann, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert Schapire |
Adversarial Risk And Robustness: General Definitions And Implications For The Uniform Distribution
The authors study adversarial perturbations when the instances are uniformly distributed over {0,1}^n. |
Dimitrios Diochnos, Saeed Mahloujifar, Mohammad Mahmoody |
Learning From Discriminative Feature Feedback
The authors consider the problem of learning a multi-class classifier from labels as well as simple explanations that the authors call "discriminative features". The authors present an efficient online algorithm for learning from such feedback and give tight bounds on the number of mistakes made during the learning process. |
Sanjoy Dasgupta, Sivan Sabato, Nick Roberts, Akansha Dey |
How To Start Training: The Effect Of Initialization And Architecture
The authors identify and study two common failure modes for early training in deep ReLU nets. |
Boris Hanin, David Rolnick |
Bilevel Distance Metric Learning For Robust Image Recognition
Metric learning, aiming to learn a discriminative Mahalanobis distance matrix M that can effectively reflect the similarity between data samples, has been widely studied in various image recognition problems. In this paper, the authors integrate both feature extraction and metric learning into one joint optimization framework and propose a new bilevel distance metric learning model. |
Jie Xu, Lei Luo, Cheng Deng, Heng Huang |
Deep Non-Blind Deconvolution Via Generalized Low-Rank Approximation
In this paper, the authors present a deep convolutional neural network to capture the inherent properties of image degradation, which can handle different kernels and saturated pixels in a unified framework. |
Wenqi Ren, Jiawei Zhang, Lin Ma, Jinshan Pan, Xiaochun Cao, Wangmeng Zuo, Wei Liu, Ming-Hsuan Yang |
Unsupervised Depth Estimation, 3D Face Rotation And Replacement
The authors present an unsupervised approach for learning to estimate three dimensional (3D) facial structure from a single image while also predicting 3D viewpoint transformations that match a desired pose and facial geometry. The authors achieve this by inferring the depth of facial keypoints of an input image in an unsupervised manner, without using any form of ground-truth depth information. |
Joel Ruben Antony Moniz, Christopher Beckham, Simon Rajotte, Sina Honari, Chris Pal |
Neighbourhood Consensus Networks
The authors address the problem of finding reliable dense correspondences between a pair of images. |
Ignacio Rocco, Mircea Cimpoi, Relja Arandjelović, Akihiko Torii, Tomas Pajdla, Josef Sivic |
Recurrent Transformer Networks For Semantic Correspondence
The authors present recurrent transformer networks (RTNs) for obtaining dense correspondences between semantically similar images. |
Seungryong Kim, Stephen Lin, SANG RYUL JEON, Dongbo Min, Kwanghoon Sohn |
Deep Network For The Integrated 3D Sensing Of Multiple People In Natural Images
The authors present MubyNet -- a feed-forward, multitask, bottom up system for the integrated localization, as well as 3d pose and shape estimation, of multiple people in monocular images. |
Andrei Zanfir, Elisabeta Marinoiu, Mihai Zanfir, Alin-Ionut Popa, Cristian Sminchisescu |
A Neural Compositional Paradigm For Image Captioning
Mainstream captioning models often follow a sequential structure to generate captions, leading to issues such as the introduction of irrelevant semantics, lack of diversity in the generated captions, and inadequate generalization performance. In this paper, the authors present an alternative paradigm for image captioning, which factorizes the captioning procedure into two stages: (1) extracting an explicit semantic representation from the given image; and (2) constructing the caption based on a recursive compositional procedure in a bottom-up manner. |
Bo Dai, Sanja Fidler, Dahua Lin |
Visual Memory For Robust Path Following
Humans routinely retrace a path in a novel environment both forwards and backwards despite uncertainty in their motion. In this paper, the authors present an approach for doing so. |
Ashish Kumar, Saurabh Gupta, David Fouhey, Sergey Levine, Jitendra Malik |
Learning To Exploit Stability For 3D Scene Parsing
Human scene understanding uses a variety of visual and non-visual cues to perform inference on object types, poses, and relations. The authors present a novel architecture for 3D scene parsing named Prim R-CNN, which learns to predict bounding boxes as well as their 3D size, translation, and rotation. |
Yilun Du, Zhijian Liu, Hector Basevi, Ales Leonardis, Bill Freeman, Josh Tenenbaum, Jiajun Wu |
Learning To Decompose And Disentangle Representations For Video Prediction | Jun-Ting Hsieh, Bingbin Liu, De-An Huang, Li Fei-Fei, Juan Carlos Niebles |
Weakly Supervised Dense Event Captioning In Videos
Dense event captioning aims to detect and describe all events of interest contained in a video. |
Xuguang Duan, Wenbing Huang, Chuang Gan, Jingdong Wang, Wenwu Zhu, Junzhou Huang |
Text-Adaptive Generative Adversarial Networks: Manipulating Images With Natural Language
This paper addresses the problem of manipulating images using natural language description. In this paper, the authors propose the text-adaptive generative adversarial network (TAGAN) to generate semantically manipulated images while preserving text-irrelevant contents. |
Seonghyeon Nam, Yunji Kim, Seon Joo Kim |
A Probabilistic U-Net For Segmentation Of Ambiguous Images
Many real-world vision problems suffer from inherent ambiguities. To this end, the authors propose a generative segmentation model based on a combination of a U-Net with a conditional variational autoencoder that is capable of efficiently producing an unlimited number of plausible hypotheses. |
Simon Kohl, Bernardino Romera-Paredes, Clemens Meyer, Jeffrey De Fauw, Joseph R. Ledsam, Klaus Maier-Hein, Ali Eslami, Danilo Jimenez Rezende, Olaf Ronneberger |
Forward Modeling For Partial Observation Strategy Games - A StarCraft Defogger
The authors formulate the problem of defogging as state estimation and future state prediction from previous, partial observations in the context of real-time strategy games. The authors propose to employ encoder-decoder neural networks for this task, and introduce proxy tasks and baselines for evaluation to assess their ability to capture basic game rules and high-level dynamics. |
Gabriel Synnaeve, Zeming Lin, Jonas Gehring, Dan Gant, Vegard Mella, Vasil Khalidov, Nicolas Carion, Nicolas Usunier |
Adversarial Text Generation Via Feature-Mover's Distance
Generative adversarial networks (GANs) have achieved significant success in generating real-valued data. Instead of using the standard GAN objective, the authors propose to improve text-generation GAN via a novel approach inspired by optimal transport. |
Liqun Chen, Shuyang Dai, Chenyang Tao, Haichao Zhang, Zhe Gan, Dinghan Shen, Yizhe Zhang, Guoyin Wang, Ruiyi Zhang, Lawrence Carin |
Virtual Class Enhanced Discriminative Embedding Learning
Recently, learning discriminative features to improve recognition performance has gradually become a primary goal of deep learning, and numerous remarkable works have emerged. In this paper, the authors propose a novel yet extremely simple method, Virtual Softmax, to enhance the discriminative property of learned features by injecting a dynamic virtual negative class into the original softmax. |
Binghui Chen, Weihong Deng, Haifeng Shen |
Learning Pipelines With Limited Data And Domain Knowledge: A Study In Parsing Physics Problems
As machine learning becomes more widely used in practice, the authors need new methods to build complex intelligent systems that integrate learning with existing software, and with domain knowledge encoded as rules. As a case study, the authors present such a system that learns to parse Newtonian physics problems in textbooks. |
Mrinmaya Sachan, Kumar Avinava Dubey, Tom Mitchell, Dan Roth, Eric Xing |
Pipe-SGD: A Decentralized Pipelined SGD Framework For Distributed Deep Net Training
Distributed training of deep nets is an important technique to address some of the present day computing challenges like memory consumption and computational demands. |
Youjie Li, Mingchao Yu, Songze Li, Salman Avestimehr, Nam Sung Kim, Alex Schwing |
MULAN: A Blind And Off-Grid Method For Multichannel Echo Retrieval
This paper addresses the general problem of blind echo retrieval, i.e., given M sensors measuring in the discrete-time domain M mixtures of K delayed and attenuated copies of an unknown source signal, can the echo location and weights be recovered? |
Helena Peic Tukuljac, Antoine Deleforge, Remi Gribonval |
Diminishing Returns Shape Constraints For Interpretability And Regularization
The authors investigate machine learning models that can provide diminishing returns and accelerating returns guarantees to capture prior knowledge or policies about how outputs should depend on inputs. |
Maya Gupta, Dara Bahri, Andy Cotter, Kevin Canini |
Fairness Through Computationally-Bounded Awareness
The authors study the problem of fair classification within the versatile framework of Dwork et al. |
Michael Kim, Omer Reingold, Guy Rothblum |
On Preserving Non-discrimination When Combining Expert Advice
The authors study the interplay between sequential decision making and avoiding discrimination against protected groups, when examples arrive online and do not follow distributional assumptions. |
Avrim Blum, Suriya Gunasekar, Thodoris Lykouris, Nati Srebro |
Fairness Behind A Veil Of Ignorance: A Welfare Analysis For Automated Decision Making
The authors draw attention to an important, yet largely overlooked aspect of evaluating fairness for automated decision making systems---namely, risk and welfare considerations. |
Hoda Heidari, Claudio Ferrari, Krishna Gummadi, Andreas Krause |
Inequity Aversion Improves Cooperation In Intertemporal Social Dilemmas
Groups of humans are often able to find ways to cooperate with one another in complex, temporally extended social dilemmas. |
Edward Hughes, Joel Leibo, Matthew Phillips, Karl Tuyls, Edgar Dueñez-Guzman, Antonio García Castañeda, Iain Dunning, Tina Zhu, Kevin McKee, Raphael Koster, Heather Roff, Thore Graepel |
On Misinformation Containment In Online Social Networks
Widespread online misinformation can cause public panic and serious economic damage. Motivated by realistic scenarios, the authors present the first analysis of the misinformation containment problem for the case when an arbitrary number of cascades are allowed. |
Amo Tong, Ding-Zhu Du, Weili Wu |
Found Graph Data And Planted Vertex Covers
A typical way in which network data is recorded is to measure all interactions involving a specified set of core nodes, which produces a graph containing this core together with a potentially larger set of fringe nodes that link to the core.Interactions between nodes in the fringe, however, are not present in the resulting graph data. |
Austin Benson, Jon Kleinberg |
Inferring Networks From Random Walk-Based Node Similarities
Digital presence in the world of online social media entails significant privacy risks. |
Jeremy Hoskins, Cameron Musco, Christopher Musco, Babis Tsourakakis |
Fast Greedy MAP Inference For Determinantal Point Process To Improve Recommendation Diversity
The determinantal point process (DPP) is an elegant probabilistic model of repulsion with applications in various machine learning tasks including summarization and search. To overcome the computational challenge, in this paper, the authors propose a novel algorithm to greatly accelerate greedy MAP inference for DPPs. |
Laming Chen, Guoxin Zhang, Eric Zhou |
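As background, a minimal sketch of the standard greedy MAP heuristic that the accelerated algorithm speeds up, assuming a PSD kernel matrix `L` (the function and variable names are illustrative, not from the paper; this naive version recomputes log-determinants rather than using the paper's incremental updates):

```python
import numpy as np

def greedy_dpp_map(L, k):
    """Standard greedy MAP heuristic for a DPP with PSD kernel L:
    repeatedly add the item with the largest log-determinant gain."""
    n = L.shape[0]
    selected = []
    for _ in range(k):
        # log det of the currently selected principal submatrix
        base = np.linalg.slogdet(L[np.ix_(selected, selected)])[1] if selected else 0.0
        best_j, best_gain = None, 0.0
        for j in range(n):
            if j in selected:
                continue
            idx = selected + [j]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            gain = (logdet - base) if sign > 0 else -np.inf
            if gain > best_gain:
                best_gain, best_j = gain, j
        if best_j is None:  # no item yields a positive gain
            break
        selected.append(best_j)
    return selected
```

The paper's contribution is making each greedy step far cheaper; the selection rule itself is the one shown above.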
Unorganized Malicious Attacks Detection
Recommender systems have attracted much attention during the past decade. Unorganized malicious attacks occur in many real applications, yet relevant study remains open. |
Ming Pang, Wei Gao, Min Tao, Zhi-Hua Zhou |
Scalable Robust Matrix Factorization With Nonconvex Loss
Robust matrix factorization (RMF), which uses the $\ell_1$-loss, often outperforms standard matrix factorization using the $\ell_2$-loss, particularly when outliers are present. |
Quanming Yao, James Kwok |
Differentially Private Robust Low-Rank Approximation
In this paper, the authors study the following robust low-rank matrix approximation problem: given a matrix $A \in \mathbb{R}^{n \times d}$, find a rank-$k$ matrix $B$, while satisfying differential privacy, such that $\| A - B \|_p \leq \alpha \, \mathsf{OPT}_k(A) + \tau$, where $\| M \|_p$ is the entry-wise $\ell_p$-norm and $\mathsf{OPT}_k(A) := \min_{\mathsf{rank}(X) \leq k} \| A - X \|_p$. |
Raman Arora, Vladimir Braverman, Jalaj Upadhyay |
Differential Privacy For Growing Databases
The large majority of differentially private algorithms focus on the static setting, where queries are made on an unchanging database. |
Rachel Cummings, Sara Krehbiel, Kevin A Lai, Tao (Uthaipon) Tantipongpipat |
Efficient Neural Network Robustness Certification With General Activation Functions
Finding minimum distortion of adversarial examples and thus certifying robustness in neural networks classifiers is known to be a challenging problem. To address this issue, in this paper the authors introduce CROWN, a general framework to certify robustness of neural networks with general activation functions. |
Huan Zhang, Lily Weng, Pin-Yu Chen, Cho-Jui Hsieh, Luca Daniel |
Spectral Signatures In Backdoor Attacks
A recent line of work has uncovered a new form of data poisoning: so-called backdoor attacks. |
Brandon Tran, Jerry Li, Aleksander Madry |
Constructing Unrestricted Adversarial Examples With Generative Models
Adversarial examples are typically constructed by perturbing an existing data point within a small matrix norm, and current defense methods are focused on guarding against this type of attack. In this paper, the authors propose a new class of adversarial examples that are synthesized entirely from scratch using a conditional generative model, without being restricted to norm-bounded perturbations. |
Yang Song, Rui Shu, Nate Kushman, Stefano Ermon |
Sublinear Time Low-Rank Approximation Of Distance Matrices
Let $P = \{p_1, p_2, \ldots, p_n\}$ and $Q = \{q_1, q_2, \ldots, q_m\}$ be two point sets in an arbitrary metric space. Let $A$ represent the $m \times n$ pairwise distance matrix with $A_{i,j} = d(p_i, q_j)$. |
Ainesh Bakshi, David Woodruff |
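To make the objects concrete, a toy sketch under illustrative assumptions (Euclidean metric, matrix laid out here as $n \times m$ with `A[i, j] = d(p_i, q_j)`) of the distance matrix and the rank-$k$ target that sublinear-time methods aim to approximate without reading all of $A$:

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.standard_normal((5, 2))   # n = 5 points p_1..p_5
Q = rng.standard_normal((4, 2))   # m = 4 points q_1..q_4

# Pairwise distance matrix: A[i, j] = d(p_i, q_j) under the Euclidean metric.
A = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)

# Best rank-k approximation in Frobenius norm via truncated SVD; computed
# here in full only to show the target that sublinear-time algorithms
# approximate without touching every entry of A.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = (U[:, :k] * s[:k]) @ Vt[:k]
```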
Leveraged Volume Sampling For Linear Regression
Suppose an n x d design matrix in a linear regression problem is given, but the response for each point is hidden unless explicitly requested. |
Michal Derezinski, Manfred Warmuth, Daniel Hsu |
Revisiting $(\epsilon, \gamma, \tau)$-similarity Learning For Domain Adaptation
Similarity learning is an active research area in machine learning that tackles the problem of finding a similarity function tailored to an observable data sample in order to achieve efficient classification. In this paper, the authors propose to extend the theoretical analysis of similarity learning to the domain adaptation setting, a particular situation occurring when the similarity is learned and then deployed on samples following different probability distributions. |
Sofiane Dhouib, Ievgen Redko |
Differential Properties Of Sinkhorn Approximation For Learning With Wasserstein Distance
Applications of optimal transport have recently gained remarkable attention as a result of the computational advantages of entropic regularization. |
Giulia Luise, Alessandro Rudi, Massimiliano Pontil, Carlo Ciliberto |
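For context, a minimal sketch of the entropic-regularization machinery this line of work builds on: Sinkhorn's matrix-scaling iterations for regularized optimal transport between histograms `a` and `b` with cost matrix `C` (names, the regularization strength, and the fixed iteration count are illustrative choices, not the paper's):

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, iters=500):
    """Entropy-regularized optimal transport: alternately rescale rows and
    columns of the Gibbs kernel K = exp(-C/eps) to match marginals a and b."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)          # match column marginals
        u = a / (K @ v)            # match row marginals
    P = u[:, None] * K * v[None, :]  # transport plan
    return P, np.sum(P * C)          # plan and its transport cost
```

The differentiability of this fixed-point map in its inputs is what makes the Sinkhorn approximation usable as a loss inside learning pipelines.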
Algorithms And Theory For Multiple-Source Adaptation
The authors present a number of novel contributions to the multiple-source adaptation problem. |
Judy Hoffman, Mehryar Mohri, Ningshan Zhang |
Synthesize Policies For Transfer And Adaptation Across Tasks And Environments
The ability to transfer in reinforcement learning is key towards building an agent of general artificial intelligence. The authors propose a novel compositional neural network architecture which depicts a meta rule for composing policies from environment and task embeddings. |
Hexiang Hu, Liyu Chen, Boqing Gong, Fei Sha |
Acceleration Through Optimistic No-Regret Dynamics
The authors consider the problem of minimizing a smooth convex function by reducing the optimization to computing the Nash equilibrium of a particular zero-sum convex-concave game. |
Jun-Kun Wang, Jacob Abernethy |
Efficient Online Portfolio With Logarithmic Regret
The authors study the decades-old problem of online portfolio management and propose the first algorithm with logarithmic regret that is not based on Cover's Universal Portfolio algorithm and admits much faster implementation. |
Haipeng Luo, Chen-Yu Wei, Kai Zheng |
Almost Optimal Algorithms For Linear Stochastic Bandits With Heavy-Tailed Payoffs
In linear stochastic bandits, it is commonly assumed that payoff noise is sub-Gaussian. In this paper, under a weaker assumption on the noise, the authors study the problem of linear stochastic bandits with heavy-tailed payoffs (LinBET), where the distributions have finite moments of order $1+\epsilon$, for some $\epsilon \in (0,1]$. |
Han Shao, Xiaotian Yu, Irwin King, Michael Lyu |
A Smoothed Analysis Of The Greedy Algorithm For The Linear Contextual Bandit Problem
Bandit learning is characterized by the tension between long-term exploration and short-term exploitation. |
Sampath Kannan, Jamie Morgenstern, Aaron Roth, Bo Waggoner, Zhiwei Steven Wu |
Exploration In Structured Reinforcement Learning
The authors address reinforcement learning problems with finite state and action spaces where the underlying MDP has some known structure that could be potentially exploited to minimize the exploration rates of suboptimal (state, action) pairs. |
Jungseul Ok, Alexandre Proutiere, Damianos Tranos |
Near Optimal Exploration-Exploitation In Non-Communicating Markov Decision Processes
While designing the state space of an MDP, it is common to include states that are transient or not reachable by any policy (e.g., in mountain car, the product space of speed and position contains configurations that are not physically reachable). |
Ronan Fruit, Matteo Pirotta, Alessandro Lazaric |
A Block Coordinate Ascent Algorithm For Mean-Variance Optimization | Tengyang Xie, Bo Liu, Yangyang Xu, Mohammad Ghavamzadeh, Yinlam Chow, Daoming Lyu, Daesub Yoon |
Learning Safe Policies With Expert Guidance
The authors propose a framework for ensuring safe behavior of a reinforcement learning agent when the reward function may be difficult to specify. |
Jessie Huang, Fa Wu, Doina Precup, Yang Cai |
M-Walk: Learning To Walk Over Graphs Using Monte Carlo Tree Search | Yelong Shen, Jianshu Chen, Po-Sen Huang, Yuqing Guo, Jianfeng Gao |
Is Q-Learning Provably Efficient?
Model-free reinforcement learning (RL) algorithms directly parameterize and update value functions or policies, bypassing the modeling of the environment. |
Chi Jin, Zeyuan Allen-Zhu, Sebastien Bubeck, Michael I. Jordan |
Variational Inverse Control With Events: A General Framework For Data-Driven Reward Definition
The design of a reward function often poses a major practical challenge to real-world applications of reinforcement learning. |
Justin Fu, Avi Singh, Dibya Ghosh, Larry Yang, Sergey Levine |
An Off-policy Policy Gradient Theorem Using Emphatic Weightings
Policy gradient methods are widely used for control in reinforcement learning, particularly for the continuous action setting. There have been a host of theoretically sound algorithms proposed for the on-policy setting, due to the existence of the policy gradient theorem which provides a simplified form for the gradient. |
Ehsan Imani, Eric Graves, Martha White |
Near-Optimal Time And Sample Complexities For Solving Markov Decision Processes With A Generative Model
In this paper the authors consider the problem of computing an $\epsilon$-optimal policy of a discounted Markov Decision Process (DMDP) provided the authors can only access its transition function through a generative sampling model that given any state-action pair samples from the transition function in $O(1)$ time. |
Aaron Sidford, Mengdi Wang, Xian Wu, Lin Yang, Yinyu Ye |
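For orientation, the exact-model baseline against which such sample-based methods are measured: classical value iteration computes a near-optimal policy of a discounted MDP when the transition function is fully known. A generic sketch, not the authors' algorithm; array shapes and the stopping rule (standard $\epsilon$-optimality bound for the greedy policy) are illustrative:

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, eps=1e-4):
    """Value iteration for a discounted MDP. P has shape (A, S, S) with
    P[a, s, s'] the transition probability, R has shape (S, A). Stops when
    the greedy policy is guaranteed eps-optimal."""
    S = R.shape[0]
    V = np.zeros(S)
    while True:
        # Q[s, a] = R[s, a] + gamma * sum_{s'} P[a, s, s'] V[s']
        Q = R + gamma * np.einsum('asn,n->sa', P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < eps * (1 - gamma) / (2 * gamma):
            return Q.argmax(axis=1), V_new  # greedy policy and its values
        V = V_new
```

The sample-complexity question is how to match this guarantee when $P$ can only be queried one sampled transition at a time.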
Monte-Carlo Tree Search For Constrained POMDPs
Monte-Carlo Tree Search (MCTS) has been successfully applied to very large POMDPs, a standard model for stochastic sequential decision-making problems. In this paper, the authors present CC-POMCP (Cost-Constrained POMCP), an online MCTS algorithm for large CPOMDPs that leverages the optimization of LP-induced parameters and only requires a black-box simulator of the environment. |
Jongmin Lee, Geon-hyeong Kim, Pascal Poupart, Kee-Eung Kim |
Equality Of Opportunity In Classification: A Causal Approach
Equalized Odds (EO, for short) is one of the most popular measures of discrimination used in the supervised learning setting. |
Justin Zhang, Elias Bareinboim |
Confounding-Robust Policy Improvement
The authors study the problem of learning personalized decision policies from observational data while accounting for possible unobserved confounding in the data-generating process. |
Nathan Kallus, Angela Zhou |
Causal Discovery From Discrete Data Using Hidden Compact Representation
Causal discovery from a set of observations is one of the fundamental problems across several disciplines. In this paper, the authors attempt to solve this problem by assuming a two-stage causal process: the first stage maps the cause to a hidden variable of lower cardinality, and the second stage generates the effect from the hidden representation. |
Ruichu Cai, Jie Qiao, Kun Zhang, Zhenjie Zhang, Zhifeng Hao |
Dirichlet Belief Networks For Topic Structure Learning
Recently, considerable research effort has been devoted to developing deep architectures for topic models to learn topic structures. Although several deep models have been proposed to learn better topic proportions of documents, how to leverage the benefits of deep structures for learning word distributions of topics has not yet been rigorously studied. |
He Zhao, Lan Du, Wray Buntine, Mingyuan Zhou |
Approximate Knowledge Compilation By Online Collapsed Importance Sampling
The authors introduce collapsed compilation, a novel approximate inference algorithm for discrete probabilistic graphical models. |
Tal Friedman, Guy Van Den Broeck |
Proximal Graphical Event Models
Event datasets include events that occur irregularly over the timeline and are prevalent in numerous domains. The authors introduce proximal graphical event models (PGEM) as a representation of such datasets. |
Debarun Bhattacharjya, Shankar Subramanian, Tian Gao |
Dynamic Network Model From Partial Observations
Can evolving networks be inferred and modeled without directly observing their nodes and edges? The authors propose a novel framework providing a non-parametric dynamic network model, based on a mixture of coupled hierarchical Dirichlet processes, learned from data capturing cascade node infection times. |
Elahe Ghalebi, Baharan Mirzasoleiman, Radu Grosu, Jure Leskovec |
HOGWILD!-Gibbs Can Be PanAccurate | Constantinos Daskalakis, Nishanth Dikkala, Siddhartha Jayanti |
DAGs With NO TEARS: Continuous Optimization For Structure Learning
Estimating the structure of directed acyclic graphs (DAGs, also known as Bayesian networks) is a challenging problem since the search space of DAGs is combinatorial and scales superexponentially with the number of nodes. In this paper, the authors introduce a fundamentally different strategy: the authors formulate the structure learning problem as a purely continuous optimization problem over real matrices that avoids this combinatorial constraint entirely. |
Xun Zheng, Bryon Aragam, Pradeep Ravikumar, Eric Xing |
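The device enabling the continuous formulation is the paper's smooth characterization of acyclicity: a weighted adjacency matrix $W \in \mathbb{R}^{d \times d}$ corresponds to a DAG iff $h(W) = \mathrm{tr}(e^{W \circ W}) - d = 0$, turning the combinatorial DAG constraint into a differentiable equality constraint. A small sketch of this function (`scipy` used for the matrix exponential; variable names are illustrative):

```python
import numpy as np
from scipy.linalg import expm

def notears_acyclicity(W):
    """h(W) = tr(exp(W ∘ W)) - d: zero iff the weighted adjacency
    matrix W encodes a DAG, positive otherwise (Zheng et al.)."""
    d = W.shape[0]
    return np.trace(expm(W * W)) - d   # W * W is the Hadamard square
```

Structure learning then becomes a smooth program minimizing a score subject to $h(W) = 0$, solvable with standard augmented-Lagrangian methods.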
Mean Field For The Stochastic Blockmodel: Optimization Landscape And Convergence Issues
Variational approximation has been widely used in large-scale Bayesian inference recently, the simplest kind of which involves imposing a mean field assumption to approximate complicated latent structures. |
Soumendu Sundar Mukherjee, Purnamrita Sarkar, Y. X. Rachel Wang, Bowei Yan |
Coupled Variational Bayes Via Optimization Embedding
Variational inference plays a vital role in learning graphical models, especially on large-scale datasets. In this paper, the authors propose coupled variational Bayes, which exploits the primal-dual view of the ELBO with the variational distribution class generated by an optimization procedure, termed optimization embedding. |
Bo Dai, Hanjun Dai, Niao He, Weiyang Liu, Zhen Liu, Jianshu Chen, Lin Xiao, Le Song |
Stochastic Nonparametric Event-Tensor Decomposition
Tensor decompositions are fundamental tools for multiway data analysis. To address these issues, the authors formulate event-tensors, to preserve the complete temporal information for multiway data, and propose a novel Bayesian nonparametric decomposition model. |
Shandian Zhe, Yishuai Du |
GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference With GPU Acceleration
Despite advances in scalable models, the inference tools used for Gaussian processes (GPs) have yet to fully capitalize on developments in computing hardware. The authors present an efficient and general approach to GP inference based on Blackbox Matrix-Matrix multiplication (BBMM). |
Jacob Gardner, Geoff Pleiss, Kilian Weinberger, David Bindel, Andrew Wilson |
Heterogeneous Multi-output Gaussian Process Prediction
The authors present a novel extension of multi-output Gaussian processes for handling heterogeneous outputs. |
Pablo Moreno-Muñoz, Antonio Artés, Mauricio Álvarez |
Probabilistic Matrix Factorization For Automated Machine Learning
In order to achieve state-of-the-art performance, modern machine learning techniques require careful data pre-processing and hyperparameter tuning. In this paper, the authors propose to solve this meta-learning task by combining ideas from collaborative filtering and Bayesian optimization. |
Nicolo Fusi, Rishit Sheth, Melih Elibol |
Stochastic Expectation Maximization With Variance Reduction
Expectation-Maximization (EM) is a popular tool for learning latent variable models, but the vanilla batch EM does not scale to large data sets because the whole data set is needed at every E-step. In this paper, the authors propose a variance reduced stochastic EM (sEM-vr) algorithm inspired by variance reduced stochastic gradient descent algorithms. |
Jianfei Chen, Jun Zhu, Yee Whye Teh, Tong Zhang |
Generative Neural Machine Translation
The authors introduce Generative Neural Machine Translation (GNMT), a latent variable architecture which is designed to model the semantics of the source and target sentences. |
Harshil Shah, David Barber |
Sparse Covariance Modeling In High Dimensions With Gaussian Processes
This paper studies statistical relationships among components of high-dimensional observations varying across non-random covariates. |
Rui Li, Kishan KC, Feng Cui, Justin Domke, Anne Haake |
Variational Learning On Aggregate Outputs With Gaussian Processes
While a typical supervised learning framework assumes that the inputs and the outputs are measured at the same levels of granularity, many applications, including global mapping of disease, only have access to outputs at a much coarser level than that of the inputs. The authors propose new bounds and tractable approximations, leading to improved prediction accuracy and scalability to large datasets, while explicitly taking uncertainty into account. |
Ho Chung Law, Dino Sejdinovic, Ewan Cameron, Tim Lucas, Seth Flaxman, Katherine Battle, Kenji Fukumizu |
Learning Invariances Using The Marginal Likelihood
In many supervised learning tasks, learning what changes do not affect the prediction target is as crucial to generalisation as learning what does. Data augmentation is a common way to enforce a model to exhibit an invariance: training data is modified according to an invariance designed by a human and added to the training data. The authors argue that invariances should be incorporated into the model structure, and learned using the marginal likelihood, which can correctly reward the reduced complexity of invariant models. |
Mark Van Der Wilk, Matthias Bauer, Ti John, James Hensman |
Dirichlet-based Gaussian Processes For Large-scale Calibrated Classification
This paper studies the problem of deriving fast and accurate classification algorithms with uncertainty quantification. To this aim, the authors propose a novel regression approach where the labels are obtained through the interpretation of classification labels as the coefficients of a degenerate Dirichlet distribution. |
Dimitrios Milios, Raffaello Camoriano, Pietro Michiardi, Lorenzo Rosasco, Maurizio Filippone |
Regret Bounds For Meta Bayesian Optimization With An Unknown Gaussian Process Prior
Bayesian optimization usually assumes that a Bayesian prior is given. |
Zi Wang, Beomjoon Kim, Leslie Kaelbling |
Efficient High Dimensional Bayesian Optimization With Additivity And Quadrature Fourier Features
The authors develop an efficient and provably no-regret Bayesian optimization (BO) algorithm for optimization of black-box functions in high dimensions. |
Mojmir Mutny, Andreas Krause |
Adversarially Robust Optimization With Gaussian Processes
In this paper, the authors consider the problem of Gaussian process (GP) optimization with an added robustness requirement: The returned point may be perturbed by an adversary, and the authors require the function value to remain as high as possible even after this perturbation. This problem is motivated by settings in which the underlying functions during optimization and implementation stages are different, or when one is interested in finding an entire region of good inputs rather than only a single point. |
Ilija Bogunovic, Jonathan Scarlett, Stefanie Jegelka, Volkan Cevher |
Multi-objective Maximization Of Monotone Submodular Functions With Cardinality Constraint
The authors consider the problem of multi-objective maximization of monotone submodular functions subject to cardinality constraint, often formulated as $\max_{|A|=k}\min_{i\in\{1,\dots,m\}} f_i(A)$. |
Rajan Udwani |
Variational PDEs For Acceleration On Manifolds And Application To Diffeomorphisms | Ganesh Sundaramoorthi, Anthony Yezzi |
Zeroth-order (Non)-Convex Stochastic Optimization Via Conditional Gradient And Gradient Updates
In this paper, the authors propose and analyze zeroth-order stochastic approximation algorithms for nonconvex and convex optimization. |
Krishnakumar Balasubramanian, Saeed Ghadimi |
Computing Higher Order Derivatives Of Matrix And Tensor Expressions
Optimization is an integral part of most machine learning systems and most numerical optimization schemes rely on the computation of derivatives. Here, the authors close this fundamental gap and present an algorithmic framework for computing matrix and tensor derivatives that extends seamlessly to higher order derivatives. |
Soeren Laue, Matthias Mitterreiter, Joachim Giesen |
How SGD Selects The Global Minima In Over-parameterized Learning: A Dynamical Stability Perspective
The question of which global minima are accessible by a stochastic gradient descent (SGD) algorithm with a specific learning rate and batch size is studied from the perspective of dynamical stability. The concept of non-uniformity is introduced, which, together with sharpness, characterizes the stability property of a global minimum and hence the accessibility of a particular SGD algorithm to that global minimum. |
Lei Wu, Chao Ma, Weinan E |
The Effect Of Network Width On The Performance Of Large-batch Training
Distributed implementations of mini-batch stochastic gradient descent (SGD) suffer from communication overheads, attributed to the high frequency of gradient updates inherent in small-batch training. |
Lingjiao Chen, Hongyi Wang, Jinman Zhao, Dimitrios Papailiopoulos, Paraschos Koutris |
COLA: Decentralized Linear Learning
Decentralized machine learning is a promising emerging paradigm in view of global challenges of data ownership and privacy. The authors consider learning of linear classification and regression models, in the setting where the training data is decentralized over many user devices, and the learning algorithm must run on-device, on an arbitrary communication network, without a central coordinator. The authors propose COLA, a new decentralized training algorithm with strong theoretical guarantees and superior practical performance. |
Lie He, An (Yatao) Bian, Martin Jaggi |
Distributed Stochastic Optimization Via Adaptive SGD
Stochastic convex optimization algorithms are the most popular way to train machine learning models on large-scale data. In this paper, the authors propose an efficient distributed stochastic optimization method by combining adaptivity with variance reduction techniques. |
Ashok Cutkosky, Róbert Busa-Fekete |
Non-Ergodic Alternating Proximal Augmented Lagrangian Algorithms With Optimal Rates
The authors develop two new non-ergodic alternating proximal augmented Lagrangian algorithms (NEAPAL) to solve a class of nonsmooth constrained convex optimization problems. |
Quoc Tran Dinh |
Breaking The Span Assumption Yields Fast Finite-Sum Minimization
In this paper, the authors show that SVRG and SARAH can be modified to be fundamentally faster than all of the other standard algorithms that minimize the sum of $n$ smooth functions, such as SAGA, SAG, SDCA, and SDCA without duality. Moreover, the authors present lower bound results that show this speedup is optimal, and provide analysis to help explain why this speedup exists. |
Robert Hannah, Yanli Liu, Daniel O'Connor, Wotao Yin |
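For reference, the baseline these modified methods accelerate is vanilla SVRG: take a full gradient at a snapshot, then run stochastic steps with the variance-reduced correction. The sketch below is the textbook SVRG loop on a toy least-squares finite sum (the paper's faster variants are not reproduced here, and all problem sizes and step sizes are illustrative):

```python
import numpy as np

# Vanilla SVRG on f(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2.
rng = np.random.default_rng(0)
n, d = 30, 3
A = rng.standard_normal((n, d))
A /= np.linalg.norm(A, axis=1, keepdims=True)   # unit rows => smoothness L_i = 1
x_star = np.ones(d)
b = A @ x_star                                   # consistent system, min loss = 0

def grad_i(x, i):
    return (A[i] @ x - b[i]) * A[i]

def full_grad(x):
    return A.T @ (A @ x - b) / n

x = np.zeros(d)
eta, m = 0.1, 200
for epoch in range(30):
    x_snap = x.copy()
    mu = full_grad(x_snap)                       # full gradient at the snapshot
    for _ in range(m):
        i = rng.integers(n)
        g = grad_i(x, i) - grad_i(x_snap, i) + mu  # variance-reduced gradient
        x -= eta * g

loss = 0.5 * np.mean((A @ x - b) ** 2)
```

Because the correction term removes the noise floor of plain SGD, the loop converges linearly to the minimizer with a constant step size.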
Optimization For Approximate Submodularity
The authors consider the problem of maximizing a submodular function when given access to its approximate version. The authors describe a technique, which the authors call the sampled mean approximation, that yields strong guarantees for maximization of submodular functions from approximate surrogates under cardinality and intersection of matroid constraints. |
Yaron Singer, Avinatan Hassidim |
Submodular Maximization Via Gradient Ascent: The Case Of Deep Submodular Functions
The authors study the problem of maximizing deep submodular functions (DSFs) subject to a matroid constraint. |
Wenruo Bai, William Stafford Noble, Jeff Bilmes |
Maximizing Induced Cardinality Under A Determinantal Point Process
Determinantal point processes (DPPs) are well-suited to recommender systems where the goal is to generate collections of diverse, high-quality items. |
Jennifer Gillenwater, Alex Kulesza, Sergei Vassilvitskii, Zelda Mariet |
Efficient Algorithms For Non-convex Isotonic Regression Through Submodular Optimization
The authors consider the minimization of submodular functions subject to ordering constraints. The authors propose new discretization schemes that lead to simple and efficient algorithms based on zero-th, first, or higher order oracles; these algorithms also lead to improvements without isotonic constraints. |
Francis Bach |
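The classical convex special case of this problem (squared loss with ordering constraints) is solved exactly by the Pool Adjacent Violators algorithm. The sketch below shows that baseline only; it is not the paper's discretization scheme for the nonconvex setting:

```python
def pava(y):
    """Pool Adjacent Violators: project y onto nondecreasing sequences
    under squared loss. Classical algorithm, shown for context only."""
    blocks = []  # each block is [sum, count]; its value is sum/count
    for v in y:
        blocks.append([v, 1])
        # Merge while the last two block means violate the ordering.
        while len(blocks) > 1 and (
                blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)
    return out

fit = pava([1.0, 3.0, 2.0, 4.0])   # the 3, 2 violators pool to 2.5, 2.5
```

The paper's contribution is handling *nonconvex* separable losses under the same ordering constraints, where this simple pooling argument no longer applies.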
Revisiting Decomposable Submodular Function Minimization With Incidence Relations
The authors introduce a new approach to decomposable submodular function minimization (DSFM) that exploits incidence relations. |
Pan Li, Olgica Milenkovic |
Coordinate Descent With Bandit Sampling
Coordinate descent methods minimize a cost function by updating a single decision variable (corresponding to one coordinate) at a time. Therefore, the authors propose a new adaptive method for coordinate descent. |
Farnood Salehi, Patrick Thiran, L. Elisa Celis |
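The idea of treating coordinate selection as a bandit problem can be sketched with a simple epsilon-greedy rule that favors coordinates whose last update decreased the objective the most. This is an illustrative toy, not the authors' bandit algorithm, and the quadratic below is a made-up example:

```python
import random

# Coordinate descent on f(x) = 0.5 x^T A x - b^T x with an
# epsilon-greedy "bandit" over coordinates. Illustration only.
random.seed(0)
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 5.0]]
b = [1.0, 2.0, 3.0]
d = 3
x = [0.0, 0.0, 0.0]
reward = [1.0] * d          # optimistic initial per-coordinate gain estimates

def f(x):
    quad = sum(x[i] * A[i][j] * x[j] for i in range(d) for j in range(d))
    return 0.5 * quad - sum(b[i] * x[i] for i in range(d))

for _ in range(200):
    if random.random() < 0.2:                        # explore
        i = random.randrange(d)
    else:                                            # exploit best arm
        i = max(range(d), key=lambda j: reward[j])
    before = f(x)
    # Exact minimization over coordinate i.
    x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(d) if j != i)) / A[i][i]
    reward[i] = before - f(x)                        # observed decrease

residual = max(abs(sum(A[i][j] * x[j] for j in range(d)) - b[i])
               for i in range(d))
```

Once a coordinate is minimized its observed gain drops to zero, so the exploit step naturally rotates to other coordinates; the exploration probability guarantees every coordinate keeps being visited.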
Accelerated Stochastic Matrix Inversion: General Theory And Speeding Up BFGS Rules For Faster Second-Order Optimization
The authors present the first accelerated randomized algorithm for solving linear systems in Euclidean spaces. |
Robert Gower, Filip Hanzely, Peter Richtarik, Sebastian Stich |
Stochastic Cubic Regularization For Fast Nonconvex Optimization
This paper proposes a stochastic variant of a classic algorithm---the cubic-regularized Newton method [Nesterov and Polyak]. |
Nilesh Tripuraneni, Mitchell Stern, Chi Jin, Jeff Regier, Michael I. Jordan |
On The Local Minima Of The Empirical Risk | Chi Jin, Lydia T. Liu, Rong Ge, Michael I. Jordan |
Stochastic Nested Variance Reduced Gradient Descent For Nonconvex Optimization
The authors study finite-sum nonconvex optimization problems, where the objective function is an average of $n$ nonconvex functions. |
Dongruo Zhou, Pan Xu, Quanquan Gu |
NEON2: Finding Local Minima Via First-Order Oracles
The authors propose a reduction for non-convex optimization that can (1) turn a stationary-point-finding algorithm into a local-minimum-finding one, and (2) replace the Hessian-vector product computations with only gradient computations. |
Zeyuan Allen-Zhu, Yuanzhi Li |
How Much Restricted Isometry Is Needed In Nonconvex Matrix Recovery?
When the linear measurements of an instance of low-rank matrix recovery satisfy a restricted isometry property (RIP) --- i.e. |
Richard Zhang, Cedric Josz, Somayeh Sojoudi, Javad Lavaei |
Escaping Saddle Points In Constrained Optimization
In this paper, the authors study the problem of escaping from saddle points in smooth nonconvex optimization problems subject to a convex set $\mathcal{C}$. |
Aryan Mokhtari, Asuman Ozdaglar, Ali Jadbabaie |
Analysis Of Krylov Subspace Solutions Of Regularized Non-Convex Quadratic Problems
The authors provide convergence rates for Krylov subspace solutions to the trust-region and cubic-regularized (nonconvex) quadratic problems. |
Yair Carmon, John Duchi |
SPIDER: Near-Optimal Non-Convex Optimization Via Stochastic Path-Integrated Differential Estimator
In this paper, the authors propose a new technique named \textit{Stochastic Path-Integrated Differential EstimatoR} (SPIDER), which can be used to track many deterministic quantities of interests with significantly reduced computational cost. |
Cong Fang, Chris Junchi Li, Zhouchen Lin, Tong Zhang |
Natasha 2: Faster Non-Convex Optimization Than SGD
The authors design a stochastic algorithm to find $\varepsilon$-approximate local minima of any smooth nonconvex function at rate $O(\varepsilon^{-3.25})$, with only oracle access to stochastic gradients. |
Zeyuan Allen-Zhu |
Zeroth-Order Stochastic Variance Reduction For Nonconvex Optimization | Sijia Liu, Bhavya Kailkhura, Pin-Yu Chen, Paishun Ting, Shiyu Chang, Lisa Amini |
Structured Local Minima In Sparse Blind Deconvolution
Blind deconvolution is a ubiquitous problem of recovering two unknown signals from their convolution. |
Yuqian Zhang, Henry Kuo, John Wright |
Algorithmic Regularization In Learning Deep Homogeneous Models: Layers Are Automatically Balanced
The authors study the implicit regularization imposed by gradient descent for learning multi-layer homogeneous functions including feed-forward fully connected and convolutional deep neural networks with linear, ReLU or Leaky ReLU activation. |
Simon Du, Wei Hu, Jason Lee |
Are ResNets Provably Better Than Linear Predictors?
A residual network (or ResNet) is a standard deep neural net architecture, with state-of-the-art performance across numerous applications. |
Ohad Shamir |
Adaptive Methods For Nonconvex Optimization
Adaptive gradient methods that rely on scaling gradients down by the square root of exponential moving averages of past squared gradients, such as RMSProp, Adam, and Adadelta, have found wide application in optimizing the nonconvex problems that arise in deep learning. |
Manzil Zaheer, Sashank Reddi, Devendra Sachan, Satyen Kale, Sanjiv Kumar |
Alternating Optimization Of Decision Trees, With Application To Learning Sparse Oblique Trees
Learning a decision tree from data is a difficult optimization problem. |
Miguel A. Carreira-Perpinan, Pooya Tavallali |
GILBO: One Metric To Measure Them All
The authors propose a simple, tractable lower bound on the mutual information contained in the joint generative density of any latent variable generative model: the GILBO (Generative Information Lower BOund). |
Alex Alemi, Ian Fischer |
Isolating Sources Of Disentanglement In Variational Autoencoders
The authors decompose the evidence lower bound to show the existence of a term measuring the total correlation between latent variables. The authors use this to motivate the beta-TCVAE (Total Correlation Variational Autoencoder) algorithm, a refinement and plug-in replacement of the beta-VAE for learning disentangled representations, requiring no additional hyperparameters during training. |
Ricky Chen, Xuechen Li, Roger Grosse, David Duvenaud |
On GANs And GMMs
A longstanding problem in machine learning is to find unsupervised methods that can learn the statistical structure of high dimensional signals. |
Eitan Richardson, Yair Weiss |
Assessing Generative Models Via Precision And Recall
Recent advances in generative modeling have led to an increased interest in the study of statistical divergences as means of model comparison. |
Mehdi S. M. Sajjadi, Olivier Bachem, Mario Lucic, Olivier Bousquet, Sylvain Gelly |
Gather-Excite: Exploiting Feature Context In Convolutional Neural Networks
While the use of bottom-up local operators in convolutional neural networks (CNNs) matches well some of the statistics of natural images, it may also prevent such models from capturing contextual long-range feature interactions. In this work, the authors propose a simple, lightweight approach for better context exploitation in CNNs. |
Jie Hu, Li Shen, Samuel Albanie, Gang Sun, Andrea Vedaldi |
Uncertainty-Aware Attention For Reliable Interpretation And Prediction
Attention mechanisms are effective both in focusing deep learning models on relevant features and in interpreting them. To improve reliability, the authors introduce the notion of input-dependent uncertainty to the attention mechanism, such that it generates attention for each feature with varying degrees of noise based on the given input, to learn larger variance on instances it is uncertain about. |
Jay Heo, Hae Beom Lee, Saehoon Kim, Juho Lee, Kwang Joon Kim, Eunho Yang, Sung Ju Hwang |
Forecasting Treatment Responses Over Time Using Recurrent Marginal Structural Networks
The authors introduce the Recurrent Marginal Structural Network - a sequence-to-sequence architecture for forecasting a patient's expected response to a series of planned treatments. |
Bryan Lim, Ahmed M. Alaa, Mihaela Van Der Schaar |
Backpropagation With Callbacks: Foundations For Efficient And Expressive Differentiable Programming
Training of deep learning models depends on gradient descent and end-to-end differentiation. |
Fei Wang, James Decker, Xilun Wu, Gregory Essertel, Tiark Rompf |
Recurrent World Models Facilitate Policy Evolution
A generative recurrent neural network is quickly trained in an unsupervised manner to model popular reinforcement learning environments through compressed spatio-temporal representations. |
David Ha, Jürgen Schmidhuber |
Long Short-term Memory And Learning-to-learn In Networks Of Spiking Neurons
Recurrent networks of spiking neurons (RSNNs) underlie the astounding computing and learning capabilities of the brain. Unlike typical artificial network models, RSNNs in the brain are not randomly connected or designed according to simple rules, and they do not start learning as a tabula rasa network. |
Guillaume Bellec, Darjan Salaj, Anand Subramoney, Robert Legenstein, Wolfgang Maass |
Distributed Weight Consolidation: A Brain Segmentation Case Study
Collecting the large datasets needed to train deep neural networks can be very difficult, particularly for the many applications for which sharing and pooling data is complicated by practical, ethical, or legal concerns. In this paper, the authors introduce distributed weight consolidation (DWC), a continual learning method to consolidate the weights of separate neural networks, each trained on an independent dataset. |
Patrick McClure, Charles Zheng, Jakub Kaczmarzyk, John Rogers-Lee, Satra Ghosh, Dylan Nielson, Peter A Bandettini, Francisco Pereira |
Learning To Play With Intrinsically-Motivated, Self-Aware Agents
Infants are experts at playing, with an amazing ability to generate novel structured behaviors in unstructured environments that lack clear extrinsic reward signals. The authors seek to mathematically formalize these abilities using a neural network that implements curiosity-driven intrinsic motivation. |
Nick Haber, Damian Mrowca, Stephanie Wang, Li Fei-Fei, Daniel Yamins |
Gradient Descent For Spiking Neural Networks
Most large-scale network models use neurons with static nonlinearities that produce analog output, despite the fact that information processing in the brain is predominantly carried out by dynamic neurons that produce discrete pulses called spikes. Here, the authors present a gradient descent method for optimizing spiking network models by introducing a differentiable formulation of spiking dynamics and deriving the exact gradient calculation. |
Ben Huh, Terrence J Sejnowski |
Demystifying Excessively Volatile Human Learning: A Bayesian Persistent Prior And A Neural Approximation
Understanding how humans and animals learn about statistical regularities in stable and volatile environments, and utilize these regularities to make predictions and decisions, is an important problem in neuroscience and psychology. Because exact Bayesian inference in this setting, an example of switching state space models, is computationally intense, a number of approximately Bayesian and heuristic algorithms have been proposed to account for learning/prediction in the brain. |
Chaitanya Ryali, Gautam Reddy, Angela J Yu |
Temporal Alignment And Latent Gaussian Process Factor Inference In Population Spike Trains
The authors introduce a novel scalable approach to identifying common latent structure in neural population spike-trains, which allows for variability both in the trajectory and in the rate of progression of the underlying computation. |
Lea Duncker, Maneesh Sahani |
Information-based Adaptive Stimulus Selection To Optimize Communication Efficiency In Brain-Computer Interfaces
Stimulus-driven brain-computer interfaces (BCIs), such as the P300 speller, rely on using a sequence of sensory stimuli to elicit specific neural responses as control signals, while a user attends to relevant target stimuli that occur within the sequence. In current BCIs, the stimulus presentation schedule is typically generated in a pseudo-random fashion. |
Boyla Mainsah, Dmitry Kalika, Leslie Collins, Siyuan Liu, Chandra Throckmorton |
Model-based Targeted Dimensionality Reduction For Neuronal Population Data
Summarizing high-dimensional data using a small number of parameters is a ubiquitous first step in the analysis of neuronal population activity. Recently developed methods use "targeted" approaches that work by identifying multiple, distinct low-dimensional subspaces of activity that capture the population response to individual experimental task variables, such as the value of a presented stimulus or the behavior of the animal. |
Mikio Aoi, Jonathan W Pillow |
Objective And Efficient Inference For Couplings In Neuronal Networks
Inferring directional couplings from the spike data of networks is desired in various scientific fields such as neuroscience. Here, the authors apply a recently proposed objective procedure to the spike data obtained from the Hodgkin-Huxley type models and in vitro neuronal networks cultured in a circular structure. |
Yu Terada, Tomoyuki Obuchi, Takuya Isomura, Yoshiyuki Kabashima |
The Emergence Of Multiple Retinal Cell Types Through Efficient Coding Of Natural Movies
One of the most striking aspects of early visual processing in the retina is the immediate parcellation of visual information into multiple parallel pathways, formed by different retinal ganglion cell types each tiling the entire visual field. |
Samuel Ocko, Jack Lindsey, Surya Ganguli, Stephane Deny |
Benefits Of Over-parameterization With EM
Expectation Maximization (EM) is among the most popular algorithms for maximum likelihood estimation, but it is generally only guaranteed to find stationary points of the log-likelihood objective. The goal of this article is to present theoretical and empirical evidence that over-parameterization can help EM avoid spurious local optima in the log-likelihood. |
Ji Xu, Daniel Hsu, Arian Maleki |
On Coresets For Logistic Regression
Coresets are one of the central methods to facilitate the analysis of large data. To deal with intractable worst-case instances the authors introduce a complexity measure $\mu(X)$, which quantifies the hardness of compressing a data set for logistic regression. |
Alexander Munteanu, Chris Schwiegelshohn, Christian Sohler, David Woodruff |
On Learning Markov Chains
The problem of estimating an unknown discrete distribution from its samples is a fundamental tenet of statistical learning. |
Yi HAO, Alon Orlitsky, Venkatadheeraj Pichapati |
Contextual Stochastic Block Models
The authors provide the first information theoretical tight analysis for inference of latent community structure given a sparse graph along with high dimensional node covariates, correlated with the same latent communities. |
Yash Deshpande, Subhabrata Sen, Andrea Montanari, Elchanan Mossel |
Estimators For Multivariate Information Measures In General Probability Spaces | Arman Rahimzamani, Himanshu Asnani, Pramod Viswanath, Sreeram Kannan |
Blind Deconvolutional Phase Retrieval Via Convex Programming
The authors consider the task of recovering two real or complex $m$-vectors from phaseless Fourier measurements of their circular convolution. |
Ali Ahmed, Alireza Aghasi, Paul Hand |
Entropy Rate Estimation For Markov Chains With Large State Space
Entropy estimation is one of the prototypical problems in distribution property testing. |
Yanjun Han, Jiantao Jiao, Chuan-Zheng Lee, Tsachy Weissman, Yihong Wu, Tiancheng Yu |
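The baseline against which such sample-complexity results are measured is the simple plug-in estimator: estimate the transition probabilities from one sample path and compute the entropy rate $H = -\sum_i \pi_i \sum_j P_{ij}\log P_{ij}$. A hedged sketch on a made-up two-state chain (the paper's analysis concerns large state spaces, which this toy does not exercise):

```python
import math
import random

# Plug-in entropy rate estimator for a Markov chain, from one sample path.
random.seed(0)
P = [[0.9, 0.1], [0.2, 0.8]]            # true transition matrix (illustrative)
path, s = [], 0
for _ in range(50000):
    path.append(s)
    s = 0 if random.random() < P[s][0] else 1

counts = [[0, 0], [0, 0]]
for a, nxt in zip(path, path[1:]):      # empirical transition counts
    counts[a][nxt] += 1

rate = 0.0
n = len(path) - 1                        # number of observed transitions
for i in range(2):
    row = sum(counts[i])
    for j in range(2):
        if counts[i][j]:
            p = counts[i][j] / row       # estimated P_ij
            rate -= (row / n) * p * math.log(p)  # row/n estimates pi_i
```

For this chain the true rate is $\tfrac{2}{3}H(0.9) + \tfrac{1}{3}H(0.8) \approx 0.384$ nats, and with 50k transitions the plug-in estimate lands close to it.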
Bandit Learning In Concave N-Person Games
This paper examines the long-run behavior of learning with bandit feedback in non-cooperative concave games. |
Mario Bravo, David Leslie, Panayotis Mertikopoulos |
Depth-Limited Solving For Imperfect-Information Games
A fundamental challenge in imperfect-information games is that states do not have well-defined values. This paper introduces a principled way to conduct depth-limited solving in imperfect-information games by allowing the opponent to choose among a number of strategies for the remainder of the game at the depth limit. |
Noam Brown, Tuomas Sandholm, Brandon Amos |
The Physical Systems Behind Optimization Algorithms | Lin Yang, Raman Arora, Vladimir Braverman, Tuo Zhao |
The Nearest Neighbor Information Estimator Is Adaptively Near Minimax Rate-Optimal
The authors analyze the Kozachenko–Leonenko (KL) fixed k-nearest neighbor estimator for the differential entropy. |
Jiantao Jiao, Weihao Gao, Yanjun Han |
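The estimator being analyzed can be written down in a few lines. Below is a hedged one-dimensional sketch of the Kozachenko-Leonenko estimator, $\hat{H} = \psi(N) - \psi(k) + \log V_1 + \tfrac{1}{N}\sum_i \log \varepsilon_i$, where $\varepsilon_i$ is the distance to the $k$-th nearest neighbor and $V_1 = 2$; the brute-force distance computation and sample size are illustrative only:

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def psi_int(m):
    """Digamma at a positive integer: psi(m) = -gamma + sum_{j<m} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, m))

def kl_entropy(xs, k=3):
    """Kozachenko-Leonenko k-NN differential entropy estimate in 1D (nats)."""
    n = len(xs)
    log_eps_sum = 0.0
    for x in xs:
        dists = sorted(abs(x - y) for y in xs)
        # dists[0] == 0 is the point itself; dists[k] is the k-th neighbor.
        log_eps_sum += math.log(dists[k])
    return psi_int(n) - psi_int(k) + math.log(2.0) + log_eps_sum / n

random.seed(0)
sample = [random.random() for _ in range(1000)]
h = kl_entropy(sample)   # true differential entropy of U[0,1] is 0
```

The paper's point is that this fixed-$k$ estimator, despite its simplicity, is adaptively near minimax rate-optimal.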
Robust Learning Of Fixed-Structure Bayesian Networks
The authors investigate the problem of learning Bayesian networks in a robust model where an $\epsilon$-fraction of the samples are adversarially corrupted. In this work, the authors study the fully observable discrete case where the structure of the network is given. |
Yu Cheng, Ilias Diakonikolas, Daniel Kane, Alistair Stewart |
Information-theoretic Limits For Community Detection In Network Models
The authors analyze the information-theoretic limits for the recovery of node labels in several network models. |
Chuyang Ke, Jean Honorio |
Generalizing Graph Matching Beyond Quadratic Assignment Model
Graph matching has received persistent attention over decades, which can be formulated as a quadratic assignment problem (QAP). The authors show that a large family of functions, which the authors define as Separable Functions, can approximate discrete graph matching in the continuous domain asymptotically by varying the approximation controlling parameters. |
Tianshu Yu, Junchi Yan, Yilin Wang, Wei Liu, Baoxin Li |
Improving Simple Models With Confidence Profiles
In this paper, the authors propose a new method called ProfWeight for transferring information from a pre-trained deep neural network that has a high test accuracy to a simpler interpretable model or a very shallow network of low complexity and a priori low test accuracy. |
Amit Dhurandhar, Karthikeyan Shanmugam, Ronny Luss, Peder A Olsen |
Online Learning With An Unknown Fairness Metric
The authors consider the problem of online learning in the linear contextual bandits setting, but in which there are also strong individual fairness constraints governed by an unknown similarity metric. This is intended to represent the interventions of a regulator who "knows unfairness when he sees it" but nevertheless cannot enunciate a quantitative fairness metric over individuals. |
Stephen Gillen, Christopher Jung, Michael Kearns, Aaron Roth |
Legendre Decomposition For Tensors
The authors present a novel nonnegative tensor decomposition method, called Legendre decomposition, which factorizes an input tensor into a multiplicative combination of parameters. |
Mahito Sugiyama, Hiroyuki Nakahara, Koji Tsuda |
The Price Of Privacy For Low-rank Factorization
In this paper, the authors study what price one has to pay to release \emph{differentially private low-rank factorization} of a matrix. |
Jalaj Upadhyay |
Empirical Risk Minimization In Non-interactive Local Differential Privacy Revisited
In this paper, the authors revisit the Empirical Risk Minimization problem in the non-interactive local model of differential privacy. Then, the authors propose player-efficient algorithms with $1$-bit communication complexity and $O(1)$ computation cost for each player. |
Di Wang, Marco Gaboardi, Jinhui Xu |
Differentially Private Change-Point Detection
The change-point detection problem seeks to identify distributional changes at an unknown change-point $k^*$ in a stream of data. The authors study the change-point problem through the lens of differential privacy. |
Sara Krehbiel, Rachel Cummings, Wanrong Zhang, Yajun Mei, Rui Tuo |
Scalable Laplacian K-modes
The authors advocate Laplacian K-modes for joint clustering and density mode finding, and propose a concave-convex relaxation of the problem, which yields a parallel algorithm that scales up to large datasets and high dimensions. |
Imtiaz Ziko, Eric Granger, Ismail Ben Ayed |
Geometrically Coupled Monte Carlo Sampling
Monte Carlo sampling in high-dimensional, low-sample settings is important in many machine learning tasks. The authors improve current methods for sampling in Euclidean spaces by avoiding independence, and instead consider ways to couple samples. |
Mark Rowland, Krzysztof Choromanski, François Chalus, Aldo Pacchiano, Tamas Sarlos, Richard E Turner, Adrian Weller |
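The simplest instance of coupling samples rather than drawing them independently is the classical antithetic pair $(z, -z)$, which reduces variance for monotone integrands. The sketch below shows only that textbook baseline, with a made-up test function; the paper develops far more general geometric couplings:

```python
import random
import statistics

# Antithetic coupling vs. i.i.d. sampling for estimating E[f(Z)], Z ~ N(0,1).
random.seed(0)
f = math_exp = __import__("math").exp   # monotone test function, E[f(Z)] = e^{1/2}

def iid_estimate(n):
    return sum(f(random.gauss(0, 1)) for _ in range(n)) / n

def antithetic_estimate(n):
    total = 0.0
    for _ in range(n // 2):
        z = random.gauss(0, 1)
        total += f(z) + f(-z)           # negatively coupled pair
    return total / n

# Compare estimator variances over many independent replications.
reps = 400
var_iid = statistics.pvariance([iid_estimate(100) for _ in range(reps)])
var_anti = statistics.pvariance([antithetic_estimate(100) for _ in range(reps)])
```

For the same sample budget, the coupled estimator has strictly smaller variance here because $f$ is monotone, so $f(z)$ and $f(-z)$ are negatively correlated.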
Continuous-time Value Function Approximation In Reproducing Kernel Hilbert Spaces
Motivated by the success of reinforcement learning (RL) for discrete-time tasks such as AlphaGo and Atari games, there has been a recent surge of interest in using RL for continuous-time control of physical systems (cf. many challenging tasks in OpenAI Gym and DeepMind Control Suite). Since discretization of time is susceptible to error, it is methodologically more desirable to handle the system dynamics directly in continuous time. However, very few techniques exist for continuous-time RL and they lack flexibility in value function approximation. In this paper, the authors propose a novel framework for model-based continuous-time value function approximation in reproducing kernel Hilbert spaces. The resulting framework is so flexible that it can accommodate any kind of kernel-based approach, such as Gaussian processes and kernel adaptive filters, and it allows one to handle uncertainties and nonstationarity without prior knowledge about the environment or what basis functions to employ. The authors demonstrate the validity of the presented framework through experiments. |
Motoya Ohnishi, Masahiro Yukawa, Mikael Johansson, Masashi Sugiyama |
Faster Online Learning Of Optimal Threshold For Consistent F-measure Optimization
In this paper, the authors consider online F-measure optimization (OFO). To advance OFO, the authors propose an efficient online algorithm based on simultaneously learning a posterior probability of class and learning an optimal threshold by minimizing a stochastic strongly convex function with unknown strong convexity parameter. |
Xiaoxuan Zhang, Mingrui Liu, Xun Zhou, Tianbao Yang |
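The offline version of the thresholding subproblem is easy to state: given predicted class probabilities, pick the threshold that maximizes F-measure. The sketch below shows that offline baseline on made-up data; the paper's contribution is learning the threshold online, jointly with the posterior model:

```python
# Offline threshold search for F1 over predicted probabilities.
def f1_at(threshold, probs, labels):
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and not y)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Hypothetical scores and binary labels.
probs = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9]
labels = [0, 0, 0, 1, 1, 1]
# Only the observed scores need to be tried as candidate thresholds.
best_t = max(probs, key=lambda t: f1_at(t, probs, labels))
best_f1 = f1_at(best_t, probs, labels)
```

Restricting candidate thresholds to the observed scores is lossless, since F1 only changes when the threshold crosses a score.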
Reducing Network Agnostophobia
Agnostophobia, the fear of the unknown, can be experienced by deep learning engineers while applying their networks to real-world applications. The authors also introduce a new evaluation metric that focuses on comparing the performance of multiple approaches in scenarios where such unseen classes or unknowns are encountered. |
Akshay Raj Dhamija, Manuel Günther, Terrance Boult |
Life-Long Disentangled Representation Learning With Cross-Domain Latent Homologies
The authors propose a novel algorithm for unsupervised representation learning from piece-wise stationary visual data: Variational Autoencoder with Shared Embeddings (VASE). |
Alessandro Achille, Tom Eccles, Loic Matthey, Chris Burgess, Nick Watters, Alexander Lerchner, Irina Higgins |
Near-Optimal Policies For Dynamic Multinomial Logit Assortment Selection Models
In this paper the authors consider the dynamic assortment selection problem under an uncapacitated multinomial-logit (MNL) model. |
Yining Wang, Xi Chen, Yuan Zhou |
The Everlasting Database: Statistical Validity At A Fair Price
The problem of handling adaptivity in data analysis, intentional or not, permeates a variety of fields, including test-set overfitting in ML challenges and the accumulation of invalid scientific discoveries. The authors propose a mechanism for answering an arbitrarily long sequence of potentially adaptive statistical queries, by charging a price for each query and using the proceeds to collect additional samples. |
Blake Woodworth, Vitaly Feldman, Saharon Rosset, Nati Srebro |
Scalar Posterior Sampling With Applications
The authors propose a practical non-episodic PSRL algorithm that unlike recent state-of-the-art PSRL algorithms uses a deterministic, model-independent episode switching schedule. |
Georgios Theocharous, Zheng Wen, Yasin Abbasi, Nikos Vlassis |
Iterative Value-Aware Model Learning
This paper introduces a model-based reinforcement learning (MBRL) framework that incorporates the underlying decision problem in learning the transition model of the environment. |
Amir-massoud Farahmand |
A Lyapunov-based Approach To Safe Reinforcement Learning
In many real-world reinforcement learning (RL) problems, besides optimizing the main objective function, an agent must concurrently avoid violating a number of constraints. The authors define and present a method for constructing Lyapunov functions, which provide an effective way to guarantee the global safety of a behavior policy during training via a set of local linear constraints. |
Yinlam Chow, Ofir Nachum, Edgar Duenez-Guzman, Mohammad Ghavamzadeh |
Temporal Regularization For Markov Decision Process
Several applications of Reinforcement Learning suffer from instability due to high variance. |
Pierre Thodoroff, Audrey Durand, Joelle Pineau, Doina Precup |
Maximum Causal Tsallis Entropy Imitation Learning
In this paper, the authors propose a novel maximum causal Tsallis entropy (MCTE) framework for imitation learning which can efficiently learn a sparse multi-modal policy distribution from demonstrations. |
Kyungjae Lee, Sungjoon Choi, Songhwai Oh |
Policy Optimization Via Importance Sampling
Policy optimization is an effective reinforcement learning approach to solve continuous control tasks. However, deciding when to stop optimizing and collect new trajectories is non-trivial, as it requires to account for the variance of the objective function estimate. |
Alberto Maria Metelli, Matteo Papini, Francesco Faccio, Marcello Restelli |
Reinforcement Learning Of Theorem Proving
The authors introduce a theorem proving algorithm that uses practically no domain heuristics for guiding its connection-style proof search. |
Cezary Kaliszyk, Josef Urban, Henryk Michalewski, Miroslav Olšák |
Simple Random Search Of Static Linear Policies Is Competitive For Reinforcement Learning
Model-free reinforcement learning aims to offer off-the-shelf solutions for controlling dynamical systems without requiring models of the system dynamics. The authors introduce a model-free random search algorithm for training static, linear policies for continuous control problems. |
Horia Mania, Aurelia Guy, Benjamin Recht |
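The core of such a method is basic random search: probe the policy parameters in random directions, compare the returns of the positive and negative perturbations, and step along the resulting finite-difference estimate. The sketch below uses a synthetic quadratic reward as a stand-in for an environment rollout and omits the paper's refinements (state normalization, keeping only the top-performing directions); all constants are illustrative:

```python
import numpy as np

# Basic random search on a synthetic reward r(M) = -||M - M_star||^2.
rng = np.random.default_rng(0)
d = 3
M_star = np.array([1.0, -2.0, 0.5])              # hypothetical optimal policy
reward = lambda M: -np.sum((M - M_star) ** 2)    # stand-in for a rollout return

M = np.zeros(d)
alpha, nu, n_dirs = 0.1, 0.1, 8
for _ in range(300):
    deltas = rng.standard_normal((n_dirs, d))
    step = np.zeros(d)
    for delta in deltas:
        r_plus = reward(M + nu * delta)          # rollout with +perturbation
        r_minus = reward(M - nu * delta)         # rollout with -perturbation
        step += (r_plus - r_minus) * delta
    # Normalizing by nu makes this a finite-difference gradient ascent step.
    M += alpha / (n_dirs * nu) * step
```

Only reward evaluations are needed, no gradients of the dynamics, which is what makes the method applicable to black-box control.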
Meta-Gradient Reinforcement Learning
The goal of reinforcement learning algorithms is to estimate and/or optimise the value function. |
Zhongwen Xu, Hado Van Hasselt, David Silver |
Reinforcement Learning For Solving The Vehicle Routing Problem
The authors present an end-to-end framework for solving the Vehicle Routing Problem (VRP) using reinforcement learning. |
MohammadReza Nazari, Afshin Oroojlooy, Lawrence Snyder, Martin Takac |
Learn What Not To Learn: Action Elimination With Deep Reinforcement Learning
Learning how to act when there are many available actions in each state is a challenging task for Reinforcement Learning (RL) agents, especially when many of the actions are redundant or irrelevant. In this work, the authors propose the Action-Elimination Deep Q-Network (AE-DQN) architecture that combines a Deep RL algorithm with an Action Elimination Network (AEN) that eliminates sub-optimal actions. |
Tom Zahavy, Matan Haroush, Nadav Merlis, Daniel J Mankowitz, Shie Mannor |
REFUEL: Exploring Sparse Features In Deep Reinforcement Learning For Fast Disease Diagnosis
This paper proposes REFUEL, a reinforcement learning method with two techniques: \emph{reward shaping} and \emph{feature rebuilding}, to improve the performance of online symptom checking for disease diagnosis. |
Yu-Shao Peng, Kai-Fu Tang, Hsuan-Tien Lin, Edward Chang |
Learning Plannable Representations With Causal InfoGAN | Thanard Kurutach, Aviv Tamar, Ge Yang, Stuart Russell, Pieter Abbeel |
Improving Exploration In Evolution Strategies For Deep Reinforcement Learning Via A Population Of Novelty-Seeking Agents
Evolution strategies (ES) are a family of black-box optimization algorithms able to train deep neural networks roughly as well as Q-learning and policy gradient methods on challenging deep reinforcement learning (RL) problems, but much faster. This paper thus introduces a family of fast, scalable algorithms for reinforcement learning that are capable of directed exploration. |
Edoardo Conti, Vashisht Madhavan, Felipe Petroski Such, Joel Lehman, Kenneth Stanley, Jeff Clune |
Transfer Of Deep Reactive Policies For MDP Planning
Domain-independent probabilistic planners input an MDP description in a factored representation language such as PPDDL or RDDL, and exploit the specifics of the representation for faster planning. |
Aniket (Nick) Bajpai, Sankalp Garg, Mausam |
Q-learning With Nearest Neighbors
The authors consider model-free reinforcement learning for infinite-horizon discounted Markov Decision Processes (MDPs) with a continuous state space and unknown transition kernel, when only a single sample path under an arbitrary policy of the system is available. |
Devavrat Shah, Qiaomin Xie |
Distributed Multitask Reinforcement Learning With Quadratic Convergence
Multitask reinforcement learning (MTRL) suffers from scalability issues when the number of tasks or trajectories grows large. Recent methods exploited the connection between MTRL and general consensus to propose scalable solutions. |
Rasul Tutunov, Dongho Kim, Haitham Bou Ammar |
Breaking The Curse Of Horizon: Infinite-Horizon Off-Policy Estimation
The authors consider the off-policy estimation problem of estimating the expected reward of a target policy using samples collected by a different behavior policy. |
Qiang Liu, Lihong Li, Ziyang Tang, Denny Zhou |
Constrained Cross-Entropy Method For Safe Reinforcement Learning
The authors study a safe reinforcement learning problem in which the constraints are defined as the expected cost over finite-length trajectories. |
Min Wen, Ufuk Topcu |
Representation Balancing MDPs For Off-policy Policy Evaluation
The authors study the problem of off-policy policy evaluation (OPPE) in RL. |
Yao Liu, Omer Gottesman, Aniruddh Raghu, Matthieu Komorowski, Aldo A Faisal, Finale Doshi-Velez, Emma Brunskill |
Dual Policy Iteration
Recently, a novel class of Approximate Policy Iteration (API) algorithms has demonstrated impressive practical performance (e.g., ExIt from [1], AlphaGo-Zero from [2]). In this work the authors study this Dual Policy Iteration (DPI) strategy in an alternating optimization framework and provide a convergence analysis that extends existing API theory. |
Wen Sun, Geoffrey Gordon, Byron Boots, J. Bagnell |
Occam's Razor Is Insufficient To Infer The Preferences Of Irrational Agents
Inverse reinforcement learning (IRL) attempts to infer human rewards or preferences from observed behavior. |
Stuart Armstrong, Sören Mindermann |
Transfer Of Value Functions Via Variational Methods | Andrea Tirinzoni, Rafael Rodriguez Sanchez, Marcello Restelli |
Reinforcement Learning With Multiple Experts: A Bayesian Model Combination Approach
Potential-based reward shaping is a powerful technique for accelerating convergence of reinforcement learning algorithms. |
Mike Gimelfarb, Scott Sanner, Chi-Guhn Lee |
Online Robust Policy Learning In The Presence Of Unknown Adversaries
The growing prospect of deep reinforcement learning (DRL) being used in cyber-physical systems has raised concerns around safety and robustness of autonomous agents. This paper introduces a Meta-Learned Advantage Hierarchy (MLAH) framework that is attack model-agnostic and more suited to reinforcement learning, via handling the attacks in the decision space (as opposed to data space) and directly mitigating learned bias introduced by the adversary. |
Aaron Havens, Zhanhong Jiang, Soumik Sarkar |
A Bayesian Approach To Generative Adversarial Imitation Learning
Generative adversarial training for imitation learning has shown promising results on high-dimensional and continuous control tasks. Although this approach has been shown to robustly learn to imitate even with scarce demonstrations, one must still address the inherent challenge that collecting trajectory samples in each iteration is a costly operation. |
Wonseok Jeon, Seokin Seo, Kee-Eung Kim |
Verifiable Reinforcement Learning Via Policy Extraction
While deep reinforcement learning has successfully solved many challenging control tasks, its real-world applicability has been limited by the inability to ensure the safety of learned policies. The authors propose an approach to verifiable reinforcement learning by training decision tree policies, which can represent complex policies (since they are nonparametric), yet can be efficiently verified using existing techniques (since they are highly structured). |
Osbert Bastani, Yewen Pu, Armando Solar-Lezama |
Deep Reinforcement Learning Of Marked Temporal Point Processes
In a wide variety of applications, humans interact with a complex environment by means of asynchronous stochastic discrete events in continuous time. Can the authors design online interventions that will help humans achieve certain goals in such an asynchronous setting? |
Utkarsh Upadhyay, Abir De, Manuel Gomez Rodriguez |
On Learning Intrinsic Rewards For Policy Gradient Methods
In many sequential decision making tasks, it is challenging to design reward functions that help an RL agent efficiently learn behavior that is considered good by the agent designer. |
Zeyu Zheng, Junhyuk Oh, Satinder Singh |
Evolution-Guided Policy Gradient In Reinforcement Learning
Deep Reinforcement Learning (DRL) algorithms have been successfully applied to a range of challenging control tasks. In this paper, the authors introduce Evolutionary Reinforcement Learning (ERL), a hybrid algorithm that leverages the population of an EA to provide diversified data to train an RL agent, and reinserts the RL agent into the EA population periodically to inject gradient information into the EA. |
Shauharda Khadka, Kagan Tumer |
Meta-Reinforcement Learning Of Structured Exploration Strategies
Exploration is a fundamental challenge in reinforcement learning (RL). In this work, the authors study how prior tasks can inform an agent about how to explore effectively in new situations. |
Abhishek Gupta, Russell Mendonca, YuXuan Liu, Pieter Abbeel, Sergey Levine |
Diversity-Driven Exploration Strategy For Deep Reinforcement Learning
Efficient exploration remains a challenging research problem in reinforcement learning, especially when an environment contains large state spaces, deceptive local optima, or sparse rewards. To tackle this problem, the authors present a diversity-driven approach for exploration, which can be easily combined with both off- and on-policy reinforcement learning algorithms. |
Zhang-Wei Hong, Ariel Shann, Shih-Yang Su, Yi-Hsiang Chang, Tsu-Jui Fu, Chun-Yi Lee |
Genetic-Gated Networks For Deep Reinforcement Learning
The authors introduce the Genetic-Gated Networks (G2Ns), simple neural networks that combine a gate vector composed of binary genetic genes in the hidden layer(s) of networks. |
Simyung Chang, John Yang, Jaeseok Choi, Nojun Kwak |
Memory Augmented Policy Optimization For Program Synthesis And Semantic Parsing
The authors present Memory Augmented Policy Optimization (MAPO), a simple and novel way to leverage a memory buffer of promising trajectories to reduce the variance of policy gradient estimate. |
Chen Liang, Mohammad Norouzi, Jonathan Berant, Quoc V Le, Ni Lao |
Hardware Conditioned Policies For Multi-Robot Transfer Learning
Deep reinforcement learning could be used to learn dexterous robotic policies but it is challenging to transfer them to new robots with vastly different hardware properties. The authors propose a novel approach called Hardware Conditioned Policies where the authors train a universal policy conditioned on a vector representation of robot hardware. |
Tao Chen, Adithyavairavan Murali, Abhinav Gupta |
Reward Learning From Human Preferences And Demonstrations In Atari
To solve complex real-world problems with reinforcement learning, the authors cannot rely on manually specified reward functions. Additionally, the authors investigate the fit of the reward model, present some reward hacking problems, and study the effects of noise in the human labels. |
Jan Leike, Borja Ibarz, Dario Amodei, Geoffrey Irving, Shane Legg |
Graph Convolutional Policy Network For Goal-Directed Molecular Graph Generation
Generating novel graph structures that optimize given objectives while obeying some given underlying rules is fundamental for chemistry, biology and social science research. However, designing models that find molecules optimizing desired properties while incorporating highly complex and non-differentiable rules remains a challenging task. |
Jiaxuan You, Bowen Liu, Rex Ying, Vijay Pande, Jure Leskovec |
Visual Reinforcement Learning With Imagined Goals
For an autonomous agent to fulfill a wide range of user-specified goals at test time, it must be able to learn broadly applicable and general-purpose skill repertoires. In this paper, the authors propose an algorithm that acquires such general-purpose skills by combining unsupervised representation learning and reinforcement learning of goal-conditioned policies. |
Ashvin Nair, Vitchyr Pong, Murtaza Dalal, Shikhar Bahl, Steven Lin, Sergey Levine |
Playing Hard Exploration Games By Watching YouTube
Deep reinforcement learning methods traditionally struggle with tasks where environment rewards are particularly sparse. However, these demonstrations are typically collected under artificial conditions. |
Yusuf Aytar, Tobias Pfaff, David Budden, Thomas Paine, Ziyu Wang, Nando De Freitas |
Unsupervised Video Object Segmentation For Deep Reinforcement Learning
The authors present a new technique for deep reinforcement learning that automatically detects moving objects and uses the relevant information for action selection. |
Vik Goel, Jameson Weng, Pascal Poupart |
Learning To Navigate In Cities Without A Map
Navigating through unstructured environments is a basic capability of intelligent creatures, and thus is of fundamental interest in the study and development of artificial intelligence. |
Piotr Mirowski, Matt Grimes, Mateusz Malinowski, Karl Moritz Hermann, Keith Anderson, Denis Teplyashin, Karen Simonyan, Koray Kavukcuoglu, Andrew Zisserman, Raia Hadsell |
Learning Abstract Options | Matthew Riemer, Miao Liu, Gerald Tesauro |
Object-Oriented Dynamics Predictor
Generalization has been one of the major challenges for learning dynamics models in model-based reinforcement learning. In this paper, the authors present a novel object-oriented framework, called object-oriented dynamics predictor (OODP), which decomposes the environment into objects and predicts the dynamics of objects conditioned on both actions and object-to-object relations. |
Guangxiang Zhu, Zhiao Huang, Chongjie Zhang |
A Deep Bayesian Policy Reuse Approach Against Non-Stationary Agents | YAN ZHENG, Zhaopeng Meng, Jianye Hao, Zongzhang Zhang, Tianpei Yang, Changjie Fan |
Learning Attentional Communication For Multi-Agent Cooperation
Communication could potentially be an effective way for multi-agent cooperation. To tackle these difficulties, in this paper, the authors propose an attentional communication model that learns when communication is needed and how to integrate shared information for cooperative decision making. |
Jiechuan Jiang, Zongqing Lu |
Deep Dynamical Modeling And Control Of Unsteady Fluid Flows
The design of flow control systems remains a challenge due to the nonlinear nature of the equations that govern fluid flow. |
Jeremy Morton, Antony Jameson, Mykel J Kochenderfer, Freddie Witherden |
Adaptive Skip Intervals: Temporal Abstraction For Recurrent Dynamical Models
The authors introduce a method which enables a recurrent dynamics model to be temporally abstract. |
Alexander Neitz, Giambattista Parascandolo, Stefan Bauer, Bernhard Schölkopf |
Zero-Shot Transfer With Deictic Object-Oriented Representation In Reinforcement Learning
Object-oriented representations in reinforcement learning have shown promise in transfer learning, with previous research introducing a propositional object-oriented framework that has provably efficient learning bounds with respect to sample complexity. |
Ofir Marom, Benjamin Rosman |
Total Stochastic Gradient Algorithms And Applications In Reinforcement Learning
Backpropagation and the chain rule of derivatives have been prominent; however, the total derivative rule has not enjoyed the same amount of attention. The authors show how the total derivative rule leads to an intuitive visual framework for creating gradient estimators on graphical models. |
Paavo Parmas |
Fighting Boredom In Recommender Systems With Linear Reinforcement Learning
A common assumption in recommender systems (RS) is the existence of a best fixed recommendation strategy. Such a strategy may be simple and work at the item level (e.g., in multi-armed bandits it is assumed one best fixed arm/item exists) or implement a more sophisticated RS (e.g., the objective of A/B testing is to find the best fixed RS and execute it thereafter). |
Romain WARLOP, Alessandro Lazaric, Jérémie Mary |
Randomized Prior Functions For Deep Reinforcement Learning | Ian Osband, John Aslanides, Albin Cassirer |
Scalable Coordinated Exploration In Concurrent Reinforcement Learning
The authors consider a team of reinforcement learning agents that concurrently operate in a common environment, and the authors develop an approach to efficient coordinated exploration that is suitable for problems of practical scale. |
Maria Dimakopoulou, Ian Osband, Benjamin Van Roy |
Context-dependent Upper-confidence Bounds For Directed Exploration
Directed exploration strategies for reinforcement learning are critical for learning an optimal policy in a minimal number of interactions with the environment. |
Raksha Kumaraswamy, Matthew Schlegel, Adam White, Martha White |
Multi-Agent Generative Adversarial Imitation Learning
Imitation learning algorithms can be used to learn a policy from expert demonstrations without access to a reward signal. However, most existing approaches are not applicable in multi-agent settings due to the existence of multiple (Nash) equilibria and non-stationary environments. The authors propose a new framework for multi-agent imitation learning for general Markov games, where the authors build upon a generalized notion of inverse reinforcement learning. |
Jiaming Song, Hongyu Ren, Dorsa Sadigh, Stefano Ermon |
Actor-Critic Policy Optimization In Partially Observable Multiagent Environments
Optimization of parameterized policies for reinforcement learning (RL) is an important and challenging problem in artificial intelligence. Among the most common approaches are algorithms based on gradient ascent of a score function representing discounted return. |
Sriram Srinivasan, Marc Lanctot, Vinicius Zambaldi, Julien Perolat, Karl Tuyls, Remi Munos, Michael Bowling |
Learning To Share And Hide Intentions Using Information Regularization
Learning to cooperate with friends and compete with foes is a key component of multi-agent reinforcement learning. |
DJ Strouse, Max Kleiman-Weiner, Josh Tenenbaum, Matt Botvinick, David Schwab |
Credit Assignment For Collective Multiagent RL With Global Rewards
Scaling decision theoretic planning to large multiagent systems is challenging due to uncertainty and partial observability in the environment. The authors focus on a multiagent planning model subclass, relevant to urban settings, where agent interactions are dependent on their "collective influence" on each other, rather than their identities. |
Duc Nguyen, Akshat Kumar, Hoong Chuin Lau |
Multi-Agent Reinforcement Learning Via Double Averaging Primal-Dual Optimization
Despite the success of single-agent reinforcement learning, multi-agent reinforcement learning (MARL) remains challenging due to complex interactions between agents. Motivated by decentralized applications such as sensor networks, swarm robotics, and power grids, the authors study policy evaluation in MARL, where agents with jointly observed state-action pairs and private local rewards collaborate to learn the value of a given policy. |
Hoi-To Wai, Zhuoran Yang, Zhaoran Wang, Mingyi Hong |
Learning Others' Intentional Models In Multi-Agent Settings Using Interactive POMDPs
Interactive partially observable Markov decision processes (I-POMDPs) provide a principled framework for planning and acting in a partially observable, stochastic and multi-agent environment. In order to predict other agents' actions using I-POMDPs, the authors propose an approach that effectively uses Bayesian inference and sequential Monte Carlo sampling to learn others' intentional models, which ascribe to them beliefs, preferences and rationality in action selection. |
Yanlin Han, Piotr Gmytrasiewicz |
Bayesian Control Of Large MDPs With Unknown Dynamics In Data-Poor Environments
The authors propose a Bayesian decision making framework for control of Markov Decision Processes (MDPs) with unknown dynamics and large, possibly continuous, state, action, and parameter spaces in data-poor environments. |
Mahdi Imani, Seyede Fatemeh Ghoreishi, Ulisses M. Braga-Neto |
Negotiable Reinforcement Learning For Pareto Optimal Sequential Decision-Making
It is commonly believed that an agent making decisions on behalf of two or more principals who have different utility functions should adopt a Pareto optimal policy. To gain insight into the dynamics of this new framework, the authors implement a simple NRL agent and empirically examine its behavior in a simple environment. |
Nishant Desai, Andrew Critch, Stuart J Russell |
Rho-POMDPs Have Lipschitz-Continuous Epsilon-Optimal Value Functions
Many state-of-the-art algorithms for solving Partially Observable Markov Decision Processes (POMDPs) rely on turning the problem into a “fully observable” problem—a belief MDP—and exploiting the piece-wise linearity and convexity (PWLC) of the optimal value function in this new state space (the belief simplex ∆). Then, value function approximators are proposed for both upper- and lower-bounding the optimal value function, which are shown to provide uniformly improvable bounds. |
Mathieu Fehr, Olivier Buffet, Vincent Thomas, Jilles Dibangoye |
Learning Task Specifications From Demonstrations
Real-world applications often naturally decompose into several sub-tasks. |
Marcell Vazquez-Chanlatte, Susmit Jha, Ashish Tiwari, Mark Ho, Sanjit Seshia |
Teaching Inverse Reinforcement Learners Via Features And Demonstrations | Luis Haug, Sebastian Tschiatschek, Adish Singla |
Single-Agent Policy Tree Search With Guarantees
The authors introduce two novel tree search algorithms that use a policy to guide search. |
Laurent Orseau, Levi Lelis, Tor Lattimore, Theophane Weber |
From Stochastic Planning To Marginal MAP
It is well known that the problems of stochastic planning and probabilistic inference are closely related. The authors introduce a new reduction from MMAP to maximum expected utility problems which are suitable for the symbolic computation in SOGBOFA. |
Hao Cui, Radu Marinescu, Roni Khardon |
Dual Principal Component Pursuit: Improved Analysis And Efficient Algorithms
Recent methods for learning a linear subspace from data corrupted by outliers are based on convex L1 and nuclear norm optimization and require the dimension of the subspace and the number of outliers to be sufficiently small [27]. In sharp contrast, the recently proposed Dual Principal Component Pursuit (DPCP) method [22] can provably handle subspaces of high dimension by solving a non-convex L1 optimization problem on the sphere. |
Zhihui Zhu, Yifan Wang, Daniel Robinson, Daniel Naiman, Rene Vidal, Manolis Tsakiris |
Experimental Design For Cost-Aware Learning Of Causal Graphs
The authors consider the minimum cost intervention design problem: Given the essential graph of a causal graph and a cost to intervene on a variable, identify the set of interventions with minimum total cost that can learn any causal graph with the given essential graph. |
Erik Lindgren, Murat Kocaoglu, Alex Dimakis, Sriram Vishwanath |
Removing Hidden Confounding By Experimental Grounding
Observational data is increasingly used as a means for making individual-level causal predictions and intervention recommendations. The authors introduce a novel method of using limited experimental data to correct the hidden confounding in causal effect models trained on larger observational data, even if the observational data does not fully overlap with the experimental data. |
Nathan Kallus, Aahlad Manas Puli, Uri Shalit |
Domain Adaptation By Using Causal Inference To Predict Invariant Conditional Distributions | Sara Magliacane, Thijs Van Ommen, Tom Claassen, Stephan Bongers, Philip Versteeg, Joris M Mooij |
Structural Causal Bandits: Where To Intervene?
The authors study the problem of identifying the best action in a sequential decision-making setting when the reward distributions of the arms exhibit a non-trivial dependence structure, which is governed by the underlying causal model of the domain where the agent is deployed. |
Sanghack Lee, Elias Bareinboim |
Uplift Modeling From Separate Labels | Ikko Yamane, Florian Yger, Jamal Atif, Masashi Sugiyama |
Causal Inference With Noisy And Missing Covariates Via Matrix Factorization
Valid causal inference in observational studies often requires controlling for confounders. The authors propose the use of matrix factorization to infer the confounders from noisy covariates. |
Nathan Kallus, Xiaojie Mao, Madeleine Udell |
Fast Estimation Of Causal Interactions Using Wold Processes
The authors here focus on the task of learning Granger causality matrices for multivariate point processes. |
Flavio Figueiredo, Guilherme Resende Borges, Pedro O.S. Vaz De Melo, Renato Assunção |
Learning And Testing Causal Models With Interventions
The authors consider testing and learning problems on causal Bayesian networks as defined by Pearl (Pearl, 2009). |
Jayadev Acharya, Arnab Bhattacharyya, Constantinos Daskalakis, Saravanan Kandasamy |
Causal Inference Via Kernel Deviance Measures
Discovering the causal structure among a set of variables is a fundamental problem in many areas of science. In this paper, the authors propose Kernel Conditional Deviance for Causal Inference (KCDC), a fully nonparametric causal discovery method based on purely observational data. |
Jovana Mitrovic, Dino Sejdinovic, Yee Whye Teh |
Multi-domain Causal Structure Learning In Linear Systems
The authors study the problem of causal structure learning in linear systems from observational data given in multiple domains, across which the causal coefficients and/or the distribution of the exogenous noises may vary. |
AmirEmad Ghassami, Negar Kiyavash, Biwei Huang, Kun Zhang |
Causal Inference And Mechanism Clustering Of A Mixture Of Additive Noise Models | Shoubo Hu, Zhitang Chen, Vahid Partovi Nia, Laiwan CHAN, Yanhui Geng |
Direct Estimation Of Differences In Causal Graphs
The authors consider the problem of estimating the differences between two causal directed acyclic graph (DAG) models with a shared topological order, given i.i.d. samples from each model. |
Yuhao Wang, Chandler Squires, Anastasiya Belyaeva, Caroline Uhler |
Identification And Estimation Of Causal Effects From Dependent Data
The assumption that data samples are independent and identically distributed (iid) is standard in many areas of statistics and machine learning. The authors use segregated graphs [15], a generalization of latent projection mixed graphs [23], to represent causal models of this type and provide a complete algorithm for non-parametric identification in these models. |
Eli Sherman, Ilya Shpitser |
Multilingual Anchoring: Interactive Topic Modeling And Alignment Across Languages
Multilingual topic models can reveal patterns in cross-lingual document collections. |
Michelle Yuan, Benjamin Van Durme, Jordan Ying |
Submodular Field Grammars: Representation, Inference, And Application To Image Parsing
Natural scenes contain many layers of part-subpart structure, and distributions over them are thus naturally represented by stochastic image grammars, with one production per decomposition of a part. |
Abe L Friesen, Pedro Domingos |
Autoconj: Recognizing And Exploiting Conjugacy Without A Domain-Specific Language
Deriving conditional and marginal distributions using conjugacy relationships can be time consuming and error prone. In this paper, the authors propose a strategy for automating such derivations. |
Matthew D. Hoffman, Matthew Johnson, Dustin Tran |
Distributionally Robust Graphical Models
In many structured prediction problems, complex relationships between variables are compactly defined using graphical structures. The authors present adversarial graphical models (AGM), a distributionally robust approach for constructing a predictor that performs robustly for a class of data distributions defined using a graphical structure. |
Rizal Fathony, Ashkan Rezaei, Mohammad Ali Bashiri, Xinhua Zhang, Brian Ziebart |
Flexible And Accurate Inference And Learning For Deep Generative Models
The authors introduce a new approach to learning in hierarchical latent-variable generative models called the “distributed distributional code Helmholtz machine”, which emphasises flexibility and accuracy in the inferential process. |
Eszter Vértes, Maneesh Sahani |
Provable Gaussian Embedding With One Observation
The success of machine learning methods heavily relies on having an appropriate representation for data at hand. |
Ming Yu, Zhuoran Yang, Tuo Zhao, Mladen Kolar, Zhaoran Wang |
Learning And Inference In Hilbert Space With Quantum Graphical Models
Quantum Graphical Models (QGMs) generalize classical graphical models by adopting the formalism for reasoning about uncertainty from quantum mechanics. Unlike classical graphical models, QGMs represent uncertainty with density matrices in complex Hilbert spaces. |
Siddarth Srinivasan, Carlton Downey, Byron Boots |
Multi-value Rule Sets For Interpretable Classification With Feature-Efficient Representations
The authors present the Multi-value Rule Set (MRS) for interpretable classification with feature-efficient representations. |
Tong Wang |
Nonparametric Bayesian Lomax Delegate Racing For Survival Analysis With Competing Risks
The authors propose Lomax delegate racing (LDR) to explicitly model the mechanism of survival under competing risks and to interpret how the covariates accelerate or decelerate the time to event. |
Quan Zhang, Mingyuan Zhou |
Theoretical Guarantees For EM Under Misspecified Gaussian Mixture Models
Recent years have witnessed substantial progress in understanding the behavior of EM for mixture models that are correctly specified. The authors provide two classes of theoretical guarantees: first, the authors characterize the bias introduced due to the misspecification; and second, the authors prove that population EM converges at a geometric rate to the model projection under a suitable initialization condition. |
Raaz Dwivedi, Nhật Hồ, Koulik Khamaru, Martin Wainwright, Michael I. Jordan |
Nonparametric Learning From Bayesian Models With Randomized Objective Functions
Bayesian learning is built on an assumption that the model space contains a true reflection of the data generating mechanism. Here the authors present a Bayesian nonparametric approach to learning that makes use of statistical models, but does not assume that the model is true. |
Simon Lyddon, Stephen Walker, Chris C Holmes |
Rectangular Bounding Process
Stochastic partition models divide a multi-dimensional space into a number of rectangular regions, such that the data within each region exhibit certain types of homogeneity. Due to the nature of their partition strategy, existing partition models may create many unnecessary divisions in sparse regions when trying to describe data in dense regions. |
Xuhui Fan, Bin Li, Scott Sisson |
A Bayesian Nonparametric View On Count-Min Sketch
The count-min sketch is a time- and memory-efficient randomized data structure that provides a point estimate of the number of times an item has appeared in a data stream. The authors present a Bayesian view on the count-min sketch, using the same data structure, but providing a posterior distribution over the frequencies that characterizes the uncertainty arising from the hash-based approximation. |
Diana Cai, Michael Mitzenmacher, Ryan Adams |
Communication Efficient Parallel Algorithms For Optimization On Manifolds | Bayan Saparbayeva, Michael Zhang, Lizhen Lin |
Lifted Weighted Mini-Bucket
Many graphical models, such as Markov Logic Networks (MLNs) with evidence, possess highly symmetric substructures but no exact symmetries. In this paper, the authors present a lifted variant of the Weighted Mini-Bucket elimination algorithm which provides a principled way to (i) exploit the highly symmetric substructure of MLN models, and (ii) incorporate high-order inference terms which are necessary for high quality approximate inference. |
Nicholas Gallo, Alexander Ihler |
Cluster Variational Approximations For Structure Learning Of Continuous-Time Bayesian Networks From Incomplete Data
Continuous-time Bayesian networks (CTBNs) constitute a general and powerful framework for modeling continuous-time stochastic processes on networks. Inspired by recent advances in statistical physics, the authors present a new approximation scheme based on cluster-variational methods that significantly improves upon existing variational approximations. |
Dominik Linzner, Heinz Koeppl |
Faithful Inversion Of Generative Models For Effective Amortized Inference
Inference amortization methods share information across multiple posterior-inference problems, allowing each to be carried out more efficiently. The authors introduce an algorithm for faithfully, and minimally, inverting the graphical model structure of any generative model. |
Stefan Webb, Adam Golinski, Rob Zinkov, Siddharth Narayanaswamy, Tom Rainforth, Yee Whye Teh, Frank Wood |
A Stein Variational Newton Method
Stein variational gradient descent (SVGD) was recently proposed as a general purpose nonparametric variational inference algorithm: it minimizes the Kullback–Leibler divergence between the target distribution and its approximation by implementing a form of functional gradient descent on a reproducing kernel Hilbert space [Liu & Wang, NIPS 2016]. |
Gianluca Detommaso, Tiangang Cui, Youssef Marzouk, Alessio Spantini, Robert Scheichl |
Reparameterization Gradient For Non-differentiable Models
The authors present a new algorithm for stochastic variational inference that targets at models with non-differentiable densities. |
Wonyeol Lee, Hangyeol Yu, Hongseok Yang |
Implicit Reparameterization Gradients
By providing a simple and efficient way of computing low-variance gradients of continuous random variables, the reparameterization trick has become the technique of choice for training a variety of latent variable models. The authors introduce an alternative approach to computing reparameterization gradients based on implicit differentiation and demonstrate its broader applicability by applying it to Gamma, Beta, Dirichlet, and von Mises distributions, which cannot be used with the classic reparameterization trick. |
Michael Figurnov, Shakir Mohamed, Andriy Mnih |
SLANG: Fast Structured Covariance Approximations For Bayesian Deep Learning With Natural Gradient
Uncertainty estimation in large deep-learning models is a computationally challenging task, where it is difficult to form even a Gaussian approximation to the posterior distribution. To address this issue, the authors propose a new stochastic, low-rank, approximate natural-gradient (SLANG) method for variational inference in large deep models. |
Aaron Mishkin, Frederik Kunstner, Didrik Nielsen, Mark Schmidt, Emtiyaz Khan |
Wasserstein Variational Inference
This paper introduces Wasserstein variational inference, a new form of approximate Bayesian inference based on optimal transport theory. |
Luca Ambrogioni, Umut Güçlü, Yağmur Güçlütürk, Max Hinne, Marcel A. J. Van Gerven, Eric Maris |
Adaptive Path-Integral Autoencoders: Representation Learning And Planning For Dynamical Systems
The authors present a representation learning algorithm that learns a low-dimensional latent dynamical system from high-dimensional sequential raw data, e.g., video. |
Jung-Su Ha, Young-Jin Park, Hyeok-Joo Chae, Soon-Seo Park, Han-Lim Choi |
Variational Inference With Tail-adaptive F-Divergence
Variational inference with α-divergences has been widely used in modern probabilistic machine learning. In this paper, the authors propose a new class of tail-adaptive f-divergences that adaptively change the convex function f with the tail of the importance weights, in a way that theoretically guarantees finite moments while simultaneously achieving mass-covering properties. |
Dilin Wang, Hao Liu, Qiang Liu |
Boosting Black Box Variational Inference
Approximating a probability density in a tractable manner is a central task in Bayesian statistics. Specifically, they require a custom implementation of the greedy step (called the LMO) for every probabilistic model with respect to an unnatural variational family of truncated distributions. |
Francesco Locatello, Gideon Dresdner, Rajiv Khanna, Isabel Valera, Gunnar Raetsch |
Discretely Relaxing Continuous Variables For Tractable Variational Inference
The authors explore a new research direction in Bayesian variational inference with discrete latent variable priors, where they exploit Kronecker matrix algebra for efficient and exact computations of the evidence lower bound (ELBO). The proposed "DIRECT" approach has several advantages over its predecessors: (i) it can exactly compute ELBO gradients (i.e. |
Trefor Evans, Prasanth Nair |
Using Large Ensembles Of Control Variates For Variational Inference
Variational inference is increasingly being addressed with stochastic optimization. The authors also present a Bayesian risk minimization framework in which the quality of a procedure for combining control variates is quantified by its effect on optimization convergence rates, which leads to a very simple combination rule. |
Tomas Geffner, Justin Domke |
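As background for this entry, a generic control-variate sketch (my illustration, not the paper's Bayesian combination rule): to estimate E[f(U)], subtract a correlated quantity g(U) with known mean, scaled by the coefficient c* = Cov(f, g)/Var(g) that minimizes the residual variance.

```python
import math
import random

# Control variate demo: estimate E[exp(U)], U ~ Uniform(0, 1), using the
# control g(U) = U - 1/2, whose mean is known to be zero.
random.seed(0)
n = 100_000
us = [random.random() for _ in range(n)]
f = [math.exp(u) for u in us]
g = [u - 0.5 for u in us]  # centered control variate

# Optimal coefficient c* = Cov(f, g) / Var(g), estimated from the samples.
mf, mg = sum(f) / n, sum(g) / n
cov = sum(fi * gi for fi, gi in zip(f, g)) / n - mf * mg
var_g = sum(gi * gi for gi in g) / n - mg ** 2
c = cov / var_g

adjusted = [fi - c * gi for fi, gi in zip(f, g)]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# The control removes most of the variance without biasing the estimate.
assert variance(adjusted) < 0.1 * variance(f)
assert abs(sum(adjusted) / n - (math.e - 1)) < 0.01
```

With many control variates (the paper uses large ensembles), c becomes a vector solved by regressing f on the controls, and the paper's question is how to do that combination well under a fixed computational budget.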
The Promises And Pitfalls Of Stochastic Gradient Langevin Dynamics
Stochastic Gradient Langevin Dynamics (SGLD) has emerged as a key MCMC algorithm for Bayesian learning from large scale datasets. Several strategies have been suggested to reduce this effect; among them, SGLD Fixed Point (SGLDFP) uses carefully designed control variates to reduce the variance of the stochastic gradients. |
Nicolas Brosse, Alain Durmus, Eric Moulines |
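A minimal SGLD sketch (illustrative background, not the paper's SGLDFP variant): the update is θ ← θ − (ε/2)·∇Û(θ) + √ε·ξ with ξ ~ N(0, 1), where ∇Û is an unbiased minibatch estimate of the gradient of the potential.

```python
import math
import random

# SGLD on a toy model: data x_i ~ N(theta, 1) with a flat prior, so the
# posterior concentrates near the data mean. The minibatch gradient is
# rescaled by n/b to be unbiased for the full-data gradient.
random.seed(0)
data = [random.gauss(2.0, 1.0) for _ in range(1000)]

def grad_U_hat(theta, batch):
    n, b = len(data), len(batch)
    return (n / b) * sum(theta - x for x in batch)

step, theta, samples = 1e-3, 0.0, []
for k in range(5000):
    batch = random.sample(data, 32)
    theta += -0.5 * step * grad_U_hat(theta, batch) \
             + math.sqrt(step) * random.gauss(0, 1)
    if k > 1000:  # discard burn-in
        samples.append(theta)

posterior_mean = sum(samples) / len(samples)
assert abs(posterior_mean - 2.0) < 0.3  # concentrates near the data mean
```

The variance of the minibatch gradient is exactly what SGLDFP attacks with control variates; in this sketch it shows up as extra spread in the chain beyond the true posterior's width.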
Large-Scale Stochastic Sampling From The Probability Simplex
Stochastic gradient Markov chain Monte Carlo (SGMCMC) has become a popular method for scalable Bayesian inference. To avoid the biases caused by this discretization error, the authors propose the stochastic Cox-Ingersoll-Ross process (SCIR), which removes all discretization error, and they prove that samples from the SCIR process are asymptotically unbiased. |
Jack Baker, Paul Fearnhead, Emily Fox, Chris Nemeth |
Mirrored Langevin Dynamics
The authors consider the problem of sampling from constrained distributions, which has posed significant challenges to both non-asymptotic analysis and algorithmic design. |
Ya-Ping Hsieh, Ali Kavis, Paul Rolland, Volkan Cevher |
Thermostat-assisted Continuously-tempered Hamiltonian Monte Carlo For Bayesian Learning
In this paper, the authors propose a novel sampling method, the thermostat-assisted continuously-tempered Hamiltonian Monte Carlo, for the purpose of multimodal Bayesian learning. |
Rui Luo, Jianhong Wang, Yaodong Yang, Jun WANG, Zhanxing Zhu |
Dimensionally Tight Bounds For Second-Order Hamiltonian Monte Carlo
Hamiltonian Monte Carlo (HMC) is a widely deployed method to sample from high-dimensional distributions in statistics and machine learning. HMC is known to run very efficiently in practice, and its popular second-order "leapfrog" implementation has long been conjectured to run in $d^{1/4}$ gradient evaluations. |
Oren Mangoubi, Nisheeth Vishnoi |
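The second-order leapfrog integrator the paper analyzes can be sketched in a few lines (my illustration, on a 1-D standard Gaussian target): a half momentum kick, alternating full position and momentum steps, and a final half kick. Being symplectic, it nearly conserves the Hamiltonian over long trajectories, which is why HMC accepts most proposals.

```python
# Leapfrog integrator for HMC on a 1-D standard Gaussian target,
# with potential U(x) = x^2 / 2 and kinetic energy p^2 / 2.

def grad_U(x):
    return x  # gradient of U(x) = x^2 / 2

def leapfrog(x, p, step, n_steps):
    p = p - 0.5 * step * grad_U(x)       # initial half-step for momentum
    for _ in range(n_steps - 1):
        x = x + step * p                 # full position step
        p = p - step * grad_U(x)         # full momentum step
    x = x + step * p
    p = p - 0.5 * step * grad_U(x)       # final half-step for momentum
    return x, p

def hamiltonian(x, p):
    return 0.5 * x * x + 0.5 * p * p

x0, p0 = 1.0, 0.5
x1, p1 = leapfrog(x0, p0, step=0.01, n_steps=100)
# Symplecticity: the energy error stays O(step^2) over the trajectory.
assert abs(hamiltonian(x1, p1) - hamiltonian(x0, p0)) < 1e-3
```

The paper's question is how the number of gradient evaluations (step count times steps per trajectory) must scale with dimension $d$ to keep this energy error, and hence the rejection rate, under control.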
Global Convergence Of Langevin Dynamics Based Algorithms For Nonconvex Optimization
The authors present a unified framework to analyze the global convergence of Langevin dynamics based algorithms for nonconvex finite-sum optimization with $n$ component functions. |
Pan Xu, Jinghui Chen, Difan Zou, Quanquan Gu |
Meta-Learning MCMC Proposals
Effective implementations of sampling-based probabilistic inference often require manually constructed, model-specific proposals. |
Tongzhou Wang, YI WU, Dave Moore, Stuart Russell |
Posterior Concentration For Sparse Deep Learning
The authors introduce Spike-and-Slab Deep Learning (SS-DL), a fully Bayesian alternative to dropout for improving generalizability of deep ReLU networks. |
Veronika Rockova, Nicholas Polson |
Analytic Solution And Stationary Phase Approximation For The Bayesian Lasso And Elastic Net
The lasso and elastic net linear regression models impose a double-exponential prior distribution on the model parameters to achieve regression shrinkage and variable selection, allowing the inference of robust models from large data sets. |
Tom Michoel |
Bayesian Model Selection Approach To Boundary Detection With Non-Local Priors
Based on non-local prior distributions, the authors propose a Bayesian model selection (BMS) procedure for boundary detection in a sequence of data with multiple systematic mean changes. |
Fei Jiang, Guosheng Yin, Francesca Dominici |
Graphical Model Inference: Sequential Monte Carlo Meets Deterministic Approximations
Approximate inference in probabilistic graphical models (PGMs) can be grouped into deterministic methods and Monte-Carlo-based methods. In this paper the authors present a way of bridging the gap between deterministic and stochastic inference. |
Fredrik Lindsten, Jouni Helske, Matti Vihola |
Implicit Probabilistic Integrators For ODEs
The authors introduce a family of implicit probabilistic integrators for initial value problems (IVPs), taking as a starting point the multistep Adams–Moulton method. |
Onur Teymur, Han Lie, Tim Sullivan, Ben Calderhead |
A Bayes-Sard Cubature Method
This paper focusses on the formulation of numerical integration as an inferential task. To address these drawbacks the authors introduce Bayes-Sard cubature, a probabilistic framework that combines the flexibility of Bayesian cubature with the robustness of classical cubatures which are well-established. |
Toni Karvonen, Chris J Oates, Simo Sarkka |
Deep State Space Models For Time Series Forecasting
The authors present a novel approach to probabilistic time series forecasting that combines state space models with deep learning. |
Syama Sundar Rangapuram, Matthias W Seeger, Jan Gasthaus, Lorenzo Stella, Bernie Wang, Tim Januschowski |
BRUNO: A Deep Recurrent Model For Exchangeable Data
The authors present a novel model architecture which leverages deep learning tools to perform exact Bayesian inference on sets of high dimensional, complex observations. |
Iryna Korshunova, Jonas Degrave, Ferenc Huszar, Yarin Gal, Arthur Gretton, Joni Dambre |
Scaling Gaussian Process Regression With Derivatives
Gaussian processes (GPs) with derivatives are useful in many applications, including Bayesian optimization, implicit surface reconstruction, and terrain reconstruction. The authors propose iterative solvers using fast $\mathcal{O}(nd)$ matrix-vector multiplications (MVMs), together with pivoted Cholesky preconditioning that cuts the iterations to convergence by several orders of magnitude, allowing for fast kernel learning and prediction. |
David Eriksson, Kun Dong, Eric Lee, David Bindel, Andrew Wilson |
Algebraic Tests Of General Gaussian Latent Tree Models
The authors consider general Gaussian latent tree models in which the observed variables are not restricted to be leaves of the tree. Illustrating with the star tree, the authors propose a new testing methodology that circumvents singularity issues by trading off some statistical estimation efficiency and handles cases with many constraints through recent advances on Gaussian approximation for maxima of sums of high-dimensional random vectors. |
Dennis Leung, Mathias Drton |
Differentially Private Bayesian Inference For Exponential Families | Garrett Bernstein, Dan Sheldon |
Semi-crowdsourced Clustering With Deep Generative Models | Yucen Luo, TIAN TIAN, Jiaxin Shi, Jun Zhu, Bo Zhang |
Deep Poisson Gamma Dynamical Systems
The authors develop deep Poisson-gamma dynamical systems (DPGDS) to model sequentially observed multivariate count data, improving previously proposed models by not only mining deep hierarchical latent structure from the data, but also capturing both first-order and long-range temporal dependencies. |
Dandan Guo, Bo Chen, Hao Zhang, Mingyuan Zhou |
Deep State Space Models For Unconditional Word Generation
Autoregressive feedback is considered a necessity for successful unconditional text generation using stochastic sequence models. However, such feedback is known to introduce systematic biases into the training process, and it obscures a principle of generation: committing to global information and forgetting local nuances. |
Florian Schmidt, Thomas Hofmann |
Modular Networks: Learning To Decompose Neural Computation
Scaling model capacity has been vital in the success of deep learning. The authors propose a training algorithm that flexibly chooses neural modules based on the data to be processed. |
Louis Kirsch, Julius Kunze, David Barber |
Gaussian Process Prior Variational Autoencoders
Variational autoencoders (VAE) are a powerful and widely-used class of models to learn complex data distributions in an unsupervised fashion. One important limitation of VAEs is the prior assumption that latent sample representations are independent and identically distributed. |
Francesco Paolo Casale, Adrian Dalca, Luca Saglietti, Jennifer Listgarten, Nicolo Fusi |
Bayesian Semi-supervised Learning With Graph Gaussian Processes
The authors propose a data-efficient Gaussian process-based Bayesian approach to the semi-supervised learning problem on graphs. |
Yin Cheng Ng, Nicolò Colombo, Ricardo Silva |
Inference In Deep Gaussian Processes Using Stochastic Gradient Hamiltonian Monte Carlo
Deep Gaussian Processes (DGPs) are hierarchical generalizations of Gaussian Processes that combine well calibrated uncertainty estimates with the high flexibility of multilayer models. To efficiently optimize the hyperparameters, the authors introduce the Moving Window MCEM algorithm. |
Marton Havasi, Jose Miguel Hernández-Lobato, Juan José Murillo-Fuentes |
Variational Bayesian Monte Carlo
Many probabilistic models of interest in scientific computing and machine learning have expensive, black-box likelihoods that prevent the application of standard techniques for Bayesian inference, such as MCMC, which would require access to the gradient or a large number of likelihood evaluations. The authors introduce here a novel sample-efficient inference framework, Variational Bayesian Monte Carlo (VBMC). |
Luigi Acerbi |
Bayesian Alignments Of Warped Multi-Output Gaussian Processes
The authors propose a novel Bayesian approach to modelling nonlinear alignments of time series based on latent shared information. |
Markus Kaiser, Clemens Otte, Thomas Runkler, Carl Henrik Ek |
Automating Bayesian Optimization With Bayesian Optimization
Bayesian optimization is a powerful tool for global optimization of expensive functions. In this work, the authors introduce a novel automated Bayesian optimization approach that dynamically selects promising models for explaining the observed data using Bayesian optimization in the model space. |
Gustavo Malkomes, Roman Garnett |
Infinite-Horizon Gaussian Processes
Gaussian processes provide a flexible framework for forecasting, removing noise, and interpreting long temporal datasets. The authors show examples for large finite-length modelling problems, and demonstrate how the method runs in real time on a smartphone on a continuous data stream updated at 100 Hz. |
Arno Solin, James Hensman, Richard E Turner |
Learning Gaussian Processes By Minimizing PAC-Bayesian Generalization Bounds
Gaussian Processes (GPs) are a generic modelling tool for supervised learning. To this end, the authors propose a method to learn GPs and their sparse approximations by directly optimizing a PAC-Bayesian bound on their generalization performance, instead of maximizing the marginal likelihood. |
David Reeb, Andreas Doerr, Sebastian Gerwinn, Barbara Rakitsch |
Algorithmic Linearly Constrained Gaussian Processes
The authors algorithmically construct multi-output Gaussian process priors which satisfy linear differential equations. |
Markus Lange-Hegermann |
Efficient Projection Onto The Perfect Phylogeny Model
Several algorithms build on the perfect phylogeny model to infer evolutionary trees. |
Bei Jia, Surjyendu Ray, Sam Safavi, José Bento |
Distributed $k$-Clustering For Data With Heavy Noise
In this paper, the authors consider the $k$-center/median/means clustering with outliers problems (or the $(k, z)$-center/median/means problems) in the distributed setting. |
Shi Li, Xiangyu Guo |
Communication Compression For Decentralized Training
Optimizing distributed learning systems is an art of balancing between computation and communication. There have been two lines of research that try to deal with slower networks: communication compression for low-bandwidth networks, and decentralization for high-latency networks. Although the system implication of such a combination is trivial, the underlying theoretical principle and algorithm design is challenging: unlike centralized algorithms, simply compressing the exchanged information, even in an unbiased stochastic way, within the decentralized network would accumulate the error and cause divergence. |
Hanlin Tang, Shaoduo Gan, Ce Zhang, Tong Zhang, Ji Liu |
Do Less, Get More: Streaming Submodular Maximization With Subsampling
In this paper, the authors develop the first one-pass streaming algorithm for submodular maximization that does not evaluate the entire stream even once. |
Moran Feldman, Amin Karbasi, Ehsan Kazemi |
Optimal Algorithms For Continuous Non-monotone Submodular And DR-Submodular Maximization
In this paper the authors study the fundamental problems of maximizing a continuous non-monotone submodular function over a hypercube, with and without coordinate-wise concavity. |
Rad Niazadeh, Tim Roughgarden, Joshua Wang |
Provable Variational Inference For Constrained Log-Submodular Models
Submodular maximization problems appear in several areas of machine learning and data science, as many useful modelling concepts such as diversity and coverage satisfy this natural diminishing returns property. To perform inference in these models the authors design novel variational inference algorithms, which carefully leverage the combinatorial and probabilistic properties of these objects. |
Josip Djolonga, Stefanie Jegelka, Andreas Krause |
Fast Greedy Algorithms For Dictionary Selection With Generalized Sparsity Constraints
In dictionary selection, several atoms are selected from finite candidates that successfully approximate given data points in the sparse representation. |
Kaito Fujii, Tasuku Soma |
Boolean Decision Rules Via Column Generation
This paper considers the learning of Boolean rules in either disjunctive normal form (DNF, OR-of-ANDs, equivalent to decision rule sets) or conjunctive normal form (CNF, AND-of-ORs) as an interpretable model for classification. To handle large datasets, the authors propose an approximate CG algorithm using randomization. |
Sanjeeb Dash, Oktay Gunluk, Dennis Wei |
Computing Kantorovich-Wasserstein Distances On $d$-dimensional Histograms Using $(d+1)$-partite Graphs
This paper presents a novel method to compute the exact Kantorovich-Wasserstein distance between a pair of $d$-dimensional histograms having $n$ bins each. |
Gennaro Auricchio, Federico Bassetti, Stefano Gualandi, Marco Veneroni |
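A small sketch of the 1-D special case (my illustration; the paper's contribution is the $d$-dimensional construction via $(d+1)$-partite graphs): for two histograms on a shared grid with equal total mass, the Kantorovich-Wasserstein W1 distance reduces to the L1 distance between their cumulative sums.

```python
# W1 between 1-D histograms on the same uniform grid: the signed cumulative
# mass imbalance at each bin boundary is exactly the mass that must cross it.

def wasserstein1_1d(p, q, bin_width=1.0):
    assert abs(sum(p) - sum(q)) < 1e-12  # equal total mass required
    dist, cum = 0.0, 0.0
    for pi, qi in zip(p, q):
        cum += pi - qi              # running imbalance p - q
        dist += abs(cum) * bin_width
    return dist

# Moving all mass one bin to the right costs exactly one unit of work.
assert wasserstein1_1d([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]) == 1.0
assert wasserstein1_1d([0.5, 0.5, 0.0], [0.0, 0.5, 0.5]) == 1.0
```

In higher dimensions no such closed form exists, which is why exact computation requires solving a transportation problem on a graph, the setting this paper optimizes.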
Adaptive Negative Curvature Descent With Applications In Non-convex Optimization
Negative curvature descent (NCD) method has been utilized to design deterministic or stochastic algorithms for non-convex optimization aiming at finding second-order stationary points or local minima. |
Mingrui Liu, Zhe Li, Xiaoyu Wang, Jinfeng Yi, Tianbao Yang |
Implicit Bias Of Gradient Descent On Linear Convolutional Networks
The authors show that gradient descent on full-width linear convolutional networks of depth $L$ converges to a linear predictor related to the $\ell_{2/L}$ bridge penalty in the frequency domain. |
Suriya Gunasekar, Jason Lee, Daniel Soudry, Nati Srebro |
Deep Generative Models For Distribution-Preserving Lossy Compression
The authors propose and study the problem of distribution-preserving lossy compression. |
Michael Tschannen, Eirikur Agustsson, Mario Lucic |
Visual Object Networks: Image Generation With Disentangled 3D Representations
Recent progress in deep generative models has led to tremendous breakthroughs in image generation. Different from previous works built on 2D datasets and models, the authors present a new generative model, Visual Object Networks (VON), synthesizing natural images of objects with a disentangled 3D representation. |
Jun-Yan Zhu, Zhoutong Zhang, Chengkai Zhang, Jiajun Wu, Antonio Torralba, Josh Tenenbaum, Bill Freeman |
Nonlocal Neural Networks, Nonlocal Diffusion And Nonlocal Modeling
Nonlocal neural networks have been proposed and shown to be effective in several computer vision tasks, where the nonlocal operations can directly capture long-range dependencies in the feature space. |
Yunzhe Tao, Qi Sun, Qiang Du, Wei Liu |
Can We Gain More From Orthogonality Regularizations In Training Deep Networks?
This paper seeks to answer the question: as the (near-) orthogonality of weights is found to be a favorable property for training deep convolutional neural networks, how can the authors enforce it in more effective and easy-to-use ways? The authors observe consistent performance gains after applying those proposed regularizations, in terms of both the final accuracies achieved, and faster and more stable convergences. |
Nitin Bansal, Xiaohan Chen, Zhangyang Wang |
Discrimination-aware Channel Pruning For Deep Neural Networks
Channel pruning is one of the predominant approaches for deep model compression. To this end, the authors introduce additional losses into the network to increase the discriminative power of intermediate layers and then select the most discriminative channels for each layer by considering the additional loss and the reconstruction error. |
Zhuangwei Zhuang, Mingkui Tan, Bohan Zhuang, Jing Liu, Yong Guo, Qingyao Wu, Junzhou Huang, Jinhui Zhu |
Probabilistic Model-Agnostic Meta-Learning
Meta-learning for few-shot learning entails acquiring a prior over previous tasks and experiences, such that new tasks can be learned from small amounts of data. In this paper, the authors propose a probabilistic meta-learning algorithm that can sample models for a new task from a model distribution. |
Chelsea Finn, Kelvin Xu, Sergey Levine |
FastGRNN: A Fast, Accurate, Stable And Tiny Kilobyte Sized Gated Recurrent Neural Network
This paper develops the FastRNN and FastGRNN algorithms to address the twin RNN limitations of inaccurate training and inefficient prediction. |
Aditya Kusupati, Manish Singh, Kush Bhatia, Ashish Kumar, Prateek Jain, Manik Varma |
Understanding Batch Normalization
Batch normalization (BN) is a technique to normalize activations in intermediate layers of deep neural networks. |
Nils Bjorck, Carla P Gomes, Bart Selman, Kilian Weinberger |
How Many Samples Are Needed To Estimate A Convolutional Neural Network?
A widespread folklore for explaining the success of Convolutional Neural Networks (CNNs) is that CNNs use a more compact representation than the Fully-connected Neural Network (FNN) and thus require fewer training samples to accurately estimate their parameters. |
Simon Du, Yining Wang, Xiyu Zhai, Sivaraman Balakrishnan, Russ Salakhutdinov, Aarti Singh |
Robust Detection Of Adversarial Attacks By Modeling The Intrinsic Properties Of Deep Neural Networks
It has been shown that deep neural network (DNN) based classifiers are vulnerable to human-imperceptible adversarial perturbations, which can cause DNN classifiers to output wrong predictions with high confidence. The authors propose an unsupervised learning approach to detect adversarial inputs without any knowledge of attackers. |
Zhihao Zheng, Pengyu Hong |
Combinatorial Optimization With Graph Convolutional Networks And Guided Tree Search
The authors present a learning-based approach to computing solutions for certain NP-hard problems. |
Zhuwen Li, Qifeng Chen, Vladlen Koltun |
Automatic Differentiation In ML: Where We Are And Where We Should Be Going | Bart Van Merrienboer, Olivier Breuleux, Arnaud Bergeron, Pascal Lamblin |
Realistic Evaluation Of Deep Semi-Supervised Learning Algorithms
Semi-supervised learning (SSL) provides a powerful framework for leveraging unlabeled data when labels are limited or expensive to obtain. After creating a unified reimplementation of various widely-used SSL techniques, the authors test them in a suite of experiments designed to address these issues. |
Avital Oliver, Augustus Odena, Colin A Raffel, Dogus Cubuk, Ian Goodfellow |
Toddler-Inspired Visual Object Learning
Real-world learning systems have practical limitations on the quality and quantity of the training datasets that they can collect and consider. |
Sven Bambach, David Crandall, Linda Smith, Chen Yu |
Generalisation In Humans And Deep Neural Networks
The authors compare the robustness of humans and current convolutional deep neural networks (DNNs) on object recognition under twelve different types of image degradations. |
Robert Geirhos, Carlos R. M. Temme, Jonas Rauber, Heiko H. Schütt, Matthias Bethge, Felix A. Wichmann |
Assessing The Scalability Of Biologically-Motivated Deep Learning Algorithms And Architectures
The backpropagation of error algorithm (BP) is impossible to implement in a real brain. |
Sergey Bartunov, Adam Santoro, Blake Richards, Luke Marris, Geoffrey E Hinton, Timothy Lillicrap |
Incorporating Context Into Language Encoding Models For FMRI
Language encoding models help explain language processing in the human brain by learning functions that predict brain responses from the language stimuli that elicited them. In this work the authors instead build encoding models using rich contextual representations derived from an LSTM language model. |
Shailee Jain, Alexander Huth |
Why So Gloomy? A Bayesian Explanation Of Human Pessimism Bias In The Multi-armed Bandit Task
How humans make repeated choices among options with imperfectly known reward outcomes is an important problem in psychology and neuroscience. The authors present data from a human stationary bandit experiment, in which they vary the average abundance and variability of reward availability (mean and variance of reward rate distributions). |
Dalin Guo, Angela J Yu |
Mental Sampling In Multimodal Representations | Jianqiao Zhu, Adam Sanborn, Nick Chater |
Integrated Accounts Of Behavioral And Neuroimaging Data Using Flexible Recurrent Neural Network Models
Neuroscience studies of human decision-making abilities commonly involve subjects completing a decision-making task while BOLD signals are recorded using fMRI. To address this limitation, the authors introduce a new method using recurrent neural network models that are flexible enough to be jointly fitted to the behavioral and neural data. |
Amir Dezfouli, Richard Morris, Fabio Ramos, Peter Dayan, Bernard Balleine |
Efficient Inference For Time-varying Behavior During Learning | Nicholas Roy, Ji Hyun Bak, Athena Akrami, Carlos Brody, Jonathan W Pillow |
Multivariate Convolutional Sparse Coding For Electromagnetic Brain Signals | Tom Dupré La Tour, Thomas Moreau, Mainak Jas, Alexandre Gramfort |
Manifold-tiling Localized Receptive Fields Are Optimal In Similarity-preserving Neural Networks
Many neurons in the brain, such as place cells in the rodent hippocampus, have localized receptive fields, i.e., they respond to a small neighborhood of stimulus space. What is the functional significance of such representations and how can they arise? |
Anirvan Sengupta, Cengiz Pehlevan, Mariano Tepper, Alexander Genkin, Dmitri Chklovskii |
Connectionist Temporal Classification With Maximum Entropy Regularization
The authors propose a regularization method based on maximum conditional entropy which penalizes peaky distributions and encourages exploration. |
Hu Liu, Sheng Jin, Changshui Zhang |
Removing The Feature Correlation Effect Of Multiplicative Noise
Multiplicative noise, including dropout, is widely used to regularize deep neural networks (DNNs), and is shown to be effective in a wide range of architectures and tasks. However, high feature correlation is undesirable, as it increases redundancy in representations. |
Zijun Zhang, Yining Zhang, Zongpeng Li |
Overfitting Or Perfect Fitting? Risk Bounds For Classification And Regression Rules That Interpolate
Many modern machine learning models are trained to achieve zero or near-zero training error in order to obtain near-optimal (but non-zero) test error. |
Mikhail Belkin, Daniel Hsu, Partha Mitra |
Smoothed Analysis Of Discrete Tensor Decomposition And Assemblies Of Neurons
The authors analyze linear independence of rank one tensors produced by tensor powers of randomly perturbed vectors. |
Nima Anari, Constantinos Daskalakis, Wolfgang Maass, Christos Papadimitriou, Amin Saberi, Santosh Vempala |
Entropy And Mutual Information In Models Of Deep Neural Networks
The authors examine a class of stochastic deep learning models with a tractable method to compute information-theoretic quantities. The authors extend particular cases in which this result is known to be rigorously exact by providing a proof for two-layer networks with Gaussian random weights, using the recently introduced adaptive interpolation method. |
Marylou Gabrié, Andre Manoel, Clément Luneau, Jean Barbier, Nicolas Macris, Florent Krzakala, Lenka Zdeborová |
The Committee Machine: Computational To Statistical Gaps In Learning A Two-layers Neural Network
Heuristic tools from statistical physics have been used in the past to compute the optimal learning and generalization errors in the teacher-student scenario in multi-layer neural networks. The authors also introduce a version of the approximate message passing (AMP) algorithm for the committee machine that makes it possible to perform optimal learning in polynomial time for a large set of parameters. |
Benjamin Aubin, Antoine Maillard, Jean Barbier, Florent Krzakala, Nicolas Macris, Lenka Zdeborová |
A Unified Framework For Extensive-Form Game Abstraction With Bounds
Abstraction has long been a key component in the practical solving of large-scale extensive-form games. In this paper the authors present a unified framework for analyzing abstractions that can express all types of abstractions and solution concepts used in prior papers with performance guarantees, while maintaining comparable bounds on abstraction quality. |
Christian Kroer, Tuomas Sandholm |
Connecting Optimization And Regularization Paths
The authors study the implicit regularization properties of optimization techniques by explicitly connecting their optimization paths to the regularization paths of "corresponding" regularized problems. |
Arun Suggala, Adarsh Prasad, Pradeep Ravikumar |
Overlapping Clustering Models, And One (class) SVM To Bind Them All
People belong to multiple communities, words belong to multiple topics, and books cover multiple genres; overlapping clusters are commonplace. |
Xueyu Mao, Purnamrita Sarkar, Deepayan Chakrabarti |
Learning Latent Variable Structured Prediction Models With Gaussian Perturbations
The standard margin-based structured prediction commonly uses a maximum loss over all possible structured outputs. Recent work has proposed the use of the maximum loss over random structured outputs sampled independently from some proposal distribution, with theoretical guarantees. |
Kevin Bello, Jean Honorio |
Self-Supervised Generation Of Spatial Audio For 360° Video
The authors introduce an approach to convert mono audio recorded by a 360° video camera into spatial audio, a representation of the distribution of sound over the full viewing sphere. |
Pedro Morgado, Nuno Nvasconcelos, Timothy Langlois, Oliver Wang |
Symbolic Graph Reasoning Meets Convolutions
Beyond local convolution networks, the authors explore how to harness various external human knowledge for endowing the networks with the capability of semantic global reasoning. Rather than relying on graphical models (e.g., CRFs) or constraints for modeling broader dependencies, the authors propose a new Symbolic Graph Reasoning (SGR) layer, which performs reasoning over a group of symbolic nodes whose outputs explicitly represent different properties of each semantic in a prior knowledge graph. |
Xiaodan Liang, Zhiting Hu, Hao Zhang, Liang Lin, Eric Xing |
Towards Deep Conversational Recommendations | Raymond Li, Samira Ebrahimi Kahou, Hannes Schulz, Vincent Michalski, Laurent Charlin, Chris Pal |
Human-in-the-Loop Interpretability Prior
The authors optimize for interpretability by directly including humans in the optimization loop. |
Isaac Lage, Andrew Ross, Samuel J Gershman, Been Kim, Finale Doshi-Velez |
Why Is My Classifier Discriminatory?
Recent attempts to achieve fairness in predictive models focus on the balance between fairness and accuracy. In this work, the authors argue that the fairness of predictions should be evaluated in context of the data, and that unfairness induced by inadequate sample sizes or unmeasured predictive variables should be addressed through data collection, rather than by constraining the model. |
Irene Chen, Fredrik Johansson, David Sontag |
Link Prediction Based On Graph Neural Networks
Link prediction is a key problem for network-structured data. In this paper, the authors study this heuristic learning paradigm for link prediction. |
Muhan Zhang, Yixin Chen |
KONG: Kernels For Ordered-neighborhood Graphs
The authors present novel graph kernels for graphs with node and edge labels that have ordered neighborhoods, i.e. |
Moez Draief, Konstantin Kutzkov, Kevin Scaman, Milan Vojnovic |
Efficient Stochastic Gradient Hard Thresholding
Stochastic gradient hard thresholding methods have recently been shown to work favorably in solving large-scale empirical risk minimization problems under sparsity or rank constraint. To address these deficiencies, the authors propose an efficient hybrid stochastic gradient hard thresholding (HSG-HT) method that can be provably shown to have sample-size-independent gradient evaluation and hard thresholding complexity bounds. |
Pan Zhou, Xiaotong Yuan, Jiashi Feng |
Measures Of Distortion For Machine Learning
Given data from a general metric space, one of the standard machine learning pipelines is to first embed the data into a Euclidean space and subsequently apply out of the box machine learning algorithms to analyze the data. The quality of such an embedding is typically described in terms of a distortion measure. |
Leena Chennuru Vankadara, Ulrike Von Luxburg |
Relating Leverage Scores And Density Using Regularized Christoffel Functions
Statistical leverage scores emerged as a fundamental tool for matrix sketching and column sampling with applications to low rank approximation, regression, random feature learning and quadrature. Borrowing ideas from the orthogonal polynomial literature, the authors introduce the regularized Christoffel function associated to a positive definite kernel. |
Edouard Pauwels, Francis Bach, Jean-Philippe Vert |
Streaming Kernel PCA With $\tilde{O}(\sqrt{n})$ Random Features
The authors study the statistical and computational aspects of kernel principal component analysis using random Fourier features. |
Enayat Ullah, Poorya Mianjy, Teodor Vanislavov Marinov, Raman Arora |
Learning With SGD And Random Features
Sketching and stochastic gradient methods are arguably the most common techniques to derive efficient large scale learning algorithms. More precisely, the authors study the estimator defined by stochastic gradient with mini batches and random features. |
Luigi Carratino, Alessandro Rudi, Lorenzo Rosasco |
But How Does It Work In Theory? Linear SVM With Random Features
The authors prove that, under low noise assumptions, the support vector machine with $N \ll m$ random features (RFSVM) can achieve a learning rate faster than $O(1/\sqrt{m})$ on a training set with $m$ samples when an optimized feature map is used. |
Yitong Sun, Anna Gilbert, Ambuj Tewari |
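The random-feature construction underlying this cluster of papers (Rahimi and Recht's random Fourier features) can be sketched in a few lines (my illustration, not any one paper's method): for the RBF kernel $k(x, y) = \exp(-\lVert x - y\rVert^2/2)$, the map $\phi(x) = \sqrt{2/N}\,[\cos(w_i^\top x + b_i)]_{i=1}^{N}$ with $w_i \sim N(0, I)$ and $b_i \sim U[0, 2\pi]$ satisfies $E[\phi(x)^\top \phi(y)] = k(x, y)$.

```python
import math
import random

# Random Fourier features approximating the RBF kernel exp(-||x - y||^2 / 2).
random.seed(0)
d, N = 3, 20000  # input dimension, number of random features

W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(N)]
B = [random.uniform(0, 2 * math.pi) for _ in range(N)]

def phi(x):
    # Feature map: sqrt(2/N) * cos(w_i . x + b_i) for each random (w_i, b_i).
    return [math.sqrt(2.0 / N)
            * math.cos(sum(wj * xj for wj, xj in zip(w, x)) + b)
            for w, b in zip(W, B)]

x, y = [0.5, -0.2, 0.1], [0.1, 0.3, -0.4]
approx = sum(a * b for a, b in zip(phi(x), phi(y)))
exact = math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / 2.0)
assert abs(approx - exact) < 0.05  # Monte Carlo error ~ O(1/sqrt(N))
```

The theoretical question in these papers is how small $N$ can be, relative to the sample size $m$, while preserving statistical guarantees such as learning rates for the downstream SVM, PCA, or SGD estimator.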
Statistical And Computational Trade-Offs In Kernel K-Means
The authors investigate the efficiency of k-means in terms of both statistical and computational requirements. More precisely, the authors study a Nyström approach to kernel k-means. |
Daniele Calandriello, Lorenzo Rosasco |
Quadrature-based Features For Kernel Approximation
The authors consider the problem of improving kernel approximation via randomized feature maps. These maps arise as Monte Carlo approximations to integral representations of kernel functions and scale up kernel methods for larger datasets. |
Marina Munkhoeva, Yermek Kapushev, Evgeny Burnaev, Ivan Oseledets |
Processing Of Missing Data By Neural Networks
The authors propose a general, theoretically justified mechanism for processing missing data by neural networks. |
Marek Śmieja, Łukasz Struski, Jacek Tabor, Bartosz Zieliński, Przemysław Spurek |
Constructing Deep Neural Networks By Bayesian Network Structure Learning
The authors introduce a principled approach for unsupervised structure learning of deep neural networks. |
Raanan Y. Rohekar, Shami Nisimov, Yaniv Gurwicz, Guy Koren, Gal Novik |
Mallows Models For Top-k Lists
The classic Mallows model is a widely-used tool to realize distributions on permutations. The authors demonstrate this by studying two basic problems in this model, namely, sampling and reconstruction, from both algorithmic and experimental points of view. |
Flavio Chierichetti, Anirban Dasgupta, Shahrzad Haddadan, Ravi Kumar, Silvio Lattanzi |
Cooperative Neural Networks (CoNN): Exploiting Prior Independence Structure For Improved Classification
The authors propose a new approach, called cooperative neural networks (CoNN), which use a set of cooperatively trained neural networks to capture latent representations that exploit prior given independence structure. |
Harsh Shrivastava, Eugene Bart, Bob Price, Hanjun Dai, Bo Dai, Srinivas Aluru |
Maximum-Entropy Fine Grained Classification
Fine-Grained Visual Classification (FGVC) is an important computer vision problem that involves small diversity within the different classes, and often requires expert annotators to collect data. |
Abhimanyu Dubey, Otkrist Gupta, Ramesh Raskar, Nikhil Naik |
Efficient Loss-Based Decoding On Graphs For Extreme Classification
In extreme classification problems, learning algorithms are required to map instances to labels from an extremely large label set. The authors build on a recent extreme classification framework with logarithmic time and space (LTLS), and on a general approach for error correcting output coding (ECOC) with loss-based decoding, and introduce a flexible and efficient approach accompanied by theoretical bounds. |
Itay Evron, Edward Moroshko, Koby Crammer |
A No-regret Generalization Of Hierarchical Softmax To Extreme Multi-label Classification
Extreme multi-label classification (XMLC) is a problem of tagging an instance with a small subset of relevant labels chosen from an extremely large pool of possible labels. The authors investigate probabilistic label trees (PLTs) that have been recently devised for tackling XMLC problems. |
Marek Wydmuch, Kalina Jasinska, Mikhail Kuznetsov, Róbert Busa-Fekete, Krzysztof Dembczynski |
Efficient Gradient Computation For Structured Output Learning With Rational And Tropical Losses
Many structured prediction problems admit a natural loss function for evaluation such as the edit-distance or $n$-gram loss.However, existing learning algorithms are typically designed to optimize alternative objectives such as the cross-entropy. |
Corinna Cortes, Vitaly Kuznetsov, Mehryar Mohri, Dmitry Storcheus, Scott Yang |
Deep Structured Prediction With Nonlinear Output Transformations
Deep structured models are widely used for tasks like semantic segmentation, where explicit correlations between variables provide important prior information which generally helps to reduce the data needs of deep nets. However, current deep structured models are restricted by oftentimes very local neighborhood structure, which cannot be increased for computational complexity reasons, and by the fact that the output configuration, or a representation thereof, cannot be transformed further. |
Colin Graber, Ofer Meshi, Alex Schwing |
Mapping Images To Scene Graphs With Permutation-Invariant Structured Prediction
Machine understanding of complex images is a key goal of artificial intelligence. However, it is unclear what principles should guide the design of a structured prediction model that utilizes the power of deep learning components. |
Roei Herzig, Moshiko Raboh, Gal Chechik, Jonathan Berant, Amir Globerson |
Large Margin Deep Networks For Classification
The authors present a formulation of deep learning that aims at producing a large margin classifier. |
Gamaleldin Elsayed, Dilip Krishnan, Hossein Mobahi, Kevin Regan, Samy Bengio |
Semi-supervised Deep Kernel Learning: Regression With Unlabeled Data By Minimizing Predictive Variance
Large amounts of labeled data are typically required to train deep learning models. The authors present semi-supervised deep kernel learning (SSDKL), a semi-supervised regression model based on minimizing predictive variance in the posterior regularization framework. |
Neal Jean, Sang Michael Xie, Stefano Ermon |
Multitask Boosting For Survival Analysis With Competing Risks
The co-occurrence of multiple diseases among the general population is an important problem as those patients have more risk of complications and represent a large share of health care expenditure. |
Alexis Bellot, Mihaela Van Der Schaar |
Multi-Layered Gradient Boosting Decision Trees
Multi-layered distributed representation is believed to be the key ingredient of deep neural networks especially in cognitive tasks like computer vision. |
Ji Feng, Yang Yu, Zhi-Hua Zhou |
Unsupervised Adversarial Invariance
Data representations that contain all the information about target variables but are invariant to nuisance factors benefit supervised learning algorithms by preventing them from learning associations between these factors and the targets, thus reducing overfitting. |
Ayush Jaiswal, Rex Yue Wu, Wael Abd-Almageed, Prem Natarajan |
Learning Deep Disentangled Embeddings With The F-Statistic Loss
Deep-embedding methods aim to discover representations of a domain. Disentangling methods aim to make explicit compositional or factorial structure. The authors combine these two active but independent lines of research and propose a new paradigm suitable for both goals. |
Karl Ridgeway, Mike Mozer |
Learning Latent Subspaces In Variational Autoencoders
Variational autoencoders (VAEs) are widely used deep generative models capable of learning unsupervised latent representations of data. |
Jack Klys, Jake Snell, Richard Zemel |
Dual Swap Disentangling
Learning interpretable disentangled representations is a crucial yet challenging task. |
Zunlei Feng, Xinchao Wang, Chenglong Ke, An-Xiang Zeng, Dacheng Tao, Mingli Song |
Joint Autoregressive And Hierarchical Priors For Learned Image Compression
Recent models for learned image compression are based on autoencoders that learn approximately invertible mappings from pixels to a quantized latent representation. |
David Minnen, Johannes Ballé, George D Toderici |
Group Equivariant Capsule Networks
The authors present group equivariant capsule networks, a framework to introduce guaranteed equivariance and invariance properties to the capsule network idea. |
Jan Lenssen, Matthias Fey, Pascal Libuschewski |
Learning Disentangled Joint Continuous And Discrete Representations
The authors present a framework for learning disentangled and interpretable jointly continuous and discrete representations in an unsupervised manner. |
Emilien Dupont |
Image-to-image Translation For Cross-domain Disentanglement
Deep image translation methods have recently shown excellent results, outputting high-quality images covering multiple modes of the data distribution. There has also been increased interest in disentangling the internal representations learned by deep methods to further improve their performance and achieve a finer control. |
Abel Gonzalez-Garcia, Joost Van De Weijer, Yoshua Bengio |
Cooperative Learning Of Audio And Video Models From Self-Supervised Synchronization
There is a natural correlation between the visual and auditory elements of a video. The authors demonstrate that a calibrated curriculum learning scheme, a careful choice of negative examples, and the use of a contrastive loss are critical ingredients to obtain powerful multi-sensory representations from models optimized to discern temporal synchronization of audio-video pairs. |
Bruno Korbar, Du Tran, Lorenzo Torresani |
Non-Adversarial Mapping With VAEs
The study of cross-domain mapping without supervision has recently attracted much attention. |
Yedid Hoshen |
Learning To Teach With Dynamic Loss Functions
Teaching is critical to human society: it is with teaching that prospective students are educated and human civilization can be inherited and advanced. |
Lijun Wu, Fei Tian, Yingce Xia, Yang Fan, Tao Qin, Lai Jian-Huang, Tie-Yan Liu |
Maximizing Acquisition Functions For Bayesian Optimization
Bayesian optimization is a sample-efficient approach to global optimization that relies on theoretically motivated value heuristics (acquisition functions) to guide its search process. |
James Wilson, Frank Hutter, Marc Deisenroth |
MetaReg: Towards Domain Generalization Using Meta-Regularization
Training models that generalize to new domains at test time is a problem of fundamental importance in machine learning. |
Yogesh Balaji, Swami Sankaranarayanan, Rama Chellappa |
Transfer Learning With Neural AutoML
The authors reduce the computational cost of Neural AutoML with transfer learning. AutoML relieves human effort by automating the design of ML algorithms. |
Catherine Wong, Neil Houlsby, Yifeng Lu, Andrea Gesmundo |
Hierarchical Reinforcement Learning For Zero-shot Generalization With Subtask Dependencies
The authors introduce a new RL problem where the agent is required to generalize to a previously-unseen environment characterized by a subtask graph which describes a set of subtasks and their dependencies. |
Sungryull Sohn, Junhyuk Oh, Honglak Lee |
Lifelong Inverse Reinforcement Learning
Methods for learning from demonstration (LfD) have shown success in acquiring behavior policies by imitating a user. To address this challenge, the authors introduce the novel problem of lifelong learning from demonstration, which allows the agent to continually build upon knowledge learned from previously demonstrated tasks to accelerate the learning of new tasks, reducing the amount of demonstrations required. |
Jorge A Mendez, Shashank Shivkumar, Eric Eaton |
Safe Active Learning For Time-Series Modeling With Gaussian Processes
Learning time-series models is useful for many applications, such as simulation and forecasting. In this study, the authors consider the problem of actively learning time-series models while taking given safety constraints into account. |
Christoph Zimmer, Mona Meister, Duy Nguyen-Tuong |
Online Structure Learning For Feed-Forward And Recurrent Sum-Product Networks
Sum-product networks have recently emerged as an attractive representation due to their dual view as a special type of deep neural network with clear semantics and a special type of probabilistic graphical model for which inference is always tractable. |
Agastya Kalra, Abdullah Rashwan, Wei-Shou Hsu, Pascal Poupart, Prashant Doshi, George Trimponias |
Preference Based Adaptation For Learning Objectives
In many real-world learning tasks, it is hard to directly optimize the true performance measures, and choosing the right surrogate objectives is also difficult. A novel sampling-based algorithm, DL^2M, is proposed to learn the optimal parameter, which enjoys strong theoretical guarantees and efficient empirical performance. |
Yao-Xiang Ding, Zhi-Hua Zhou |
Byzantine Stochastic Gradient Descent
This paper studies the problem of distributed stochastic optimization in an adversarial setting where, out of $m$ machines which allegedly compute stochastic gradients every iteration, an $\alpha$-fraction are Byzantine, and may behave adversarially. |
Dan Alistarh, Zeyuan Allen-Zhu, Jerry Li |
Contextual Bandits With Surrogate Losses: Margin Bounds And Efficient Algorithms
The authors use surrogate losses to obtain several new regret bounds and new algorithms for contextual bandit learning. |
Dylan Foster, Akshay Krishnamurthy |
Online Learning Of Quantum States
Suppose the authors have many copies of an unknown $n$-qubit state $\rho$. |
Scott Aaronson, Xinyi Chen, Elad Hazan, Satyen Kale, Ashwin Nayak |
Horizon-Independent Minimax Linear Regression
The authors consider online linear regression: at each round, an adversary reveals a covariate vector, the learner predicts a real value, the adversary reveals a label, and the learner suffers the squared prediction error. Further, the authors show that this forward recursion remains optimal even against adaptively chosen labels and covariates, provided that the adversary adheres to a set of constraints that prevent misrepresentation of the scale of the problem. |
Alan Malek, Peter Bartlett |
Factored Bandits
The authors introduce the factored bandits model, which is a framework for learning with limited (bandit) feedback, where actions can be decomposed into a Cartesian product of atomic actions. |
Julian Zimmert, Yevgeny Seldin |
A Model For Learned Bloom Filters And Optimizing By Sandwiching
Recent work has suggested enhancing Bloom filters by using a pre-filter, based on applying machine learning to determine a function that models the data set the Bloom filter is meant to represent. |
Michael Mitzenmacher |
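The sandwiching idea — placing a learned membership predictor between two conventional Bloom filters so that false negatives remain impossible — can be sketched as follows (the `model` predicate is a hypothetical stand-in for a trained classifier, not the paper's construction; filter sizes are illustrative):

```python
import hashlib

class BloomFilter:
    """A plain Bloom filter with k SHA-256-derived hash functions."""
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray(m)

    def _hashes(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for h in self._hashes(item):
            self.bits[h] = 1

    def __contains__(self, item):
        return all(self.bits[h] for h in self._hashes(item))

def sandwiched_query(item, pre_filter, model, backup_filter):
    """Sandwich: initial Bloom filter -> learned model -> backup Bloom filter.
    The backup filter stores the keys the model fails to recognize, so the
    overall structure produces no false negatives."""
    if item not in pre_filter:     # cheap first stage rejects most non-keys
        return False
    if model(item):                # learned model predicts membership
        return True
    return item in backup_filter   # backup catches the model's false negatives

# Usage with a hypothetical hand-written stand-in for a learned model
keys = [f"key{i}" for i in range(10)]
pre, backup = BloomFilter(4096, 3), BloomFilter(4096, 3)
for key in keys:
    pre.add(key)
    backup.add(key)   # in practice: only the keys the model misclassifies
model = lambda item: item.startswith("key")
```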
Streamlining Variational Inference For Constraint Satisfaction Problems
Several algorithms for solving constraint satisfaction problems are based on survey propagation, a variational inference scheme used to obtain approximate marginal probability estimates for variable assignments. The authors introduce a more general branching strategy based on streamlining constraints, which sidestep hard assignments to variables. |
Aditya Grover, Tudor Achim, Stefano Ermon |
Robust Hypothesis Testing Using Wasserstein Uncertainty Sets
The authors develop a novel computationally efficient and general framework for robust hypothesis testing. |
RUI GAO, Liyan Xie, Yao Xie, Huan Xu |
Smoothed Analysis Of The Low-rank Approach For Smooth Semidefinite Programs
The authors consider semidefinite programs (SDPs) of size $n$ with equality constraints. In order to overcome scalability issues, Burer and Monteiro proposed a factorized approach based on optimizing over a matrix $Y$ of size $n \times k$ such that $X = YY^*$ is the SDP variable. |
Thomas Pumir, Samy Jelassi, Nicolas Boumal |
Convergence Of Cubic Regularization For Nonconvex Optimization Under KL Property
Cubic-regularized Newton's method (CR) is a popular algorithm that guarantees to produce a second-order stationary solution for solving nonconvex optimization problems. The authors explore the asymptotic convergence rate of CR by exploiting the ubiquitous Kurdyka-Lojasiewicz (KL) property of the nonconvex objective functions. |
Yi Zhou, Zhe Wang, Yingbin Liang |
A Simple Proximal Stochastic Gradient Method For Nonsmooth Nonconvex Optimization
The authors analyze stochastic gradient algorithms for optimizing nonconvex, nonsmooth finite-sum problems. |
Zhize Li, Jian Li |
Stochastic Chebyshev Gradient Descent For Spectral Optimization
A large class of machine learning techniques requires the solution of optimization problems involving spectral functions of parametric matrices. A careful design of the truncation distribution allows the authors to offer distributions that are variance-optimal, which is crucial for fast and stable convergence of stochastic gradient methods. |
Insu Han, Haim Avron, Jinwoo Shin |
Proximal SCOPE For Distributed Sparse Learning
Distributed sparse learning with a cluster of multiple machines has attracted much attention in machine learning, especially for large-scale applications with high-dimensional data. One popular way to implement sparse learning is to use L1 regularization. |
Shenyi Zhao, Gong-Duo Zhang, Ming-Wei Li, Wu-Jun Li |
LAG: Lazily Aggregated Gradient For Communication-Efficient Distributed Learning
This paper presents a new class of gradient methods for distributed machine learning that adaptively skip the gradient calculations to learn with reduced communication and computation. |
Tianyi Chen, Georgios Giannakis, Tao Sun, Wotao Yin |
Direct Runge-Kutta Discretization Achieves Acceleration
The authors study gradient-based optimization methods obtained by directly discretizing a second-order ordinary differential equation (ODE) related to the continuous limit of Nesterov's accelerated gradient method. |
Jingzhao Zhang, Aryan Mokhtari, Suvrit Sra, Ali Jadbabaie |
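The general recipe of discretizing an accelerated-dynamics ODE with a Runge-Kutta integrator can be illustrated on the Nesterov ODE $\ddot{x} + (3/t)\dot{x} + \nabla f(x) = 0$ (a minimal RK4 sketch under illustrative step-size choices, not the authors' analyzed scheme or rates):

```python
import numpy as np

def nesterov_ode_rk4(grad, x0, t0=1.0, h=0.1, steps=200):
    """Classical Runge-Kutta (RK4) discretization of the Nesterov ODE
        x'' + (3/t) x' + grad_f(x) = 0,
    rewritten as a first-order system in z = (x, v) with v = x'."""
    def F(t, z):
        x, v = z
        return np.array([v, -(3.0 / t) * v - grad(x)])

    z = np.array([np.asarray(x0, dtype=float), np.zeros_like(x0, dtype=float)])
    t = t0
    for _ in range(steps):
        # Standard RK4 stage evaluations
        k1 = F(t, z)
        k2 = F(t + h / 2, z + h / 2 * k1)
        k3 = F(t + h / 2, z + h / 2 * k2)
        k4 = F(t + h, z + h * k3)
        z = z + (h / 6) * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return z[0]

# Usage: minimize f(x) = ||x||^2 / 2, whose gradient is x
x_final = nesterov_ode_rk4(lambda x: x, np.array([1.0, -2.0]))
```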
Optimal Algorithms For Non-Smooth Distributed Optimization In Networks
In this work, the authors consider the distributed optimization of non-smooth convex functions using a network of computing units. |
Kevin Scaman, Francis Bach, Sebastien Bubeck, Laurent Massoulié, Yin Tat Lee |
Graph Oracle Models, Lower Bounds, And Gaps For Parallel Stochastic Optimization
The authors suggest a general oracle-based framework that captures parallel stochastic optimization in different parallelization settings described by a dependency graph, and derive generic lower bounds in terms of this graph. |
Blake Woodworth, Jialei Wang, Adam Smith, Brendan McMahan, Nati Srebro |
(Probably) Concave Graph Matching
In this paper the authors address the graph matching problem. The authors introduce the concepts of conditionally concave and probably conditionally concave energies on polytopes and show that they encapsulate many instances of the graph matching problem, including matching Euclidean graphs and graphs on surfaces. |
Haggai Maron, Yaron Lipman |
Solving Non-smooth Constrained Programs With Lower Complexity Than $\mathcal{O}(1/\varepsilon)$: A Primal-Dual Homotopy Smoothing Approach
The authors propose a new primal-dual homotopy smoothing algorithm for a linearly constrained convex program, where neither the primal nor the dual function has to be smooth or strongly convex. |
Xiaohan Wei, Hao Yu, Qing Ling, Michael Neely |
Wasserstein Distributionally Robust Kalman Filtering
The authors study a distributionally robust mean square error estimation problem over a nonconvex Wasserstein ambiguity set containing only normal distributions. |
Soroosh Shafieezadeh Abadeh, Viet Anh Nguyen, Daniel Kuhn, Peyman Mohajerin Esfahani |
Decentralize And Randomize: Faster Algorithm For Wasserstein Barycenters
The authors study the decentralized distributed computation of discrete approximations for the regularized Wasserstein barycenter of a finite set of continuous probability measures distributedly stored over a network. |
Pavel Dvurechenskii, Darina Dvinskikh, Alexander Gasnikov, Cesar Uribe, Angelia Nedich |
Limited Memory Kelley's Method Converges For Composite Convex And Submodular Objectives
The original simplicial method (OSM), a variant of the classic Kelley’s cutting plane method, has been shown to converge to the minimizer of a composite convex and submodular objective, though no rate of convergence for this method was known. The authors propose a limited memory version of Kelley’s method (L-KM) and of OSM that requires limited memory (at most $n + 1$ constraints for an $n$-dimensional problem) independent of the iteration. |
Song Zhou, Swati Gupta, Madeleine Udell |
Stochastic Spectral And Conjugate Descent Methods
The state-of-the-art methods for solving optimization problems in big dimensions are variants of randomized coordinate descent (RCD). In this paper the authors introduce a fundamentally new type of acceleration strategy for RCD based on the augmentation of the set of coordinate directions by a few spectral or conjugate directions. |
Dmitry Kovalev, Peter Richtarik, Eduard Gorbunov, Elnur Gasanov |
Third-order Smoothness Helps: Faster Stochastic Optimization Algorithms For Finding Local Minima
The authors propose stochastic optimization algorithms that can find local minima faster than existing algorithms for nonconvex optimization problems, by exploiting the third-order smoothness to escape non-degenerate saddle points more efficiently. |
Yaodong Yu, Pan Xu, Quanquan Gu |
First-order Stochastic Algorithms For Escaping From Saddle Points In Almost Linear Time
(This is a theory paper) In this paper, the authors consider first-order methods for solving stochastic non-convex optimization problems. The key building block of the proposed algorithms is first-order procedures to extract negative curvature from the Hessian matrix through a principled sequence starting from noise, which are referred to as NEgative-curvature-Originated-from-Noise (NEON) and are of independent interest. |
Yi Xu, Jing Rong, Tianbao Yang |
Gen-Oja: Simple & Efficient Algorithm For Streaming Generalized Eigenvector Computation
In this paper, the authors study the problems of principle Generalized Eigenvector computation and Canonical Correlation Analysis in the stochastic setting. |
Kush Bhatia, Aldo Pacchiano, Nicolas Flammarion, Peter Bartlett, Michael I. Jordan |
Sparse DNNs With Improved Adversarial Robustness
Deep neural networks (DNNs) are computationally/memory-intensive and vulnerable to adversarial attacks, making them prohibitive in some real-world applications. |
Yiwen Guo, Chao Zhang, Changshui Zhang, Yurong Chen |
Constructing Fast Network Through Deconstruction Of Convolution
This study shows that naive convolution can be deconstructed into a shift operation and pointwise convolution. |
Yunho Jeon, Junmo Kim |
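The deconstruction of a spatial convolution into a per-channel shift followed by a pointwise (1x1) convolution can be sketched in a few lines (a generic NumPy illustration of the shift + pointwise idea, not the paper's network; the per-channel shift assignment is illustrative):

```python
import numpy as np

def shift_op(x, shifts):
    """Spatially shift each channel of x (C, H, W) by (dy, dx), zero-filling
    the vacated border instead of wrapping around."""
    out = np.empty_like(x)
    for c, (dy, dx) in enumerate(shifts):
        shifted = np.roll(np.roll(x[c], dy, axis=0), dx, axis=1)
        if dy > 0:
            shifted[:dy] = 0
        elif dy < 0:
            shifted[dy:] = 0
        if dx > 0:
            shifted[:, :dx] = 0
        elif dx < 0:
            shifted[:, dx:] = 0
        out[c] = shifted
    return out

def shift_pointwise(x, shifts, W):
    """Shift operation followed by a 1x1 (pointwise) convolution W (C_out, C_in):
    a parameter-free spatial mix plus a cheap channel mix."""
    shifted = shift_op(x, shifts)                 # (C_in, H, W)
    return np.tensordot(W, shifted, axes=(1, 0))  # (C_out, H, W)

# Usage: with zero shifts and an identity pointwise kernel, the input is unchanged
x = np.arange(12.0).reshape(1, 3, 4)
y = shift_pointwise(x, [(0, 0)], np.eye(1))
```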
Learning Loop Invariants For Program Verification
A fundamental problem in program verification concerns inferring loop invariants. Inspired by how human experts construct loop invariants, the authors propose a reasoning framework, Code2Inv, that constructs the solution by multi-step decision making and querying an external program graph memory block. |
Xujie Si, Hanjun Dai, Mukund Raghothaman, Mayur Naik, Le Song |
Learning Libraries Of Subroutines For Neurally–Guided Bayesian Program Induction
Successful approaches to program induction require a hand-engineered domain-specific language (DSL), constraining the space of allowed programs and imparting prior knowledge of the domain. |
Kevin Ellis, Lucas Morales, Mathias Sablé-Meyer, Armando Solar-Lezama, Josh Tenenbaum |
Learning To Infer Graphics Programs From Hand-Drawn Images
The authors introduce a model that learns to convert simple hand drawings into graphics programs written in a subset of LaTeX. The model combines techniques from deep learning and program synthesis. |
Kevin Ellis, Daniel Ritchie, Armando Solar-Lezama, Josh Tenenbaum |
Towards Understanding Learning Representations: To What Extent Do Different Neural Networks Learn The Same Representation
It is widely believed that learning good representations is one of the main reasons for the success of deep neural networks. |
Liwei Wang, Lunjia Hu, Jiayuan Gu, Zhiqiang Hu, Yue Wu, Kun He, John Hopcroft |
Norm Matters: Efficient And Accurate Normalization Schemes In Deep Networks
The authors present a novel view on the purpose and function of normalization methods and weight-decay, as tools to decouple weights' norm from the underlying optimized objective. |
Elad Hoffer, Ron Banner, Itay Golan, Daniel Soudry |
ResNet With One-neuron Hidden Layers Is A Universal Approximator
The authors demonstrate that a very deep ResNet with stacked modules that have one neuron per hidden layer and ReLU activation functions can uniformly approximate any Lebesgue integrable function in $d$ dimensions, i.e., $\ell_1(\mathbb{R}^d)$. |
Hongzhou Lin, Stefanie Jegelka |
Hyperbolic Neural Networks
Hyperbolic spaces have recently gained momentum in the context of machine learning due to their high capacity and tree-likeness properties. However, the representational power of hyperbolic geometry is not yet on par with Euclidean geometry, firstly because of the absence of corresponding hyperbolic neural network layers. |
Octavian Ganea, Gary Becigneul, Thomas Hofmann |
A Simple Unified Framework For Detecting Out-of-Distribution Samples And Adversarial Attacks
Detecting test samples drawn sufficiently far away from the training distribution statistically or adversarially is a fundamental requirement for deploying a good classifier in many real-world machine learning applications. In this paper, the authors propose a simple yet effective method for detecting any abnormal samples, which is applicable to any pre-trained softmax neural classifier. |
Kimin Lee, Kibok Lee, Honglak Lee, Jinwoo Shin |
Improving Neural Program Synthesis With Inferred Execution Traces
The task of program synthesis, or automatically generating programs that are consistent with a provided specification, remains a challenging task in artificial intelligence. |
Richard Shin, Illia Polosukhin, Dawn Song |
Scaling The Poisson GLM To Massive Neural Datasets Through Polynomial Approximations
The authors develop highly scalable approximate inference methods for Poisson generalized linear models (GLMs) that require only a single pass over the data. |
David Zoltowski, Jonathan W Pillow |
Discovery Of Latent 3D Keypoints Via End-to-end Geometric Reasoning
This paper presents KeypointNet, an end-to-end geometric reasoning framework to learn an optimal set of category-specific keypoints, along with their detectors to predict 3D keypoints in a single 2D input image. |
Supasorn Suwajanakorn, Noah Snavely, Jonathan Tompson, Mohammad Norouzi |
Learning To Reconstruct Shapes From Unseen Classes
From a single image, humans are able to perceive the full 3D shape of an object by exploiting learned shape priors from everyday life. Here the authors present an algorithm, Generalizable Reconstruction (GenRe), designed to capture more generic, class-agnostic shape priors. |
Xiuming Zhang, Zhoutong Zhang, Chengkai Zhang, Josh Tenenbaum, Bill Freeman, Jiajun Wu |
Using Trusted Data To Train Deep Networks On Labels Corrupted By Severe Noise
The growing importance of massive datasets with the advent of deep learning makes robustness to label noise a critical property for classifiers to have. The authors demonstrate that robustness to label noise up to severe strengths can be achieved by using a set of trusted data with clean labels, and propose a loss correction that utilizes trusted examples in a data-efficient manner to mitigate the effects of label noise on deep neural network classifiers. |
Dan Hendrycks, Mantas Mazeika, Duncan Wilson, Kevin Gimpel |
A Reduction For Efficient LDA Topic Reconstruction
The authors present a novel approach for LDA (Latent Dirichlet Allocation) topic reconstruction. |
Matteo Almanza, Flavio Chierichetti, Alessandro Panconesi, Andrea Vattani |
A Unified View Of Piecewise Linear Neural Network Verification
The success of Deep Learning and its potential use in many safety-critical applications has motivated research on formal verification of Neural Network (NN) models. First, the authors present a unified framework that encompasses previous methods. |
Rudy Bunel, Ilker Turkaslan, Philip Torr, Pushmeet Kohli, Pawan K Mudigonda |
Optimization Over Continuous And Multi-dimensional Decisions With Observational Data
The authors consider the optimization of an uncertain objective over continuous and multi-dimensional decision spaces in problems in which the authors are only provided with observational data. The authors propose a novel algorithmic framework that is tractable, asymptotically consistent, and superior to comparable methods on example problems. |
Dimitris Bertsimas, Christopher McCord |
The Convergence Of Sparsified Gradient Methods
Distributed training of massive machine learning models, in particular deep neural networks, via Stochastic Gradient Descent (SGD) is becoming commonplace. Several families of communication-reduction methods, such as quantization, large-batch methods, and gradient sparsification, have been proposed. |
Dan Alistarh, Torsten Hoefler, Mikael Johansson, Nikola Konstantinov, Sarit Khirirat, Cedric Renggli |
Estimating Learnability In The Sublinear Data Regime
The authors consider the problem of estimating how well a model class is capable of fitting a distribution of labeled data. This ability to estimate the explanatory value of a set of features (or dataset), even in the regime in which there is too little data to realize that explanatory value, may be relevant to the scientific and industrial settings for which data collection is expensive and there are many potentially relevant feature sets that could be collected. |
Weihao Kong, Gregory Valiant |
Learning Convex Polytopes With Margin
The authors present an improved algorithm for properly learning convex polytopes in the realizable PAC setting from data with a margin. |
Lee-Ad Gottlieb, Eran Kaufman, Aryeh Kontorovich, Gabriel Nivasch |
Ridge Regression And Provable Deterministic Ridge Leverage Score Sampling
Ridge leverage scores provide a balance between low-rank approximation and regularization, and are ubiquitous in randomized linear algebra and machine learning. |
Shannon McCurdy |
GIANT: Globally Improved Approximate Newton Method For Distributed Optimization
For distributed computing environment, the authors consider the empirical risk minimization problem and propose a distributed and communication-efficient Newton-type optimization method. |
Shusen Wang, Farbod Roosta-Khorasani, Peng Xu, Michael W Mahoney |
Wavelet Regression And Additive Models For Irregularly Spaced Data
The authors present a novel approach for nonparametric regression using wavelet basis functions. |
Asad Haris, Ali Shojaie, Noah Simon |
New Insight Into Hybrid Stochastic Gradient Descent: Beyond With-Replacement Sampling And Convexity
As an incremental-gradient algorithm, hybrid stochastic gradient descent (HSGD) enjoys the merits of both stochastic and full gradient methods for finite-sum minimization problems. |
Pan Zhou, Xiaotong Yuan, Jiashi Feng |
Sample Efficient Stochastic Gradient Iterative Hard Thresholding Method For Stochastic Sparse Linear Regression With Limited Attribute Observation
The authors develop new stochastic gradient methods for efficiently solving sparse linear regression in a partial attribute observation setting. |
Tomoya Murata, Taiji Suzuki |
Robust Subspace Approximation In A Stream
The authors study robust subspace estimation in the streaming and distributed settings. |
Roie Levin, Anish Sevekari, David Woodruff |
A Practical Algorithm For Distributed Clustering And Outlier Detection
The authors study the classic k-means/median clustering, which are fundamental problems in unsupervised learning, in the setting where data are partitioned across multiple sites, and where the authors are allowed to discard a small portion of the data by labeling them as outliers. |
Jiecao Chen, Erfan Sadeqi Azer, Qin Zhang |
Compact Representation Of Uncertainty In Clustering
For many classic structured prediction problems, probability distributions over the dependent variables can be efficiently computed using widely-known algorithms and data structures (such as forward-backward, and its corresponding trellis for exact probability distributions in Markov models). However, the authors know of no previous work studying efficient representations of exact distributions over clusterings. |
Craig Greenberg, Nicholas Monath, Ari Kobren, Patrick Flaherty, Andrew McGregor, Andrew McCallum |
Bipartite Stochastic Block Models With Tiny Clusters
The authors study the problem of finding clusters in random bipartite graphs. |
Stefan Neumann |
Clustering Redemption–Beyond The Impossibility Of Kleinberg’s Axioms
Kleinberg (2002) stated three axioms that any clustering procedure should satisfy and showed there is no clustering procedure that simultaneously satisfies all three. The authors show that single linkage satisfies the modified axioms, and if the input is well-clusterable, some popular procedures such as k-means also satisfy the axioms, taking a step towards explaining the success of these objective functions for guiding the design of algorithms. |
Vincent Cohen-Addad, Varun Kanade, Frederik Mallmann-Trenn |
Understanding Regularized Spectral Clustering Via Graph Conductance
This paper uses the relationship between graph conductance and spectral clustering to study (i) the failures of spectral clustering and (ii) the benefits of regularization. |
Yilin Zhang, Karl Rohe |
Query K-means Clustering And The Double Dixie Cup Problem
The authors consider the problem of approximate $K$-means clustering with outliers and side information provided by same-cluster queries and possibly noisy answers. Compared to a handful of other known approaches that perform importance sampling to account for small cluster sizes, the proposed query technique reduces the number of queries by a factor of roughly $O(\frac{K^6}{\epsilon^3})$, at the cost of possibly missing very small clusters. |
Eli Chien, Chao Pan, Olgica Milenkovic |
How To Tell When A Clustering Is (approximately) Correct Using Convex Relaxations
The authors introduce the Sublevel Set (SS) method, a generic method to obtain sufficient guarantees of near-optimality and uniqueness (up to small perturbations) for a clustering. |
Marina Meila |
Derivative Estimation In Random Design
The authors propose a nonparametric derivative estimation method for random design without having to estimate the regression function. |
Yu Liu, Kris De Brabanter |
Exploiting Numerical Sparsity For Efficient Learning : Faster Eigenvector Computation And Regression
In this paper, the authors obtain improved running times for regression and top eigenvector computation for numerically sparse matrices. |
Neha Gupta, Aaron Sidford |
Boosted Sparse And Low-Rank Tensor Regression
The authors propose a sparse and low-rank tensor regression model to relate a univariate outcome to a feature tensor, in which each unit-rank tensor from the CP decomposition of the coefficient tensor is assumed to be sparse. |
Lifang He, Kun Chen, Wanwan Xu, Jiayu Zhou, Fei Wang |
An Efficient Pruning Algorithm For Robust Isotonic Regression
The authors study a generalization of the classic isotonic regression problem where the authors allow separable nonconvex objective functions, focusing on the case of estimators used in robust regression. |
Cong Han Lim |
A Convex Program For Bilinear Inversion Of Sparse Vectors
The authors consider the bilinear inverse problem of recovering two vectors, $x \in \mathbb{R}^L$ and $w \in \mathbb{R}^L$, from their entrywise product. Here, K and N may be larger than, smaller than, or equal to L. The authors introduce L1-BranchHull, which is a convex program posed in the natural parameter space and does not require an approximate solution or initialization in order to be stated or solved. |
Alireza Aghasi, Ali Ahmed, Paul Hand, Babhru Joshi |
Efficient Convex Completion Of Coupled Tensors Using Coupled Nuclear Norms
Coupled norms have emerged as a convex method to solve coupled tensor completion. In this paper, the authors introduce a new set of coupled norms known as coupled nuclear norms by constraining the CP rank of coupled tensors. |
Kishan Wimalawarne, Hiroshi Mamitsuka |
Support Recovery For Orthogonal Matching Pursuit: Upper And Lower Bounds
This paper studies the problem of sparse regression where the goal is to learn a sparse vector that best optimizes a given objective function. |
Raghav Somani, Chirag Gupta, Prateek Jain, Praneeth Netrapalli |
Learning Without The Phase: Regularized PhaseMax Achieves Optimal Sample Complexity | Fariborz Salehi, Ehsan Abbasi, Babak Hassibi |
On Controllable Sparse Alternatives To Softmax
Converting an n-dimensional vector to a probability distribution over n objects is a commonly used component in many machine learning tasks like multiclass classification, multilabel classification, attention mechanisms etc. For this, several probability mapping functions have been proposed and employed in literature such as softmax, sum-normalization, spherical softmax, and sparsemax, but there is very little understanding of how they relate to each other. |
Anirban Laha, Saneem Ahmed Chemmengath, Priyanka Agrawal, Mitesh Khapra, Karthik Sankaranarayanan, Harish Ramaswamy |
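Two of the probability mappings named in the entry above can be sketched in a few lines. This is an illustrative NumPy sketch, not the authors' code; `sparsemax` here follows the Euclidean-projection-onto-the-simplex formulation of Martins and Astudillo (2016), and the function names are mine.

```python
import numpy as np

def softmax(z):
    # Standard softmax with the max-subtraction trick for stability.
    e = np.exp(z - z.max())
    return e / e.sum()

def sparsemax(z):
    # Euclidean projection of z onto the probability simplex.
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cumsum   # which sorted entries stay nonzero
    k_z = k[support][-1]
    tau = (cumsum[support][-1] - 1.0) / k_z
    return np.maximum(z - tau, 0.0)
```

Both map scores to a probability vector, but softmax assigns nonzero mass everywhere while sparsemax zeroes out low-scoring entries, which is the controllable-sparsity behavior the paper generalizes.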
Sparse PCA From Sparse Linear Regression
Sparse Principal Component Analysis (SPCA) and Sparse Linear Regression (SLR) have a wide range of applications and have attracted a tremendous amount of attention in the last two decades as canonical examples of statistical problems in high dimension. A variety of algorithms have been proposed for both SPCA and SLR, but an explicit connection between the two had not been made. |
Guy Bresler, Sung Min Park, Madalina Persu |
Efficient Anomaly Detection Via Matrix Sketching
The authors consider the problem of finding anomalies in high-dimensional data using popular PCA based anomaly scores. |
Vatsal Sharan, Parikshit Gopalan, Udi Wieder |
Dimensionality Reduction For Stationary Time Series Via Stochastic Nonconvex Optimization
Stochastic optimization naturally arises in machine learning. Efficient algorithms with provable guarantees, however, are still largely missing. This paper studies this challenge through a streaming PCA problem for stationary time series data. |
Minshuo Chen, Lin Yang, Mengdi Wang, Tuo Zhao |
Contrastive Learning From Pairwise Measurements
The authors study a semiparametric model where the pairwise measurements follow a natural exponential family distribution with an unknown base measure. |
Yi Chen, Zhuoran Yang, Yuchen Xie, Zhaoran Wang |
Deep Functional Dictionaries: Learning Consistent Semantic Structures On 3D Models From Functions
Various 3D semantic attributes such as segmentation masks, geometric features, keypoints, and materials can be encoded as per-point probe functions on 3D geometries. Given a collection of related 3D shapes, the authors consider how to jointly analyze such probe functions over different shapes, and how to discover common latent structures using a neural network — even in the absence of any correspondence information. |
Minhyuk Sung, Hao Su, Ronald Yu, Leonidas J Guibas |
Large Scale Computation Of Means And Clusters For Persistence Diagrams Using Optimal Transport
Persistence diagrams (PDs) are now routinely used to summarize the underlying topology of complex data. The authors propose in this article a tractable framework to carry out standard tasks on PDs at scale, notably evaluating distances, estimating barycenters and performing clustering. |
Theo Lacombe, Marco Cuturi, Steve Oudot |
Representation Learning Of Compositional Data
The authors consider the problem of learning a low dimensional representation for compositional data. |
Marta Avalos, Richard Nock, Cheng Soon Ong, Julien Rouar, Ke Sun |
How To Make The Gradients Small Stochastically: Even Faster Convex And Nonconvex SGD
Stochastic gradient descent (SGD) gives an optimal convergence rate when minimizing convex stochastic objectives $f(x)$. However, in terms of making the gradients small, the original SGD does not give an optimal rate, even when $f(x)$ is convex. If $f(x)$ is convex, to find a point with gradient norm $\varepsilon$, the authors design an algorithm SGD3 with a near-optimal rate $\tilde{O}(\varepsilon^{-2})$, improving the best known rate $O(\varepsilon^{-8/3})$. |
Zeyuan Allen-Zhu |
Statistical Optimality Of Stochastic Gradient Descent On Hard Learning Problems Through Multiple Passes
The authors consider stochastic gradient descent (SGD) for least-squares regression with potentially several passes over the data. |
Loucas Pillaud-Vivien, Alessandro Rudi, Francis Bach |
Optimal Subsampling With Influence Functions
Subsampling is a common and often effective method to deal with the computational challenges of large datasets. |
Daniel Ting, Eric Brochu |
Metric On Nonlinear Dynamical Systems With Perron-Frobenius Operators
The development of a metric for structural data is a long-term problem in pattern recognition and machine learning. The authors develop a general metric for comparing nonlinear dynamical systems that is defined with Perron-Frobenius operators in reproducing kernel Hilbert spaces. |
Isao Ishikawa, Keisuke Fujii, Masahiro Ikeda, Yuka Hashimoto, Yoshinobu Kawahara |
Random Feature Stein Discrepancies
Computable Stein discrepancies have been deployed for a variety of applications, ranging from sampler selection in posterior inference to approximate Bayesian inference to goodness-of-fit testing. While linear-time Stein discrepancies have been proposed for goodness-of-fit testing, they exhibit avoidable degradations in testing power—even when power is explicitly optimized. |
Jonathan Huggins, Lester Mackey |
Informative Features For Model Comparison
Given two candidate models, and a set of target observations, the authors address the problem of measuring the relative goodness of fit of the two models. The authors propose two new statistical tests which are nonparametric, computationally efficient (runtime complexity is linear in the sample size), and interpretable. |
Wittawat Jitkrittum, Heishiro Kanagawa, Patsorn Sangkloy, James Hays, Bernhard Schölkopf, Arthur Gretton |
On Fast Leverage Score Sampling And Optimal Learning
Leverage score sampling provides an appealing way to perform approximate computations for large matrices. In this paper, the authors study the problem of leverage score sampling for positive definite matrices defined by a kernel. |
Alessandro Rudi, Daniele Calandriello, Luigi Carratino, Lorenzo Rosasco |
Persistence Fisher Kernel: A Riemannian Manifold Kernel For Persistence Diagrams
Algebraic topology methods have recently played an important role for statistical analysis with complicated geometric structured data such as shapes, linked twist maps, and material data. In this work, the authors rely upon the alternative Fisher information geometry to propose a positive definite kernel for PDs without approximation, namely the Persistence Fisher (PF) kernel. |
Tam Le, Makoto Yamada |
Learning Bounds For Greedy Approximation With Explicit Feature Maps From Multiple Kernels
Nonlinear kernels can be approximated using finite-dimensional feature maps for efficient risk minimization. |
Shahin Shahrampour, Vahid Tarokh |
RetGK: Graph Kernels Based On Return Probabilities Of Random Walks
The authors develop a framework for computing graph kernels, based on return probabilities of random walks. |
Zhen Zhang, Mianzhi Wang, Yijian Xiang, Yan Huang, Arye Nehorai |
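The entry above builds graph kernels from return probabilities of random walks. As a rough illustration of that featurization only (my sketch, not the RetGK implementation, which then compares such features via kernels), each node can be described by the probability that a simple random walk started there is back at its start after s steps:

```python
import numpy as np

def return_probability_features(adj, steps):
    """For each node, the probability that a simple random walk
    started there returns to it at step s, for s = 1..steps."""
    deg = adj.sum(axis=1)
    P = adj / deg[:, None]              # row-stochastic transition matrix
    n = len(adj)
    feats = np.empty((n, steps))
    Pk = np.eye(n)
    for s in range(steps):
        Pk = Pk @ P                     # s+1 step transition probabilities
        feats[:, s] = np.diag(Pk)       # diagonal = return probabilities
    return feats
```

On a triangle graph, for instance, no walk can return in one step and every walk returns with probability 1/2 in two steps, so the features capture local cycle structure.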
Nonparametric Density Estimation Under Adversarial Losses
The authors study minimax convergence rates of nonparametric density estimation under a large class of loss functions called "adversarial losses", which, besides classical $L^p$ losses, includes maximum mean discrepancy (MMD), Wasserstein distance, and total variation distance. |
Shashank Singh, Ananya Uppal, Boyue Li, Chun-Liang Li, Manzil Zaheer, Barnabas Poczos |
Deep Homogeneous Mixture Models: Representation, Separation, And Approximation
At their core, many unsupervised learning models provide a compact representation of homogeneous density mixtures, but their similarities and differences are not always clearly understood. |
Priyank Jaini, Pascal Poupart, Yaoliang Yu |
Gaussian Process Conditional Density Estimation
Conditional Density Estimation (CDE) models deal with estimating conditional distributions. CDE is a challenging task as there is a fundamental trade-off between model complexity, representational capacity and overfitting. |
Vincent Dutordoir, Hugh Salimbeni, James Hensman, Marc Deisenroth |
Low-rank Interaction With Sparse Additive Effects Model For Large Data Frames
Many applications of machine learning involve the analysis of large data frames -- matrices collecting heterogeneous measurements (binary, numerical, counts, etc.) |
Geneviève Robin, Hoi-To Wai, Julie Josse, Olga Klopp, Eric Moulines |
Mixture Matrix Completion
Completing a data matrix X has become a ubiquitous problem in modern data science, with motivations in recommender systems, computer vision, and network inference, to name a few. |
Daniel Pimentel-Alarcon |
Multivariate Time Series Imputation With Generative Adversarial Networks
Multivariate time series usually contain a large number of missing values, which hinders the application of advanced analysis methods on multivariate time series data. Inspired by the success of Generative Adversarial Networks (GAN) in image generation, the authors propose to learn the overall distribution of a multivariate time series dataset with GAN, which is further used to generate the missing values for each sample. |
Yonghong Luo, Xiangrui Cai, Ying Zhang, Jun Xu, Xiaojie Yuan |
Fully Understanding The Hashing Trick
Feature hashing, also known as the hashing trick, introduced by Weinberger et al. |
Lior Kamma, Casper B. Freksen, Kasper Green Larsen |
Learning Semantic Similarity In A Continuous Space
The authors address the problem of learning semantic representation of questions to measure similarity between pairs as a continuous distance metric. |
Michel Deudon |
Bilevel Learning Of The Group Lasso Structure
Regression with group-sparsity penalty plays a central role in high-dimensional prediction problems. To circumvent this issue, the authors present a method to estimate the group structure by means of a continuous bilevel optimization problem where the data is split into training and validation sets. |
Jordan Frecon, Saverio Salzo, Massimiliano Pontil |
Bayesian Structure Learning By Recursive Bootstrap
The authors address the problem of Bayesian structure learning for domains with hundreds of variables by employing non-parametric bootstrap, recursively. The authors propose a method that covers both model averaging and model selection in the same framework. |
Raanan Y. Rohekar, Yaniv Gurwicz, Shami Nisimov, Guy Koren, Gal Novik |
Learning From Group Comparisons: Exploiting Higher Order Interactions
The authors study the problem of learning from group comparisons, with applications in predicting outcomes of sports and online games. |
Yao Li, Minhao Cheng, Kevin Fujii, Fushing Hsieh, Cho-Jui Hsieh |
A Structured Prediction Approach For Label Ranking
The authors propose to solve a label ranking problem as a structured output regression task. |
Anna Korba, Alexandre Garcia, Florence D'Alché-Buc |
The Sample Complexity Of Semi-Supervised Learning With Nonparametric Mixture Models
The authors study the sample complexity of semi-supervised learning (SSL) and introduce new assumptions based on the mismatch between a mixture model learned from unlabeled data and the true mixture model induced by the (unknown) class conditional distributions. |
Chen Dan, Liu Leqi, Bryon Aragam, Pradeep Ravikumar, Eric Xing |
Binary Classification From Positive-Confidence Data
Can the authors learn a binary classifier from only positive data, without any negative data or unlabeled data? The authors theoretically establish the consistency and an estimation error bound, and demonstrate the usefulness of the proposed method for training deep neural networks through experiments. |
Takashi Ishida, Gang Niu, Masashi Sugiyama |
Learning SMaLL Predictors
The authors introduce a new framework for learning in severely resource-constrained settings. |
Vikas Garg, Ofer Dekel, Lin Xiao |
Contour Location Via Entropy Reduction Leveraging Multiple Information Sources
The authors introduce an algorithm to locate contours of functions that are expensive to evaluate. |
Alexandre Marques, Remi Lam, Karen Willcox |
Multi-Class Learning: From Theory To Algorithm
In this paper, the authors study the generalization performance of multi-class classification and obtain a sharper data-dependent generalization error bound with fast convergence rate, substantially improving the state-of-the-art bounds in the existing data-dependent generalization analysis. |
Jian Li, Yong Liu, Rong Yin, Hua Zhang, Lizhong Ding, Weiping Wang |
Generalized Cross Entropy Loss For Training Deep Neural Networks With Noisy Labels
Deep neural networks (DNNs) have achieved tremendous success in a variety of applications across many disciplines. To combat this problem, mean absolute error (MAE) has recently been proposed as a noise-robust alternative to the commonly-used categorical cross entropy (CCE) loss. |
Zhilu Zhang, Mert Sabuncu |
A Smoother Way To Train Structured Prediction Models
The authors present a framework to train a structured prediction model by performing smoothing on the inference algorithm it builds upon. |
Krishna Pillutla, Vincent Roulet, Sham Kakade, Zaid Harchaoui |
Constrained Graph Variational Autoencoders For Molecule Design
Graphs are ubiquitous data structures for representing interactions between entities. |
Qi Liu, Miltiadis Allamanis, Marc Brockschmidt, Alexander Gaunt |
Learning Beam Search Policies Via Imitation Learning
Beam search is widely used for approximate decoding in structured prediction problems. |
Renato Negrinho, Matt Gormley, Geoffrey Gordon |
Loss Functions For Multiset Prediction
The authors study the problem of multiset prediction. |
Sean Welleck, Zixin Yao, Yu Gai, Jialin Mao, Zheng Zhang, Kyunghyun Cho |
Learning Confidence Sets Using Support Vector Machines
The goal of confidence-set learning in the binary classification setting is to construct two sets, each with a specific probability guarantee to cover a class. Instead of plug-in approaches, the authors propose a support vector classifier to construct confidence sets in a flexible manner. |
Wenbo Wang, Xingye Qiao |
Fast Similarity Search Via Optimal Sparse Lifting
Similarity search is a fundamental problem in computing science with various applications and has attracted significant research attention, especially in large-scale search with high dimensions. |
Wenye Li, Jingwei Mao, Yin Zhang, Shuguang Cui |
The Sparse Manifold Transform
The authors present a signal representation framework called the sparse manifold transform that combines key ideas from sparse coding, manifold learning, and slow feature analysis. |
Yubei Chen, Dylan Paiton, Bruno Olshausen |
When Do Random Forests Fail?
Random forests are learning algorithms that build large collections of random trees and make predictions by averaging the individual tree predictions. In this paper, the authors consider various tree constructions and examine how the choice of parameters affects the generalization error of the resulting random forests as the sample size goes to infinity. |
Cheng Tang, Damien Garreau, Ulrike Von Luxburg |
Diverse Ensemble Evolution: Curriculum Data-Model Marriage
The authors study a new method ("Diverse Ensemble Evolution (DivE$^2$)") to train an ensemble of machine learning models that assigns data to models at each training epoch based on each model's current expertise and an intra- and inter-model diversity reward. |
Tianyi Zhou, Shengjie Wang, Jeff Bilmes |
The Pessimistic Limits And Possibilities Of Margin-based Losses In Semi-supervised Learning
Consider a classification problem where the authors have both labeled and unlabeled data available. |
Jesse Krijthe, Marco Loog |
Semi-Supervised Learning With Declaratively Specified Entropy Constraints
The authors propose a technique for declaratively specifying strategies for semi-supervised learning (SSL). |
Haitian Sun, William Cohen, Lidong Bing |
Learning To Multitask
Multitask learning has shown promising performance in many applications and many multitask models have been proposed. |
Yu Zhang, Ying Wei, Qiang Yang |
CatBoost: Unbiased Boosting With Categorical Features
This paper presents the key algorithmic techniques behind CatBoost, a new gradient boosting toolkit. |
Liudmila Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, Anna Veronika Dorogush, Andrey Gulin |
Supervised Autoencoders: Improving Generalization Performance With Unsupervised Regularizers
Generalization performance is a central goal in machine learning, particularly when learning representations with large neural networks. |
Lei Le, Andy Patterson, Martha White |
Representation Learning For Treatment Effect Estimation From Observational Data
Estimating individual treatment effect (ITE) is a challenging problem in causal inference, due to the missing counterfactuals and the selection bias. In this paper, the authors propose a local similarity preserved individual treatment effect (SITE) estimation method based on deep representation learning. |
Liuyi Yao, Sheng Li, Yaliang Li, Mengdi Huai, Jing Gao, Aidong Zhang |
SimplE Embedding For Link Prediction In Knowledge Graphs
Knowledge graphs contain knowledge about the world and provide a structured representation of this knowledge. |
Seyed Mehran Kazemi, David Poole |
DeepProbLog: Neural Probabilistic Logic Programming
The authors introduce DeepProbLog, a probabilistic logic programming language that incorporates deep learning by means of neural predicates. |
Robin Manhaeve, Sebastijan Dumancic, Angelika Kimmig, Thomas Demeester, Luc De Raedt |
Watch Your Step: Learning Node Embeddings Via Graph Attention
Graph embedding methods represent nodes in a continuous vector space, preserving different types of relational information from the graph. There are many hyper-parameters to these methods. |
Sami Abu-El-Haija, Bryan Perozzi, Rami Al-Rfou, Alex Alemi |
Invariant Representations Without Adversarial Training
Representations of data that are invariant to changes in specified factors are useful for a wide range of problems: removing potential biases in prediction problems, controlling the effects of covariates, and disentangling meaningful factors of variation. |
Daniel Moyer, Shuyang Gao, Rob Brekelmans, Aram Galstyan, Greg Ver Steeg |
Domain-Invariant Projection Learning For Zero-Shot Recognition
Zero-shot learning (ZSL) aims to recognize unseen object classes without any training samples, which can be regarded as a form of transfer learning from seen classes to unseen ones. In this paper, the authors propose a novel ZSL model termed domain-invariant projection learning (DIPL). |
An Zhao, Mingyu Ding, Abel Guan, Zhiwu Lu, Tao Xiang, Ji-Rong Wen |
Unsupervised Learning Of View-invariant Action Representations
Recent successes in human action recognition with deep learning methods mostly adopt the supervised learning paradigm, which requires a significant amount of manually labeled data to achieve good performance. However, label collection is an expensive and time-consuming process. |
Junnan Li, Yongkang Wong, Qi Zhao, Mohan Kankanhalli |
Neural Architecture Optimization
Automatic neural architecture design has shown its potential in discovering powerful neural network architectures. |
Renqian Luo, Fei Tian, Tao Qin, Enhong Chen, Tie-Yan Liu |
Scalable Hyperparameter Transfer Learning
Bayesian optimization (BO) is a model-based approach for gradient-free black-box function optimization, such as hyperparameter optimization. The authors propose a multi-task adaptive Bayesian linear regression model for transfer learning in BO, whose complexity is linear in the function evaluations: one Bayesian linear regression model is associated to each black-box function optimization problem (or task), while transfer learning is achieved by coupling the models through a shared deep neural net. |
Valerio Perrone, Rodolphe Jenatton, Matthias W Seeger, Cedric Archambeau |
Learning To Learn Around A Common Mean
The problem of learning-to-learn (LTL) or meta-learning is gaining increasing attention due to recent empirical evidence of its effectiveness in applications. In this work, the authors consider the family of algorithms given by a variant of Ridge Regression, in which the regularizer is the square distance to an unknown mean vector. |
Giulia Denevi, Carlo Ciliberto, Dimitris Stamos, Massimiliano Pontil |
Hybrid-MST: A Hybrid Active Sampling Strategy For Pairwise Preference Aggregation
In this paper the authors present a hybrid active sampling strategy for pairwise preference aggregation, which aims at recovering the underlying rating of the test candidates from sparse and noisy pairwise labeling. |
Jing Li, Rafal Mantiuk, Junle Wang, Suiyi Ling, Patrick Le Callet |
Algorithmic Assurance: An Active Approach To Algorithmic Testing Using Bayesian Optimisation
The authors introduce algorithmic assurance, the problem of testing whether machine learning algorithms are conforming to their intended design goal. |
Shiva Gopakumar, Sunil Gupta, Santu Rana, Vu Nguyen, Svetha Venkatesh |
Understanding The Role Of Adaptivity In Machine Teaching: The Case Of Version Space Learners
In real-world applications of education, an effective teacher adaptively chooses the next example to teach based on the learner’s current state. In this paper, the authors study the case of teaching consistent, version space learners in an interactive setting. |
Yuxin Chen, Adish Singla, Oisin Mac Aodha, Pietro Perona, Yisong Yue |
Active Learning For Non-Parametric Regression Using Purely Random Trees
Active learning is the task of using labelled data to select additional points to label, with the goal of fitting the most accurate model with a fixed budget of labelled points. In this paper the authors propose an intuitive tree based active learning algorithm for non-parametric regression with provable improvement over random sampling. |
Jack Goetz, Ambuj Tewari, Paul Zimmerman |
Interactive Structure Learning With Structural Query-by-Committee
In this work, the authors introduce interactive structure learning, a framework that unifies many different interactive learning tasks. |
Christopher Tosh, Sanjoy Dasgupta |
Efficient Nonmyopic Batch Active Search
Active search is a learning paradigm for actively identifying as many members of a given class as possible. The authors also propose novel, efficient batch policies inspired by state-of-the-art sequential policies, and develop an aggressive pruning technique that can dramatically speed up computation. |
Shali Jiang, Gustavo Malkomes, Matthew Abbott, Benjamin Moseley, Roman Garnett |
Uncertainty Sampling Is Preconditioned Stochastic Gradient Descent On Zero-One Loss
Uncertainty sampling, a popular active learning algorithm, is used to reduce the amount of data required to learn a classifier, but it has been observed in practice to converge to different parameters depending on the initialization and sometimes to even better parameters than standard training on all the data. |
Stephen Mussmann, Percy Liang |
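For a binary classifier, the uncertainty-sampling rule analyzed in the entry above is a one-liner: query the point the model is least sure about. A minimal sketch (function name and 0.5 threshold convention are mine, not from the paper):

```python
import numpy as np

def uncertainty_sample(probs):
    """Return the index of the unlabeled example whose predicted
    positive-class probability is closest to 0.5 (most uncertain)."""
    probs = np.asarray(probs, dtype=float)
    return int(np.argmin(np.abs(probs - 0.5)))
```

The paper's contribution is to show that iterating this rule behaves like preconditioned SGD on the zero-one loss, which helps explain why the selected points can train a better classifier than the full dataset.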
Online Adaptive Methods, Universality And Acceleration
The authors present a novel method for convex unconstrained optimization that, without any modifications, ensures: (1) accelerated convergence rate for smooth objectives, (2) standard convergence rate in the general (non-smooth) setting, and (3) standard convergence rate in the stochastic optimization setting. |
Yehuda Kfir Levy, Alp Yurtsever, Volkan Cevher |
Online Improper Learning With An Approximation Oracle
The authors study the following question: given an efficient approximation algorithm for an optimization problem, can the authors learn efficiently in the same setting? |
Elad Hazan, Wei Hu, Yuanzhi Li, Zhiyuan Li |
Online Structured Laplace Approximations For Overcoming Catastrophic Forgetting
The authors introduce the Kronecker factored online Laplace approximation for overcoming catastrophic forgetting in neural networks. |
Hippolyt Ritter, Alex Botev, David Barber |
Approximating Real-Time Recurrent Learning With Random Kronecker Factors
Despite all the impressive advances of recurrent neural networks, sequential data is still in need of better modelling. In this paper the authors propose the Kronecker Factored RTRL (KF-RTRL) algorithm that uses a Kronecker product decomposition to approximate the gradients for a large class of RNNs. |
Asier Mujika, Florian Meier, Angelika Steger |
Online Reciprocal Recommendation With Theoretical Performance Guarantees
The authors initiate a rigorous theoretical investigation of the reciprocal recommendation task in a specific framework of sequential learning. |
Claudio Gentile, Nikos Parotsidis, Fabio Vitale |
Generalized Inverse Optimization Through Online Learning
Inverse optimization is a powerful paradigm for learning preferences and restrictions that explain the behavior of a decision maker, based on a set of external signals and the corresponding decision pairs. However, most inverse optimization algorithms are designed specifically for the batch setting, where all the data is available in advance. |
Chaosheng Dong, Yiran Chen, Bo Zeng |
Adaptive Online Learning In Dynamic Environments
In this paper, the authors study online convex optimization in dynamic environments, and aim to bound the dynamic regret with respect to any sequence of comparators. |
Lijun Zhang, Shiyin Lu, Zhi-Hua Zhou |
Online Convex Optimization For Cumulative Constraints
The authors propose algorithms for online convex optimization which lead to cumulative squared constraint violations of the form $\sum_{t=1}^{T}\big([g(x_t)]_+\big)^2 = O(T^{1-\beta})$, where $\beta \in (0,1)$. |
Jianjun Yuan, Andrew Lamperski |
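The quantity bounded in the entry above can be tracked alongside any online learner. The sketch below is purely illustrative (it runs plain online gradient descent, not the authors' algorithm) and simply records the cumulative squared clipped violation, sum over t of max(g(x_t), 0)^2; all names are mine.

```python
import math

def ogd_track_violations(grad_f, g, x0, eta, T):
    """Plain online gradient descent on a 1-D objective f, recording
    the cumulative squared clipped constraint violation
    sum_t max(g(x_t), 0)^2 along the iterate path."""
    x = float(x0)
    total = 0.0
    for t in range(1, T + 1):
        total += max(g(x), 0.0) ** 2     # clipped violation at x_t
        x -= eta / math.sqrt(t) * grad_f(x)  # decaying-step OGD update
    return x, total

# toy run: minimize (x - 2)^2 subject to g(x) = x - 1 <= 0;
# unconstrained OGD overshoots the constraint, so violations accumulate
x_final, viol = ogd_track_violations(
    grad_f=lambda x: 2 * (x - 2), g=lambda x: x - 1,
    x0=0.0, eta=0.5, T=50)
```

An algorithm with the paper's guarantee would keep `viol` growing sublinearly as $O(T^{1-\beta})$; unconstrained OGD, as here, has no such control.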
Efficient Online Algorithms For Fast-rate Regret Bounds Under Sparsity
The authors consider the problem of online convex optimization in two different settings: arbitrary and i.i.d. |
Pierre Gaillard, Olivier Wintenberger |
Regret Bounds For Online Portfolio Selection With A Cardinality Constraint
Online portfolio selection is a sequential decision-making problem in which a learner repetitively selects a portfolio over a set of assets, aiming to maximize long-term return. In this paper, the authors study the problem with the cardinality constraint that the number of assets in a portfolio is restricted to be at most k, and consider two scenarios: (i) in the full-feedback setting, the learner can observe price relatives (rates of return to cost) for all assets, and (ii) in the bandit-feedback setting, the learner can observe price relatives only for invested assets. |
Shinji Ito, Daisuke Hatano, Sumita Hanna, Akihiro Yabe, Takuro Fukunaga, Naonori Kakimura, Ken-Ichi Kawarabayashi |
Policy Regret In Repeated Games
The notion of "policy regret" in online learning is supposed to capture the reactions of the adversary to the actions taken by the learner, which more traditional notions such as external regret do not take into account. |
Raman Arora, Michael Dinitz, Teodor Vanislavov Marinov, Mehryar Mohri |
Query Complexity Of Bayesian Private Learning
The authors study the query complexity of Bayesian Private Learning. |
Kuang Xu |
The Limits Of Post-Selection Generalization
The authors show several limitations on the power of algorithms satisfying post hoc generalization. |
Jonathan Ullman, Adam Smith, Kobbi Nissim, Uri Stemmer, Thomas Steinke |
Sequential Test For The Lowest Mean: From Thompson To Murphy Sampling
Learning the minimum/maximum mean among a finite set of distributions is a fundamental sub-problem in planning, game tree search and reinforcement learning. The authors then design advanced self-normalized deviation inequalities, fueling more aggressive stopping rules. |
Emilie Kaufmann, Wouter Koolen, Aurélien Garivier |
Contextual Combinatorial Multi-armed Bandits With Volatile Arms And Submodular Reward
In this paper, the authors study the stochastic contextual combinatorial multi-armed bandit (CC-MAB) framework that is tailored for volatile arms and submodular reward functions. |
Lixing Chen, Jie Xu, Zhuo Lu |
TopRank: A Practical Algorithm For Online Stochastic Ranking
Online learning to rank is a sequential decision-making problem where in each round the learning agent chooses a list of items and receives feedback in the form of clicks from the user. Many sample-efficient algorithms have been proposed for this problem that assume a specific click model connecting rankings and user behavior. |
Tor Lattimore, Branislav Kveton, Shuai Li, Csaba Szepesvari |
A Bandit Approach To Sequential Experimental Design With False Discovery Control
The authors propose a new adaptive sampling approach to multiple testing which aims to maximize statistical power while ensuring anytime false discovery control. |
Kevin Jamieson, Lalit Jain |
Adaptation To Easy Data In Prediction With Limited Advice
The authors derive an online learning algorithm with improved regret guarantees for ``easy'' loss sequences. While a number of algorithms have been proposed for exploiting small effective range in the full information setting, Gerchinovitz and Lattimore [2016] have shown the impossibility of regret scaling with the effective range of the losses in the bandit setting. |
Tobias Thune, Yevgeny Seldin |
Differentially Private Contextual Linear Bandits
The authors study the contextual linear bandit problem, a version of the standard stochastic multi-armed bandit (MAB) problem where a learner sequentially selects actions to maximize a reward which depends also on a user provided per-round context. |
Roshan Shariff, Or Sheffet |
Community Exploration: From Offline Optimization To Online Learning
The authors introduce the community exploration problem that has various real-world applications such as online advertising. |
Xiaowei Chen, Weiran Huang, Wei Chen, John C. S. Lui |
Adaptive Learning With Unknown Information Flows
An agent facing sequential decisions that are characterized by partial feedback needs to strike a balance between maximizing immediate payoffs based on available information, and acquiring new information that may be essential for maximizing future payoffs. This trade-off is captured by the multi-armed bandit (MAB) framework that has been studied and applied when at each time epoch payoff observations are collected on the actions that are selected at that epoch. |
Yonatan Gur, Ahmad Momeni |
Multi-armed Bandits With Compensation
The authors propose and study the known-compensation multi-armed bandit (KCMAB) problem, where a system controller offers a set of arms to many short-term players for $T$ steps. |
Siwei Wang, Longbo Huang |
Bandit Learning With Implicit Feedback
The authors perform contextual bandit learning with implicit feedback by modeling the feedback as a composition of user result examination and relevance judgment. |
Yi Qi, Qingyun Wu, Hongning Wang, Jie Tang, Maosong Sun |
Optimistic Optimization Of A Brownian
The authors address the problem of optimizing a Brownian motion. |
Jean-Bastien Grill, Michal Valko, Remi Munos |
Bandit Learning With Positive Externalities
In many platforms, user arrivals exhibit a self-reinforcing behavior: future user arrivals are likely to have preferences similar to users who were satisfied in the past. The authors study multi-armed bandit (MAB) problems with positive externalities. |
Virag Shah, Jose Blanchet, Ramesh Johari |
An Information-Theoretic Analysis For Thompson Sampling With Many Actions
Information-theoretic Bayesian regret bounds of Russo and Van Roy capture the dependence of regret on prior uncertainty. |
Shi Dong, Benjamin Van Roy |
Distributed Multi-Player Bandits - A Game Of Thrones Approach
The authors consider a multi-armed bandit game where $N$ players compete for $K$ arms for $T$ turns. The authors present a distributed algorithm and prove that it achieves an expected sum of regrets of near-$O\left(\log^{2}T\right)$. |
Ilai Bistritz, Amir Leshem |
PG-TS: Improved Thompson Sampling For Logistic Contextual Bandits
The authors address the problem of regret minimization in logistic contextual bandits, where a learner decides among sequential actions or arms given their respective contexts to maximize binary rewards. Using a fast inference procedure with Polya-Gamma distributed augmentation variables, the authors propose an improved version of Thompson Sampling, a Bayesian formulation of contextual bandits with near-optimal performance. |
Bianca Dumitrascu, Karen Feng, Barbara Engelhardt |
Non-delusional Q-learning And Value-iteration
The authors identify a fundamental source of error in Q-learning and other forms of dynamic programming with function approximation. |
Tyler Lu, Dale Schuurmans, Craig Boutilier |
Differentiable MPC For End-to-end Planning And Control
The authors present foundations for using Model Predictive Control (MPC) as a differentiable policy class for reinforcement learning. |
Brandon Amos, Ivan Jimenez, Jacob I Sacks, Byron Boots, J. Zico Kolter |
Multiple-Step Greedy Policies In Approximate And Online Reinforcement Learning
Multiple-step lookahead policies have demonstrated high empirical competence in Reinforcement Learning, via the use of Monte Carlo Tree Search or Model Predictive Control. In a recent work (Efroni et al., 2018), multiple-step greedy policies and their use in vanilla Policy Iteration algorithms were proposed and analyzed. |
Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor |
Deep Reinforcement Learning In A Handful Of Trials Using Probabilistic Dynamics Models
Model-based reinforcement learning (RL) algorithms can attain excellent sample efficiency, but often lag behind the best model-free algorithms in terms of asymptotic performance. In this paper, the authors study how to bridge this gap, by employing uncertainty-aware dynamics models. |
Kurtland Chua, Roberto Calandra, Rowan McAllister, Sergey Levine |
Learning Convex Bounds For Linear Quadratic Control Policy Synthesis
Learning to make decisions from observed data in dynamic environments remains a problem of fundamental importance in a number of fields, from artificial intelligence and robotics, to medicine and finance. This paper concerns the problem of learning control policies for unknown linear dynamical systems so as to maximize a quadratic reward function. The authors present a method to optimize the expected value of the reward over the posterior distribution of the unknown system parameters, given data. The algorithm involves sequential convex programming, and enjoys reliable local convergence and robust stability guarantees. Numerical simulations and stabilization of a real-world inverted pendulum are used to demonstrate the approach, with strong performance and robustness properties observed in both. |
Jack Umenberger, Thomas Schön |
Sample-Efficient Reinforcement Learning With Stochastic Ensemble Value Expansion
There is growing interest in combining model-free and model-based approaches in reinforcement learning with the goal of achieving the high performance of model-free algorithms with low sample complexity. The authors propose stochastic ensemble value expansion (STEVE), a novel model-based technique that addresses this issue. |
Jacob Buckman, Danijar Hafner, George Tucker, Eugene Brevdo, Honglak Lee |
Policy-Conditioned Uncertainty Sets For Robust Markov Decision Processes
What policy should be employed in a Markov decision process with uncertain parameters? In this work, the authors propose non-rectangular uncertainty sets that bound marginal moments of state-action features defined over entire trajectories through a decision process. |
Andrea Tirinzoni, Marek Petrik, Xiangli Chen, Brian Ziebart |