KDD 2018
Commerce and Profiling
Paper Name Author(s)
Perceive Your Users In Depth: Learning Universal User Representations From Multiple E-commerce Tasks

This paper studies personalization in e-commerce.

Yabo Ni , Dan Ou , Shichen Liu , Xiang Li , Wenwu Ou , Anxiang Zeng , Luo Si
E-tail Product Return Prediction Via Hypergraph-based Local Graph Cut

This paper studies the problems of E-tail, which has provided customers with great convenience by allowing them to purchase retail products anywhere without visiting the actual stores.

Jianbo Li , Jingrui He , Yada Zhu
OpenTag: Open Attribute Value Extraction From Product Profiles

This paper studies the extraction of missing attribute values.

Guineng Zheng , Subhabrata Mukherjee , Xin Luna Dong , Feifei Li
Learning And Transferring IDs Representation In E-commerce

Many machine intelligence techniques are developed in E-commerce and one of the most essential components is the representation of IDs. The authors propose an embedding based framework to learn and transfer the representation of IDs.

Kui Zhao , Yuechuan Li , Zhaoqian Shuai , Cheng Yang
I Know You’ll Be Back: Interpretable New User Clustering And Churn Prediction On A Mobile Social App

This paper studies interpretable clustering of new users and churn prediction on a mobile social app.

Carl Yang , Xiaolin Shi , Jie Luo , Jiawei Han
Deep Learning I
Paper Name Author(s)
Exact And Consistent Interpretation For Piecewise Linear Neural Networks: A Closed Form Solution

The paper studies strong intelligent machines powered by deep neural networks. The authors propose an elegant closed-form solution named OpenBox to compute exact and consistent interpretations for the family of Piecewise Linear Neural Networks (PLNN).

Lingyang Chu , Xia Hu , Juhua Hu , Lanjun Wang , Jian Pei
Voxel Deconvolutional Networks For 3D Brain Image Labeling

In this work, the authors propose the voxel deconvolutional layer (VoxelDCL) to solve the checkerboard artifact problem of deconvolutional layers in 3D space.

Yongjun Chen , Hongyang Gao , Lei Cai , Min Shi , Dinggang Shen , Shuiwang Ji
Deep Variational Network Embedding In Wasserstein Space

This paper studies network embedding. The formation and evolution of real-world networks are full of uncertainties. The authors propose a novel Deep Variational Network Embedding in Wasserstein Space (DVNE).

Dingyuan Zhu , Peng Cui , Daixin Wang , Wenwu Zhu
Subspace Network: Deep Multi-Task Censored Regression For Modeling Neurodegenerative Diseases

This paper studies deep multi-task censored regression for modeling neurodegenerative diseases. The authors propose the Subspace Network.

Mengying Sun , Inci M. Baytas , Liang Zhan , Zhangyang Wang , Jiayu Zhou
Towards Explanation Of DNN-based Prediction With Guided Feature Inversion

The paper studies deep neural networks (DNN). The authors propose to investigate a guided feature inversion framework for taking advantage of the deep architectures towards effective interpretation.

Mengnan Du , Ninghao Liu , Qingquan Song , Xia Hu
Deep Learning II
Paper Name Author(s)
XiaoIce Band: A Melody And Arrangement Generation Framework For Pop Music

The authors present a focused study on pop music generation. They propose an end-to-end melody and arrangement generation framework, called XiaoIce Band, which generates a melody track with several accompaniment tracks played by several types of instruments. They also propose a Multi-Instrument Co-Arrangement Model (MICA) using multi-task learning for multi-track music arrangement.

Hongyuan Zhu , Qi Liu , Nicholas Jing Yuan , Chuan Qin , Jiawei Li , Kun Zhang , Guang Zhou , Furu Wei , Yuanchun Xu , Enhong Chen
Deep R-th Root Rank Supervised Joint Binary Embedding For Multivariate Time Series Retrieval

The paper deals with multivariate time series data. The authors propose a Deep r-th root of Rank Supervised Joint Binary Embedding (Deep r-RSJBE) to perform multivariate time series retrieval.

Dongjin Song , Ning Xia , Wei Cheng , Haifeng Chen , Dacheng Tao
Cost-Effective Training Of Deep CNNs With Active Model Adaptation

This paper studies deep convolutional neural networks. The authors propose to reduce training cost by actively adapting a pre-trained model to a new task with fewer labeled examples.

Sheng-Jun Huang , Jia-Wei Zhao , Zhao-Yang Liu
Learning Deep Network Representations With Adversarially Regularized Autoencoders

This paper studies network embedding. The authors propose to learn the network representations with adversarially regularized autoencoders (NetRA).

Wenchao Yu , Cheng Zheng , Wei Cheng , Charu Aggarwal , Dongjin Song , Bo Zong , Haifeng Chen , Wei Wang
Smoothed Dilated Convolutions For Improved Dense Prediction

This paper studies dilated convolutions. The authors propose two simple yet effective degridding methods by studying a decomposition of dilated convolutions.

Zhengyang Wang , Shuiwang Ji
Graph and Social Network I
Paper Name Author(s)
Multi-Round Influence Maximization

This paper studies the Multi-Round Influence Maximization (MRIM) problem. The MRIM problem models viral marketing scenarios in which advertisers conduct multiple rounds of viral marketing to promote one product.

Lichao Sun , Weiran Huang , Philip Yu , Wei Chen
SpotLight: Detecting Anomalies In Streaming Graphs

The authors propose a randomized sketching-based approach called SpotLight, which guarantees that an anomalous graph is mapped ‘far’ away from ‘normal’ instances in the sketch space with high probability for appropriate choice of parameters.

Dhivya Eswaran , Christos Faloutsos , Sudipto Guha , Nina Mishra
Adversarial Attacks On Neural Networks For Graph Data

This paper studies the robustness of deep learning models for graphs. The authors introduce the first study of adversarial attacks on attributed graphs, specifically focusing on models exploiting ideas of graph convolutions.

Daniel Zügner , Amir Akbarnejad , Stephan Günnemann
Graph Classification Using Structural Attention

In many real-world applications, graphs can be noisy, with discriminative patterns confined to certain regions of the graph only. The authors study the problem of attention-based graph classification.

John Boaz Lee , Ryan Rossi , Xiangnan Kong
EvoGraph: An Effective And Efficient Graph Upscaling Method For Preserving Graph Properties

This paper studies synthetic graph generation methods. The authors propose a novel graph upscaling method called EvoGraph that can upscale the original graph while preserving its properties regardless of the scale factor.

Himchan Park , Min-Soo Kim
Graph and Social Network II
Paper Name Author(s)
Network Connectivity Optimization: Fundamental Limits And Effective Algorithms

This paper studies network connectivity optimization. First, the authors reveal some fundamental limits by proving that, for a wide range of network connectivity optimization problems, (1) they are NP-hard and (2) (1-1/e) is the optimal approximation ratio for any polynomial-time algorithm. Second, they propose an effective, scalable and general algorithm (CONTAIN) to carefully balance the optimization quality and the computational efficiency.

Chen Chen , Ruiyue Peng , Lei Ying , Hanghang Tong
Opinion Dynamics With Varying Susceptibility To Persuasion

This paper studies social psychology. The authors adopt a popular model for social opinion dynamics, and formalize the opinion maximization and minimization problems where interventions happen at the level of susceptibility.

Rediet Abebe , Jon Kleinberg , David Parkes , Charalampos Tsourakakis
Node Similarity With Q-Grams For Real-World Labeled Networks

This paper studies node similarity in labeled networks, using the label sequences found in paths of bounded length q leading to the nodes.

Roberto Grossi , Alessio Conte , Gaspare Ferraro , Andrea Marino , Kunihiko Sadakane , Takeaki Uno
NetLSD: Hearing The Shape Of A Graph

This paper studies graph comparison. The authors propose the Network Laplacian Spectral Descriptor (NetLSD).

Anton Tsitsulin , Davide Mottin , Panagiotis Karras , Alexander Bronstein , Emmanuel Muller
LARC: Learning Activity-Regularized Overlapping Communities Across Time

This paper studies communities. The authors propose LARC, a general framework for joint learning of the overlapping community structure and the periods of activity of communities, directly from temporal interaction data.

Alexander Gorovits , Ekta Gujral , Evangelos Papalexakis , Petko Bogdanov
FASTEN: Fast Sylvester Equation Solver For Graph Mining

This paper studies the Sylvester equation. The authors propose a family of Krylov subspace based algorithms (FASTEN) to speed up and scale up the computation of the Sylvester equation for graph mining.

Boxin Du , Hanghang Tong
Knowledge Discovery
Paper Name Author(s)
An Efficient Two-Layer Mechanism For Privacy-Preserving Truth Discovery

This paper studies soliciting answers from online users. The authors propose perturbation-based mechanisms that provide users with privacy guarantees and maintain the accuracy of aggregated answers.

Yaliang Li , Chenglin Miao , Lu Su , Jing Gao , Qi Li , Bolin Ding , Zhan Qin , Kui Ren
TINET: Learning Invariant Networks Via Knowledge Transfer

This paper studies invariant networks. To avoid the prohibitively time- and resource-consuming network building process, the authors propose TINET, a knowledge transfer based model for accelerating invariant network construction.

Chen Luo , Zhengzhang Chen , Lu-An Tang , Anshumali Shrivastava , Zhichun Li , Haifeng Chen , Jieping Ye
R-VQA: Learning Visual Relation Facts With Semantic Attention For Visual Question Answering

This paper studies Visual Question Answering (VQA). The authors propose a novel framework to learn visual relation facts for VQA.

Pan Lu , Lei Ji , Wei Zhang , Nan Duan , Ming Zhou , Jianyong Wang
Can Who-Edits-What Predict Edit Survival?

This paper studies online peer-production systems. The authors explore a different point in the solution space that goes beyond user reputation but does not involve any content-based feature of the edits.

Ali Batuhan Yardim , Victor Kristof
Dynamic Embeddings For User Profiling In Twitter

In this paper, the authors study the problem of dynamic user profiling in Twitter. They address the problem by proposing a dynamic user and word embedding model (DUWE), a scalable black-box variational inference algorithm, and a streaming keyword diversification model (SKDM).

Shangsong Liang , Xiangliang Zhang , Zhaochun Ren , Evangelos Kanoulas
Generalized Score Functions For Causal Discovery

This paper deals with the discovery of causal relationships from observational data. The authors introduce generalized score functions for causal discovery based on the characterization of general (conditional) independence relationships between random variables, without assuming particular model classes.

Biwei Huang , Kun Zhang , Yizhu Lin , Bernhard Schölkopf , Clark Glymour
Matrices, Kernels and Sketches
Paper Name Author(s)
Disturbance Grassmann Kernels For Subspace-Based Learning

In this paper, the authors focus on subspace-based learning problems, where data elements are linear subspaces instead of vectors. Grassmann kernels were proposed to measure the space structure and used with classifiers.

Junyuan Hong , Huanhuan Chen , Feng Lin
SUSTain: Scalable Unsupervised Scoring For Tensors And Its Application To Phenotyping

This paper presents a new method, called SUSTain, that extends real-valued matrix and tensor factorizations to data where values are integers.

Ioakeim Perros , Evangelos Papalexakis , Haesun Park , Richard Vuduc , Xiaowei Yan , Christopher Defilippi , Walter F. Stewart , Jimeng Sun
Discrete Ranking-based Matrix Factorization With Self-Paced Learning

This paper studies the efficiency of top-k recommendation. The authors propose a Discrete Ranking-based Matrix Factorization (DRMF) algorithm based on each user’s pairwise preferences, and formulate it as binary quadratic programming problems to learn binary codes.

Yan Zhang , Haoyu Wang , Defu Lian , Ivor W. Tsang , Hongzhi Yin , Guowu Yang
High-order Proximity Preserving Information Network Hashing

This paper studies information network embedding. The authors propose an MF-based Information Network Hashing (INH-MF) algorithm to learn binary codes which can preserve high-order proximity.

Defu Lian , Kai Zheng , Vincent W. Zheng , Yong Ge , Longbing Cao , Ivor W. Tsang , Xing Xie
Optimal Distributed Submodular Optimization Via Sketching

The authors present distributed algorithms for several classes of submodular optimization problems such as k-cover, set cover, facility location, and probabilistic coverage.

Mohammadhossein Bateni , Hossein Esfandiari , Vahab Mirrokni
Active Feature Acquisition With Supervised Matrix Completion

This paper addresses the problem of missing features. The authors try to train an effective classification model at the least acquisition cost by jointly performing active feature querying and supervised matrix completion.

Sheng-Jun Huang , Miao Xu , Ming-Kun Xie , Masashi Sugiyama , Gang Niu , Songcan Chen
Medicine and Healthcare
Paper Name Author(s)
TATC: Predicting Alzheimer’s Disease With Actigraphy Data

The authors present a novel solution named time-aware TICC and CNN (TATC), for predicting AD (Alzheimer’s Disease) from actigraphy data.

Jia Li , Yu Rong , Helen Meng , Zhihui Lu , Timothy Kwok , Hong Cheng
Accelerating Prototype-Based Drug Discovery Using Conditional Diversity Networks

The authors develop an algorithmic unsupervised-approach that automatically generates potential drug molecules given a prototype drug.

Shahar Harel , Kira Radinsky
Detection Of Apathy In Alzheimer Patients By Analysing Visual Scanning Behaviour With RNNs

In this study, visual scanning behaviours (VSBs) on emotional and non-emotional stimuli were used to detect apathy in patients with AD. Sixteen of the patients were apathetic.

Jonathan Chung , Sarah A. Chau , Nathan Herrmann , Krista L. Lanctôt , Moshe Eizenman
Releasing EHealth Analytics Into The Wild: Lessons Learnt From The SPHERE Project

The SPHERE project is devoted to advancing eHealth in a smart-home context, and supports full-scale sensing and data analysis to enable a generic healthcare service. The authors describe, from a data-science perspective, their experience of taking the system out of the laboratory.

Tom Diethe , Mike Holmes , Meelis Kull , Miquel Perello Nieto , Kacper Sokol , Hao Song , Emma Tonkin , Niall Twomey , Peter Flach
Estimating Glaucomatous Visual Sensitivity From Retinal Thickness By Using Pattern-Based Regularizat

Glaucoma is diagnosed on the basis of visual field sensitivity (VF), the measurement of which is time-consuming, costly, and noisy. The authors propose a new methodology for estimating VF from retinal thickness (RT) in glaucomatous eyes. They can thereby avoid overfitting a CNN to small-sized data.

Hiroki Sugiura , Taichi Kiwaki , Siamak Yousefi , Hiroshi Murata , Ryo Asaoka , Kenji Yamanishi
Methodology I
Paper Name Author(s)
PCA By Determinant Optimization Has No Spurious Local Optima

This paper studies Principal Component Analysis (PCA). Classically, principal components of a dataset are interpreted as the directions that preserve most of its “energy”. In this paper, the authors consider one such interpretation of principal components as the directions that preserve most of the “volume” of the dataset.

Raphael Hauser , Armin Eftekhari , Heinrich Matzinger
Sequences Of Sets

This paper studies sequential behavior.

Austin Benson , Ravi Kumar , Andrew Tomkins
Metric Learning From Probabilistic Labels

This paper studies metric learning. The authors study how to effectively learn the distance metric from datasets that contain probabilistic information, and then propose two novel metric learning mechanisms for two types of probabilistic labels.

Mengdi Huai , Chenglin Miao , Yaliang Li , Qiuling Suo , Lu Su , Aidong Zhang
Count-Min: Optimal Estimation And Tight Error Bounds Using Empirical Error Distributions

This paper studies the Count-Min sketch. The authors derive new count estimators, including a provably optimal estimator, which outperform or match previous estimators in all scenarios.

Daniel Ting
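
For orientation, the baseline that such estimators improve on can be sketched as follows. This is a minimal illustration of the classic Count-Min sketch with the standard minimum estimator, not the paper's new estimators; the class name `CountMinSketch`, the salted SHA-256 row hashes, and the `width` and `depth` parameters are assumptions made for the example.

```python
import hashlib


class CountMinSketch:
    """Classic Count-Min sketch with the standard minimum estimator."""

    def __init__(self, width: int, depth: int):
        self.width = width  # counters per row
        self.depth = depth  # number of independent rows
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # Illustrative choice: salt SHA-256 with the row number to get
        # one hash function per row.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # With non-negative updates every counter over-counts, so the
        # minimum across rows is the tightest upper bound available.
        return min(
            self.table[row][self._index(item, row)]
            for row in range(self.depth)
        )


sketch = CountMinSketch(width=64, depth=4)
for word in ["a"] * 5 + ["b"] * 3:
    sketch.add(word)
print(sketch.estimate("a"), sketch.estimate("b"))
```

The minimum estimator never underestimates a count when all updates are non-negative; collisions can only inflate it.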
A Unified Approach To Quantifying Algorithmic Unfairness: Measuring Individual & Group Unfairness Vi

This paper studies discrimination via algorithmic decision making. The authors focus on defining satisfactory measures of algorithmic unfairness. They offer a justified and general framework to compare and contrast the (un)fairness of algorithmic predictors.

Till Speicher , Hoda Heidari , Nina Grgic-Hlaca , Krishna P. Gummadi , Adish Singla , Adrian Weller , Muhammad Bilal Zafar
Methodology II
Paper Name Author(s)
FAHES: A Robust Disguised Missing Values Detector

This paper deals with disguised missing values (DMVs). The authors present FAHES, a robust system for detecting DMVs from two angles: DMVs as detectable outliers and as detectable inliers.

Mourad Ouzzani , Nan Tang , Ahmed Elmagarmid , Raul Castro Fernandez , Abdulhakim A. Qahtan
Data Diff: Interpretable, Executable Summaries Of Changes In Distributions For Data Wrangling

Many analyses in data science are not one-off projects, but are repeated over multiple data samples. The authors introduce the data diff problem, which attempts to turn this problem into an opportunity.

Charles Sutton , Timothy Hobson , James Geddes , Rich Caruana
New Robust Metric Learning Model Using Maximum Correntropy Criterion

This paper studies metric learning. The authors propose a new robust metric learning approach by introducing the maximum correntropy criterion to deal with real-world malicious occlusions or corruptions.

Jie Xu , Lei Luo , Cheng Deng , Heng Huang
Concentrated Differentially Private Gradient Descent With Adaptive Per-Iteration Privacy Budget

Iterative algorithms' conversion to differentially private algorithms is often naive.

Jaewoo Lee , Daniel Kifer
Learning And Interpreting Complex Distributions In Empirical Data

Fitting empirical data distributions and then interpreting them in a generative way is a common research paradigm for understanding the structure and dynamics underlying data in various disciplines. The paper's model potentially provides a framework to fit complex distributions in empirical data and, more importantly, to understand their generative mechanisms.

Chengxi Zang , Peng Cui , Wenwu Zhu
HeavyGuardian: Separate And Guard Hot Items In Data Streams

Data stream processing is a fundamental issue in many fields. The authors propose a novel data structure named HeavyGuardian.

Tong Yang , Junzhi Gong , Haowei Zhang , Lei Zou , Lei Shi , Xiaoming Li
Natural Sciences, Sport, and the Application of Controlled Experiments
Paper Name Author(s)
Using Rule-Based Labels For Weak Supervised Learning: A ChemNet For Transferable Chemical Property P

The authors develop an approach that uses rule-based knowledge to train ChemNet, a transferable and generalizable deep neural network for chemical property prediction that learns in a weakly supervised manner from large unlabeled chemical databases.

Garrett Goh , Charles Siegel , Abhinav Vishnu , Nathan Hodas
Automatic Discovery Of Tactics In Spatio-Temporal Soccer Match Data

This paper explores the problem of automatic tactics detection from event-stream data collected from professional soccer matches.

Tom Decroos , Jan Van Haaren , Jesse Davis
False Discovery Rate Controlled Heterogeneous Treatment Effect Detection For Online Controlled Exper

This paper studies online controlled experiments. The authors propose statistical methods that can systematically and accurately identify the Heterogeneous Treatment Effect (HTE) of any user cohort of interest.

Yuxiang Xie , Nanyu Chen , Xiaolin Shi
PrePeP – A Tool For The Identification And Characterization Of Pan Assay Interference Compounds

This paper studies Pan Assay Interference Compounds (PAINS). The authors are developing a tool, PrePeP, that predicts PAINS and allows experts to visually explore the reasons for the prediction.

Maksim Koptelov , Albrecht Zimmermann , Pascal Bonnet , Ronan Bureau , Bruno Crémilleux
Winner’s Curse: Bias Estimation For Total Effects Of Features In Online Controlled Experiments

This paper studies online controlled experiments. The authors investigate a statistical selection bias and propose a correction method that yields an unbiased estimator.

Minyong Lee , Milan Shen
Planning and Forecasting in Finance and Commerce
Paper Name Author(s)
Customized Regression Model For Airbnb Dynamic Pricing

This paper describes the pricing strategy model deployed at Airbnb, an online marketplace for sharing homes and experiences.

Peng Ye , Julian Qian , Jieying Chen , Chen-Hung Wu , Yitong Zhou , Spencer De Mars , Frank Yang , Li Zhang
Large-Scale Order Dispatch In On-Demand Ride-Sharing Platforms: A Learning And Planning Approach

The authors present a novel order dispatch algorithm in large-scale on-demand ride-hailing platforms.

Zhe Xu , Zhixin Li , Qingwen Guan , Dingshui Zhang , Qiang Li , Junxiao Nan , Chunyang Liu , Wei Bian , Jieping Ye
Optimization Of A SSP’s Header Bidding Strategy Using Thompson Sampling

This paper studies Supply-Side Platforms (SSP) in digital media. The authors design and experiment with a version of the Thompson Sampling algorithm.

Grégoire Jauvion
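
As background on the technique named above, the following is a minimal sketch of generic Thompson Sampling on a Bernoulli bandit with Beta(1, 1) priors. It is not the paper's header-bidding variant; the function name `thompson_sampling`, the simulated `true_rates`, and the fixed seed are assumptions made for illustration.

```python
import random


def thompson_sampling(true_rates, rounds, seed=0):
    """Run Beta-Bernoulli Thompson Sampling; return pulls per arm."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms
    for _ in range(rounds):
        # Draw one plausible success rate per arm from its Beta posterior.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Play the arm whose sampled rate is highest.
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Simulated environment: reward is 1 with the arm's true rate.
        reward = 1 if rng.random() < true_rates[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


print(thompson_sampling([0.1, 0.9], rounds=500))
```

Sampling from the posterior, rather than taking its mean, is what makes the method explore: arms with little data have wide posteriors and are occasionally drawn high.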
Applying The Delta Method In Metric Analytics: A Practical Guide With Novel Ideas

This paper studies metrics to measure and monitor business performance. Motivated by real-life examples in metric development and analytics for large-scale A/B testing, the authors provide a practical guide to applying the Delta method, one of the most important tools from the classic statistics literature.

Alex Deng , Ulf Knoblich , Jiannan Lu
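
As a reminder of what the Delta method yields in a common case (a standard textbook result, not a formula quoted from the paper), consider a ratio metric built from n i.i.d. pairs (X_i, Y_i). The symbols below are assumptions for the example: means \mu_X and \mu_Y \neq 0, variances \sigma_X^2 and \sigma_Y^2, and covariance \sigma_{XY}. A first-order expansion around the means gives:

```latex
\operatorname{Var}\!\left(\frac{\bar{X}}{\bar{Y}}\right)
\approx \frac{1}{n}\left(
  \frac{\sigma_X^2}{\mu_Y^2}
  - \frac{2\,\mu_X\,\sigma_{XY}}{\mu_Y^3}
  + \frac{\mu_X^2\,\sigma_Y^2}{\mu_Y^4}
\right)
```

The covariance term matters in practice: treating the numerator and denominator as independent can misstate the variance of the ratio.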
Audience Size Forecasting: Fast And Smart Budget Planning For Media Buyers

This paper studies ad campaigns. The authors provide a way to estimate campaign impressions given the campaign criteria.

Yeming Shi , Claudia Perlich , Rod Hook
Poster Sessions
Paper Name Author(s)
Fatigue Prediction In Outdoor Runners Via Machine Learning And Sensor Fusion

The authors explore whether machine learning can be used to predict the rating of perceived exertion (RPE), a validated subjective measure of fatigue, from inertial sensor data of individuals running outdoors.

Tim Op De Beeck, Wannes Meert, Kurt Schutte, Benedicte Vanwanseele, Jesse Davis
Sketched Follow-The-Regularized-Leader For Online Factorization Machine

This paper studies the Factorization Machine (FM). The authors devise a novel online learning algorithm called Sketched Follow-The-Regularized-Leader (SFTRL).

Luo Luo , Wenpeng Zhang , Zhihua Zhang , Wenwu Zhu , Tong Zhang , Jian Pei
Learning From History And Present: Next-item Recommendation Via Discriminatively Exploiting User Beh

This paper studies session-based recommendations. The authors propose a novel Behavior-Intensive Neural Network (BINN) for next-item recommendation by incorporating both users’ historical stable preferences and present consumption motivations.

Zhi Li , Hongke Zhao , Qi Liu , Zhenya Huang , Tao Mei , Enhong Chen
Isolation Kernel And Its Effect On SVM

This paper investigates data dependent kernels that are derived directly from data. The authors introduce Isolation Kernel which is solely dependent on data distribution, requiring neither class information nor explicit learning to be a classifier.

Kai Ming Ting , Yue Zhu , Zhi-Hua Zhou
SDREGION: Fast Spotting Of Changing Communities In Biological Networks

This paper studies the molecular mechanisms of disease progression.

Serene W.H. Wong , Chiara Pastrello , Max Kotlyar , Christos Faloutsos , Igor Jurisica
A Real-time Framework For Detecting Efficiency Regressions In A Globally Distributed Codebase

This paper studies the monitoring of compute and memory utilization metrics. It describes the end-to-end regression detection system designed and used at Facebook.

Martin Valdez-Vivas , Caner Gocmen , Andrii Korotkov , Ethan Fang , Kapil Goenka , Sherry Chen
Discovering Models From Structural And Behavioral Brain Imaging Data

This paper studies block models of graphs. The authors explore finding block models where there is both a structural network and multiple behavioral graphs.

Zilong Bai , Buyue Qian , Ian Davidson
Next-Step Suggestions For Modern Interactive Data Analysis Platforms

This paper studies modern Interactive Data Analysis (IDA) platforms. The authors present REACT, a recommender system designed for modern IDA platforms.

Amit Somech , Tova Milo
An Iterative Global Structure-Assisted Labeled Network Aligner

Integrating data from heterogeneous sources is often modeled as merging graphs. The authors propose a new iterative graph aligner, gsaNA, that uses the global structure of the graphs to significantly reduce the problem size and align large graphs with a minimal loss of information.

Abdurrahman Yaşar
Resolving Abstract Anaphora In Conversational Assistants Using A Hierarchically-stacked RNN

This paper studies conversational systems. The authors propose a novel solution that uses a hierarchical neural network, comprising a BiLSTM layer and a maxpool layer, hierarchically stacked to first obtain a representation of each user utterance and then a representation of the sequence of utterances.

Prerna Khurana , Puneet Agarwal , Gautam Shroff , Lovekesh Vig
Scalable Optimization For Embedding Highly-Dynamic And Recency-Sensitive Data

Generating embeddings on highly dynamic and recency-sensitive data at high speed is a challenging problem. The authors propose a Nested Segment Tree to improve the recency-sensitive weight method and the diffusion strategy into a complexity no slower than the iteration step in practice.

Xumin Chen , Peng Cui , Shiqiang Yang
Learning Tree-based Deep Model For Recommender Systems

This paper studies model-based methods for recommender systems. The authors propose a novel tree-based method which can provide logarithmic complexity.

Han Zhu , Xiang Li , Pengye Zhang , Guozheng Li , Jie He , Han Li , Kun Gai
Identify Susceptible Locations In Medical Records Via Adversarial Attacks On Deep Predictive Models

This paper studies electronic health records (EHR). The authors propose an efficient and effective framework that learns a time-preferential minimum attack targeting the LSTM model with EHR inputs, and they leverage this attack strategy to screen medical records of patients and identify susceptible events and measurements.

Mengying Sun , Fengyi Tang , Jinfeng Yi , Fei Wang , Jiayu Zhou
Web-Scale Responsive Visual Search At Bing

In this paper, the authors introduce a web-scale general visual search system deployed in Microsoft Bing.

Houdong Hu , Yan Wang , Linjun Yang , Pavel Komlev , Li Huang , Xi Stephen Chen , Jiapei Huang , Ye Wu , Meenaz Merchant , Arun Sacheti
Learning Tasks For Multitask Learning: Heterogenous Patient Populations In The ICU

This paper studies multitask learning for heterogeneous patient populations in the ICU. The authors present a two-step framework to 1) learn relevant patient subgroups, and 2) predict an outcome for separate patient populations in a multi-task framework, where each population is a separate task.

Harini Suresh , Jen Gong , John Guttag
Multilevel Wavelet Decomposition Network For Interpretable Time Series Analysis

This paper studies time series analysis. The authors propose a wavelet-based neural network structure called multilevel Wavelet Decomposition Network (mWDN) for building frequency-aware deep learning models for time series analysis.

Jingyuan Wang , Ze Wang , Jianfeng Li , Junjie Wu
Not Just Privacy: Improving Performance Of Private Deep Learning In Mobile Cloud

This paper studies deep neural networks (DNNs) on mobile devices. To benefit from the cloud data center without the privacy risk, the authors design a cloud-based framework, ARDEN, which partitions the DNN across mobile devices and cloud data centers.

Ji Wang , Jianguo Zhang , Weidong Bao , Xiaomin Zhu , Bokai Cao , Philip S. Yu
Learning Credible Models

It is important that a model be capable of providing reasons for its predictions. However, the model’s reasoning may not conform with well-established knowledge. The authors formally define credibility in the linear setting and focus on techniques for learning models that are both accurate and credible.

Jiaxuan Wang , Jeeheh Oh , Haozhu Wang , Jenna Wiens
Deep Censored Learning Of The Winning Price In The Real Time Bidding

The authors generalize the winning price model to incorporate deep learning models with different distributions and propose an algorithm to learn from historical bidding information, where the winning price is either observed or partially observed.

Wush Chi-Hsuan Wu , Mi-Yen Yeh , Ming-Syan Chen
Deep Multi-Output Forecasting: Learning To Accurately Predict Blood Glucose Trajectories

This paper studies multi-step forecasting. The authors propose multi-output deep architectures for multi-step forecasting in which they explicitly model the distribution of future values of the signal over a prediction horizon.

Ian Fox , Lynn Ang , Mamta Jaiswal , Rodica Pop-Busui , Jenna Wiens
Learning-to-Ask: Knowledge Acquisition Via 20 Questions

The authors study 20 Questions, an online interactive game where each question-response pair corresponds to a fact of the target entity, to acquire highly accurate knowledge effectively with nearly zero labor cost. The authors propose the Learning-to-Ask (LA) framework, within which the agent learns smart questioning strategies for information seeking and knowledge acquisition by means of deep reinforcement learning and generalized matrix factorization respectively.

Yihong Chen , Bei Chen , Xuguang Duan , Jian-Guang Lou , Yue Wang , Wenwu Zhu , Yong Cao
A Data-Driven Three-Layer Algorithm For Split Delivery Vehicle Routing Problem With 3D Container Loa

This paper studies the Split Delivery Vehicle Routing Problem with 3D Loading Constraints (3L-SDVRP). The paper's solution employs a novel data-driven three-layer search algorithm (DTSA).

Xijun Li , Mingxuan Yuan , Di Chen , Jianguo Yao , Jia Zeng
Coupled Context Modeling For Deep Chit-Chat: Towards Conversations Between Human And Computer

To have automatic conversations between human and computer is regarded as one of the most hardcore problems in computer science. The authors propose a novel context modeling framework with end-to-end neural networks for human-computer conversational systems.

Rui Yan , Dongyan Zhao
Multi-Cast Attention Networks

This paper investigates the novel use of attention as a form of feature augmentation, i.e., casted attention. The authors propose Multi-Cast Attention Networks (MCAN), a new attention mechanism and general model architecture for a potpourri of ranking tasks in the conversational modeling and question answering domains.

Yi Tay , Anh Tuan Luu , Siu Cheung Hui
Deep Reinforcement Learning For Sponsored Search Real-time Bidding

This paper studies real-time bidding (RTB) for bidding optimization. The authors consider the RTB problem in sponsored search auctions, named SS-RTB, and propose a reinforcement learning (RL) solution for handling the complex dynamic environment.

Jun Zhao , Guang Qiu , Ziyu Guan , Wei Zhao , Xiaofei He
Towards Evolutionary Compression

This paper studies the compression of convolutional neural networks (CNNs). In contrast to directly recognizing subtle weights or filters as redundant in a given CNN, this paper presents an evolutionary method to automatically eliminate redundant convolution filters.

Yunhe Wang , Chang Xu , Jiayan Qiu , Chao Xu , Dacheng Tao
BigIN4: Instant, Interactive Insight Identification For Multi-Dimensional Big Data

This paper studies identifying insights from multi-dimensional big data. The authors present BigIN4, a system for instant, interactive identification of insights from multi-dimensional big data.

Qingwei Lin , Weichen Ke , Jian-Guang Lou , Hongyu Zhang , Kaixin Sui , Yong Xu , Ziyi Zhou , Bo Qiao , Dongmei Zhang
Optimizing Cluster-based Randomized Experiments Under Monotonicity

This paper studies cluster-based randomized experiments. The authors introduce a monotonicity condition under which a novel two-stage experimental design allows us to determine which of two cluster-based designs yields the least biased estimator.

Jean Pouget-Abadie , David Parkes , Vahab Mirrokni , Edoardo M. Airoldi
Rotation-Blended CNNs On A New Open Dataset For Tropical Cyclone Image-to-intensity Regression

A tropical cyclone (TC) is a type of severe weather system that occurs in tropical regions. Accurate estimation of TC intensity is crucial for disaster management. The authors release a benchmark dataset, a new open dataset collected from satellite remote sensing, for the TC-image-to-intensity estimation task.

Boyo Chen , Buofu Chen , Hsuan-Tien Lin
Detection Of Paroxysmal Atrial Fibrillation Using Attention Based Bidirectional Recurrent Neural Net

This paper studies the detection of atrial fibrillation (AF). The authors present an attention-based deep learning framework for detection of paroxysmal AF episodes from a sequence of windows.

Supreeth Prajwal Shashikumar , Amit Shah , Gari Clifford , Shamim Nemati
Prediction-time Efficient Classification Using Feature Computational Dependencies

This paper studies incorporating prediction-time cost constraints into the process of model selection and model optimization. It proposes a heterogeneous hypergraph to represent the feature computational dependency, along with a framework that jointly optimizes the accuracy and the exact test-time cost based on a given feature computational dependency.

Liang Zhao , Amir Alipour-Fanid , Martin Slawski , Kai Zeng
State Space Models For Forecasting Water Quality Variables: An Application In Aquaculture Prawn Farm

A novel approach to deterministic modelling of diurnal water quality parameters in aquaculture prawn ponds is presented.

Joel Dabrowski , Ashfaqur Rahman , Andrew George , Stuart Arnold , John McCulloch
StepDeep: A Novel Spatial-temporal Mobility Event Prediction Framework Based On Deep Neural Network

Spatial-temporal mobility event prediction has huge potential for solving important problems such as minimizing passenger waiting time and maximizing the utilization of transportation resources by planning vehicle routes and dispatching transportation resources.

Bilong Shen , Xiaodan Liang , Yufeng Ouyang , Miaofeng Liu , Weimin Zheng , Kathleen Carley
Unlocking The Value Of Privacy: Trading Aggregate Statistics Over Private Correlated Data

The authors study noisy aggregate statistics trading from the perspective of a data broker in data markets. They propose ERATO, which enables aggrEgate statistics pRicing over privATe cOrrelated data.

Chaoyue Niu , Zhenzhe Zheng , Fan Wu , Shaojie Tang , Xiaofeng Gao , Guihai Chen
DeepInf: Social Influence Prediction With Deep Learning

This paper studies effective social influence prediction for each user.

Jiezhong Qiu , Jian Tang , Hao Ma , Yuxiao Dong , Kuansan Wang , Jie Tang
Latent Variable Time-varying Network Inference

The authors present latent variable time-varying graphical lasso (LTGL), a method for multivariate time-series graphical modelling that considers the influence of hidden or unmeasurable factors.

Federico Tomasi , Veronica Tozzo , Saverio Salzo , Alessandro Verri
Device Graphing By Example

The authors demonstrate how measurement, tracking, and other internet entities can associate multiple identifiers with a single device or user after coarse associations are made. The authors employ a Bayesian similarity algorithm.

Keith Funkhouser , Matthew Malloy , Enis Ceyhun Alp , Phillip Poon , Paul Barford
Optimal Allocation Of Real-Time-Bidding And Direct Campaigns

The authors consider the problem of optimizing the revenue a web publisher gets through real-time bidding.

Grégoire Jauvion , Nicolas Grislain
Approximating The Spectrum Of A Graph

The authors study the problem of approximating the spectrum of a graph. They present a sublinear time algorithm that, given the ability to query a random node in the graph and select a random neighbor of a given node, computes a succinct representation of an approximation.

David Cohen-Steiner , Weihao Kong , Christian Sohler , Gregory Valiant
Context-aware Academic Collaborator Recommendation

This paper studies collaborator recommendation. The authors propose Context-aware Collaborator Recommendation (CACR), which aims to recommend high-potential new collaborators for people’s context-restricted requests.

Zheng Liu , Xing Xie , Lei Chen
Deep Adversarial Learning For Multi-Modality Missing Data Completion

This paper studies the multi-modality missing data completion problem. The authors formulate the problem as a conditional image generation task and propose an encoder-decoder deep neural network to tackle it.

Lei Cai , Zhengyang Wang , Hongyang Gao , Dinggang Shen , Shuiwang Ji
Accelerated Equivalence Structure Extraction Via Pairwise Incremental Search

This paper studies equivalence structure (ES) extraction. The authors propose a new fast method called pairwise incremental search (PIS).

Seiya Satoh , Yoshinobu Takahashi , Hiroshi Yamakawa
Deep Interest Network For Click-Through Rate Prediction

This paper studies click-through rate prediction. The authors propose a novel model, Deep Interest Network (DIN), which designs a local activation unit to adaptively learn the representation of user interests from historical behaviors with respect to a certain ad.

Guorui Zhou , Xiaoqiang Zhu , Chengru Song , Kun Gai , Ying Fan , Han Zhu , Xiao Ma , Yanghui Yan , Junqi Jin , Han Li
Deploying Machine Learning Models For Public Policy: A Framework

This paper studies the deployment of machine learning. The authors describe their implementation of a machine learning early intervention system (EIS) for police officers.

Klaus Ackermann , Joe Walsh , Adolfo De Unanue , Hareem Naveed , Andrea Navarrete Rivera , Sun-Joo Lee , Jason Bennett , Michael Defoe , Crystal Cody , Lauren Haynes , Rayid Ghani
Notification Volume Control And Optimization System At Pinterest

The authors propose a novel machine learning approach to decide notification volume for each user such that long-term user engagement is optimized.

Bo Zhao , Koichiro Narita , Burkay Orten , John Egan
NGUARD: A Game Bot Detection Framework For NetEase MMORPGs

Game bots are automated programs that assist cheating users and give them a large unfair advantage. This paper studies game bot detection. To deal with the fast-changing nature of game bots, the authors propose a generalized game bot detection framework for MMORPGs termed NGUARD, denoting NetEase Games’ Guard.

Jianrong Tao , Jiarong Xu , Linxia Gong , Yifu Li , Changjie Fan , Zhou Zhao
On Interpretation Of Network Embedding Via Taxonomy Induction

The authors investigate the interpretation of network embedding, aiming to understand how instances are distributed in embedding space, as well as explore the factors that lead to the embedding results.

Ninghao Liu , Xiao Huang , Jundong Li , Xia Hu
A Distributed Quasi-Newton Algorithm For Empirical Risk Minimization With Nonsmooth Regularization

The authors propose a communication- and computation-efficient distributed optimization algorithm using second-order information for solving ERM problems with a nonsmooth regularization term.

Ching-Pei Lee , Cong Han Lim , Stephen Wright
Accelerating Large-Scale Data Analysis By Offloading To High-Performance Computing Libraries Using A

Many linear algebra computations that are the basis for solving common machine learning problems are significantly slower in Spark than when done using libraries written in a high-performance computing framework such as the Message-Passing Interface (MPI).

Alex Gittens , Kai Rothauge , Shusen Wang , Michael Mahoney , Lisa Gerhardt , Prabhat , Jey Kottalam , Michael Ringenburg , Kristyn Maschhoff
An Empirical Evaluation Of Sketching For Numerical Linear Algebra

This paper studies randomized data dimensionality reduction. The goal of this work is to provide a comprehensive comparison of such methods to alternative approaches.

Yogesh Dahiya , Dimitris Konomis , David P. Woodruff
BagMinHash - Minwise Hashing Algorithm For Weighted Sets

This paper studies minwise hashing. BagMinHash is a new algorithm that can be orders of magnitude faster than the current state of the art without any particular restrictions or assumptions on weights or data dimensionality.

Otmar Ertl
XStream: Outlier Detection In Feature-Evolving Data Streams

This work addresses the outlier detection problem for feature-evolving streams. The authors propose a density-based ensemble outlier detector, called xStream, for this more extreme streaming setting.

Emaad Ahmed Manzoor , Hemank Lamba , Leman Akoglu
HiExpan: Task-Guided Taxonomy Construction By Hierarchical Tree Expansion

This paper studies taxonomies. The authors aim to construct a task-guided taxonomy from a domain-specific corpus, and allow users to input a seed taxonomy. They propose an expansion-based taxonomy construction framework, namely HiExpan.

Jiaming Shen , Zeqiu Wu , Dongming Lei , Chao Zhang , Xiang Ren , Michelle T. Vanni , Brian M. Sadler , Jiawei Han
Anatomy Of A Privacy-Safe Large-Scale Information Extraction System Over Email

This paper studies the problem of extracting structured data from emails. It presents Juicer, a system for extracting information from email that is serving over a billion Gmail users daily.

Ying Sheng , Sandeep Tata , James B. Wendt , Jing Xie , Qi Zhao , Marc Najork
Recurrent Binary Embedding For GPU-Enabled Exhaustive Retrieval From Billion-Scale Semantic Vectors

This paper studies billion-scale information retrieval with exhaustive search. It proposes a Recurrent Binary Embedding (RBE) model that learns compact representations for real-time retrieval.

Ying Shan , Jian Jiao , Jie Zhu , Jc Mao
Deep Learning For Practical Image Recognition: Case Study On Kaggle Competitions

This paper studies deep convolutional neural networks (DCNNs).

Xulei Yang , Zeng Zeng , Sin Teo , Li Wang , Vijay Chandrasekhar , Steven Hoi
Large-Scale Learnable Graph Convolutional Networks

This paper studies convolutional neural networks (CNNs). To enable model training on large-scale graphs, the authors propose a sub-graph training method to reduce the excessive memory and computational resource requirements suffered by prior methods on graph convolutions.

Hongyang Gao , Zhengyang Wang , Shuiwang Ji
Deep Distributed Fusion Network For Air Quality Prediction

This paper studies air pollution. The authors propose a deep neural network (DNN)-based approach (entitled DeepAir), which consists of a spatial transformation component and a deep distributed fusion network.

Xiuwen Yi , Junbo Zhang , Zhaoyuan Wang , Tianrui Li , Yu Zheng
Infrastructure Quality Assessment In Africa Using Satellite Imagery And Deep Learning

Monitoring infrastructure quality in developing regions remains prohibitively expensive and impedes efforts to measure progress toward these goals. To this end, the authors investigate the use of widely available remote sensing data for the prediction of infrastructure quality in Africa.

Barak Oshri , Annie Hu , Peter Adelson , Xiao Chen , Pascaline Dupas , Jeremy Weinstein , Marshall Burke , David Lobell , Stefano Ermon
Automated Local Regression Discontinuity Design Discovery

This paper studies the problem of inferring causal relationships in observational data. The authors develop the first statistical machine learning approach for automatically discovering regression discontinuity designs (RDDs), a quasi-experimental setup often used in econometrics.

William Herlands , Edward McFowland III , Andrew Wilson , Daniel Neill
Autotune: A Derivative-free Optimization Framework For Hyperparameter Tuning

This paper studies hyperparameter tuning. The authors present an automated parallel derivative-free optimization framework called Autotune.

Patrick Koch , Oleg Golovidov , Steven Gardner , Brett Wujek , Joshua Griffin , Yan Xu
Fairness Of Exposure In Rankings

This paper studies rankings. The authors propose a conceptual and computational framework that allows the formulation of fairness constraints on rankings in terms of exposure allocation.

Ashudeep Singh , Thorsten Joachims
Dynamic Recommendations For Sequential Hiring Decisions In Online Labor Markets

This paper studies online labor markets. The authors propose a framework for recommending contractors who are likely to get hired and successfully complete the task at hand.

Marios Kokkodis
Route Recommendations For Idle Taxi Drivers: Find Me The Shortest Route To A Customer!

This paper studies the problem of route recommendation to idle taxi drivers such that the distance between the taxi and an anticipated customer request is minimized. The authors develop a route recommendation engine called MDM: Minimizing Distance through Monte Carlo Tree Search. In contrast to existing techniques, MDM employs a continuous learning platform where the underlying model to predict future customer requests is dynamically updated.

Nandani Garg , Sayan Ranu
Product Characterisation Towards Personalisation: Learning Attributes From Unstructured Data To Reco

The authors describe a solution to tackle a common set of challenges in e-commerce, which arise from the fact that new products are continually being added to the catalogue.

Angelo Cardoso , Fabio Daolio
RAIM: Recurrent Attentive And Intensive Modeling Of Multimodal Continuous Patient Monitoring Data

This paper studies the problem of modelling medical data.

Yanbo Xu , Siddharth Biswal , Shriprasad Deshpande , Kevin Maher , Jimeng Sun
Deep Sequence Learning With Auxiliary Information For Traffic Prediction

This paper studies the problem of predicting traffic conditions from online route queries. The authors intend to improve traffic prediction by appropriate integration of three kinds of implicit but essential factors encoded in auxiliary information.

Binbing Liao , Jingqing Zhang , Chao Wu , Douglas McIlwraith , Tong Chen , Shengwen Yang , Yike Guo , Fei Wu
Multi-Label Inference For Crowdsourcing

This paper studies multi-class multi-label annotation. The authors propose a novel probabilistic method, which includes a multi-class multi-label dependency (MCMLD) model, to address this problem.

Jing Zhang , Xindong Wu
MIX: Multi-Channel Information Crossing For Text Matching

This paper studies short text matching. The authors present the design of Multi-Channel Information Crossing (MIX), a multi-channel convolutional neural network model for text matching, with additional attention mechanisms from sentence and text semantics.

Haolan Chen , Fred X. Han , Di Niu , Dong Liu , Kunfeng Lai , Chenglin Wu , Yu Xu
Multi-task Representation Learning For Travel Time Estimation

This paper studies travel time estimation in intelligent transportation systems: estimating the duration of a potential trip given the origin location, the destination location, and the departure time. The authors propose a MUlti-task Representation learning model for Arrival Time estimation (MURAT).

Yaguang Li , Kun Fu , Zheng Wang , Cyrus Shahabi , Jieping Ye , Yan Liu
Mobile Access Record Resolution On Large-scale Identifier-linkage Graphs

This paper studies the Mobile Access Records Resolution (MARR) problem. The authors propose a SParse Identifier-linkage Graph (SPI-Graph), accompanied by abundant mobile device profiling data, to accurately match mobile access records to devices.

Shen Xin , Weizhao Xian , Martin Ester , Hongxia Yang , Zhongyao Wang , Jiajun Bu , Can Wang
An Extensible Event Extraction System With Cross-Media Event Resolution

This paper studies the automatic extraction of breaking news events from natural language text. The authors describe a large-scale automated system for extracting natural disasters and critical events from both newswire text and social media.

Fabio Petroni , Natraj Raman , Timothy Nugent , Armineh Nourbakhsh , Zarko Panic , Sameena Shah , Jochen L. Leidner
Neural Memory Streaming Recommender Networks With Adversarial Training

This paper studies recommender systems with inputs of streaming data. The authors propose a streaming recommender model based on neural memory networks. An adaptive negative sampling framework based on Generative Adversarial Nets (GAN) is developed to optimize the proposed streaming recommender model.

Qinyong Wang , Hongzhi Yin , Zhiting Hu , Defu Lian , Hao Wang , Zi Huang
Buy It Again: Modeling Repeat Purchase Recommendations

Repeat purchasing, i.e., a customer purchasing the same product multiple times, is a common phenomenon in retail. The authors present the approach they developed for modeling repeat purchase recommendations.

Rahul Bhagat , Srevatsan Muralidharan , Alex Lobzhanidze , Shankar Vishwanath
Efficient Mining Of The Most Significant Patterns With Permutation Testing

This paper studies the extraction of patterns displaying significant association with a class label. The authors develop TopKWY, the first algorithm to mine the top-k significant patterns while rigorously controlling the family-wise error rate of the output, and provide theoretical evidence of its effectiveness.

Leonardo Pellegrina , Fabio Vandin
Identifying Sources And Sinks In The Presence Of Multiple Agents With Gaussian Process Vector Calcul

In systems of multiple agents, identifying the cause of observed agent dynamics is challenging. The authors present GP-LAPLACE, a technique for locating sources and sinks from trajectories in time-varying fields.

Adam Derek Cobb , Richard Everett , Andrew Markham , Stephen Roberts
Dual Memory Neural Computer For Asynchronous Two-view Sequential Learning

This paper studies the problem of capturing relations among views. The authors present a new memory augmented neural network that aims to model these complex interactions between two asynchronous sequential views.

Hung Le , Truyen Tran , Svetha Venkatesh
D2K: Scalable Community Detection In Massive Networks Via Small-Diameter K-Plexes

This paper studies k-plexes, a well-known pseudo-clique model for network communities. The paper's goal is to detect large communities in today’s real-world graphs. The authors present D2K, which is the first algorithm able to find large k-plexes of very large graphs in just a few minutes.

Alessio Conte , Tiziano De Matteis , Daniele De Sensi , Roberto Grossi , Andrea Marino , Luca Versari
Quantifying And Minimizing Risk Of Conflict In Social Networks

This paper studies controversy, disagreement, conflict, polarization, and opinion divergence in social networks.

Xi Chen , Jefrey Lijffijt , Tijl De Bie
Predicting Estimated Time Of Arrival For Commercial Flights

A major factor in increased airspace efficiency and capacity is accurate prediction of Estimated Time of Arrival (ETA) for commercial flights. The authors present a novel ETA Prediction System for commercial flights.

Samet Ayhan , Pablo Costas , Hanan Samet
Adaptive Paywall Mechanism For Digital News Media

Many online news agencies utilize the paywall mechanism to increase reader subscriptions. The authors propose an adaptive paywall mechanism to balance the benefit of showing an article against that of displaying the paywall (i.e., terminating the session).

Heidar Davoudi , Aijun An , Morteza Zihayat , Gordon Edall
Career Transitions And Trajectories: A Case Study In Computing

This paper studies long-term career paths of the people of AI.

Tara Safavi , Maryam Davoodi , Danai Koutra
Parsing To Programs: A Framework For Situated QA

This paper introduces Parsing to Programs, a framework that combines ideas from parsing and probabilistic programming for situated question answering. As a case study, the authors build a system that solves pre-university level Newtonian physics questions.

Mrinmaya Sachan , Eric P. Xing
Recommendations With Negative Feedback Via Pairwise Deep Reinforcement Learning

The authors propose a novel recommender system with the capability of continuously improving its strategies during its interactions with users.

Xiangyu Zhao , Liang Zhang , Zhuoye Ding , Long Xia , Jiliang Tang , Dawei Yin
EANN: Event Adversarial Neural Networks For Multi-Modal Fake News Detection

This paper deals with fake news in social media. The authors propose an end-to-end framework named Event Adversarial Neural Network (EANN), which can derive event-invariant features and thus benefit the detection of fake news on newly arrived events.

Yaqing Wang , Fenglong Ma , Zhiwei Jin , Ye Yuan , Guangxu Xun , Kishlay Jha , Lu Su , Jing Gao
How LinkedIn Economic Graph Bonds Information And Product: Applications In LinkedIn Salary

The LinkedIn Salary product was launched in late 2016 with the goal of providing insights on compensation distribution to job seekers. A key challenge is to reliably infer the insights at the company level when there is limited or no data at all. The authors propose a two-step framework that utilizes a novel, semantic representation of companies (Company2vec) and a Bayesian statistical model.

Xi Chen , Yiqun Liu , Liang Zhang , Krishnaram Kenthapadi
Explanation Mining: Post Hoc Interpretability Of Latent Factor Models For Recommendation Systems

This paper studies machine learning algorithms used to drive decision-making. The authors propose a novel approach for extracting explanations from latent factor recommendation systems by training association rules on the output of a matrix factorisation black-box model.

Georgina Peake , Jun Wang
PittGrub: A Frustration-Free System To Reduce Food Waste By Notifying Hungry College Students

This paper deals with food waste. The authors introduce PittGrub, a notification system to intelligently select users to invite to events that have leftover food.

Mark Silvis , Anthony Sicilia , Alexandros Labrinidis
Online Adaptive Asymmetric Active Learning For Budgeted Imbalanced Data

This paper investigates Online Active Learning (OAL) for an imbalanced unlabeled data stream, where only a budget of labels can be queried to optimize some cost-sensitive performance measure. The authors propose a novel Online Adaptive Asymmetric Active (OA3) learning algorithm, which is based on a new asymmetric strategy and second-order optimization.

Yifan Zhang , Peilin Zhao , Jiezhang Cao , Wenye Ma , Junzhou Huang , Qingyao Wu , Mingkui Tan
Enhancing Predictive Modeling Of Nested Spatial Data Through Group-Level Feature Disaggregation

This paper examines the relationship between multilevel modeling and multi-task learning. The authors present a comparative analysis of the two methods to illustrate their strengths and limitations when applied to two-level nested data.

Boyang Liu , Pang-Ning Tan , Jiayu Zhou
A Scalable Solution For Rule-Based Part-of-Speech Tagging On Novel Hardware Accelerators

This paper studies part-of-speech (POS) tagging. The authors leverage two hardware accelerators, the Automata Processor (AP) and Field Programmable Gate Arrays (FPGA), to accelerate rule-based POS tagging.

Elaheh Sadredini , Deyuan Guo , Chunkun Bo , Reza Rahimi , Hongning Wang , Kevin Skadron
Easing Embedding Learning By Comprehensive Transcription Of Heterogeneous Information Networks

The authors study the problem of comprehensive transcription of heterogeneous information networks (HINs). To cope with the challenges in the comprehensive transcription of HINs, they propose the HEER algorithm, which embeds HINs via edge representations that are further coupled with properly-learned heterogeneous metrics.

Yu Shi , Qi Zhu , Fang Guo , Chao Zhang , Jiawei Han
On Discrimination Discovery And Removal In Ranked Data Using Causal Graph

The authors study the fairness-aware ranking problem, which aims to discover discrimination in ranked datasets and reconstruct a fair ranking. They propose to map the rank position to a continuous score variable that represents the qualification of the candidates.

Yongkai Wu , Lu Zhang , Xintao Wu
Interpretable Representation Learning For Healthcare Via Capturing Disease Progression Through Time

This paper studies predictive modeling of Electronic Health Records (EHR). The authors propose a novel interpretable deep learning model, called Timeline.

Tian Bai , Shanshan Zhang , Brian Egleston , Slobodan Vucetic
Learning Structural Node Embeddings Via Diffusion Wavelets

This paper studies the problem of learning structural representations of nodes. The authors develop GraphWave, a method that represents each node’s network neighborhood via a low-dimensional embedding by leveraging heat wavelet diffusion patterns.

Claire Donnat , Marinka Zitnik , David Hallac , Jure Leskovec
New Incremental Learning Algorithm For Semi-Supervised Support Vector Machine

This paper studies Semi-Supervised Support Vector Machine (S3VM). The authors propose a new incremental learning algorithm to scale up S3VM (IL-S3VM) based on the path following technique in the framework of Difference of Convex (DC) programming.

Bin Gu , Xiao-Tong Yuan , Songcan Chen , Heng Huang
Inferring Metapopulation Propagation Network For Intra-city Epidemic Control And Prevention

The authors argue that intra-city epidemic propagation should be modeled on a metapopulation basis, and propose a two-step method for this purpose.

Jingyuan Wang , Xiaojian Wang , Junjie Wu
Billion-scale Commodity Embedding For E-commerce Recommendation In Alibaba

This paper studies recommender systems. The methods are based on a well-known graph embedding framework.

Jizhe Wang , Pipei Huang , Huan Zhao , Zhibo Zhang , Binqiang Zhao , Dik Lun Lee
Multi-Task Learning With Neural Networks For Voice Query Understanding On An Entertainment Platform

This paper tackles the challenge of understanding voice queries posed against the Comcast Xfinity X1 entertainment platform. The authors present a novel multi-task neural architecture that jointly learns to accomplish all three tasks.

Jinfeng Rao , Ferhan Ture , Jimmy Lin
Shield: Fast, Practical Defense And Vaccination For Deep Learning Using JPEG Compression

Deep neural networks (DNNs) are highly vulnerable to adversarially generated images. The authors place JPEG compression at the core of their proposed SHIELD defense framework, utilizing its capability to effectively “compress away” such pixel manipulation.

Nilaksh Das , Madhuri Shanbhogue , Shang-Tse Chen , Fred Hohman , Siwei Li , Li Chen , Michael E. Kounavis , Duen Horng Chau
Safe Triplet Screening For Distance Metric Learning

The authors study safe screening for metric learning.

Tomoki Yoshida , Ichiro Takeuchi , Masayuki Karasuyama
PME: Projected Metric Embedding On Heterogeneous Networks For Link Prediction

This paper studies heterogeneous information network embedding. To alleviate the potential geometrical inflexibility of existing metric learning approaches, the authors propose to build object and relation embeddings in separate object and relation spaces rather than in a common space.

Hongxu Chen , Hongzhi Yin , Weiqing Wang , Hao Wang , Quoc Viet Hung Nguyen , Xue Li
COTA: Improving The Speed And Accuracy Of Customer Support Through Ranking And Deep Networks

This paper studies customer issues. It proposes COTA, a system to improve the speed and reliability of customer support for end users through automated ticket classification and answer selection for support representatives.

Piero Molino , Huaixiu Zheng , Yi-Chia Wang
Exploring Student Check-In Behavior For Improved Point-of-Interest Prediction

This paper studies the problem of Point-of-Interest (POI) prediction. The authors propose a heterogeneous graph-based method to encode the correlations between users, POIs, and activities, and then jointly learn embeddings for the vertices.

Mengyue Hang , Ian Pytlarz , Jennifer Neville
Variable Selection And Task Grouping For Multi-Task Learning

The authors consider multi-task learning, which simultaneously learns related prediction tasks, to improve generalization performance.

Junyong Jeong , Chi-Hyuck Jun
Scalable Query N-Gram Embedding For Improving Matching And Relevance In Sponsored Search

The authors propose a novel embedding of queries and ads in sponsored search.

Xiao Bai , Erik Ordentlich , Yuanyuan Zhang , Andy Feng , Adwait Ratnaparkhi , Reena Somvanshi , Aldi Tjahjadi
Quantifying Uncertainty In Discrete-Continuous And Skewed Data With Bayesian Deep Learning

This paper studies deep learning (DL) methods. The authors present a discrete-continuous Bayesian deep learning (BDL) model with Gaussian and lognormal likelihoods for uncertainty quantification (UQ).

Thomas Vandal , Evan Kodra , Jennifer Dy , Sangram Ganguly , Ramakrishna Nemani , Auroop Ganguly
DILOF: Effective And Memory Efficient Local Outlier Detection In Data Streams

This paper studies the outlier detection algorithm called Local Outlier Factor (LOF). The authors propose a new outlier detection algorithm for data streams, called DILOF, that effectively overcomes the limitations of LOF.

Gyoung S. Na , Donghyun Kim , Hwanjo Yu
Transfer Learning Via Feature Isomorphism Discovery

The authors develop a novel transfer learning framework called Transfer Learning via Feature Isomorphism Discovery (abbreviated to TLFid), which has high tolerance for noisy correspondence between domains as well as for scarce or non-existent labeled instances.

Shimin Di , Jingshu Peng , Yanyan Shen , Lei Chen
Visual Search At Alibaba

This paper introduces the large-scale visual search algorithm and system infrastructure at Alibaba. Among the challenges it addresses is how to deal with large-scale indexing for massively updating data.

Yanhao Zhang , Pan Pan , Yun Zheng , Kang Zhao , Yingya Zhang , Xiaofeng Ren , Rong Jin
Learning To Estimate The Travel Time

Vehicle travel time estimation or estimated time of arrival (ETA) is one of the most important location-based services (LBS). This paper presents a novel machine learning solution to predict the vehicle travel time based on floating-car data. The authors evaluate the solution offline on millions of historical vehicle travel records.

Zheng Wang , Kun Fu , Jieping Ye
Distributed Collaborative Hashing And Its Applications In Ant Financial

This paper studies collaborative filtering. The authors propose a Distributed Collaborative Hashing (DCH) model.

Chaochao Chen , Ziqi Liu , Peilin Zhao , Longfei Li , Jun Zhou , Xiaolong Li
Robust Bayesian Kernel Machine Via Stein Variational Gradient Descent For Big Data

Most kernel methods, including the state-of-the-art LIBSVM, are vulnerable to the curse of kernelization. The authors propose a robust Bayesian Kernel Machine (BKM), a Bayesian kernel machine that exploits the strengths of both Bayesian modelling and kernel methods.

Khanh Nguyen , Trung Le , Tu Dinh Nguyen , Dinh Phung , Geoffrey Webb
Accurate And Fast Asymmetric Locality-Sensitive Hashing Scheme For Maximum Inner Product Search

This paper studies the problem of Approximate Maximum Inner Product (AMIP) search. The authors propose a novel Asymmetric LSH scheme based on Homocentric Hypersphere partition (H2-ALSH) for high-dimensional AMIP search.

Qiang Huang , Guihong Ma , Jianlin Feng , Qiong Fang , Anthony K. H. Tung
Content To Node: Self-translation Network Embedding

This paper concerns the problem of network embedding (NE). The authors propose a novel NE framework based on a sequence-to-sequence model, referred to as the Self-Translation Network Embedding (STNE) model.

Jie Liu , Zhicheng He , Lai Wei , Yalou Huang
When Sentiment Analysis Meets Social Network: A Holistic User Behavior Modeling In Opinionated Data

This paper studies user modeling.

Lin Gong , Hongning Wang
Discovering Latent Patterns Of Urban Cultural Interactions In WeChat For Modern City Planning

This paper studies the optimal allocation of cultural establishments and related resources across urban regions. The authors make use of a large longitudinal dataset of user location check-ins from the online social network WeChat to develop a data-driven framework for cultural planning in the city of Beijing.

Xiao Zhou , Anastasios Noulas , Cecilia Mascolo , Zhongxiang Zhao
Self-Paced Network Embedding

The authors propose a novel self-paced network embedding method.

Hongchang Gao , Heng Huang
Semi-Supervised Generative Adversarial Network For Gene Expression Inference

This paper studies gene expression. To take advantage of cheap unlabeled data, the authors propose a novel semi-supervised deep generative model for target gene expression inference.

Kamran Ghasedi , Xiaoqian Wang , Heng Huang
Lessons Learned From Developing And Deploying A Large-Scale Employer Name Normalization System For O

This paper studies employer name normalization, or linking employer names in job postings or resumes to entities in an employer knowledge base (KB). The authors describe the CompanyDepot system developed at CareerBuilder.

Qiaoling Liu , Josh Chao , Thomas Mahoney , Chris Min , Faizan Javed , Alan Chern , Valentin Jijkoun
Multimodal Sentiment Analysis To Explore The Structure Of Emotions

The authors propose a novel approach to multimodal sentiment analysis using deep neural networks combining visual analysis and natural language processing.

Anthony Hu , Seth Flaxman
Are Your Data Gathered?

This paper studies the problem of understanding data distributions. The authors propose the folding test of unimodality.

Alban Siffer , Pierre-Alain Fouque , Alexandre Termier , Christine Largou
Improving Box Office Result Predictions For Movies Using Consumer-Centric Models

This paper studies movie box office revenue. The authors use individual recommendations and user-based forecast models in a system that forecasts revenue and additionally provides actionable insights for industry professionals.

Rui Paulo Ruhrlander, Martin Boissier, Matthias Uflacker
On The Generative Discovery Of Structured Medical Knowledge

The authors introduce a generative perspective to study the relational medical entity pair discovery problem. A generative model named Conditional Relationship Variational Autoencoder is proposed to discover meaningful and novel medical entity pairs by purely learning from the expression diversity in the existing relational medical entity pairs.

Chenwei Zhang , Yaliang Li , Nan Du , Wei Fan , Philip S. Yu
Dynamic Pricing Under Competition On Online Marketplaces: A Data-Driven Approach

The authors analyze stochastic dynamic pricing models in competitive markets with multiple offer dimensions, such as price, quality, and rating.

Rainer Schlosser , Martin Boissier
Pangloss: Fast Entity Linking In Noisy Text Environments

This paper studies entity linking and presents Pangloss, a production system for entity disambiguation on noisy text.

Michael Conover , Scott Blackburn , Matthew Hayes , Pete Skomoroch , Sam Shah
Towards Knowledge Discovery From The Vatican Secret Archives. In Codice Ratio

In Codice Ratio is a research project to study tools for analyzing the contents of historical documents conserved in the Vatican Secret Archives (VSA). In this paper, the authors propose to develop a system to support the transcription of medieval manuscripts.

Donatella Firmani , Marco Maiorino , Paolo Merialdo
Active Opinion Maximization In Social Networks

The paper studies influence maximization (IM). The authors consider a problem called AcTive Opinion Maximization (ATOM), where the goal is to find a set of seed users to maximize the overall opinion spread toward a target product in a multi-round campaign.

Xinyue Liu , Xiangnan Kong , Philip Yu
Hetero-ConvLSTM: A Deep Learning Approach To Traffic Accident Prediction On Heterogeneous Spatio-Tem

This paper studies traffic accident prediction. The authors perform a comprehensive study of the problem using the Convolutional Long Short-Term Memory (ConvLSTM) neural network model.

Zhuoning Yuan , Xun Zhou , Tianbao Yang
Classifying And Counting With Recurrent Contexts

Many real-world applications in the batch and data stream settings with data shift restrict access to class labels. The authors explore a different set of assumptions that do not rely on the availability of class labels.

Denis Dos Reis
Butterfly Counting In Bipartite Networks

This paper considers the problem of counting motifs in bipartite affiliation networks. The main contribution is a suite of randomized algorithms that can quickly approximate the number of butterflies in a graph with a provable guarantee on accuracy.

Seyed-Vahid Sanei-Mehri , Ahmet Erdem Sariyuce , Srikanta Tirthapura
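For readers unfamiliar with the motif: a butterfly is a 2x2 biclique, that is, two vertices on one side of a bipartite graph that share two neighbors on the other side. The sketch below is an exact brute-force counter, included only to make the counted quantity concrete. It is not one of the randomized estimators the paper proposes, and the function name `count_butterflies` and the adjacency-dict input format are assumptions made for illustration.

```python
from itertools import combinations


def count_butterflies(adj):
    """Exactly count butterflies (2x2 bicliques) in a bipartite graph.

    adj maps each left-side vertex to the set of its right-side neighbors
    (a hypothetical input format chosen for this sketch).
    """
    total = 0
    # Every pair of left vertices with c common neighbors forms C(c, 2) butterflies.
    for u, v in combinations(adj, 2):
        common = len(adj[u] & adj[v])
        total += common * (common - 1) // 2
    return total


example = {"a": {1, 2, 3}, "b": {1, 2, 3}, "c": {3}}
print(count_butterflies(example))
```

The pairwise scan is quadratic in the number of left-side vertices, which is the kind of cost that motivates approximate counting on large graphs.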
Automated Audience Segmentation Using Reputation Signals

Selecting the right audience for an advertising campaign is one of the most challenging, time-consuming and costly steps in the advertising process. In this paper the authors study how demand-side platforms (DSPs) can leverage the data they collect (demographic and behavioral) in order to learn reputation signals about end user convertibility and advertisement (ad) quality.

Maria Daltayanni
SPARC: Self-Paced Network Representation For Few-Shot Rare Category Characterization

This paper studies rare category characterization.

Dawei Zhou , Jingrui He , Hongxia Yang , Wei Fan
RapidScorer: Fast Tree Ensemble Evaluation By Maximizing Compactness In Data Level Parallelization

This paper studies relevance ranking models based on additive ensembles of regression trees. The authors present RapidScorer, a novel framework for speeding up the scoring process of industry-scale tree ensemble models.

Ting Ye , Hucheng Zhou , Will Zou , Bin Gao , Ruofei Zhang
SQR: Balancing Speed, Quality And Risk In Online Experiments

This paper studies controlled experimentation, also called A/B testing. The authors build a ramping framework that effectively balances Speed, Quality and Risk (SQR).

Ya Xu , Weitao Duan , Shaochen Huang
Tax Fraud Detection For Under-reporting Declarations Using An Unsupervised Machine Learning Approach

The authors present a novel approach for detecting potentially fraudulent taxpayers using only unsupervised learning techniques, while allowing the future use of supervised learning techniques.

Daniel de Roux , Boris P
Demand-Aware Charger Planning For Electric Vehicle Sharing

The authors formulate the Electric Vehicle Charger Planning (EVCP) problem especially for EV-sharing.

Bowen Du , Yongxin Tong , Zimu Zhou , Qian Tao , Wenjun Zhou
Active Search Of Connections For Case Building And Combating Human Trafficking

The authors formulate a problem called Active Search of Connections, which finds target entities that share evidence of different types with a given lead. They present RedThread, an efficient solution for inferring related and relevant nodes while incorporating the user’s feedback to guide the inference.

Reihaneh Rabbany , David Bayani , Artur Dubrawski
Reinforcement Learning To Rank In E-Commerce Search Engine: Formalization, Analysis, And Application

Learning to rank (LTR) methods have been widely applied to ranking problems. The authors propose to use reinforcement learning (RL) to learn an optimal ranking policy which maximizes the expected accumulative rewards in a search session.

Yujing Hu , Qing Da , Anxiang Zeng , Yang Yu , Yinghui Xu
Stabilizing Reinforcement Learning In Dynamic Environment With Application To Online Recommendation

Traditional reinforcement learning approaches are designed to work in static environments. The authors propose two techniques to alleviate the unstable reward estimation problem in dynamic environments: the stratified sampling replay strategy and the approximate regretted reward.

Shi-Yong Chen , Yang Yu , Qing Da , Jun Tan , Hai-Kuan Huang , Hai-Hong Tang
Managing Computer-Assisted Detection System Based On Transfer Learning With Negative Transfer Inhibi

A computer-assisted detection (CAD) system based on machine learning is expected to assist radiologists. In this paper, the authors focus on transfer learning without sharing training data, due to the need to protect personal information in each institution.

Issei Sato , Yukihiro Nomura , Shouhei Hanaoka , Soichiro Miki , Naoto Hayashi , Osamu Abe , Yoshitaka Masutani
Hyperparameter Importance Across Datasets

This paper studies automated hyperparameter optimization methods. The authors aim to answer the following two questions: Given an algorithm, what are generally its most important hyperparameters, and what are typically good values for these?

Jan N. van Rijn , Frank Hutter
Assessing Candidate Preference Through Web Browsing History

The authors propose the use of Web browsing history as a new indicator of candidate preference among the electorate, one that has potential to overcome a number of the drawbacks of election polls.

Giovanni Comarela
Active Deep Learning To Tune Down The Noise In Labels

The authors present a new Active Deep Denoising (ADD) approach that first builds a DNN noise model, and then adopts an active learning algorithm to identify the optimal denoising function.

Karan Samel , Xu Miao
Gotcha - Sly Malware! Scorpion: A Metagraph2vec Based Malware Detection System

The authors propose metagraph2vec, a new heterogeneous information network (HIN) embedding model, to learn low-dimensional representations of the nodes in a HIN such that both the HIN structures and semantics are maximally preserved for malware detection.

Yujie Fan , Shifu Hou , Yiming Zhang , Yanfang Ye , Melih Abdulhayoglu
Ranking and Making Recommendations
Paper Name Author(s)
Q&R: A Two-Stage Approach Toward Interactive Recommendation

This paper studies recommendation systems. The authors detail an RNN-based model for generating topics a user might be interested in, and then extend a state-of-the-art RNN-based video recommender to incorporate the user’s selected topic.

Konstantina Christakopoulou , Alex Beutel , Rui Li , Sagar Jain , Ed H. Chi
Graph Convolutional Neural Networks For Web-Scale Recommender Systems

The authors develop a data-efficient Graph Convolutional Network (GCN) algorithm, which combines efficient random walks and graph convolutions to generate embeddings of nodes (i.e., items) that incorporate both graph structure and node feature information.

Rex Ying , Ruining He , Kaifeng Chen , Pong Eksombatchai , William L. Hamilton , Jure Leskovec
Real-time Personalization Using Embeddings For Search Ranking At Airbnb

This paper studies search ranking and recommendations. The authors describe the listing and user embedding techniques they developed and deployed for real-time personalization in search ranking and similar listing recommendations.

Mihajlo Grbovic , Haibin Cheng
Online Parameter Selection For Web-based Ranking Problems

This paper studies web-based ranking problems. The authors consider a large-scale online application where metrics for multiple objectives are continuously available and can be controlled in a desired fashion by changing certain control parameters in the ranking model.

Deepak Agarwal , Kinjal Basu , Souvik Ghosh , Ying Xuan , Yang Yang , Liang Zhang
Near Real-time Optimization Of Activity-based Notifications

This paper studies the problem of notifications on social media.

Yan Gao , Viral Gupta , Jinyun Yan , Changji Shi , Zhongen Tao , Pj Xiao , Curtis Wang , Shipeng Yu , Romer Rosales , Ajith Muralidharan , Shaunak Chatterjee
Recommenders I
Paper Name Author(s)
Ranking Distillation: Learning Compact Ranking Models With High Performance For Recommender System

The authors propose a novel way to train ranking models, such as recommender systems, that are both effective and efficient. They propose a knowledge distillation (KD) technique for learning-to-rank problems, called ranking distillation (RD).

Jiaxi Tang , Ke Wang
Efficient Attribute Recommendation With Probabilistic Guarantee

The authors study how to efficiently solve a primitive data exploration problem: Given two ad-hoc predicates which define two subsets of a relational table, find the top-K attributes whose distributions in the two subsets deviate most from each other. The authors develop an adaptive querying solution with probabilistic guarantee of correctness and near-optimal sample complexity.

Chi Wang , Kaushik Chakrabarti
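To make the problem statement above concrete, here is a minimal full-scan sketch of "find the top-K attributes whose distributions deviate most between two subsets". It is not the paper's adaptive querying solution: it reads every row instead of sampling, and it uses total variation distance as a stand-in deviation measure, which is an assumption. The names `total_variation` and `top_k_attributes` and the toy table are hypothetical.

```python
from collections import Counter


def total_variation(values_a, values_b):
    """Total variation distance between two empirical distributions."""
    count_a, count_b = Counter(values_a), Counter(values_b)
    n_a, n_b = len(values_a), len(values_b)
    support = set(count_a) | set(count_b)
    return 0.5 * sum(abs(count_a[v] / n_a - count_b[v] / n_b) for v in support)


def top_k_attributes(rows, pred_a, pred_b, k):
    """Rank attributes by how much their distributions differ between two subsets."""
    subset_a = [r for r in rows if pred_a(r)]
    subset_b = [r for r in rows if pred_b(r)]
    if not subset_a or not subset_b:
        return []  # a deviation is undefined when either subset is empty
    scores = {
        attr: total_variation([r[attr] for r in subset_a],
                              [r[attr] for r in subset_b])
        for attr in rows[0]
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy relational table (illustrative data only).
rows = [
    {"region": "EU", "plan": "free", "device": "mobile"},
    {"region": "EU", "plan": "free", "device": "desktop"},
    {"region": "US", "plan": "paid", "device": "mobile"},
    {"region": "US", "plan": "free", "device": "desktop"},
]
result = top_k_attributes(
    rows,
    lambda r: r["region"] == "EU",
    lambda r: r["region"] == "US",
    2,
)
print(result)
```

A full scan like this costs time linear in the table size for every pair of predicates, which is the cost a sampling-based approach with probabilistic guarantees is meant to avoid.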
Algorithms For Hiring And Outsourcing In The Online Labor Market

In this paper, the authors provide algorithms for outsourcing and hiring workers in a general setting, where workers form a team and contribute different skills to perform a task. They call this model team formation with outsourcing.

Aris Anagnostopoulos , Carlos Castillo , Adriano Fazzone , Stefano Leonardi , Evimaria Terzi
Multi-Pointer Co-Attention Networks For Recommendation

Many recent state-of-the-art recommender systems such as D-ATT, TransNet and DeepCoNN exploit reviews for representation learning. This paper proposes a new neural architecture for recommendation with reviews.

Yi Tay , Anh Tuan Luu , Siu Cheung Hui
Leveraging Meta-path Based Context For Top N Recommendation With Co-attention Mechanism

This paper studies heterogeneous information networks (HIN). To construct the meta-path based context, the authors propose to use a priority based sampling technique to select high-quality path instances.

Binbin Hu , Chuan Shi , Xin Zhao , Philip S. Yu
Recommenders II
Paper Name Author(s)
XDeepFM: Combining Explicit And Implicit Feature Interactions For Recommender Systems

This paper studies combinatorial features. The authors propose a novel Compressed Interaction Network (CIN), which aims to generate feature interactions in an explicit fashion and at the vector-wise level.

Jianxun Lian , Xiaohuan Zhou , Fuzheng Zhang , Zhongxia Chen , Xing Xie , Guangzhong Sun
STAMP: Short-Term Attention/Memory Priority Model For Session-based Recommendation

This paper studies predicting users’ actions based on anonymous sessions. The authors propose a novel short-term attention/memory priority model that captures users’ general interests from the long-term memory of a session context.

Qiao Liu , Yifu Zeng , Refuoe Mokhosi , Haibin Zhang
Local Latent Space Models For Top-N Recommendation

This paper studies users’ behaviors. The authors consider models in which some latent factors capture the shared aspects and some user-subset-specific latent factors capture the aspects that the different subsets of users care about.

Evangelia Christakopoulou , George Karypis
Multi-User Mobile Sequential Recommendation: An Efficient Parallel Computing Paradigm

This paper studies the classic mobile sequential recommendation (MSR) problem. The authors formalize a new multi-user MSR (MMSR) problem that locates optimal routes for a group of drivers with different starting positions.

Zeyang Ye , Lihao Zhang , Keli Xiao , Wenjun Zhou , Yong Ge , Yuefan Deng
Trajectory-driven Influential Billboard Placement

In this paper the authors propose and study the problem of trajectory-driven influential billboard placement.

Ping Zhang , Zhifeng Bao , Yuchen Li , Guoliang Li , Yipeng Zhang , Zhiyong Peng
Offline Evaluation Of Ranking Policies With Click Models

Many web systems rank and present a list of items to users. An important problem in practice is to evaluate new ranking policies offline and optimize them before they are deployed. The authors propose evaluation algorithms for estimating the expected number of clicks on ranked lists from historical logged data.

Shuai Li , Yasin Abbasi-Yadkori , Branislav Kveton , S. Muthukrishnan , Vishwa Vinay , Zheng Wen
Reinforcement Learning
Paper Name Author(s)
IntelliLight: A Reinforcement Learning Approach For Intelligent Traffic Light Control

This paper studies intelligent traffic light control. The authors propose a more effective deep reinforcement learning model for traffic light control.

Hua Wei , Guanjie Zheng , Huaxiu Yao , Zhenhui Li
Transcribing Content From Structural Images With Spotlight Mechanism

This paper studies the problem of transcribing content from structural images. The authors propose a hierarchical Spotlight Transcribing Network (STN) framework followed by a two-stage “where-to-what” solution. The authors also design a reinforcement method to refine their STN framework by self-improving the spotlight mechanism.

Yu Yin , Zhenya Huang , Enhong Chen , Qi Liu , Fuzheng Zhang , Xing Xie , Guoping Hu
Investor-Imitator: A Framework For Trading Knowledge Extraction

This paper studies the analysis of stock trading. The authors propose a reinforcement learning driven Investor-Imitator framework to formalize trading knowledge by imitating an investor’s behavior with a set of logic descriptors.

Yi Ding , Weiqing Liu , Jiang Bian , Daoqiang Zhang , Tie-Yan Liu
Efficient Large-Scale Fleet Management Via Multi-Agent Deep Reinforcement Learning

This paper studies large-scale fleet management. The authors tackle the problem with reinforcement learning and propose a contextual multi-agent reinforcement learning framework with two concrete algorithms, contextual deep Q-learning and contextual multi-agent actor-critic, to achieve explicit coordination among a large number of agents adaptive to different contexts.

Kaixiang Lin , Renyu Zhao , Zhe Xu , Jiayu Zhou
Supervised Reinforcement Learning With Recurrent Neural Network For Dynamic Treatment Recommendation

This paper studies dynamic treatment recommendation systems based on large-scale electronic health records (EHRs). The authors propose Supervised Reinforcement Learning with Recurrent Neural Network (SRL-RNN), which fuses supervised learning and reinforcement learning into a synergistic learning framework.

Lu Wang , Wei Zhang , Xiaofeng He , Hongyuan Zha
Representation and Embedding I
Paper Name Author(s)
Multi-label Learning With Highly Incomplete Data Via Collaborative Embedding

This paper studies improving the effectiveness of multi-label learning with incomplete label assignments. The authors propose a weakly supervised multi-label learning approach, based on the idea of collaborative embedding.

Yufei Han , Yun Shen , Guolei Sun , Xiangliang Zhang
Concepts-Bridges: Uncovering Conceptual Bridges Based On Biomedical Concept Evolution

Given two topics of interest (A, C) that are otherwise disconnected, for instance a disease (“Migraine”) and a therapeutic substance (“Magnesium”), this paper attempts to find the conceptual bridges (e.g., serotonin (B)) that connect them in a novel way. The authors define this problem as mining time-aware Top-k conceptual bridges, and in doing so provide a systematic approach to formalize the problem.

Kishlay Jha , Guangxu Xun , Yaqing Wang , Vishrawas Gopalakrishnan , Aidong Zhang
Interactive Paths Embedding For Semantic Proximity Search On Heterogeneous Graphs

This paper studies semantic proximity search on heterogeneous graphs. The authors introduce a novel concept of interactive paths to model the inter-dependency among multiple paths between a query object and a target object. They then propose an Interactive Paths Embedding (IPE) model, which learns low-dimensional representations for the resulting interactive-paths structures for proximity estimation.

Zemin Liu , Vincent W. Zheng , Zhou Zhao , Zhao Li , Hongxia Yang , Minghui Wu , Jing Ying
Multi-Type Itemset Embedding For Learning Behavior Success

Contextual behavior modeling uses data from multiple contexts to discover patterns for predictive analysis. The authors formulate a behavior as a set of context items of different types (such as decision makers, operators, goals and resources), consider an observable itemset as a behavior success, and propose a novel scalable method, “multi-type itemset embedding”, to learn the context items’ representations preserving the success structures.

Daheng Wang , Meng Jiang , Qingkai Zeng , Zachary Eberhart , Nitesh Chawla
Learning Representations Of Ultrahigh-dimensional Data For Random Distance-based Outlier Detection

This paper studies the problem of learning expressive low-dimensional representations of ultrahigh-dimensional data.

Guansong Pang , Longbing Cao , Ling Chen , Huan Liu
Representation and Embedding II
Paper Name Author(s)
Deep Recursive Network Embedding With Regular Equivalence

This paper studies network embedding. The authors propose a new approach named Deep Recursive Network Embedding (DRNE) to learn network embeddings with regular equivalence.

Ke Tu , Peng Cui , Xiao Wang , Philip S. Yu , Wenwu Zhu
Arbitrary-Order Proximity Preserved Network Embedding

This paper studies network embedding. The authors propose AROPE (arbitrary-order proximity preserved embedding), a novel network embedding method based on the SVD framework.

Ziwei Zhang , Peng Cui , Xiao Wang , Jian Pei , Xuanrong Yao , Wenwu Zhu
NetWalk: A Flexible Deep Embedding Approach For Anomaly Detection In Dynamic Networks

This paper studies massive and dynamic networks. The authors propose a novel approach, NetWalk, for anomaly detection in dynamic networks by learning network representations which can be updated dynamically as the network evolves.

Wenchao Yu , Wei Cheng , Charu Aggarwal , Kai Zhang , Haifeng Chen , Wei Wang
Embedding Temporal Network Via Neighborhood Formation

This paper studies network embedding. The authors introduce the concept of neighborhood formation sequence to describe the evolution of a node, where temporal excitation effects exist between neighbors in the sequence, and propose a Hawkes process based Temporal Network Embedding (HTNE) method.

Yuan Zuo , Guannan Liu , Hao Lin , Jia Guo , Xiaoqian Hu , Junjie Wu
Finding Similar Exercises With A Unified Semantic Representation

This paper studies the problem of finding similar exercises in online education systems. The authors develop a novel Multimodal Attention-based Neural Network (MANN) framework for finding similar exercises in large-scale online education systems by learning a unified semantic representation from the heterogeneous data.

Qi Liu , Zai Huang , Zhenya Huang , Chuanren Liu , Enhong Chen , Yu Su , Guoping Hu
Hierarchical Taxonomy Aware Network Embedding

This paper studies network embedding. Incorporating the hierarchical taxonomy into network embedding poses a great challenge. The authors propose NetHiex, a NETwork embedding model that captures the latent HIErarchical taXonomy. The whole model is unified within a nonparametric probabilistic framework.

Jianxin Ma , Peng Cui , Xiao Wang , Wenwu Zhu
Paper Name Author(s)
A Dynamic Pipeline For Spatio-Temporal Fire Risk Prediction

Recent high-profile fire incidents in cities around the world have highlighted gaps in fire risk reduction efforts. The authors developed a predictive risk framework for all 20,636 commercial properties in Pittsburgh, based on time-varying data from a variety of municipal agencies.

Bhavkaran Singh Walia , Qianyi Hu , Jeffrey Chen , Fangyan Chen , Jessica Lee , Nathan Kuo , Palak Narang , Jason Batts , Geoffrey Arnold , Michael Madaio
Detecting Spacecraft Anomalies Using LSTMs And Nonparametric Dynamic Thresholding

This paper studies spacecraft monitoring systems. The authors demonstrate the effectiveness of Long Short-Term Memory (LSTM) networks, a type of Recurrent Neural Network (RNN), using expert-labeled telemetry anomaly data from the Soil Moisture Active Passive (SMAP) satellite and the Mars Science Laboratory (MSL) rover, Curiosity.

Kyle Hundman , Valentino Constantinou , Christopher Laporte , Ian Colwell , Tom Soderstrom
ActiveRemediation: The Search For Lead Pipes In Flint, Michigan

The authors detail their ongoing work in Flint, Michigan to detect pipes made of lead and other hazardous metals.

Jacob Abernethy , Alex Chojnacki , Arya Farahi , Eric Schwartz , Jared Webb
Explaining Aviation Safety Incidents Using Deep Temporal Multiple Instance Learning

This paper deals with aviation safety. The authors propose a precursor mining algorithm that identifies events in the multidimensional time series that are correlated with the safety incident.

Vijay Manikandan Janakiraman
Semi-supervised and Transfer Learning
Paper Name Author(s)
Learning Dynamics Of Decision Boundaries Without Additional Labeled Data

The authors propose a method for learning the dynamics of the decision boundary to maintain classification performance without additional labeled data.

Atsutoshi Kumagai , Tomoharu Iwata
Towards Mitigating The Class-Imbalance Problem For Partial Label Learning

This paper studies partial label (PL) learning. To mitigate the negative influence of class imbalance on partial label learning, a novel class-imbalance aware approach named CIMAP is proposed by adapting over-sampling techniques for handling PL training examples.

Jing Wang , Min-Ling Zhang
Learning Adversarial Networks For Semi-Supervised Text Classification Via Policy Gradient

The paper studies semi-supervised learning. The authors reformulate semi-supervised learning as a model-based reinforcement learning problem and propose a framework based on adversarial networks. They also propose a recurrent neural network based model for semi-supervised text classification.

Yan Li , Jieping Ye
Scalable Active Learning By Approximated Error Reduction

This paper studies the problem of active learning for multi-class classification on large-scale datasets. This paper proposes a novel query selection criterion called approximated error reduction (AER).

Weijie Fu , Meng Wang , Shijie Hao , Xindong Wu
Multi-view Adversarially Learned Inference For Cross-domain Joint Distribution Matching

Many important data mining problems can be modeled as learning a (bidirectional) multidimensional mapping between two data domains. The authors propose a multi-view adversarially learned inference (ALI) model, termed MALI.

Changying Du , Changde Du , Xingyu Xie , Chen Zhang , Hao Wang
Supervised Learning I
Paper Name Author(s)
Feedback-Guided Anomaly Discovery Via Online Optimization

This paper is about anomaly detectors. The authors study how to reduce the analyst’s effort by incorporating their feedback about whether the anomalies they investigate are of interest or not.

Md Amran Siddiqui , Alan Fern , Thomas Dietterich , Ryan Wright , Alec Theriault , David Archer
Modeling Task Relationships In Multi-task Learning With Multi-gate Mixture-of-Experts

In this work, the authors propose a novel multi-task learning approach, Multi-gate Mixture-of-Experts (MMoE), which explicitly learns to model task relationships from data.

Jiaqi Ma , Zhe Zhao , Xinyang Yi , Jilin Chen , Lichan Hong , Ed Chi
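As a rough illustration of the multi-gate idea named in the title above: all tasks share one pool of experts, and each task has its own softmax gate that weights the expert outputs. The sketch below is a toy forward pass under simplifying assumptions, not the authors' implementation: each expert is a scalar linear map, the per-task towers that would normally follow the mixture are omitted, and the names `mmoe_forward`, `expert_weights`, and `gate_weights` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def mmoe_forward(x, expert_weights, gate_weights):
    """Return one gated mixture of the shared expert outputs per task.

    expert_weights: one weight vector per expert (scalar linear experts here).
    gate_weights: for each task, one weight vector per expert.
    """
    expert_outputs = [dot(w, x) for w in expert_weights]
    task_outputs = []
    for task_gate in gate_weights:
        # Each task computes its own softmax distribution over the experts.
        gate = softmax([dot(w, x) for w in task_gate])
        task_outputs.append(sum(g * e for g, e in zip(gate, expert_outputs)))
    return task_outputs


x = [1.0, 2.0]
experts = [[1.0, 0.0], [0.0, 1.0]]      # expert outputs: 1.0 and 2.0
gates = [
    [[0.0, 0.0], [0.0, 0.0]],           # task 0: uniform gate over experts
    [[10.0, 0.0], [-10.0, 0.0]],        # task 1: gate strongly favors expert 0
]
outputs = mmoe_forward(x, experts, gates)
print(outputs)
```

Because the gates are separate per task, two tasks can lean on different experts for the same input, which is one way task relationships can be expressed through the model's parameters.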
Unlearn What You Have Learned: Adaptive Crowd Teaching With Exponentially Decayed Memory Learners

In this paper, the authors address a different problem of adaptive crowd teaching, which is a sub-area of machine teaching in the context of crowdsourcing. They propose an adaptive teaching framework named JEDI to construct the personalized optimal teaching set for the crowdsourcing workers.

Yao Zhou , Arun Reddy Nelakurthi , Jingrui He
Complex Object Classification: A Multi-Modal Multi-Instance Multi-Label Deep Network With Optimal Tr

This paper studies multiple modal representations. The authors propose a novel Multi-modal Multi-instance Multi-label Deep Network (M3DN), which learns the label prediction and exploits label correlation simultaneously based on Optimal Transport.

Yang Yang , Yi-Feng Wu , De-Chuan Zhan , Zhi-Bin Liu , Yuan Jiang
Calibrated Multi-Task Learning

This paper proposes a novel algorithm, named Non-Convex Calibrated Multi-Task Learning (NC-CMTL), for learning multiple related regression tasks jointly.

Feiping Nie , Zhanxuan Hu , Xuelong Li
Supervised Learning II
Paper Name Author(s)
Adversarial Detection With Model Interpretation

The authors investigate whether model interpretation can help adversarial detection.

Ninghao Liu , Hongxia Yang , Xia Hu
Training Big Random Forests With Little Resources

This paper studies the problem of building random forests on large datasets. The authors propose a simple yet effective framework that makes it possible to efficiently construct ensembles of huge trees for hundreds of millions or even billions of training instances using a cheap desktop computer with commodity hardware.

Fabian Gieseke , Christian Igel
Stable Prediction Across Unknown Environments

In many important machine learning applications, the training distribution used to learn a probabilistic classifier differs from the distribution on which the classifier will be used to make predictions. The authors propose a novel Deep Global Balancing Regression (DGBR) algorithm to jointly optimize a deep auto-encoder model for feature selection and a global balancing model for stable prediction across unknown environments.

Kun Kuang , Peng Cui , Susan Athey , Ruoxuan Xiong , Bo Li
R2SDH: Robust Rotated Supervised Discrete Hashing

This paper studies learning-based hashing. The authors propose a learning-based hashing algorithm called “Robust Rotated Supervised Discrete Hashing” (R2SDH), by extending the previous work on “Supervised Discrete Hashing” (SDH).

Jie Gui , Ping Li
A Treatment Engine By Predicting Next-Period Prescriptions

This paper studies Electronic Medical Records (EMRs). It aims to develop a treatment engine, which learns from historical EMR data and provides a patient with next-period prescriptions based on disease conditions, laboratory results, and treatment records of the patient.

Bo Jin , Haoyu Yang , Leilei Sun , Chuanren Liu , Yue Qu , Jianing Tong
Risk Prediction On Electronic Healthcare Records With Prior Medical Knowledge

This paper studies predicting the risk of potential diseases from Electronic Health Records (EHR). The authors propose a novel and general framework called PRIME for the risk prediction task, which can successfully incorporate discrete prior medical knowledge into all of the state-of-the-art predictive models using the posterior regularization technique.

Fenglong Ma , Jing Gao , Qiuling Suo , Quanzeng You , Jing Zhou , Aidong Zhang
Temporal and Spatial Data Mining I
Paper Name Author(s)
You Are How You Drive: Peer And Temporal-Aware Representation Learning For Driving Behavior Analysis

This paper studies the problem of analyzing driving behavior. The authors develop a Peer and Temporal-Aware Representation Learning based framework (PTARL) for driving behavior analysis with GPS trajectory data.

Pengyang Wang , Yanjie Fu , Jiawei Zhang , Pengfei Wang , Yu Zheng , Charu Aggarwal
Decoupled Learning For Factorial Marked Temporal Point Processes

This paper presents a factorial marked temporal point process model and efficient learning methods for it.

Weichang Wu , Junchi Yan , Xiaokang Yang , Hongyuan Zha
A Dual Markov Chain Topic Model For Dynamic Environments

The abundance of digital text has led to extensive research on topic models that reason about documents using latent representations. This paper introduces the DM-DTM, a dual Markov chain dynamic topic model, for characterizing a corpus that evolves over time.

Ayan Acharya , Joydeep Ghosh , Mingyuan Zhou
StockAssIstant: A Stock AI Assistant For Reliability Modeling Of Stock Comments

This paper studies stock comments from analysts. It provides a solution called StockAssIstant for modeling the reliability of stock comments by considering multiple factors.

Chen Zhang , Hao Wang , Changying Du , Yijun Wang , Can Chen , Hongzhi Yin
Exploring The Urban Region-of-Interest Through The Analysis Of Online Map Search Queries

This paper studies the urban Region-of-Interest (ROI). The authors propose a systematic study on ROI analysis through mining large-scale online map query logs, which provides a new data-driven research paradigm for ROI detection and profiling.

Ying Sun , Hengshu Zhu , Fuzhen Zhuang , Jingjing Gu , Qing He
Temporal and Spatial Data Mining II
Paper Name Author(s)
Geographical Hidden Markov Tree For Flood Extent Mapping

This paper studies flood extent mapping. The authors propose the geographical hidden Markov tree, a probabilistic graphical model that generalizes the common hidden Markov model from a one-dimensional sequence to a two-dimensional map.

Miao Xie , Zhe Jiang , Arpan Man Sainju
REST: A Reference-based Framework For Spatio-temporal Trajectory Compression

To address the computational issue caused by the large number of combinations of reference trajectories that may exist for resembling a given trajectory, the authors propose efficient greedy algorithms that run in the blink of an eye and dynamic programming algorithms that can achieve the optimal compression ratio.

Yan Zhao , Shuo Shang , Yu Wang , Bolong Zheng , Quoc Viet Hung Nguyen , Kai Zheng
Efficient Similar Region Search With Deep Metric Learning

The authors study the problem of searching for similar regions given a user-specified query region. They propose a novel solution equipped with (1) a deep learning approach to learning a similarity measure that considers both object attributes and the relative locations between objects; and (2) an efficient branch-and-bound search algorithm for finding the top-N similar regions.

Yiding Liu , Kaiqi Zhao , Gao Cong
Dynamic Bike Reposition: A Spatio-Temporal Reinforcement Learning Approach

This paper studies bike-sharing systems. The authors propose a spatio-temporal reinforcement learning based model for dynamic bike repositioning.

Yexin Li , Yu Zheng , Qiang Yang
Simultaneous Urban Region Function Discovery And Popularity Estimation Via An Infinite Urbanization Process Model

The authors propose the infinite urbanization process (IUP) model for simultaneous urban region function discovery and region popularity prediction.

Bang Zhang , Lelin Zhang , Ting Guo , Yang Wang , Fang Chen
Texts, Images and Videos
Paper Name Author(s)
Collaborative Deep Metric Learning For Video Understanding

The goal of video understanding is to develop algorithms that enable machines to understand videos at the level of human experts. The authors propose a deep network that embeds videos, using their audio-visual content, onto a metric space that preserves video-to-video relationships.

Joonseok Lee , Sami Abu-El-Haija , Balakrishnan Varadarajan , Paul Natsev
Corpus Conversion Service: A Machine Learning Platform To Ingest Documents At Scale

The authors present a modular, cloud-based platform to ingest documents at scale.

Peter W J Staar , Michele Dolfi , Christoph Auer , Costas Bekas
Rosetta: Large Scale System For Text Detection And Recognition In Images

The authors present Rosetta, a deployed, scalable optical character recognition (OCR) system designed to process images uploaded daily at Facebook scale.

Fedor Borisyuk , Albert Gordo , Viswanath Sivakumar
Rare Query Expansion Through Generative Adversarial Networks In Search Advertising

The authors explore using GANs to generate bid keywords directly from queries in sponsored search ad selection, especially for rare queries.

Mu-Chu Lee , Bin Gao , Ruofei Zhang
Name Disambiguation In AMiner: Clustering, Maintenance, And Human In The Loop

AMiner is a free online academic search and mining system, having collected more than 130,000,000 researcher profiles and over 200,000,000 papers from multiple publication databases.

Yutao Zhang , Fanjin Zhang , Peiran Yao , Jie Tang
Unsupervised Learning I
Paper Name Author(s)
TruePIE: Discovering Reliable Patterns In Pattern-Based Information Extraction

This paper studies pattern-based methods. The authors propose a novel method, called TruePIE, that finds reliable patterns which extract information that is not only related but also correct.

Qi Li , Meng Jiang , Xikun Zhang , Meng Qu , Timothy Hanratty , Jing Gao , Jiawei Han
TaxoGen: Constructing Topical Concept Taxonomy By Adaptive Term Embedding And Clustering

This paper studies taxonomy construction. The authors propose a method for constructing topic taxonomies, wherein every node represents a conceptual topic and is defined as a cluster of semantically coherent concept terms.

Chao Zhang , Fangbo Tao , Xiusi Chen , Jiaming Shen , Meng Jiang , Brian Sadler , Michelle Vanni , Jiawei Han
Scalable K-Means Clustering Via Lightweight Coresets

Coresets are compact representations of data sets such that models trained on a coreset are provably competitive with models trained on the full data set. The authors provide a single algorithm to construct lightweight coresets for k-means clustering as well as soft and hard Bregman clustering.

Olivier Bachem , Mario Lucic , Andreas Krause
Discovering Non-Redundant K-means Clusterings In Optimal Subspaces

This paper studies non-redundant clustering. The authors show that non-redundant k-means-like clusterings may exist in different, arbitrarily oriented subspaces of the high-dimensional space.

Dominik Mautz , Wei Ye , Claudia Plant , Christian B
TextTruth: An Unsupervised Approach To Discover Trustworthy Information From Multi-Sourced Text Data

The authors propose a novel truth discovery method, named “TextTruth”, which jointly groups the keywords extracted from the answers of a specific question into multiple interpretable factors, and infers the trustworthiness of both answer factors and answer providers.

Hengtong Zhang , Yaliang Li , Fenglong Ma , Jing Gao , Lu Su
Unsupervised Learning II
Paper Name Author(s)
Scalable Spectral Clustering Using Random Binning Features

This paper studies spectral clustering. The authors present a novel scalable spectral clustering method that uses Random Binning (RB) features to accelerate both similarity graph construction and the eigendecomposition.

Lingfei Wu , Pin-Yu Chen , Ian En-Hsu Yen , Fangli Xu , Yinglong Xia , Charu Aggarwal
Model-based Clustering Of Short Text Streams

This paper studies short text stream clustering. The authors propose a model-based short text stream clustering algorithm (MStream) that naturally handles both the concept drift problem and the sparsity problem.

Jianhua Yin , Daren Chao , Zhongkun Liu , Wei Zhang , Xiaohui Yu , Jianyong Wang
MiSoSouP: Mining Interesting Subgroups With Sampling And Pseudodimension

The authors present MiSoSouP, a suite of algorithms for extracting high-quality approximations of the most interesting subgroups, according to different interestingness measures, from a random sample of a transactional dataset.

Matteo Riondato , Fabio Vandin
Spectral Clustering Of Large-scale Data By Directly Solving Normalized Cut

This paper studies spectral clustering algorithms. The authors propose a new optimization algorithm, namely Direct Normalized Cut (DNC), to directly optimize the normalized cut model.

Xiaojun Chen , Weijun Hong , Feiping Nie , Dan He , Min Yang , Joshua Z. Huang
Multiview Clustering Via Adaptively Weighted Procrustes

In this paper, the authors make a multiview extension of the spectral rotation technique from single-view spectral clustering research. They further propose an Adaptively Weighted Procrustes (AWP) approach.

Feiping Nie , Lai Tian , Xuelong Li
Urban Planning
Paper Name Author(s)
Where Will Dockless Shared Bikes Be Stacked? Parking Hotspots Detection In A New City

This paper studies the problem of detecting parking hotspots in a new city where no dockless shared bikes have been deployed. The authors extract useful features from multi-source urban data and introduce a novel domain adaptation network for transferring hotspot knowledge learned from one city with shared bikes to a new city.

Zhaoyang Liu , Yanyan Shen , Yanmin Zhu
Towards Station-level Demand Prediction For Effective Rebalancing In Bike-sharing Systems

This paper studies bike-sharing systems. The authors focus on predicting the hourly demand for rentals and returns at each station of the system.

Pierre Hulot , Daniel Aloise , Sanjay Jena
Du-Parking: Spatio-Temporal Big Data Tells You Realtime Parking Availability

This paper studies real-time parking availability information. The authors propose a deep-learning-based approach, called Du-Parking, which consists of three major components modeling temporal closeness, period, and current general influence, respectively.

Yuecheng Rong , Zhimian Xu , Ruibo Yan , Xu Ma
Detecting Illegal Vehicle Parking Events Using Sharing Bikes’ Trajectories

This paper studies illegal vehicle parking. The authors design a ubiquitous illegal parking detection system.

Tianfu He , Jie Bao , Ruiyuan Li , Sijie Ruan , Yanhua Li , Chao Tian , Yu Zheng
WattHome: Identifying Energy-Inefficient Homes At City-scale

The authors present WattHome, a data-driven approach to identify the least energy-efficient buildings from a large population of buildings in a city or a region.

Srinivasan Iyengar , Stephen Lee , David Irwin , Prashant Shenoy , Benjamin Weil