Mobile devices with touch capabilities often utilize touchscreen keyboards. However, due to the lack of tactile feedback, users often have to switch their focus of attention between the keyboard area, where they must locate and click the correct keys, and the text area, where they must verify the typed output. This can impair user experience and performance. In this paper, we examine multimodal feedback and guidance signals that keep users’ focus of attention in the keyboard area but also provide the kind of information users would normally get in the text area. We first conducted a usability study to assess and refine the user experience of these signals and their combinations. Then we evaluated whether those signals which users preferred could also improve typing performance in a controlled experiment. One combination of multimodal signals significantly improved typing speed by 11%, reduced keystrokes-per-character by 8%, and reduced backspaces by 28%. We discuss design implications.
Probabilistic topic models, such as PLSA and LDA, are gaining popularity in many fields due to their high-quality results. Unfortunately, existing topic models suffer from two drawbacks: (1) model complexity and (2) disjoint topic groups. That is, when a topic model involves multiple entities (such as authors, papers, conferences, and institutions) and they are connected through multiple relationships, the model becomes too difficult to analyze and often leads to intractable solutions. Also, different entity types are classified into disjoint topic groups that are not directly comparable, so it is difficult to see whether heterogeneous entities (such as authors and conferences) are on the same topic or not (e.g., are Rakesh Agrawal and KDD related to the same topic?).
Mental illness is one of the most undertreated health problems worldwide. Previous work has shown that there are remarkably strong cues to mental illness in short samples of the voice. These cues are evident in severe forms of illness, but it would be most valuable to make earlier diagnoses from a richer feature set. Furthermore, there is an abstraction gap between these voice cues and the diagnostic cues used by practitioners. We believe that by closing this gap, we can build more effective early diagnostic systems for mental illness. In order to develop improved monitoring, we need to translate the high-level cues used by practitioners into features that can be analyzed using signal processing and machine learning techniques. In this paper we describe the elicitation process that we used to tap the practitioners' knowledge. We borrow from both the AI (expert systems) and HCI (contextual inquiry) fields in order to perform this knowledge transfer. The paper highlights an unusual and promising role for HCI: the analysis of interaction data for health diagnosis.
Information extraction and user intention identification are central topics in modern query understanding and recommendation systems. In this paper, we propose DeepProbe, a generic information-directed interaction framework built around an attention-based sequence-to-sequence (seq2seq) recurrent neural network. DeepProbe can rephrase, evaluate, and even actively ask questions, leveraging the generative ability and likelihood estimation made possible by seq2seq models. DeepProbe makes decisions based on a derived uncertainty (entropy) measure conditioned on user inputs, possibly over multiple rounds of interaction. Three applications, namely a rewriter, a relevance scorer, and a chatbot for ad recommendation, were built around DeepProbe, with the first two serving as precursory building blocks for the third. We first use the seq2seq model in DeepProbe to rewrite a user query into a standard query form, which is submitted to an ordinary recommendation system. Secondly, we evaluate DeepProbe's seq2seq model-based relevance scoring. Finally, we build a chatbot prototype capable of active user interaction, which asks questions that maximize information gain, allowing for more efficient user intention identification. We evaluate the first two applications by (1) comparing against baselines using BLEU and AUC, and (2) human judge evaluation. Both demonstrate significant improvements over current state-of-the-art systems, proving their value as useful tools on their own and laying a good foundation for the ongoing chatbot application.
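To make the entropy-driven control loop described above concrete, the sketch below (my own assumptions, not DeepProbe's code) shows how an uncertainty measure over candidate user intents can drive the choice between committing to an intent and asking the clarifying question with the highest expected information gain. The function names, the threshold value, and the way answer probabilities and posteriors are supplied are hypothetical stand-ins for quantities a seq2seq model would estimate.

```python
# Minimal sketch of an entropy-based ask-or-commit decision.
# All distributions here would, in a real system, come from seq2seq
# likelihood estimates conditioned on the dialogue so far.
import math
from typing import Dict, List, Tuple

def entropy(p: Dict[str, float]) -> float:
    """Shannon entropy (in nats) of a distribution over candidate intents."""
    return -sum(q * math.log(q) for q in p.values() if q > 0.0)

def expected_info_gain(prior: Dict[str, float],
                       answer_probs: List[float],
                       posteriors: List[Dict[str, float]]) -> float:
    """Expected entropy reduction from asking one clarifying question."""
    expected_posterior = sum(a * entropy(post)
                             for a, post in zip(answer_probs, posteriors))
    return entropy(prior) - expected_posterior

def decide(prior: Dict[str, float],
           questions: Dict[str, Tuple[List[float], List[Dict[str, float]]]],
           threshold: float = 0.5):
    """Commit to the most likely intent if uncertainty is low,
    otherwise ask the question that maximizes expected information gain."""
    if entropy(prior) < threshold:
        return "commit", max(prior, key=prior.get)
    best = max(questions, key=lambda q: expected_info_gain(prior, *questions[q]))
    return "ask", best
```

In this toy formulation the posteriors are supplied directly; the design choice the abstract points to is that one generative model both scores candidate rewrites and estimates how a user's answer would sharpen the intent distribution.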
This paper investigates the usefulness of segmental phoneme dynamics for classification of speaking styles. We modeled transition details based on the phoneme sequences emitted by a speech recognizer, using data obtained from recordings of 39 depressed patients with 7 different speaking styles: normal, pressured, slurred, stuttered, flat, slow, and fast speech. We designed and compared two sets of phoneme models: a language model treating each phoneme as a word unit (one for each style) and a context-dependent phoneme duration model based on Gaussians for each speaking style considered. The experiments showed that language modeling at the phoneme level performed better than the duration model. We also found that better performance can be obtained with user normalization. To assess the complementary effect of the phoneme-based models, the classifiers were combined at the decision level with a Hidden Markov Model (HMM) classifier built from spectral features. The improvement was 5.7% absolute (10.4% relative), reaching 60.3% accuracy in 7-class and 71.0% in 4-class classification.
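As an illustration of the phoneme-level language-modeling idea (a sketch under stated assumptions, not the paper's implementation), the snippet below trains one add-one-smoothed bigram model per speaking style over recognizer phoneme strings and classifies an utterance by maximum log-likelihood. The class names, vocabulary handling, and smoothing choice are illustrative.

```python
# Sketch: one phoneme bigram LM per speaking style; classify by likelihood.
import math
from collections import defaultdict
from typing import Dict, List

class PhonemeBigramLM:
    def __init__(self, vocab_size: int):
        self.vocab_size = vocab_size                       # size of the phoneme inventory
        self.bigram = defaultdict(lambda: defaultdict(int))
        self.context = defaultdict(int)

    def train(self, sequences: List[List[str]]) -> None:
        for seq in sequences:
            for prev, cur in zip(["<s>"] + seq, seq + ["</s>"]):
                self.bigram[prev][cur] += 1
                self.context[prev] += 1

    def log_likelihood(self, seq: List[str]) -> float:
        ll = 0.0
        for prev, cur in zip(["<s>"] + seq, seq + ["</s>"]):
            # Add-one smoothing keeps unseen bigrams from zeroing the score.
            ll += math.log((self.bigram[prev][cur] + 1)
                           / (self.context[prev] + self.vocab_size))
        return ll

def classify(seq: List[str], models: Dict[str, PhonemeBigramLM]) -> str:
    """Return the speaking style whose phoneme LM scores the sequence highest."""
    return max(models, key=lambda style: models[style].log_likelihood(seq))
```

A real system would use the recognizer's phoneme inventory as the vocabulary, apply per-user normalization, and fuse these scores with the duration and spectral HMM classifiers at the decision level, as the abstract describes.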
This paper develops the MUFIN technique for extreme classification (XC) tasks with millions of labels, where datapoints and labels are endowed with visual and textual descriptors. Applications of MUFIN to product-to-product recommendation and bid-query prediction over several million products are presented. Contemporary multi-modal methods frequently rely on purely embedding-based approaches. On the other hand, XC methods utilize classifier architectures to offer accuracies superior to embedding-only methods but mostly focus on text-based categorization tasks. MUFIN bridges this gap by reformulating multi-modal categorization as an XC problem with several million labels. This presents the twin challenges of developing multi-modal architectures that can offer embeddings sufficiently expressive to allow accurate categorization over millions of labels, and training and inference routines that scale logarithmically in the number of labels. MUFIN develops an architecture based on cross-modal attention and trains it in a modular fashion using pre-training and positive and negative mining. A novel product-to-product recommendation dataset, MM-AmazonTitles-300K, containing over 300K products was curated from publicly available amazon.com listings, with each product endowed with a title and multiple images. On all datasets, MUFIN offered at least 3% higher accuracy than leading text-based, image-based, and multi-modal techniques. Code for MUFIN is available at https://github.com/Extreme-classification/MUFIN
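The following sketch, which assumes a standard PyTorch multi-head attention layer rather than MUFIN's released architecture, shows one way a text embedding can attend over image-patch embeddings to produce a fused datapoint representation that is then scored against a shortlist of label embeddings. All dimensions, module names, and the shortlisting step are illustrative assumptions; see the linked repository for the actual implementation.

```python
# Illustrative cross-modal attention fusion for an XC-style scoring step.
import torch
import torch.nn as nn

class CrossModalEncoder(nn.Module):
    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_emb: torch.Tensor, img_patches: torch.Tensor) -> torch.Tensor:
        # text_emb: (batch, dim); img_patches: (batch, n_patches, dim)
        query = text_emb.unsqueeze(1)                   # the text attends to the image patches
        fused, _ = self.attn(query, img_patches, img_patches)
        return self.norm(fused.squeeze(1) + text_emb)   # residual fusion

encoder = CrossModalEncoder()
text = torch.randn(4, 512)             # e.g. product-title embeddings
patches = torch.randn(4, 49, 512)      # e.g. image-patch embeddings
points = encoder(text, patches)        # (4, 512) fused datapoint representations
labels = torch.randn(1000, 512)        # embeddings of a shortlist of candidate labels
scores = points @ labels.T             # (4, 1000) relevance scores
```

Scoring only a shortlist of candidate labels (retrieved, for example, by approximate nearest-neighbour search over label embeddings) is what keeps inference cost sub-linear in the total number of labels, in the spirit of the logarithmic-scaling requirement above.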
The human voice encodes a wealth of information about emotion, mood, stress, and mental state. With mobile phones (one of the most widely used modules in body area networks), this information is potentially available to a host of applications and can enable richer, more appropriate, and more satisfying human-machine interaction.