Ryan A. Rossi

Adobe Systems (United States)

Author Statistics

Papers

Citation

H-Index

i-10 index

Co-Authors

Nesreen K. Ahmed

Intel (United States)

114

Sungchul Kim

Adobe Systems (United States)

104

Eunyee Koh

Adobe Systems (United States)

Franck Dernoncourt

Adobe Systems (United States)

Nedim Lipka

Adobe Systems (United States)

Anup Rao

Adobe Systems (United States)

Handong Zhao

Adobe Systems (United States)

Jane Hoffswell

Adobe Systems (United States)

Rong Zhou

Shenzhen University

Hoda Eldardiry

Virginia Tech

Cooperative Institutions

Adobe Systems (United States)

Georgia Institute of Technology

Stanford University

University of Ferrara

Chinese Academy of Sciences

University of Maryland, College Park

Carnegie Mellon University

University of Southern California

Zhejiang University

University of Washington

Research Field

Diversify-verify-adapt: Efficient and Robust Retrieval-Augmented Ambiguous Question Answering

arXiv (Cornell University) (2024)

Yeonjun In Sungchul Kim Ryan A. Rossi Md Mehrab Tanjim Tong Yu

The retrieval-augmented generation (RAG) framework addresses ambiguity in user queries in QA systems by retrieving passages that cover all plausible interpretations and generating comprehensive responses based on the passages. However, our preliminary studies reveal that a single retrieval process often suffers from low-quality results, as the retrieved passages frequently fail to capture all plausible interpretations. Although the iterative RAG approach has been proposed to address this problem, it comes at the cost of significantly reduced efficiency. To address these issues, we propose the diversify-verify-adapt (DIVA) framework. DIVA first diversifies the retrieved passages to encompass diverse interpretations. Subsequently, DIVA verifies the quality of the passages and adapts the most suitable approach tailored to their quality. This approach improves the QA system's accuracy and robustness by handling the low-quality retrieval issue in ambiguous questions, while enhancing efficiency.

10.48550/arxiv.2409.02361

Citations (0)

Machine unlearning via Algorithmic stability

Conference on Learning Theory (2021)

Enayat Ullah Tung Mai Anup Rao Ryan A. Rossi Raman Arora

We study the problem of machine unlearning and identify a notion of algorithmic stability, Total Variation (TV) stability, which, we argue, is suitable for the goal of exact unlearning. For convex risk minimization problems, we design TV-stable algorithms based on noisy Stochastic Gradient Descent (SGD). Our key contribution is the design of corresponding efficient unlearning algorithms, which are based on constructing a near-maximal coupling of Markov chains for the noisy SGD procedure. To understand the trade-offs between accuracy and unlearning efficiency, we give upper and lower bounds on the excess empirical and population risk of TV-stable algorithms for convex risk minimization. Our techniques generalize to arbitrary non-convex functions, and our algorithms are differentially private as well.

Stochastic Gradient Descent

Minification

Empirical risk minimization

Variation (astronomy)

Citations (0)
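
The abstract above builds on noisy SGD for convex risk minimization. As a rough illustration of that building block only, the sketch below runs minibatch SGD with Gaussian noise added to each gradient on a one-dimensional least-squares problem. It is not the authors' unlearning algorithm (there is no coupling of Markov chains and no stability analysis here), and the function name `noisy_sgd`, its parameters, and the toy data are all assumptions made for illustration.

```python
import random


def noisy_sgd(xs, ys, steps=2000, lr=0.05, noise_std=0.01, batch_size=4, seed=0):
    """Minimize the mean of 0.5 * (w * x - y)**2 with Gaussian-perturbed SGD.

    Hypothetical illustration: a single scalar weight, a convex objective,
    and isotropic noise added to every minibatch gradient.
    """
    rng = random.Random(seed)
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # Sample a minibatch of indices with replacement.
        batch = [rng.randrange(n) for _ in range(batch_size)]
        # Gradient of the squared loss averaged over the minibatch.
        grad = sum((w * xs[i] - ys[i]) * xs[i] for i in batch) / batch_size
        # Perturb the gradient with Gaussian noise before the update.
        w -= lr * (grad + rng.gauss(0.0, noise_std))
    return w


# Toy data generated from y = 2 * x (an assumption for this example).
xs = [0.5, 1.0, 1.5, 2.0]
ys = [2.0 * x for x in xs]
w_hat = noisy_sgd(xs, ys)
```

With `noise_std=0.0` the routine reduces to plain minibatch SGD; larger noise trades accuracy for randomness in the output, which is the kind of trade-off the abstract's risk bounds quantify.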

Parallel Maximum Clique Algorithms with Applications to Network Analysis and Storage

arXiv (Cornell University) (2013)

Ryan A. Rossi David F. Gleich Assefaw H. Gebremedhin Md. Mostofa Ali Patwary

We propose a fast, parallel maximum clique algorithm for large sparse graphs that is designed to exploit characteristics of social and information networks. The method exhibits a roughly linear runtime scaling over real-world networks ranging from 1000 to 100 million nodes. In a test on a social network with 1.8 billion edges, the algorithm finds the largest clique in about 20 minutes. Our method employs a branch-and-bound strategy with novel and aggressive pruning techniques. For instance, we use the core number of a vertex in combination with a good heuristic clique finder to efficiently remove the vast majority of the search space. In addition, we parallelize the exploration of the search tree. During the search, processes immediately communicate changes to upper and lower bounds on the size of the maximum clique, which occasionally results in a super-linear speedup because vertices with large search spaces can be pruned by other processes. We apply the algorithm to two problems: to compute temporal strong components and to compress graphs.

Clique

Clique problem

10.48550/arxiv.1302.6256

Citations (10)
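
The pruning idea in the abstract above (core numbers combined with a heuristic clique finder inside branch and bound) can be sketched in a few lines. This is a minimal, sequential sketch under stated assumptions, not the authors' parallel implementation: the helper names `core_numbers`, `greedy_clique`, and `max_clique` are hypothetical, and the core computation uses a simple quadratic peel rather than a linear-time one. It relies on the fact that a vertex with core number c cannot belong to a clique larger than c + 1.

```python
from collections import defaultdict


def core_numbers(adj):
    """k-core number of each vertex, by repeatedly peeling a minimum-degree vertex."""
    degree = {v: len(ns) for v, ns in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)  # linear scan; fine for a sketch
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


def greedy_clique(adj, core):
    """Heuristic lower bound: grow a clique greedily, preferring high-core vertices."""
    start = max(adj, key=core.__getitem__)
    clique = [start]
    candidates = set(adj[start])
    while candidates:
        v = max(candidates, key=core.__getitem__)
        clique.append(v)
        candidates &= adj[v]  # keep only vertices adjacent to every clique member
    return clique


def max_clique(edges):
    """Return one maximum clique of the undirected graph given as an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    if not adj:
        return []
    core = core_numbers(adj)
    best = greedy_clique(adj, core)

    def expand(clique, candidates):
        nonlocal best
        if len(clique) > len(best):
            best = list(clique)
        while candidates:
            # Bound: even using every candidate cannot beat the incumbent.
            if len(clique) + len(candidates) <= len(best):
                return
            v = candidates.pop()
            # Core pruning: v lies in no clique larger than core[v] + 1.
            if core[v] + 1 <= len(best):
                continue
            expand(clique + [v], candidates & adj[v])

    # Only vertices whose core number allows a clique larger than the heuristic one.
    expand([], {v for v in adj if core[v] + 1 > len(best)})
    return best


# A 4-clique on {1, 2, 3, 4} with a short tail attached at vertex 4.
example_edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6)]
largest = max_clique(example_edges)
```

On the example graph the greedy heuristic already finds the 4-clique, so the core filter leaves nothing to search; that is the effect the abstract describes as removing most of the search space.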

What if CLIQUE were fast? Maximum Cliques in Information Networks and Strong Components in Temporal Networks

arXiv (Cornell University) (2012)

Ryan A. Rossi David F. Gleich Assefaw H. Gebremedhin Md. Mostofa Ali Patwary

Exact maximum clique finders have progressed to the point where we can investigate cliques in million-node social and information networks, as well as find strongly connected components in temporal networks. We use one such finder to study a large collection of modern networks emanating from biological, social, and technological domains. We show inter-relationships between maximum cliques and several other common network properties, including network density, maximum core, and number of triangles. In temporal networks, we find that the largest temporal strong components have around 20-30% of the vertices of the entire network. These components represent groups of highly communicative individuals. In addition, we discuss and improve the performance and utility of the maximum clique finder itself.

Clique

10.48550/arxiv.1210.5802

Citations (10)

MMR: Evaluating Reading Ability of Large Multimodal Models

arXiv (Cornell University) (2024)

Jian Chen Ruiyi Zhang Yufan Zhou Ryan A. Rossi Jiuxiang Gu

Large multimodal models (LMMs) have demonstrated impressive capabilities in understanding various types of images, including text-rich images. Most existing text-rich image benchmarks are simple extraction-based question answering, and many LMMs now easily achieve high scores. This means that current benchmarks fail to accurately reflect the performance of different models, and a natural idea is to build a new benchmark to evaluate their complex reasoning and spatial understanding abilities. In this work, we propose the Multi-Modal Reading (MMR) benchmark in 11 diverse tasks to evaluate LMMs for text-rich image understanding. MMR is the first text-rich image benchmark built on human annotations with the help of language models. By evaluating several state-of-the-art LMMs, including GPT-4o, it reveals the limited capabilities of existing LMMs, underscoring the value of our benchmark.

10.48550/arxiv.2408.14594

Citations (0)

Fluorescent in situ hybridization with rDNA probes on chromosomes of Acipenser ruthenus and Acipenser naccarii (Osteichthyes Acipenseriformes)

Genome (1999)

Francesco Fontana Massimo Lanfredi Milvia Chicca Leonardo Congiu James Tagliavini

The genes for 28S and 5S rDNA were physically mapped on the chromosomes of two sturgeon species, the sterlet (Acipenser ruthenus, 2n = 118 ± 4) and the Adriatic sturgeon (Acipenser naccarii, 2n = 248 ± 4) by fluorescent in situ hybridization. In the sterlet, the 28S rDNA was located on six chromosomes, four of which actively transcribed, while in the Adriatic sturgeon the 28S rDNA was located on a chromosome number ranging from 10 to 12, eight of which actively transcribed. The 5S rDNA was physically mapped on two chromosomes in the sterlet and on four in the Adriatic sturgeon. A more detailed characterization of the latter karyotype was obtained during this study. All these data are discussed in connection with the ploidy relationships among sturgeon species. Key words: karyotype, ploidy, FISH, 28S and 5S rDNA.

Huso

10.1139/g99-030

Citations (26)

[Histologic changes of the liver in animals treated with pteroyltriglutamic acid].

PubMed (1952)

Ryan A. Rossi

Citations (0)

Insight-centric Visualization Recommendation

arXiv (Cornell University) (2021)

Camille Harris Ryan A. Rossi Sana Malik Jane Hoffswell Fan Du

Visualization recommendation systems simplify exploratory data analysis (EDA) and make understanding data more accessible to users of all skill levels by automatically generating visualizations for users to explore. However, most existing visualization recommendation systems focus on ranking all visualizations into a single list or set of groups based on particular attributes or encodings. This global ranking makes it difficult and time-consuming for users to find the most interesting or relevant insights. To address these limitations, we introduce a novel class of visualization recommendation systems that automatically rank and recommend both groups of related insights as well as the most important insights within each group. Our proposed approach combines results from many different learning-based methods to discover insights automatically. A key advantage is that this approach generalizes to a wide variety of attribute types such as categorical, numerical, and temporal, as well as complex non-trivial combinations of these different attribute types. To evaluate the effectiveness of our approach, we implemented a new insight-centric visualization recommendation system, SpotLight, which generates and ranks annotated visualizations to explain each insight. We conducted a user study with 12 participants and two datasets which showed that users are able to quickly understand and find relevant insights in unfamiliar data.

Categorical variable

Rank (graph theory)

10.48550/arxiv.2103.11297

Citations (10)

Towards Enhancing Coherence in Extractive Summarization: Dataset and Experiments with LLMs

arXiv (Cornell University) (2024)

Mihir Parmar Hanieh Deilamsalehy Franck Dernoncourt Seunghyun Yoon Ryan A. Rossi

Extractive summarization plays a pivotal role in natural language processing due to its wide-ranging applications in summarizing diverse content efficiently, while also being faithful to the original content. Despite significant advancement achieved in extractive summarization by Large Language Models (LLMs), these summaries frequently exhibit incoherence. An important aspect of a coherent summary is its readability for intended users. Although there have been many datasets and benchmarks proposed for creating coherent extractive summaries, none of them currently incorporate user intent to improve coherence in extractive summarization. Motivated by this, we propose a systematically created human-annotated dataset consisting of coherent summaries for five publicly available datasets and natural language user feedback, offering valuable insights into how to improve coherence in extractive summaries. We utilize this dataset for aligning LLMs through supervised fine-tuning with natural language human feedback to enhance the coherence of their generated summaries. Preliminary experiments with Falcon-40B and Llama-2-13B show significant performance improvements (~10% Rouge-L) in terms of producing coherent summaries. We further utilize human feedback to benchmark results over instruction-tuned models such as FLAN-T5, which resulted in several interesting findings. Data and source code are available at https://github.com/Mihir3009/Extract-AI.

10.48550/arxiv.2407.04855

Citations (0)

Revisiting Role Discovery in Networks: From Node to Edge Roles

arXiv (Cornell University) (2016)

Nesreen K. Ahmed Ryan A. Rossi Theodore L. Willke Rong Zhou

Previous work in network analysis has focused on modeling the mixed-memberships of node roles in the graph, but not the roles of edges. We introduce the edge role discovery problem and present a generalizable framework for learning and extracting edge roles from arbitrary graphs automatically. Furthermore, while existing node-centric role models have mainly focused on simple degree and egonet features, this work also explores graphlet features for role discovery. In addition, we develop an approach for automatically learning and extracting important and useful edge features from an arbitrary graph. The experimental results demonstrate the utility of edge roles for network analysis tasks on a variety of graphs from various problem domains.

10.48550/arxiv.1610.00844

Citations (4)