Welcome to this issue of the Proceedings of the ACM on Management of Data (Volume 1, Issue 4 (SIGMOD)). While this issue contains papers from the SIGMOD track, PACMMOD will soon also include issues with papers from the newly created PODS track. Of the 189 submissions to the PACMMOD SIGMOD track reviewing round with a submission deadline of April 15, 2023, a total of 49 articles were accepted; they are presented in this issue.
Data warehouses support the analysis of historical data, which often involves aggregation over a period of time. Furthermore, data is typically incorporated into the warehouse in increasing order of a time attribute, e.g., the date of a sale or the time of a temperature measurement. In this paper we propose a framework that takes advantage of this append-only nature of updates along a time attribute. The framework allows us to integrate large amounts of new data into the warehouse and to generate historical summaries efficiently. Query and update costs are virtually independent of the extent of the data set in the time dimension, making our framework an attractive aggregation approach for append-only data streams. A specific instantiation of the general approach is developed for MOLAP data cubes, involving a new data structure for append-only arrays with pre-aggregated values. Our framework is applicable to point data and to data with extent, e.g., hyper-rectangles.
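To make the idea concrete, the following is a minimal sketch (our illustration, not the paper's actual MOLAP structure) of an append-only array with pre-aggregated values: appends arrive in time order and extend a prefix-sum array, so any time-range SUM is answered in constant time regardless of the data set's extent in the time dimension.

```python
class AppendOnlyAggregateArray:
    """Minimal sketch: an append-only array that maintains prefix sums,
    so any time-range aggregate (here, SUM) is answered in O(1).
    Illustrative only; not the paper's actual data structure."""

    def __init__(self):
        self._prefix = [0]  # _prefix[i] = sum of the first i values

    def append(self, value):
        # Data arrives in increasing time order, so we only ever extend.
        self._prefix.append(self._prefix[-1] + value)

    def range_sum(self, lo, hi):
        # Sum of values at time indexes lo..hi (inclusive, 0-based).
        return self._prefix[hi + 1] - self._prefix[lo]


# Usage: daily sales appended in date order, then a historical summary.
sales = AppendOnlyAggregateArray()
for s in [120, 95, 130, 80]:
    sales.append(s)
print(sales.range_sum(1, 3))  # -> 305
```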
Modern databases increasingly integrate new kinds of information, such as multimedia information in the form of image, video, and audio data. Both the dimensionality and the amount of data that need to be processed are growing rapidly, increasing the demand for efficient retrieval of large amounts of multi-dimensional data. Declustering techniques for multi-disk architectures have been used effectively for storage. In this paper, we first establish that, besides exploiting parallelism, the careful organization of each disk must be considered for fast searching. We introduce the notions of page allocation and data space mapping, which can be used to organize and retrieve multi-dimensional data. We develop these notions based on three different partitioning strategies: regular grids, concentric hypercubes, and hyperpyramids. We develop techniques that achieve efficient retrieval by optimizing the number of buckets retrieved by a query, disk arm movement, and I/O parallelism. We prove that concentric hypercube-based mapping achieves both optimal clustering and optimal parallelism. We also develop a technique based on hyperpyramid partitioning that reduces the number of buckets retrieved by a query and has efficient inter- and intra-disk organizations. We evaluate the performance of the proposed techniques by comparing them with current approaches. The new techniques yield very significant improvements over existing techniques and result in fast retrieval of multi-dimensional data.
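As a point of reference for what declustering buys, the sketch below allocates a regular grid partition across disks using the classic Disk Modulo scheme; the concentric-hypercube and hyperpyramid mappings developed in the paper are different techniques and are not reproduced here.

```python
# Minimal sketch of declustering a regular grid partition across M disks
# with the classic Disk Modulo scheme. Illustration only; not the paper's
# concentric-hypercube or hyperpyramid mappings.
from collections import Counter

M = 4  # number of disks

def disk_of(cell):
    """Disk Modulo: grid cell (i1, ..., id) goes to disk (i1+...+id) mod M."""
    return sum(cell) % M

def cells_in_range(lo, hi):
    """Enumerate the 2-d grid cells (buckets) intersecting a range query."""
    for i in range(lo[0], hi[0] + 1):
        for j in range(lo[1], hi[1] + 1):
            yield (i, j)

# A 2x3 range query touches 6 buckets; Disk Modulo spreads them so that at
# most ceil(6 / M) = 2 buckets land on any one disk, enabling parallel I/O.
load = Counter(disk_of(c) for c in cells_in_range((0, 0), (1, 2)))
print(load)  # Counter({1: 2, 2: 2, 0: 1, 3: 1})
```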
Cloud computing has emerged as a preferred platform for deploying scalable web applications. With the growing scale of these applications and the data associated with them, scalable data management systems form a crucial part of the cloud infrastructure. Key-value stores -- such as Bigtable, PNUTS, Dynamo, and their open source analogues -- have been the preferred data stores for applications in the cloud. In these systems, data is represented as key-value pairs, and atomic access is provided only at the granularity of single keys. While these properties work well for current applications, they are insufficient for the next generation of web applications -- such as online gaming, social networks, collaborative editing, and many more -- which emphasize collaboration. Since collaboration by definition requires consistent access to groups of keys, scalable and consistent multi-key access is critical for such applications. We propose the Key Group abstraction, which defines a relationship between a group of keys and is the granule for on-demand transactional access. This abstraction allows the Key Grouping protocol to collocate control for the keys in the group, enabling efficient access to the group of keys. Using the Key Grouping protocol, we design and implement G-Store, which uses a key-value store as an underlying substrate to provide efficient, scalable, and transactional multi-key access. Our implementation using a standard key-value store and experiments on a cluster of commodity machines show that G-Store preserves the desired properties of key-value stores while providing multi-key access at very low overhead.
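The following toy sketch (our illustration over a trivial in-memory store; it is not G-Store's actual Key Grouping protocol) shows the flavor of the Key Group abstraction: control for a group's keys is collocated behind a single lock, standing in for a single owning node, so operations spanning the group execute atomically.

```python
import threading

class KeyValueStore:
    """A toy key-value store with atomic access to single keys only."""
    def __init__(self):
        self._data, self._lock = {}, threading.Lock()
    def get(self, k):
        with self._lock:
            return self._data.get(k)
    def put(self, k, v):
        with self._lock:
            self._data[k] = v

class KeyGroup:
    """Sketch of the Key Group idea: ownership of a set of keys is
    collocated under one lock (standing in for one owning node), so
    reads and writes spanning the group are atomic. Illustration only."""
    def __init__(self, store, keys):
        self._store, self._keys = store, set(keys)
        self._group_lock = threading.Lock()
    def transact(self, fn):
        # fn receives get/put views restricted to the group's keys.
        with self._group_lock:
            def get(k):
                assert k in self._keys
                return self._store.get(k)
            def put(k, v):
                assert k in self._keys
                self._store.put(k, v)
            return fn(get, put)

# Usage: atomically transfer in-game currency between two player keys.
kv = KeyValueStore()
kv.put("alice", 100); kv.put("bob", 40)
group = KeyGroup(kv, ["alice", "bob"])
group.transact(lambda get, put: (put("alice", get("alice") - 30),
                                 put("bob", get("bob") + 30)))
print(kv.get("alice"), kv.get("bob"))  # 70 70
```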
Cloud computing has become a very successful paradigm for data computing and storage. However, concerns about data security and privacy in the cloud keep growing. Ensuring security and privacy for data management and query processing in the cloud is critical for better and broader use of the cloud. This tutorial covers recent research on cloud security and privacy, focusing on works that protect data confidentiality and query access privacy for sensitive data stored and queried in the cloud. We provide a comprehensive study of state-of-the-art schemes and techniques for protecting data confidentiality and access privacy, and explain their tradeoffs in security, privacy, functionality, and performance.
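As a baseline for the confidentiality techniques such tutorials survey, the sketch below shows client-side encryption before upload, using the widely used `cryptography` package's Fernet API. This is our illustration, not a scheme from the tutorial, and it highlights the central tradeoff: plain encryption protects confidentiality but prevents the server from evaluating queries over the data.

```python
# Baseline data confidentiality in the cloud (illustration only): the
# client encrypts before upload and keeps the key, so the cloud provider
# only ever sees ciphertext. Fernet provides authenticated encryption.
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # stays with the data owner, never uploaded
f = Fernet(key)

ciphertext = f.encrypt(b"patient_id=42, diagnosis=...")  # stored in the cloud

# The tradeoff surveyed in this line of work: the server cannot query
# ciphertext, which motivates searchable/queryable encryption schemes.
assert f.decrypt(ciphertext) == b"patient_id=42, diagnosis=..."
```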
Rapidly improving computing and networking technology enables enterprises to collect data from virtually all of their business units. The main challenge today is to extract useful information from an overwhelmingly large amount of raw data. Data warehouses were introduced to support complex analysis queries. They manage data extracted from the different operational databases and from external data sources, and they are optimized for fast query processing. It is common for modern data warehouses to manage terabytes of data. According to a recent survey by the Winter Corporation (2003), for instance, the decision support database of SBC reached a size of almost 25 terabytes, up from 10.5 terabytes in 2001 (Winter Corporation, 2001).
The authors propose an approach to executing transactions in heterogeneous distributed databases. Instead of the traditional approach of executing global transactions by remotely accessing distributed data, they propose that transactions be executed locally and that data be dynamically migrated to the appropriate sites. They thus eliminate the need for global transactions. Since there are no global transactions, the problem of distributed commitment does not arise; this is an important issue related to database recovery that is often ignored by protocols for transaction processing in heterogeneous databases. A special protocol is executed for migrating data objects, and the authors present a protocol for localizing access to a data object.
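A toy sketch of the execute-locally-by-migrating-data idea follows (our illustration under simplified assumptions, not the authors' actual migration or localization protocol): before a transaction runs, the executing site pulls every object it needs from the owning sites, after which commit is a purely local decision and no distributed commit protocol is required.

```python
# Toy sketch: each site holds some objects; before a transaction runs,
# the local site requests migration of every object it needs, then
# executes with only local resources. Illustration only.

class Site:
    def __init__(self, name):
        self.name, self.objects = name, {}

    def migrate_in(self, obj_id, sites):
        """Pull obj_id from whichever site currently owns it."""
        for s in sites:
            if obj_id in s.objects:
                self.objects[obj_id] = s.objects.pop(obj_id)
                return
        raise KeyError(obj_id)

    def run_transaction(self, needed, fn, sites):
        # Localize: migrate every needed object to this site first.
        for obj_id in needed:
            if obj_id not in self.objects:
                self.migrate_in(obj_id, sites)
        # All data is now local, so commit is a purely local decision.
        fn(self.objects)

# Usage: transfer between objects initially stored at different sites.
a, b = Site("A"), Site("B")
a.objects["x"] = 100
b.objects["y"] = 50
a.run_transaction(["x", "y"],
                  lambda o: (o.__setitem__("x", o["x"] - 10),
                             o.__setitem__("y", o["y"] + 10)),
                  [a, b])
print(a.objects)  # {'x': 90, 'y': 60}
```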