Welcome to this issue of the Proceedings of the ACM on Management of Data (Volume 1, Issue 4 (SIGMOD)). While this issue contains papers from the SIGMOD track, PACMMOD will soon also include issues with papers from the newly created PODS track. Of the 189 submissions to the PACMMOD SIGMOD track reviewing round with a submission deadline of April 15, 2023, a total of 49 articles were accepted; they are presented in this issue.
Data warehouses support the analysis of historical data, which often involves aggregation over a period of time. Furthermore, data is typically incorporated into the warehouse in increasing order of a time attribute, e.g., the date of a sale or the time of a temperature measurement. In this paper we propose a framework that takes advantage of this append-only nature of updates along a time attribute. The framework allows us to integrate large amounts of new data into the warehouse and to generate historical summaries efficiently. Query and update costs are virtually independent of the extent of the data set in the time dimension, making our framework an attractive aggregation approach for append-only data streams. A specific instantiation of the general approach is developed for MOLAP data cubes, involving a new data structure for append-only arrays with pre-aggregated values. Our framework is applicable to point data and to data with extent, e.g., hyper-rectangles.
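To make the idea concrete, the following is a minimal sketch (our illustration, not the paper's actual MOLAP structure) of an append-only array with pre-aggregated values: appends arrive in time order and extend a prefix-sum array, so any time-range SUM is answered in constant time regardless of the data set's extent in the time dimension.

```python
class AppendOnlyAggregateArray:
    """Minimal sketch: an append-only array that maintains prefix sums,
    so any time-range aggregate (here, SUM) is answered in O(1).
    Illustrative only; not the paper's actual data structure."""

    def __init__(self):
        self._prefix = [0]  # _prefix[i] = sum of the first i values

    def append(self, value):
        # Data arrives in increasing time order, so we only ever extend.
        self._prefix.append(self._prefix[-1] + value)

    def range_sum(self, lo, hi):
        # Sum of values at time indexes lo..hi (inclusive, 0-based).
        return self._prefix[hi + 1] - self._prefix[lo]


# Usage: daily sales appended in date order, then a historical summary.
sales = AppendOnlyAggregateArray()
for s in [120, 95, 130, 80]:
    sales.append(s)
print(sales.range_sum(1, 3))  # -> 305
```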
Modern databases increasingly integrate new kinds of information, such as multimedia information in the form of image, video, and audio data. Both the dimensionality and the amount of data that need to be processed are growing rapidly, increasing the demand for efficient retrieval of large amounts of multi-dimensional data. Declustering techniques for multi-disk architectures have been used effectively for storage. In this paper, we first establish that, besides exploiting parallelism, the careful organization of each disk must be considered for fast searching. We introduce the notions of page allocation and data space mapping, which can be used to organize and retrieve multi-dimensional data. We develop these notions based on three different partitioning strategies: regular grids, concentric hypercubes, and hyperpyramids. We develop techniques that achieve efficient retrieval by optimizing the number of buckets retrieved by a query, disk arm movement, and I/O parallelism. We prove that concentric hypercube-based mapping achieves both optimal clustering and optimal parallelism. We also develop a technique based on hyperpyramid partitioning that reduces the number of buckets retrieved by a query and has efficient inter- and intra-disk organizations. We evaluate the performance of the proposed techniques by comparing them with current approaches. The new techniques yield very significant improvements over existing techniques and result in fast retrieval of multi-dimensional data.
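As a point of reference for what declustering buys, the sketch below allocates a regular grid partition across disks using the classic Disk Modulo scheme; the concentric-hypercube and hyperpyramid mappings developed in the paper are different techniques and are not reproduced here.

```python
# Minimal sketch of declustering a regular grid partition across M disks
# with the classic Disk Modulo scheme. Illustration only; not the paper's
# concentric-hypercube or hyperpyramid mappings.
from collections import Counter

M = 4  # number of disks

def disk_of(cell):
    """Disk Modulo: grid cell (i1, ..., id) goes to disk (i1+...+id) mod M."""
    return sum(cell) % M

def cells_in_range(lo, hi):
    """Enumerate the 2-d grid cells (buckets) intersecting a range query."""
    for i in range(lo[0], hi[0] + 1):
        for j in range(lo[1], hi[1] + 1):
            yield (i, j)

# A 2x3 range query touches 6 buckets; Disk Modulo spreads them so that at
# most ceil(6 / M) = 2 buckets land on any one disk, enabling parallel I/O.
load = Counter(disk_of(c) for c in cells_in_range((0, 0), (1, 2)))
print(load)  # Counter({1: 2, 2: 2, 0: 1, 3: 1})
```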
Cloud computing has emerged as a preferred platform for deploying scalable web applications. With the growing scale of these applications and the data associated with them, scalable data management systems form a crucial part of the cloud infrastructure. Key-value stores -- such as Bigtable, PNUTS, Dynamo, and their open source analogues -- have been the preferred data stores for applications in the cloud. In these systems, data is represented as key-value pairs, and atomic access is provided only at the granularity of single keys. While these properties work well for current applications, they are insufficient for the next generation of web applications -- such as online gaming, social networks, collaborative editing, and many more -- which emphasize collaboration. Since collaboration by definition requires consistent access to groups of keys, scalable and consistent multi-key access is critical for such applications. We propose the Key Group abstraction, which defines a relationship between a group of keys and is the granule for on-demand transactional access. This abstraction allows the Key Grouping protocol to collocate control for the keys in the group, enabling efficient access to the group of keys. Using the Key Grouping protocol, we design and implement G-Store, which uses a key-value store as an underlying substrate to provide efficient, scalable, and transactional multi-key access. Our implementation using a standard key-value store and experiments on a cluster of commodity machines show that G-Store preserves the desired properties of key-value stores while providing multi-key access at very low overhead.
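The following toy sketch (our illustration over a trivial in-memory store; it is not G-Store's actual Key Grouping protocol) shows the flavor of the Key Group abstraction: control for a group's keys is collocated behind a single lock, standing in for a single owning node, so operations spanning the group execute atomically.

```python
import threading

class KeyValueStore:
    """A toy key-value store with atomic access to single keys only."""
    def __init__(self):
        self._data, self._lock = {}, threading.Lock()
    def get(self, k):
        with self._lock:
            return self._data.get(k)
    def put(self, k, v):
        with self._lock:
            self._data[k] = v

class KeyGroup:
    """Sketch of the Key Group idea: ownership of a set of keys is
    collocated under one lock (standing in for one owning node), so
    reads and writes spanning the group are atomic. Illustration only."""
    def __init__(self, store, keys):
        self._store, self._keys = store, set(keys)
        self._group_lock = threading.Lock()
    def transact(self, fn):
        # fn receives get/put views restricted to the group's keys.
        with self._group_lock:
            def get(k):
                assert k in self._keys
                return self._store.get(k)
            def put(k, v):
                assert k in self._keys
                self._store.put(k, v)
            return fn(get, put)

# Usage: atomically transfer in-game currency between two player keys.
kv = KeyValueStore()
kv.put("alice", 100); kv.put("bob", 40)
group = KeyGroup(kv, ["alice", "bob"])
group.transact(lambda get, put: (put("alice", get("alice") - 30),
                                 put("bob", get("bob") + 30)))
print(kv.get("alice"), kv.get("bob"))  # 70 70
```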
Cloud computing has become a very successful paradigm for data computing and storage. However, concerns about data security and privacy in the cloud keep growing. Ensuring security and privacy for data management and query processing in the cloud is critical for better and broader use of the cloud. This tutorial covers recent research on cloud security and privacy, focusing on works that protect data confidentiality and query access privacy for sensitive data stored and queried in the cloud. We provide a comprehensive study of state-of-the-art schemes and techniques for protecting data confidentiality and access privacy, and explain their tradeoffs in security, privacy, functionality, and performance.
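As a baseline for the confidentiality techniques such tutorials survey, the sketch below shows client-side encryption before upload, using the widely used `cryptography` package's Fernet API. This is our illustration, not a scheme from the tutorial, and it highlights the central tradeoff: plain encryption protects confidentiality but prevents the server from evaluating queries over the data.

```python
# Baseline data confidentiality in the cloud (illustration only): the
# client encrypts before upload and keeps the key, so the cloud provider
# only ever sees ciphertext. Fernet provides authenticated encryption.
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # stays with the data owner, never uploaded
f = Fernet(key)

ciphertext = f.encrypt(b"patient_id=42, diagnosis=...")  # stored in the cloud

# The tradeoff surveyed in this line of work: the server cannot query
# ciphertext, which motivates searchable/queryable encryption schemes.
assert f.decrypt(ciphertext) == b"patient_id=42, diagnosis=..."
```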
Rapidly improving computing and networking technology enables enterprises to collect data from virtually all of their business units. The main challenge today is to extract useful information from an overwhelmingly large amount of raw data. Data warehouses were introduced to support complex analysis queries. They manage data extracted from the different operational databases and from external data sources, and they are optimized for fast query processing. It is common for modern data warehouses to manage terabytes of data. According to a recent survey by the Winter Corporation (2003), for instance, the decision support database of SBC reached a size of almost 25 terabytes, up from 10.5 terabytes in 2001 (Winter Corporation, 2001).
The authors propose an approach to executing transactions in heterogeneous distributed databases. Instead of the traditional approach of executing global transactions by remotely accessing distributed data, they propose that transactions be executed locally and that data be dynamically migrated to the appropriate sites. They thus eliminate the need for global transactions. Since there are no global transactions, the problem of distributed commitment does not arise; this is an important issue related to database recovery that is often ignored by protocols for transaction processing in heterogeneous databases. A special protocol is executed for migrating data objects, and the authors present a protocol for localizing access to a data object.
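A toy sketch of the execute-locally-by-migrating-data idea follows (our illustration under simplified assumptions, not the authors' actual migration or localization protocol): before a transaction runs, the executing site pulls every object it needs from the owning sites, after which commit is a purely local decision and no distributed commit protocol is required.

```python
# Toy sketch: each site holds some objects; before a transaction runs,
# the local site requests migration of every object it needs, then
# executes with only local resources. Illustration only.

class Site:
    def __init__(self, name):
        self.name, self.objects = name, {}

    def migrate_in(self, obj_id, sites):
        """Pull obj_id from whichever site currently owns it."""
        for s in sites:
            if obj_id in s.objects:
                self.objects[obj_id] = s.objects.pop(obj_id)
                return
        raise KeyError(obj_id)

    def run_transaction(self, needed, fn, sites):
        # Localize: migrate every needed object to this site first.
        for obj_id in needed:
            if obj_id not in self.objects:
                self.migrate_in(obj_id, sites)
        # All data is now local, so commit is a purely local decision.
        fn(self.objects)

# Usage: transfer between objects initially stored at different sites.
a, b = Site("A"), Site("B")
a.objects["x"] = 100
b.objects["y"] = 50
a.run_transaction(["x", "y"],
                  lambda o: (o.__setitem__("x", o["x"] - 10),
                             o.__setitem__("y", o["y"] + 10)),
                  [a, b])
print(a.objects)  # {'x': 90, 'y': 60}
```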