Keys for XML
Authors: Peter Buneman (University of Pennsylvania), Susan Davidson (University of Pennsylvania), Wenfei Fan (Temple University), Carmem Hara (Universidade Federal do Parana, Brazil), Wang-Chiew Tan (University of Pennsylvania)
WWW '01: Proceedings of the 10th international conference on World Wide Web, May 2001, Pages 201–210. https://doi.org/10.1145/371920.371984. Online: 01 April 2001.
We investigate the problem of aligning two RDF databases, an essential problem in understanding the evolution of ontologies. Our approaches address three fundamental challenges: 1) the use of "blank" (null) names, 2) ontology changes in which different names are used to identify the same entity, and 3) small changes in the data values as well as small changes in the graph structure of the RDF database. We propose approaches inspired by the classical notion of graph bisimulation and extend them to capture the natural metrics of edit distance on the data values and the graph structure. We evaluate our methods on three evolving curated data sets. Overall, our results show that the proposed methods perform well and are scalable.
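To make the notion of graph bisimulation mentioned above concrete, here is a minimal sketch of classical bisimulation computed by naive partition refinement over labelled triples. It is not the authors' method: the edit-distance extensions are not covered, and the function name `bisimulation_classes`, the `initial` labelling, and the toy triples are all assumptions made for illustration.

```python
from collections import defaultdict

def bisimulation_classes(nodes, edges, initial=None):
    """Naive partition refinement (illustrative sketch, not the paper's algorithm).

    nodes:   iterable of node ids
    edges:   iterable of (source, label, target) triples; targets must be in nodes
    initial: optional dict node -> starting label (e.g. a literal's value);
             nodes without an entry (e.g. blank nodes) start in one shared block
    Returns a dict node -> class id; equal ids mean the nodes are bisimilar.
    """
    initial = initial or {}
    nodes = list(nodes)
    out = defaultdict(list)
    for s, p, o in edges:
        out[s].append((p, o))

    # Starting partition: group nodes by their initial label.
    ids = {}
    block = {n: ids.setdefault(initial.get(n), len(ids)) for n in nodes}

    while True:
        ids = {}
        new_block = {}
        for n in nodes:
            # A node's signature is the set of (edge label, block of target).
            sig = frozenset((p, block[o]) for p, o in out[n])
            new_block[n] = ids.setdefault((block[n], sig), len(ids))
        # Refinement only ever splits blocks, so an unchanged count is a fixpoint.
        if len(ids) == len(set(block.values())):
            return new_block
        block = new_block

# Hypothetical data: two blank nodes from different versions with the same
# outgoing structure, plus a third blank node that differs.
nodes = ["_:a1", "_:b1", "_:c", "Alice", "30", "Bob"]
edges = [
    ("_:a1", "name", "Alice"), ("_:a1", "age", "30"),
    ("_:b1", "name", "Alice"), ("_:b1", "age", "30"),
    ("_:c", "name", "Bob"),
]
classes = bisimulation_classes(
    nodes, edges, initial={"Alice": "Alice", "30": "30", "Bob": "Bob"}
)
```

In this toy input the blank nodes `_:a1` and `_:b1` end up in the same class, which is the sense in which bisimulation can align nodes that have no stable name.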
Archiving is important for scientific data, where it is necessary to record all past versions of a database in order to verify findings based upon a specific version. Much scientific data is held in a hierarchical format and has a key structure that provides a canonical identification for each element of the hierarchy. In this article, we exploit these properties to develop an archiving technique that is both efficient in its use of space and preserves the continuity of elements through versions of the database, something that is not provided by traditional minimum-edit-distance diff approaches. The approach also uses timestamps. All versions of the data are merged into one hierarchy where an element appearing in multiple versions is stored only once along with a timestamp. By identifying the semantic continuity of elements and merging them into one data structure, our technique can provide meaningful change descriptions, and the archive allows us to easily answer certain temporal queries, such as retrieving any specific version from the archive or finding the history of an element. This is in contrast with approaches that store a sequence of deltas, where such operations may require undoing a large number of changes or significant reasoning with the deltas. A suite of experiments also demonstrates that our archive does not incur any significant space overhead when contrasted with diff approaches. Another useful property of our approach is that we use XML format to represent hierarchical data and the resulting archive is also in XML. Hence, XML tools can be directly applied on our archive. In particular, we apply an XML compressor on our archive, and our experiments show that our compressed archive outperforms compressed diff-based repositories in space efficiency.
We also show how we can extend our archiving tool to an external memory archiver for higher scalability and describe various index structures that can further improve the efficiency of some temporal queries on our archive.
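The merge-by-key idea described in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' archiver: versions are modelled as nested Python dicts whose dictionary keys stand in for the key structure, timestamps are stored as plain sets of version numbers, and the names `new_node`, `merge`, `checkout`, and `history` are hypothetical.

```python
def new_node():
    # One archive node: the versions it exists in, leaf values seen
    # (value -> versions holding that value), and keyed children.
    return {"versions": set(), "values": {}, "children": {}}

def merge(node, version, tree):
    """Merge one version into the archive, storing each keyed element once."""
    node["versions"].add(version)
    if isinstance(tree, dict):
        for key, sub in tree.items():
            child = node["children"].setdefault(key, new_node())
            merge(child, version, sub)
    else:
        # Leaf: record which versions carry this (hashable) value.
        node["values"].setdefault(tree, set()).add(version)

def checkout(node, version):
    """Retrieve a specific version from the archive."""
    for value, versions in node["values"].items():
        if version in versions:
            return value
    return {
        key: checkout(child, version)
        for key, child in node["children"].items()
        if version in child["versions"]
    }

def history(node, path):
    """Return the versions in which the element at the given key path exists."""
    for key in path:
        node = node["children"][key]
    return sorted(node["versions"])

# Hypothetical data: two versions of a small keyed hierarchy.
v1 = {"emp:1": {"name": "Ann", "salary": 50}, "emp:2": {"name": "Bob", "salary": 40}}
v2 = {"emp:1": {"name": "Ann", "salary": 55}, "emp:3": {"name": "Cy", "salary": 30}}

archive = new_node()
merge(archive, 1, v1)
merge(archive, 2, v2)
```

Because elements are matched by key rather than by minimum edit distance, `emp:1` is stored once across both versions and only its changed `salary` value is recorded twice; retrieving a version or an element's history is a single traversal rather than a replay of deltas.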
We present a type system that naturally couples two different, and apparently contradictory, notions of inheritance that occur in object-oriented databases. To do this we distinguish between the type and a kind of a value: A type describes the entire structure of a value, while a kind describes only the availability of certain fields or methods. This distinction allows us to manipulate heterogeneous collections (collections of values with differing types) in a statically type-checked language. Moreover, the type system is polymorphic and types may be inferred using an extension of the technique used in ML. This means that it is easy to express general-purpose operations for the manipulation of heterogeneous collections. We believe that this system not only provides a natural approach to static type-checking in object-oriented databases; it also offers a technique for dealing with external databases in a statically typed language.
The DARPA Intelligent Integration of Information (I3) effort is based on the assumption that systems can easily exchange data. However, as a consequence of the rapid development of research and prototype implementations in this area, the initial outcome of this program appears to have been to produce a new set of systems. While they can perform certain advanced information integration tasks, they cannot easily communicate with each other.
With a view to understanding and solving this problem, there was a group discussion at the DARPA Intelligent Integration of Information/Persistent Object Bases (I3/POB) meeting in San Diego in January 1996, and a further workshop was held on this topic at the University of Maryland in April 1996. The list of participants is in Appendix A. The idea emerging from these meetings was not to force all systems to communicate according to specified standards, but to agree on the following:
• A minimal core language, or Level 1 option, which would be a restriction of the object-oriented query language OQL, such that it will accept queries for relational databases. We recommend that all system components be able, at a minimum, to accept queries in this syntax, provided they address concepts (e.g., relations or classes, attributes or instance variables) known to that component. There must be a simple protocol to determine the schema of a system (its set of supported concepts).
• A simple format for representing answers. This could also be a fragment of OQL and will be included in the core language specification.
• A set of extensions, one of which could be full OQL, and would handle complex structures and abstract types (with methods). Other extensions will be needed to support rules (e.g., definitions of terms that can be shared among components), semistructured data (for self-describing objects), and shared code.
A system component could support one or more of these extensions independently, and there should be some simple protocol to determine the particular extensions that are supported.
The concept of the Internet Exchange Point (IXP), an Ethernet fabric central to the structure of the global Internet, is largely absent from the development of community-driven collaborative network infrastructure. The reasons for this are two-fold. IXPs exist in central, typically urban, environments where strong network infrastructure ensures high levels of connectivity. Between rural and remote regions, where networks are separated by distance and terrain, no such infrastructure exists. In this paper we present RemIX, a distributed IXP architecture designed for the community network environment. We examine this praxis using an implementation in Scotland, with suggestions for future development and research.