There are thousands of data repositories on the Web, providing access to millions of datasets. National and regional governments, scientific publishers and consortia, commercial data providers, and others publish data for fields ranging from social science to life science to high-energy physics to climate science and more. Access to this data is critical for facilitating reproducibility of research results, enabling scientists to build on others' work, and giving data journalists easier access to information and its provenance. In this paper, we discuss Google Dataset Search, a dataset-discovery tool that provides search capabilities over potentially all datasets published on the Web. The approach relies on an open ecosystem, where dataset owners and providers publish semantically enhanced metadata on their own sites. We then aggregate, normalize, and reconcile this metadata, providing a search engine that lets users find datasets in the "long tail" of the Web. We discuss both the social and technical challenges in building this type of tool, and the lessons that we learned from this experience.
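The "semantically enhanced metadata" that publishers embed is typically schema.org Dataset markup serialized as JSON-LD. The following is a minimal, illustrative sketch of that kind of markup; all concrete values (dataset name, URLs, keywords) are hypothetical examples, not records from the actual system.

```python
import json

# A minimal, illustrative schema.org/Dataset description of the kind
# publishers embed in their pages as JSON-LD. Every concrete value here
# (name, URLs, keywords) is a hypothetical example.
dataset_metadata = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example Climate Observations",
    "description": "Hypothetical daily temperature readings, 2010-2020.",
    "url": "https://example.org/datasets/climate-observations",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "keywords": ["climate", "temperature"],
}

# Serialize to JSON-LD text, as it would appear inside a page's
# <script type="application/ld+json"> element.
jsonld = json.dumps(dataset_metadata, indent=2)
print(jsonld)
```

A crawler can parse such blocks with any JSON parser, which is what makes this approach far more robust than scraping HTML layouts.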
In this paper we present our current experience of aggregating user data from various Social Web applications and outline several key challenges in this area. The work is based on a concrete use case: reusing activity streams to determine a viewer's interests and generating television programme recommendations from those interests. Three system components are used to realise this goal: (1) an intelligent remote control, the iZapper, for capturing viewer activities in a cross-context television environment; (2) a backend, the BeanCounter, for aggregating viewer activities from the iZapper and from different Social Web applications; and (3) a recommendation engine, iTube, for recommending relevant television programmes. The paper focuses on the BeanCounter as the first step towards applying Social Web data to viewer and context modelling on the Web. This is work in progress within the NoTube project.
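The aggregation step the abstract assigns to the BeanCounter can be sketched as merging (source, topic) activity events into weighted viewer interests. The event shape, source names, and weights below are assumptions for illustration, not the NoTube implementation.

```python
from collections import Counter

# Hypothetical per-source weights: direct remote-control activity is
# assumed to signal interest more strongly than incidental Social Web
# activity. These values are illustrative assumptions.
SOURCE_WEIGHTS = {"izapper": 2.0, "microblog": 1.0, "bookmarks": 0.5}


def aggregate_interests(activities):
    """Merge (source, topic) activity events into a topic -> score map."""
    scores = Counter()
    for source, topic in activities:
        scores[topic] += SOURCE_WEIGHTS.get(source, 1.0)
    return dict(scores)


# A small stream of events from three hypothetical sources.
activities = [
    ("izapper", "documentary"),
    ("microblog", "documentary"),
    ("bookmarks", "cooking"),
]
interests = aggregate_interests(activities)
print(interests)  # documentary (3.0) outranks cooking (0.5)
```

A recommendation engine such as iTube could then rank candidate programmes against this topic-to-score map.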
Separation between content and presentation has always been one of the important design aspects of the Web. Historically, however, even though most Web sites were driven by structured databases, they published their content purely in HTML. Services that operated on this content, such as Web search, price comparison, and reservation engines, had access only to HTML. Applications requiring access to the structured data underlying these Web pages had to build custom extractors to convert plain HTML into structured data. These efforts were often laborious, and the resulting scrapers were fragile and error-prone, breaking every time a site changed its layout.
SUMMARY: This article introduces the Friend Of A Friend (FOAF) vocabulary specification as an example of a Semantic Web technology. A real-world case study is presented in which FOAF is used to solve several specific problems of identity management. The main goal is to provide some basic theory behind the Semantic Web and then attempt to ground that theory in a practical solution. KEYWORDS: Semantic Web; FOAF; Friend Of A Friend; Social Networks
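One way FOAF supports identity management is through inverse-functional properties such as foaf:mbox_sha1sum, which lets two profiles be recognised as describing the same person without publishing the mailbox itself. The sketch below generates such a property value and a minimal FOAF description in Turtle; the names and mailbox are hypothetical.

```python
import hashlib

# foaf:mbox_sha1sum is defined as the SHA-1 hash of the full mailbox
# URI (including the "mailto:" prefix). The address is a hypothetical
# example.
mbox = "mailto:alice@example.org"
mbox_sha1sum = hashlib.sha1(mbox.encode("ascii")).hexdigest()

# A minimal FOAF personal profile, serialized as Turtle text for
# illustration.
foaf_turtle = f"""\
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<#me> a foaf:Person ;
    foaf:name "Alice Example" ;
    foaf:mbox_sha1sum "{mbox_sha1sum}" ;
    foaf:knows [ a foaf:Person ; foaf:name "Bob Example" ] .
"""
print(foaf_turtle)
```

Because foaf:mbox_sha1sum is inverse-functional, any two FOAF documents carrying the same hash can be merged ("smushed") into one identity by a consuming agent.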
MedCERTAIN (MedPICS Certification and Rating of Trustworthy Health Information on the Net, http://www.medcertain.org/) is a recently launched international project funded under the European Union's (EU) "Action Plan for safer use of the Internet". It provides a technical infrastructure and a conceptual basis for an international system of "quality seals", ratings, and self-labelling of Internet health information, with the final aim of establishing a "trustmark" for networked health information. Digital "quality seals" are evaluative metadata (using standards such as PICS, the Platform for Internet Content Selection, now being replaced by RDF/XML) assigned by trusted third-party raters. The project also enables and encourages self-labelling with descriptive meta-information by web authors. Together, these measures will help consumers as well as professionals to identify high-quality information on the Internet. MedCERTAIN establishes a fully functional demonstrator for a self- and third-party rating system, enabling consumers and professionals to filter out harmful health information and to positively identify and select high-quality information. We aim to provide a system that allows citizens to place greater trust in networked information, exemplified in the domain of health information, while also making a significant contribution to similar projects with different target domains. The project will demonstrate how PICS-based content rating and filtering technologies can automate and exploit value-adding resource description services. It further proposes standards for the interoperability of rating services.
The role that ontologies play or can play in designing and employing semantic technologies has been widely acknowledged by the Semantic Web and Linked Data communities. But the level of collaboration between these communities and the Applied Ontology community has been much less than expected. Also, ontologies and ontological techniques appear to be of marginal use in Big Data and its applications. To understand this situation and foster greater collaboration, Ontology Summit 2014 brought together representatives from the Semantic Web, Linked Data, Big Data, and Applied Ontology communities to address three basic problems involving applied ontology and these communities: (1) the role of ontologies in these communities, (2) current uses of ontologies in these communities, and (3) engineering of ontologies and semantic integration. The intent was to identify and understand: (a) causes and challenges (e.g. scalability) that hinder reuse of ontologies in the Semantic Web and Linked Data, (b) solutions that can reduce the differences between ontologies on and off line, and (c) solutions to overcome engineering bottlenecks in current Semantic Web and Big Data applications. Over the past four months, presentations from, and discussions with, representatives of the Semantic Web, Linked Data, and Applied Ontology communities have taken place across four tracks. Each track focused on a different aspect of this year's Summit topic: (Track A) investigation of sharable and reusable ontologies; (Track B) tools, services, and techniques for a comprehensive and effective use of ontologies; (Track C) investigation of the engineering bottlenecks and the ways to prevent and overcome them; (Track D) enquiry on the variety problem in Big Data. In addition to the four tracks' activities there was a Hackathon. Six different Hackathon projects took place, all available at their individual public project repositories. An online Community Library and an online Ontology Repository have been created as freely accessible Community resources. This Ontology Summit 2014 Communique presents a summary of the results, original in its attempt both to merge different communities' discourses and to achieve consensus across the Summit participants with respect to open problems and recommendations to address them.