Data Model for Analysis of Scholarly Documents in the MapReduce Paradigm

Adam Kawa, Lukasz Bolikowski, Artur Czeczko, Piotr Jan Dendek, Dominika Tkaczyk

2013

At CeON ICM UW, we hold a large collection of scholarly documents that we store and process using the MapReduce paradigm. One of the main challenges is to design a simple but effective data model that fits various data access patterns and allows us to perform diverse analyses efficiently. In this paper, we describe the organization of our data and explain how it is accessed and processed by open-source tools from the Apache Hadoop ecosystem.

Keywords:

Data model
Data access
Possession (law)
Data mining
Computer science
World Wide Web
