Data Model for Analysis of Scholarly Documents in the MapReduce Paradigm
2013
At CeON ICM UW, we hold a large collection of scholarly documents that we store and process using the MapReduce paradigm. One of the main challenges is to design a simple but effective data model that fits various data access patterns and allows us to perform diverse analyses efficiently. In this paper, we describe the organization of our data and explain how the data is accessed and processed by open-source tools from the Apache Hadoop ecosystem.