Building Scholarly Data Forest

Marko Požega, Dario Poljak, Kristina Kocijan

2016

In this paper, we demonstrate syntactic analysis and visualization of scientific data, namely references from scientific papers. Our main goal is to build a parser that extracts references from scientific papers, converts them to XML, passes them to a custom visualization algorithm, and presents them in a web interface as a ReferenceTree for a single author. For this process, we use several technologies: the NLP software NooJ and the programming languages PHP and JavaScript, in combination with HTML5. Our main problem was the dissimilarity of reference styles between articles. Thus, our parser was designed to recognize different reference sources (book, paper, web page) in the APA, MLA and Chicago reference styles. For the visualization, we chose to present an author as a tree, with the publication years as the main branches, the articles and books as twigs, and the references used in each article or book as the leaves. Books are grouped on the left side of the tree and articles on the right. In the final output, every processed author should have a unique tree (reflecting their preferences in references) that can be compared with the rest of the scientific forest.
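
The tree metaphor described above can be illustrated with a minimal sketch. It is not the authors' implementation (which uses NooJ, PHP and JavaScript). It only shows, in Python, one way already-parsed references might be arranged into the author / year / publication / reference hierarchy and serialized as XML. The `Publication` class, the `build_reference_tree` function and the element names `author`, `branch`, `twig` and `leaf` are all hypothetical and chosen here for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List
import xml.etree.ElementTree as ET


@dataclass
class Publication:
    """One parsed publication of the author (hypothetical record type)."""
    title: str
    year: int
    kind: str  # "book" or "article"
    references: List[str] = field(default_factory=list)


def build_reference_tree(author: str, publications: List[Publication]) -> ET.Element:
    """Arrange publications into an author -> year -> publication -> reference tree.

    Element and attribute names are assumptions made for this sketch,
    not the XML schema used in the paper.
    """
    root = ET.Element("author", name=author)

    # Group publications by year: each year becomes a main branch.
    by_year = defaultdict(list)
    for pub in publications:
        by_year[pub.year].append(pub)

    for year in sorted(by_year):
        branch = ET.SubElement(root, "branch", year=str(year))
        for pub in by_year[year]:
            # Books go on the left side of the tree, articles on the right.
            side = "left" if pub.kind == "book" else "right"
            twig = ET.SubElement(
                branch, "twig", kind=pub.kind, side=side, title=pub.title
            )
            # Each reference cited in the publication becomes a leaf.
            for ref in pub.references:
                leaf = ET.SubElement(twig, "leaf")
                leaf.text = ref
    return root
```

A tree built this way can be turned into an XML string with `ET.tostring(build_reference_tree(name, pubs), encoding="unicode")` and handed to whatever component draws it.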
