Multiplex Network Approach for Scientific Articles

2015 
Lenovo Innovation Center
³CPqD - Telecom and IT Solutions
barbosal@dca.fee.unicamp.br, attux@dca.fee.unicamp.br, godoy@dca.fee.unicamp.br

Keywords: Assortativity, Digital Library, Graph Databases, Knowledge Discovery

1. INTRODUCTION

“Big Data” is a recent phenomenon characterized by the increasingly fast growth of available data across many areas of human activity, including Science, Engineering, Biology and others. Academic publications are no exception to this trend, and the rising number of scientific documents stimulates interest in new kinds of scientific discovery and calls for distinct ways to analyze and visualize such information.

In this work in progress, we are developing a method to represent and analyze metadata from scientific papers using multiplex complex networks. The proposed method uses text mining techniques to extract suitable data as attributes and then inserts them as elements of a complex network. The main goal is, given a set of scientific papers, to investigate how concepts, research institutions and individuals are related and to reflect on collaboration in the field under analysis. How are partnerships between authors and countries established? Is there any correlation between the topics researched in different countries? These are some of the questions we want to begin to tackle in this study.

2. IMPLEMENTATION

The proposed method is based on the technique defined in [1] to extract attributes (titles, authors, countries, keywords and publication year) and their relationships from scientific articles in PDF format, and then uses a graph database to process the data [2]. In this work we intend to analyze assortativity and similarities in the network formed by the extracted data. Previous studies have explored these approaches, but only for complex networks composed of a single node type. In contrast, we propose to apply bibliometric measures to a multiplex network, illustrated in Figure 1.
Thereby, it is possible to evaluate different views of collaboration in academia in order to identify information and patterns that are hidden a priori. This approach, described in Table 1, allows, for instance, a study of the correlation between publication dates and keywords, showing possible research trends.

Characteristic    Description
Directed          Relationship direction
Weighted          Number of occurrences of each relationship
Semantic          Sub-network induction by label filtering
Multiplex         Multiple node types and relationships
Graph Database    Supports the dynamic study of the network

Table 1. Proposed network profile

The analysis performed on scientific networks is based on:

• Degree assortativity – How are the connections of authors, countries and articles associated in the network?
• Discrete assortativity by year – How are keywords related according to the year of publication?
• Discrete assortativity by continent – Do countries tend to collaborate with countries from the same continent?
• Similarities – What is the research correlation between countries with respect to article keywords? Are the most cited authors (and their countries) also collaborative?

Background on assortativity and on network science can be found in [3] and [4], respectively.

Figure 1. Partial view of the scientific network

3. DISCUSSION

Preliminary results were obtained from an initial sample of 214 scientific articles related to the index term “collaborative” [1], gathered from the IEEE Xplore database. Positive degree assortativity for authors (r=0.788) and negative degree assortativity for countries (r=-0.116) might indicate that scientific collaboration is more likely to be established among individuals than among countries. Negative assortativity for articles (r=-0.327) indicates that the few most cited articles are associated with several less cited ones.
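The two coefficients used in this analysis can be illustrated with a minimal sketch. It is not the implementation used in this work, which relies on a graph database [2]; it is a plain Python rendering of the standard definitions (see [3]), under the simplifying assumption that edges are undirected and unweighted. The function names and the toy data are hypothetical and serve only to show how r behaves in extreme cases.

```python
def degree_assortativity(edges):
    """Pearson correlation between the degrees found at the two ends of each edge."""
    if not edges:
        return float("nan")
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Assumption: edges are undirected, so each one is counted in both directions.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    mean = sum(xs) / len(xs)
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) * (x - mean) for x in xs)
    # The coefficient is undefined when every node has the same degree.
    return cov / var if var else float("nan")


def discrete_assortativity(edges, category):
    """Assortativity coefficient for a categorical node attribute (e.g. continent)."""
    pair_count = {}
    for u, v in edges:
        # Count both directions so the mixing matrix is symmetric.
        for pair in ((category[u], category[v]), (category[v], category[u])):
            pair_count[pair] = pair_count.get(pair, 0) + 1
    total = sum(pair_count.values())
    if total == 0:
        return float("nan")
    labels = {a for a, _ in pair_count}
    # Fraction of edge ends that connect two nodes of the same category.
    same = sum(pair_count.get((c, c), 0) for c in labels) / total
    # Share of edge ends attached to each category.
    ends = {c: sum(n for (a, _), n in pair_count.items() if a == c) / total
            for c in labels}
    # Same-category fraction expected if edges ignored the categories.
    expected = sum(share * share for share in ends.values())
    # The coefficient is undefined when all nodes share one category.
    return (same - expected) / (1 - expected) if expected < 1 else float("nan")


# Hypothetical toy data, not taken from the sample analyzed in this work.
star = [("hub", "a"), ("hub", "b"), ("hub", "c")]
continent = {"BR": "South America", "AR": "South America",
             "FR": "Europe", "DE": "Europe"}
same_continent = [("BR", "AR"), ("FR", "DE")]
cross_continent = [("BR", "FR"), ("AR", "DE")]

print(degree_assortativity(star))
print(discrete_assortativity(same_continent, continent))
print(discrete_assortativity(cross_continent, continent))
```

In the star, a single hub is linked only to low-degree nodes, which is the fully disassortative case (r = -1) and the extreme form of the pattern reported for articles. When every edge stays inside one continent the discrete coefficient reaches 1, and when every edge crosses continents it reaches its minimum, which is -1 for two equally sized groups.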
Through the discrete assortativity method, it was possible to identify in the database a moderate propensity (r=0.307) of articles to cite papers written by authors from the same continent. Regarding the similarity analysis, citations are correlated with other aspects: authors and countries of the most cited articles do collaborate in their respective networks and are network hubs. Keywords from these articles are also widely adopted by authors. It was also possible to highlight some patterns in international collaboration, such as research alignment, based on article keywords and the information source, considering the country of origin.

In this initial phase of the work, we could observe that the multiplex network approach allows the same network to be analyzed from different points of view, using edge labels to induce subnetworks, apply metrics and then correlate the values found. The results of this work corroborate the idea that collaboration in the scientific world has a great impact on the dissemination of information and knowledge throughout the world, as well as on the researchers themselves.

The authors thank the National Council for Scientific and Technological Development (CNPq) for the financial support.

4. FUTURE WORK

• Expand the database in order to validate the results
• Analyze the geographic influence on scientific collaboration

5. REFERENCES

[1] Barbosa, L. M., Attux, R., Godoy, A. 2014. Uma analise de assortatividade e similaridade para Artigos Cientificos. XI Brazilian Symposium on Collaborative Systems, Curitiba, Brazil.
[2] Neo4j Graph Database. http://www.neo4j.org.
[3] Newman, M. 2002. Assortative Mixing Networks. University of Michigan. Santa Fe Institute.
[4] Baronchelli, A., Ferrer-i-Cancho, R., Pastor-Satorras, R., Chater, N., Christiansen, M. 2013. Networks in Cognitive Science. Elsevier.