Modeling the structure and evolution of scholarly knowledge

Grantee: Indiana University

Project Lead	Katy Börner, Ph.D.
Amount	$420,164
Year Awarded	2005
Duration	3 years
DOI	https://doi.org/10.37717/220020085
Summary	Today, humanity's scholarly knowledge is stored not only in a rapidly increasing number of scholarly publications but also in the minds of living scientists. No man or machine can process this enormous amount of data and hence most of the knowledge is reinvented, duplicated across sciences, or simply lost forever after a short period of time. However, to survive as a species, we will need to preserve our planet or sustain life as we know it by other means. Besides achieving survival, we should aim to enable all human beings to live a healthy, productive, and fulfilling life. Meeting these challenges requires more efficient access to humanity's collective knowledge and tools that support the 'global brain' that is emerging on our planet. This project is part of a larger initiative to chart and communicate the evolution of science on a global scale and to provide more effective means of accessing and managing humanity's scholarly knowledge and expertise. It aims to study science using the tools and methods of science as suggested by Derek J. de Solla Price more than 40 years ago. In particular, we propose to develop a comprehensive, multidisciplinary approach to model the structure and evolution of scholarly knowledge. Recent work in network science aims to analyze and model the statistical and structural properties of paper-citation or co-author datasets. Of particular interest is the identification of elementary mechanisms that lead to the emergence of small-world and scale-free network structures. This project attempts to develop and validate agent-based, computational models that describe the dynamically evolving structure of scholarly knowledge ecologies. 
In particular, we are interested in analyzing and modeling the co-evolution of author and paper networks, the merging and splitting of existing scientific topics, the emergence of novel topics, the diffusion of reputations, and also the diffusion of scientific concepts via co-authorship and via the production and consumption of papers. We propose to extend our previously developed TARL (Topics, Aging, and Recursive Linking) model of scholarly scientific growth (Börner, Maru and Goldstone, 2004). This computational model simulates individual scientists and their production and consumption of scholarly works. The original model assigned one topic selected from a fixed set of topics to each author and paper. However, one of the most important aspects of science is the spontaneous emergence of new scientific concepts. New topics and scientific disciplines might surface as existing topics differentiate into more refined specializations, e.g., HIV research as a part of immunology. Other topics, like nanotechnology, synthesize knowledge from different domains of science, such as engineering, chemistry, and physics. As science becomes more interdisciplinary, a single author will often possess knowledge in more than one topic area. Informed by these trajectories in the evolution of science, our extended model will simulate the specialization, unification, and shift of scientific concepts over time. Our elaborated representation of authors will include their geographic and topical position. More importantly, the location authors have in the coupled network of authors and papers will determine their access to high-quality knowledge and expertise, their ability to collaborate with the best co-authors, their effectiveness in diffusing their knowledge, and ultimately their reputation. Rather than treating an author's knowledge as a set of independent and isolated concepts, the extended TARL model will represent authors' expertise by webs of interdependent concepts.
Scientific communities, then, are social networks of conceptual networks. Scientific communication resembles the collective alignment and 'patchwork' of conceptual networks. Empirical validation of the extended TARL model will use a database of about one million full-text papers covering two fields of science: physics (a 110-year Physical Review dataset) and computer/information science (using CiteSeer data). In addition, a 30-year Journal Citation Report dataset comprising Science Citation Index and Social Science Citation Index will be employed to validate the model at the level of journals. We will analyze and compare these empirical corpora and simulated data in terms of general statistics, local and global network properties, and general laws that describe the mechanisms of science. Information visualization and cartographic techniques will be employed and further developed to visualize and communicate the diverse growth and interaction patterns of networks, as well as the diffusion of information over time, geographic space, and topic space. Domain experts will be consulted to validate data analysis results and to optimize visualizations. The validated model is expected to formally reproduce and help analyze the dynamic characteristics of scholarly networks and knowledge diffusion. It will help answer questions such as: How will the structure of science evolve? How do authors and author communities use and contribute to the existing body of scholarly knowledge? How do scientific topics split and merge and how do new topics emerge? How does the reputation of authors and author teams diffuse? How do scientific concepts diffuse via co-authorship and paper-citation links? What is the influence of geospatial proximity and semantic proximity on the two above-mentioned diffusion processes? What are the temporal dynamics of co-evolving author-paper networks?
The research is timely. Progress in applying the tools of science to understand science itself is facilitated by access to high-volume and high-quality data sets of scientific output, as well as computers and algorithms capable of handling this enormous stream of data. A deeper understanding of the basic principles of the evolution and diffusion of scholarly knowledge will facilitate the design and support of more effective research structures. The final outcome will be better support of knowledge production, access, and dissemination. By modeling the evolution and diffusion of concepts in dynamically evolving network ecologies, the proposed research will advance methodological frontiers in agent-based modeling, cognitive science, complex network analysis, information dynamics, scientometrics, information diffusion theories, and information visualization. Findings of this project will have practical value for researchers and educators, research managers, grant agencies, and society, all of whom would benefit from a global view of the structure and evolution of science. A workshop and summer school on 'Modeling Scholarly Knowledge' organized by the PIs in summer 2008 will bring together leading experts in this field, disseminate results of this project, and train interested researchers and students in using the developed models.
