21st Century Science Initiative Grant: Studying Complex Systems
The technology of social media impacts our society on a planetary scale: revolutions such as the recent unrest in many Arab countries are triggered and coordinated by Facebook posts; Twitter informs us of an earthquake and mobilizes help more rapidly than official organizations; in the aggregate, microblog posts predict stock market fluctuations, movie box-office earnings, and election outcomes. Even scientists increasingly use social media to share discoveries within the scientific community. Social media may have a profound impact on the acceleration of innovation and the breakdown of disciplinary boundaries, but like any technology they are fraught with inherent dangers as well. Used to spread false information, spam, and malware, social media also affect our political discourse, making it easier to retreat into echo chambers of people who think like us and thus facilitating the polarization of public opinion.
It has been argued that information flows are key determinants of the functioning and the structure of society. From this thesis it follows that the study of information flows may elucidate the complex working mechanisms behind human organizations. With this motivation, our project explores one key question: How do ideas propagate through complex online social networks?
As an increasing portion of human activities shifts to the online world, social media have become a powerful lens for observing broad swaths of human behavior: whom we befriend, how we communicate, what information we consume, produce, and propagate. The dynamics of this vast, networked, complex system emerge from the interactions of countless individuals. Despite the importance of online social networks in our society, our understanding of the mechanisms regulating the collective dynamics of online information diffusion is still in its infancy. The good news is that, thanks to the availability of social media data, the study of the mathematical regularities behind complex patterns of human dynamics is feasible for the first time. In particular, our access to massive data streams from microblogging networks such as Twitter, Yahoo! Meme, and Google Buzz allows us to revisit the assumptions made in classical models of information diffusion. The proposed project directly answers a call for a computational social science, a radically new way to study social phenomena via a complex systems approach. Our coupled data analysis and in silico modeling efforts seek to understand the complex emergent patterns generated by online human interaction through the simulation of stylized agent behaviors.
There is a long tradition of modeling the spread of information as an epidemic diffusion process. A piece of information can pass from one individual to another through social contact, and ‘infected’ individuals can, in turn, propagate it to others, possibly generating a full-scale contagion. Although noticeable progress has been made in the last decade in the study of idea contagion, most approaches remain rather speculative and borrow heavily from the vast literature on the spread of infectious disease. The field has struggled with the lack of large-scale data and with the intrinsic difficulties of quantitatively modeling any process of social contagion. While the analogy with biological epidemics is conceptually appealing, information contagion encompasses many more facets than its biological counterpart. For instance, almost no social network can be considered a closed system; therefore, a mixture of endogenous and exogenous factors shapes how information spreads. Broadcasts by traditional media may also have enormous influence on the speed and duration of diffusion processes, and the mutual interaction and feedback loops between traditional and online media further complicate the modeling of exogenous factors. A more practical obstacle is that the structure of the underlying social network is often unknown. Because of these difficulties, we lack commonly accepted, empirically validated, and increasingly detailed models that are amenable to producing testable hypotheses. So far, it has been problematic to discriminate between competing models of the microscopic processes that drive the diffusion of information.
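To make the epidemic analogy concrete, the following is a minimal sketch of one classical way to simulate it: an independent-cascade style process in which each newly ‘infected’ user gets a single chance to pass an idea to each follower. It assumes the follower graph is known (which, as noted above, is often not the case in practice) and uses a single adoption probability `p` for everyone. The function name `independent_cascade` and the toy network are hypothetical and illustrative only, not a model proposed in this project.

```python
import random
from collections import deque


def independent_cascade(graph, seed_user, p, rng=None):
    """Simulate one idea cascade on a follower graph (hypothetical helper).

    graph: dict mapping each user to the list of users who see their posts.
    seed_user: the user who first posts the idea.
    p: probability that an exposed user adopts and re-shares the idea.
    Returns the set of users who ended up 'infected'.
    """
    if rng is None:
        rng = random.Random()
    infected = {seed_user}
    frontier = deque([seed_user])
    while frontier:
        user = frontier.popleft()
        # Each newly infected user gets one chance to pass the idea to each follower.
        for follower in graph.get(user, []):
            if follower not in infected and rng.random() < p:
                infected.add(follower)
                frontier.append(follower)
    return infected


# Hypothetical toy network: a's posts are seen by b and c, and so on.
toy_graph = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "d": [], "e": []}

# Repeat the process with different random seeds to estimate cascade sizes.
sizes = [len(independent_cascade(toy_graph, "a", 0.3, random.Random(i)))
         for i in range(1000)]
```

A uniform `p` is a strong simplification: it ignores the user influence, topical interests, and exogenous broadcasts discussed above, which is precisely why such biologically inspired models are hard to validate against real data.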
The advent of the Web and social media has given us an opportunity to study social phenomena in more quantitative ways. For example, in our group we have analyzed Web traffic to understand how we satisfy our information needs. We have examined how our attention is indirectly driven by Web authors, as their links are aggregated by the ranking algorithms employed by search engines. Finally, we have explored sudden bursts of global attention as revealed by the temporal shifts in the trends of Wikipedia topics. Data from microblogging networks are now providing researchers with a much more direct probe into the dynamics of social communication. Twitter in particular has generated much attention due to its peculiar features, enormous popularity, and data sharing policy.
To carry out the proposed research, we will develop a computational infrastructure that will enable the study of meme diffusion in large-scale social media by collecting, analyzing, classifying, visualizing, and modeling massive streams of public microblogging data. The results of this effort will then drive the development of general models for the behavior of users and the spread of ideas in social networks.
We will start by analyzing diffusion patterns for ideas, focusing on cascade size distributions and popularity time series. The driving questions of our analysis will be: Why do some ideas cause viral explosions while others do not receive any attention? Why do some topics remain popular for months while others are forgotten in a matter of minutes? Is it possible to identify a set of representative behaviors that can help us design quantitative, predictive models of idea contagion? To what extent do online information diffusion processes lend themselves to being modeled like epidemics?
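As a rough illustration of the two quantities named above, the sketch below computes a cascade size distribution and a popularity time series from a stream of posts. It assumes a hypothetical input format of `(meme_id, timestamp_seconds)` pairs, where a meme might be, for example, a hashtag, and it equates the cascade size of a meme with the number of posts mentioning it. Both the format and that definition are simplifying assumptions for illustration, not the project's actual definitions; the function names are likewise hypothetical.

```python
from collections import Counter


def cascade_size_distribution(posts):
    """Return the complementary cumulative distribution (CCDF) of cascade sizes.

    posts: iterable of (meme_id, timestamp_seconds) pairs (hypothetical schema).
    Cascade size is assumed here to be the number of posts per meme.
    Result maps each observed size s to the fraction of memes with size >= s.
    """
    sizes = Counter(meme for meme, _ in posts)
    n = len(sizes)
    size_counts = Counter(sizes.values())  # how many memes have each size
    ccdf = {}
    remaining = n
    for s in sorted(size_counts):
        ccdf[s] = remaining / n
        remaining -= size_counts[s]
    return ccdf


def popularity_time_series(posts, meme_id, bin_seconds=3600):
    """Count posts about one meme in fixed-width time bins (hourly by default)."""
    bins = Counter(int(t // bin_seconds) for m, t in posts if m == meme_id)
    if not bins:
        return []
    first, last = min(bins), max(bins)
    # Include empty bins so bursts and decay are visible in the series.
    return [bins.get(b, 0) for b in range(first, last + 1)]
```

Plotting the CCDF on log-log axes is a common way to check whether cascade sizes are heavy-tailed, and comparing the shapes of the time series (sharp spike versus sustained plateau) speaks directly to the question of why some topics stay popular far longer than others.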
Social media are blurring the boundaries between information production and consumption. Traditionally these roles were clearly distinct, whereas today anyone can, with a simple click, create content and share it with anyone who cares to read it. Can we identify different roles in this new landscape? Do we observe patterns similar to traditional mass media, where there are only a few broadcasters and masses of passive consumers of information? Does everyone play both roles?
In our modeling effort we will explore several key ingredients that might explain the observed information contagion patterns: the social network on which ideas travel; the role and influence of users; the topical interests of users as reflected in the content of their information exchanges; competition for limited attention; and the aging of information. All of these elements interact with each other. For instance, our social links change with our interests; the influence of users depends on their social status and topical expertise; and our friends and the information to which we are exposed affect our interests. While all of these mechanisms can be explored through agent-based models, it is clear that such processes are much more complex than, for example, the spread of a virus like H1N1. Quantitatively modeling the basic ingredients of such a process along with their interplay is the crucial challenge that the present proposal aims to meet.
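A toy agent-based model can illustrate how two of these ingredients, competition for limited attention and the aging of information, might be combined. In this hypothetical sketch, each agent can attend only to its `memory_size` most recent memes, so newer items push older ones out; at each step a randomly chosen agent either introduces a brand-new meme with probability `p_new` or re-shares one it has recently seen, and the post lands in its followers' memories. The function and parameter names are illustrative assumptions, and the model deliberately omits user influence, topical interests, and network evolution.

```python
import random
from collections import Counter, deque


def simulate_limited_attention(followers, steps, p_new=0.2, memory_size=5, seed=None):
    """Toy agent-based model of meme spread under limited attention (hypothetical).

    followers: dict mapping each user to the users who see that user's posts.
    p_new: probability that an agent posts a new meme instead of re-sharing.
    memory_size: number of recent memes an agent can attend to; older items
        drop out, a crude stand-in for the aging of information.
    Returns a Counter mapping meme id -> number of times it was posted.
    """
    rng = random.Random(seed)
    # Include users who only appear as followers; sort for reproducible runs.
    users = sorted(set(followers).union(*followers.values()))
    screens = {u: deque(maxlen=memory_size) for u in users}
    popularity = Counter()
    next_meme = 0
    for _ in range(steps):
        user = rng.choice(users)
        if not screens[user] or rng.random() < p_new:
            meme = next_meme  # introduce a brand-new idea
            next_meme += 1
        else:
            meme = rng.choice(list(screens[user]))  # re-share something seen recently
        popularity[meme] += 1
        for f in followers.get(user, []):
            # Newest item enters at the front; the oldest falls off the back.
            screens[f].appendleft(meme)
    return popularity
```

Using a bounded deque lets a single mechanism stand in for both finite attention and forgetting; a more faithful model would likely separate the two and add the other interacting ingredients listed above.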