
Reality Mining: Complex Social Systems

Sociology in the 21st Century

Trade-offs in traditional social data gathering

For over a century, social scientists have studied relatively small, cohesive social groups [Tönnies, (1887); Cooley, (1909)]. Interaction and relationship data began to be collected in earnest in the 1930s [Davis et al. (1941)], typically through surveys and by placing an observer in a particular social setting who continuously takes notes on the behavior of the group. Shown below are data collected by a human observer placed in the Western Electric Company, who studied the interaction patterns among twelve employees [Roethlisberger & Dickson (1939)]. This traditional method of conducting ethnographic research is still quite prevalent. It captures rich sociological data, yet it is constrained to a limited number of subjects simply because it is so time-consuming. However, a new method of collecting data on social systems has emerged with the prevalence of the internet. Today, physicists such as Lada Adamic can automatically collect large-scale social network datasets from digital information such as email [Adamic & Huberman, (2003)]. These networks represent a large number of people and have a variety of interesting properties, yet the rich interpersonal relationship information that was traditionally collected by the human observer has been lost.

Dealing with the inherent trade-offs between traditional ethnographic data and today's internet-enabled social network data has spawned attempts to generate data that are both rich and large-scale. Agent-based models, which simulate people's behavior in groups using simple rules, have been proposed as a solution to this dearth of data and detail. However, this approach has been seen not only as an oversimplification of human behavior but also, in many instances, as completely wrong. The latest models of gossip dissemination across an organization of agents assume that the agents move with Brownian motion, an assumption almost anyone would recognize as spurious [Moreno et al. (2004)].

The limitations of these methods help explain why social scientists, unlike almost any other type of scientist, are still conducting analysis and publishing papers on datasets collected well over fifty years ago [Freeman (2003)]. The massive technical breakthroughs of the past few decades that have revolutionized virtually every other science have yet to dramatically affect social science. The data collected by the human observer on the behavior of those twelve workers back in 1935 are still some of the best data a social network analyst can get today. However, we are beginning to enter another era of technical breakthrough, one that will manifest itself by outfitting each employee in tomorrow's electric company with his own personal "observer" that tirelessly logs everything he does. Sociologists are now becoming aware of the possibility that the data collected by the human observer of 1935 could be collected by today's pervasive mobile phone.

This new era of mobile communication technology has had truly global ramifications. More than one billion mobile phones were sold during 2003, six times the number of personal computers sold that year [Wood (2004), MM], or one new phone for every six people on Earth. Mobile phones are now available to the majority of people who earn more than $5 a day, making them the fastest technology adoption in mankind's history. The potential functionality of this ubiquitous infrastructure of mobile devices is also increasing dramatically. Many of these billion phones have a processor equivalent in power to the ones in our desktop computers just a decade ago. Phones are no longer constrained to placing and receiving voice calls, or even to simple calendar and address book applications. Now that hundreds of millions of people are carrying pocket-sized, networked computers throughout their daily lives, the possibilities are staggering.

New Instruments for Behavioral Data Collection

With the rapid adoption of mobile phones comes an opportunity to unobtrusively collect continuous data on human behavior [Himberg (2001), Mäntyjärvi (2004)]. The very nature of mobile phones makes them an ideal vehicle for studying both individuals and organizations: people habitually carry a mobile phone with them and use it as the medium for much of their communication. Now that handset manufacturers are opening their platforms to developers, standard mobile phones can be harnessed as networked wearable sensors. The information available from today's phones includes the user's location (cell tower ID), people nearby (repeated Bluetooth scans), communication (call and SMS logs), as well as application usage and phone status (idle, charging, etc.). However, because the phones themselves are networked, their functionality goes beyond that of a logging device that augments social surveys. Rather, phones can start being used as a means of social network intervention, supplying introductions between two proximate people who don't know each other but probably should.
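To make the kinds of logged data listed above more concrete, the following is a minimal sketch of how repeated Bluetooth scans might be stored and summarized as pairwise proximity counts. The record layout, the field names, the `dyad_counts` helper, and the sample values are all hypothetical illustrations; the source does not describe the actual log format used on the phones.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class BluetoothSighting:
    """One device discovered in one Bluetooth scan (hypothetical layout)."""
    observer: str        # ID of the phone that ran the scan
    seen: str            # Bluetooth ID of the device it discovered
    timestamp: datetime  # when the scan took place
    cell_tower: str      # cell tower ID the observer was attached to


def dyad_counts(sightings):
    """Count sightings per unordered pair of devices.

    A sighting of B by A and a sighting of A by B both count toward
    the same dyad, so the result is symmetric by construction.
    """
    counts = Counter()
    for s in sightings:
        if s.observer != s.seen:  # ignore a device "seeing" itself
            counts[frozenset((s.observer, s.seen))] += 1
    return counts


# Made-up sample log, for illustration only.
log = [
    BluetoothSighting("A", "B", datetime(2004, 10, 1, 9, 0), "tower-1"),
    BluetoothSighting("B", "A", datetime(2004, 10, 1, 9, 0), "tower-1"),
    BluetoothSighting("A", "C", datetime(2004, 10, 1, 9, 5), "tower-1"),
]
counts = dyad_counts(log)
```

Keying the counts on an unordered pair is one possible design choice; it treats proximity as mutual even when only one of the two phones happened to detect the other.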

Research is being pursued to develop a new infrastructure of devices that are not only aware of each other but also infused with a sense of social curiosity. Work is ongoing to create devices that attempt to figure out what is being said, infer the type of relationship between the two people, and even suggest additional subjects to discuss. These devices see what the user sees, hear what the user hears, and are beginning to learn patterns in people's behavior. This enables them to make inferences regarding whom the user knows, whom the user likes, and even what the user may do next. Although a significant number of sensors and a great deal of machine perception are required, it will only be a matter of a few years before this functionality is realized on standard mobile phones.

Self-Report vs. Observations from Mobile Phones

In return for the use of the Nokia 6600 phones, students have been asked to fill out web-based surveys regarding their social activities and the people they interact with throughout the day. Comparison of the logs with survey data has given us insight into our dataset's ability to accurately map social network dynamics. Through surveys of approximately forty senior students, we have validated that the self-reported frequency of interaction is strongly correlated with the number of logged Bluetooth IDs (BTIDs) (R=.78, p=.003), and that the dyadic self-report data has a similar correlation with the dyadic proximity data (R=.74, p≈.0001). Additionally, a subset of subjects kept detailed activity diaries over several months. Comparisons revealed no systematic errors with respect to proximity and location, except for omissions due to the phone being turned off.
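The comparison between survey answers and phone logs can be illustrated with a small correlation computation. The sketch below assumes the reported R is a Pearson correlation coefficient, which the text does not state explicitly; the `pearson_r` helper and the sample numbers are invented for illustration and are not the study's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up example: per-subject self-reported interaction frequency
# (survey scale) versus the number of logged Bluetooth IDs.
reported = [1, 2, 3, 4, 5]
logged = [12, 25, 31, 48, 52]
r = pearson_r(reported, logged)
```

A value of `r` close to 1 would indicate that subjects who report more interaction also tend to have more logged Bluetooth sightings; a significance test (the p-values quoted above) would require an additional step not shown here.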

Interestingly, the surveys were not significantly correlated with the proximity logs of the incoming students. This phenomenon will be addressed in a later paper (Eagle, Lazer, and Pentland, 2005) discussing the fallibility of self-report data in particular situations.




© 2008 Massachusetts Institute of Technology