
Reality Mining: Complex Social Systems

Research Design and Methodology

The Reality Mining research project has three aims: developing technology and algorithms for sensing, modeling, and changing human behavior. The sensing component is accomplished with mobile phone applications that capture data on users' location, proximity, communication and device usage behavior. The models are being generated using data from an ongoing study consisting of one hundred human subjects over the course of eight months and representing approximately 500,000 hours (~60 years) of human behavior. Seventy of the users are at the MIT Media Laboratory, while the remaining thirty are incoming students at the MIT Sloan business school adjacent to the laboratory. For the final aim, we develop algorithms for generating theoretically improved social network topologies and methods to implement these changes in a real social network through proximity-based notifications.

Continuous Bluetooth Scanning

It is possible to exploit the fact that modern phones have both a short-range RF network (e.g., Bluetooth) and a long-range RF network (e.g., GSM), and that the two networks can augment each other for location and activity inference. The idea of logging cell tower ID to determine approximate location will be familiar to readers, but the idea of logging Bluetooth IDs (BTIDs) is relatively recent and provides very different types of information.

Bluetooth is a wireless protocol in the 2.40-2.48 GHz range, developed by Ericsson in 1994 and released in 1998 as a serial-cable replacement to connect different devices. Although market adoption was initially slow, according to industry research estimates, by 2006 90% of PDAs, 80% of laptops, and 75% of mobile phones will be shipped with Bluetooth [23]. Every Bluetooth device is capable of 'device discovery', which allows it to collect information on other Bluetooth devices within 5-10 meters. This information includes the Bluetooth MAC address (BTID), device name, and device type. The BTID is a hex number unique to the particular device. The device name can be set at the user's discretion; e.g., "Tony's Nokia". Finally, the device type is a set of three integers that correspond to the device discovered; e.g., Nokia mobile phone, or IBM laptop.

Although hyped for some time, the RF protocol Bluetooth is finally seeing mass-market adoption in mobile electronics; currently over one million Bluetooth devices are sold each week [10]. Although its primary use is to enable wireless headsets or laptops to connect to phones, as a by-product, Bluetooth devices are becoming aware of other devices carried by people nearby. This "accidental" functionality provides mobile communication devices with the capabilities of online introduction systems, except the introduction is situated in an immediate social context, rather than asynchronously in front of a desktop computer. To log BTIDs we designed a software application, BlueAware, that runs passively in the background on MIDP2-enabled mobile phones. Our application records and timestamps the BTIDs encountered in a proximity log and makes them available to other applications, similar to the Jabberwocky project developed by Paulos et al. [14]. BlueAware is automatically run in the background when the phone is turned on, making it essentially invisible to the user.

BlueAware is a MIDP2 application designed to passively run in the background on many Bluetooth phones currently on the market. The key technological element behind this social scanning application resides in the fact that mobile phones with personal area network capabilities, such as Bluetooth, continuously transmit a unique identification code (BTID) that can be received by other devices. BlueAware records and timestamps the BTIDs encountered in a proximity log, similar to the Jabberwocky project developed by Paulos et al. [6]. If a device is detected that has not been recently recorded in the proximity log, the application automatically sends the discovered BTID over the GPRS network to the Serendipity server. Continually scanning and logging BTIDs can expend an older mobile phone battery in about 18 hours. While continuous scans provide a rich depiction of a user's dynamic environment, most individuals are used to having phones with standby times exceeding 48 hours. Therefore BlueAware was modified to only scan the environment once every five minutes, providing at least 36 hours of standby time.
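The logging rule described above can be illustrated with a minimal Python sketch. It is not the BlueAware implementation (which is a MIDP2 application); the class name `ProximityLog`, the method `record_scan`, and the 30-minute value chosen for "recently recorded" are assumptions made only for illustration. Only the five-minute scan interval comes from the text.

```python
from dataclasses import dataclass, field

SCAN_INTERVAL_S = 5 * 60    # one scan every five minutes, as stated in the text
RECENT_WINDOW_S = 30 * 60   # ASSUMPTION: what counts as "recently recorded"


@dataclass
class ProximityLog:
    """Hypothetical proximity log: timestamped BTIDs plus last-seen times."""
    entries: list = field(default_factory=list)     # (timestamp, btid) pairs
    last_seen: dict = field(default_factory=dict)   # btid -> last timestamp

    def record_scan(self, timestamp, btids):
        """Log every BTID from one scan and return those not seen recently.

        The returned BTIDs are the ones a client would forward to a server.
        """
        to_report = []
        for btid in btids:
            previous = self.last_seen.get(btid)
            if previous is None or timestamp - previous > RECENT_WINDOW_S:
                to_report.append(btid)
            self.entries.append((timestamp, btid))
            self.last_seen[btid] = timestamp
        return to_report
```

Calling `record_scan` once per scan interval keeps the full timestamped log locally while limiting network traffic to devices that are new or have reappeared after a gap.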

BlueAware was designed to automatically begin running in the background when the phone is turned on, alerting the user to its presence with a dialogue box at startup. These alerts were incorporated into the system to remind users that the application is indeed logging Bluetooth devices. Additionally, the application was designed with a user interface that allows the users to read and delete the specific data being collected, as well as to stop the logging completely.

A variation on BlueAware is Bluedar. Bluedar was developed to be placed in a social setting and continuously scan for visible devices, wirelessly transmitting detected BTIDs to a server over an 802.11b network. The heart of the device is a Bluetooth beacon designed by Mat Laibowitz incorporating a class 2 Bluetooth chipset that can be controlled by an XPort web server [10]. We integrated this beacon with an 802.11b wireless bridge and packaged them in an unobtrusive box. An application was written to continuously telnet into multiple Bluedar systems, repeatedly scan for Bluetooth devices, and transmit the discovered proximate BTIDs to our server. Because the Bluetooth chipset is a class 2 device, it is able to detect any visible Bluetooth device within a working range of up to twenty-five meters. We are currently using the system to prototype a proximity-based introduction service [6].

Cell Tower Probability Distributions

There has been a significant amount of research which correlates cell tower ID with a user's location [2, 3, 8]. For example, Laasonen et al. describe a method of inferring significant locations from cell tower information through analysis of the adjacency matrix formed by proximate towers. They were able to show reasonable route recognition rates, and most importantly, succeeded in running their algorithms directly on the mobile phone [9].

Obtaining accurate location information from cell towers is complicated by the fact that phones can detect cell towers that are several miles away. Furthermore, in urban areas it is not uncommon to be within range of more than a dozen different towers. The inclusion of information about all the current visible towers as well as their respective signal strengths would help solve the location classification problem, although multipath distortion may still confound estimates.

We observe that relatively high location accuracy may also be achieved if the user spends enough time in one place to provide an estimate of the cell tower probability density function. Phones in the same location can be connected to different cell towers at different times depending on a variety of variables including signal strength and network traffic. Thus, over time each phone 'sees' a number of different cell towers, and the distribution of detected towers can vary substantially with even small changes in location. Figure 3 shows the distribution of cell towers seen for a given area with a 10m radius. Towers were only included in these distributions if the common area's static Bluetooth desktop computer was also visible, ensuring the users' location within 10m (or less). Discrepancies in the distributions are attributed to the users' typical position within the 10m radius. Users 2 and 4 both share a window office and have virtually the same cell tower distribution, despite having a very different distribution of hours spent in the office (as verified by the Bluetooth and cell tower logs). Users 1 and 5 both spend the majority of their time in the common area away from the windows and see only half as many towers as the others. User 3 is in a second office in the same area, and has a distribution of cell towers that is intermediate between the two other sets of users.
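As a rough illustration of how such a per-user distribution could be estimated, the Python sketch below counts the towers a phone was connected to only in samples where a known static Bluetooth device was also visible. The function name `tower_distribution`, the sample layout, and the identifiers in the usage note are hypothetical; the source does not describe its actual data format.

```python
from collections import Counter


def tower_distribution(observations, anchor_btid):
    """Estimate the empirical cell tower distribution for one location.

    observations: iterable of (tower_id, visible_btids) samples, where
    visible_btids is a set of BTIDs seen at the same time (assumed layout).
    Only samples in which anchor_btid (the static device) is visible count.
    """
    counts = Counter(
        tower for tower, visible in observations if anchor_btid in visible
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tower: n / total for tower, n in counts.items()}
```

For example, with samples `[("T1", {"desk"}), ("T1", {"desk"}), ("T2", {"desk"}), ("T3", set())]` and anchor `"desk"`, tower `T3` is ignored and the remaining mass is split two thirds to `T1` and one third to `T2`.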

Despite progress in mapping cell towers to location, the resolution simply cannot be as high as many location-based services require. GPS is an alternative approach that has been used for location detection and classification [1, 12, 19], but the line-of-sight requirements prohibit it from working indoors. We have therefore incorporated the use of static Bluetooth device ID as an additional indicator of location, and shown that it provides a significant improvement in user localization, especially within office environments. This fusion of data is particularly appropriate since areas where cellular signals are weak, such as in the middle of large buildings, often correspond to places where there are many static Bluetooth devices, such as desktop computers. On average, the subjects in our study were without mobile phone reception 6% of the time. When they did not have reception, however, they were within range of a static Bluetooth device or another mobile phone 21% and 29% of the time, respectively. We expect coverage by Bluetooth devices to increase dramatically in the near future as they become more common in computers and electronic equipment.
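The coverage figures above are conditional fractions, and a short Python sketch shows one way they could be computed from per-sample flags. The function name `coverage_stats` and the three-flag sample layout are assumptions for illustration, as is the reading that the Bluetooth percentages are measured only over the samples without reception.

```python
def coverage_stats(samples):
    """Summarize reception and Bluetooth coverage.

    samples: iterable of (has_reception, static_bt_visible, phone_bt_visible)
    booleans, one tuple per logged interval (assumed layout).
    """
    samples = list(samples)
    no_signal = [s for s in samples if not s[0]]

    def share(part, whole):
        # Guard against empty logs rather than dividing by zero.
        return part / whole if whole else 0.0

    return {
        "no_reception": share(len(no_signal), len(samples)),
        "static_bt_given_no_reception": share(
            sum(1 for s in no_signal if s[1]), len(no_signal)
        ),
        "phone_bt_given_no_reception": share(
            sum(1 for s in no_signal if s[2]), len(no_signal)
        ),
    }
```

The two Bluetooth shares can overlap, since a sample may have both a static device and another phone in range.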

We believe Bluetooth ID may become as important as cell tower mapping for estimation of user location. Figure 4 below shows the ten most frequently detected Bluetooth devices for one subject averaged over the month of January. This figure not only provides insight into the times the user is in his office (from the frequencies of the top 'Desktop'), but as mentioned in Section 4, also the type of relationship with other subjects. For example, the figure suggests the user leaves his office during the hour of 14:00 and becomes increasingly proximate to Subject 4. Judging from the strong cutoffs at 9:00 and 17:00, it is clear this subject had very regular hours during the month, and thus has fairly predictable high-level behavior. This "low entropy" behavior is also depicted below.
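One common way to quantify how predictable such behavior is uses Shannon entropy over the empirical distribution of observed states. The Python sketch below assumes that measure and assumes one discrete label per time slot (for example, the most frequently detected device in each hour); the text here does not specify which entropy measure or state encoding the study used.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy, in bits, of the empirical distribution of labels.

    labels: sequence of hashable state labels, one per time slot
    (ASSUMPTION: e.g. the dominant device or location for each hour).
    """
    counts = Counter(labels)
    total = sum(counts.values())
    entropy = 0.0
    for n in counts.values():
        p = n / total
        entropy -= p * math.log2(p)
    return entropy
```

A subject whose every slot carries the same label has an entropy of zero bits, while a subject spread evenly over four labels has two bits, so lower values correspond to more regular schedules.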

Privacy Implications

Mining the reality of our one hundred users raises justifiable concerns over privacy. However, the work in this paper is a social science experiment, conducted with human subject approval and consent of the users. Outside the lab we envision a future where phones will have greater computation power and will be able to make relevant inferences using only data available to the user's phone. In this future scenario, the inferences are done in real time on the local device, making it unnecessary for private information to be taken off the handset. However, the computational models we are currently using cannot be implemented on today's phones. Thus, our results aim to show the potential of the information that can be gleaned from the phone, rather than presenting a system that can be deployed today outside the realm of research.




© 2008 Massachusetts Institute of Technology