Removing roadblocks for mobile phone sensing


Dissertation

for the purpose of obtaining the degree of doctor at the Technische Universiteit Delft,

by the authority of the Rector Magnificus prof. ir. K. C. A. M. Luyben, chairman of the Board for Doctorates,

to be defended publicly on Thursday 9 October 2014 at 15:00

by

Niels BROUWERS

engineer in computer science, born in Waalwijk, the Netherlands.

Promotor: Prof. dr. K. G. Langendoen

Composition of the doctoral committee:

Rector Magnificus, chairman

Prof. dr. K. G. Langendoen, Technische Universiteit Delft, promotor

Prof. dr. ir. D. H. J. Epema, Technische Universiteit Delft

Prof. dr. E. Visser, Technische Universiteit Delft

Prof. dr. M. B. Kjærgaard, Syddansk Universitet

Prof. dr. ing. P. J. M. Havinga, Universiteit Twente

Prof. dr. O. Amft, Universität Passau

Dr. M. Woehrle, Robert Bosch GmbH

Prof. dr. ir. H. J. Sips, Technische Universiteit Delft, reserve member

An electronic version of this dissertation is available at

http://repository.tudelft.nl/.

Advanced School for Computing and Imaging

This work was carried out in the ASCI graduate school. ASCI dissertation series number 312.


means to other things, are knowledge, art, instinctive happiness, and relations of friendship or affection. Bertrand Russell

ACKNOWLEDGEMENTS

First of all I would like to thank my promotor, Koen Langendoen, for his unwavering support. You took me in, supported me, and kept believing in me throughout. Your style of management may be distinctly hands-off, but you were always there for me when it counted.

Of those I had the privilege of working with, I would like to thank Matthias, for being always critical, and nearly always right. Marco Zuniga, for his enthusiasm and experience, and Prezemek for his kind words, friendship, and of course lunches at the sports centre. Thank you for tolerating my moods, especially around deadlines.

I would further like to thank my fellow PhD students and colleagues, Andreas, Andrei, Marco, Martin, Otto, Philipp, Venkat, and Ioannis for being good company and great friends. I will miss our discussions over coffee more than anything else.

One cannot survive without good friends. In no particular order, I would like to thank Thijs, Paco, Steve, Harry, Peter, Marieke Kos, Marieke Oud, Hilde, Inge, Alex, Niels, and of course all those I forgot. Special thanks to Tijs, Lotte, Marjon and Maysam for getting me through some of the rougher patches. I am truly thankful for having awesome people like you to fall back on.

Finally I would like to thank my parents, who have always been there for me. I love you both.

Niels Brouwers

Delft, September 2014


CONTENTS

1 Introduction
1.1 Applications
1.1.1 Traffic Monitoring
1.1.2 Health
1.1.3 Computational Social Science
1.2 Problem Statement
1.2.1 Localization
1.2.2 Experimentation at Scale
1.2.3 Energy Analysis
1.3 Thesis Contributions and Outline

2 Dwelling in the Canyons
2.1 Approach
2.1.1 Sensors
2.1.2 Data collection
2.1.3 Sensor idiosyncrasies
2.2 Evaluation
2.2.1 Classification
2.2.2 Dwelling Locations
2.3 Related Work
2.3.1 Mobility Classification
2.3.2 Extracting Points of Interest
2.3.3 Path Tracking
2.4 Conclusions

3 Incremental Scanning
3.1 The Need for a Thorough Evaluation
3.2 Scanning Cost
3.2.1 Implementing Incremental Scanning
3.2.2 Energy Consumption
3.2.3 Active State
3.2.4 Passive State
3.2.5 Energy Model
3.3 Incremental Scanning
3.3.1 Diminishing Returns
3.3.2 Channel Popularity
3.3.3 Scan Similarity
3.3.4 Algorithm
3.4 Evaluation
3.4.1 Insight 1
3.4.2 Insight 2
3.4.3 Insight 3
3.5 Related Work
3.6 Conclusions

4 A Middleware for Mobile Phone Sensing
4.1 Related work
4.1.1 Context-aware computing
4.1.2 Mobile phone sensing
4.1.3 Phone Usage Traces
4.2 Design
4.2.1 Testbed Organization
4.2.2 Deployment
4.2.3 Participation
4.2.4 Experiment Description
4.2.5 Programming Abstractions
4.3 Implementation
4.3.1 Example Application
4.3.2 Node Architecture
4.3.3 Publish-Subscribe Framework
4.3.4 Scripting
4.3.5 Event Scheduling
4.3.6 Communication
4.3.7 Tail Detection
4.4 Evaluation
4.4.1 Program Complexity
4.4.2 Power Consumption
4.4.3 Experimental Results
4.5 Conclusions

5 Mobile Power Metering
5.1 Background
5.1.1 Power Measurement
5.1.2 Phone State
5.2 NEAT
5.2.1 Installation and Use of NEAT
5.3 Power Analysis
5.3.1 Trace Mapping
5.3.2 Visualization
5.3.3 Processing
5.4 Data Acquisition
5.4.1 The NEAT Mobile Power Metering Board
5.4.2 Event Logger
5.5 Evaluation
5.5.1 Display
5.5.2 Power Consumption of Background Wi-Fi Scanning
5.5.3 User Study
5.6 Related Work
5.7 Conclusions

6 Conclusions and Future Work
6.1 Localization
6.1.1 Dwelling
6.1.2 Incremental Scanning
6.2 Experimentation at Scale
6.3 Energy Analysis
6.4 Future Work
6.4.1 Localization
6.4.2 Experimentation at Scale
6.4.3 Energy Analysis

Bibliography
Summary
Samenvatting
Curriculum Vitæ

1 INTRODUCTION

In a time span of just a few years, the title of dominant consumer computing device has passed from the desktop to the smartphone. There are now more people accessing the internet from a phone or tablet than from what would traditionally be classified as a ‘computer’ [92]. This shift is about more than form factor alone, as smartphones offer a unique computing platform that introduces new opportunities and challenges.

The always on, always connected nature of smartphones means they are now more deeply embedded into our daily lives than ever before. We take our phones with us to stay connected with the world, but this also means that our phones stay connected with us. Smart applications can take advantage of this property to collect information about the user over long periods of time, infer trends and patterns, and predict future events. Correlating data from a population of devices makes it possible to extend analysis to spatial and social patterns.

Modern devices come equipped with a range of sensors, such as motion sensors, a light sensor, a camera, and a microphone. By processing this sensor data a phone can detect whether its user is walking or driving, assist in locating a parked car, and even gauge stress levels. These new sources of contextual information, combined with the deep integration of user and device, represent a treasure trove of information with applications in social science, health care, traffic analysis, and many more fields.

The field of mobile phone sensing (MPS) [76] has emerged to explore the possibilities of collecting and processing sensor data on smartphones. Of course, new technology presents new challenges as well. A major strength of smartphones is that they are mobile, but tagging sensor data with location still comes with a high energy cost, and location may be altogether unavailable in indoor environments. Scaling up experiments over large numbers of phones comes with a large overhead in terms of programming, organization, and deployment, which makes experimental research difficult. Finally, mobile devices depend on battery power, which makes energy consumption a major concern when developing applications and algorithms. Typical devices need to be charged once a day just from moderate use, and any application that significantly reduces battery life even further will have difficulty being accepted by end users.

The goal of this dissertation is to move the field of mobile phone sensing forward by addressing these issues, which are major roadblocks that hinder widespread adoption of mobile phone sensing in the wider scientific community.

1.1. APPLICATIONS

To better understand the concepts behind mobile phone sensing, we believe it is helpful to illustrate a number of MPS applications.

1.1.1. TRAFFIC MONITORING

Global urbanization and rising populations are putting strains on road infrastructure, resulting in traffic congestion and pollution. Monitoring road health, pinpointing bottlenecks, predicting traffic jams ahead of time, and dynamically routing traffic around problem areas can all help alleviate these problems. Mobile phones represent an ideal platform for traffic monitoring due to excellent localization capabilities and an always-on internet connection.

Analysis of traffic patterns starts with obtaining location traces from a large group of users. Location information is commonly taken from GPS, but Wi-Fi and cellular information may be used as a fall-back in case of bad GPS reception or in areas where there is no line-of-sight to satellites, such as close to high-rise buildings [122]. Moreover, accelerometers can be used to classify the transportation mode of a person, such as standing still, walking, or driving [68].

Given a large database of location traces, road networks can be analyzed for temporal patterns such as rush hour, and spatial properties like road segment ‘popularity’ [30]. In the interest of prevention, models may be constructed that can predict congestion [56], and making these predictions available to drivers gives them the option to take alternative routes.

A further creative use of smartphone sensors is automatically detecting potholes as a car drives over them, and reporting their location to the municipal government, which can then dispatch a repair crew. This can be done by detecting the shaking of the car using the phone’s accelerometers [33], or by using the microphone to listen for characteristic ‘bumping’ sounds [84]. Besides detecting potholes, the microphone can also be used to detect other cars honking their horns, a sign of congestion [87].

1.1.2. HEALTH

The strong tie between a phone and its user – we carry it with us wherever we go – makes it an ideal device for measuring a broad scope of health-related information. An important, although unfortunate, side-effect of modern-day life is stress, which has a direct effect on mood and is linked to a host of physical ailments. It is well-known that stress influences a person’s speech, and systems have been proposed that use the smartphone’s microphone to unobtrusively learn the user’s stress levels [82].

Heart rate is another important health metric, and historically has been monitored mainly using specialized on-body sensors such as chest straps. However, the camera on a smartphone can detect the heart rate of a person looking directly into it. Tiny changes in skin color, caused by the change of blood volume in the face, are not visible to the human eye but can be picked up by the camera [60]. Figure 1.1 shows a way to monitor heart rate without active involvement from the user, by embedding an inward-facing microphone inside a set of earbuds, as the inner ear is an excellent place for measuring blood pressure variation [93].

Figure 1.1: The Septimu hardware platform integrates an IMU, thermometer, and inward-facing microphone into an earphone to measure user activity and heart rate. Source: Microsoft Research [107]

Figure 1.2: Fitbit Force, a wearable activity sensor that measures step count, length and quality of sleep, etc. Source: fitbit.com [40]

The introduction of low-power radios designed for connecting phones with external sensors, such as ANT+ [124] and Bluetooth Low Energy (BLE) [50], has sparked a number of commercial health-related products. An example is the Fitbit [40] wristband, shown in Figure 1.2, that measures the length and quality of sleep, and acts as a step counter. Closing the loop, projects such as UbiFit [72] not only monitor, but actively stimulate the user to improve their health. In the case of UbiFit, this is done with an app that shows a stylized garden. The application detects walking and running, which makes the garden ‘grow’, promoting a healthier lifestyle.

1.1.3. COMPUTATIONAL SOCIAL SCIENCE

Social scientists have long struggled with the problem of data gathering, and often have had to rely on surveys, which are notoriously biased and suffer from memory effects [70]. The field of computational social science [76,78] deals with harvesting and analyzing large data sets of human actions and interactions. While much can be learned from digital sources such as e-mail traffic and social networks, researchers have started to take a more direct approach by putting sensors on people, giving rise to the field of reality mining [31] or people-centric sensing [15].

In an effort to understand work-floor communication at a German bank, purpose-built devices capable of detecting proximity (Bluetooth) and face-to-face conversations (infra-red and microphone) were handed out to employees. It was found that these social cues were a valuable source of information, for example in explaining how the company operated during times of crisis. This is illustrated in Figure 1.3, which shows communication intensity between different departments. The large amount of face-to-face communication between product development and support (indicated by the heavy arc between these two departments) happened just after a new financial product was introduced that support was not sufficiently briefed on. This shows how people find it easier to use ad-hoc meetings to resolve problems quickly rather than use digital means of communication [99].

[Figure 1.3 department labels: Manager, Development, Sales, Support, Customer Service]

Figure 1.3: Communication between departments at a German bank. Face-to-face communication (top), and digital communication (bottom). Source: Pentland 2010 [100]

In more recent work the smartphone has supplanted the somewhat cumbersome specialized hardware. A powerful example is using the smartphone’s microphone to detect conversation: if two devices overhear the same speech patterns, it means they are close to each other and their owners are probably talking to each other or are in a larger meeting of multiple people [85, 125]. By tracking a person’s location, and where one spends longer amounts of time (so-called ‘hang outs’), it is possible to cluster participants into groups of similar interest. This can be seen as the next step ‘beyond demographics’ with both scientific and commercial applications [99].

1.2. PROBLEM STATEMENT

In this dissertation we tackle a number of important challenges inherent to mobile phone sensing. The ones we will focus on are localization, experimentation at scale, and power consumption. We will briefly touch upon these topics here.

1.2.1. LOCALIZATION

As the previous section illustrates, location is one of the most important and useful types of contextual information that can be gathered from mobile devices. At first glance location tracking may appear to be a solved problem, since all modern smartphones come with a GPS receiver. Although GPS is relatively accurate with errors typically within a few meters, there are several problems related to its use in mobile phone sensing applications.

Location can either be a primary information stream (e.g. computational social science, traffic analysis) or act as meta-data for other sensors. In the first case, the system must track user location over time. Given that human mobility exhibits long periods of inactivity, with short ‘bursts’ of mobility, temporal coverage is needed to avoid loss of detail. In other words, localization must be continuous. In the second case, location is used to place readings from other sensors (e.g. noise, pollution) into a geographical context. Here it is important that a location fix is available quickly, before the user has moved too far from the place where the data was collected. Localization must therefore be instantaneous.

GPS meets neither requirement, due to the prohibitive energy consumption involved in keeping the sensor active continually, and the time required to acquire a ‘fix’ when duty cycling is used. Another issue is that GPS is not available indoors, which is where most people spend most of their time.

These problems have prompted investigation into alternative forms of localization, such as the use of motion sensors (e.g. accelerometers) and Wi-Fi scanning. Neither of these sensors is meant to be used to estimate location directly, however, so intelligent processing is needed to obtain good results. In particular, there is an inherent energy/accuracy trade-off that needs to be investigated and quantified before such systems can be used in the field.

1.2.2. EXPERIMENTATION AT SCALE

Mobile phone sensing holds great promise, but bridging the gap between idea and result is decidedly non-trivial because of a lack of experimental infrastructure. Consider that bringing a new sensing application into the world requires a researcher to write the application, upload it to an app store, recruit volunteers, and then wait for results to come back. These steps (development, deployment, recruitment, and data collection) all introduce overhead detrimental to the process.

Development requires domain knowledge related to mobile phones, such as alarm timers, wake locks, wireless networking, and the like. There are many subtleties involved that may not be relevant to the experiment and should ideally be hidden from a developer simply interested in collecting sensor data. Typical app stores such as the Google Play Store [49] and Apple’s App Store [58] provide a convenient way to deploy the app once built, but new versions may take days to propagate as users have to either manually update, or relaunch the application after it has been updated automatically. A quick development cycle is therefore out of the question.

On the organizational side, the process of recruiting and incentivizing volunteers places another burden on the researcher. Ideally one would find enthusiastic guinea pigs willing to share their data for free, but in practice one may expect to offer some form of compensation, e.g. money, study credit, or raffle prizes. Finally, collecting all the data at a central point for off-line processing involves setting up a server available to the outside world, and having the client devices upload their data to it. Without the ability to rapidly prototype algorithms and ideas, researchers run the risk of getting stuck in an endless loop of organizational overhead.

1.2.3. ENERGY ANALYSIS

Where on traditional systems such as servers and desktops performance is typically measured in terms of CPU efficiency, mobile systems are primarily bound by battery capacity. Applications and algorithms therefore need to be energy-efficient in order to be viable. An application that drains the battery too quickly runs the risk of being uninstalled.

Analyzing the power consumption of modern smartphones is a difficult problem because of the complex interplay between software and hardware. Power analysis depends on measuring both power consumption and system state simultaneously and correlating the two. Unfortunately there is a distinct lack of proper tooling, which limits our understanding of the energy footprint of algorithms, and makes it hard to track down energy-related bugs.

External power measurement tools such as Monsoon’s Power Monitor [88] offer precision, but constrain the mobility of the device. Internal voltage/current sensors on the other hand suffer from limited accuracy and sampling frequency. Attempts at building accurate models to compensate for the internal sensor’s lack of detail have met with limited success because of unobservable state (e.g. the GSM modem’s power state), and non-linear power consumption (e.g. OLED displays).

1.3. THESIS CONTRIBUTIONS AND OUTLINE

In this thesis we make the following contributions in the fields of localization, MPS testbed organization, and energy analysis.

Dwelling Detection. In Chapter 2 we investigate dwelling, which is the act of non-movement. People are dwelling when they are at home, in the office, in a restaurant, and so on. Mobility studies have shown that we are dwelling on average 89% of the time [73], and the places where we dwell convey a lot of information about our lives [99]. We compare three different localization technologies: i) GPS, ii) Wi-Fi scanning, and iii) Wi-Fi geolocation, and evaluate their performance on the tasks of dwelling detection (temporal) and location clustering (spatial). We show that Wi-Fi and GPS are complementary on both tasks primarily because of coverage, with Wi-Fi being available indoors, and GPS outside in areas such as parks. We also evaluate the effects of sampling rate, and show that intervals on the order of tens of seconds still show good performance, which allows applications to save energy by sampling less often.

Incremental Scanning. The energy cost of Wi-Fi based localization comes from two separate operations: a scan to discover nearby access points, and a query to an internet service to convert this scan result into geographical coordinates [47, 118]. In many MPS applications, such as tracking user location over time in order to learn behavioral patterns and points of interest, the second operation can be batched or even performed post-facto. In these scenarios the scanning operation accounts for the largest part of the energy cost.

In Chapter 3 we propose a new method of scanning that reduces energy consumption by more than half on modern smartphones. Our incremental scanning algorithm visits channels in turn and terminates early when i) enough information has been collected to determine user location, or ii) it can be reliably determined that the user has not moved since the previous scan. Our technique effectively doubles the amount of data collected for a given energy budget, or can be used to reduce the energy needs of a localization-oriented MPS application.
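The early-termination idea can be illustrated with a minimal Python sketch. It is not the algorithm of Chapter 3: the `scan_channel` callback, the `min_aps` threshold standing in for "enough information", and the `min_overlap` threshold standing in for "the user has not moved" are all hypothetical placeholders chosen only to show the control flow.

```python
def incremental_scan(channels, scan_channel, previous_aps,
                     min_aps=3, min_overlap=0.8):
    """Visit Wi-Fi channels one by one and stop as soon as a criterion holds.

    channels     -- channel numbers in the order they should be visited
    scan_channel -- hypothetical callback: channel -> set of AP identifiers
    previous_aps -- set of AP identifiers seen in the previous scan
    min_aps      -- assumed stand-in for "enough information to localize"
    min_overlap  -- assumed stand-in for "user has not moved"
    Returns (APs seen, number of channels visited, reason for stopping).
    """
    seen = set()
    visited = 0
    for channel in channels:
        seen |= scan_channel(channel)
        visited += 1
        # Criterion i): enough access points collected to determine location.
        if len(seen) >= min_aps:
            return seen, visited, "enough_information"
        # Criterion ii): most of the previous scan has been observed again.
        if previous_aps and len(seen & previous_aps) / len(previous_aps) >= min_overlap:
            return seen, visited, "not_moved"
    return seen, visited, "full_scan"
```

Every channel that is skipped saves the energy of visiting it, which is where the savings in such a scheme would come from; the actual stopping rules and channel ordering are developed in Chapter 3.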

Mobile Phone Sensing Middleware. Deployment of mobile phone sensing experiments onto a large set of devices places technological, organizational, and sometimes financial burdens on researchers, making real-world experimental research difficult and cumbersome. Chapter 4 describes the design, implementation, and evaluation of Pogo, an application for Android devices that turns a modern smartphone into a mobile sensor as part of a large-scale test bed. Pogo is designed to open up experimental research using mobile phones to non-domain specialists such as behavioral scientists, offering a comprehensive tool set.

Mobile Power Metering. In Chapter 5 we address the challenges of mobile power analysis with a novel power metering toolkit called Neat. Neat comprises a voltage and current measurement board that fits inside a typical Android smartphone, and analysis software that automatically fuses event logs taken from the phone with the obtained power trace. The power meter samples voltage, current, and a separate trigger channel at 2 kHz, and writes out traces to a micro-SD card. The trigger channel is used for time synchronization with an event logger tool, and is connected to a free IO pin on the device (e.g. the buzzer motor output). We use commercially available back covers designed for double-sized replacement batteries, and use the extra space to fit the power meter and its own battery inside, yielding a fully usable phone. In this way Neat makes it possible to obtain day-long traces from free-roaming smartphones.

We found that by automatically annotating the power trace, Neat greatly simplifies the analysis and optimization of short-lived energy patterns such as CPU wake-up and suspend cycles. Components whose power draw is a higher-order function of multiple parameters, such as OLED panels, are notoriously difficult to evaluate because accurate measurements are required for many different parameter configurations. Neat greatly simplifies this process by offering a scriptable environment for post-facto processing of traces obtained from controlled experiments. Finally, the unobtrusive nature of Neat allowed us to crowd-source fine-grained power measurements from end users to investigate the correlation between 3G energy consumption and signal strength.
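To make concrete how a sampled voltage/current trace relates to energy, the following Python sketch integrates instantaneous power over a trace. Only the 2 kHz sampling rate comes from the text; the function names, the list-based trace layout, and the rectangle-rule integration are assumptions of this sketch and are not taken from the Neat software.

```python
SAMPLE_RATE_HZ = 2000  # sampling rate of the power meter stated in the text


def energy_joules(voltage_v, current_a, sample_rate_hz=SAMPLE_RATE_HZ):
    """Energy of a trace: sum of V * I per sample, times the sample period."""
    if len(voltage_v) != len(current_a):
        raise ValueError("voltage and current traces must have equal length")
    sample_period_s = 1.0 / sample_rate_hz
    return sum(v * i for v, i in zip(voltage_v, current_a)) * sample_period_s


def average_power_watts(voltage_v, current_a):
    """Mean instantaneous power over the trace (0.0 for an empty trace)."""
    if not voltage_v:
        return 0.0
    return sum(v * i for v, i in zip(voltage_v, current_a)) / len(voltage_v)
```

For example, one second of samples at a constant 4.0 V and 0.5 A corresponds to an average power of 2 W and an energy of about 2 J.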

Chapters 2, 3, 4, and 5 are based on the following papers:

• N. Brouwers and M. Woehrle. Dwelling in the Canyons: Dwelling Detection in Urban Environments using GPS, Wi-Fi, and Geolocation. Pervasive and Mobile Computing, Special issue on Pervasive Urban Applications (Chapter 2)

• N. Brouwers, M. Zuniga, and K.G. Langendoen. Incremental Wi-Fi Scanning for Energy-Efficient Localization. IEEE PerCom 2014 (Chapter 3)

• N. Brouwers and K.G. Langendoen. Pogo, a Middleware for Mobile Phone Sensing. ACM MiddleWare 2012 (Chapter 4)

• N. Brouwers, M. Zuniga, and K.G. Langendoen. NEAT: A Novel Energy Analysis Toolkit for Free-Roaming Smartphones. ACM SenSys 2014 (Chapter 5)

2 DWELLING IN THE CANYONS

Understanding human mobility in urban environments is crucial for many different application areas such as traffic prediction, city planning, and determining social interactions. Therefore human mobility has been widely studied empirically in the social sciences, e.g., [46, 120]. Note that understanding mobility has two components: (i) understanding how we move, i.e., determining transportation modes [106], and (ii) understanding where we stop, i.e., determining the (important) points of interest (POIs) in our lives [120]. To determine whether we stop at a POI, we need to distinguish whether we are dwelling at a location, e.g., at home, in a local park, or at a supermarket, or whether we are mobile. Dwelling means that the user is in a locally constrained environment, yet not necessarily still or stationary. The focus of this chapter is to determine whether users are dwelling based on traces collected on their mobile phones.

In previous work, empirical data has often been very coarse-grained, e.g., GSM cell tower information [119] or merely cellphone call data [46, 120]. However, modern smartphones provide a wealth of sensor data including GPS and Wi-Fi connectivity. The social sciences can benefit from these additional ‘sensors’ by increasing the fidelity of models of human mobility. As with any sensor technology, the information measured is subject to noise, uncertainty, and availability issues. Additionally, sensing, e.g., using the GPS chip, consumes energy. This may negatively impact user experience by draining the battery; hence, the use of sensors needs to be carefully examined. Moreover, the rate of sensing is a major factor in power consumption.

Various sensor modalities have been employed for distinguishing between dwelling and mobility, most notably accelerometers [104, 106], GPS [106], and signal strength readings from Wi-Fi and GSM cell towers [89, 91, 119]. However, the use of geolocation services as a sensor for user location has often not been considered in these works. Geolocation services provide a location estimate based on scanned Wi-Fi fingerprints. While geolocation is not generally available, it is a prime sensor candidate in urban regions where Wi-Fi access points (APs) are densely deployed [25].

This chapter contains a comparative study of three different sensors and their quality with respect to determining whether a user was dwelling, and where. The sensors we consider are GPS, the ‘raw’ Wi-Fi scan information about surrounding APs, and a geolocation service based on Wi-Fi data.

The contributions of this chapter can be summarized as:

1. We compare data from three sensors for determining whether users were mobile or dwelling and the corresponding POIs based on off-line analysis of mobile phone measurement traces.

2. We survey 4 different methods for classification and 14 POI extraction strategies and study their relative performance on sensor data collected in urban environments in four European countries.

3. We study the effects of subsampling of the traces in order to investigate the effect of sensing rates on detection performance.

4. We present idiosyncrasies of the sensor types and show that the different sensors can also complement each other.

The rest of this chapter is structured as follows. Section 2.1 discusses the data we collected and details the sensors and their corresponding features that we utilize. In Section 2.2 we present an evaluation of dwelling and POI detection on the collected data. An overview of related work is presented in Section 2.3. Conclusions and directions for future work are presented in Section 2.4.

2.1. APPROACH

We collected an extensive data set from the mobile phones of seven users. We first describe the sensor sources that we sampled and then characterize the data set.

2.1.1. SENSORS

We consider three different sensors: GPS, Wi-Fi and geolocation. We extract several features from each sensor source, which we describe in the following. Note that we refrain from using additional sensor sources such as accelerometers and cell-tower information. While these may improve the results, we focus in this chapter on the relative performance of GPS, Wi-Fi and geolocation.

GPS

Most, if not all, modern smartphones come equipped with GPS sensors. These provide accurate measurements of both position and speed in open space, but signal quality is reduced or completely lost in indoor environments. Moreover, phone users tend to keep GPS turned off when not in use to avoid battery drain. When the GPS signal is available, however, it tends to be a very good candidate for differentiating between dwelling and mobility [106].

We extract the following features from GPS: (i) the measured speed provided directly by the GPS, (ii) the speed calculated from the distance between GPS locations (start/end location within a given time window), (iii) the difference between calculated and measured speed, (iv) a binary indicator that shows whether the GPS had a fix, i.e., whether the GPS could acquire sufficient satellite signals to determine an accurate position, (v) the number of GPS satellites available for a given measurement, and (vi) the number of location samples around the current location within a specific radius r.
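Features (ii), (iii) and (vi) can be illustrated with a short Python sketch. The tuple layout of a sample, the use of the haversine formula for distance, the choice of the last sample in the window as the "current" location and measured speed, and the sign of the speed difference are all assumptions made for this example; the text does not prescribe them.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, an assumption of this sketch


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude pairs."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def window_features(samples, radius_m):
    """Compute speed and density features for one non-empty time window.

    samples  -- list of (time_s, lat, lon, measured_speed_mps), ordered by time
    radius_m -- the radius r around the current (last) location
    """
    t_start, lat_start, lon_start, _ = samples[0]
    t_end, lat_end, lon_end, measured = samples[-1]
    elapsed = t_end - t_start
    # Feature (ii): speed from the start/end distance within the window.
    distance = haversine_m(lat_start, lon_start, lat_end, lon_end)
    calculated = distance / elapsed if elapsed > 0 else 0.0
    # Feature (vi): number of samples within radius r of the current location.
    nearby = sum(1 for _, lat, lon, _ in samples
                 if haversine_m(lat_end, lon_end, lat, lon) <= radius_m)
    return {
        "measured_speed": measured,                  # feature (i)
        "calculated_speed": calculated,              # feature (ii)
        "speed_difference": calculated - measured,   # feature (iii)
        "samples_within_radius": nearby,             # feature (vi)
    }
```

A window in which the user dwells would typically show a low calculated speed and many samples within the radius, whereas a window recorded while travelling would show the opposite.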

WI-FI

Continuous scanning for Wi-Fi APs has been used in context-aware computing to detect user mobility. This method is attractive because it can be performed on-line and in real-time, both desirable qualities for this class of applications. Wi-Fi scan results, also called fingerprints, consist of a list of APs and their corresponding received signal strength (RSS), measured in dBm. Two commonly used functions for finding similarity between two fingerprint vectors $\vec{f}_1, \vec{f}_2$ are Cosine-similarity $C$ and the Tanimoto-coefficient [109] $T$, shown below:

\[ C(\vec{f}_1, \vec{f}_2) = \frac{\vec{f}_1 \cdot \vec{f}_2}{\|\vec{f}_1\| \, \|\vec{f}_2\|} \tag{2.1} \]

\[ T(\vec{f}_1, \vec{f}_2) = \frac{\vec{f}_1 \cdot \vec{f}_2}{\|\vec{f}_1\|^2 + \|\vec{f}_2\|^2 - \vec{f}_1 \cdot \vec{f}_2} \tag{2.2} \]

In order to use these measures for fingerprints we need to map raw RSS values to a relative measure $s \in [0, 1]$ called the relative strength value. The relative strength value $s$ is computed by:

\[ s = \frac{RSS - RSS_{min}}{RSS_{max} - RSS_{min}} \tag{2.3} \]

where $RSS_{min}$ and $RSS_{max}$ are the lower and upper signal strength bounds. Since $\|\vec{f}\|^2 = \vec{f} \cdot \vec{f}$, we only need to define the dot product of two fingerprints to be able to use equations (2.1) and (2.2). We calculate this product by multiplying the relative strength values of the Wi-Fi APs the two fingerprints have in common, and taking their sum. Another way of measuring the strength of an AP is to measure its response rate, which is the fraction of scans in a given time window in which the AP was found. A case is made in [18] that this is a more robust metric for distance estimation than RSSI, which is why we used it here.
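Equations (2.1) to (2.3) translate directly into code once a fingerprint is represented as a mapping from AP identifier to signal strength. The Python sketch below is one possible rendering: the bounds of -100 dBm and -30 dBm, the clamping of s to [0, 1], the dictionary representation, and the convention of returning 0 for an empty fingerprint are assumptions of this example, not values taken from the text.

```python
import math

RSS_MIN_DBM = -100.0  # assumed lower signal strength bound
RSS_MAX_DBM = -30.0   # assumed upper signal strength bound


def relative_strength(rss_dbm):
    """Equation (2.3), clamped to [0, 1] (the clamping is an assumption)."""
    s = (rss_dbm - RSS_MIN_DBM) / (RSS_MAX_DBM - RSS_MIN_DBM)
    return min(1.0, max(0.0, s))


def normalise(scan):
    """Turn a raw scan {ap_id: rss_dbm} into a fingerprint {ap_id: s}."""
    return {ap: relative_strength(rss) for ap, rss in scan.items()}


def dot(f1, f2):
    """Sum of products of relative strengths over the APs both scans share."""
    return sum(f1[ap] * f2[ap] for ap in f1.keys() & f2.keys())


def cosine(f1, f2):
    """Equation (2.1); returns 0.0 if either fingerprint has zero norm."""
    denominator = math.sqrt(dot(f1, f1)) * math.sqrt(dot(f2, f2))
    return dot(f1, f2) / denominator if denominator > 0 else 0.0


def tanimoto(f1, f2):
    """Equation (2.2); returns 0.0 if both fingerprints have zero norm."""
    product = dot(f1, f2)
    denominator = dot(f1, f1) + dot(f2, f2) - product
    return product / denominator if denominator > 0 else 0.0
```

Two scans taken at the same place should score close to 1 on both measures, while scans that share no access points score 0.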

Several classification features have been discussed in literature, of which we have selected the following Wi-Fi features: (i) the Euclidean distance of relative signal strength values [91], (ii) the number of Wi-Fi APs that are in the fingerprint (scan result) [90], (iii) the Jaccard index as a measure of similarity between consecutive fingerprints, (iv) the Tanimoto-coefficient [68] and (v) the Cosine-similarity [68] applied to signal strength, (vi) the sum-of-squares of differences in AP response rate [18], and (vii) the Tanimoto-coefficient applied to AP response rate. The number of APs found in a scan allows a classifier to distinguish between areas of low and high AP density and change its behavior accordingly. The other features are all measures of similarity between consecutive scans, based either on RSSI or response rate. An exception is the Jaccard index [64], which is a measure of set similarity.
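The set-based and response-rate features can be sketched as follows. The sketch assumes that each scan is represented as a set of AP identifiers and that a window is a list of such scans; the function names and the convention that two empty scans have a Jaccard index of 1 are assumptions made for illustration.

```python
def jaccard(aps1, aps2):
    """Jaccard index of two AP sets: size of intersection over size of union."""
    a, b = set(aps1), set(aps2)
    union = a | b
    # Two empty scans are treated as identical (an assumption of this sketch).
    return len(a & b) / len(union) if union else 1.0


def response_rates(scans):
    """Fraction of scans in the window in which each AP was found."""
    if not scans:
        return {}
    counts = {}
    for scan in scans:
        for ap in set(scan):
            counts[ap] = counts.get(ap, 0) + 1
    return {ap: c / len(scans) for ap, c in counts.items()}


def sum_sq_diff(rates1, rates2):
    """Sum of squared response-rate differences; a missing AP has rate 0."""
    return sum(
        (rates1.get(ap, 0.0) - rates2.get(ap, 0.0)) ** 2
        for ap in rates1.keys() | rates2.keys()
    )
```

Comparing the response rates of two consecutive windows with `sum_sq_diff` gives a value of 0 when the same APs respond equally often and grows as the set of responding APs changes.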

In the context of clustering, it is important to note that the mean of a set of fingerprints cannot always be mapped back onto a geographical location, and therefore the operation does not have a spatial meaning. Consider a set of two fingerprints, each containing a single entry for a different access point with a relative strength value of s = 1. The mean of this hypothetical set would contain both access points with a signal strength of s = 0.5 each, because each access point has full strength (s = 1) in one fingerprint and is non-existent, i. e., s = 0, in the other. However, if these APs are several kilometers apart, there is no location where such a fingerprint could be measured. In other words, the set of measurable fingerprints is not closed under the mean operation.

GEOLOCATION

An alternative to using Wi-Fi scan results directly is to pass them into a localization service such as Google’s geolocation API [47] or Skyhook Wireless’ localization service [118]. These services use large databases of location-annotated Wi-Fi fingerprints to compute a user location based on Wi-Fi scan results. In this way the Wi-Fi chip can act as a ‘poor man’s’ GPS, providing estimates of user location. Google trains its database using a background service built into Android devices that reports GPS coordinates and Wi-Fi scan results to their servers at regular intervals. The results that we will present in this chapter indicate that Google’s geolocation provides good accuracy and broad coverage. Because of this and the open nature of the API, we chose this service for our experiments.¹

We extract the following geolocation features: (i) the speed calculated from location distances, and (ii) the number of location samples around the current location within a specific radius r_loc. Note that these features correspond to those based on GPS locations.

FEATURE EXTRACTION

We extract the complete set of features for each of the sensor sources. The feature extraction has three distinct parameters: (i) a window size as a time interval over which each feature is computed, (ii) an RSSI threshold parameter for Wi-Fi APs below which APs are ignored, and (iii) the density parameter r_loc that is used for computing the number of location samples around the current location for geolocation and GPS.

Features are extracted for each sample. Since our collection application samples at a fine granularity, with a 2s sample interval for Wi-Fi and 1s for GPS, we may also look at the impact of subsampling the sensor sources. Subsampling selects a subset of the samples such that the interval between these samples is a multiple of the original sampling interval. Subsampling is attractive, since it reduces energy consumption, geolocation overhead, and the amount of data to be stored. Hence, subsampling allows us to trade off implementation costs against the fidelity of our classification and clustering approaches. We perform subsampling by removing elements from a trace such that the time between consecutive data points is increased to the desired sampling interval. The feature extraction is then performed on this subsampled trace.
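The subsampling step described above can be sketched as a simple filter over a time-ordered trace. The sketch assumes that a trace is a list of `(t_seconds, value)` tuples; the function name and the choice to always keep the first sample are assumptions made for illustration.

```python
def subsample(trace, interval_s):
    """Keep the first sample, then every sample at least interval_s after the last kept one."""
    kept = []
    last_t = None
    for t, value in trace:
        if last_t is None or t - last_t >= interval_s:
            kept.append((t, value))
            last_t = t
    return kept
```

With a 2s raw trace, `subsample(trace, 30)` keeps every fifteenth sample, while `subsample(trace, 2)` returns the raw trace unchanged.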

2.1.2. DATA COLLECTION

We collected data traces from various locations in over a dozen cities in The Netherlands, Germany, Denmark, and Switzerland using a custom Android application. Seven users collected GPS and Wi-Fi data on several types of Android phones. All of these phones support assisted GPS. The users are knowledge workers at a university and were asked to log and annotate parts of their day where they traveled to some place. During the collection process users manually annotated traces with their activity (walking, dwelling, ...) using the Android application. Since the emphasis is on detecting dwelling and POIs, users visited favorite locations such as supermarkets, bars, tram stops, university buildings, and homes. Dwelling was explained to users as ‘staying at a certain place for some minutes’. This means waiting for the bus is dwelling, but waiting at a traffic light is not. Users additionally annotated their POIs. The granularity of these annotations was buildings, i. e., street addresses, for both indoor and outdoor POIs. Note that we distinguish between the concept of a POI (some place where the user dwelled for a certain amount of time, such as a bus stop or at home) and a significant place [5] (a POI that is important to the person, has some significance to her, and that she returns to, such as home, work, and a favorite bar). Table 2.1 gives an overview of the users that participated in our data collection campaign.

¹ While geolocation is based on Wi-Fi, the introduction of the external geolocation service (and its localization

User     Phone                      POIs
                                    Total   Unique   Indoors   Outdoors
User 1   Motorola Defy                 14        7         7          0
User 2   Samsung Galaxy S               4        2         2          0
User 3   Sony Ericsson X10 Mini         9        4         3          1
User 4   SE X10 Mini, ZTE Blade       191       84        56         28
User 5   SE X10 Mini, ZTE Blade        26       21        14          7
User 6   Sony Ericsson X10 Mini        35       11         9          2
User 7   Samsung Galaxy Ace             5        3         3          0
Total                                 284      132        94         38

Table 2.1: Overview of the users in our study, the Android phones used, and the number of POIs visited.

Country       City                    Population   Density (per km²)
Denmark       Aalborg                    124,921                 888
Germany       Stuttgart                  606,588               2,925
              Ulm                        122,801               1,035
              Esslingen                   91,869               1,979
              Ludwigsburg                 87,735               2,025
              Bietigheim-Bissingen        42,810               1,368
              Erbach                      13,218                 207
Netherlands   Amsterdam                  783,364               3,506
              Rotterdam                  616,003               2,850
              The Hague                  500,000               5,894
              Delft                       96,168               4,180
              Rijswijk                    47,117               3,354
              Drunen                      17,783                 544
Switzerland   Zurich                     372,047               4,049

Table 2.2: Overview of cities visited in our study. Population and density data from Wikipedia (http://en.wikipedia.org/)

Table 2.2 outlines some of the cities our users visited. We collected data in different-sized cities (from ≈ 13,000 to about 800,000 inhabitants) across Western Europe with highly varying population density (from ≈ 200 to about 6,000 inhabitants per km²). While it is hard to quantify the density of buildings and urban canyons for the different cities, we can see from the population density in the table that Dutch cities and Zurich are generally denser than the German cities and Aalborg. As such our selection combines very dense urban areas, e. g., The Hague and Delft, and more open areas such as Aalborg and Ulm. In the following we discuss our data collection approach and the sensor data idiosyncrasies we identified in the collected data traces.

Users   Phone types   Traces   Samples   POIs
                                         Total   Unique   Indoors   Outdoors
7       5             142      229,417     284      132        94         38

Table 2.3: Overview of the collected data from four European countries. Note that we selected traces for variety in activities and points of interest.

DATA OVERVIEW

Our collection software scanned for Wi-Fi APs at 2 second intervals; for each scan we recorded the returned list of APs and their signal strengths, along with the most recent GPS measurement. Note that the GPS hardware actually samples at 1 Hz. While we generally use 2s intervals for feature extraction, we use the higher GPS granularity for POI extraction. The raw traces were sanitized by removing GPS outliers as well as Wi-Fi beacons from locally administered APs as described by Kim et al. [68] in order to rely on fixed APs only. We then obtained geolocation data by passing the Wi-Fi scan results to the Google geolocation service [47]. The returned result is a location and an estimate of its accuracy. Querying can be done either on-line from the phone if an Internet connection is available, or off-line at a central server.

Table 2.3 summarizes the data we collected over the course of four months. We selected 142 traces, containing a total of 229,417 measurement samples. These traces include 284 dwelling locations; 128 locations are unique, i. e., some locations are visited multiple times like the users’ home and favorite cafés. 90 of these unique locations are indoors, while 38 are outdoors. Users annotated locations post-facto by providing a textual description (including whether it was in- or outdoors) as well as longitude and latitude for each POI.²

COVERAGE

We characterize coverage of GPS and Wi-Fi based on our extensive data set. Note that Wi-Fi coverage obviously also influences the geolocation results. Table 2.4 summarizes the GPS and Wi-Fi coverage for the different activities found in our data set.

Most (≈ 2/3) of the time the users were dwelling. Wi-Fi coverage at these POIs is very good, better than GPS. GPS is particularly hampered indoors, yet still available. We detail coverage at POIs below. For walking and cycling in urban environments, we can see that coverage is good. Coverage while running, in particular for Wi-Fi, is lower as there is limited (or no) coverage in parks and forests. Coverage in transportation is reasonable and only deteriorates when exiting densely populated areas. However, there is almost no coverage in subways, with even Wi-Fi reaching a mere 25.7%; even this number is only as high as it is because stations and platforms are included for all transportation activities. Note that the number of samples collected for some of the mobile activities is rather low. However, in this chapter we only differentiate between mobile and dwelling and as such do not consider the differences among the mobile activities.

² Longitude and latitude were determined using Google Maps [48].

Activity    Samples                   Coverage (%)
            Count      Fraction (%)   Wi-Fi    GPS
Dwelling    155,050    67.6           95.6     49.2
Walking      52,345    22.8           93.1     88.3
Driving      10,664     4.6           72.1     83.7
Running       3,384     1.5           81.0     92.8
Cycling       3,234     1.4           95.3     74.6
Train         2,696     1.2           77.9     65.3
Tram          1,931     0.8           87.2     87.7
Subway          113    < 0.1          25.7      1.8

Table 2.4: Wi-Fi and GPS coverage for different activities.

Figure 2.1: Coverage (%) of Wi-Fi and GPS location services for indoor and outdoor locations.

An important question is how often sensors actually provide a valid sensor value, i. e., whether GPS returns a position or a Wi-Fi scan returns at least one AP. We define the coverage of a sensor as the fraction of its samples that returned a valid sensor value. Figure 2.1 depicts the coverage at the unique POIs. Note that GPS coverage is hampered at indoor locations, yet still often available. Moreover, GPS coverage is constrained outdoors, e. g., due to urban canyons. In contrast, Wi-Fi coverage is generally better than GPS in urban environments. As we can see, there are only a few outdoor locations such as parks that may exhibit low Wi-Fi coverage. The outliers that we see in Wi-Fi indoor coverage are only found in smaller cities (≤ 50,000 inhabitants).

In order to understand why we find good Wi-Fi coverage we look at the distribution of Wi-Fi APs in our traces. Figure 2.2 shows an empirical cumulative distribution function (ECDF) of APs for each Wi-Fi scan. The three annotated data points in the figure indicate that (i) 6.6% of the scans contained no AP at all, (ii) most of the scans, i. e., over 95.5%, found up to 50 APs, and (iii) there is a long tail with up to 151 APs, which were scanned near the university in downtown Zurich, Switzerland. We also see a steep initial increase: 83.3% of the scans found up to about 25 APs.
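Both the coverage definition and the ECDF can be computed directly from the number of APs returned by each scan. The following minimal sketch assumes that a trace has been reduced to a list of per-scan AP counts; the function names are assumptions made for illustration.

```python
def coverage_percent(ap_counts):
    """Share of scans (in %) that returned at least one AP, i.e., a valid sensor value."""
    if not ap_counts:
        return 0.0
    return 100.0 * sum(1 for n in ap_counts if n > 0) / len(ap_counts)


def ecdf(ap_counts, x):
    """Empirical P[number of APs per scan <= x]."""
    if not ap_counts:
        return 0.0
    return sum(1 for n in ap_counts if n <= x) / len(ap_counts)
```

Under this definition, `ecdf(ap_counts, 0)` is the fraction of scans without any AP, so Wi-Fi coverage equals `100 * (1 - ecdf(ap_counts, 0))`.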

Figure 2.2: Empirical cumulative distribution function of APs per Wi-Fi scan (x-axis: number of APs X; y-axis: P[Scanned <= X]).

2.1.3. SENSOR IDIOSYNCRASIES

During our data collection process we observed a number of sensor-specific artifacts worth mentioning.

GPS: GPS locations are not always perfect. Figure 2.3(a) shows the GPS trace drifting significantly through housing blocks. The geolocation estimates show the true path taken. On the other hand, we found that we were able to get a GPS lock in a surprisingly large number of indoor locations. For example, Figure 2.3(b) shows a trace from a supermarket. This is an example for the coverage results in Figure 2.1 and shows that the assumption that GPS is always lost indoors, which has been used for location extraction in [5], simply does not hold.

Geolocation: The quality of Google’s geolocation service depends on the accuracy of their database and how many APs are in range at a given location. We identified two interesting phenomena when inspecting the geolocation data. First, when a user is moving through a city at constant speed (e. g., walking), the expected localization result would be a set of points spread out at regular intervals. Instead, we found that the returned locations tend to ‘clump’ together at points along the route, as illustrated in Figure 2.3(a) and Figure 2.3(c). The second phenomenon, which we found for dwelling locations, was that the reported location sometimes ‘jumped’ between two points spaced several hundred meters apart while the user was dwelling, as shown in the example in Figure 2.3(d).

2.2. EVALUATION

Based on the collected data, we performed two experiments: (i) classification of user state, i. e., whether a user is mobile or dwelling, and (ii) extracting points of interest. Note that these two problems are closely related, yet take a different approach. While the first merely relies on a single data point at a given instant in time, the second approach focuses on the distribution of sensed data and their relation. Another difference is that for classification, we perform supervised learning and rely on user annotations in the traces. Since we are focusing on dwelling in this chapter, we label all non-dwelling activities (walking, cycling, ...) as mobile.

For evaluating our classification methods on dwelling, we use the following definitions: a true positive (TP) means that we detect dwelling, and the user is actually dwelling in the ground truth. If we detect dwelling (or a dwelling location, respectively), yet the ground truth indicates that the user was actually mobile, this is a false positive (FP). False negatives (FN) and true negatives (TN) follow the same reasoning for the mobile case. Note that the definition for dwelling locations follows accordingly, e. g., a visited location that is detected is a TP.

Figure 2.3: Sensor idiosyncrasies: (a) GPS drift, (b) supermarket, (c) walking, (d) university building. GPS and geolocation measurements shown in blue crosses and magenta circles, respectively. Note that in (d) there are two geolocation clusters.

Based on these definitions, we use the following standard metrics to quantify our results:

• Precision: $pr = \frac{TP}{TP + FP}$

Precision indicates how selective our dwelling detection is.

• Recall: $rc = \frac{TP}{TP + FN}$

Recall shows whether there are actual dwelling instances that the detection mechanism misses.

• F1-score: $F_1 = 2 \cdot \frac{pr \cdot rc}{pr + rc}$

The F1-score is the harmonic mean of precision and recall. Note that we are typically not interested in methods that have a high precision and poor recall or vice versa. Hence, we use the F1-score as a summary measure.
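The three metrics can be computed from the TP, FP, and FN counts as in the following minimal sketch. Returning 0.0 for an undefined ratio (a zero denominator) is a convention chosen for this illustration and not part of the definitions above.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) for the given counts."""
    pr = tp / (tp + fp) if tp + fp else 0.0
    rc = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * pr * rc / (pr + rc) if pr + rc else 0.0
    return pr, rc, f1
```

For example, a detector with perfect precision that finds only one in four dwelling instances (`tp=1, fp=0, fn=3`) obtains an F1-score of 0.4, which shows how the harmonic mean penalizes unbalanced results.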


2.2.1. CLASSIFICATION

In the following, we perform classification on the set of features of each individual sensor as well as on combinations thereof. In particular, we want to distinguish whether a user is mobile or dwelling.

EXPERIMENTAL SETUP

We focus on supervised learning based on two class labels.³ We compare several classifiers inspired by the work of Reddy et al. [106]:

1. Decision Trees, in particular the J48 implementation of the C4.5 decision tree algorithm [103].

2. Two-stage models that use decision trees as described above as the first stage. As a second stage, we use empirically trained hidden Markov models (HMMs). We distinguish two approaches: (i) a discrete HMM (DHMM) that uses the classification results of the first stage and (ii) a posterior HMM (PHMM) that uses the posterior probabilities determined in the first stage classification.

3. A continuous hidden Markov model (CHMM) that models individual features as independent Gaussian distributions.
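To illustrate the idea behind the second stage of the two-stage models, the following sketch decodes the most likely sequence of hidden states for a discrete HMM with the Viterbi algorithm, taking the per-sample decision tree labels as observations. It is a generic textbook formulation and not the GHMM-based implementation used in our experiments; the function name and the probability tables passed to it are hypothetical, whereas the models in this study were trained empirically.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence of a discrete HMM (log domain)."""
    if not observations:
        return []

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # Scores for the first observation.
    score = {s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}
    back = []  # back[k][s] = best predecessor of state s at observation k + 1
    for obs in observations[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            best = max(states, key=lambda q: prev[q] + log(trans_p[q][s]))
            score[s] = prev[best] + log(trans_p[best][s]) + log(emit_p[s][obs])
            pointers[s] = best
        back.append(pointers)

    # Backtrack from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]
```

With transition probabilities that favor staying in the current state, an isolated ‘mobile’ label in the middle of a long dwelling period is smoothed away, which is the temporal effect the second stage is meant to add.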

For the decision tree classification we use the Weka data mining toolset [51]. For experiments with HMM models, we use GHMM [112]. Classification results are obtained using stratified 10-fold cross-validation. Hence, we generate traces of homogeneous length and create folds of approximately equal size and distribution of class values.

Note that in general we build all models on the full feature set for a given sensor, i. e., we do not perform feature selection.⁴ Since we only distinguish between mobile and dwelling, all HMM models have these two activities as hidden states. In this study, we restrict ourselves to the class of Gaussian output CHMMs, which do not allow us to use many of the discrete features available. Hence, for the CHMM we merely use a single feature, in particular the speed for GPS and geolocation and the Euclidean fingerprint distance for the Wi-Fi classifier.

As previously discussed, the feature extraction has three distinct parameters: an RSS threshold for Wi-Fi APs, a density-related parameter r for determining the number of close-by location samples for geolocation and GPS, and most importantly the window size used for feature extraction. In our previous work [10], we explored different parameter settings and compared the results based on the hyper-volume of the solution set. Our exploration yielded good results for an RSS threshold of -80 dBm and a radius r of 20m. In the following we show the sensitivity of the results based on the window size. Figures 2.4(a), 2.4(b), and 2.4(c) show the F1 score of the classifiers dependent on the window size for the three sensor sources. Please note that the y-axis starts at an F1 score of 0.75, so the differences between individual results are visually exaggerated. As a single example in Fig. 2.4(b), the differences for the CHMM for the Wi-Fi sensor are within a mere 0.025, i. e., small compared to the actual F1 score. We can see in the figure that the GPS results are largely insensitive to the window size. The classification results based on Wi-Fi and geolocation data benefit from a larger window size, especially the Markov models. We therefore select 60s as the window size and analyze the results in more detail in the following.

³ A more fine-granular learning variant would be possible using all different transportation mode labels and mapping results after classification. For the sake of simplicity of our comparative study, we refrain from investigating this variant.

⁴ We performed experiments using principal component analysis and greedy feature selection in weka. Both

Figure 2.4: Classification results (F1 score of DT, DHMM, PHMM, and CHMM) dependent on the window size of feature extraction for (a) GPS, (b) geolocation, and (c) Wi-Fi. Note that the y-axis starts at 0.75, hence differences are rather small. We explore from a window size of 4s, i. e., two samples, to 120s, i. e., 60 samples.

DETAILED CLASSIFICATION RESULTS

Table 2.5 summarizes the results of the different classification approaches including precision and recall. We can see from the table that the decision tree works equally well for all sensor sources. We can improve the results (marginally) by using a sequential model on top of these results (either DHMM or PHMM). Another interesting aspect is that results based on geolocation are the best, even better than GPS. While Wi-Fi results are usually the worst, they are comparable to the other sensor sources. As we would expect, combining sensor sources improves the results. Hence, the best results are typically achieved by combining GPS and geolocation information. We see that precision and recall are fairly balanced, with recall being slightly better for classifiers based on GPS and geolocation. Among the classifiers, only CHMM shows a considerably better recall than precision. Since Markov models favor staying in a certain (activity) state and we have proportionally more sensor data from dwelling, we would assume that these models generate more FPs than FNs and therefore tend toward a higher recall. CHMMs perform the worst of all classifiers, probably because they do not benefit from the full set of features. This is particularly visible for the Wi-Fi CHMM, which relies only on the Euclidean fingerprint distance and (for all window sizes) performs worse than the CHMMs based on geolocation and GPS data.

Method   Sensor(s)      F1 score   Precision   Recall
CHMM     GPS            0.876      0.759       0.983
         Geoloc         0.894      0.819       0.985
         Wi-Fi          0.821      0.810       0.832
         GPS/Geoloc     0.884      0.801       0.987
         Wi-Fi/Geoloc   0.898      0.826       0.985
         GPS/Wi-Fi      0.880      0.797       0.982
DHMM     GPS            0.950      0.943       0.958
         Geoloc         0.965      0.963       0.966
         Wi-Fi          0.948      0.958       0.938
         GPS/Geoloc     0.972      0.972       0.972
         Wi-Fi/Geoloc   0.966      0.968       0.964
         GPS/Wi-Fi      0.963      0.969       0.957
PHMM     GPS            0.950      0.942       0.958
         Geoloc         0.965      0.964       0.965
         Wi-Fi          0.947      0.957       0.938
         GPS/Geoloc     0.970      0.969       0.971
         Wi-Fi/Geoloc   0.963      0.963       0.962
         GPS/Wi-Fi      0.960      0.965       0.955
DT       GPS            0.947      0.938       0.957
         Geoloc         0.962      0.959       0.965
         Wi-Fi          0.941      0.945       0.936
         GPS/Geoloc     0.970      0.969       0.970
         Wi-Fi/Geoloc   0.962      0.962       0.962
         GPS/Wi-Fi      0.960      0.964       0.955

Table 2.5: Detailed classification results of F1, precision and recall for each sensor source and combinations thereof. All results are for window sizes of 60s, a radius of 20m and a Wi-Fi threshold of −80dBm.

SUBSAMPLING

We also investigate the impact of subsampling on classification performance. Figure 2.5 summarizes the results for different sampling intervals and classification approaches. Please note that a sample interval of 2 s corresponds to using the raw trace.

We can see that subsampling has a negative effect on CHMM classification results for sample intervals < 10s. For all other classifiers the results are fairly constant. Counter to the intuition that subsampling should deteriorate classification fidelity, for most sources and methods the F1 score actually improves slightly. We hypothesize that the reason for this improvement is that subsampling removes hard-to-classify data points on transitions between activities.

Figure 2.5: Classification results (F1 score of DT, DHMM, PHMM, and CHMM) dependent on the sampling interval given a window size of 60s for (a) GPS, (b) geolocation, and (c) Wi-Fi. Note that the y-axis starts at 0.8, hence differences are rather small.

In order to investigate this hypothesis, we look at how the transitions between states influence the classification. To this end, we remove the samples around activity changes from the traces. Table 2.6 compares the results for the original traces and for subsampling at 30s to traces from which we removed 30s and 60s, respectively, before and after each transition between activities. Note that the transition removal deletes proportionally more mobile data points than dwelling data points, as the ratio of dwelling to mobile is approximately 2:1. We can see that the transition removal improves the classification results: for most classifiers it holds that the more we remove, the better the results. This supports our hypothesis on why subsampling performs better than classification on fine-granular traces. This is particularly evident for the sequential (HMM) models, which exhibit better fidelity when removing transitions. Note that DT actually has slightly worse F1 scores for 60s extraction. This is due to a slightly worse precision; this means that these models have a few more false positives. As the transition-extracted traces feature proportionally less mobile sensor data, it seems that this change in distribution deteriorates the learning of mobility.

Method   Sensor(s)      2s samples   30s (sub-)sampling   2s samples, rem. 30s   2s samples, rem. 60s
CHMM     GPS            0.876        0.888                0.878                  0.882
         Geoloc         0.894        0.890                0.890                  0.894
         Wi-Fi          0.821        0.861                0.840                  0.849
         GPS/Geoloc     0.884        0.895                0.884                  0.888
         Wi-Fi/Geoloc   0.898        0.892                0.894                  0.896
         GPS/Wi-Fi      0.880        0.888                0.880                  0.883
DHMM     GPS            0.950        0.970                0.962                  0.967
         Geoloc         0.965        0.973                0.976                  0.980
         Wi-Fi          0.948        0.967                0.965                  0.968
         GPS/Geoloc     0.972        0.982                0.982                  0.987
         Wi-Fi/Geoloc   0.966        0.976                0.979                  0.982
         GPS/Wi-Fi      0.963        0.977                0.979                  0.981
PHMM     GPS            0.950        0.965                0.962                  0.967
         Geoloc         0.965        0.971                0.976                  0.980
         Wi-Fi          0.947        0.967                0.964                  0.967
         GPS/Geoloc     0.970        0.982                0.981                  0.986
         Wi-Fi/Geoloc   0.963        0.972                0.977                  0.981
         GPS/Wi-Fi      0.960        0.974                0.977                  0.980
DT       GPS            0.947        0.964                0.960                  0.965
         Geoloc         0.962        0.969                0.974                  0.979
         Wi-Fi          0.941        0.952                0.959                  0.963
         GPS/Geoloc     0.970        0.981                0.982                  0.987
         Wi-Fi/Geoloc   0.962        0.971                0.977                  0.982
         GPS/Wi-Fi      0.960        0.973                0.973                  0.980

Table 2.6: F1 results for all classifiers comparing results without subsampling (2s samples), subsampling of 30s and results where we remove trace data around activity transitions for 30s and 60s respectively.
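The transition removal described above can be sketched as a filter over an annotated trace. The sketch assumes that a trace is a time-ordered list of `(t_seconds, label)` tuples and that a transition occurs at the timestamp of the first sample carrying a new label; both the representation and this definition of the transition time are assumptions made for illustration.

```python
def remove_transitions(trace, margin_s):
    """Drop all samples within margin_s seconds before or after a label change."""
    change_times = [
        trace[i][0]
        for i in range(1, len(trace))
        if trace[i][1] != trace[i - 1][1]
    ]
    return [
        (t, label)
        for t, label in trace
        if all(abs(t - c) > margin_s for c in change_times)
    ]
```

Calling `remove_transitions(trace, 30)` and `remove_transitions(trace, 60)` on the raw 2s traces corresponds to the two rightmost columns of Table 2.6.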

FINDINGS

We have seen in the experiments that classification works very well across all sensor sources. Surprisingly, geolocation seems even slightly better than the other sensor sources. Nevertheless, the best results are achieved by combining sensor sources. All classification methods perform similarly (except the continuous HMM). Among the classification methods the simple decision trees work surprisingly well. By adding an additional temporal notion using DHMMs or PHMMs the results only improve marginally. However, we need to consider that features are generated over time windows and hence already incorporate some temporal notion. Subsampling does not hurt the accuracy of the classification. We identified that the reason for this is that by sampling less often we typically avoid difficult-to-classify state transitions. Hence, we can actually benefit from sampling less often to save energy without hurting the classification performance.

Sensor   Strategy         Algorithm    Metric      Parameters
GPS      ASHBROOK-GPS     Ashbrook     Euclidean   r = 300m, t_min = 600s
         HARIHARAN-GPS    Hariharan    Euclidean   r = 75m, t_min = 540s
         TIMEBASED-GPS    Time-based   Euclidean   r = 300m, t_min = 120s
         DJ-CLUSTER-GPS   DJ-Cluster   Euclidean   r = 50m, minPoints = 210
         DBSCAN-GPS       DBSCAN       Euclidean   r = 50m, minPoints = 210
         MEANSHIFT-GPS    Mean-Shift   Euclidean   σ = 50, θ = 200
Geoloc   ASHBROOK-GEO     Ashbrook     Euclidean   r = 200m, t_min = 480s
         HARIHARAN-GEO    Hariharan    Euclidean   r = 50m, t_min = 1080s
         TIMEBASED-GEO    Time-based   Euclidean   r = 360m, t_min = 60s
         DJ-CLUSTER-GEO   DJ-Cluster   Euclidean   r = 25m, minPoints = 120
         DBSCAN-GEO       DBSCAN       Euclidean   r = 25m, minPoints = 120
         MEANSHIFT-GEO    Mean-Shift   Euclidean   σ = 100, θ = 200
Wi-Fi    DBSCAN-TM        DBSCAN       Tanimoto    r = 0.40, minPoints = 155
         DBSCAN-CS        DBSCAN       Cosine      r = 0.55, minPoints = 155

Table 2.7: Clustering strategies under evaluation.

2.2.2. DWELLING LOCATIONS

We apply clustering algorithms to our data set to find POIs. We evaluate 14 different combinations of clustering algorithms and sensors, which we shall refer to as clustering strategies. Such a strategy takes a data trace as input and returns a set of dwelling locations. Evaluation is based on how well the found dwelling locations match the ground truth.

Several clustering algorithms for extracting POIs have been proposed in literature. We implemented six different ones: the algorithm of Ashbrook et al. [5], the algorithm of Hariharan et al. [53], the time-based clustering algorithm found in [66], the DJ-Cluster algorithm from [128], DBSCAN [110], and mean-shift [23].

EXPERIMENTAL SETUP

We are interested in how well each strategy performs at detecting POIs, as well as in how accurate the returned locations are w. r. t. the ground truth. To measure detection performance, we count how many ground-truth locations are correctly identified by the algorithm being evaluated. If a returned cluster is found within 100m of a ground-truth location we count this as a true positive (TP). If there is more than one such cluster, only the closest one is considered valid, and all other clusters are declared false positives (FP). Ground-truth locations that cannot be matched to a cluster are counted as false negatives (FN).
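The matching rules above can be sketched as follows. For brevity the sketch assumes that cluster and ground-truth locations are given as `(x, y)` coordinates in meters in a local planar frame. It also makes two assumptions that the rules above leave open: clusters that lie near no ground-truth location at all are counted as false positives, and ground-truth locations claim their nearby clusters in list order. The function name is hypothetical.

```python
import math


def match_clusters(clusters, truths, max_dist_m=100.0):
    """Return (tp, fp, fn, deviations) for cluster and ground-truth locations."""
    tp = fp = fn = 0
    deviations = []   # distance of each TP cluster to its ground-truth location
    claimed = set()   # indices of clusters already assigned to a ground-truth location
    for gx, gy in truths:
        near = sorted(
            (math.hypot(cx - gx, cy - gy), i)
            for i, (cx, cy) in enumerate(clusters)
            if i not in claimed and math.hypot(cx - gx, cy - gy) <= max_dist_m
        )
        if not near:
            fn += 1                    # no cluster within range: missed POI
            continue
        tp += 1                        # the closest cluster is the valid match
        deviations.append(near[0][0])
        fp += len(near) - 1            # all other nearby clusters are false positives
        claimed.update(i for _, i in near)
    fp += len(clusters) - len(claimed)  # assumption: unmatched clusters count as FP
    return tp, fp, fn, deviations
```

The returned deviations contain one distance per true positive, so their mean corresponds to an average deviation computed over true positives only.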

The location of a spatial cluster is simply the mean of its elements. For Wi-Fi clusters, the mean itself has no spatial relevance (see Section 2.1.1). Hence, we take the nearest neighbor to the mean and transform this point into a geographical location using geolocation. Deviation, i. e., distance from the ground truth POI, is based only on the true positives, and is calculated by taking the distance from the cluster location to the corresponding ground-truth location.


Strategy         F1 score   Precision   Recall   Avg. Deviation
ASHBROOK-GPS     0.66       0.71        0.61     38.76m
HARIHARAN-GPS    0.43       1.00        0.27     43.86m
TIMEBASED-GPS    0.61       1.00        0.43     37.27m
DJ-CLUSTER-GPS   0.85       0.94        0.78     24.20m
DBSCAN-GPS       0.85       0.93        0.78     23.69m
MEANSHIFT-GPS    0.83       0.88        0.78     23.68m
ASHBROOK-GEO     0.80       0.71        0.93     28.92m
HARIHARAN-GEO    0.44       1.00        0.28     44.23m
TIMEBASED-GEO    0.67       1.00        0.51     39.22m
DJ-CLUSTER-GEO   0.93       0.95        0.91     19.17m
DBSCAN-GEO       0.93       0.95        0.91     19.18m
MEANSHIFT-GEO    0.93       0.98        0.90     23.73m
DBSCAN-TM        0.92       0.95        0.89     23.32m
DBSCAN-CS        0.91       0.95        0.87     23.51m

Table 2.8: POI extraction results of F1, precision, recall, and average distance to ground truth for each clustering strategy.

require some special consideration because there is a lower bound on dwelling time below which POIs cannot be extracted robustly. In other words, they may often be indistinguishable from noise. We therefore define a clear cut-off point: dwelling sessions of less than five minutes are ignored. If a clustering algorithm finds such a location this result is not counted as a true positive, nor is it counted as a false negative if it did not.

All algorithms except for mean-shift use a disc kernel with radius r to determine the neighbor set of a given data point. Ashbrook’s algorithm, Hariharan, and time-based clustering use a time-threshold t_min to filter out noise, which corresponds to the number of samples within a candidate cluster multiplied by the sampling interval (1 and 2 s for GPS and Wi-Fi respectively). Similarly, DJ-Cluster and DBSCAN rely on a minPoints parameter that defines a critical density at which a point becomes a candidate for clustering. Mean-shift iteratively searches for the modes of a distribution. These modes correspond to dwelling locations in our data set. We use a Gaussian kernel with standard deviation σ, and filter out noise by removing modes with weights below a certain threshold value θ.

POIs can also be extracted directly from Wi-Fi scan results. DBSCAN can be applied to any kind of data as long as there is some metric, similar to the Euclidean distance used with spatial data, that can be used in determining whether two points are neighbors. We evaluate both the Cosine similarity and Tanimoto coefficient described in Section 2.1.1 as distance metrics. In this case the radius r corresponds not to a Euclidean distance but rather a minimum similarity score at which scan results are considered neighbors. Since the mean of a set of fingerprints does not have any spatial meaning, mean-shift cannot be applied to Wi-Fi fingerprints.
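How DBSCAN can operate on fingerprints through a similarity threshold is illustrated by the following compact sketch. It is a generic, quadratic-time formulation of DBSCAN with a pluggable neighbor predicate and not the implementation evaluated in this chapter; fingerprints are assumed to be dictionaries mapping an AP identifier to a relative strength value, and all function names are hypothetical.

```python
def dot(f1, f2):
    """Dot product over the APs that both fingerprints have in common."""
    return sum(f1[ap] * f2[ap] for ap in f1.keys() & f2.keys())


def tanimoto(f1, f2):
    """Tanimoto coefficient of two fingerprints (0.0 if both are empty)."""
    d = dot(f1, f2)
    denom = dot(f1, f1) + dot(f2, f2) - d
    return d / denom if denom > 0 else 0.0


def dbscan(points, is_neighbor, min_points):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    # Neighbor lists include the point itself, as in the usual DBSCAN definition.
    neighbors = [
        [j for j in range(n) if is_neighbor(points[i], points[j])]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_points:
            labels[i] = -1              # provisionally noise
            continue
        cluster += 1                    # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster     # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_points:
                queue.extend(neighbors[j])  # expand only through core points
    return labels
```

A Wi-Fi strategy in the spirit of DBSCAN-TM would then call `dbscan(fingerprints, lambda p, q: tanimoto(p, q) >= 0.40, 155)`, where the radius r acts as a minimum similarity instead of a maximum distance.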

The performance of the clustering strategies depends heavily on the parameter settings. The parameter settings for each strategy were determined empirically by searching the parameter space and choosing values that maximized the F1 score for our data set.
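One simple way to carry out such a search is an exhaustive grid search, sketched below. The text does not state which search procedure was used, so the grid search itself, the `evaluate` callback (assumed to run one clustering strategy and return true positive, false positive, and false negative counts), and the parameter names are assumptions for illustration.

```python
from itertools import product


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when nothing is found."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def grid_search(evaluate, grid):
    """Try every parameter combination and keep the one with the highest F1.

    evaluate: callable taking the parameters as keyword arguments and
              returning a (tp, fp, fn) tuple
    grid:     dict mapping a parameter name to the list of values to try
    """
    names = list(grid)
    best_params, best_f1 = None, -1.0
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = f1_score(*evaluate(**params))
        if score > best_f1:
            best_params, best_f1 = params, score
    return best_params, best_f1
```

Because the parameters are tuned on the same data set that is used for evaluation, the resulting scores should be read as the best case for each strategy on this data.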


RESULTS

The clustering results are shown in Table 2.8. Of the spatial algorithms we found Ashbrook’s algorithm, Hariharan, and time-based clustering to perform poorly in our setup. One particular problem we noticed was that these were unable to cope with the specifics of our data set, in particular with noisy and oddly shaped point clouds. To illustrate, Figure 2.6 shows an elongated point cloud of Wi-Fi geolocation samples collected over a time span of fifteen minutes when the collector was stationary. Ashbrook’s algorithm, Hariharan, and time-based clustering all generate several false positives for this example, while DJ-Cluster and DBSCAN correctly identify a single cluster. The number of false positives generated by the first three algorithms can be reduced by increasing the radius r, by raising the time threshold t_min, or by doing both. However, raising these thresholds causes more false negatives, and hence a drop in recall, because the algorithms can no longer detect smaller clusters or will merge multiple POIs that are close to each other.

In contrast, DJ-Cluster and DBSCAN are designed to handle non-circular point clouds and contain logic to merge intermediate clustering results. They can operate with a relatively small radius r and low threshold minPoints. In fact, these two algorithms are very similar. Generally DBSCAN attempts to incorporate more edge points into the generated cluster, whereas DJ-Cluster is slightly easier to implement. We have found them to be equivalent in terms of performance. Mean-shift also produces very good results on both GPS and geolocation data, and even yields the best F1 score overall when used with geolocation. For Wi-Fi clustering we found little difference between the Tanimoto and Cosine similarity metrics, as both yield good results.

The best F1 scores for GPS, geolocation, and Wi-Fi are 0.86, 0.94, and 0.92, respectively. The relatively poor performance of GPS is mainly due to low coverage in indoor locations. Many dwelling locations are not found simply due to a lack of GPS data, which results in more false negatives and a lower recall. Geolocation does not yield significantly better results than Wi-Fi, which implies that spatial clustering is not superior to clustering on scan results directly.

When considering average deviation from ground truth, Ashbrook’s algorithm, Hariharan, and time-based clustering performed worst. We speculate that this is due to the high r and t_min values required to get a reasonable F1 score. The best results are obtained with DJCLUSTER-GEO and DBSCAN-GEO, which yield ≈ 20% smaller deviation than their GPS counterparts. The deviation of the Wi-Fi clusters is comparable to that of GPS.

SUBSAMPLING

In the previous sections we used our data set with GPS sampled at 1 Hz and Wi-Fi scans every 2 s. There are many cases, however, where such aggressive sampling rates are undesirable, for example due to the power consumption of the GPS and Wi-Fi chips.

To find out how lowering these rates affects the fidelity of our clustering strategies we applied subsampling to our traces, as described in Section 2.1.1. We used sampling intervals ranging from 10 to 60 seconds in 10-second steps. In the following we focus on DBSCAN applied to our three sensor sources, because these strategies were found to perform best in the previous section.
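A minimal sketch of such subsampling is given below. It assumes a trace is a time-ordered list of (timestamp in seconds, sample) pairs and keeps the first sample in each interval; the procedure in Section 2.1.1 may select samples differently, so this is an illustration of the idea rather than the method used for the experiments.

```python
def subsample(trace, interval_s):
    """Thin a time-ordered trace so kept samples are at least interval_s apart."""
    kept = []
    next_time = None
    for timestamp_s, sample in trace:
        if next_time is None or timestamp_s >= next_time:
            kept.append((timestamp_s, sample))
            next_time = timestamp_s + interval_s
    return kept


# The sampling intervals evaluated in the text: 10, 20, ..., 60 seconds.
INTERVALS_S = list(range(10, 61, 10))
```

Applied to a 1 Hz GPS trace, `subsample(trace, 30)` retains roughly one sample in thirty, which is why density parameters such as minPoints have to be adjusted along with the interval.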


Figure 2.6: Clustering algorithms applied to a noisy geolocation point cloud. Panels: (a) Ashbrook’s algorithm, (b) Hariharan, (c) time-based clustering, (d) DJ-Cluster/DBSCAN.

Recall from Section 2.2.2 that we used parameter search to find values for r and minPoints that maximize F1. Of course, changing the sampling interval requires changing the minPoints parameter, i.e., the minimum number of points that constitutes a dwelling location, for each of the strategies. We found that deriving these values from the ones found in Table 2.8 by scaling them linearly with the change in sampling rate did not always yield the best results. For example, DBSCAN-GPS performs optimally at a rate of 1 Hz when minPoints = 210. Scaling this value linearly to 30 s intervals, minPoints should be set to 210/30 = 7. However, we obtained a slightly better F1 score with minPoints = 9 (F1 = 0.842 for 9 versus 0.831 for 7). Table 2.9 shows the value of minPoints used for each of the strategies at the different sampling intervals.
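The linear scaling in this example amounts to keeping the implied dwelling time constant: 210 samples at a 1 s interval cover 210 s, which corresponds to 7 samples at a 30 s interval. The sketch below computes that starting value and a small window of candidates around it to re-evaluate. The helper names and the window width are hypothetical and not taken from the text.

```python
def scale_min_points(min_points, old_interval_s, new_interval_s):
    """Scale minPoints so the implied dwelling time stays roughly constant."""
    return max(1, round(min_points * old_interval_s / new_interval_s))


def local_candidates(base, width=2):
    """Values near the scaled estimate that are worth re-evaluating."""
    return [v for v in range(base - width, base + width + 1) if v >= 1]


base = scale_min_points(210, 1, 30)   # linear estimate for DBSCAN-GPS at 30 s
candidates = local_candidates(base)   # includes 9, the better value reported above
```

Treating the scaled value as a starting point for a narrow local search, rather than as the final setting, is one way to account for the observation that linear scaling is not always optimal.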

Figure 2.7 shows the clustering results. The effect of subsampling on the F1 score is relatively small for all three strategies, but it is important to tune the parameters to the chosen sampling rate.

FINDINGS

Of the six algorithms we tested, DBSCAN and mean-shift work best for extracting points of interest from our location traces. DJ-Cluster is very similar to DBSCAN but slightly easier to implement and yields comparable results. The other algorithms do not perform well on our data set and problem specification.