LINKED GEODATA FOR PROFILING OF TELCO USERS


Studia Ekonomiczne. Zeszyty Naukowe Uniwersytetu Ekonomicznego w Katowicach ISSN 2083-8611 Nr 234 · 2015

Krzysztof Węcel

Uniwersytet Ekonomiczny w Poznaniu Katedra Informatyki Ekonomicznej krzysztof.wecel@ue.poznan.pl


Summary: There is a growing interest in location-based profiling of users, defined as combining geo-data with anonymous on-line profiles. The profile of an entity usually consists of concepts, each accompanied by a weight specifying the relative importance of the given concept for making the analysed entity distinct. The proposed method for profiling telco users is a two-step approach. First, profiles of mobile tower stations (BTS) are created based on crowdsourced geographical information. Second, they are used to generalise the behaviour of a calling user, which is determined from Call Detail Records (CDR). The linked data cloud is considered as an additional knowledge source in the user modelling process.

Keywords: linked data, user profiling, linked geodata, call detail record, mobile user, telco, cdr, bts, lgd, osm.

Introduction

There is a growing interest in location-based profiling, defined as combining geo-data with anonymous on-line profiles. New methods for capturing information on where the users are and how their position changes over time are constantly being developed. This information is becoming increasingly valuable for a growing number of location-based or location-aware services. Some researchers try to estimate the value of mobile data information using proximity-based advertising valuation (Baccelli, Bolot, 2011). Tourists have been identified as the most rewarding target group.

Call Detail Record (CDR) is the most widely used source of mobile location data in academic research (Song et al., 2010). The recorded location is not very precise, as it is “rounded” to the co-ordinates of the nearest base transceiver station (BTS). The granularity of location affects the value of location information (Baccelli, Bolot, 2011). CDR has been found to be sufficient for drawing conclusions at the area level (Qu, Zhang, 2013), hence it is also sufficient for our purposes.

The linked data cloud is very often considered as an additional knowledge source in the user modelling process. Concepts defined in various ontologies can be used to characterise entities. The profile of an entity then consists of concepts, each accompanied by a weight specifying the relative importance of the given concept for making the analysed entity distinct.

In this paper we focus on the cell tower granularity of location information and annotate it with a geographical ontology derived from OpenStreetMap. The goal of the method is to provide profiles of the users based on profiles of the BTS stations. Therefore, the profiling process has been split into several steps. First, the information about the BTS location and its neighbourhood has to be retrieved and analysed. Then, based on this information, a summary of BTS profiles is prepared. We propose an improvement in the profiling process by leveraging TF-IDF ranking to address the issue of uneven distribution of categories describing mobile tower locations (skewness). In the last step we can characterise users that log in to specific BTS stations.

Section 1 presents related research. Section 2 explains our general approach to profiling of entities based on geographical context. Section 3 introduces a method for characterisation of BTS stations, from data collection, through analysis, to data aggregation. Section 4 provides a method for profiling of telco users.

1. Related research

In the literature, there are various approaches to profiling of mobile users. Some authors rely purely on telco data, i.e. data available to mobile operators. The majority of methods leverage social media, where users express their opinions and feelings, reveal their location, etc. The most sophisticated approaches add mining for generalisation of patterns and classification of users. Having access to anonymised call data, we base our method on this data.

Most methods rely on social data from services such as Twitter, Foursquare, Flickr, or Instagram, as the data is relatively easy to retrieve (an API is available). These services are widely used and generate large volumes of data. One of the challenges is how to model the use of such social (and mobile) applications by various users. It is essential to understand the semantics of the messages they post. Two trends are observed here: extraction of meaning and extraction of location.


In many studies, semantic technologies are considered in order to enrich and disambiguate information gathered from users. They are particularly useful for providing context, and geographical context is one of the most important. Abel et al. introduced a user modelling framework that utilises semantic background knowledge and uses it for point of interest (POI) specification (Abel et al., 2012). Two linked data knowledge sources are considered: GeoNames and DBpedia. They demonstrate that user modelling quality improves when LOD-based background knowledge is considered. DBpedia is unfortunately too coarse-grained for our purposes.

Instead of analysing separate check-ins, some approaches build an activity-travel profile, in which a spatial trajectory is built from mobile phone call records only. The tricky part is the classification of trajectories, where data mining methods can be applied (Görnerup, 2012). Not only do the sequences have to be derived, but in order to make sense they also have to be classified into typical activity-travel patterns. Their relative frequencies constitute an activity-travel profile (Liu et al., 2014). Our method also relies on the number of BTSes visited. However, we do not consider sequences of visits, as our experiments have shown that this would not produce meaningful results.

Trajectories, or sequences of visited locations, are not very useful unless they are confronted with the activities of the users. Liu et al. investigated to what extent behavioural routines could reveal the activities being performed at mobile phone call locations (Liu et al., 2013). The real value is in annotating locations with activity purposes, but this requires additional information. Although they have devised mathematical models to quantitatively characterize travel patterns, the motivating activities “were still in a less-explored stage” (Liu et al., 2013). Our method provides another context for reasoning about possible activities of the users, e.g. shopping, playing tennis.

According to (Cano et al., 2013), little has been done in modelling location entities. Therefore, they proposed to profile geographical areas by providing topical categorisation. Cano et al. used an additional source, linked data, to interlink information from the social stream with geographical objects (Cano et al., 2013). They introduced the LinkedPOI ontology, which uses DBpedia categories to profile geographic space, and proposed the geo-lattice Awareness Stream model as one of the ways to represent location. The process consisted of filtering, enriching, structuring and interlinking microposts from Twitter, Facebook, and TripAdvisor. Our method in fact models location entities, but as we do not analyse microposts we do not have to guess the correct location: it is provided in CDR.


Qu and Zhang proposed a framework and corresponding analytic methods to use User Generated Mobile Location Data (UGMLD) for Trade Area Analysis (Qu, Zhang, 2013). They defined three key processes: “identifying the activity centre of a mobile user, profiling users based on their location history, and modelling users’ preference probability.” The method is meant for the analysis of customers’ visits to business venues. However, they rejected CDR as the data source in their research as it was too coarse. Our method is able to utilise CDR in order to provide profiles of certain locations, although we cannot provide profiles of certain venues belonging to bigger chains.

When specific locations are considered, Chen et al. also presented a method for profiling businesses at specific locations that was based on mining information from social media (Chen et al., 2014). They matched geo-tagged tweets against locations from Foursquare to build a profile of the mentioned businesses.

Going back to the user, Ostuni et al. presented Cinemappy, a location-based application that computes film recommendations by exploiting contextual information related to the current location of the user, leveraging information from DBpedia (Ostuni et al., 2013). Similarly, DBpedia was used by (Cano et al., 2013), who proposed a semantic travel mash-up as a possible application. An approach for museums was presented in (Ruotsalo et al., 2013).

One of the obstacles in user modelling is the uneven distribution of categories of business, e.g. there are many more visits to a cinema than to a second-hand shop and there are many more shops than theatres. It was first noted by Qu and Zhang, who explained this by social motivation, not necessarily by the differences in the number of various categories (Qu, Zhang, 2013). In their work the categories were very fine-grained; for example, Foursquare has a hierarchical category structure with 9 top categories and ca. 400 sub-categories. For clarity of interpretation the categories were later collapsed to 6 groups. Our method addresses this issue by applying specific methods from the information retrieval domain.

2. Approach to geographical linked data-based profiling

This section describes user profiling based on BTS characteristics derived from geographical linked data. We follow the idea of location-based user profiling; one of the approaches is geoprofiling, a commonly used method to approximate user characteristics based on neighbourhood demographic data.

In most approaches, a venue or a place is given and the authors look for its coordinates. This is particularly important when text is analysed, especially in social media, e.g. Twitter. In order to simplify disambiguation, some portals, e.g. Foursquare and Facebook, allow so-called check-ins, where users select a precise location. We take the reverse process: starting from coordinates, we are interested in the objects nearby, thus describing the geographical context, further referred to as the location profile.

Our experiments have been carried out on anonymised data, where the only reasonable data for linking was the location of BTS towers. There was just one type of information that was widely used and could supplement our records: geographical information. There are several open data sources concerning geographical information that we could use, DBpedia and OpenStreetMap being the most prominent. Taking into account the granularity of available data and the requirement to display results on the map, we decided to base our method on OpenStreetMap and its triplified counterpart, LinkedGeoData (Auer, Lehmann, Hellmann, 2009). As crowdsourced data, it is kept relatively up to date, but without disruptive fluctuations.

LinkedGeoData (LGD) provides an ontology for classification of locations. There are ca. 1200 categories grouped into ca. 45 top-level categories. By comparison, Foursquare has ten top-level, 436 second-level, and 266 third-level categories¹. Foursquare’s maps in fact use OpenStreetMap². In our approach more general categories are advantageous, as they can make interpretation of generalisation results easier. Sub-categories could be more valuable as they can distinguish users better, but the solution is then less stable from the statistical point of view. This is a well-known trade-off of specificity vs. sensitivity (Fawcett, 2006). In order to best characterise the BTS stations, we have restricted our further analysis to some predefined objects (see Fig. 3).

The reasoning behind our profiling approach is presented in the following user story. A user often visits sport amenities. They are in the scope of some BTS stations. Profiles of such stations contain sport amenities with higher frequency than an average station. This is further reflected in the user profile, where sport amenities gain higher weight when the user trajectory is aggregated. Some visited venues are additionally annotated with a kind of sport, for example tennis³. Looking into calendars of sport events, we can even reason further about which kind of sport the user is interested in or whom the user is supporting (whether a team or an individual).

1 https://developer.foursquare.com/categorytree.

2 https://foursquare.com/about/osm.

3 This depends on data availability in OpenStreetMap.

3. Characteristics of BTS

3.1. Retrieval

In our experiments we have used BTS towers located in Poland, with ca. 8000 unique locations, stored in MySQL. First, the information about BTS locations was retrieved. Using a Python script, a SPARQL query was prepared for each location to retrieve the list of objects in the neighbourhood along with their categories. As a source of data for our queries we have used LinkedGeoData, which is a derivative of OpenStreetMap. Two main categories of objects are distinguished therein: nodes (just a point according to GIS terminology) and ways (lines or polygons). Separate queries for nodes and ways had to be prepared because Virtuoso’s built-in distance function behaves differently for nodes and ways. As there were ca. 8000 locations, two kinds of objects, two means of object capturing (bounding box and circle) and 3 different distances, we had to post ca. 80 thousand queries.

A sample SPARQL query is presented below:

    PREFIX lgdm: <http://linkedgeodata.org/meta/>
    PREFIX geom: <http://geovocab.org/geometry#>
    PREFIX ogc: <http://www.opengis.net/ont/geosparql#>
    SELECT DISTINCT ?class ?way WHERE {
      ?way a lgdm:Way .
      ?way a ?class .
      ?way geom:geometry [ ogc:asWKT ?geo ] .
      FILTER(bif:st_within(?geo, bif:st_point(%f, %f), %f))
    }

Listing 1. Sample SPARQL query using geospatial functions
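The three `%f` placeholders in Listing 1 suggest that the query text was filled in by the Python script mentioned above. The following is a minimal sketch of that step, not the author's actual script. The function names, the sample coordinates and the assumption that `bif:st_point` takes longitude first and latitude second are all illustrative.

```python
# Query text from Listing 1, kept as a printf-style template.
QUERY_TEMPLATE = """\
PREFIX lgdm: <http://linkedgeodata.org/meta/>
PREFIX geom: <http://geovocab.org/geometry#>
PREFIX ogc: <http://www.opengis.net/ont/geosparql#>
SELECT DISTINCT ?class ?way WHERE {
  ?way a lgdm:Way .
  ?way a ?class .
  ?way geom:geometry [ ogc:asWKT ?geo ] .
  FILTER(bif:st_within(?geo, bif:st_point(%f, %f), %f))
}"""


def build_query(lon, lat, distance):
    """Fill the template for one location (argument order is an assumption)."""
    return QUERY_TEMPLATE % (lon, lat, distance)


def build_queries(locations, distances):
    """Return (bts_id, distance, query) for every location/distance pair.

    `locations` is an iterable of hypothetical (bts_id, lon, lat) tuples,
    e.g. rows read from the database of BTS towers.
    """
    return [
        (bts_id, distance, build_query(lon, lat, distance))
        for bts_id, lon, lat in locations
        for distance in distances
    ]


if __name__ == "__main__":
    # Illustrative coordinates only, roughly in the centre of Poznań.
    for bts_id, distance, query in build_queries([("bts-1", 16.911, 52.402)], [0.5]):
        print(bts_id, distance)
        print(query)
```

The generated strings would then be posted to the SPARQL endpoint; that step is left out here because it depends on the endpoint and client library in use.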

We decided to use LGD’s endpoint instead of OSM’s API, as it allowed us to use Virtuoso’s built-in SPARQL functions for spatial queries. For the retrieval of nodes, it was possible to provide a detailed query, and the radius was in fact expressed in kilometres. For example, Figure 1a presents various venues located in a circle of 1 km diameter.

Retrieval of ways was more complicated. The function bif:st_within did not return the expected results for Way objects when the other parameter was a point. Several methods to get satisfactory results were tested, including generation of boxes to simulate containment (see Fig. 1b and 1c). Finally, circle overlap with the tolerance parameter tuned to 0.01 was chosen as the method to query for neighbouring ways (Fig. 1d).
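The box-based variant mentioned above needs a bounding box around each BTS location. The sketch below shows one common way to derive such a box from a point and a side length in kilometres. It is an illustration only: the paper does not say how its boxes were computed, and the constant of about 111.32 km per degree of latitude, the function names and the WKT output are assumptions.

```python
import math

# Approximate length of one degree of latitude in kilometres (assumption).
KM_PER_DEGREE_LAT = 111.32


def bounding_box(lat, lon, half_side_km):
    """Return (min_lon, min_lat, max_lon, max_lat) of a square around a point.

    Longitude degrees shrink with latitude, hence the cos(lat) correction.
    The approximation is adequate for boxes of a few kilometres away from
    the poles.
    """
    dlat = half_side_km / KM_PER_DEGREE_LAT
    dlon = half_side_km / (KM_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return (lon - dlon, lat - dlat, lon + dlon, lat + dlat)


def bbox_to_wkt(box):
    """Serialise the box as a closed WKT polygon (lon lat order)."""
    min_lon, min_lat, max_lon, max_lat = box
    corners = [
        (min_lon, min_lat),
        (max_lon, min_lat),
        (max_lon, max_lat),
        (min_lon, max_lat),
        (min_lon, min_lat),  # repeat the first corner to close the ring
    ]
    return "POLYGON((" + ", ".join("%f %f" % c for c in corners) + "))"


if __name__ == "__main__":
    # A 1 km box (0.5 km in each direction) around an illustrative point.
    print(bbox_to_wkt(bounding_box(52.402, 16.911, 0.5)))
```

A box is only a rough stand-in for a circle: objects near the corners are up to about 41% farther from the centre than the half side length.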

Fig. 1. Various venues selected based on object type and distance, near Poznań International Fair: (a) nodes, circle, 1 km diameter; (b) ways, box, 1 km; (c) ways, box, 5 km; (d) ways, circle, distance 0.01

3.2. Analysis

Various aspects of location characteristics can be analysed, both qualitative and quantitative. For example, Fig. 2a shows BTS stations that have hotels in their neighbourhood. The size of the circle specifies the number of hotels. It is interesting to observe that the city most packed with hotels in Poland is Gdańsk. Fig. 2b shows a closer look. Another aspect analysed is the number of venues of a given category per BTS location. General findings are as follows: on average there are 4.0 shops per location and 1.2 restaurants; on the other end are universities (0.04) and cinemas (0.06). Such asymmetry needs to be addressed when building the location and user profiles.

Fig. 2. Number of hotels located within BTS stations: (a) Poland; (b) Gdańsk

Although nodes and ways were distinguished due to technical limitations, results provided by these two types of objects have to be merged to provide a comprehensive location profile. There is a strong preference between category and GIS type, and the OpenStreetMap community has adopted certain patterns. Objects like parking or leisure areas are better represented by showing the area, hence they are more popular as ways. For example, parking is represented as a way in almost 110 thousand cases; only 3.5 thousand parking areas are marked as a node. On the other end are ATMs: over 6500 entities are represented as nodes and only 4 as a way. The same applies to tram stops. Shops are the most balanced entity: 24899 ways vs. 28871 nodes.

After experiments we concluded that it is useful to prepare characteristics of a typical BTS location, including input from both nodes and ways. The combined profile is presented in Fig. 3. On average, 22.1% of objects within BTS are parkings and 15.3% are leisure-related.
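Merging the node and way results into one location profile, and then averaging over locations, can be sketched as follows. This is an illustration under stated assumptions, not the author's code: the function names and the tiny input dictionaries are hypothetical, and each location is assumed to be summarised as a mapping from category to object count.

```python
from collections import Counter


def combined_profile(node_counts, way_counts):
    """Merge per-category counts of nodes and ways into relative shares."""
    total = Counter(node_counts) + Counter(way_counts)
    n_objects = sum(total.values())
    if n_objects == 0:
        return {}
    return {category: count / n_objects for category, count in total.items()}


def average_profile(profiles):
    """Average the shares over locations; a missing category counts as 0."""
    categories = set()
    for profile in profiles:
        categories.update(profile)
    return {
        category: sum(p.get(category, 0.0) for p in profiles) / len(profiles)
        for category in categories
    }


if __name__ == "__main__":
    # Hypothetical counts for two BTS locations.
    loc_a = combined_profile({"Shop": 3, "Parking": 1}, {"Parking": 4})
    loc_b = combined_profile({"Shop": 2}, {"Leisure": 2})
    print(average_profile([loc_a, loc_b]))
```

Averaging per-location shares, as done here, weights every location equally; dividing total counts by the total number of objects would instead favour dense urban locations. The paper does not state which of the two underlies Fig. 3.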

Fig. 3. Average distribution of objects among predefined 30 categories of objects

4. Profiles of locations and users

4.1. TF-IDF-inspired method for location profiling

Simple aggregation of geographical categories assigned to places is not sufficient to correctly profile locations. Relative values are more important than absolute values. For example, as there are many more shops than libraries, the results would be biased if we had not included this correction in the characteristics of locations. In fact, we are interested in the information whether given users visit some kind of objects more often than an average user.

In order to alleviate the effect of uneven distribution of categories we propose to use the TF-IDF weighting scheme known from information retrieval. TF-IDF is actually a product of two statistics: term frequency (TF) and inverse document frequency (IDF).

TF can be calculated in different ways, e.g. boolean or raw frequency. We have decided to use relative frequency (which is normalised to 1.0), expressed as follows:

\[ \mathrm{tf}_{i,j} = \frac{n_{i,j}}{\sum_{k} n_{k,j}} \]

where $n_{i,j}$ is the number of times that the term $t_i$ occurred in document $d_j$.

IDF is a measure of term specificity: less frequent terms have bigger discrimination power as they can better characterise a document. It is calculated as follows:

\[ \mathrm{idf}_{i} = \log \frac{|D|}{|\{ d \in D : t_i \in d \}|} \]

where $|D|$ is the number of documents in the corpus, and the denominator contains the number of documents containing the term $t_i$.

An entry to the IDF calculation is presented in Fig. 4. The TF-IDF factor in the case of the most popular category, Shop, is 0.766 and for the least popular category, Pitch, it is 4.234. Conclusion: shops, present in almost half of the locations, are not very useful to distinguish locations. Sample results obtained from the above TF-IDF method are given in Table 1. They have been built based on the restricted list of 30 objects.

Fig. 4. Percentage of locations containing venues (nodes) of given category

Table 1. Sample profiles of BTS locations calculated with TF-IDF
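The weighting described in this section can be sketched in a few lines of Python. The sketch rests on two assumptions that the text implies but does not state outright: a "document" is a BTS location and a "term" is an object category, and the logarithm is the natural one. The second assumption fits the reported figures, since ln(1/0.465) is roughly 0.766, which agrees with shops being present in almost half of the locations. Function names and sample counts are hypothetical.

```python
import math


def term_frequency(counts):
    """Relative frequency of each category within one location (sums to 1.0)."""
    total = sum(n for n in counts.values() if n > 0)
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items() if n > 0}


def inverse_document_frequency(documents):
    """ln(|D| / number of locations containing the category)."""
    n_docs = len(documents)
    doc_freq = {}
    for counts in documents.values():
        for category, n in counts.items():
            if n > 0:
                doc_freq[category] = doc_freq.get(category, 0) + 1
    return {category: math.log(n_docs / df) for category, df in doc_freq.items()}


def tf_idf(documents):
    """TF-IDF weight of every category in every location."""
    idf = inverse_document_frequency(documents)
    return {
        doc_id: {c: tf * idf[c] for c, tf in term_frequency(counts).items()}
        for doc_id, counts in documents.items()
    }


if __name__ == "__main__":
    # Hypothetical object counts per BTS location.
    locations = {
        "a": {"Shop": 2, "Pitch": 1},
        "b": {"Shop": 1},
        "c": {"Bank": 1},
        "d": {"Shop": 3, "Bank": 1},
    }
    for doc_id, weights in tf_idf(locations).items():
        print(doc_id, weights)
```

In location "a" the single pitch outweighs the two shops, because pitches occur in only one of the four sample locations. A category present in every location gets an IDF of zero and therefore drops out of all profiles.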

The profiles in Table 1 should be interpreted as follows: in the neighbourhood of the location id 34 there are 4 geographical objects, all of them nodes (points), including 2 bus stops, 1 place of worship, and 1 fuel station. Some locations have just one object (e.g. id 36), others have many more (e.g. id 41). In bold we have marked the category with the highest measure, thus being the most characteristic for a given location. When many categories are equally frequent (location id 32), the one with the highest IDF makes the top of the ranking. It is interesting to observe that 13 shops in location id 41 are less important than 5 monuments and 8 banks. Weights can be compared between locations, but then those categories with a smaller number of categories get higher weights (case id 36). This is to some extent justified: such categories clearly profile the location.

Fig. 5 presents a visualisation of the top classes ranked according to the TF-IDF measure. Each BTS location is annotated with the top-ranked category in the neighbourhood. The most important objects are coded as follows: red – shops, blue – historical objects, orange – schools, green – public transport. Figures for the whole of Poland (Fig. 5a) and Gdańsk (Fig. 5b) are attached.

Fig. 5. Most popular annotations of BTS locations: (a) Poland; (b) Gdańsk

4.2. User profiling

For evaluation, we have used a database of 10,000 sampled users containing data for 3 months. From this database we have queried users with at least 10 BTS locations and then randomly selected users⁴. The calculation of the characteristics of users is rather straightforward. The profile of a user is prepared as a weighted sum of profiles of visited BTS locations, where the weight is the number of connections initiated with a given BTS. Similarly to locations, we can visualise profiles of users as a pie chart. Fig. 6 presents profiles for sample users.

4 Please note that this is an illustration of the method and some decisions are arbitrary. Any other period can be considered, any other set of object types can be used.

Fig. 6. Geographical Linked Data-Based profile of the sample users

For comparison of user profiles, a radar chart is much more suitable, as it can better emphasize differences and similarities between users. One of the problems with this kind of chart is that the axes should have the same measure. Therefore, we have normalised the values in such a way that the user with the highest value of a given object type in the profile has value 1.0. The same users are compared in Fig. 7. Grouping of certain categories or even reducing the number of categories should further improve the readability of the chart.
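The two computations described in this section, the weighted sum of BTS profiles and the per-category normalisation for the radar chart, can be sketched as below. The data structures, names and numbers are illustrative assumptions; in particular, the BTS profiles are taken to be category-to-weight mappings such as those produced by the TF-IDF step.

```python
def user_profile(connections, bts_profiles):
    """Weighted sum of the profiles of visited BTS locations.

    `connections` maps bts_id -> number of connections initiated there;
    `bts_profiles` maps bts_id -> {category: weight}.
    """
    profile = {}
    for bts_id, n_connections in connections.items():
        for category, weight in bts_profiles.get(bts_id, {}).items():
            profile[category] = profile.get(category, 0.0) + n_connections * weight
    return profile


def normalise_for_radar(user_profiles):
    """Scale each category so that the user with the highest value gets 1.0."""
    maxima = {}
    for profile in user_profiles.values():
        for category, value in profile.items():
            maxima[category] = max(maxima.get(category, 0.0), value)
    return {
        user: {
            category: (profile.get(category, 0.0) / top if top > 0 else 0.0)
            for category, top in maxima.items()
        }
        for user, profile in user_profiles.items()
    }


if __name__ == "__main__":
    # Hypothetical BTS profiles and connection counts for two users.
    bts = {"A": {"Shop": 0.2, "Sport": 0.8}, "B": {"Shop": 1.0}}
    users = {
        "u1": user_profile({"A": 3, "B": 1}, bts),
        "u2": user_profile({"B": 4}, bts),
    }
    print(normalise_for_radar(users))
```

With these sample numbers, u1 scores 1.0 on Sport and 0.4 on Shop, while u2 scores 1.0 on Shop and 0.0 on Sport. Because the scaling is relative to the compared group, adding or removing a user can change everyone's radar values.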

Fig. 7. Comparison of user profiles on radar charts – restricted to two users

Conclusions and future work

In this paper we have contributed a method for exploitation of LinkedGeoData and also a TF-IDF-based method for profiling of locations and users. The method has been applied in the particular setting of mobile telco operators. Nevertheless, the elaborated method can be used universally for profiling any business requiring a profile of the neighbourhood, e.g. for marketing purposes or revenue estimation.

So far we have applied a simple approach for aggregation to obtain the user profile. In the future we plan to use Latent Dirichlet Allocation (LDA) (Blei, Ng, Jordan, 2012). It is a generative probabilistic model where each item of a collection is modelled as a finite mixture over an underlying set of topics. This resembles our approach for profiling: users are modelled as a mixture of categories that are provided by locations. What we need to determine are the weights allowing us to prepare the appropriate mixture. The method can also be improved by considering the hierarchy of categories. As of now, LinkedGeoData contains all intermediate categories when annotating a specific venue (e.g. both shop and amenity). There is an extension of the method to hierarchical LDA (hLDA).

References

Abel F., Hauff C., Houben G.-J., Tao K. (2012), Leveraging User Modeling on the Social Web with Linked Data [in:] Web Engineering, Springer-Verlag, Berlin-Heidelberg, pp. 378-385.

Auer S., Lehmann J., Hellmann S. (2009), LinkedGeoData: Adding a Spatial Dimension to the Web of Data, ISWC 2009, Vol. 5823, Springer, Heidelberg, pp. 731-746.

Baccelli F., Bolot J. (2011), Modeling the Economic Value of the Location Data of Mobile Users, INFOCOM, IEEE, pp. 1467-1475.

Blei D.M., Ng A.Y., Jordan M.I. (2012), Latent Dirichlet Allocation, “Journal of Machine Learning Research”, Vol. 3(4-5), pp. 993-1022, doi:10.1162/jmlr.2003.3.4-5.993.

Cano A.E., Dadzie A.-S., Burel G., Ciravegna F. (2013), Topica-Profiling Locations through Social Streams. Semantic Technology, Springer-Verlag, Berlin-Heidelberg, pp. 290-305.

Chen F., Joshi D., Miura Y., Ohkuma T. (2014), Social Media-based Profiling of Business Locations, Proceedings of the 3rd ACM Multimedia Workshop on Geotagging and Its Applications in Multimedia, Orlando, FL, pp. 1-6.

Fawcett T. (2006), An Introduction to ROC Analysis, “Pattern Recognition Letters”, Vol. 27(8), pp. 861-874, doi:10.1016/j.patrec.2005.10.010.

Görnerup O. (2012), Scalable Mining of Common Routes in Mobile Communication Network Traffic Data [in:] J. Kay, P. Lukowicz, H. Tokuda, P. Olivier, A. Krüger (eds.), “Pervasive Computing”, Vol. 7319, Springer-Verlag London, pp. 99-106, doi:10.1007/978-3-642-31205-2_7.

Liu F., Janssens D., Cui J., Wang Y., Wets G., Cools M. (2014), Building a Validation Measure for Activity-based Transportation Models Based on Mobile Phone Data, “Expert Systems with Applications”, Vol. 41(14), pp. 6174-6189, doi:10.1016/j.eswa.2014.03.054.

Liu F., Janssens D., Wets G., Cools M. (2013), Annotating Mobile Phone Location Data with Activity Purposes Using Machine Learning Algorithms, “Expert Systems with Applications”, Vol. 40(8), pp. 3299-3311. doi:10.1016/j.eswa.2012.12.100.

Ostuni V.C., Gentile G., Di Noia T., Mirizzi R., Romito D., Di Sciascio E. (2013), Mobile Movie Recommendations with Linked Data [in:] Availability, Reliability, and Security in Information Systems and HCI, Springer, Berlin-Heidelberg, pp. 400-415.

Qu Y., Zhang J. (2013), Trade Area Analysis Using User Generated Mobile Location Data, Proceedings of the 22nd International Conference on World Wide Web, Republic and Canton of Geneva, Switzerland, International World Wide Web Conferences Steering Committee, pp. 1053-1064, http://dl.acm.org/citation.cfm?id=2488388.2488480 (accessed: 30.08.2015).

Ruotsalo T., Haav K., Stoyanov A., Roche S., Fani E., Deliai R., Mäkelä E., Kauppinen T., Hyvönen E. (2013), SMARTMUSEUM: A Mobile Recommender System for the Web of Data. “Web Semantics: Science, Services and Agents on the World Wide Web”, Vol. 20(0), pp. 50-67, doi:10.1016/j.websem.2013.03.001.

Song C., Qu Z., Blumm N., Barabási A.-L. (2010), Limits of Predictability in Human Mobility, “Science”, Vol. 327(5968), pp. 1018-1021.


POWIĄZANE GEODANE DLA PROFILOWANIA UŻYTKOWNIKÓW TELCO (Linked geodata for profiling of telco users)

Summary: A growing interest is observed in geographical profiling of users, understood as combining geographical data with anonymous user profiles. The profile of an entity usually consists of geographical concepts annotated with weights that reflect the relative importance of individual concepts for distinguishing users. The proposed method for profiling mobile network users has two stages. First, profiles of base transceiver stations (BTS) are created on the basis of crowdsourced geographical information. These profiles are then used to generalise the behaviour of a user, derived from the analysis of the logs of their calls (CDR). The linked data cloud is used as an additional source of knowledge in the user modelling process.

Keywords: linked data, user profiling, linked geodata, call logs, mobile user, telco, cdr, bts, lgd, osm.
