
Estimation of a recursive link-based logit model and link flows in a sensor equipped network


van Oijen, Tim P.; Daamen, Winnie; Hoogendoorn, Serge P.

DOI: 10.1016/j.trb.2020.08.003

Publication date: 2020

Document version: Final published version

Published in: Transportation Research Part B: Methodological

Citation (APA): van Oijen, T. P., Daamen, W., & Hoogendoorn, S. P. (2020). Estimation of a recursive link-based logit model and link flows in a sensor equipped network. Transportation Research Part B: Methodological, 140, 262-281. https://doi.org/10.1016/j.trb.2020.08.003

Important note

To cite this publication, please use the final published version (if applicable).

Please check the document version above.

Copyright

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Takedown policy

Please contact us and provide details if you believe this document breaches copyrights. We will remove access to the work immediately and investigate your claim.

This work is downloaded from Delft University of Technology.


Tim P. van Oijen, Winnie Daamen, Serge P. Hoogendoorn

Department of Civil Engineering and Geosciences, Delft University of Technology, Transport & Planning, PO Box 5048, Delft, 2600 GA, the Netherlands

Article info

Article history: Received 23 September 2019; Revised 19 April 2020; Accepted 26 August 2020

Keywords: Discrete choice modeling; Recursive link-based logit model; Wi-Fi-sensors; Crowd monitoring; Route choice; Link flow estimation

Abstract

This paper describes a method to estimate the parameters of a Recursive link-based Logit model (RL) using measurements of a set of spatially fixed proximity sensors, with limited hit rates, which can uniquely identify people, such as Wi-Fi-, RFID- or Bluetooth-sensors. The observed 'route' of an individual, where we focus on pedestrians in an urban or event context, is modelled as the sequence of sensors that have identified the individual during his or her trip. Obviously, these 'routes' contain large gaps, which makes traditional estimation techniques not applicable. Although we do not exactly know what happens within these gaps, we do have some specific insight about the individual's behavior between two identifications: we know with a certain probability, which is related to the hit rate of the sensors, that the individual did not cross another sensor location between the two identifications. This paper therefore describes a method to estimate the parameters of an RL model that specifically exploits this knowledge. The framework also allows us to formulate a probabilistic link utilization estimation method, which can be used to estimate link flows in a network based on the sensor observations. The effectiveness of the methodology is demonstrated in simulation using an artificial network, after which the methodology is tested on a real data set, collected at a Dutch music event.

© 2020 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/)

1. Introduction

Discrete choice models have been used for decades to describe all kinds of human mobility, including activity choice, destination choice, mode choice and route choice. For the latter, which is the main focus of this paper, many models have been successfully used, like Probit, Multinomial Logit (MNL), C-Logit (Cascetta et al., 1996), Path-Size Logit (Ben-Akiva and Bierlaire, 1999) and Recursive link-based Logit (RL) (Fosgerau et al., 2013; Mai et al., 2018). In the RL model, which has a graph-based representation, route choice is modeled as sequentially choosing a next link. We will later see that this structure is also very suitable for our sensor-based estimation method.

Regardless of the exact model type, data is required to estimate the parameters of a discrete choice model. A large number of data sources can be adopted, like GPS-traces, Bluetooth-traces, on-line or off-line surveys, Wi-Fi-traces, RFID, and mobile network phone data. Although each type of data comes with its specific characteristics when it comes to accuracy, availability and processing, we can make a very clear distinction between two types of data sources: location data sources and proximity data sources. Location data, like GPS-traces and self-reported routes, consists of measured or reported locations of people. Proximity data, like Wi-Fi-, Bluetooth- or RFID-traces, consists of sequences of fixed locations (sensors) where a certain person was observed. At first glance, these two data sources look very similar, but on closer inspection there is a fundamental difference between the two types of data. From location data, we do not get any insight into what happened between two consecutive measurements. With proximity data, however, we know that it is impossible (or very unlikely) that a person passed sensor locations elsewhere in the network between two consecutive sensor observations. In other words, with proximity data, contrary to location data, we do have insight into what (most likely) did not happen between two consecutive measurements. Consequently, estimation of a route choice model using proximity data asks for its own approach, especially since the gaps between two sensor observations can be quite large, raising the need to exploit the knowledge of what happened in between the observations.

Corresponding author.

E-mail addresses: T.P.vanOijen@tudelft.nl (T.P. van Oijen), W.Daamen@tudelft.nl (W. Daamen), S.P.Hoogendoorn@TUDelft.nl (S.P. Hoogendoorn).

0191-2615/© 2020 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/)

There is very little literature about how to estimate the parameters of a discrete route choice model with proximity data. This is in contrast with route choice model estimation using location data, for which plenty of estimation-related methods have been proposed, ranging from map matching and trajectory reconstruction algorithms, in order to augment incomplete data, to more elaborate methods, like the network-free data estimation approach introduced by Bierlaire and Frejinger (2008). In this paper, we therefore focus explicitly on estimation of a discrete route choice model with proximity data. We describe and apply an estimation method that exploits the very specific nature of this type of data with respect to location data. The framework in which we describe the methodology allows us to formulate a probabilistic link utilization estimation method as well, which can be used to derive link flows and route splits from a set of sensor observation sequences. We implemented and tested this method as well.

The remainder of this paper is organized as follows. Section 2 briefly reviews the different estimation techniques of route choice models, making the explicit distinction between using location data sources and proximity data sources. Then, Section 3 describes the framework that encapsulates our estimation method. This section comprises a brief description of the adopted Recursive link-based Logit model and a precise formulation of the sensor network configuration. Section 4 describes how observations of individuals travelling through the network are represented in terms of sensor observations, and in Section 5 we focus on the particular case of trips without any observations. Insights into unobserved travelling are generalized in Section 6, in which a method is derived to calculate the likelihood to reproduce any sequence of sensor observations. This likelihood calculation is the key element of the route choice model estimation method, which is applied in Section 7 on an artificial network and randomly generated observation patterns. Section 8 describes a probabilistic link utilization estimation method, which is based on similar principles as the likelihood calculation method. In Section 9, this method is tested in a simulated use-case. Then, Section 10 describes how we applied the developed methodologies on a Wi-Fi data set that we collected during the TT Assen, a music festival in the Dutch city Assen. Section 11 compares our model estimation method with two different implementations of the network-free data approach (Bierlaire and Frejinger, 2008). We show that with a particular implementation of the measurement equation, the network-free data approach can be interpreted as a path-based version of our recursive method. Section 12 discusses the computational complexity of the model estimation method and Section 13 discusses the applicability of the model in different contexts. The paper ends with conclusions and future steps in Section 14.

2. Review of route choice model estimation approaches

Many types of data have been used to estimate the parameters of discrete route choice models, or traveller behavior models in general. Traditionally, surveys were commonly used to estimate model parameters, but with the rise of new digital technologies, estimation shifted more towards the use of behavioral observations from sensors. In general, data sources to estimate route choice or location choice models can be classified into two groups: data sources reporting locations (location data sources) and data sources reporting proximities to certain (in most cases fixed) nodes (proximity data sources). In this paper, we treat proximity as a binary concept: either the individual is close to the node, or he is not. We will now briefly review both data source classes, with a focus on their use in route choice model estimation.

2.1. Location data sources

Data sources that directly report locations of individuals mainly include self-reported behavior (Mahmassani and Peeta, 1993; Abdel-Aty et al., 1995; Ramming, 2001) and GPS measurements (Broach et al., 2012; Menghini et al., 2010; Ton et al., 2018; Galama, 2015). Both types of data contain errors and uncertainties. In the case of collected GPS-traces or reported routes, data might not correctly represent the true routes of people. It is known that the accuracy of GPS-measurements is highly influenced by the environment, and the high battery consumption of GPS-localization limits the maximum frequency. Moreover, the limited ability of humans to reproduce their taken routes makes reported route data also unreliable to a certain extent. To deal with the problem of incomplete trajectory data, reconstruction of trajectories seems a logical choice, frequently by assuming a shortest path choice (Ramming, 2001; Lu et al., 2018). Up to certain levels of inaccuracy and gap duration, this might work fine and can give satisfying results. Bierlaire and Frejinger (2008), however, warn that biases are easily introduced when applying these trajectory reconstruction techniques. As an alternative, they propose a method to estimate route choice models with network-free data, reducing the need for trajectory reconstruction and map-matching. Oyama and Hato (2018) propose a network-free estimation method which relies on a link-based route choice model and reduces biases by estimating link-specific standard errors. Both methods use location data to estimate the model parameters.

Fig. 1. Example showing the difference between (a) location data sources and (b) proximity data sources, with respect to route likelihood. In the case of location data it is impossible to distinguish from the two observations whether the individual took the upper or the lower (actual) route. In the case of proximity data, the absence of an observation of the individual near the sensor at the upper node makes the lower route much more likely to be the actual one.

2.2. Proximity data sources

Data sources that report when individuals are close to certain nodes include technologies such as Wi-Fi-sensing, Bluetooth, RFID, or mobile phone network data. Wi-Fi-traces have been used to study activity and destination choice by inferring trip origins and destinations from the traces (Danalet et al., 2014; Danalet, 2015; Yoon et al., 2006). Yoon et al. (2006) studied route choice behavior as well, by generating a distance-based set of path alternatives, and statistically deriving route splits from the data. Mobile phone network data has also been used to study destination choice (Iqbal et al., 2014; Wang et al., 2018) as well as route choice (Leontiadis et al., 2014; Huang et al., 2018). Huang et al. (2018) estimated the perception parameter of a C-logit model with so-called antenna ID paths. van den Heuvel et al. (2015) used Bluetooth scan-units to estimate a route choice model in a train station. The scan-units were placed such that all alternative routes could be unambiguously observed. In urban networks, achieving full observability of all possible routes is generally infeasible. In these contexts, Bluetooth observations from selected locations in the network are often used to approximate densities, flows and travel or waiting times (Versichele et al., 2012; Larsen et al., 2013; Kurkcu and Ozbay, 2017; Lesani and Miranda-Moreno, 2018). However, to the best of our knowledge, a general and dedicated estimation method for a discrete route choice model has not been developed for this type of data source so far.

2.3. Key difference between location and proximity data

As briefly explained in the introduction (Section 1), there is a fundamental difference between location data and proximity data when it comes to route choice model estimation. For location data sources, only actual measurements provide information about the actual taken route. Existing trajectory reconstruction and estimation techniques are generally based on the measured locations of an individual only. In contrast, for proximity data sources, the absence of sensor observations also contributes to the likelihood of routes that do not cross the particular sensor. This key difference is visualized in Fig. 1, in which the absence of an observation of the individual near a sensor at the upper node (Fig. 1(b)) makes the lower route more likely to be the actual one. This particular fact makes existing route estimation methods as proposed by Bierlaire and Frejinger (2008) and Oyama and Hato (2018) not applicable, or at least not optimal, for estimation of route choice models using proximity data. The method proposed in this paper does explicitly take the absence of observations into account, by calculating likelihoods for individuals to exactly reproduce the sequence of sensor observations, thereby avoiding those places where they have not been observed. Before explaining the estimation method in detail, the framework in which the method will be integrated is outlined in the next section.

3. Modeling framework

A link-based network representation will be used for both our sensor configuration and the route choice model. Before formally describing the sensor configuration in Section 3.1 and the route choice model in Section 3.2, we start with some general network and route definitions.

We introduce a network $G = (L, V)$, with directed links $L$ and nodes $V$. A link $l \in L$ is defined to have a start and end vertex: $l = (v_1, v_2)$, with $v_1$ and $v_2$ both in $V$. A path $r$ through the network is defined as a sequence of links $(r_1, r_2, \ldots)$, with $r_i \in L$ for all $i$.

Given the destination and current link of an individual, we assume that the probabilities of choosing a next link are known. Different methods exist to define these probabilities. In this study, a Recursive link-based Logit (RL) model has been applied, which is formulated in terms of a next-link probability matrix. Section 3.2 briefly explains how this model is used to determine the next-link probabilities. Regardless of the exact model, we define $p_{i,j,d} = P(j \mid i, d)$ as the probability of choosing link $j \in L$ as the next link when located at link $i \in L$ and having link $d \in L$ as destination. We assume that the next-link choice does not depend on the historical path. If the destination link is reached, a person is expected to stop moving, so $p_{d,j,d} = 0$ for all $d$ and $j$ in $L$. Furthermore, a person that arrives at the start node of the destination link is expected to choose the destination link as its next (and final) link, so $p_{i,d,d} = 1$ for all links $i$ that can precede destination $d$.

Later on, it is more convenient to write these probabilities in matrix notation, so we introduce the next-link probability matrix $P_d$ with entries $(P_d)_{i,j} = p_{i,j,d}$. Although we are formally not allowed to use link elements as matrix indices, for the benefit of a clear notation, we implicitly assume an ordering of all links, which we use for indexing.

3.1. Sensor configuration

In order to describe a sensor configuration, we first introduce $S$ as being the set of all sensors. Then, we model the sensor configuration by matching each sensor $s \in S$ with a non-empty set of one or more links to which an observation of that sensor possibly applies. We denote the relation between a sensor and its corresponding set of links by the function $L_S$, such that $L_S(s)$ equals the set of links that are observed by sensor $s$. The set of all links that are observed by a sensor is denoted by $\hat{L} = \bigcup_{s \in S} L_S(s)$.

The estimation methods, described in the next sections, put two important requirements on the construction of the observed link sets. First, each link is allowed to be in the observed link set of at most one sensor. This restriction largely simplifies the estimation methods, but limits the application scope as well (see Section 13). Second, the observed link set of a sensor should be constructed in such a way that each possible non-cyclic path that crosses the detection area of the sensor has exactly one link that is in the observed link set. The second requirement comes from the fact that we actually aim to calculate the likelihood to reproduce the observed sensor crossings, instead of the likelihood to reproduce the exact observed sensor observations. This simplifies the derivation of our methodology and, in addition, it has a positive effect on the computational efficiency.

Modelling the observed area of a sensor as a set of one or more observed links allows for many different configurations. Four typical ways to construct the observed link set are:

Single-link construction: In this construction, a sensor simply observes one (possibly bi-directional) link. Practically, it implies that an individual that is observed by this sensor undoubtedly traverses this link. See Fig. 2(a) for an example.

Single-node construction: In this construction, a sensor observes one single node (intersection). Since each possible path through the node should have exactly one link that is in the observed set (second requirement), the observed link set is constructed from all incoming links. See Fig. 2(b) for an example.

Multi-node construction: In this construction, a sensor observes multiple nodes. Since each possible path crossing the detection area should have exactly one link that is in the observed set (second requirement), the observed link set is constructed from all links that enter the detection area. See Fig. 2(c) for an example.

Dummy-link construction: This construction is similar to the single-node and multi-node construction, with the key difference that dummy-links are inserted that connect incoming and outgoing links. The observed link set then consists of all dummy-links. See Fig. 2(d) for an example.

It is known for passive Wi-Fi- or Bluetooth-sensors that the detection rate is far from 100%. Therefore, we assume that each link $l$ is associated with a link-specific detection probability $\theta_l$, with $0 \le \theta_l < 1$. Obviously, $\theta_l = 0$ for all unobserved links ($l \notin \hat{L}$, i.e. links that are not in the observed link set of any sensor). In practice, the detection rate $\theta_l$ will depend mainly on the utilized sensor and its placement with respect to the local surroundings and infrastructure. Generally, the detection rate is expected to increase with a longer duration of stay and shorter distances from the sensor. This implies that the detection rate could change if a person chooses a different path in the detectable area (e.g., making a turn instead of crossing the street). In case the detection rate of a single-node or multi-node sensor depends heavily on the exact path of the individual crossing the detectable area, a dummy-link construction could be considered. This would allow for a more direct specification of detection rates for different paths through the detectable area.

Fig. 2. Four constructions for the observed link set $L_S(s)$ of sensor $s$. The sensor location and its detection range are indicated by the green solid circle and the dashed outlined circle respectively. The links in $L_S(s)$ are indicated in red. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

3.2. The recursive link-based logit model

This section briefly reviews the Recursive link-based Logit model, which is used in this study to define the next-link probabilities of individuals in a network that travel towards a destination. The section does not contain any new ideas or insights, although the notation differs slightly from the notation used by Fosgerau et al. (2013). The Recursive link-based Logit model (RL) was introduced by Fosgerau et al. (2013) as an alternative to existing discrete choice models to describe route choice behavior. The main advantage of the RL model is that it has no restriction on the choice set. Its specifications are comparable to existing traditional discrete choice methods. In the RL model, when currently at link $i$, the action of choosing a next link $j$ has an instantaneous utility $v(j \mid i) + \mu\,\varepsilon(j)$, where the stochastic $\varepsilon(j)$ terms are assumed i.i.d. extreme value type 1 with zero mean and $\mu$ is a fixed scale parameter. A person travelling from link $i$ to destination link $d$ is modelled to maximize its total expected accumulated utility. The expected accumulated utility can be found by solving the Bellman equation:

$$V_{i,d} = E\left[\max_{j \in L(i)} \left( v(j \mid i) + V_{j,d} + \mu\,\varepsilon(j) \right)\right], \tag{1}$$

where $L(i)$ represents the choice set of next links when currently travelling at link $i$. Since the error terms are assumed i.i.d. extreme value type 1 and $\mu$ is invariant, the equation can be rewritten as

$$V_{i,d} = \begin{cases} \mu \ln \sum_{j \in L(i)} e^{\frac{1}{\mu}\left(v(j \mid i) + V_{j,d}\right)}, & i \neq d \\ 0, & i = d \end{cases} \tag{2}$$

(see Fosgerau et al. (2013)).

Taking the exponential of both sides of the equation leaves us with a linear system of equations in terms of the accumulated utility exponentials. Therefore, we define an incidence matrix of instantaneous utilities $M_d$ with entries according to

$$(M_d)_{i,j} = \delta(j \mid i)\, e^{\frac{1}{\mu} v(j \mid i)}, \tag{3}$$

where $\delta(j \mid i)$ is 1 if link $j$ is a neighbour of link $i$ and zero otherwise. $\delta(j \mid d) = 0$ for all $j$, since the person is expected to stop moving when arriving at its destination link. Further, we define the vector $b$, which has $b_d = 1$ and $b_i = 0$ for $i \neq d$, and the vector $z_d$, for which $(z_d)_i = e^{\frac{1}{\mu} V_{i,d}}$.

The accumulated utility exponentials $z_d$ can now be found by solving

$$(I - M_d)\, z_d = b. \tag{4}$$

The next-link probability matrix $P_d$ is finally determined by

$$(P_d)_{i,j} = \frac{(M_d)_{i,j}\,(z_d)_j}{\sum_{l \in L} (M_d)_{i,l}\,(z_d)_l}. \tag{5}$$

Link flows that result from a given demand $g_{o,d}$ between origin $o$ and destination $d$ can now be calculated according to

$$q_{o,d} = \left(I - P_d^T\right)^{-1} g_{o,d}. \tag{6}$$

In this formula, the vector $q_{o,d}$ contains the link flows and $g_{o,d}$ is a vector that is zero except for its $o$th element, which contains the demand from $o$ to $d$.
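As a minimal numerical sketch of Eqs. (3) to (6), the following Python code solves the linear system for $z_d$, derives $P_d$ and computes link flows. The function names, the dense NumPy representation and the four-link toy network (one origin link, two parallel alternatives, one destination link) are assumptions made for illustration. The destination row of $M_d$ is zero, so that row of $P_d$ is set to zero explicitly, consistent with $p_{d,j,d} = 0$.

```python
import numpy as np


def next_link_probabilities(v, adjacency, d, mu=1.0):
    """Return (P_d, z_d) for destination link d, following Eqs. (3)-(5).

    v         -- (n, n) array, v[i, j] is the instantaneous utility v(j|i)
    adjacency -- (n, n) 0/1 array, adjacency[i, j] = 1 if link j can follow link i
    """
    n = v.shape[0]
    delta = np.array(adjacency, dtype=float)
    delta[d, :] = 0.0                               # no next link at the destination
    M = np.where(delta > 0, np.exp(v / mu), 0.0)    # Eq. (3)
    b = np.zeros(n)
    b[d] = 1.0
    z = np.linalg.solve(np.eye(n) - M, b)           # Eq. (4)
    denom = M @ z
    P = np.zeros((n, n))
    rows = denom > 0                                # skips the all-zero destination row
    P[rows] = M[rows] * z / denom[rows][:, None]    # Eq. (5)
    return P, z


def link_flows(P, o, demand=1.0):
    """Eq. (6): link flows resulting from `demand` entering at origin link o."""
    n = P.shape[0]
    g = np.zeros(n)
    g[o] = demand
    return np.linalg.solve(np.eye(n) - P.T, g)


# Hypothetical network: link 0 (origin) -> link 1 or link 2 -> link 3 (destination).
adjacency = np.zeros((4, 4))
adjacency[0, 1] = adjacency[0, 2] = adjacency[1, 3] = adjacency[2, 3] = 1
v = np.zeros((4, 4))
v[0, 1], v[0, 2] = -1.0, -2.0          # the alternative via link 2 is less attractive
P, z = next_link_probabilities(v, adjacency, d=3)
q = link_flows(P, o=0)
print(P[0], q)
```

In this toy case the split at the origin reduces to a binary logit between the two alternatives, and the unit of demand is conserved on the origin and destination links.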

3.3. Link size attribute

The original formulation of the RL model suffers from the IIA (independence of irrelevant alternatives) property. In order to relax this property, Fosgerau et al. (2013) proposed a link size attribute, which is comparable to the path size attribute in path choice models (Ben-Akiva and Bierlaire, 1999). The correction is achieved by adding a term to the instantaneous link-transition utilities, which is proportional to the flow through this link resulting from a unit of demand between an origin and a destination. The corrected instantaneous utility is defined as

$$v^{LS}(j \mid i, o, d) = v(j \mid i) + \beta_{LS} \cdot \left(\tilde{q}_{o,d}\right)_j \cdot \ell(j), \tag{7}$$

where $\ell(j)$ denotes the length of link $j$, $\beta_{LS}$ denotes the link size parameter and $(\tilde{q}_{o,d})_j$ denotes the flow through link $j$ resulting from a unit of demand between origin $o$ and destination $d$. The factor $\beta_{LS}$ is supposed to be negative. This implies that links with a large flow, which are likely to have a large contribution to route overlap, get a larger utility reduction. The flow vector $\tilde{q}_{o,d}$ is calculated according to Eq. (6) and requires a certain choice of $P_d$, denoted as $P_{\mathrm{corr},d}$. Fosgerau et al. do not prescribe how to choose the matrix $P_{\mathrm{corr},d}$ in order to calculate the link size correction flows. A typical choice would be to derive $P_{\mathrm{corr},d}$ from assuming a utility function that is based on trip distance only (and u-turn penalties). It should be noticed that the link size utility function $v^{LS}(j \mid i, o, d)$ is origin-specific. As a result, the next-link probability matrix $P_d$ and the incidence matrix $M_d$ become origin-specific as well, which increases the computational effort to estimate models. In our study, the link size attribute has been applied when estimating the model for the case study at the TT music festival (Section 10). In the other parts of the paper, a link size attribute has not been taken into account, mainly to ensure readability. Although details are omitted, the estimation method and simulations that will be introduced in the following sections trivially allow for inclusion of the link size attribute.

A more profound approach to relax the IIA property is to use the Nested Recursive Logit (NRL) model (Mai et al., 2015; Zimmermann et al., 2017), which explicitly accounts for correlated path utilities. Although the NRL model has proven to be superior to the RL model in terms of its accuracy, its higher complexity has made us decide to take the RL model as the starting point for our method development.

4. Sensor observations

Since the full paths of persons generally cannot be observed, we define for each individual a sensor observation path $s^* = (s^*_1, s^*_2, \ldots)$ as the sequence of sensors at which a person has been observed during a trip. The set of all observation paths for trips with origin link $o$ and destination link $d$ is denoted by $S_{o,d}$.

It should be noticed that timestamps of the observations are deliberately not taken into account. Inclusion of the time dimension complicates the likelihood calculation as explained in the coming sections and has therefore been left out. In case of estimating a route choice model for pedestrians in an urban context or during an event, which this method is particularly aimed at, it is questionable how to deal with this time aspect, since pedestrians in these environments are not expected to move with predictable speeds. Nonetheless, how to include timestamps into our likelihood estimations in order to improve the predictive power is one of our key questions to be answered by future research.

5. Unobserved travelling

Before we explain how to calculate sensor observation path likelihoods and link flows, we will briefly discuss the concept of unobserved travelling. A quantity that appears to be crucial in our later computations is the probability of passing a certain link $i$, when travelling from a certain origin to a certain destination, and being unobserved so far. In other words, the individual has not been observed yet by a sensor before reaching link $i$. To make this formal, we introduce $q_0(i \mid o, d)$ as being the expected number of times that an individual arrives unobserved at link $i$, when travelling from $o$ to $d$:

$$q_0(i \mid o, d) = E\left(k_0(i) \mid o, d\right), \tag{8}$$

where $k_0(i)$ equals the number of times that an individual arrives unobserved at link $i$. This quantity depends on the number of routes between the origin and destination that pass through link $i$ and on the detection rates of the sensors that an individual passes before reaching link $i$. Fig. 3 shows two examples of how the network and sensor configuration influence $q_0(i \mid o, d)$. It should be noticed that the expected number of unobserved link arrivals is almost identical to the probability of arriving unobserved at the specific link at least once. The small, but non-zero, probability of cycles occurring makes the expected number of unobserved link arrivals slightly larger than the probability of arriving unobserved at the specific link at least once.

Fig. 3. The expected number of unobserved link arrivals, $q_0(i \mid o, d)$, for two different sensor configurations (single-node construction) with a detection rate $\theta = 0.7$. The wider and greener the link, the higher the expected number of unobserved link arrivals. The leftmost link is defined as the origin and the rightmost link as the destination. The next-link probability matrix $P_d$ is based on link distances only plus a u-turn penalty. It can be seen that $q_0(i \mid o, d)$ is practically zero for the origin link, since only link arrivals are counted.

The values of $q_0(i \mid o, d)$ are found by calculating 'link flows' that result from sending one unit of flow into the network at the origin link $o$, with a modified next-link probability matrix, which takes the link-specific detection probabilities into account. Each flow that passes a sensor-equipped link will be lowered according to its detection rate $\theta$. Details are given in Appendix A.
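Appendix A is not part of this excerpt, so the sketch below shows only one plausible reading of the description above, not necessarily the authors' exact formulation. It assumes that flow leaving link $i$ remains unobserved with probability $1 - \theta_i$, that the first transition out of the origin link is not attenuated (only arrivals are counted), and that $q_0$ then follows from a linear system analogous to Eq. (6). The function names and the toy matrix are hypothetical.

```python
import numpy as np


def unobserved_arrivals(P, theta, o):
    """Expected number of unobserved arrivals q0(.|o, d) for every link.

    P     -- (n, n) next-link probability matrix P_d for the destination d
    theta -- (n,) detection probability per link (0 for unobserved links)
    o     -- index of the origin link
    """
    n = P.shape[0]
    # Assumed modification: flow leaving link i stays unobserved with
    # probability 1 - theta[i], so row i of P is scaled by that factor.
    P_unobs = P * (1.0 - np.asarray(theta, dtype=float))[:, None]
    # The trip starts on the origin link, so its first transition P[o, :] is
    # injected without attenuation; only later arrivals are counted.
    return np.linalg.solve(np.eye(n) - P_unobs.T, P[o, :])


def empty_path_likelihood(P, theta, o, d):
    """Probability that a trip from o to d yields no observation at all."""
    if o == d:
        return 1.0
    return (1.0 - theta[d]) * unobserved_arrivals(P, theta, o)[d]


# Hypothetical network: link 0 (origin) -> link 1 or link 2 -> link 3 (destination),
# with a sensor covering link 1 only.
P = np.zeros((4, 4))
P[0, 1], P[0, 2] = 0.7, 0.3
P[1, 3] = P[2, 3] = 1.0
theta = np.array([0.0, 0.7, 0.0, 0.0])
q0 = unobserved_arrivals(P, theta, o=0)
print(q0, empty_path_likelihood(P, theta, 0, 3))
```

Under these assumptions the origin link itself gets a value of zero, in line with the remark in the caption of Fig. 3 that only link arrivals are counted.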

6. Likelihood of sensor observations

This section explains how to calculate the joint likelihood $\mathcal{L}(\beta)$ of reproducing the set of sensor observation paths $S_{o,d}$ for all o-d pairs, given a parameter set $\beta$, which influences the element values of the next-link probability matrix or detection rates. This likelihood can be maximized in order to estimate utility parameters (Ben-Akiva and Lerman, 1985). The joint likelihood is calculated by multiplying all probabilities to observe the individual paths:

$$\mathcal{L}(\beta) = \prod_{o \in L} \prod_{d \in L} \prod_{s^* \in S_{o,d}} P(s^* \mid o, d, \beta), \tag{9}$$

where $P(s^* \mid o, d, \beta)$ is the likelihood to reproduce the sensor observation path $s^*$, given origin $o$, destination $d$ and parameter set $\beta$. To calculate $P(s^* \mid o, d, \beta)$, we distinguish between empty and non-empty sensor observation paths. The following subsections describe how to find the likelihoods for both categories. For simplicity, from now on we will omit the $\beta$ term in our notation, since all further derivations do not explicitly depend on $\beta$.

6.1. Likelihood of empty observation path

Given an origin link $o$ and a destination link $d$, $P(\emptyset \mid o, d)$ is the probability that an individual that travels from $o$ to $d$ is not observed by a single sensor. In order to calculate the likelihood of an empty sensor observation path, we use the expected number of unobserved arrivals at the destination link, $q_0(d \mid o, d)$ (see Section 5). It should be noticed that a person can only reach the destination once, since $p_{d,j,d}$ is defined to be 0 for each destination $d$ and link $j$. As a result, $q_0(d \mid o, d)$ equals the probability to arrive at the destination link unobserved, which allows us to write:

$$P(\emptyset \mid o, d) = \begin{cases} 1, & \text{if } o = d \\ (1 - \theta_d) \cdot q_0(d \mid o, d), & \text{otherwise.} \end{cases} \tag{10}$$

Eq. (10) simply states that the likelihood of an empty sensor observation path equals the probability to arrive unobserved at the destination link, multiplied with the non-detection rate of the destination link. In case the origin and destination are the same, the likelihood obviously equals 1.

6.2. Likelihood of non-empty observation path

Given an origin link $o$ and a destination link $d$, $P((s^*_1, s^*_2, \ldots, s^*_n) \mid o, d)$ is the probability that an individual that travels from $o$ to $d$ is observed by the sensors $(s^*_1, s^*_2, \ldots, s^*_n)$, in the given order. Since the first sensor observation $s^*_1$ is associated with exactly one link in the link set $L_S(s^*_1)$, we may express this probability as a simple sum:

$$P((s^*_1, s^*_2, \ldots, s^*_n) \mid o, d) = \sum_{l \in L_S(s^*_1)} P((l, s^*_2, \ldots, s^*_n) \mid o, d), \tag{11}$$

where $P((l, s^*_2, \ldots, s^*_n) \mid o, d)$ denotes the probability to be first observed at link $l$, followed by the sensors $s^*_2$ to $s^*_n$. Since actual choices are assumed to be independent of historical choices, this probability can be decomposed as follows:

$$P((l, s^*_2, \ldots, s^*_n) \mid o, d) = q_0(l \mid o, d) \cdot \theta_l \cdot P((s^*_2, \ldots, s^*_n) \mid l, d). \tag{12}$$

The term $q_0(l \mid o, d) \cdot \theta_l$, the expected number of unobserved link $l$ arrivals times the detection rate, equals the probability that an individual's first observation happens at link $l$. The term $P((s^*_2, \ldots, s^*_n) \mid l, d)$ equals the probability to reproduce the remaining sensor observation path, starting at link $l$. Substituting Eq. (12) into Eq. (11) gives us the following recursive scheme:

$$P((s^*_1, s^*_2, \ldots, s^*_n) \mid o, d) = \sum_{l \in L_S(s^*_1)} q_0(l \mid o, d) \cdot \theta_l \cdot P((s^*_2, \ldots, s^*_n) \mid l, d). \tag{13}$$

With this equation, the likelihoods of the observation paths can be calculated recursively using a standard dynamic programming top-down approach, in which earlier results of (sub-)problems are stored and re-used, which is called memoization (Cormen et al., 2009). At first sight, it could appear that the computational complexity of the calculation can blow up very easily in case of long observation paths and many observed links per sensor. However, we notice that calculation of $P((s^*_2, \ldots, s^*_n) \mid l, d)$ involves calculation of $P((s^*_3, \ldots, s^*_n) \mid l_2, d)$ for all $l_2 \in L_S(s^*_2)$, which does not depend on the link $l$. Because the sub-probabilities to be calculated are independent of the followed path in the recursion tree, the number of function evaluations does not grow exponentially. See Section 12 for a comprehensive examination of the computational complexity.
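The recursion with memoization can be sketched as follows. This is an illustrative Python implementation under stated assumptions rather than the authors' code: the computation of $q_0$ uses an assumed linear-system form (flow leaving link $i$ stays unobserved with probability $1 - \theta_i$, and the first transition out of the start link is not attenuated), and the function name, argument layout and toy network are hypothetical. Observation paths are passed as tuples so that they can serve as cache keys.

```python
from functools import lru_cache

import numpy as np


def make_path_likelihood(P, theta, observed_links, d):
    """Return a memoized likelihood(path, o) for destination link d.

    P              -- (n, n) next-link probability matrix P_d
    theta          -- (n,) detection probability per link
    observed_links -- dict: sensor id -> iterable of link indices, i.e. L_S(s)
    d              -- index of the destination link
    """
    theta = np.asarray(theta, dtype=float)
    n = P.shape[0]
    # Assumed form of the modified flow propagation used to obtain q0.
    propagate = np.linalg.inv(np.eye(n) - (P * (1.0 - theta)[:, None]).T)

    @lru_cache(maxsize=None)
    def q0(o):
        # Unobserved arrivals at every link for a trip (segment) starting at o.
        return tuple(propagate @ P[o, :])

    @lru_cache(maxsize=None)
    def likelihood(path, o):
        if not path:                                   # empty path: reach d unobserved
            return 1.0 if o == d else (1.0 - theta[d]) * q0(o)[d]
        first, rest = path[0], path[1:]
        # Sum over the links of the first sensor, then recurse from that link.
        return sum(q0(o)[l] * theta[l] * likelihood(rest, l)
                   for l in observed_links[first])

    return likelihood


# Hypothetical network: link 0 (origin) -> link 1 or link 2 -> link 3 (destination),
# with sensor "A" on link 1 and sensor "B" on link 2.
P = np.zeros((4, 4))
P[0, 1], P[0, 2] = 0.7, 0.3
P[1, 3] = P[2, 3] = 1.0
likelihood = make_path_likelihood(P, [0.0, 0.7, 0.5, 0.0], {"A": [1], "B": [2]}, d=3)
for path in [(), ("A",), ("B",), ("A", "B")]:
    print(path, likelihood(path, 0))
```

Because each cached sub-problem is identified by the remaining path and the link of the last observation, the number of distinct evaluations grows with the path length times the number of observed links per sensor rather than exponentially. In this toy network the likelihoods of all feasible observation paths add up to one, which is a convenient sanity check.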

In Section 7, we use the likelihood calculation to estimate the parameters of an RL model with an artificial data set. In Section 10, the method is applied on a real data set that was collected during the TT Assen Festival.

7. Simulated use-case: Estimating an RL model

Toanalyzetheapplicabilityofthelikelihoodcalculation(seeSection6),wewillnowuseittoestimateaRecursiveLogit (RL)model.Thissectiondescribesthemethodologytoestimatetheparametersbasedonasimulateddatasetofagentsthat movethrough partially observednetworks. Toevaluate theperformance of themethodology, we firstgeneratea number ofagent paths through thenetwork, according to an RL model withpredefined parameter values

β

0,which serves as a

groundtruth. These networkpaths are then reducedto sensor observationpaths by checkingwhich linksin a path are coveredby which sensors andtakingthe detectionrates intoaccount. Then, we lookfor thoseRL parameter values

β

est that maximize the log-likelihoodof thesesensor observationpaths. Finally, we compare the estimatedparameter values

β

est withtheoriginalvalues

β

0,aswellastheresultingnetworkuseofagents.
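The reduction of a travelled path to a sensor observation path can be sketched as follows. This is a minimal Python illustration under simplifying assumptions: every link is covered by at most one sensor, and the data structures `sensor_of_link` and `theta` are hypothetical names chosen for this example.

```python
import random


def to_observation_path(link_path, sensor_of_link, theta, rng):
    """Reduce a travelled link path to a sensor observation path.

    sensor_of_link maps a link to the sensor that covers it (links without
    a sensor are simply absent); theta maps a sensor to its detection rate.
    Both structures are hypothetical, chosen for this illustration.
    """
    observations = []
    for link in link_path:
        sensor = sensor_of_link.get(link)
        # A covered link yields an observation with probability theta[sensor].
        if sensor is not None and rng.random() < theta[sensor]:
            observations.append(sensor)
    return tuple(observations)
```

Passing a seeded `random.Random` instance as `rng` makes a simulation run reproducible.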

7.1. Network and behavior

A network has been defined that consists of 224 bi-directional links, of which four are defined as origin/destination links. See Fig. 4 for a schematic visualization of the network. The origin/destination links are diagonally connected to the corners of the network.

Since we want our model to estimate preferences regarding different link characteristics, we defined two different road types, which are indicated in the figure by their line thickness. Three different sensor configurations have been tested, shown in Fig. 4(a), (b) and (c). Sensors are placed at nodes (single-node construction, Fig. 2(b)) and are visualized as large green dots. Each sensor has a detection rate $\theta$.

We define the following instantaneous utilities to move from link $i$ to link $j$:

$$v(j \mid i) = -\big(1 + \beta_{R1} \cdot \mathbf{1}_{R1}(j) + \beta_{R2} \cdot \mathbf{1}_{R2}(j)\big) \cdot \ell(j) - c_{penalty} \cdot \mathbf{1}_U(i,j). \tag{14}$$

In this expression, $\mathbf{1}_{R1}(j)$ is an indicator function which evaluates to 1 in case link $j$ belongs to the set of links with road type 1 (thin line) and 0 otherwise. The function $\mathbf{1}_{R2}(j)$ evaluates to 1 in case link $j$ belongs to the set of links with road type 2 (thick line), 0 otherwise. The function $\ell(j)$ denotes the length of link $j$. $\beta_{R1}$ and $\beta_{R2}$ are the road type-specific utility parameters. If both parameters were 0, the utility would be determined by route length only. Finally, the function $\mathbf{1}_U(i,j)$ evaluates to 1 in case the transition from link $i$ to link $j$ is a u-turn and 0 otherwise. Multiplied with a fixed constant $c_{penalty}$, which will not be varied during the optimization, and subtracted from the distance term, this effectively prevents the agent from making u-turns.

Fig. 4. The network used for the simulated use-case. The thickness of the line represents the road type (thin line = type 1, thick line = type 2). Three different sensor configurations (shown in sub figures a, b and c) with an increasing number of sensors, which are indicated by the large green dots, have been tested. The four diagonal links connected to the corners are origin/destination links and are modelled to have zero length. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

For each OD-pair, a total of $n$ paths have been generated according to the RL model with utilities as in (14), a scale parameter $\mu$ (fixed to 1) and road type-specific utility parameters $\beta_0 = (\beta_{R1,0}, \beta_{R2,0})^T$. From these travelled paths, sets of OD-specific sensor observation paths have been generated, $S_{o,d}$.

7.2. Log-likelihood optimization

The joint log-likelihood of the generated sensor observation paths has been maximized by varying the 'to-be-estimated' parameters $\beta = (\beta_{R1}, \beta_{R2})^T$:

$$\beta_{est} = \arg\max_{\beta} \ln\big(\mathcal{L}(\beta)\big) \tag{15}$$

$$\phantom{\beta_{est}} = \arg\max_{\beta} \sum_{o \in L} \sum_{d \in L} \sum_{s^* \in S_{o,d}} \ln\big(P(s^* \mid o,d,\beta)\big) \tag{16}$$

The optimization was performed using MATLAB's non-linear constrained optimizer (function fmincon). Gradients were approximated with finite differences. A lower bound of -0.99 was set as a constraint for both parameter values, because as soon as one of the parameter values drops below -1, link utilities may become positive, causing a potential preference for infinite trip lengths.
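To make the finite-difference step concrete, the sketch below approximates the gradient of a scalar objective, such as the joint log-likelihood, in Python. Central differences are used here as one common choice; which variant the optimizer applied in this study is not stated, so the scheme and the step size `h` are assumptions of this illustration.

```python
def finite_difference_gradient(f, x, h=1e-6):
    """Approximate the gradient of f at x with central differences.

    f takes a list of parameter values and returns a float; the step
    size h is an assumed default, not a value taken from the study.
    """
    grad = []
    for k in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[k] += h
        backward[k] -= h
        grad.append((f(forward) - f(backward)) / (2.0 * h))
    return grad
```

Each gradient evaluation costs two objective evaluations per parameter, which remains cheap for the two-parameter problem considered here.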

7.3. Evaluation

The estimated model was evaluated using a series of metrics.

• $t_{stat,R1}$ and $t_{stat,R2}$. These variables denote the t-statistics for the parameters $\beta_{R1,est}$ and $\beta_{R2,est}$ respectively. Their values are calculated by dividing the parameter values $\beta_{R1,est}$ and $\beta_{R2,est}$ by their standard errors, which we estimated by the Cramér–Rao lower bound. The Hessian of the log-likelihood, which is involved in this calculation, was approximated by finite differences.

• $\rho^2$. This metric is defined as $\rho^2 = 1 - \ln\big(\mathcal{L}(\beta_{est})\big) / \ln\big(\mathcal{L}(\beta_0)\big)$ and is a measure for the model fit. In this example, $\beta_0 = 0$. It should be noticed that the log-likelihoods in this formula apply to the sensor observation paths, and not directly to route choice behavior itself. Therefore, the value of $\rho^2$ should be interpreted with care and not be directly compared with $\rho^2$ values that are calculated from the complete route perspective.

• RMSE. In order to make a statement about the predictive performance with respect to real network use, we calculate the root mean square error of weighted link flows: $RMSE = \sqrt{\sum_{l \in L} w_l \cdot \big(\tilde{q}_{est,l} - \tilde{q}_{0,l}\big)^2}$. In this formula, $\tilde{q}_{est,l}$ and $\tilde{q}_{0,l}$ are the link flows, in the estimated and simulated case respectively, that result from a demand of 1 for each OD-pair (according to Eq. (6)). The mean is weighted according to link length: $w_l = \ell(l) / \sum_{i \in L} \ell(i)$, where $\ell(l)$ denotes the length of link $l$.

• NRMSE. The normalized root mean square error of weighted link flows: $NRMSE = RMSE / \sum_{l \in L} w_l \cdot \tilde{q}_{0,l}$. Since the NRMSE is normalized by the mean link flow, its value is expected to be less dependent on the network size and number of OD-pairs.
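The fit and error metrics above can be written out as a few small functions. The sketch below is a plain Python illustration; the function names and the list-based representation of link flows and link lengths are assumptions made for this example.

```python
import math


def rho_squared(loglik_est, loglik_ref):
    """Likelihood ratio index: 1 - ln L(beta_est) / ln L(beta_0)."""
    return 1.0 - loglik_est / loglik_ref


def weighted_rmse(q_est, q_ref, lengths):
    """Length-weighted RMSE between estimated and reference link flows."""
    total_length = sum(lengths)
    weights = [length / total_length for length in lengths]
    return math.sqrt(
        sum(w * (qe - qr) ** 2 for w, qe, qr in zip(weights, q_est, q_ref))
    )


def weighted_nrmse(q_est, q_ref, lengths):
    """RMSE divided by the length-weighted mean reference flow."""
    total_length = sum(lengths)
    mean_flow = sum(
        length / total_length * qr for length, qr in zip(lengths, q_ref)
    )
    return weighted_rmse(q_est, q_ref, lengths) / mean_flow
```

Zero-length links, such as the origin/destination links, receive zero weight and therefore do not contribute to either error measure.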

For 7 different sets of parameters, we performed 30 simulations, for each of which we estimated the model parameters $\beta_{R1,est}$ and $\beta_{R2,est}$. For each parameter set, the average and standard deviation of the estimated parameters and evaluation metrics are reported in Table 1.

Table 1 shows that the method rather precisely rediscovers the parameter values that were used to generate the data sets. The t-statistics indicate that the values are also significantly different from 0, unless they were supposed to be 0. Also with respect to the predictive performance, the method performs well, since the average NRMSE over 30 simulations does not exceed 5%.

Table 1
Estimated parameters $\beta_{R1,est}$ and $\beta_{R2,est}$ and evaluation metrics for three different parameter sets. The values report the mean and standard deviation over 30 simulations. Default parameter values: $n = 100$, $\mu = 1$, $c_{penalty} = -100$, $\theta = 0.7$.

| # sensors | βR1,0 | βR2,0 | βR1,est | βR2,est | tstat,R1 | tstat,R2 | ρ2 | RMSE | NRMSE |
|---|---|---|---|---|---|---|---|---|---|
| 4 | 0.00 | 0.50 | −0.00 ± 0.05 | 0.52 ± 0.06 | −0.05 ± 1.1 | 8.6 ± 0.22 | 0.17 ± 0.01 | 1.4 ± 0.82 | 0.04 ± 0.02 |
| 9 | 0.00 | 0.00 | 0.01 ± 0.02 | 0.00 ± 0.02 | 0.37 ± 1.1 | 0.12 ± 0.96 | 0.00 ± 0.00 | 1.2 ± 0.47 | 0.03 ± 0.01 |
| 9 | 0.00 | 0.50 | −0.00 ± 0.03 | 0.50 ± 0.05 | −0.05 ± 1.0 | 11 ± 0.17 | 0.15 ± 0.01 | 1.0 ± 0.58 | 0.03 ± 0.01 |
| 9 | 0.50 | 0.00 | 0.50 ± 0.04 | 0.00 ± 0.04 | 12 ± 0.33 | 0.03 ± 1.00 | 0.26 ± 0.01 | 0.83 ± 0.61 | 0.02 ± 0.02 |
| 9 | −0.10 | 0.30 | −0.10 ± 0.03 | 0.31 ± 0.03 | −3.87 ± 0.97 | 9.8 ± 0.50 | 0.11 ± 0.01 | 1.3 ± 0.79 | 0.03 ± 0.02 |
| 9 | 0.10 | −0.30 | 0.10 ± 0.02 | −0.30 ± 0.02 | 4.2 ± 0.92 | −12.73 ± 1.3 | 0.16 ± 0.01 | 1.3 ± 0.55 | 0.03 ± 0.01 |
| 25 | 0.00 | 0.50 | −0.00 ± 0.02 | 0.50 ± 0.03 | −0.18 ± 0.95 | 16 ± 0.21 | 0.12 ± 0.01 | 0.67 ± 0.39 | 0.02 ± 0.01 |

Fig. 5. The NRMSE for (a) different sensor configurations, (b) different detection rates $\theta$ and (c) different levels of uncertainty of actual detection rates. All figures show a box plot, showing the median (red line) and the 25th and 75th percentiles (blue edges of box). Whiskers of the box plot extend to the most extreme points that are not considered outliers. The outliers are plotted individually using the '+' symbol. Each box plot is based on 30 simulation runs. Default parameter values: $n = 100$, $\beta_{R1,0} = 0$, $\beta_{R2,0} = 0.5$, $\mu = 1$, $c_{penalty} = -100$, #sensors = 9 (Fig. 4(b)) and $\theta = 0.7$. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

7.4. Relation between predictive performance and sensor characteristics

To get a better understanding of the relation between the sensor configuration and the predictive performance of the estimated model, three analyses have been performed.

First, we looked at the effect that the number of sensors has on the NRMSE. The result of testing the three different sensor configurations (Fig. 4(a), (b) and (c)) is shown in Fig. 5(a). With more sensors, the predictive performance of the model increases (NRMSE decreases), which is in line with our expectations. Clearly, the position of sensors is an important factor as well, but a thorough analysis of the effects of sensor positions will be left for future research.

Second, we looked into the effect of the detection rate $\theta$ on the NRMSE. For detection rates between 0.3 and 0.9, we calculated the NRMSE (mean and standard deviation) over 30 simulations. The results are shown in Fig. 5(b). The figure shows a general increase of the predictive performance (decreasing NRMSE) with increasing detection rate. The apparent local minimum around $\theta = 0.5$ was not present in a second run (from which we did not show results in this paper), so we attribute the presence of this minimum to the stochastic nature of our assessment.

Third, we studied the predictive performance in case of uncertainty about the actual sensor detection rates. So far, we assumed detection rates to be deterministic. For this analysis, during sensor observation paths generation, the actual sensor detection rates were randomly drawn from a Gaussian distribution with a mean of 0.7 and a standard deviation $\sigma$. During parameter estimation, we assumed the detection rates' deviations from their means to be unknown, so all rates were considered to be equal to $\theta = 0.7$. The relation between the resulting NRMSE and $\sigma$ gives an idea about the effect of the detection rate uncertainty with respect to predictive performance. Fig. 5(c) shows the results. As expected, the predictive performance decreases with increasing uncertainty, which shows the need for a proper understanding of our sensor detection characteristics. At the same time, we conclude that introduction of the detection uncertainty does not drastically lower the predictive performance.

8. Link utilization estimation from sensor observation paths

Besides calculating the likelihoods of a given sensor observation path, we can estimate the link utilization from sensor observation paths. Without any sensor information, our best stochastic guess would be that the route for an individual moving from origin $o$ to destination $d$ would be described by the link flows as already calculated by Fosgerau (see Eq. (6)). However, knowing at which locations the individual was identified and where he or she was not identified, we can improve these link utilization estimations. To this end, we follow a similar approach as for the sensor observation path likelihood calculations (Section 6).

First of all, we have to define link utilization as being conditional with respect to a measured sensor observation path. Therefore, we introduce $q(i \mid o,d,s^*)$ as being the expected number of times that link $i$ is visited given the sensor observation path $s^*$, having links $o$ and $d$ as origin and destination respectively. Similar to the derivation of the likelihoods of sensor observation paths, we start with the calculation of $q(i \mid o,d,\emptyset)$: the expected number of link arrivals given that an individual has not been observed by a single sensor. We can show that

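$$q(i \mid o,d,\emptyset) = \big(\delta_{i,o} + q_0(i \mid o,d) \cdot (1 - \theta_i)\big) \cdot \frac{P(\emptyset \mid i,d)}{P(\emptyset \mid o,d)}, \tag{17}$$

where $\delta_{i,o}$ is the Kronecker delta, which equals 1 if $i = o$ and 0 if $i \neq o$. The terms $P(\emptyset \mid i,d)$ and $P(\emptyset \mid o,d)$ are the empty sensor observation path likelihoods starting from links $i$ and $o$ respectively (see Section 6.1). The derivation of this formula can be found in Appendix B. To find the expected link utilization in case of a non-empty sensor observation path $s^*$, we first define

$$\hat{q}(i, s^* \mid o,d) = q(i \mid o,d,s^*) \cdot P(s^* \mid o,d). \tag{18}$$

Similar as in Section 6.2, $\hat{q}(i, s^* \mid o,d)$ can be expressed recursively:

$$\hat{q}\big(i, (s^*_1, s^*_2, \ldots, s^*_n) \mid o,d\big) = K_1 + K_2, \tag{19}$$

with

$$K_1 = \big(\delta_{i,o} + q_0(i \mid o,d) \cdot (1 - \theta_i)\big) \cdot P\big((s^*_1, s^*_2, \ldots, s^*_n) \mid i,d\big) \tag{20}$$

$$K_2 = \sum_{l \in L_S(s^*_1)} \Big(q_0(l \mid o,d) \cdot \theta_l \cdot \hat{q}\big(i, (s^*_2, \ldots, s^*_n) \mid l,d\big)\Big). \tag{21}$$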

Instead of a complete derivation, we will explain the intuition behind the recursive scheme. Let us assume that the length of our sensor observation path $s^*$ equals $\zeta$. In this case, link $i$ can be visited during $\zeta + 1$ different periods: before the first observation, between the first and second observation, between the second and third observation, and so on, till the period after the last observation. The total expected number of visits of link $i$ will be the sum of the expected number of visits of link $i$ during these $\zeta + 1$ periods. In this light, the term $K_1$ counts the expected number of visits before the first observation from the remaining sensor observation path (see the analogy with (17)). The term $K_2$ recursively adds the expected number of visits of link $i$ that occur after the first observation of the remaining sensor observation path (see the analogy with (13)). Finally, $q(i \mid o,d,s^*)$ can be easily computed from $\hat{q}(i, s^* \mid o,d)$ using Eq. (18).

Cumulative link flows can be estimated by summing the link utilization for each individual sensor observation path. These link flows do not represent absolute values but have to be interpreted in a relative way, since not every trip is necessarily being recorded. This relative interpretation can already provide valuable insights into, for instance, the relative popularity of different routes connecting the same origin and destination. To estimate absolute cumulative link flows, the method has to correct for the amount of non-recorded trips. It depends on the application and the availability of other data sources (such as counting sensors for specific cross-sections) whether a simple correction can be applied. One example could be a correction factor that is the inverse of the fraction of festival visitors that downloaded the festival app.
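The recursive scheme of Eqs. (18) to (21) can be sketched in Python as follows. This is an illustration under assumptions and not the code used in this study: `q0`, `theta`, `observed_links` and `p_empty` are hypothetical, precomputed inputs that stand for $q_0(l \mid o,d)$, $\theta_l$, the observed link sets and the empty-path likelihood. For an empty path the sum in $K_2$ is empty, so the recursion ends with $K_1$ alone, which agrees with Eq. (17).

```python
from functools import lru_cache


def make_link_utilization(q0, theta, observed_links, p_empty):
    """Return q(i, path, o, d), the expected visits of link i given the path.

    Inputs are assumed to be precomputed (hypothetical names):
      q0(l, o, d)        expected number of unobserved arrivals at link l
      theta              dict of detection rates; uncovered links are absent
      observed_links[s]  tuple of links covered by sensor s
      p_empty(o, d)      likelihood of the empty observation path
    """

    @lru_cache(maxsize=None)
    def likelihood(path, o, d):  # Eq. (13)
        if not path:
            return p_empty(o, d)
        return sum(
            q0(l, o, d) * theta[l] * likelihood(path[1:], l, d)
            for l in observed_links[path[0]]
        )

    @lru_cache(maxsize=None)
    def q_hat(i, path, o, d):  # Eqs. (19)-(21)
        delta = 1.0 if i == o else 0.0
        unobserved = delta + q0(i, o, d) * (1.0 - theta.get(i, 0.0))
        k1 = unobserved * likelihood(path, i, d)  # visits before first observation
        if not path:
            return k1  # empty path: K2 is an empty sum
        k2 = sum(
            q0(l, o, d) * theta[l] * q_hat(i, path[1:], l, d)
            for l in observed_links[path[0]]
        )
        return k1 + k2

    def utilization(i, path, o, d):  # Eq. (18) solved for q
        return q_hat(i, path, o, d) / likelihood(path, o, d)

    return utilization
```

Summing `utilization(i, path, o, d)` over all recorded observation paths gives the relative cumulative flow on link `i`.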

At this point it is worth mentioning that another technique exists that aims to reconstruct a route from sensor observations. The technique is based on Hidden Markov Models (HMM) and uses the Viterbi algorithm to find the most likely path to reproduce a sequence of sensor observations (Musa and Eriksson, 2012). One of the major differences is that the HMM-Viterbi method considers the discretized time of the sensor observations as well and herewith indirectly assumes speed distributions of individuals. For pedestrians in an urban or event context, the effect of such implicit speed assumptions on the accuracy of the outcomes is still unclear. Besides this, the HMM-Viterbi method produces a single route as being the most likely one. Our proposed method is a probabilistic one, assigning a utilization value to each link in the network, which we believe makes the method more suitable for aggregation purposes, especially in cases with large gaps. An advantage of the HMM-Viterbi method is the ability to deal with multiple concurrent sensor observations.

9. Simulated use-case: Link utilization estimation for a single individual

To test the link utilization estimation method, as described in Section 8, we imagine an individual that moves from an origin to a destination in an artificial network, as indicated in Fig. 6. The individual's route choice behaviour is modeled by the next-link probability matrix Pd, which is constructed assuming a utility function that is based on link distances and a penalty for u-turns (see Section 3.2). We defined three imaginary sensor observation paths that can result from the trip. The big red circles in Fig. 6 indicate per scenario the sensor locations where the individual has been observed. The green circles represent sensor locations where the individual has not been observed. The observed link sets were defined according to the single-node construction (see Fig. 2(b)). The utilization per link has been estimated using Eq. (18) and the recursive formula (19). The results are shown in Fig. 6, where greener and wider lines indicate higher probabilities that an individual with the given sensor observation path passes this link.

Fig. 6 shows that the calculated flows 'avoid' the sensor locations where the individual has not been observed. This clearly demonstrates the benefit of this method over route reconstruction techniques where only the locations are taken into account where the individual has actually been observed.

To verify the link utilization calculation, we simulated a total of $n$ trajectories from the left-bottom origin to the top-right destination, which we randomly transformed into sensor observation paths, using the detection rates of the sensors (fixed at 0.7). From all randomly generated sensor observation paths, we selected only those that matched the scenario of Fig. 6(b) ((bottom-left, center-center)). The average link utilization over this set of paths gives us an approximation of a person's expected link utilization, given that the person was observed by the (bottom-left) and the (center-center) sensor.

Next, we wanted to know whether this (simulated) true link utilization could be correctly estimated by our method. For this purpose, we calculated the RMSE between the simulated and theoretically derived link utilization for different values of $n$ (the unfiltered number of simulated trajectories). The results are shown in Fig. 7 (notice the logarithmic scales). The figure reveals the typical "inverse square root" relation between sample size and sample error of the mean, which supports the belief that our method is able to correctly derive the expected link utilization.

Fig. 6. For three different sensor observation paths, the link utilization has been plotted on the network. The greener and thicker the line, the more likely it is that an individual passes that link, given the bottom-left origin, top-right destination and the sensor observation path as indicated by the big red circles. The green circles represent sensors where the individual has not been observed. It can be seen that the flows towards green circles are relatively small. Default parameter values: $\mu = 1$, $c_{penalty} = -100$ and $\theta = 0.7$. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 7. The RMSE between the simulated approximation of link utilization and theoretical link utilization, following the calculations of Section 8. The simulation and theoretical calculation are based on a sensor observation path as in Fig. 6(b). The figure shows a clear decrease of the RMSE for an increasing initial sample size $n$ and herewith supports the validity of our link utilization calculation method.

10. Application at a music festival

We tested our route choice model and link utilization estimation method on a data set that was collected during the TT Assen. This Dutch music festival is organized yearly as a side festivity around the Dutch TT motor racing event. In 2018, the festival lasted from June 27 till June 30 and attracted approximately 160,000 visitors. A total of 11 stages were built in the city centre of Assen, where a diversity of musical performances and motor demonstrations were given. In cooperation with the company Connection Systems B.V., we installed Wi-Fi-sensors at 15 different locations in the city centre. Fig. 8(a) shows these sensor locations. Within a radius of 20 m on average, these sensors identify devices in search for a Wi-Fi-network, based on their MAC-address. If the same MAC-address is detected by multiple sensors in the network, we have some insight into the mobility of the person carrying the specific device. The observed link sets of the sensors again were governed by the single-node construction (see Fig. 2(b)).

The question that we tried to answer for this specific event was to what extent route choice behavior was influenced by the stage locations. It can be hypothesized that people try to avoid the busy locations when they walk through the city.

Fig. 8. a) The network that was used to assess the mobility during TT Assen festival. The green dots represent the sensor locations (placed at nodes in the network). The yellow circles represent the stage locations on Thursday and the thick red lines represent links that were adjacent to a stage area. b) The estimated cumulative link flows. Parameters: $\mu = 50$, $\beta_{LS} = -0.2$, $c_{penalty} = -100$, $\theta = 0.46$. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

10.1. Data cleaning

Wi-Fi-data was collected during the four days of the event. For model estimation, we only used data from the evening of June 28 (starting at 6 PM) till the morning of June 29 (ending at 5 AM), since stage locations, and herewith link characteristics, differed from day to day, which would complicate our data preparation if we took multiple evenings into account. The raw data tell which MAC-addresses have been observed by which sensors at what times. The data is composed of observations of stationary behavior and observations of travelling behavior. For estimation of the model, we needed the travelling observations, together with the trip origins and destinations. To get this trip information, we processed the raw data as follows:

• A person is assumed to be stationary located at a certain sensor-equipped node when he or she is identified by this sensor more than once in a period that lasts for at least 20 min.

• These stationary locations are set as the origin and destination of a trip. The observation points in between are used to define the sensor observation path of a trip.

• Also, a single observation by one of the four sensors that were placed at the city entrance roads (see Fig. 8(a)) defines a trip origin or destination, since people are expected to only pass these nodes while entering or leaving the city center.

• While travelling, people can be observed multiple times by the same sensor. As some Wi-Fi-sensors were placed close to each other, persons were even occasionally identified alternately by two different sensors during a certain part of their trip. To account for this, we discard all observations for which the same person was already identified earlier by the same sensor within a trip.

• While travelling from their origin to their destination, people are assumed to take a route whose distance is not too large compared to the shortest distance possible. To this end, for each sensor observation path, we calculated the shortest cycle-free path, connecting the origin with the destination, that passes through all observation points in the given order. Dividing this path length by the length of the shortest path from origin to destination, which does not necessarily go through the observation points, gives us a lower bound for the so-called detour ratio. To exclude erratic trips, we filtered out all trips with a detour ratio above 2.5. To find the shortest cycle-free path from an origin to a destination that passes through a set of nodes in a given order, a best-first branch and bound algorithm was adopted (A∗-algorithm with branch dependent feasibility constraints). Implementation details are omitted since they fall outside the scope of this paper.

• Since we were only interested in walking behavior, we excluded all observations whose average trip speed was below 0.5 m/s or above 2 m/s. We estimated the average speed using again the distance of the shortest cycle-free path through all observation points.
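Three of the cleaning rules above (repeated sensor observations, detour ratio and walking speed) can be sketched in Python as below. The function and argument names are hypothetical, and the two path lengths are assumed to be computed elsewhere, since the branch and bound search for the shortest cycle-free path is outside the scope of this illustration.

```python
def drop_repeated_sensors(observations):
    """Keep only the first observation of each sensor within a trip."""
    seen = set()
    cleaned = []
    for sensor in observations:
        if sensor not in seen:
            seen.add(sensor)
            cleaned.append(sensor)
    return cleaned


def keep_trip(path_length_m, shortest_length_m, duration_s,
              max_detour=2.5, min_speed=0.5, max_speed=2.0):
    """Apply the detour-ratio and walking-speed filters to one trip.

    path_length_m      shortest cycle-free path through all observation points
    shortest_length_m  shortest origin-destination path
    duration_s         travel time between origin and destination
    """
    detour_ratio = path_length_m / shortest_length_m
    speed = path_length_m / duration_s
    return detour_ratio <= max_detour and min_speed <= speed <= max_speed
```

For example, a trip of 600 m through its observation points, with a 400 m shortest path and a duration of 600 s, has a detour ratio of 1.5 and a speed of 1 m/s, so it is kept.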

Since the model requires the origin and destination of a trip to be a link (instead of a node), dummy links have been connected with all sensor-equipped nodes, serving as origin and destination links. The distances of these dummy links have been set to 0. After cleaning, we ended up with 296 sensor observation paths, of which 197 were unique. For each sensor observation path, the link utilization has been estimated using the formulas in Section 8. Cumulative link flows were derived by summing the estimated link utilization for all sensor observation paths. The cumulative link flows are shown in Fig. 8(b). Notice that these link flows do not represent absolute values, since only a portion of the population has been tracked. Hence, only relative conclusions with respect to the flows can be drawn from the figure.

10.2. Likelihood correction

Only sensor observations strictly between the origin and destination node are defined as being part of a sensor observation path. This is a direct implication of our choice to define the final sensor observation to belong to the stationary phase and not to the travelling phase. Hence, by construction, the link that leads to the destination node is never part of the sensor observation path. This leads to a structural underestimation of the likelihood to reproduce the actual observations, where an observation by the sensor at the final destination might apply to the travelling phase as well. To compensate for this, the likelihood as calculated by Eq. (13) has been corrected by dividing by $(1 - \theta_d)$, where $\theta_d$ equals the detection rate of the sensor located at the destination node.

Table 2
Estimated parameters $\beta_{LS}$, $\beta_{normal}$, $\beta_{stage}$ and $\theta$ and evaluation metrics for the observations between June 28, 6 PM and June 29, 5 AM.

| Parameter | estimate | std. err. | t-stat | p-value |
|---|---|---|---|---|
| βLS | −0.45 | 0.077 | −5.81 | 6.1e-9 |
| βnormal | 0.25 | 0.022 | 11.4 | 0 |
| βstage | −0.30 | 0.056 | −5.32 | 1.1e-7 |
| θ | 0.46 | 0.023 | 20.1 | 0 |

10.3. Analysis

With the collected sensor observation paths, we studied the relation between route choice behavior and stage locations. To this end, we first identified all the links that were part of a stage area. These links are indicated in Fig. 8(a) by thick red lines. The following utility function was defined:

$$v(j \mid i,o,d) = \big(-1 + \beta_{normal} \cdot \mathbf{1}_{normal}(j) + \beta_{stage} \cdot \mathbf{1}_{stage}(j) + \beta_{LS} \cdot (\tilde{q}_{o,d})_l\big) \cdot \ell(j) - c_{penalty} \cdot \mathbf{1}_U(i,j) \tag{22}$$

In this expression, $\mathbf{1}_{stage}(j)$ is an indicator function that evaluates to 1 in case link $j$ is part of a stage area (thick red line) and 0 otherwise. The function $\mathbf{1}_{normal}(j)$ returns 1 for a non-stage link and 0 for a stage link. Further, $\beta_{LS}$ represents the link size attribute value (see Section 3.3). The flow vector $\tilde{q}_{o,d}$, the second component of the path overlap correction term, is calculated according to (6) using the utility function (22) with $\beta_{LS}$, $\beta_{normal}$ and $\beta_{stage}$ set to 0. Finally, the function $\mathbf{1}_U(i,j)$ evaluates to 1 in case the transition from link $i$ to link $j$ is a u-turn and 0 otherwise.

The sensors were installed in such a way that their intersections could be observed completely, which makes us assume that each sensor has (approximately) the same detection rate $\theta$. The magnitude of $\theta$, however, was unknown. Therefore, we decided $\theta$ to be part of the search space in our optimization process. Thus, we maximized the joint log-likelihood by varying $\beta_{normal}$, $\beta_{stage}$, $\beta_{LS}$ and $\theta$. The log-likelihood maximization was performed using MATLAB's function fmincon, in which the detection rate $\theta$ was constrained to the interval $[0,1)$ and $\beta_{LS}$ was constrained to the interval $[-1,1]$. The parameters $\beta_{normal}$ and $\beta_{stage}$ were constrained to be smaller than 1 (since preferences for cycles might occur otherwise). The scale parameter $\mu$ was kept at a constant value of 50. The results of the optimization are shown in Table 2.

When we analyse the estimated parameters, we first of all recognize the negative value of the link size attribute ($\beta_{LS}$), which is in accordance with previous studies (e.g., Fosgerau et al. (2013), Zimmermann et al. (2017)). Regarding the hypothesis, we recognize that the preference for links that are part of a stage area ($\beta_{stage}$) is significantly lower than for links that are not part of a stage area ($\beta_{normal}$). Although other parameters might play a role as well, the result suggests that people actually tried to avoid the crowded areas while consciously walking to their intended destination.

Finally, some words about the goodness of fit. The value of $\rho^2$ was calculated as explained in Section 7.3. For the reference parameter set $\beta_0$, we selected zero values for the link size and stage-link attributes and the value $\theta_{est} = 0.46$ for the detection rate. The $\rho^2$ that was found equals 0.074. A plausible reason for this low value is that prediction of sensor observation paths is fundamentally more difficult than the traditional prediction of routes, since prediction of sensor observation paths involves an additional source of stochasticity: the sensor detection rate. Although this stochastic component decreases $\rho^2$, it has to be kept in mind that we are generally not interested in predicting the actual sensor observation paths, so we do not necessarily consider a low value of $\rho^2$ as a bad thing.

11. The network-free data approach as a path-based alternative

Bierlaire and Frejinger (2008) proposed a path-based method to estimate route choice models with unprocessed, network-free location data. They introduced the concept of a Domain of Data Relevance, which corresponds to a physical region in the network to which a specific observation is relevant. A key element in the method is the so-called measurement equation, which calculates the probability to observe a certain location sequence, given a certain chosen path. The method was designed to be used with location data, like GPS measurements or self-reported trips. The authors successfully applied their network-free data estimation method on a set of self-reported trips in a network consisting of almost 40,000 unidirectional links.

The network-free data estimation approach is similar to our recursive approach in the sense that it estimates a route choice model with incomplete data. It would therefore be interesting to compare both methods. Except for the fact that the network-free data approach involves generation of a choice set, the method can be applied to our static sensor context in a straightforward way, by interpreting sensor observations as location measurements and observed link sets from sensors as the Domains of Data Relevance. We followed the methodology as described in Bierlaire and Frejinger (2008), where the measurement equation results in 1 in case the path crosses all "observed" Domains of Data Relevance in the correct order and 0 otherwise. This provides us with an alternative estimation method for the route choice model.
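The binary measurement equation described above can be illustrated with a short Python sketch. It is one possible reading of "crosses all observed Domains of Data Relevance in the correct order", implemented as an in-order match of the observation sequence against the links of a candidate path; the function name and data layout are assumptions of this example.

```python
def measurement_indicator(path_links, observed_link_sets):
    """Return 1 if the path crosses all observed link sets in order, else 0.

    path_links          sequence of links of a candidate path
    observed_link_sets  one set of links per observation, in observed order
    """
    position = 0  # index of the next observation that still has to be matched
    for link in path_links:
        if (position < len(observed_link_sets)
                and link in observed_link_sets[position]):
            position += 1
    return 1 if position == len(observed_link_sets) else 0
```

Matching each observation at the earliest possible link is sufficient here, because an in-order match exists whenever the greedy match succeeds and vice versa.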

Nevertheless, since this implementation does not use the knowledge of the full sensor network, which includes the locations and detection rates of all sensors, we could expect the method to give biased estimations if applied to such a
