Estimation of a recursive link-based logit model and link flows in a sensor equipped network

van Oijen, Tim P.; Daamen, Winnie; Hoogendoorn, Serge P.

DOI

10.1016/j.trb.2020.08.003

Publication date

2020

Document Version

Final published version

Published in

Transportation Research Part B: Methodological

Citation (APA)

van Oijen, T. P., Daamen, W., & Hoogendoorn, S. P. (2020). Estimation of a recursive link-based logit

model and link flows in a sensor equipped network. Transportation Research Part B: Methodological, 140,

262-281. https://doi.org/10.1016/j.trb.2020.08.003


Tim P. van Oijen∗, Winnie Daamen, Serge P. Hoogendoorn

Department of Civil Engineering and Geosciences, Delft University of Technology, Transport & Planning, PO Box 5048, Delft, 2600 GA, the Netherlands

Article info

Article history: Received 23 September 2019; Revised 19 April 2020; Accepted 26 August 2020

Keywords: Discrete choice modeling; Recursive link-based logit model; Wi-Fi-sensors; Crowd monitoring; Route choice; Link flow estimation

Abstract

This paper describes a method to estimate the parameters of a Recursive link-based Logit model (RL) using measurements of a set of spatially fixed proximity sensors, with limited hit rates, which can uniquely identify people, such as Wi-Fi-, RFID- or Bluetooth-sensors. The observed ‘route’ of an individual, where we focus on pedestrians in an urban or event context, is modelled as the sequence of sensors that have identified the individual during his or her trip. Obviously, these ‘routes’ contain large gaps, which makes traditional estimation techniques not applicable. Although we do not exactly know what happens within these gaps, we do have some specific insight about the individual's behavior between two identifications; we know with a certain probability, which is related to the hit rate of the sensors, that the individual did not cross another sensor location between the two identifications. This paper therefore describes a method to estimate the parameters of an RL model that specifically exploits this knowledge. The framework also allows us to formulate a probabilistic link utilization estimation method, which can be used to estimate link flows in a network based on the sensor observations. The effectiveness of the methodology is demonstrated in simulation using an artificial network, after which the methodology is tested on a real data set, collected at a Dutch music event.

1. Introduction

Discrete choice models have been used for decades to describe all kinds of human mobility, including activity choice, destination choice, mode choice and route choice. For the latter, which is the main focus of this paper, many models have been successfully used, like Probit, Multinomial Logit (MNL), C-Logit (Cascetta et al., 1996), Path-Size Logit (Ben-Akiva and Bierlaire, 1999) and Recursive link-based Logit (RL) (Fosgerau et al., 2013; Mai et al., 2018). In the RL model, which has a graph-based representation, route choice is modeled as sequentially choosing a next link. We will later see that this structure is also very suitable for our sensor-based estimation method.

Regardless of the exact model type, data is required to estimate the parameters of a discrete choice model. A large number of data sources can be adopted, like GPS-traces, Bluetooth-traces, on-line or off-line surveys, Wi-Fi-traces, RFID, and mobile network phone data. Although each type of data comes with its own specific characteristics when it comes to accuracy, availability and processing, we can make a very clear distinction between two types of data sources: location data sources and proximity data sources. Location data, like GPS-traces and self-reported routes, consists of measured or reported locations of people. Proximity data, like Wi-Fi-, Bluetooth- or RFID-traces, consists of sequences of fixed locations (sensors) where a certain person was observed. At first glance, these two data sources look very similar, but on closer inspection there is a fundamental difference between the two types of data. From location data, we do not get any insight into what happened between two consecutive measurements. With proximity data, however, we know that it is impossible (or very unlikely) that a person passed sensor locations elsewhere in the network between two consecutive sensor observations. In other words, with proximity data, contrary to location data, we do have insight into what (most likely) did not happen between two consecutive measurements. Estimation of a route choice model using proximity data therefore asks for its own approach, especially since the gaps between two sensor observations can be quite large, raising the need to exploit the knowledge of what happened in between the observations.

∗ Corresponding author.

E-mail addresses: T.P.vanOijen@tudelft.nl (T.P. van Oijen), W.Daamen@tudelft.nl (W. Daamen), S.P.Hoogendoorn@TUDelft.nl (S.P. Hoogendoorn). https://doi.org/10.1016/j.trb.2020.08.003

There is very little literature about how to estimate the parameters of a discrete route choice model with proximity data. This is in contrast with route choice model estimation using location data, for which plenty of estimation-related methods have been proposed, ranging from map matching and trajectory reconstruction algorithms, in order to augment incomplete data, to more elaborate methods, like the network-free data estimation approach introduced by Bierlaire and Frejinger (2008). In this paper, we therefore focus explicitly on estimation of a discrete route choice model with proximity data. We describe and apply an estimation method that exploits the very specific nature of this type of data with respect to location data. The framework in which we describe the methodology allows us to formulate a probabilistic link utilization estimation method as well, which can be used to derive link flows and route splits from a set of sensor observation sequences. We implemented and tested this method as well.

The remainder of this paper is organized as follows. Section 2 briefly reviews the different estimation techniques of route choice models, making the explicit distinction between using location data sources and proximity data sources. Then, Section 3 describes the framework that encapsulates our estimation method. This section comprises a brief description of the adopted Recursive link-based Logit model and a precise formulation of the sensor network configuration. Section 4 describes how observations of individuals travelling through the network are represented in terms of sensor observations, and in Section 5 we focus on the particular case of trips without any observations. Insights into unobserved travelling are generalized in Section 6, in which a method is derived to calculate the likelihood to reproduce any sequence of sensor observations. This likelihood calculation is the key element of the route choice model estimation method, which is applied in Section 7 on an artificial network and randomly generated observation patterns. Section 8 describes a probabilistic link utilization estimation method, which is based on similar principles as the likelihood calculation method. In Section 9, this method is tested in a simulated use-case. Then, Section 10 describes how we applied the developed methodologies on a Wi-Fi data set that we collected during the TT Assen, a music festival in the Dutch city Assen. Section 11 compares our model estimation method with two different implementations of the network-free data approach (Bierlaire and Frejinger, 2008). We show that with a particular implementation of the measurement equation, the network-free data approach can be interpreted as a path-based version of our recursive method. Section 12 discusses the computational complexity of the model estimation method and Section 13 discusses the applicability of the model in different contexts. The paper ends with conclusions and future steps in Section 14.

2. Review of route choice model estimation approaches

Many types of data have been used to estimate the parameters of discrete route choice models, or traveller behavior models in general. Traditionally, surveys were commonly used to estimate model parameters, but with the rise of new digital technologies, estimation shifted more towards the use of behavioral observations from sensors. In general, data sources to estimate route choice or location choice models can be classified into two groups: data sources reporting locations (location data sources) and data sources reporting proximities to certain (in most cases fixed) nodes (proximity data sources). In this paper, we treat proximity as a binary concept: either the individual is close to the node, or he or she is not. We will now briefly review both data source classes, with a focus on their use in route choice model estimation.

2.1. Location data sources

Data sources that directly report locations of individuals mainly include self-reported behavior (Mahmassani and Peeta, 1993; Abdel-Aty et al., 1995; Ramming, 2001) and GPS measurements (Broach et al., 2012; Menghini et al., 2010; Ton et al., 2018; Galama, 2015). Both types of data contain errors and uncertainties. In the case of collected GPS-traces or reported routes, data might not correctly represent the true routes of people. It is known that the accuracy of GPS-measurements is highly influenced by the environment, and the high battery consumption of GPS-localization limits the maximum measurement frequency. Moreover, the limited ability of humans to reproduce the routes they have taken makes reported route data also unreliable to a certain extent. To deal with the problem of incomplete trajectory data, reconstruction of trajectories seems a logical choice, frequently by assuming a shortest path choice (Ramming, 2001; Lu et al., 2018). Up to certain levels of inaccuracy and gap duration, this might work fine and can give satisfying results. Bierlaire and Frejinger (2008), however, warn that biases are easily introduced when applying these trajectory reconstruction techniques. As an alternative, they propose a method to estimate route choice models with network-free data, reducing the need for trajectory reconstruction and map-matching. Oyama and Hato (2018) propose a network-free estimation method which relies on a link-based route choice model and reduces biases by estimating link-specific standard errors. Both methods use location data to estimate the model parameters.

Fig. 1. Example showing the difference between (a) location data sources and (b) proximity data sources, with respect to route likelihood. In the case of location data it is impossible to distinguish from the two observations whether the individual took the upper or the lower (actual) route. In the case of proximity data, the absence of an observation of the individual near the sensor at the upper node makes the lower route much more likely to be the actual one.

2.2. Proximity data sources

Data sources that report when individuals are close to certain nodes include technologies such as Wi-Fi-sensing, Bluetooth, RFID, or mobile phone network data. Wi-Fi-traces have been used to study activity and destination choice by inferring trip origins and destinations from the traces (Danalet et al., 2014; Danalet, 2015; Yoon et al., 2006). Yoon et al. (2006) studied route choice behavior as well, by generating a distance-based set of path alternatives, and statistically deriving route splits from the data. Mobile phone network data has also been used to study destination choice (Iqbal et al., 2014; Wang et al., 2018) as well as route choice (Leontiadis et al., 2014; Huang et al., 2018). Huang et al. (2018) estimated the perception parameter of a C-logit model with so-called antenna ID paths. van den Heuvel et al. (2015) used Bluetooth scan-units to estimate a route choice model in a train station. The scan-units were placed such that all alternative routes could be unambiguously observed. In urban networks, achieving full observability of all possible routes is generally infeasible. In these contexts, Bluetooth observations from selected locations in the network are often used to approximate densities, flows and travel or waiting times (Versichele et al., 2012; Larsen et al., 2013; Kurkcu and Ozbay, 2017; Lesani and Miranda-Moreno, 2018). However, to the best of our knowledge, a general and dedicated estimation method for a discrete route choice model has not been developed for this type of data source so far.

2.3. Key difference between location and proximity data

As briefly explained in the introduction (Section 1), there is a fundamental difference between location data and proximity data when it comes to route choice model estimation. For location data sources, only actual measurements provide information about the route actually taken. Existing trajectory reconstruction and estimation techniques are generally based on the measured locations of an individual only. In contrast, for proximity data sources, the absence of sensor observations also contributes to the likelihood of routes that do not cross the particular sensor. This key difference is visualized by Fig. 1, in which the absence of an observation of the individual near a sensor at the upper node (Fig. 1(b)) makes the lower route more likely to be the actual one. This particular fact makes existing route estimation methods as proposed by Bierlaire and Frejinger (2008) and Oyama and Hato (2018) not applicable, or at least not optimal, for estimation of route choice models using proximity data. The method proposed in this paper does explicitly take the absence of observations into account, by calculating likelihoods for individuals to exactly reproduce the sequence of sensor observations, thereby avoiding those places where they have not been observed. Before explaining the estimation method in detail, the framework in which the method will be integrated will be outlined in the next section.

3. Modeling framework

A link-based network representation will be used for both our sensor configuration and the route choice model. Before formally describing the sensor configuration in Section 3.1 and the route choice model in Section 3.2, we start with some general network and route definitions.

We introduce a network $G = (L, V)$, with directed links $L$ and nodes $V$. A link $l \in L$ is defined to have a start and end vertex: $l = (v_1, v_2)$, with $v_1$ and $v_2$ both in $V$. A path $r$ through the network is defined as a sequence of links $(r_1, r_2, \ldots)$, with $r_i \in L$ for all $i$.

Given the destination and current link of an individual, we assume that the probabilities of choosing a next link are known. Different methods exist to define these probabilities. In this study, a Recursive link-based Logit (RL) model has been applied, which is formulated in terms of a next-link probability matrix. Section 3.2 briefly explains how this model is used to determine the next-link probabilities. Regardless of the exact model, we define $p_{i,j,d} = P(j \mid i, d)$ as the probability of choosing link $j \in L$ as the next link when located at link $i \in L$ and having link $d \in L$ as destination. We assume that the next-link choice does not depend on the historical path. If the destination link is reached, a person is expected to stop moving, so $p_{d,j,d} = 0$ for all $d$ and $j$ in $L$. Furthermore, a person that arrives at the start node of the destination link is expected to choose the destination link as its next (and final) link, so $p_{i,d,d} = 1$ for all links $i$ that can precede destination $d$.

Later on, it is more convenient to write these probabilities in matrix notation, so we introduce the next-link probability matrix $P_d$ with entries $(P_d)_{i,j} = p_{i,j,d}$. Although we are formally not allowed to use link elements as matrix indices, for the benefit of a clear notation, we implicitly assume an ordering of all links, which we use for indexing.

3.1. Sensor configuration

In order to describe a sensor configuration, we first introduce $S$ as being the set of all sensors. Then, we model the sensor configuration by matching each sensor $s \in S$ with a non-empty set of one or more links to which an observation of that sensor possibly applies. We denote the relation between a sensor and its corresponding set of links by the function $L_S$, such that $L_S(s)$ equals the set of links that are observed by sensor $s$. The set of all links that are observed by a sensor is denoted by $L^* = \bigcup_{s \in S} L_S(s)$.

The estimation methods, described in the next sections, put two important requirements on the construction of the observed link sets. First, each link is allowed to be in the observed link set of at most one sensor. This restriction largely simplifies the estimation methods, but limits the application scope as well (see Section 13). Second, the observed link set of a sensor should be constructed in such a way that each possible non-cyclic path that crosses the detection area of the sensor has exactly one link that is in the observed link set. The second requirement comes from the fact that we actually aim to calculate the likelihood to reproduce the observed sensor crossings, instead of the likelihood to reproduce the exact observed sensor observations. This simplifies the derivation of our methodology and, in addition, it has a positive effect on the computational efficiency.

Modelling the observed area of a sensor as a set of one or more observed links allows for many different configurations. Four typical ways to construct the observed link set are:

• Single-link construction: In this construction, a sensor simply observes one (possibly bi-directional) link. Practically it implies that an individual that is observed by this sensor undoubtedly traverses this link. See Fig. 2(a) for an example.

• Single-node construction: In this construction, a sensor observes one single node (intersection). Since each possible path through the node should have exactly one link that is in the observed set (second requirement), the observed link set is constructed from all incoming links. See Fig. 2(b) for an example.

• Multi-node construction: In this construction, a sensor observes multiple nodes. Since each possible path crossing the detection area should have exactly one link that is in the observed set (second requirement), the observed link set is constructed from all links that enter the detection area. See Fig. 2(c) for an example.

• Dummy-link construction: This construction is similar to the single-node and multi-node construction, with the key difference that dummy-links are inserted that connect incoming and outgoing links. The observed link set then consists of all dummy-links. See Fig. 2(d) for an example.
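To make the two requirements more tangible, the following minimal Python sketch builds the observed link set of a single-node construction (all incoming links of the observed node) and checks the first requirement, namely that no link is claimed by more than one sensor. The toy network, the sensor name `s1` and the helper names `single_node_link_set` and `check_disjoint` are hypothetical and only serve as an illustration.

```python
# Hypothetical toy network: directed links written as (start_node, end_node).
links = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b"), ("d", "b"), ("b", "d")]

def single_node_link_set(links, node):
    """Single-node construction: the observed set holds all links entering the node."""
    return {link for link in links if link[1] == node}

def check_disjoint(observed):
    """First requirement: each link is in the observed link set of at most one sensor.

    Returns a dict mapping every observed link to its sensor.
    """
    owner = {}
    for sensor, link_set in observed.items():
        for link in link_set:
            if link in owner:
                raise ValueError(f"link {link} is observed by {owner[link]} and {sensor}")
            owner[link] = sensor
    return owner

observed = {"s1": single_node_link_set(links, "b")}  # plays the role of L_S(s)
owner = check_disjoint(observed)
L_star = set(owner)                                  # plays the role of L*
```

Because every non-cyclic path through node `b` enters it exactly once, using the incoming links also satisfies the second requirement in this toy case.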

It is known for passive Wi-Fi- or Bluetooth-sensors that the detection rate is far from 100%. Therefore, we assume that each link $l$ is associated with a link-specific detection probability $\theta_l$, with $0 \le \theta_l < 1$. Obviously, $\theta_l = 0$ for all unobserved links ($l \notin L^*$). In practice, the detection rate $\theta_l$ will depend mainly on the utilized sensor and its placement with respect to the local surroundings and infrastructure. Generally, the detection rate is expected to increase with a longer duration of stay and shorter distances from the sensor. This implies that the detection rate could change if a person chooses a different path in the detectable area (e.g., making a turn instead of crossing the street). In case the detection rate by a single-node or multi-node sensor depends heavily on the exact path of the individual crossing the detectable area, a dummy-link construction could be considered. This would allow for a more direct specification of detection rates for different paths through the detectable area.

3.2. The recursive link-based logit model

This section briefly reviews the Recursive link-based Logit model, which is used in this study to define the next-link probabilities of individuals in a network that travel towards a destination. The section does not contain any new ideas or insights, although the notation differs slightly from the notation used by Fosgerau et al. (2013). The Recursive link-based Logit model (RL) was introduced by Fosgerau et al. (2013) as an alternative to existing discrete choice models to describe route choice behavior. The main advantage of the RL model is that it has no restriction on the choice set. Its specifications are comparable to existing traditional discrete choice methods. In the RL model, when currently at link $i$, the action of choosing a next link $j$ has an instantaneous utility $v(j \mid i) + \mu\,\varepsilon(j)$, where the stochastic $\varepsilon(j)$ terms are assumed i.i.d. extreme value type 1 with zero mean and $\mu$ is a fixed scale parameter. A person travelling from link $i$ to destination link $d$ is modelled to maximize its total expected accumulated utility. The expected accumulated utility can be found by solving the Bellman equation:

$$V_{i,d} = E\left[\max_{j \in L(i)} \left( v(j \mid i) + V_{j,d} + \mu\,\varepsilon(j) \right)\right], \tag{1}$$

where $L(i)$ represents the choice set of next links when currently travelling at link $i$. Since the error terms are assumed i.i.d. extreme value type 1 and $\mu$ is invariant, the equation can be rewritten as

$$V_{i,d} = \begin{cases} \mu \ln \sum_{j \in L(i)} e^{\frac{1}{\mu}\left(v(j \mid i) + V_{j,d}\right)}, & i \neq d \\ 0, & i = d \end{cases} \tag{2}$$

(see Fosgerau et al. (2013)).

Fig. 2. Four constructions for the observed link set $L_S(s)$ of sensor $s$. The sensor location and its detection range are indicated by the green solid circle and the dashed outlined circle respectively. The links in $L_S(s)$ are indicated in red. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Taking the exponential of both sides of the equation leaves us with a linear system of equations in terms of the accumulated utility exponentials. Therefore, we define an incidence matrix of instantaneous utilities $M_d$ with entries according to

$$(M_d)_{i,j} = \delta(j \mid i)\, e^{\frac{1}{\mu} v(j \mid i)}, \tag{3}$$

where $\delta(j \mid i)$ is 1 if link $j$ is a neighbour of link $i$ and zero otherwise. $\delta(j \mid d) = 0$ for all $j$, since the person is expected to stop moving when arriving at its destination link. Further, we define the vector $b$, which has $b_d = 1$ and $b_i = 0$ for $i \neq d$, and the vector $z_d$, for which $(z_d)_i = e^{\frac{1}{\mu} V_{i,d}}$.

The accumulated utility exponentials $z_d$ can now be found by solving

$$(I - M_d)\, z_d = b. \tag{4}$$

The next-link probability matrix $P_d$ is finally determined by

$$(P_d)_{i,j} = \frac{(M_d)_{i,j}\,(z_d)_j}{\sum_{l \in L} (M_d)_{i,l}\,(z_d)_l}. \tag{5}$$

Link flows that result from a given demand $g_{o,d}$ between origin $o$ and destination $d$ can now be calculated according to

$$q_{o,d} = \left(I - P_d^T\right)^{-1} g_{o,d}. \tag{6}$$

In this formula, the vector $q_{o,d}$ contains the link flows and $g_{o,d}$ is a vector that is zero except for its $o$th element, which contains the demand from $o$ to $d$.
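The chain from Eq. (3) to Eq. (6) can be sketched in a few lines of NumPy. The code below is a minimal illustration under stated assumptions, not the authors' implementation: the four-link toy network (origin link 0, two parallel alternatives 1 and 2, destination link 3), the utilities (minus the length of the entered link), the scale $\mu = 1$ and the function names are all hypothetical.

```python
import numpy as np

def next_link_probabilities(v, adjacency, d, mu=1.0):
    """Return (P_d, z_d) for destination link index d.

    v[i, j]         : instantaneous utility v(j|i)
    adjacency[i, j] : 1 if link j can directly follow link i, else 0
    """
    n = v.shape[0]
    delta = adjacency.astype(float).copy()
    delta[d, :] = 0.0                        # delta(j|d) = 0: travellers stop at d
    M = delta * np.exp(v / mu)               # Eq. (3)
    b = np.zeros(n)
    b[d] = 1.0
    z = np.linalg.solve(np.eye(n) - M, b)    # Eq. (4)
    denom = M @ z                            # sum_l (M_d)_{i,l} (z_d)_l
    P = np.zeros_like(M)
    rows = denom > 0                         # links that have at least one successor
    P[rows] = M[rows] * z / denom[rows, None]  # Eq. (5)
    return P, z

def link_flows(P, o, demand=1.0):
    """Link flows for a demand entering at origin link o, Eq. (6)."""
    n = P.shape[0]
    g = np.zeros(n)
    g[o] = demand
    return np.linalg.solve(np.eye(n) - P.T, g)

# Hypothetical toy network: 0 -> {1, 2} -> 3, with link lengths 0, 1, 2, 0.
adjacency = np.array([[0, 1, 1, 0],
                      [0, 0, 0, 1],
                      [0, 0, 0, 1],
                      [0, 0, 0, 0]])
lengths = np.array([0.0, 1.0, 2.0, 0.0])
v = -np.tile(lengths, (4, 1))                # v(j|i) = -length of link j (assumption)

P, z = next_link_probabilities(v, adjacency, d=3)
q = link_flows(P, o=0)
```

In this toy case the split between the two alternatives reduces to a binary logit on the length difference, and one unit of demand entering at link 0 yields one unit of flow on the destination link.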

3.3. Link size attribute

The original formulation of the RL model suffers from the IIA (independence of irrelevant alternatives) property. In order to relax this property, Fosgerau et al. (2013) proposed a link size attribute, which is comparable to the path size attribute in path choice models (Ben-Akiva and Bierlaire, 1999). The correction is achieved by adding a term to the instantaneous link-transition utilities, which is proportional to the flow through this link resulting from a unit of demand between an origin and a destination. The corrected instantaneous utility is defined as

$$v^{LS}(j \mid i, o, d) = v(j \mid i) + \beta_{LS} \cdot (\tilde{q}_{o,d})_j \cdot \ell(j), \tag{7}$$

where $\ell(j)$ denotes the length of link $j$, $\beta_{LS}$ denotes the link size parameter and $(\tilde{q}_{o,d})_j$ denotes the flow through link $j$ resulting from a unit of demand between origin $o$ and destination $d$. The factor $\beta_{LS}$ is supposed to be negative. This implies that links with a large flow, which are likely to have a large contribution to route overlap, get a larger utility reduction. The flow vector $\tilde{q}_{o,d}$ is calculated according to Eq. (6) and requires a certain choice of $P_d$, denoted as $P_{corr,d}$. Fosgerau et al. do not prescribe how to choose the matrix $P_{corr,d}$ in order to calculate the link size correction flows. A typical choice would be to derive $P_{corr,d}$ from assuming a utility function that is based on trip distance only (and u-turn penalties). It should be noticed that the link size utility function $v^{LS}(j \mid i, o, d)$ is origin-specific. As a result, the next-link probability matrix $P_d$ and the incidence matrix $M_d$ become origin-specific as well, which increases the computational effort to estimate models. In our study, the link size attribute has been applied when estimating the model for the case-study at the TT music festival (Section 10). In the other parts of the paper, a link size attribute has not been taken into account, mainly to ensure readability. Although details are omitted, the estimation method and simulations that will be introduced in the following sections trivially allow for inclusion of the link size attribute.

A more profound approach to relax the IIA property is to use the Nested Recursive Logit (NRL) model (Mai et al., 2015; Zimmermann et al., 2017), which explicitly accounts for correlated path utilities. Although the NRL model has proven to be superior to the RL model in terms of its accuracy, its higher complexity has made us decide to take the RL model as the starting point for our method development.

4. Sensor observations

Since the full paths of persons generally cannot be observed, we define for each individual a sensor observation path $s^* = (s^*_1, s^*_2, \ldots)$ as the sequence of sensors at which a person has been observed during a trip. The set of all observation paths for trips with origin link $o$ and destination link $d$ is denoted by $S^*_{o,d}$.

It should be noticed that timestamps of the observations are deliberately not taken into account. Inclusion of the time dimension complicates the likelihood calculation as explained in the coming sections and has therefore been left out. In case of estimating a route choice model for pedestrians in an urban context or during an event, which this method is particularly aimed at, it is questionable how to deal with this time aspect, since pedestrians in these environments are not expected to move with predictable speeds. Nonetheless, how to include timestamps into our likelihood estimations in order to improve the predictive power is one of our key questions to be answered by future research.

5. Unobserved travelling

Before we explain how to calculate sensor observation path likelihoods and link flows, we will briefly discuss the concept of unobserved travelling. A quantity that appears to be crucial in our later computations is the probability of passing a certain link $i$, when travelling from a certain origin to a certain destination, and being unobserved so far. In other words, the individual has not been observed yet by a sensor before reaching link $i$. To make this formal, we introduce $q^0(i \mid o, d)$ as being the expected number of times that an individual arrives unobserved at link $i$, when travelling from $o$ to $d$:

$$q^0(i \mid o, d) = E\left(k^0(i) \mid o, d\right), \tag{8}$$

where $k^0(i)$ equals the number of times that an individual arrives unobserved at link $i$. This quantity depends on the number of routes between the origin and destination that pass through link $i$ and on the detection rates of the sensors that an individual passes before reaching link $i$. Fig. 3 shows two examples of how the network and sensor configuration influence $q^0(i \mid o, d)$. It should be noticed that the expected number of unobserved link arrivals is almost identical to the probability of arriving unobserved at the specific link at least once. The small, but non-zero, probability of cycles occurring makes the expected number of unobserved link arrivals slightly larger than the probability of arriving unobserved at the specific link at least once.

Fig. 3. The expected number of unobserved link arrivals, $q^0(i \mid o, d)$, for two different sensor configurations (single-node construction) with a detection rate $\theta = 0.7$. The wider and greener the link, the higher the expected number of unobserved link arrivals. The leftmost link is defined as the origin and the rightmost link as the destination. The next-link probability matrix $P_d$ is based on link distances only plus a u-turn penalty. It can be seen that $q^0(i \mid o, d)$ is practically zero for the origin link, since only link arrivals are counted.

The values of $q^0(i \mid o, d)$ are found by calculating ‘link flows’ that result from sending one unit of flow into the network at the origin link $o$, with a modified next-link probability matrix, which takes the link-specific detection probabilities into account. Each flow that passes a sensor-equipped link will be lowered according to its detection rate $\theta$. Details are given in Appendix A.
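Appendix A is not reproduced in this section. Purely as an illustration, the sketch below implements one reading of the description above, which should be treated as an assumption rather than the authors' exact formulation: flow leaving link $i$ is thinned by $(1 - \theta_i)$, and the traveller is not subject to detection on the origin link at departure (only arrivals are counted), which gives $q^0_j = p_{o,j,d} + \sum_i q^0_i (1 - \theta_i)\, p_{i,j,d}$. The toy chain of three links and the function name `unobserved_arrivals` are hypothetical.

```python
import numpy as np

def unobserved_arrivals(P, theta, o):
    """Expected number of unobserved arrivals q0(i|o,d) for every link i.

    P     : next-link probability matrix P_d (rows: current link, columns: next link)
    theta : detection probability per link (0 for unobserved links)
    o     : index of the origin link
    Assumed recursion: q0_j = P[o, j] + sum_i q0_i * (1 - theta_i) * P[i, j].
    """
    n = P.shape[0]
    A = (1.0 - theta)[:, None] * P   # stay undetected at i, then move from i to j
    return np.linalg.solve(np.eye(n) - A.T, P[o, :])

# Hypothetical chain of three links 0 -> 1 -> 2 with a sensor on link 1.
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])
theta = np.array([0.0, 0.7, 0.0])
q0 = unobserved_arrivals(P, theta, o=0)
```

In this chain the traveller always arrives unobserved at link 1, and reaches link 2 unobserved only when the sensor on link 1 misses, which happens with probability $1 - 0.7$.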

6. Likelihood of sensor observations

This section explains how to calculate the joint likelihood $L(\beta)$ of reproducing the set of sensor observation paths $S^*_{o,d}$ for all o-d pairs, given a parameter set $\beta$, which influences the element values of the next-link probability matrix or detection rates. This likelihood can be maximized in order to estimate utility parameters (Ben-Akiva and Lerman, 1985). The joint likelihood is calculated by multiplying all probabilities to observe the individual paths:

$$L(\beta) = \prod_{o \in L} \prod_{d \in L} \prod_{s^* \in S^*_{o,d}} P(s^* \mid o, d, \beta), \tag{9}$$

where $P(s^* \mid o, d, \beta)$ is the likelihood to reproduce the sensor observation path $s^*$, given origin $o$, destination $d$ and parameter set $\beta$. To calculate $P(s^* \mid o, d, \beta)$, we distinguish between empty and non-empty sensor observation paths. The following subsections describe how to find the likelihoods for both categories. For simplicity, from now on we will omit the $\beta$ term in our notation, since all further derivations do not explicitly depend on $\beta$.
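Since a product of many probabilities quickly becomes numerically very small, the maximization is usually carried out on the logarithm of Eq. (9), which has the same maximizer. Written out, with $\hat{\beta}$ introduced here only as shorthand for the maximizing parameter set:

```latex
\ln L(\beta) = \sum_{o \in L} \sum_{d \in L} \sum_{s^* \in S^*_{o,d}} \ln P(s^* \mid o, d, \beta),
\qquad
\hat{\beta} = \arg\max_{\beta} \, \ln L(\beta).
```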

6.1. Likelihood of empty observation path

Given an origin link $o$ and a destination link $d$, $P(\emptyset \mid o, d)$ is the probability that an individual that travels from $o$ to $d$ is not observed by a single sensor. In order to calculate the likelihood of an empty sensor observation path, we use the expected number of unobserved arrivals at the destination link, $q^0(d \mid o, d)$ (see Section 5). It should be noticed that a person can only reach the destination once, since $p_{d,j,d}$ is defined to be 0 for each destination $d$ and link $j$. As a result, $q^0(d \mid o, d)$ equals the probability to arrive at the destination link unobserved, which allows us to write:

$$P(\emptyset \mid o, d) = \begin{cases} 1, & \text{if } o = d \\ (1 - \theta_d) \cdot q^0(d \mid o, d), & \text{otherwise.} \end{cases} \tag{10}$$

Eq. (10) simply states that the likelihood of an empty sensor observation path equals the probability to arrive unobserved at the destination link, multiplied with the non-detection rate of the destination link. In case the origin and destination are the same, the likelihood obviously equals 1.
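As a small worked illustration of Eq. (10), consider a hypothetical network (not taken from the paper) that consists of a single route of three consecutive links $o \to a \to d$, with a sensor on link $a$ only, so that $\theta_a = 0.7$ and $\theta_d = 0$. Every trip arrives exactly once at $a$, and the unit of flow is lowered by the detection rate when passing $a$:

```latex
q^0(a \mid o, d) = 1, \qquad
q^0(d \mid o, d) = (1 - \theta_a) \cdot q^0(a \mid o, d) = 0.3, \qquad
P(\emptyset \mid o, d) = (1 - \theta_d) \cdot q^0(d \mid o, d) = 0.3 .
```

The remaining probability mass of 0.7 belongs to the only other possible observation path in this toy network, the one in which the sensor on $a$ detects the traveller.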

6.2. Likelihood of non-empty observation path

Given an origin link $o$ and a destination link $d$, $P((s^*_1, s^*_2, \ldots, s^*_n) \mid o, d)$ is the probability that an individual that travels from $o$ to $d$ is observed by the sensors $(s^*_1, s^*_2, \ldots, s^*_n)$, in the given order. Since the first sensor observation $s^*_1$ is associated with exactly one link in the link set $L_S(s^*_1)$, we may express this probability as a simple sum:

$$P((s^*_1, s^*_2, \ldots, s^*_n) \mid o, d) = \sum_{l \in L_S(s^*_1)} P((l, s^*_2, \ldots, s^*_n) \mid o, d), \tag{11}$$

where $P((l, s^*_2, \ldots, s^*_n) \mid o, d)$ denotes the probability to be first observed at link $l$, followed by the sensors $s^*_2$ to $s^*_n$. Since actual choices are assumed to be independent of historical choices, this probability can be decomposed as follows:

$$P((l, s^*_2, \ldots, s^*_n) \mid o, d) = q^0(l \mid o, d) \cdot \theta_l \cdot P((s^*_2, \ldots, s^*_n) \mid l, d). \tag{12}$$

The term $q^0(l \mid o, d) \cdot \theta_l$, the expected number of unobserved link $l$ arrivals times the detection rate, equals the probability that an individual's first observation happens at link $l$. The term $P((s^*_2, \ldots, s^*_n) \mid l, d)$ equals the probability to reproduce the remaining sensor observation path, starting at link $l$. Substituting Eq. (12) into Eq. (11) gives us the following recursive scheme:

$$P((s^*_1, s^*_2, \ldots, s^*_n) \mid o, d) = \sum_{l \in L_S(s^*_1)} q^0(l \mid o, d) \cdot \theta_l \cdot P((s^*_2, \ldots, s^*_n) \mid l, d). \tag{13}$$

With this equation, the likelihoods of the observation paths can be calculated recursively using a standard dynamic programming top-down approach, in which earlier results of (sub-)problems are stored and re-used, which is called memoization (Cormen et al., 2009). At first sight, it could appear that the computational complexity of the calculation can blow up very easily in case of long observation paths and many observed links per sensor. However, we notice that calculation of $P((s^*_2, \ldots, s^*_n) \mid l, d)$ involves calculation of $P((s^*_3, \ldots, s^*_n) \mid l_2, d)$ for all $l_2 \in L_S(s^*_2)$, which is independent of the link $l$. The fact that the sub-probabilities to be calculated are independent of the followed path in the recursion tree means that the number of function evaluations does not grow exponentially. See Section 12 for a comprehensive examination of the computational complexity.
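The recursion of Eqs. (10) and (13) with memoization can be sketched as follows. This is a minimal illustration for one fixed destination, not the authors' implementation. It assumes that $q^0$ follows the recursion $q^0_j = p_{o,j,d} + \sum_i q^0_i (1 - \theta_i)\, p_{i,j,d}$ (one reading of Section 5, with no detection on the start link at departure); the toy network, the sensor name `s1` and all function names are hypothetical.

```python
from functools import lru_cache
import numpy as np

# Hypothetical toy network: origin link 0, alternatives 1 and 2, destination link 3.
P = np.array([[0.0, 0.6, 0.4, 0.0],     # next-link probabilities P_d for d = 3
              [0.0, 0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0, 0.0]])
theta = np.array([0.0, 0.7, 0.0, 0.0])  # detection probability per link
LS = {"s1": (1,)}                       # observed link set L_S(s) per sensor
d = 3                                   # destination link (fixed in this sketch)

@lru_cache(maxsize=None)
def q0(o):
    """Expected unobserved arrivals q0(.|o,d) under the assumed recursion."""
    n = len(theta)
    A = (1.0 - theta)[:, None] * P
    return tuple(np.linalg.solve(np.eye(n) - A.T, P[o, :]))

@lru_cache(maxsize=None)
def likelihood(obs, o):
    """P(obs | o, d) for a tuple of sensor ids; results are memoized per (obs, o)."""
    if not obs:                          # empty observation path, Eq. (10)
        return 1.0 if o == d else (1.0 - theta[d]) * q0(o)[d]
    first, rest = obs[0], obs[1:]
    # Eq. (13): sum over the links of the first sensor, then recurse from that link.
    return sum(q0(o)[l] * theta[l] * likelihood(rest, l) for l in LS[first])

p_empty = likelihood((), 0)
p_once = likelihood(("s1",), 0)
p_twice = likelihood(("s1", "s1"), 0)
```

Because the cache key is the pair (remaining observation path, start link), each sub-probability is evaluated once, which mirrors the argument above that the number of function evaluations does not grow exponentially. In this toy network only two observation paths are possible, so `p_empty` and `p_once` should add up to one.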

In Section 7, we use the likelihood calculation to estimate the parameters of an RL model with an artificial data set. In Section 10, the method is applied on a real data set that was collected during the TT Assen Festival.

7. Simulated use-case: Estimating an RL model

To analyze the applicability of the likelihood calculation (see Section 6), we will now use it to estimate a Recursive Logit (RL) model. This section describes the methodology to estimate the parameters based on a simulated data set of agents that move through partially observed networks. To evaluate the performance of the methodology, we first generate a number of agent paths through the network, according to an RL model with predefined parameter values $\beta_0$, which serve as a ground truth. These network paths are then reduced to sensor observation paths by checking which links in a path are covered by which sensors and taking the detection rates into account. Then, we look for those RL parameter values $\beta_{est}$ that maximize the log-likelihood of these sensor observation paths. Finally, we compare the estimated parameter values $\beta_{est}$ with the original values $\beta_0$, as well as the resulting network use of agents.
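The reduction of a travelled path to a sensor observation path can be sketched as below. This is a minimal illustration under stated assumptions: each covered link belongs to exactly one sensor, every visit to a covered link is detected independently with that sensor's detection rate, and the names `to_observation_path`, `sensor_of_link` and `theta` are hypothetical.

```python
import random

def to_observation_path(link_path, sensor_of_link, theta, rng):
    """Reduce a travelled link path to a sensor observation path.

    sensor_of_link maps a covered link to its sensor id (assumed one sensor
    per link); theta maps a sensor id to its detection rate.
    """
    observed = []
    for link in link_path:
        sensor = sensor_of_link.get(link)
        if sensor is None:
            continue  # link is not covered by any sensor
        # Independent Bernoulli detection with probability theta[sensor].
        if rng.random() < theta[sensor]:
            observed.append(sensor)
    return tuple(observed)

rng = random.Random(0)
path = ["l1", "l2", "l3", "l4"]
sensor_of_link = {"l2": "A", "l4": "B"}
always = to_observation_path(path, sensor_of_link, {"A": 1.0, "B": 1.0}, rng)
never = to_observation_path(path, sensor_of_link, {"A": 0.0, "B": 0.0}, rng)
```

With a detection rate of 1 every covered link yields an observation, and with a rate of 0 the observation path is empty; intermediate rates such as the 0.7 used in the simulations produce random subsequences.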

7.1. Network and behavior

A network has been defined that consists of 224 bi-directional links, of which four are defined as origin/destination links. See Fig. 4 for a schematic visualization of the network. The origin/destination links are diagonally connected to the corners of the network.

Since we want our model to estimate preferences regarding different link characteristics, we defined two different road types, which are indicated in the figure by their line thickness. Three different sensor configurations have been tested, shown in Fig. 4(a), (b) and (c). Sensors are placed at nodes (single-node construction, Fig. 2(b)) and are visualized as large green dots. Each sensor has a detection rate θ.

We define the following instantaneous utilities to move from link i to link j:

$$v(j \mid i) = -\big(1 + \beta_{R1} \cdot 1_{R1}(j) + \beta_{R2} \cdot 1_{R2}(j)\big) \cdot \ell(j) - c_{penalty} \cdot 1_U(i, j). \tag{14}$$

In this expression, $1_{R1}(j)$ is an indicator function which evaluates to 1 in case link j belongs to the set of links with road type 1 (thin line) and 0 otherwise. The function $1_{R2}(j)$ evaluates to 1 in case link j belongs to the set of links with road type 2 (thick line), 0 otherwise. The function $\ell(j)$ denotes the length of link j. $\beta_{R1}$ and $\beta_{R2}$ are the road type-specific utility parameters. If both parameters were 0, the utility would be determined by route length only. Finally, the function $1_U(i, j)$ evaluates to 1 in case the transition from link i to link j is a u-turn and 0 otherwise. Multiplied with a fixed constant $c_{penalty}$, which will not be varied during the optimization, and subtracted from the distance term, this effectively prevents the agent from making u-turns.

Fig. 4. The network used for the simulated use-case. The thickness of the line represents the road type (thin line = type 1, thick line = type 2). Three different sensor configurations (shown in sub figures a, b and c) with an increasing number of sensors, which are indicated by the large green dots, have been tested. The four diagonal links connected to the corners are origin/destination links and are modelled to have zero length. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

For each OD-pair, a total of n paths have been generated according to the RL model with utilities as in (14), a scale parameter μ (fixed to 1) and road type-specific utility parameters $\beta_0 = (\beta_{R1,0}, \beta_{R2,0})^T$. From these travelled paths, sets of OD-specific sensor observation paths have been generated, $S^*_{o,d}$.

7.2. Log-likelihood optimization

The joint log-likelihood of the generated sensor observation paths has been maximized by varying the 'to-be-estimated' parameters $\beta = (\beta_{R1}, \beta_{R2})^T$:

$$\beta_{est} = \arg\max_{\beta} \ln L(\beta) \tag{15}$$

$$\hphantom{\beta_{est}} = \arg\max_{\beta} \sum_{o \in L} \sum_{d \in L} \sum_{s^* \in S^*_{o,d}} \ln P(s^* \mid o, d, \beta) \tag{16}$$

The optimization was performed using MATLAB's non-linear constrained optimizer (function fmincon). Gradients were approximated with finite differences. A lower bound of −0.99 was set as a constraint for both parameter values, because as soon as one of the parameter values drops below −1, link utilities may become positive, causing a potential preference for infinite trip lengths.

7.3. Evaluation

The estimated model was evaluated using a series of metrics.

• $t_{stat,R1}$ and $t_{stat,R2}$. These variables denote the t-statistics for the parameters $\beta_{R1,est}$ and $\beta_{R2,est}$ respectively. Their values are calculated by dividing the parameter values $\beta_{R1,est}$ and $\beta_{R2,est}$ by their standard errors, which we estimated by the Cramér–Rao lower bound. The Hessian of the log-likelihood, which is involved in this calculation, was approximated by finite differences.

• $\rho^2$. This metric is defined as $\rho^2 = 1 - \ln(L(\beta_{est})) / \ln(L(\beta_0))$ and is a measure for the model fit. In this example, $\beta_0 = 0$. It should be noticed that the log-likelihoods in this formula apply to the sensor observation paths, and not directly to route choice behavior itself. Therefore, the value of $\rho^2$ should be interpreted with care and not be directly compared with $\rho^2$ values that are calculated from the complete route perspective.

• RMSE. In order to make a statement about the predictive performance with respect to real network use, we calculate the root mean square error of weighted link flows: $RMSE = \sqrt{\sum_{l \in L} w_l \cdot (\tilde{q}_{est,l} - \tilde{q}_{0,l})^2}$. In this formula, $\tilde{q}_{est,l}$ and $\tilde{q}_{0,l}$ are the link flows, in the estimated and simulated case respectively, that result from a demand of 1 for each OD-pair (according to Eq. (6)). The mean is weighted according to link length: $w_l = \ell(l) / \sum_{i \in L} \ell(i)$, where $\ell(l)$ denotes the length of link l.

• NRMSE. The normalized root mean square error of weighted link flows, $NRMSE = RMSE / \sum_{l \in L} w_l \cdot \tilde{q}_{0,l}$. Since the NRMSE is normalized by the mean link flow, its value is expected to depend less on the network size and number of OD-pairs.
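The fit and error metrics listed above can be written down compactly. The sketch below is an illustration only; the function names and the dictionary-based inputs (link id mapped to flow or length) are assumptions made here, and the toy values are invented.

```python
import math

def rho_squared(loglik_est, loglik_ref):
    """rho^2 = 1 - ln L(beta_est) / ln L(beta_0)."""
    return 1.0 - loglik_est / loglik_ref

def link_weights(lengths):
    """w_l = length(l) / sum of all link lengths."""
    total = sum(lengths.values())
    return {l: length / total for l, length in lengths.items()}

def rmse(q_est, q_ref, lengths):
    """Length-weighted root mean square error of link flows."""
    w = link_weights(lengths)
    return math.sqrt(sum(w[l] * (q_est[l] - q_ref[l]) ** 2 for l in lengths))

def nrmse(q_est, q_ref, lengths):
    """RMSE divided by the length-weighted mean reference link flow."""
    w = link_weights(lengths)
    mean_flow = sum(w[l] * q_ref[l] for l in lengths)
    return rmse(q_est, q_ref, lengths) / mean_flow

# Toy example with two links (values invented for illustration).
lengths = {"a": 1.0, "b": 3.0}
q_ref = {"a": 2.0, "b": 2.0}
q_est = {"a": 2.0, "b": 4.0}
error = rmse(q_est, q_ref, lengths)
relative_error = nrmse(q_est, q_ref, lengths)
fit = rho_squared(-80.0, -100.0)
```

In the toy example the longer link carries three quarters of the weight, so its flow error dominates both the RMSE and the NRMSE.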

For 7 different sets of parameters, we performed 30 simulations, for each of which we estimated the model parameters $\beta_{R1,est}$ and $\beta_{R2,est}$. For each parameter set, the average and standard deviation of the estimated parameters and evaluation metrics are reported in Table 1.

Table 1 shows that the method rather precisely rediscovers the parameter values that were used to generate the data sets. The t-statistics indicate that the values are also significantly different from 0, unless they were supposed to be 0. The method also performs well with respect to predictive performance: the average NRMSE over 30 simulations does not exceed 5%.

Table 1
Estimated parameters $\beta_{R1,est}$ and $\beta_{R2,est}$ and evaluation metrics for three different parameter sets. The values report the mean and standard deviation over 30 simulations. Default parameter values: n = 100, μ = 1, $c_{penalty}$ = −100, θ = 0.7.

| # sensors | $\beta_{R1,0}$ | $\beta_{R2,0}$ | $\beta_{R1,est}$ | $\beta_{R2,est}$ | $t_{stat,R1}$ | $t_{stat,R2}$ | $\rho^2$ | RMSE | NRMSE |
|---|---|---|---|---|---|---|---|---|---|
| 4 | 0.00 | 0.50 | −0.00 ± 0.05 | 0.52 ± 0.06 | −0.05 ± 1.1 | 8.6 ± 0.22 | 0.17 ± 0.01 | 1.4 ± 0.82 | 0.04 ± 0.02 |
| 9 | 0.00 | 0.00 | 0.01 ± 0.02 | 0.00 ± 0.02 | 0.37 ± 1.1 | 0.12 ± 0.96 | 0.00 ± 0.00 | 1.2 ± 0.47 | 0.03 ± 0.01 |
| 9 | 0.00 | 0.50 | −0.00 ± 0.03 | 0.50 ± 0.05 | −0.05 ± 1.0 | 11 ± 0.17 | 0.15 ± 0.01 | 1.0 ± 0.58 | 0.03 ± 0.01 |
| 9 | 0.50 | 0.00 | 0.50 ± 0.04 | 0.00 ± 0.04 | 12 ± 0.33 | 0.03 ± 1.00 | 0.26 ± 0.01 | 0.83 ± 0.61 | 0.02 ± 0.02 |
| 9 | −0.10 | 0.30 | −0.10 ± 0.03 | 0.31 ± 0.03 | −3.87 ± 0.97 | 9.8 ± 0.50 | 0.11 ± 0.01 | 1.3 ± 0.79 | 0.03 ± 0.02 |
| 9 | 0.10 | −0.30 | 0.10 ± 0.02 | −0.30 ± 0.02 | 4.2 ± 0.92 | −12.73 ± 1.3 | 0.16 ± 0.01 | 1.3 ± 0.55 | 0.03 ± 0.01 |
| 25 | 0.00 | 0.50 | −0.00 ± 0.02 | 0.50 ± 0.03 | −0.18 ± 0.95 | 16 ± 0.21 | 0.12 ± 0.01 | 0.67 ± 0.39 | 0.02 ± 0.01 |

Fig. 5. The NRMSE for (a) different sensor configurations, (b) different detection rates θ and (c) different levels of uncertainty of actual detection rates. All figures show a box plot, showing the median (red line) and the 25th and 75th percentiles (blue edges of box). Whiskers of the box plot extend to the most extreme points that are not considered outliers. The outliers are plotted individually using the '+' symbol. Each box plot is based on 30 simulation runs. Default parameter values: n = 100, $\beta_{R1,0}$ = 0, $\beta_{R2,0}$ = 0.5, μ = 1, $c_{penalty}$ = −100, #sensors = 9 (Fig. 4(b)) and θ = 0.7. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

7.4. Relation between predictive performance and sensor characteristics

To get a better understanding of the relation between the sensor configuration and the predictive performance of the estimated model, three analyses have been performed.

First, we looked at the effect that the number of sensors has on the NRMSE. The result of testing the three different sensor configurations (Fig. 4(a), (b) and (c)) is shown in Fig. 5(a). With more sensors, the predictive performance of the model increases (NRMSE decreases), which is in line with our expectations. Clearly, the position of sensors is an important factor as well, but a thorough analysis of the effects of sensor positions will be left for future research.

Second, we looked into the effect of the detection rate θ on the NRMSE. For detection rates between 0.3 and 0.9, we calculated the NRMSE (mean and standard deviation) over 30 simulations. The results are shown in Fig. 5(b). The figure shows a general increase of the predictive performance (decreasing NRMSE) with increasing detection rate. The apparent local minimum around θ = 0.5 was not present in a second run (from which we did not show results in this paper), so we attribute the presence of this minimum to the stochastic nature of our assessment.

Third, we studied the predictive performance in case of uncertainty about the actual sensor detection rates. So far, we assumed detection rates to be deterministic. For this analysis, during sensor observation path generation, the actual sensor detection rates were randomly drawn from a Gaussian distribution with a mean of 0.7 and a standard deviation σ. During parameter estimation, we assumed the detection rates' deviations from their means to be unknown, so all rates were considered to be equal to θ = 0.7. The relation between the resulting NRMSE and σ gives an idea about the effect of the detection rate uncertainty with respect to predictive performance. Fig. 5(c) shows the results. As expected, the predictive performance decreases with increasing uncertainty, which shows the need for a proper understanding of our sensor detection characteristics. At the same time, we conclude that introduction of the detection uncertainty does not drastically lower the predictive performance.

8. Link utilization estimation from sensor observation paths

Besides calculating the likelihoods of a given sensor observation path, we can estimate the link utilization from sensor observation paths. Without any sensor information, our best stochastic guess would be that the route for an individual moving from origin o to destination d would be described by the link flows as already calculated by Fosgerau (see Eq. (6)). However, knowing at which locations the individual was identified and where he or she was not identified, we can improve these link utilization estimations. To this end, we follow a similar approach as for the sensor observation path likelihood calculations (Section 6).

First of all, we have to define link utilization as being conditional with respect to a measured sensor observation path. Therefore, we introduce $q(i \mid o, d, s^*)$ as the expected number of times that link i is visited given the sensor observation path $s^*$, having links o and d as origin and destination respectively. Similar to deriving the likelihoods of sensor observation paths, we start with the calculation of $q(i \mid o, d, \emptyset)$: the expected number of link arrivals given that an individual has not been observed by a single sensor. We can show that

$$q(i \mid o, d, \emptyset) = \big(\delta_{i,o} + q_0(i \mid o, d) \cdot (1 - \theta_i)\big) \cdot \frac{P(\emptyset \mid i, d)}{P(\emptyset \mid o, d)}, \tag{17}$$

where $\delta_{i,o}$ is the Kronecker delta, which equals 1 if i = o and 0 if i ≠ o. The terms $P(\emptyset \mid i, d)$ and $P(\emptyset \mid o, d)$ are the empty sensor observation path likelihoods starting from links i and o respectively (see Section 6.1). The derivation of this formula can be found in Appendix B. To find the expected link utilization in case of a non-empty sensor observation path $s^*$, we first define

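$$\hat{q}(i, s^* \mid o, d) = q(i \mid o, d, s^*) \cdot P(s^* \mid o, d). \tag{18}$$

Similar to Section 6.2, $\hat{q}(i, s^* \mid o, d)$ can be expressed recursively:

$$\hat{q}\big(i, (s^*_1, s^*_2, \ldots, s^*_n) \mid o, d\big) = K_1 + K_2, \tag{19}$$

with

$$K_1 = \big(\delta_{i,o} + q_0(i \mid o, d) \cdot (1 - \theta_i)\big) \cdot P\big((s^*_1, s^*_2, \ldots, s^*_n) \mid i, d\big) \tag{20}$$

$$K_2 = \sum_{l \in L_S(s^*_1)} q_0(l \mid o, d) \cdot \theta_l \cdot \hat{q}\big(i, (s^*_2, \ldots, s^*_n) \mid l, d\big). \tag{21}$$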

Instead of a complete derivation, we will explain the intuition behind the recursive scheme. Let us assume that the length of our sensor observation path $s^*$ equals ζ. In this case, link i can be visited during ζ + 1 different periods: before the first observation, between the first and second observation, between the second and third observation, and so on, until the period after the last observation. The total expected number of visits of link i will be the sum of the expected number of visits of link i during these ζ + 1 periods. In this light, the term $K_1$ counts the expected number of visits before the first observation from the remaining sensor observation path (see the analogy with (17)). The term $K_2$ recursively adds the expected number of visits of link i that occur after the first observation of the remaining sensor observation path (see the analogy with (13)). Finally, $q(i \mid o, d, s^*)$ can be easily computed from $\hat{q}(i, s^* \mid o, d)$ using Eq. (18).

Cumulative link flows can be estimated by summing the link utilization for each individual sensor observation path. These link flows do not represent absolute values but have to be interpreted in a relative way, since not every trip is necessarily being recorded. This relative interpretation can already provide valuable insights into, for instance, the relative popularity of different routes connecting the same origin and destination. To estimate absolute cumulative link flows, the method has to correct for the amount of non-recorded trips. It depends on the application and the availability of other data sources (such as counting sensors for specific cross-sections) whether a simple correction can be applied. One example could be a correction factor that is the inverse of the fraction of festival visitors that downloaded the festival app.
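The recursion of Eqs. (18) to (21) can be sketched in the same memoized style as the likelihood recursion of Eq. (13). This is a minimal illustration and not the authors' implementation: the interfaces `q0`, `theta`, `observed_links` and `p_empty` are hypothetical, the toy numbers are invented, and a link without a sensor is assumed to have detection rate 0.

```python
from functools import lru_cache

def make_utilization(q0, theta, observed_links, p_empty):
    """Return q(i | o, d, obs) built from Eqs. (13) and (18)-(21).

    Hypothetical interfaces for this sketch:
      q0(l, o, d)        -> value of q_0(l | o, d)
      theta              -> dict: link -> detection rate (absent link => 0)
      observed_links[s]  -> tuple of links covered by sensor s
      p_empty(l, d)      -> empty-path likelihood P(empty | l, d)
    """
    @lru_cache(maxsize=None)
    def likelihood(obs, o, d):
        # Eq. (13), with the empty path as base case.
        if not obs:
            return p_empty(o, d)
        return sum(q0(l, o, d) * theta[l] * likelihood(obs[1:], l, d)
                   for l in observed_links[obs[0]])

    @lru_cache(maxsize=None)
    def q_hat(i, obs, o, d):
        # K1, Eq. (20): visits of link i before the first remaining observation.
        delta = 1.0 if i == o else 0.0
        k1 = (delta + q0(i, o, d) * (1.0 - theta.get(i, 0.0))) * likelihood(obs, i, d)
        # K2, Eq. (21): visits of link i after the first remaining observation.
        k2 = 0.0
        if obs:
            k2 = sum(q0(l, o, d) * theta[l] * q_hat(i, obs[1:], l, d)
                     for l in observed_links[obs[0]])
        return k1 + k2

    def utilization(i, obs, o, d):
        # Eq. (18) solved for q(i | o, d, obs).
        return q_hat(i, obs, o, d) / likelihood(obs, o, d)

    return utilization

# Toy numbers, invented purely to exercise the recursion.
utilization = make_utilization(
    q0=lambda l, o, d: 0.2,
    theta={"a1": 0.5},
    observed_links={"A": ("a1",)},
    p_empty=lambda l, d: 0.1,
)
no_obs = utilization("x", (), "o", "d")
one_obs = utilization("x", ("A",), "o", "d")
```

For an empty observation path the function reduces to Eq. (17), since $K_2$ vanishes and $K_1$ divided by $P(\emptyset \mid o, d)$ is exactly the right-hand side of that equation.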

At this point it is worth mentioning that another technique exists that aims to reconstruct a route from sensor observations. The technique is based on Hidden Markov Models (HMM) and uses the Viterbi algorithm to find the most likely path to reproduce a sequence of sensor observations (Musa and Eriksson, 2012). One of the major differences is that the HMM-Viterbi method considers the discretized time of the sensor observations as well and herewith indirectly assumes speed distributions of individuals. For pedestrians in an urban or event context, the effect of such implicit speed assumptions on the accuracy of the outcomes is still unclear. Besides this, the HMM-Viterbi method produces a single route as being the most likely one. Our proposed method is a probabilistic one, assigning a utilization value to each link in the network, which we believe makes the method more suitable for aggregation purposes, especially in cases with large gaps. An advantage of the HMM-Viterbi method is the ability to deal with multiple concurrent sensor observations.

9. Simulated use-case: Link utilization estimation for a single individual

To test the link utilization estimation method, as described in Section 8, we imagine an individual that moves from an origin to a destination in an artificial network, as indicated in Fig. 6. The individual's route choice behaviour is modeled by the next-link probability matrix $P_d$, which is constructed assuming a utility function that is based on link distances and a penalty for u-turns (see Section 3.2). We defined three imaginary sensor observation paths that can result from the trip. The big red circles in Fig. 6 indicate per scenario the sensor locations where the individual has been observed. The green circles represent sensor locations where the individual has not been observed. The observed link sets were defined according to the single-node construction (see Fig. 2(b)). The utilization per link has been estimated using Eq. (18) and the recursive formula (19). The results are shown in Fig. 6, where greener and wider lines indicate higher probabilities that an individual with the given sensor observation path passes this link.

Fig. 6 shows that the calculated flows 'avoid' the sensor locations where the individual has not been observed. This clearly demonstrates the benefit of this method over route reconstruction techniques that only take into account the locations where the individual has actually been observed.

To verify the link utilization calculation, we simulated a total of n trajectories from the bottom-left origin to the top-right destination, which we randomly transformed into sensor observation paths, using the detection rates of the sensors (fixed at 0.7). From all randomly generated sensor observation paths, we selected only those that matched the scenario of Fig. 6(b) ((bottom-left, center-center)). The average link utilization over this set of paths gives us an approximation of a person's expected link utilization, given that the person was observed by the (bottom-left) and the (center-center) sensor.

Next, we wanted to know whether this (simulated) true link utilization could be correctly estimated by our method. For this purpose, we calculated the RMSE between the simulated and theoretically derived link utilization for different values of n (the unfiltered number of simulated trajectories). The results are shown in Fig. 7 (notice the logarithmic scales). The figure reveals the typical "inverse square root" relation between sample size and sample error of the mean, which supports the belief that our method is able to correctly derive the expected link utilization.

Fig. 6. For three different sensor observation paths, the link utilization has been plotted on the network. The greener and thicker the line, the more likely it is that an individual passes that link, given the bottom-left origin, top-right destination and the sensor observation path as indicated by the big red circles. The green circles represent sensors where the individual has not been observed. It can be seen that the flows towards green circles are relatively small. Default parameter values: μ = 1, $c_{penalty}$ = −100 and θ = 0.7. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 7. The RMSE between the simulated approximation of link utilization and theoretical link utilization, following the calculations of Section 8. The simulation and theoretical calculation are based on a sensor observation path as in Fig. 6(b). The figure shows a clear decrease of the RMSE for an increasing initial sample size n and herewith supports the validity of our link utilization calculation method.

10. Application at a music festival

We tested our route choice model and link utilization estimation method on a data set that was collected during the TT Assen festival. This Dutch music festival is organized yearly as a side festivity around the Dutch TT motor racing event. In 2018, the festival lasted from June 27 till June 30 and attracted approximately 160,000 visitors. A total of 11 stages were built in the city centre of Assen, where a diversity of musical performances and motor demonstrations were given. In cooperation with the company Connection Systems B.V., we installed Wi-Fi-sensors at 15 different locations in the city centre. Fig. 8(a) shows these sensor locations. Within a radius of 20 m on average, these sensors identify devices in search for a Wi-Fi-network, based on their MAC-address. If the same MAC-address is detected by multiple sensors in the network, we have some insight into the mobility of the person carrying the specific device. The observed link sets of the sensors were again governed by the single-node construction (see Fig. 2(b)).

The question that we tried to answer for this specific event was to what extent route choice behavior was influenced by the stage locations. It can be hypothesized that people try to avoid the busy locations when they walk through the city.

Fig. 8. a) The network that was used to assess the mobility during TT Assen festival. The green dots represent the sensor locations (placed at nodes in the network). The yellow circles represent the stage locations on Thursday and the thick red lines represent links that were adjacent to a stage area. b) The estimated cumulative link flows. Parameters: μ = 50, $\beta_{LS}$ = −0.2, $c_{penalty}$ = −100, θ = 0.46. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

10.1. Data cleaning

Wi-Fi-data was collected during the four days of the event. For model estimation, we only used data from the evening of June 28 (starting at 6 PM) till the morning of June 29 (ending at 5 AM), since stage locations, and herewith link characteristics, differed from day to day, which would complicate our data preparation if we took multiple evenings into account. The raw data tell which MAC-addresses have been observed by which sensors at what times. The data is composed of observations of stationary behavior and observations of travelling behavior. For estimation of the model, we needed the travelling observations, together with the trip origins and destinations. To get this trip information, we processed the raw data as follows:

• A person is assumed to be stationary at a certain sensor-equipped node when he or she is identified by this sensor more than once in a period that lasts for at least 20 min.

• These stationary locations are set as the origin and destination of a trip. The observation points in between are used to define the sensor observation path of a trip.

• Also, a single observation by one of the four sensors that were placed at the city entrance roads (see Fig. 8(a)) defines a trip origin or destination, since people are expected to only pass these nodes while entering or leaving the city center.

• While travelling, people can be observed multiple times by the same sensor. As some Wi-Fi-sensors were placed close to each other, persons were even occasionally identified alternately by two different sensors during a certain part of their trip. To account for this, we discard all observations for which the same person was already identified earlier by the same sensor within a trip.

• While travelling from their origin to their destination, people are assumed to take a route whose distance is not too large compared to the shortest distance possible. To this end, for each sensor observation path, we calculated the shortest cycle-free path, connecting the origin with the destination, that passes through all observation points in the given order. Dividing this path length by the length of the shortest path from origin to destination, which does not necessarily go through the observation points, gives us a lower bound for the so-called detour ratio. To exclude erratic trips, we filtered out all trips with a detour ratio above 2.5. To find the shortest cycle-free path from an origin to a destination that passes through a set of nodes in a given order, a best-first branch and bound algorithm was adopted (A*-algorithm with branch dependent feasibility constraints). Implementation details are omitted since they fall outside the scope of this paper.

• Since we were only interested in walking behavior, we excluded all observations whose average trip speed was below 0.5 m/s or above 2 m/s. We estimated the average speed using again the distance of the shortest cycle-free path through all observation points.
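Two of the cleaning steps above, discarding repeated identifications by the same sensor and filtering on detour ratio and average speed, can be sketched as follows. This is an illustration under stated assumptions: the function names and arguments are hypothetical, path lengths are assumed to be precomputed (the shortest cycle-free path search itself is not shown), and lengths and durations are assumed to be strictly positive.

```python
def drop_repeated_sensors(sensor_sequence):
    """Keep only the first observation of each sensor within a trip."""
    seen = set()
    cleaned = []
    for sensor in sensor_sequence:
        if sensor not in seen:
            seen.add(sensor)
            cleaned.append(sensor)
    return cleaned

def keep_trip(length_via_observations, shortest_length, duration_s,
              max_detour=2.5, min_speed=0.5, max_speed=2.0):
    """Apply the detour-ratio and walking-speed filters to one trip.

    length_via_observations: length (m) of the shortest cycle-free path through
        all observation points in order (assumed to be computed elsewhere)
    shortest_length: length (m) of the unconstrained shortest path
    duration_s: trip duration in seconds
    """
    detour_ratio = length_via_observations / shortest_length
    speed = length_via_observations / duration_s  # average speed in m/s
    return detour_ratio <= max_detour and min_speed <= speed <= max_speed

deduplicated = drop_repeated_sensors(["A", "B", "A", "C", "B"])
plausible = keep_trip(600.0, 400.0, 500.0)    # ratio 1.5, speed 1.2 m/s
detour = keep_trip(1200.0, 400.0, 1000.0)     # ratio 3.0 exceeds 2.5
too_fast = keep_trip(600.0, 400.0, 200.0)     # speed 3.0 m/s exceeds 2 m/s
```

The thresholds default to the values stated in the text (2.5 for the detour ratio, 0.5 m/s and 2 m/s for the speed bounds).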

Since the model requires the origin and destination of a trip to be a link (instead of a node), dummy links have been connected with all sensor-equipped nodes, serving as origin and destination links. The distances of these dummy links have been set to 0. After cleaning, we ended up with 296 sensor observation paths, of which 197 were unique. For each sensor observation path, the link utilization has been estimated using the formulas in Section 8. Cumulative link flows were derived by summing the estimated link utilization for all sensor observation paths. The cumulative link flows are shown in Fig. 8(b). Notice that these link flows do not represent absolute values, since only a portion of the population has been tracked. Hence, only relative conclusions with respect to the flows can be drawn from the figure.

10.2. Likelihood correction

Only sensor observations strictly between the origin and destination node are defined as being part of a sensor observation path. This is a direct implication of our choice to define the final sensor observation to belong to the stationary phase and not to the travelling phase. Hence, by construction, the link that leads to the destination node is never part of the sensor observation path. This leads to a structural underestimation of the likelihood to reproduce the actual observations, where an observation by the sensor at the final destination might apply to the travelling phase as well. To compensate for this, the likelihood as calculated by Eq. (13) has been corrected by dividing by $(1 - \theta_d)$, where $\theta_d$ equals the detection rate of the sensor located at the destination node.
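Written out, the correction reads as follows, where the label $P_{corr}$ is introduced here for illustration only. As a worked example, a destination detection rate of 0.46 (the value estimated in Table 2) would scale the likelihood by a factor of about 1.85:

$$P_{corr}(s^* \mid o, d) = \frac{P(s^* \mid o, d)}{1 - \theta_d}, \qquad \theta_d = 0.46 \;\Rightarrow\; \frac{1}{1 - 0.46} \approx 1.85.$$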

Table 2
Estimated parameters $\beta_{LS}$, $\beta_{normal}$, $\beta_{stage}$ and θ and evaluation metrics for the observations between June 28, 6 PM and June 29, 5 AM.

| Parameter | Estimate | Std. err. | t-stat | p-value |
|---|---|---|---|---|
| $\beta_{LS}$ | −0.45 | 0.077 | −5.81 | 6.1e-9 |
| $\beta_{normal}$ | 0.25 | 0.022 | 11.4 | 0 |
| $\beta_{stage}$ | −0.30 | 0.056 | −5.32 | 1.1e-7 |
| θ | 0.46 | 0.023 | 20.1 | 0 |

10.3. Analysis

With the collected sensor observation paths, we studied the relation between route choice behavior and stage locations. To this end, we first identified all the links that were part of a stage area. These links are indicated in Fig. 8(a) by thick red lines. The following utility function was defined:

$$v(j \mid i, o, d) = \big(-1 + \beta_{normal} \cdot 1_{normal}(j) + \beta_{stage} \cdot 1_{stage}(j) + \beta_{LS} \cdot (\tilde{q}_{o,d})_l\big) \cdot \ell(j) - c_{penalty} \cdot 1_U(i, j) \tag{22}$$

In this expression, $1_{stage}(j)$ is an indicator function that evaluates to 1 in case link j is part of a stage area (thick red line) and 0 otherwise. The function $1_{normal}(j)$ returns 1 for a non-stage link and 0 for a stage link. Further, $\beta_{LS}$ represents the link size attribute value (see Section 3.3). The flow vector $\tilde{q}_{o,d}$, the second component of the path overlap correction term, is calculated according to (6) using the utility function (22) with $\beta_{LS}$, $\beta_{normal}$ and $\beta_{stage}$ set to 0. Finally, the function $1_U(i, j)$ evaluates to 1 in case the transition from link i to link j is a u-turn and 0 otherwise.

The sensors were installed in such a way that their intersections could be observed completely, which makes us assume that each sensor has (approximately) the same detection rate θ. The magnitude of θ, however, was unknown. Therefore, we made θ part of the search space in our optimization process. Thus, we maximized the joint log-likelihood by varying $\beta_{normal}$, $\beta_{stage}$, $\beta_{LS}$ and θ. The log-likelihood maximization was performed using MATLAB's function fmincon, in which the detection rate θ was constrained to the interval [0, 1) and $\beta_{LS}$ was constrained to the interval [−1, 1]. The parameters $\beta_{normal}$ and $\beta_{stage}$ were constrained to be smaller than 1 (since preferences for cycles might occur otherwise). The scale parameter μ was kept at a constant value of 50. The results of the optimization are shown in Table 2.

When we analyse the estimated parameters, we first of all recognize the negative value of the link size attribute ($\beta_{LS}$), which is in accordance with previous studies (e.g., Fosgerau et al. (2013), Zimmermann et al. (2017)). Regarding the hypothesis, we recognize that the preference for links that are part of a stage area ($\beta_{stage}$) is significantly lower than for links that are not part of a stage area ($\beta_{normal}$). Although other parameters might play a role as well, the result suggests that people actually tried to avoid the crowded areas while consciously walking to their intended destination.

Finally, some words about the goodness of fit. The value of $\rho^2$ was calculated as explained in Section 7.3. For the reference parameter set $\beta_0$, we selected zero values for the link size and stage-link attributes and the value $\theta_{est} = 0.46$ for the detection rate. The $\rho^2$ that was found equals 0.074. A plausible reason for this low value is that prediction of sensor observation paths is fundamentally more difficult than the traditional prediction of routes, since prediction of sensor observation paths involves an additional source of stochasticity: the sensor detection rate. Although this stochastic component decreases $\rho^2$, it has to be kept in mind that we are generally not interested in predicting the actual sensor observation paths, so we do not necessarily consider a low value of $\rho^2$ as a bad thing.

11. The network-free data approach as a path-based alternative

Bierlaire and Frejinger (2008) proposed a path-based method to estimate route choice models with unprocessed, network-free location data. They introduced the concept of a Domain of Data Relevance, which corresponds to a physical region in the network to which a specific observation is relevant. A key element in the method is the so-called measurement equation, which calculates the probability to observe a certain location sequence, given a certain chosen path. The method was designed to be used with location data, like GPS measurements or self-reported trips. The authors successfully applied their network-free data estimation method on a set of self-reported trips in a network consisting of almost 40,000 unidirectional links.

The network-free data estimation approach is similar to our recursive approach in the sense that it estimates a route choice model with incomplete data. It would therefore be interesting to compare both methods. Except for the fact that the network-free data approach involves generation of a choice set, the method can be applied to our static sensor context in a straightforward way, by interpreting sensor observations as location measurements and observed link sets from sensors as the Domains of Data Relevance. We followed the methodology as described in Bierlaire and Frejinger (2008), where the measurement equation results in 1 in case the path crosses all "observed" Domains of Data Relevance in the correct order and 0 otherwise. This provides us with an alternative estimation method for the route choice model.
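The binary measurement equation used in this comparison can be sketched as an ordered-crossing check. This is a minimal illustration, not the implementation of Bierlaire and Frejinger (2008): the function name and inputs are hypothetical, each Domain of Data Relevance is represented as a set of links, and consecutive domains are assumed to be matched by distinct, strictly later links on the path.

```python
def measurement_indicator(path_links, observed_domains):
    """Return 1 if the path crosses all observed domains in order, else 0.

    path_links: sequence of links of a candidate path
    observed_domains: sequence of sets of links, one per observation,
        in the order in which the observations were made
    """
    position = 0
    for domain in observed_domains:
        # Advance to the first link at or after `position` inside this domain.
        while position < len(path_links) and path_links[position] not in domain:
            position += 1
        if position == len(path_links):
            return 0  # this domain is never crossed after the previous one
        position += 1  # assumption: the next domain is matched strictly later
    return 1

path = ["l1", "l2", "l3", "l4"]
in_order = measurement_indicator(path, [{"l2"}, {"l4"}])
wrong_order = measurement_indicator(path, [{"l4"}, {"l2"}])
no_observations = measurement_indicator(path, [])
```

Matching each domain greedily at its earliest possible position is sufficient to decide whether any in-order crossing exists. Note that the check only uses the domains that were observed; sensors that did not detect the traveller play no role in it.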

Nevertheless, since this implementation does not use the knowledge of the full sensor network, which includes the locations and detection rates of all sensors, we could expect the method to give biased estimations if applied to such a