
Delft University of Technology

A novel one-layer recurrent neural network for the l1-regularized least square problem

Mohammadi, Majid; Tan, Yao Hua; Hofman, Wout; Mousavi, S. Hamid

DOI

10.1016/j.neucom.2018.07.007

Publication date

2018

Document Version

Final published version

Published in

Neurocomputing

Citation (APA)

Mohammadi, M., Tan, Y. H., Hofman, W., & Mousavi, S. H. (2018). A novel one-layer recurrent neural network for the l1-regularized least square problem. Neurocomputing. https://doi.org/10.1016/j.neucom.2018.07.007

Important note

To cite this publication, please use the final published version (if applicable).

Please check the document version above.

Copyright

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Takedown policy

Please contact us and provide details if you believe this document breaches copyrights. We will remove access to the work immediately and investigate your claim.

This work is downloaded from Delft University of Technology.


Green Open Access added to TU Delft Institutional Repository

'You share, we take care!' - Taverne project

https://www.openaccess.nl/en/you-share-we-take-care

Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.


Neurocomputing 315 (2018) 135–144


A novel one-layer recurrent neural network for the l1-regularized least square problem

Majid Mohammadi a,∗, Yao-Hua Tan a, Wout Hofman b, S. Hamid Mousavi c

a Faculty of Technology, Policy and Management, Delft University of Technology, The Netherlands
b The Netherlands Institute of Applied Technology (TNO)

c Department of Medical Physics and Acoustics and Cluster of Excellence Hearing4all, Carl von Ossietzky University of Oldenburg, Germany

Article info

Article history: Received 18 July 2017; Revised 12 May 2018; Accepted 4 July 2018; Available online 10 July 2018. Communicated by Dr. Ding Wang.

Keywords: Least squares, l1-regularization, Recurrent neural network, Convex, Lyapunov, Total variation

Abstract

The l1-regularized least square problem has been considered in diverse fields. However, finding its solution is exacting as its objective function is not differentiable. In this paper, we propose a new one-layer neural network to find the optimal solution of the l1-regularized least squares problem. To solve the problem, we first convert it into a smooth quadratic minimization by splitting the desired variable into its positive and negative parts. Accordingly, a novel neural network is proposed to solve the resulting problem, which is guaranteed to converge to the solution of the problem. Furthermore, the rate of convergence depends on a scaling parameter, not on the size of the datasets. The proposed neural network is further adjusted to encompass the total variation regularization. Extensive experiments on the l1 and total variation regularized problems illustrate the reasonable performance of the proposed neural network.

© 2018 Elsevier B.V. All rights reserved.

1. Introduction

The l1-regularized least squares, or the lasso [1], has received a considerable amount of attention over the last decade, and much research in recent years has focused on solving its non-smooth convex optimization problem

\min_{x} \; \tfrac{1}{2}\|y - Ax\|_2^2 + \lambda \|x\|_1 \qquad (1)

where x ∈ R^l, y ∈ R^n, A is an n × l matrix consisting of l data points, λ is a non-negative parameter, ‖v‖_2 indicates the Euclidean norm, and ‖v‖_1 = Σ_i |v_i| is the l1-norm of v, which encourages the small components of x to be zero.

The lasso has a broad range of applications, such as signal reconstruction [2], curve fitting and classification [3], subspace clustering [4,5], sparse coding [6,7], and robot control [8], to name just a few. In these applications, it is critical to solve the minimization (1) efficiently. Therefore, myriad methods have been developed for solving (1) more quickly and effectively [9–12].

One promising way to find the optimum of the minimization (1) is to utilize a recurrent neural network. One of the main advantages of such an approach is that the structure of RNNs can be implemented using very-large-scale integration (VLSI) and optical technologies. Furthermore, it is well known that neural networks have the ability to process real-time applications. Hence, when there are demands on real-time processing, it is necessary and desirable to employ parallel and distributed approaches, like neural networks. Despite such unique merits, solving the minimization (1) via RNNs has been thoroughly neglected (with the exception of the RNNs for general non-smooth problems), and this is the principal incentive to develop a novel recurrent neural network especially tailored for the lasso.

∗ Corresponding author. E-mail address: m.mohammadi@tudelft.nl (M. Mohammadi).

The tremendous challenge of solving the minimization (1) is its non-differentiability due to its l1-regularization. There are two options to put forward the neural network by circumventing the non-differentiability of the lasso. The first approach is to take advantage of the dual problem of the minimization (1). This is the modus operandi of various methods in the recent literature [10,11,13]. The interior-point method is arguably the most famous technique used to solve the dual problem. Contrary to conventional interior-point methods, it is claimed that this technique is suitable for large-scale problems; a problem with millions of variables is soluble in several minutes on an ordinary PC. However, the main difficulty in solving the dual problem is obtaining the optimal solution of the primal problem, e.g. x in the minimization (1), from the dual solution. The calculation of the primal solution x from the dual variable usually involves the computation of (A^T A)^{-1}. Mathematically speaking, such an inverse does not exist for all matrices A. On top of that, the inverse calculation is both time- and memory-consuming for large-scale problems. Therefore, this approach is not taken into account.

Another approach to solving the minimization (1) is to convert it into a smooth problem by splitting the variable x into its positive and negative parts. The resulting smooth problem can be readily solved using gradient-based methods. The gradient projection for sparse reconstruction (GPSR) [9] solves the smooth problem and is of immense popularity among other methods. Further studies on the gradient projection concentrated on accelerating the convergence [14,15].

In this article, we use the second approach to come up with a neural network in order to avoid the calculation of the inverse matrix. However, splitting the variable into its positive and negative parts results in a dimension escalation of the consequent smooth problem. We further investigate whether this dimension increase can be dealt with more economically than it appears at first sight.

The proposed neural network is guaranteed to find the optimal solution of the smooth problem equivalent to the minimization (1). Then, the solution of the original problem can be readily obtained by subtracting the corresponding outputs of the neural network. Further, the proposed neural network has a simple one-layer structure that can be smoothly implemented. From the speed point of view, the convergence of the neural network relies on a positive parameter determined by the user, not on the size of the dataset. Such a salient feature is desirable when large datasets are available. We further adjust the proposed neural network to solve total variation-regularized problems. Similar to the lasso, the total variation-regularized problems are not differentiable. The efficiency of the proposed neural network is demonstrated by conducting experiments over several real and simulated datasets from the signal and image processing and bioinformatics domains.

In a nutshell, the contributions of this article can be summarized as follows:

• A novel recurrent neural network is proposed for solving the lasso.

• The neural network is guaranteed to converge to the solution of the problem.

• The escalation in dimensions stemming from the variable split is discussed, and the computation cost is reduced.

• The neural network is then extended to solve the total variation-regularized problem.

• Extensive experiments are presented to illustrate the performance of the proposed neural network.

The paper is organized as follows. In Section 2, we first derive the smooth problem equivalent to the minimization (1), and then a neural network is proposed accordingly; the effect of the dimension increase and the complexity of the neural network are also analyzed in this section. The convergence of the neural network and its convergence rate are investigated in Section 3. Extensive experimental results with application to compressed sensing and image and signal recovery are discussed in Section 4, and we conclude this paper in Section 5.

2. Neural network for smooth equivalent problem

In this section, a smooth problem for the minimization (1) is derived by splitting the desired variable x into its positive and negative parts. The subsequent escalation of dimension and a one-layer neural network are investigated afterward. The proposed neural network is then adjusted to solve the total variation-regularized problem.

2.1. Smooth equivalent problem

To solve the minimization (1) using the neural network, we first restate it as a smooth quadratic problem. This is done by splitting the variable x into its positive and negative parts. Let u, v ∈ R^l be auxiliary variables such that

x = u - v, \quad u \ge 0, \quad v \ge 0,

where u_i = (x_i)_+, v_i = (-x_i)_+, and (·)_+ denotes the positive part defined as (x)_+ = max{0, x}. Now, let 1_{2l} = (1, 1, ..., 1)^T ∈ R^{2l}; then the problem (1) can be rewritten as the following quadratic problem:

\min_{z} \; F(z) = \tfrac{1}{2} z^T B z + c^T z \qquad (2)
\text{s.t.} \; z \ge 0

where

z = \begin{pmatrix} u \\ v \end{pmatrix}, \quad c = \lambda 1_{2l} + \begin{pmatrix} -A^T y \\ A^T y \end{pmatrix}, \quad B = \begin{pmatrix} A^T A & -A^T A \\ -A^T A & A^T A \end{pmatrix}.

2.2. One-layer neural network

The smooth problem (2) is a convex minimization with non-negativity constraints. Therefore, the Karush–Kuhn–Tucker (KKT) conditions [16] are necessary and sufficient for the optimality of the solution. As stated by the KKT conditions, z* is the optimal solution of the minimization (2) if and only if there exists w* ∈ R^{2l} such that (z*, w*) satisfies the following conditions:

\nabla F(z) - w = 0, \quad w \ge 0,
w^T z = 0, \quad z \ge 0. \qquad (3)

From the first equality in Eq. (3), it follows that \nabla F(z) = w. The foregoing conditions can thus be restated as

\nabla F(z) \ge 0, \quad z \ge 0, \quad \nabla F(z)^T z = 0. \qquad (4)

The inequalities (4) are known as the nonlinear complementarity problem (NCP) [17]. With the aid of the next theorem, a neural network for the minimization (2) is proposed according to the above NCP.

Theorem 2.1. For the problem (2), z is the optimal solution if and only if \Phi(z) = 0, where

\Phi(z) = \min\{z, \nabla F(z)\}, \qquad (5)

\Phi(z) is a vector-valued function, and "min" denotes the element-wise minimum of z and \nabla F(z).

Proof. It can be easily drawn from the inequalities (4) (see [18] for more information). □

Based on the above theorem, the following dynamic system is proposed to solve the problem (2):

\frac{dz}{dt} = -\alpha \Phi(z) \qquad (6)

where α > 0 is a scaling parameter. The dynamic system (6) can be recognized as a recurrent neural network with a single-layer structure. Before examining its structure, however, we first probe into the effect of the dimension escalation caused by the variable split.
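To make the dynamics concrete, the following sketch (our illustration in plain NumPy, not the authors' MATLAB implementation) builds B and c of problem (2) from A, y and λ and applies a simple forward-Euler discretization of (6); the step size dt and the value of alpha are placeholder choices.

```python
import numpy as np

# Illustrative sketch (not the authors' implementation): build the data of the
# quadratic problem (2) and apply a forward-Euler discretization of the
# dynamic system (6), dz/dt = -alpha * Phi(z), with Phi(z) = min{z, grad F(z)}.

def build_problem(A, y, lam):
    """Return B and c of problem (2) for given A, y and regularization lam."""
    AtA, Aty = A.T @ A, A.T @ y
    B = np.block([[AtA, -AtA], [-AtA, AtA]])
    c = lam * np.ones(2 * A.shape[1]) + np.concatenate([-Aty, Aty])
    return B, c

def phi(z, B, c):
    """Phi(z) = min{z, grad F(z)} with grad F(z) = B z + c (Eq. (5))."""
    return np.minimum(z, B @ z + c)

def euler_step(z, B, c, alpha=10.0, dt=1e-3):
    """One forward-Euler step of dz/dt = -alpha * Phi(z) (Eq. (6))."""
    return z - dt * alpha * phi(z, B, c)
```

Iterating euler_step until Phi(z) is numerically zero yields the optimum z of (2), from which the lasso solution is recovered as x = u - v.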


Fig. 1. Block diagram of the proposed recurrent neural network (6), taking the computational reduction into account. Here aa_ij is the element at the ith row and jth column of A^T A; the triangle nodes represent multiplication and the summation nodes represent addition.

2.3. Dimension effect and complexity of neural network

It is observed that the size of the problem (2) is twice as large as that of the original problem (1), since x ∈ R^l but z ∈ R^{2l}. However, this increase in dimension does not have a significant impact, since the matrix operations involving B can be performed more efficiently than it might seem. To illustrate how minor this effect is, let us consider the complexity of the system (6) by computing the number of multiplications and additions/subtractions in each iteration. The most costly computation is Bz, where B is a 2l × 2l matrix and z ∈ R^{2l}. Such a calculation requires 4l^2 multiplications and 4l^2 - 2l additions.

However, the computation can be significantly reduced. For a given z = (u^T, v^T)^T, one can rewrite Bz as

Bz = B \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} A^T A (u - v) \\ -A^T A (u - v) \end{pmatrix}.

The computation of Bz then only requires l^2 multiplications and l^2 additions/subtractions, considering that A^T A is computed beforehand. Hence, the number of operations drops from 4l^2 multiplications to l^2, and from 4l^2 - 2l additions/subtractions to l^2. In aggregate, since c is also precomputed, l^2 multiplications and l^2 + 2l additions/subtractions are performed in each iteration of the dynamic system (6).
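As a small illustration of this reduction (our sketch, assuming A^T A has been precomputed), Bz can be assembled from a single l × l matrix-vector product instead of forming the 2l × 2l matrix B explicitly:

```python
import numpy as np

# Sketch of the reduction above: Bz is obtained from the precomputed l-by-l
# matrix A^T A with l^2 multiplications, since Bz = [A^T A (u - v); -A^T A (u - v)].

def Bz_fast(z, AtA):
    l = AtA.shape[0]
    u, v = z[:l], z[l:]
    w = AtA @ (u - v)              # the only l-by-l matrix-vector product
    return np.concatenate([w, -w])
```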

In element form, the dynamic system (6) can be written as

\frac{dz_i}{dt} = -\alpha \Phi(z)_i = -\alpha \min\big( (Bz)_i + c_i, \; z_i \big) = -\alpha \min\Big( \operatorname{sign}(l - i) \sum_j aa_{ij} (u - v)_j + c_i, \; z_i \Big) \qquad (7)

where aa_{ij} is the element in the ith row and jth column of the matrix A^T A. Regarding the element-wise equation of the proposed neural network, its structure is displayed in Fig. 1. In this figure, the modification for the dimension escalation is also considered to reduce the complexity of the network. The outputs of the neural network are the u_i's and v_i's, which are recursively fed back into the first layer. They are then multiplied by aa_{ij}, shown as triangles in the figure and explained in Eq. (7). In view of Fig. 1, the circuit consists of 2l integrators, 2l minimum activation functions, 4l summers, and some connection weights.

2.4. Total variation-regularized problem

The total variation-regularized problem is another non-smooth minimization. The corresponding minimization for the total variation-regularized problem is

\min_{q} \; \|p - q\|_2^2 + \lambda \|q\|_{TV}

where p ∈ R^l is the observation, q ∈ R^l is the desired variable, λ is the regularization parameter, and ‖x‖_{TV} = \sum_{i=1}^{l-1} |x_i - x_{i+1}| is the total variation norm. This problem can be equivalently rewritten as

\min_{q} \; \|p - q\|_2^2 + \lambda \|Dq\|_1 \qquad (8)

where D ∈ R^{(l-1) \times l} is defined as

D = \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & -1 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & \cdots & & 1 & -1 \end{pmatrix}.


To all appearances, the problem (8) is similar to the minimization (1); however, the total variation-regularized problem poses greater challenges, as the variable inside the l1-regularization has been multiplied by a matrix.

Harchaoui and Levy-Leduc [19] solved the total variation-regularized minimization (8) through the problem (1). The following theorem summarizes their main result.

Theorem 2.2 [19]. By the following change of variables, the minimizations (1) and (8) are equivalent:

x = Dq, \quad A = D^T (DD^T)^{-1}, \quad y = D^T (DD^T)^{-1} D p \qquad (9)

where D, p, and q are the variables in the total variation problem. Further, the variable q in the minimization (8) is obtained as

q = p + D^T (DD^T)^{-1} (x - Dp). \qquad (10)

In other words, the total variation-regularized problem (8) can be solved via the minimization (1) with the initialization (9). Then, the optimal solution q is calculated by Eq. (10).

Based on this theorem, the proposed recurrent neural network can be adjusted to solve the total variation-based regularization as well. The major elements for the neural network computation are

A^T A = (DD^T)^{-1}, \quad A^T y = (DD^T)^{-1} D p.

In the experiment section, two applications of the total variation regularization are investigated.
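The following sketch (our own, using dense NumPy linear algebra and therefore only suited to moderate l) carries out the change of variables of Theorem 2.2: it forms the quantities A^T A and A^T y needed by the network and recovers q from a lasso solution x via Eq. (10).

```python
import numpy as np

# Minimal sketch of Theorem 2.2: map the TV-regularized problem (8) to the
# lasso (1) and recover q from the lasso solution x.

def tv_to_lasso(p):
    """Return A^T A, A^T y needed by the network, plus D and (D D^T)^{-1}."""
    l = p.size
    D = np.eye(l - 1, l) - np.eye(l - 1, l, k=1)   # first-difference matrix
    DDt_inv = np.linalg.inv(D @ D.T)
    AtA = DDt_inv                                  # A^T A = (D D^T)^{-1}
    Aty = DDt_inv @ (D @ p)                        # A^T y = (D D^T)^{-1} D p
    return AtA, Aty, D, DDt_inv

def recover_q(p, x, D, DDt_inv):
    """Eq. (10): q = p + D^T (D D^T)^{-1} (x - D p)."""
    return p + D.T @ (DDt_inv @ (x - D @ p))
```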

3. Convergence analysis

To assess the reliability of the proposed dynamic system, we first discuss its stability and convergence, and then further investigate the properties of the presented RNN. The system is proved to be globally convergent and stable in the Lyapunov sense.

Definition 3.1. A continuous-time neural network is said to be globally convergent if the trajectory of the corresponding dynamic system converges to an equilibrium point for any initial point z(t_0). In other words, the equilibrium z_e is convergent if

\exists \delta > 0 \ \text{s.t.} \ \|z(t_0) - z_e\| < \delta \;\Rightarrow\; \lim_{t \to \infty} z(t) = z_e.

Lemma 3.2. The function \Phi(\cdot), defined in (5), is a Lipschitz continuous function. Therefore, there exists a positive constant L such that

\|\Phi(x) - \Phi(y)\| \le L \|x - y\|, \quad \forall x, y \in R^{2l}. \qquad (11)

Proof. For any arbitrary x, y ∈ R^{2l}, using the identity \min\{a, b\} = \tfrac{1}{2}(a + b - |a - b|), we have

\|\Phi(x) - \Phi(y)\| = \|\min\{x, \nabla F(x)\} - \min\{y, \nabla F(y)\}\|
= \Big\| \frac{x + \nabla F(x) - |x - \nabla F(x)|}{2} - \frac{y + \nabla F(y) - |y - \nabla F(y)|}{2} \Big\|
= \tfrac{1}{2} \big\| (x - y) + (\nabla F(x) - \nabla F(y)) - (|x - \nabla F(x)| - |y - \nabla F(y)|) \big\|
\le \tfrac{1}{2} \big\{ \|x - y\| + \|\nabla F(x) - \nabla F(y)\| + \big\| |x - \nabla F(x)| - |y - \nabla F(y)| \big\| \big\}
\le \tfrac{1}{2} \big\{ \|x - y\| + \|\nabla F(x) - \nabla F(y)\| + \|x - \nabla F(x) - y + \nabla F(y)\| \big\}
\le \|x - y\| + \|\nabla F(x) - \nabla F(y)\|
= \|x - y\| + \|(Bx + c) - (By + c)\|
\le (1 + \|B\|) \|x - y\|.

Now, let L = 1 + \|B\| and the proof is complete. □

The upcoming discussion elaborates on the convergence and stability of the system (6).

Theorem 3.3. For any initial point z_0, there exists a unique continuous solution z(t) of (6) over a finite time interval. Moreover, the equilibrium point of (6) is the solution of the minimization (2).

Proof. According to Lemma 3.2, the function \Phi(z) is Lipschitz continuous, and so is the right-hand side of the system (6). Thus, by the existence and uniqueness theory for ODEs [20], there exists a unique continuous solution z(t) of (6) defined on t_0 \le t \le T_f. The interval [t_0, T_f) is the so-called maximal interval of existence.

Furthermore, we show that T_f = \infty if the feasible set \Omega = \{z \in R^{2l} \mid z \ge 0\} is bounded. To do so, let \Omega be bounded and z_0 \in \Omega, and let |z - \nabla F(z)| represent the vector (|z_1 - \nabla F(z)_1|, \ldots, |z_{2l} - \nabla F(z)_{2l}|). We have

\|\Phi(z)\| = \|\min\{z, \nabla F(z)\}\| = \Big\| \frac{z + \nabla F(z) - |z - \nabla F(z)|}{2} \Big\|
\le \tfrac{1}{2} \big( \|z + \nabla F(z)\| + \|z - \nabla F(z)\| \big)
\le \tfrac{1}{2} \big( \|z\| + \|\nabla F(z)\| + \|z\| + \|\nabla F(z)\| \big)
= \|z\| + \|\nabla F(z)\|.

On the other hand, since \Omega is bounded, there exists a vector K such that \|\nabla F(z)\| \le \|K\| for any z \in \Omega [21]. It follows that

\|z(t)\| \le \|z_0\| + \alpha \int_{t_0}^{t} \|\Phi(z(s))\| \, ds
\le \|z_0\| + \alpha \int_{t_0}^{t} \big( \|z(s)\| + \|\nabla F(z(s))\| \big) \, ds
\le \|z_0\| + \alpha \|K\| (t - t_0) + \alpha \int_{t_0}^{t} \|z(s)\| \, ds.

Furthermore, by the Gronwall inequality [22],

\|z(t)\| \le \big( \|z_0\| + \alpha \|K\| (t - t_0) \big) \exp\big( \alpha (t - t_0) \big).

Thus, the solution z(t) is bounded on [t_0, T_f), which implies T_f = \infty and completes the proof of the first part.

Now, if z^* is the equilibrium point of the system (6), then \Phi(z^*) = 0, and according to Theorem 2.1 this equilibrium point is the optimal solution of problem (2). □

Theorem 3.4. The proposed neural network (6) with the initial point z_0 ∈ R^{2l} is stable in the sense of Lyapunov and globally converges to the solution of (2). Moreover, the convergence rate of the neural network (6) increases as α increases.

Proof. According to Theorem 3.3, there exists a unique solution z(t) of the system (6) within the interval [t_0, T_f). Let z^* ∈ \Omega be the optimal solution and consider the following Lyapunov function:

E(z) = F(z) - F(z^*).

It is readily seen that E(z) ≥ 0 because z^* is the optimal solution of the minimization (2). Further, z^* is the optimal solution of problem (2) if and only if \Phi(z^*) = 0 (according to Theorem 2.1), and since the solution of \Phi(z) = 0 is unique (by Theorem 3.3), so is the solution of the problem (2). Thus, E(z) = 0 if and only if z = z^*. Moreover, we have

\frac{dE(z)}{dt} = \Big( \frac{dE(z)}{dz} \Big)^T \frac{dz}{dt} = -\alpha \nabla F(z)^T \Phi(z)
= -\alpha \nabla F(z)^T \Big( \frac{z + \nabla F(z) - |z - \nabla F(z)|}{2} \Big)
= -\frac{\alpha}{2} \Big( \nabla F(z)^T z + \|\nabla F(z)\|^2 - \nabla F(z)^T |z - \nabla F(z)| \Big)
\le 0, \qquad (12)

where |z| = z since z ≥ 0; indeed, each component \nabla F(z)_i \min\{z_i, \nabla F(z)_i\} is non-negative, because for \nabla F(z)_i ≥ 0 both factors are non-negative, while for \nabla F(z)_i < 0 the minimum equals \nabla F(z)_i and the product is \nabla F(z)_i^2. Hence, the system (6) is stable in the sense of Lyapunov. We further investigate the global convergence of the proposed system and show that dz/dt = 0 if and only if dE/dt = 0. To do so, let dz/dt = 0, which implies \Phi(z) = 0; then clearly

\frac{dE}{dt} = -\alpha \nabla F(z)^T \Phi(z) = 0.

Conversely, if dE/dt = 0, then

\nabla F(z)^T \Phi(z) = 0.

In this equation, \Phi(z) = 0 results in dz/dt = 0 and the proof is complete. But if \Phi(z) \ne 0 and \nabla F(z) = 0, we get (since z ≥ 0)

\frac{dz}{dt} = -\alpha \Phi(z) = -\alpha \min\{z, \nabla F(z)\} = -\alpha \min\{z, 0\} = 0.

Therefore, the presented system (6) is stable in the sense of Lyapunov and globally converges to the optimal solution of (2). Moreover, the inequality in (12) implies that as α increases, the convergence rate also increases. □

Fig. 2. Convergence of the proposed neural network (6) with α = 10 and different initializations: (a) with the initialization z = 1; (b) with the initialization z = 0; (c) with a random initialization. The x-axis is the iteration and the y-axis is the value of the elements of the desired variable x in the lasso problem.

4. Experiment results

This section presents the experimental results regarding the proposed neural network. First, the convergence of the neural network was empirically investigated, and its dependency on the parameter α was verified. Then, the proposed neural network was applied to three different applications. The first was to recover a sparse signal from noisy observations. The other two were an image restoration and an aCGH data recovery, in which the total variation-regularized minimization is utilized. The proposed neural network is implemented in MATLAB using the ordinary differential equation (ODE) solvers.
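The paper's implementation relies on MATLAB ODE solvers; as a rough analogue (our sketch, not the authors' code), the dynamics (6) can be handed to a general-purpose integrator such as SciPy's solve_ivp, with the final time and solver left as illustrative choices:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative pipeline: integrate dz/dt = -alpha * Phi(z) for the lasso data
# (A, y, lam) and return x = u - v from the final state.

def simulate_rnn(A, y, lam, alpha=10.0, t_final=5.0):
    l = A.shape[1]
    AtA, Aty = A.T @ A, A.T @ y
    c = lam * np.ones(2 * l) + np.concatenate([-Aty, Aty])

    def rhs(t, z):
        u, v = z[:l], z[l:]
        w = AtA @ (u - v)                       # efficient Bz (Section 2.3)
        grad = np.concatenate([w, -w]) + c      # grad F(z) = Bz + c
        return -alpha * np.minimum(z, grad)     # dz/dt = -alpha * Phi(z)

    sol = solve_ivp(rhs, (0.0, t_final), np.zeros(2 * l), method="RK45")
    u, v = sol.y[:l, -1], sol.y[l:, -1]
    return u - v                                # lasso solution x = u - v
```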

4.1. Empirical convergence analysis

The convergence of the proposed neural network has been theoretically investigated. We now present an empirical exploration of the convergence of the proposed neural network (6) as a complement to the theoretical studies. To do so, the WINE benchmark problem, which consists of 178 data points with four attributes, was selected. To check the convergence, y was set to one of the data points randomly selected from the dataset, and A was formed from the remaining data. Thus, the minimization of the problem (1) obtained a coefficient vector that enabled us to write the randomly selected sample as a linear combination of the other data points. This is known as the self-expressiveness property, which is utilized in recent works [5,23]. The convergence is scrutinized with various initializations in order to check the sensitivity of the neural network to the initialization. With α = 10, Fig. 2 plots the convergence of the neural network trajectory with the initial point z = [1, ..., 1] ∈ R^356, z = [0, ..., 0] ∈ R^356, and a random initialization, respectively. The x-axis in this figure is the iteration and the y-axis is the value of each element of the vector x. In this figure, it is clear that most of the coefficients converge to zero, which is the effect of the l1-regularization. Further, the non-zero coefficients converge to the same values (one around 0.22 and another around 0.61). This indicates that the neural network is globally convergent to the optimal solution, and its convergence does not rely on the initialization.

Fig. 3. The transient behavior of the energy error of the neural network (6) for three different values of α on the WINE benchmark. The solid, dashed and dotted lines correspond to α = 10, 15 and 20, respectively.

Furthermore, we explored the convergence rate behavior of the neural network (6). To do so, we repeated the previous experiment over the WINE benchmark with α set to 10, 15 and 20 in the dynamic system (6). The energy error of the proposed neural network can be defined as

ER(z) = \|\Phi(z)\|_2. \qquad (13)

According to the dynamic system (6), ER(z*) = 0 if and only if z* is an optimal solution. Fig. 3 shows the transient behavior of the error. It is readily observable that a bigger value of α accelerates the convergence of the proposed neural network on the same problem. Thus, one can accelerate the convergence simply by increasing the parameter α.
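A minimal sketch of the monitoring quantity (13) (our illustration, reusing the efficient Bz computation of Section 2.3):

```python
import numpy as np

# Energy error (13): ER(z) = ||Phi(z)||_2 vanishes exactly at an optimum of (2).

def energy_error(z, AtA, c):
    l = AtA.shape[0]
    u, v = z[:l], z[l:]
    w = AtA @ (u - v)
    grad = np.concatenate([w, -w]) + c
    return np.linalg.norm(np.minimum(z, grad))   # ||min{z, grad F(z)}||_2
```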

4.2. Signal reconstruction

In this section, we consider a sparse signal recovery problem with a signal x ∈ R^4096. In this example (shown at the top of Fig. 4), there are 160 spikes with ±1 amplitude. The matrix A ∈ R^{1024×4096} is filled with independent samples of the standard normal distribution and its rows are orthonormalized. The observation y is generated according to

y = Ax + n \qquad (14)

where n is noise drawn from the normal distribution N(0, 0.01) on R^1024. The parameter λ is chosen below ‖A^T y‖_∞, since for λ > ‖A^T y‖_∞ the unique minimum of (1) is the zero vector [24].

Fig. 4. Sparse signal reconstruction. Top: the original signal. Middle: the minimum energy reconstruction. Bottom: the reconstructed signal using the neural network (6).

Fig. 4 shows the reconstruction results. The original signal is presented at the top of the plot. The middle plot shows the signal x = A^T y, which is known as the minimum energy reconstruction. The bottom plot delineates the reconstructed signal by the proposed neural network (6). As can be readily grasped from this figure, the proposed neural network can faithfully recover the corrupted signal even though the number of measurements is small compared to the number of elements of the signal.
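For reference, a sketch of this setup (our reading of the text; the exact λ used in the paper is not stated after the page break, so the fraction 0.1 below is purely illustrative, as is the interpretation of 0.01 as the noise variance):

```python
import numpy as np

# Sketch of the Section 4.2 experiment: 160 +/-1 spikes in R^4096, a 1024x4096
# Gaussian matrix with orthonormalized rows, and a noisy observation (14).

rng = np.random.default_rng(0)
n_obs, l = 1024, 4096

x_true = np.zeros(l)
support = rng.choice(l, size=160, replace=False)
x_true[support] = rng.choice([-1.0, 1.0], size=160)

A = rng.standard_normal((n_obs, l))
A = np.linalg.qr(A.T)[0].T                     # orthonormalize the rows of A

# n ~ N(0, 0.01); 0.01 is taken as the variance here (an assumption).
y = A @ x_true + np.sqrt(0.01) * rng.standard_normal(n_obs)

# lambda must stay below ||A^T y||_inf, otherwise the lasso solution is zero [24].
lam = 0.1 * np.linalg.norm(A.T @ y, np.inf)    # illustrative value, not the paper's
```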

4.3. aCGH data recovery

Array comparative genomic hybridization (CGH array or aCGH) is a technique to discover aberrations in the DNA copy number [25,26]. The greatest challenge in finding the aberrations is that aCGH data are highly corrupted by various noises, so that the boundaries of the normal and aberrant genomes cannot be readily detected. As a result, it is of the utmost importance to remove the noise from the raw aCGH data prior to the aberration detection.

The most popular way of denoising aCGH data is to solve a problem regularized by the total variation norm. These methodologies process either all the aCGH samples in a dataset simultaneously [27–30] or each sample separately [31,32].

We applied the proposed neural network for noise removal from the aCGH data and compared it with state-of-the-art algorithms such as total variation and spectral regularization (TVSp) [33], piece-wise and low rank approximation (PLA) [34], low rank recovery based on the half-quadratic minimization (LRHQ) [30], and group fused lasso segmentation (GFLseg) [28]. TVSp takes advantage of the nuclear norm regularization along with the total variation norm. By the same token, PLA and LRHQ have similar formulations, with more sparsity constraints in the former method and a more robust information-theoretic loss function in the latter. GFLseg is yet another technique that utilizes the weighted l1-l2 norm with the integral total-variation regularization. All of these methods have more parameters to be tuned (at least two), and are of higher complexity due to the various regularizations they employ. In the following, we show that the proposed neural network is competitive with the state of the art despite its simplicity and lower number of parameters.

The performance comparison was twofold. First, the comparison was conducted based on receiver operating characteristic (ROC) curves across simulated datasets contaminated by different types of noise. Second, two real-world aCGH datasets were used to carry out the recovery.

4.3.1. Experiment on simulated data

In this subsection, the methods mentioned above are compared across synthesized datasets. In the experiment, 50 samples with a length of 500 were generated according to the methodology presented in [33]. The simulated data were corrupted by Gaussian noise with different signal-to-noise ratios (SNRs). For the first comparison, we plot the ROC diagram for the methods. The ROC is a curve plotting the true positive rate (TPR) against the false positive rate (FPR) for different thresholds. Given a threshold T, the true and false positive rates are defined as

TPR(T) = \frac{|P_T|}{|A|}, \qquad FPR(T) = \frac{|FP_T|}{|N|}

where A and N are respectively the real aberrations and the normal genomes, P_T and FP_T are respectively the truly and falsely discovered aberrations, and |·| is the cardinality operator. These elements can be easily obtained as the study was on simulated data. In the ROC curve, more deviation from the diagonal indicates the superiority of a method. Fig. 5 plots the ROC diagram for different SNRs. The proposed neural network consistently outperforms PLA and GFLseg in all scenarios, as it deviates more from the diagonal. However, TVSp and LRHQ are slightly better than the proposed neural network. For SNR = 0.5, the superiority of TVSp and LRHQ is more evident, while the proposed neural network is competitive for the other SNRs. The reason for such a difference is the complexity of TVSp and LRHQ: both utilize the nuclear norm (besides the total variation) in their problem to induce low rank in the recovered profiles. Such a regularization increases the complexity and requires a costly singular value decomposition in each iteration. Despite its simplicity, the recurrent neural network has a reasonable performance in removing the noise from aCGH data.
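A small sketch of these quantities (our illustration; truth and estimate are hypothetical arrays marking the real aberrations and a recovered profile, respectively):

```python
import numpy as np

# TPR(T) = |P_T| / |A| and FPR(T) = |FP_T| / |N| for a given threshold T.

def tpr_fpr(truth, estimate, T):
    detected = np.abs(estimate) >= T          # probes called aberrant at threshold T
    aberrant = truth != 0                     # set A: real aberrations
    normal = ~aberrant                        # set N: normal genome
    tpr = np.count_nonzero(detected & aberrant) / np.count_nonzero(aberrant)
    fpr = np.count_nonzero(detected & normal) / np.count_nonzero(normal)
    return tpr, fpr
```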

4.3.2. Experiment on real datasets

The performance of the proposed neural network was then investigated across real datasets. To do so, two datasets were employed: the Pollack et al. dataset [35], which includes 44 breast tumors of 6691 human mapped genes, and the Chin et al. dataset [36], which consists of 2149 clones from 141 primary breast tumors.

These datasets were subjected to the proposed neural network to obtain the recovered profiles. Fig. 6 plots the heat and bar diagrams for the retrieved profiles of the datasets mentioned above. The heat maps are plotted at the top, and the bar diagram, which is the sum of the number of aberrations across all samples given a threshold, is at the bottom. As the color bar suggests, the yellowish segments in the heat map indicate duplication and the bluish segments indicate loss in the aCGH data. The greenish parts, which are indeed prevalent in the heat map, are where there is no aberration. The results from the bar diagrams indicate that probes 178–184 from the Pollack et al. dataset and probes 38–39 from the Chin et al. dataset are amplification regions. Regarding their locations on the chromosome, the discovered areas from both datasets are in accordance with each other and are also in line with other studies on breast cancer [35,36].

Fig. 5. The performance comparison of the proposed recurrent neural network (RNN), TVSp [33], PLA [34], LRHQ [30] and GFLseg [28] via the ROC curve: ROC curves of the different methods on the simulated data corrupted by Gaussian noise with different SNRs: (a) SNR = 0.5; (b) SNR = 1.0; (c) SNR = 1.5; (d) SNR = 2.0. The x-axis and y-axis of each figure are the false positive rate and the true positive rate, respectively.

Fig. 6. The profiles retrieved by the proposed neural network: (a) the recovered profiles of the Pollack et al. dataset [35]; (b) the recovered profiles of the Chin et al. dataset [36]. The yellowish color in the heat map (the top figure) indicates duplication and the bluish shows loss in the chromosome. The greenish areas are the normal regions. The bottom is the bar diagram, which plots the sum of the number of aberrations with the threshold 1. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

To show the efficient data recovery by the neural network, several recovered profiles from the proposed neural network, TVSp [33] and PLA [34] are presented in Fig. 7. Each row in this figure is dedicated to one sample, and each column corresponds to a recovery method. Further, the red dots are the real data, and the blue lines indicate the data recovered by each method. From the smoothness perspective, the proposed neural network consistently outperforms PLA and TVSp, since the data it recovers are much smoother than those recovered by PLA and TVSp.

4.3.3. Time complexity

The proposed neural network was empirically evaluated in terms of the execution time. To this end, 50 aCGH samples with different numbers of probes were generated and corrupted with random Gaussian noise. The resulting corrupted data were then subjected to the different methods for recovery, and the time needed to do so is the measure on which the various algorithms are contrasted. The numbers of probes for this experiment were 50, 500, 1000, and 10,000. The experiments were performed on a PC with a 3.2 GHz Core-i5 CPU and 4 GB of RAM.

Fig. 8 plots the time in seconds that each method needed to complete the recovery task with different numbers of probes. The proposed neural network significantly outperforms LRHQ, and is quite competitive with TVSp. PLA and GFLseg are much faster than the others, mainly due to the fact that they have implemented a part of their algorithms in C/C++, which is inherently swift.


Fig. 7. Five selected samples from the Pollack et al. dataset recovered by various methods. Each row in this figure corresponds to a sample and each column tallies with a recovery method. The three methods are the proposed neural network, PLA [34] and TVSp [33] . The red dots are the real data from the datasets, and the blue lines are the data retrieved by each method. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 8. The time required for each method to complete the recovery task over a dataset with 50 samples and different numbers of probes. The x-axis is the number of probes, and the y-axis is the time in seconds for each method to complete the task.

4.4. Image restoration

The final experiment was to recover the original image from noisy observations. To do so, three images were selected and contaminated by Gaussian noise with σ = 0.05. The first and second columns of Fig. 9 correspond to the original and noisy images under study, respectively. The total variation-regularized problem (8) was utilized to recover the original images from the contaminated observations. The recovery was carried out by the proposed neural network and the primal-dual splitting method (PDSM) [37]. The images recovered by PDSM and the proposed neural network are presented in the third and fourth columns, respectively. This figure clearly shows that the proposed neural network has faithfully recovered the images. We further tabulate the mean square error of the two methods for each image in Table 1. The table also confirms that the proposed neural network retrieves the original images with high confidence and is competitive with PDSM.

Table 1. The mean square errors of the proposed neural network and the primal-dual splitting method (PDSM) [37] across three images.

Image        RNN           PDSM
MRI          3.08 × 10^-3  5.75 × 10^-5
Lena         6.49 × 10^-5  9.13 × 10^-5
Cameraman    7.47 × 10^-5  9.54 × 10^-5


Fig. 9. Image recovery by the proposed neural network and PDSM [37] . The columns from left to right correspond to the original image, noisy image, the image recovered by PDSM, and the image retrieved by the neural network, respectively.

5. Conclusion

This paper presented a one-layer recurrent neural network to find the optimal solution of the l1-regularized least square problem. The proposed neural network is guaranteed to globally converge to the solution of this problem, while its convergence is reliant not upon the size of the datasets but upon a constant parameter. The experiments further investigated the convergence of the neural network and its dependence on the constant parameter. The proposed recurrent neural network was applied to several problems including sparse signal recovery, image restoration, and aCGH data recovery. These applications showed the reasonable performance of the proposed neural network in comparison with other state-of-the-art methods.

References

[1] R. Tibshirani, Regression shrinkage and selection via the lasso, J. Royal Stat. Soc. Ser. B (Methodol.) 58 (1) (1996) 267–288.
[2] S.J. Wright, R.D. Nowak, M.A. Figueiredo, Sparse reconstruction by separable approximation, IEEE Trans. Signal Process. 57 (7) (2009) 2479–2493.
[3] C.M. Bishop, et al., Pattern Recognition and Machine Learning, 1, Springer, New York, 2006.
[4] E. Elhamifar, R. Vidal, Sparse subspace clustering, in: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009, IEEE, 2009, pp. 2790–2797.
[5] E. Elhamifar, R. Vidal, Sparse subspace clustering: algorithm, theory, and applications, IEEE Trans. Pattern Anal. Mach. Intell. 35 (11) (2013) 2765–2781.
[6] H. Lee, A. Battle, R. Raina, A.Y. Ng, Efficient sparse coding algorithms, in: Proceedings of Advances in Neural Information Processing Systems, 2006, pp. 801–808.
[7] J. Mairal, F. Bach, J. Ponce, G. Sapiro, Online dictionary learning for sparse coding, in: Proceedings of the 26th Annual International Conference on Machine Learning, ACM, 2009, pp. 689–696.
[8] L. Jin, S. Li, X. Luo, Y. Li, B. Qin, Neural dynamics for cooperative control of redundant robot manipulators, IEEE Trans. Neural Netw. Learn. Syst. (2018).
[9] M.A. Figueiredo, R.D. Nowak, S.J. Wright, Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems, IEEE J. Sel. Top. Signal Process. 1 (4) (2007) 586–597.
[10] J. Kim, H. Park, Fast active-set-type algorithms for l1-regularized linear regression, in: Proceedings of the International Conference on Artificial Intelligence and Statistics, 2010, pp. 397–404.
[11] S.-J. Kim, K. Koh, M. Lustig, S. Boyd, D. Gorinevsky, An interior-point method for large-scale l1-regularized least squares, IEEE J. Sel. Top. Signal Process. 1 (4) (2007) 606–617.
[12] Y. Xiao, Q. Wang, Q. Hu, Non-smooth equations based method for l1-norm problems with applications to compressed sensing, Nonlinear Anal.: Theory Methods Appl. 74 (11) (2011) 3570–3577.
[13] P.G.C. Zhang, A Fast Dual Projected Newton Method for L1-Regularized Least Squares, Tsinghua University, Beijing, 2011.
[14] P. Tseng, S. Yun, A coordinate gradient descent method for nonsmooth separable minimization, Math. Program. 117 (1–2) (2009) 387–423.
[15] I. Loris, M. Bertero, C. De Mol, R. Zanella, L. Zanni, Accelerating gradient projection methods for l1-constrained signal recovery by steplength selection rules, Appl. Comput. Harmon. Anal. 27 (2) (2009) 247–254.
[16] M.S. Bazaraa, H.D. Sherali, C.M. Shetty, Nonlinear Programming: Theory and Algorithms, John Wiley & Sons, 2013.
[17] O.L. Mangasarian, Equivalence of the complementarity problem to a system of nonlinear equations, SIAM J. Appl. Math. 31 (1) (1976) 89–92.
[18] D.P. Bertsekas, J.N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods, 23, Prentice Hall, Englewood Cliffs, NJ, 1989.
[19] C. Levy-Leduc, Z. Harchaoui, Catching change-points with lasso, in: Proceedings of Advances in Neural Information Processing Systems, 2008, pp. 617–624.
[20] J.K. Hale, Functional Differential Equations, Springer, 1971.
[21] S. Boyd, A. Mutapcic, Subgradient Methods, in: Notes for EE364b, Stanford University, Winter 2006–07.
[22] R. Bellman, et al., The stability of solutions of linear differential equations, Duke Math. J. 10 (4) (1943) 643–647.
[23] G. Liu, Z. Lin, Y. Yu, Robust subspace segmentation by low-rank representation, in: Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, pp. 663–670.
[24] J.-J. Fuchs, On sparse representations in arbitrary redundant bases, IEEE Trans. Inf. Theory 50 (6) (2004) 1341–1344.
[25] D. Pinkel, D.G. Albertson, Array comparative genomic hybridization and its applications in cancer, Nat. Genet. 37 (2005) S11–S17.
[26] L. Feuk, A.R. Carson, S.W. Scherer, Structural variation in the human genome, Nat. Rev. Genet. 7 (2) (2006) 85–97.
[27] C.M. Alaíz, Á. Barbero, J.R. Dorronsoro, Group fused lasso, in: International Conference on Artificial Neural Networks, Springer, 2013, pp. 66–73.
[28] K. Bleakley, J.-P. Vert, The group fused lasso for multiple change-point detection, arXiv preprint arXiv:1106.4199 (2011).
[29] H.S. Noghabi, M. Mohammadi, Y.-H. Tan, Robust group fused lasso for multisample copy number variation detection under uncertainty, IET Syst. Biol. 10 (6) (2016) 229–236.
[30] M. Mohammadi, G.A. Hodtani, M. Yassi, A robust correntropy-based method for analyzing multisample aCGH data, Genomics 106 (5) (2015) 257–264.
[31] A. Mitra, G. Liu, J. Song, A genome-wide analysis of array-based comparative genomic hybridization (CGH) data to detect intra-species variations and evolutionary relationships, PLoS ONE 4 (11) (2009) e7978.
[32] J. Hu, J.-B. Gao, Y. Cao, E. Bottinger, W. Zhang, Exploiting noise in array CGH data to improve detection of DNA copy number change, Nucl. Acids Res. 35 (5) (2007) e35.
[33] X. Zhou, C. Yang, X. Wan, H. Zhao, W. Yu, Multisample aCGH data analysis via total variation and spectral regularization, IEEE/ACM Trans. Comput. Biol. Bioinform. 10 (1) (2013) 230–235.
[34] X. Zhou, J. Liu, X. Wan, W. Yu, Piecewise-constant and low-rank approximation for identification of recurrent copy number variations, Bioinformatics 30 (14) (2014) btu131.
[35] J.R. Pollack, T. Sørlie, C.M. Perou, C.A. Rees, S.S. Jeffrey, P.E. Lonning, R. Tibshirani, D. Botstein, A.-L. Børresen-Dale, P.O. Brown, Microarray analysis reveals a major direct role of DNA copy number alteration in the transcriptional program of human breast tumors, Proc. Natl. Acad. Sci. 99 (20) (2002) 12963–12968.
[36] K. Chin, S. DeVries, J. Fridlyand, P.T. Spellman, R. Roydasgupta, W.-L. Kuo, A. Lapuk, R.M. Neve, Z. Qian, T. Ryder, et al., Genomic and transcriptional aberrations linked to breast cancer pathophysiologies, Cancer Cell 10 (6) (2006) 529–541.
[37] L. Condat, A primal–dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms, J. Optim. Theory Appl. 158 (2) (2013) 460–479.

Majid Mohammadi is a Ph.D. candidate at the Information and Communication Technology group of the Department of Technology, Policy and Management of the Delft University of Technology. He obtained his B.Sc. and M.Sc. in Software Engineering and Artificial Intelligence, respectively. His main research interests are semantic interoperability, machine learning and pattern recognition.

Yao-Hua Tan is professor of Information and Communication Technology at the ICT Group of the Department of Technology, Policy and Management of the Delft University of Technology and part-time professor of Electronic Business at the Department of Economics and Business Administration of the Vrije Universiteit Amsterdam. His research interests are service engineering and governance; ICT-enabled electronic negotiation and contracting; and multi-agent modelling to develop automation of business procedures in international trade.

Wout Hofman is a senior research scientist at TNO, the Dutch organization for applied science, on the subject of interoperability, with a specialization in government (e.g. customs) and business interoperability in logistics. He is responsible for coordinating semantic developments within the iCargo project. Wout is also, as a member of the Scientific Board of the EU FP7 SEC Cassandra project, responsible for IT developments in that latter project.

S. Hamid Mousavi was born in Mashhad, Iran on February 3, 1988. He received the B.Sc. degree in pure mathematics from Ferdowsi University of Mashhad (FUM) in 2011. He started his M.Sc. in applied mathematics at FUM and worked on control and optimization problems. After graduating in 2015, he joined the machine learning group at the University of Oldenburg, Germany, where he is currently working toward a doctorate degree. His major fields of interest are currently optimization and probabilistic algorithms.
