
Linear State-Space Identification of Interconnected Systems: A Structured Approach

DISSERTATION

for the purpose of obtaining the degree of doctor
at the Technische Universiteit Delft,
by the authority of the Rector Magnificus Prof. ir. K.C.A.M. Luyben,
chairman of the Board for Doctorates,
to be defended in public on
Wednesday 14 May 2014 at 15:00

by

Patricio Ignacio TORRES TAPIA

Master of Science in Electrical Engineering, University of Chile
born in Santiago, Chile.


Copromotor:

Dr. ir. Jan-Willem van Wingerden

Composition of the doctoral committee:

Rector Magnificus, chairman
Prof. dr. ir. M. Verhaegen, Technische Universiteit Delft, promotor
Dr. ir. Jan-Willem van Wingerden, Technische Universiteit Delft, copromotor
Prof. dr. ir. B. De Schutter, Technische Universiteit Delft
Prof. dr. H. Werner, Technische Universiteit Hamburg-Harburg
Prof. dr. Marcos E. Orchard, Universiteit van Chili
Dr. ir. M. van Gijzen, Technische Universiteit Delft
Dr. ir. K.J. Keesman, Universiteit Wageningen
Prof. dr. ir. H.J. Hellendoorn, Technische Universiteit Delft, reserve member

This dissertation has been completed in partial fulfillment of the requirements for the graduate study of the Dutch Institute of Systems and Control (DISC).

ISBN: 978-94-6203-588-1

Copyright © 2014 by P.I. Torres Tapia.

All rights reserved. No part of the material protected by this copyright notice may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording or by any information storage and retrieval system, without written permission from the copyright owner.


Acknowledgments

First of all, I would like to thank my family for their constant love and support. Without them, the completion of this thesis would have been impossible. Secondly, I would like to thank Michel Verhaegen and Jan-Willem van Wingerden for giving me the opportunity to come to the Netherlands to pursue my Ph.D. and for the gift of this incredible experience. Thanks to Michel Verhaegen for the freedom he gave me to do research and for always challenging me to work harder. I also want to thank Jan-Willem van Wingerden for the support I received during the last four years of my Ph.D. and for trusting in my abilities.

A special mention goes to the secretaries of the group for their efficiency, professionalism and excellence. Thanks to Esther for preparing the documents for the renewal of my scholarship every year. Thanks to Saskia for providing the DISC certificates together with my grades and for always talking to me as a friend. Thank you Kitty for being like a rose, for your extraordinary character, only comparable with your infinite patience and elegance. Thank you for finding a place for me in difficult times. This is something that I will not forget. Thanks to Will van Geest for helping me with the MATLAB license of my PC at the office.

Thank you Gijs for explaining to me the Predictor Based Subspace Identification method. Thanks to Edwin, Pieter and Sachin for reading and correcting my papers. My work was greatly improved by your comments and suggestions. Thanks also to Pieter and Marco for helping me to move my stuff, and to Pieter for providing the wind farm picture appearing in the introduction of this thesis. Thank you Noortje for opening the doors of your place to help me and for lots of chocolate breaks with nice company and conversations. Thank you for the umbrellas, and also for showing me how soft and at the same time strong the character of a Dutch girl can be. Thank you Max for translating my summary into Dutch, for always talking to me in Spanish and for encouraging me to go for some beers. Thank you Bart for translating my propositions.

I would like to thank Arturo, Katka, Yvonne, Minken and Joke for helping me to cross the tunnel. Yvonne, I hope that you have found the peace that your heart deserves; if not, I am sure you will find it in the future. Thank you Jonas for having dinner with me, for some anti-anxiety walks and for your great sense of justice. Thank you Paula for your advice and help; I can sleep much better now.

I would like to give very special thanks to the wonderful people from Iran: Yashar, Mohammad Hajiahmadi, Sadegh, Farzaneh, Samira, Amir, Sazan, Ali the …pher, Mehdi, Mohammad Shahbazi and Esmaeil. I had never met people like you, with such a fighter's character and at the same time so kind and warm. Thank you Yashar for opening the doors of your house to me, for always being there to help me at any time, for your kindness and for being my friend. Thank you Mohammad Hajiahmadi for helping me with my paperwork and anything else I needed, and of course for playing table football with me. Thank you Sadegh and Farzaneh for coming to my place for dinner, for playing UNO and for always making great fun and a mess of it. Thanks to Amir, Sazan and Ali for showing me a little bit more of the Iranian culture. Thank you Samira for always smiling at me.

Thank you Shuai for teaching me Chinese and for being a quiet girl and a nice officemate during two years of my Ph.D. Additionally, I would like to give very special thanks to the following people: Le Li, Yue, Yu Hu, Renshi, Yihui, Jia, Subu, Zul, Dieky and Dang, for nice talks, for always being willing to help me, for being generous to the utmost and for teaching me again the value of simplicity and humility, the capacity to understand that we are not the owners of the whole truth and that we are all together on the path of life. Thanks also to Laura and Jacopo for nice occasional conversations.

Thanks also to Aleksandar for long philosophical discussions and for always making jokes about me. I also want to thank Hans Yoo, Arne, Kerry, Blythe and Simone for sharing nice moments together and for being great people. I would like to thank you, Anna, for having lunch with me, for always being understanding and patient and for your lovely soul; you are a great girl!

I would also like to thank Mohammad, Renshi, Amir, Marco, Sachin, Hans and Ilya for playing table football with me and for making me scream with laughter. Thank you Tijmen and Michel for praying with me and thanks to Elisabeth and Ruxandra for being nice and funny girls.

I would like to thank Andrea and Helene for teaching me to dance Rock & Roll together with a couple of things about life, and especially you, Helene, for being that passionate about Rock & Roll and for always trying to improve the class and the teaching. I would also like to thank Caroline Briele, Caroline Bruens, Fien, Brett, Jeroen and Jules for being great with me and also great dancers. I would also like to thank Jan and Vivian for teaching me some salsa, and also Eagle, Fenna and Becky for teaching me to dance Swing!

Finally, I would like to thank all the people who crossed my path during these years whom I could not mention. To all of those who intended to hurt me or love me in some way and who contributed a little bit to this amazing experience: I hope that all of you have a fruitful life. I wish you all the best.

Delft, May 2014
Patricio Ignacio Torres Tapia


Contents

Acknowledgements vii

1 Introduction 1

1.1 Introduction to interconnected systems . . . 1

1.2 Challenges, scope and goals of the thesis. . . 4

1.3 Contributions of this thesis . . . 5

1.4 Organization of this thesis . . . 7

2 Semi-Separable Output-Error Identification of Large 1-D Heterogeneous Interconnected Systems 11
2.1 Introduction . . . 11

2.2 Notation and preliminaries . . . 14

2.3 Problem Statement . . . 15

2.4 SSS Output-Error approach . . . 18

2.4.1 Output-Error cost function . . . 18

2.4.2 Steepest-Descent . . . 20

2.4.3 Gauss-Newton . . . 23

2.4.4 SSS partial derivatives . . . 24

2.4.5 Linear computational complexity O(n³N) . . . 28

2.4.6 Similarity transformation . . . 31

2.5 Simulation Example . . . 33

2.5.1 Prediction capabilities as a function of the number of iterations . . . 34
2.5.2 Computational efficiency . . . 34

2.5.3 The effect of noise . . . 36

2.6 Conclusions . . . 36

2.7 Appendix A: SSS Jacobians . . . 38

2.7.1 Partial derivative of ∂A_Be/∂θᵀ (I_Nnq ⊗ x_k) . . . 38
2.7.2 Partial derivative of ∂A_Ce/∂θᵀ (I_Nnq ⊗ x_k) . . . 39
2.7.3 Partial derivative of ∂A_A/∂θᵀ (I_Nn² ⊗ x_k) . . . 39
2.7.4 Partial derivative of ∂A_Ww/∂θᵀ (I_Nq² ⊗ x_k) . . . 39
2.7.5 Partial derivative of ∂A_We/∂θᵀ (I_Nq² ⊗ x_k) . . . 39
2.7.6 Partial derivative of ∂A_Bw/∂θᵀ (I_Nnq ⊗ x_k) . . . 39
2.7.7 Partial derivative of ∂A_Cw/∂θᵀ (I_Nnq ⊗ x_k) . . . 39

3 Subspace Identification for a class of Directed Acyclic Graphs 41
3.1 Introduction . . . 41

3.2 Notation and preliminaries . . . 43

3.3 Problem formulation and assumptions . . . 44

3.4 Step 1: Estimating the observability matrices for every node . . . . 45

3.4.1 Left-Right Partition of the DAG. . . 46

3.4.2 Definitions. . . 47

3.4.3 Generalization of PO-MOESP to DAGs . . . 48

3.5 Step 2: Structure reconstruction . . . 51

3.5.1 Basic Idea: two nodes, no-inputs and no-noise . . . 51

3.5.2 Two nodes including inputs and noise . . . 53

3.5.3 Generalization of the structural detection mechanism . . . . 55

3.6 Step 3: The DAG system matrices . . . 57

3.7 Simulation Results . . . 58

3.7.1 Structural Reconstruction for AC-DAGs with small noise levels . . . 58

3.7.2 AC-DAG with fixed structure under different noise conditions . . . 61
3.8 Conclusions . . . 63

3.9 Appendix A: Left-Right Partition of a DAG . . . 64

3.10 Appendix B: Proof of Theorem 3.3 . . . 64

4 Subspace Identification of Multilevel Directed Acyclic Graphs 69
4.1 Introduction . . . 69

4.2 Notation and preliminaries . . . 72

4.3 Problem formulation and assumptions . . . 73

4.4 Step 1: Hierarchical partition reconstruction for DAGs. . . 74


4.4.2 Reconstruction of the DAG hierarchies . . . 77

4.5 Step 2: Hierarchical PO-MOESP approach for DAGs . . . 80

4.6 Step 3: Hierarchical structure reconstruction for Multilevel DAGs . . . 81
4.7 Step 4: The DAG system matrices . . . 85

4.8 Numerical Results. . . 85

4.8.1 Topology reconstruction of Multilevel DAGs . . . 85

4.8.2 Linkage reduction for DAGs . . . 88

4.8.3 Multilevel DAG with fixed structure under different noise conditions . . . 88

4.9 Conclusions . . . 91

4.10 Appendix A: Proof of Theorem 4.2 . . . 91

5 Conclusions and Recommendations 95
5.1 Conclusions and recommendations . . . 95

5.1.1 Conclusions . . . 95
5.1.2 Recommendations . . . 96
Bibliography 99
List of Abbreviations 105
Summary 107
Samenvatting 109
Curriculum Vitae 111


1

Introduction

This thesis deals with the linear state-space identification of interconnected systems in the discrete-time domain. First, a linear-complexity Output-Error identification method for large 1-D heterogeneous interconnected systems is proposed. Then, two different Subspace Identification methods capable of reconstructing the topology of Directed Acyclic Graphs are presented. In this first chapter, an introduction to interconnected systems, the main challenges in the identification of networks and the goal of the thesis are established.

1.1 Introduction to interconnected systems

In industry and everyday life, there exists a large number of processes and phenomena that can be modeled as a group of interconnected systems interacting with each other. One simple example can be found on the highway when we drive our car. In this case, every vehicle can be considered as a subsystem that can interact with other vehicles as they travel in line towards a given direction. In particular, we may be interested in controlling the distance between the cars, for instance for safety reasons, while minimizing fuel consumption. In addition, if we consider that every car can measure its own position (for example by using a Global Positioning System (GPS)) and the distance with respect to the car immediately in front, then the whole vehicle platoon can be modelled as a large one-dimensional (1-D) string of interconnected systems interacting as shown in Fig. 1.1, Horowitz and Varaiya (2000), Rice (2010). If we also assume that the number of cars N on the highway is large (N >> 1000), then the number of operations (and, most importantly, the amount of memory) needed for centralized control design and identification becomes computationally intractable, Rice (2010).
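The linear cost of simulating such a string can be sketched in a few lines. The dynamics and numbers below are purely illustrative (they are not taken from the thesis): each vehicle carries a small local state and is coupled only to the vehicle immediately in front, so one simulation step costs O(N).

```python
import numpy as np

N = 50                                   # number of vehicles in the platoon
A_loc = np.array([[1.0, 0.1],
                  [0.0, 0.9]])           # local dynamics of one vehicle (assumed)
A_cpl = np.array([[0.0, 0.05],
                  [0.0, 0.0]])           # coupling to the predecessor (assumed)

x = np.ones((N, 2))                      # stacked local states x_1, ..., x_N
for _ in range(100):
    x_next = x @ A_loc.T                 # every car updates with its own model
    x_next[1:] += x[:-1] @ A_cpl.T       # plus the influence of the car in front
    x = x_next
```

Doubling N doubles the work per time step, in contrast with a dense global nN × nN state-transition matrix whose multiplication cost grows quadratically in N.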

Another class of systems that can be represented by a large string (or multidimensional grid) of interconnected components are those governed by Partial Differential Equations (PDEs). An example is presented in Scholte and D'Andrea


Figure 1.1: Vehicle platoon seen as a 1-D string of interconnected systems (creative commons license, http://sweetclipart.com/red-ferrari-car-645).

Figure 1.2: Gene regulatory network (creative commons license, http://upload.wikimedia.org/wikipedia/commons/4/4e/Helix ribbon.png and http://upload.wikimedia.org/wikipedia/commons/7/71/Sodiumoxide-3D-polyhedra.png?uselang=es).

(2003), where the equations of motion of a structural beam are discretized by using the finite difference method, resulting in a string of interconnected state-space models and also in input-output form, Mukhtar et al. (2010). Moreover, for heat problems, finite difference, finite volume and finite element discretizations will result in the same model structure (see Versteeg and Malalasekera (2007)). After the discretization procedure is completed, and depending on the required precision of the solution, the number of generated subsystems N is of the order of N >> 1000. Furthermore, the parameter estimation in this case can be very difficult, if not impossible, in a real scenario, Sanderse (2009). Other applications for which similar model structures can be obtained are large telescope mirrors, Jiang et al. (2006), Fraanje et al. (2010), image processing, Roesser (1975), irrigation networks, Cantoni et al. (2007), turbulent wavefront reconstruction, Fraanje et al. (2010), satellite formations (the case of homogeneous interconnected systems and circulant structures), Massioni (2010), Massioni and Verhaegen (2008), and lithography and optical systems, Haber (2013).
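The mechanism by which a PDE discretization yields a string of interconnected subsystems can be sketched with the 1-D heat equation. The grid size and coefficients below are illustrative assumptions: a finite-difference discretization of dT/dt = a · d²T/dx² couples every grid point only to its two neighbours, i.e. the resulting discrete-time model is a 1-D string with a tridiagonal state-transition matrix.

```python
import numpy as np

N = 200                                  # number of grid points / subsystems
a, dx, dt = 1e-3, 0.01, 0.01
r = a * dt / dx**2                       # explicit scheme is stable for r <= 0.5

T = np.zeros(N)
T[N // 2] = 1.0                          # initial heat pulse in the middle
for _ in range(500):                     # each step is O(N): neighbour exchanges only
    T[1:-1] += r * (T[2:] - 2.0 * T[1:-1] + T[:-2])
```

Refining the grid to raise the precision of the solution directly raises the number of subsystems N, which is why N >> 1000 arises so easily in practice.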


Figure 1.3: Wind farm and aerodynamical interaction. a) DAG representing wind interaction, b) Computational Fluid Dynamics simulation of a wind farm of 6 turbines, Fleming et al. (2013).

Another class of systems with a more complex interconnection structure are biological networks. In this case, the structure of the graph is not limited to a 1-D string of interconnected systems but is a network with a more complex interconnection topology. In biological regulatory networks, the genes, which are segments of the DeoxyriboNucleic Acid (DNA), interact with each other through a common medium called mRNA (messenger RiboNucleic Acid) and by using other substances in the cell, Ward et al. (2009). This interaction generates concentrations of proteins that interact with other genes, giving rise to proteins with new information. These proteins change the properties of the cell and produce a determined characteristic or phenotype in an organism (for instance, the color of the eyes of a person). It is then possible to model these interactions as a graph where every node represents a gene that can act as an inhibitor or a stimulator. The edges of the graph represent the concentrations of proteins, as depicted in Fig. 1.2 (see also Zecevic and Siljak (2010)). One of the main objectives of Systems Biology is to understand the structure of these networks given some concentration measurements, Ward et al. (2009). By doing this, it is possible to better understand the causes of many diseases and look for possible cures, Ward et al. (2009). Therefore, the reconstruction of the topology of this kind of networks from input-output data is crucial to better comprehend the mechanisms that generate chronic diseases such as cancer. In many cases, the structure of these networks can be modelled by using Directed Acyclic Graphs (DAGs), Spirtes et al. (2000), Husmeier et al. (2005).

A Directed Acyclic Graph (DAG) can be defined as a type of network that does not contain directed cycles, Jensen and Gutin (2009). DAGs have been largely applied in different fields such as computer science, for instruction scheduling in low-level program optimization, Smotherman et al. (1991), efficient compiler design, Huelsbergen (1997), and dataflow programming languages, Johnston et al. (1997), as well as price discovery, Haigh and Blessler (1997), epidemiology, Tsai and Camargo (2009), Vanderweele and Robins (2009), and so on.
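The defining property of a DAG (no directed cycles) is equivalent to the existence of a topological ordering of its nodes. A minimal sketch of that equivalence, using Kahn's algorithm on a small hypothetical graph (not one from the thesis):

```python
from collections import deque

def topological_order(nodes, edges):
    """Return a topological order of the nodes, or None if the graph has a cycle."""
    indeg = {v: 0 for v in nodes}
    succ = {v: [] for v in nodes}
    for a, b in edges:                 # edge a -> b
        succ[a].append(b)
        indeg[b] += 1
    queue = deque(v for v in nodes if indeg[v] == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in succ[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    return order if len(order) == len(nodes) else None   # None => directed cycle

dag = [(1, 2), (1, 3), (2, 4), (3, 4)]
print(topological_order([1, 2, 3, 4], dag) is not None)    # True: acyclic
print(topological_order([1, 2], [(1, 2), (2, 1)]) is None) # True: contains a cycle
```

This ordering is what gives DAGs their layered, hierarchical character, which the identification methods of Chapters 3 and 4 exploit.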

Another interesting area of application for DAGs is the modeling of wind farms, Gebraad et al. (2013). In a wind farm, the turbines influence each other due to wake interaction, Sanderse (2009). A wake is a turbulent trail of reduced wind speed that propagates behind the rotor. In the case of several windmills placed together in a certain area, the turbines that receive the incoming wind field generate wakes that affect the turbines located behind them (see Fig. 1.3). Consequently, the power generated by these turbines is lower, as the kinetic energy of the wind has been reduced, Bianchi et al. (2007). Taking this into account, the idea is to control the orientation of the turbines in the wind farm (yaw) such that the maximum total power is produced. In particular, we are interested in modeling the interaction between the turbines due to the wakes and including this information in the controller design, Torres et al. (2011). This is a task that presents formidable difficulties, as the nature of the wind is highly uncertain and almost impossible to predict.

1.2 Challenges, scope and goals of the thesis

As shown in the previous section, the identification of networks of dynamical systems is a challenging problem with highly relevant applications in several fields of science. According to the exposition above, some of the challenges that we need to face are:

Challenge 1: “The network can contain a large number of subsystems interacting with each other”.

As shown before, the number of subsystems interacting in the network can be extremely large (N >> 1000). This poses severe restrictions on the amount of memory and the number of computations within which the identification method should provide a solution. In the case of the conventional Prediction Error Methods (PEM), Ljung (1987), the number of flops needed to compute the estimates increases as O(N³) and the amount of memory as O(N²); thus they are not adequate for the identification of large networks. The idea is therefore to develop identification methods that compute the solution in linear complexity (O(N)) or less.
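A back-of-the-envelope calculation makes the cubic growth concrete. The local order n = 5 below is an illustrative assumption, and the counts are rough (one dense nN × nN factorization, one dense matrix of doubles):

```python
def pem_cost(N, n=5):
    """Rough O((nN)^3) flop count and O((nN)^2) memory (GB) for a dense problem."""
    return (n * N) ** 3, (n * N) ** 2 * 8 / 1e9     # flops, memory in GB

for N in (100, 1_000, 10_000):
    flops, gb = pem_cost(N)
    print(f"N = {N:>6}: ~{flops:.1e} flops, ~{gb:.2f} GB")
```

Multiplying N by 10 multiplies the flops by 1000 and the memory by 100, whereas an O(N) method scales both budgets by 10.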

Challenge 2: “The topology of the network can be unknown”.

There exist many applications for which the structure of the network needs to be revealed by using a limited amount of data and without prior knowledge of the interconnection structure (e.g., biological networks and wind farms). Moreover, the interconnection variables cannot be measured.


Challenge 3: “The model should be suitable and flexible for control design”.

In this case, we consider models in state-space representation. This choice allows more flexibility in the design of controllers for subsystems with Multiple Inputs and Multiple Outputs (MIMO), provides some information about a realization of the state and permits working with different initial conditions (in contrast with the classical approaches based on transfer functions proposed in the 1950s). Moreover, there exist several control methods that can cope with networks of dynamical systems and that are based on state-space models, e.g., Langbort et al. (2004), D'Andrea and Dullerud (2003), Rice (2010), Rice and Verhaegen (2009), Rice and Verhaegen (2008), Massioni and Verhaegen (2009), Massioni (2010) and Massioni and Verhaegen (2008).

Therefore, new identification methods are required that can cope with large networks of interconnected systems (i.e., with the number of operations and memory consumption increasing with linear computational complexity with respect to N) and that can reconstruct the topology of the graph by using input-output data. These methods should provide models that are flexible for analysis and control design. In this context, the main goal of this thesis can be stated as follows:

Goal: “Develop identification methods for networks of dynamical systems that can reconstruct the interconnection structure (topology), the dynamics of every subsystem and that are computationally efficient. The resulting models should be in linear state-space representation”.

1.3 Contributions of this thesis

According to the goal established in the previous section, the main contributions of this thesis can be classified into two branches:

• First, an efficient Output-Error identification method for 1-D large scale heterogeneous interconnected systems is proposed. Given a string of N different subsystems in state-space representation, the method identifies the local system matrices in linear complexity O(N) with respect to the number of subsystems N. This is done by exploiting the Sequentially Semi-Separable (SSS) structure of the matrices involved. In the example given in Fig. (1.1), this means that when increasing the number of cars N on the highway, the number of operations (flops) used to find the model of every vehicle grows linearly with the number of cars. In addition, the amount of memory used in the computations also increases linearly. Therefore, the method proposed in this thesis is ‘scalable’, i.e., it can be used for a large number of interconnected systems, typically N >> 100.


• Second, the classical PO-MOESP (Past Output Multivariable Output-Error State-sPace) subspace identification method is extended for parameter estimation and topology reconstruction of Directed Acyclic Graphs (DAGs). It is assumed that the structure of the interconnections is completely unknown. In particular, two different PO-MOESP algorithms are introduced. The first one is able to reconstruct the topology of a special class of network that we call ‘Almost-Complete DAG’. Even though the method is able to reconstruct the topology of this particular type of graph only, we show that any DAG can be modeled by this class. The second method is devoted to the reconstruction of DAGs of multiple levels or hierarchies. This class encompasses directed trees (polytrees), directed caterpillars, some types of bipartite graphs and lattices, among others. In this case, exact topology reconstruction is achieved by the proposed method.

Next, the contributions of this thesis are ordered by chapter and associated with a submitted or accepted publication:

Chapter 2: Semi-Separable Output-Error Identification of Large 1-D Heterogeneous Interconnected Systems

• We propose an efficient output-error identification method for large strings of heterogeneous interconnected systems. The method exploits the SSS arithmetic and therefore all the computations can be performed in linear complexity O(N) with respect to the number of subsystems N.

• Both the Jacobian and the Hessian are entirely expressed in terms of SSS matrices. New formulas are provided for the tensorial derivatives involved in the computations.

• A comparative study in terms of convergence and computational complexity between the new SSS Gauss-Newton and SSS Steepest-Descent methods is provided.

The results contained in this chapter have been submitted or published in:

Torres, P., J.W. van Wingerden, and M. Verhaegen (2013). Output-error identification of large scale 1-D spatially varying interconnected systems. Accepted in IEEE Transactions on Automatic Control.

van Wingerden, J.W. and P. Torres (2012). Identification of spatially interconnected systems using a sequentially semi-separable gradient method. In Proceedings of the IFAC Symposium on System Identification (IFAC SysID), Brussels, Belgium, 16(1), 173-178.

Chapter 3: Subspace Identification of Directed Acyclic Graphs


• We provide persistency of excitation conditions for the reconstruction of the observability matrices.

• A topology-reconstruction algorithm for a class of directed acyclic graphs is proposed. A characterization of this kind of graphs and persistency of excitation conditions for topology reconstruction are also provided. It is shown that any DAG can be modeled by this class with a good level of accuracy.

The results contained in this chapter have been submitted or published in:

Torres, P., J.W. van Wingerden, and M. Verhaegen (2013). PO-MOESP subspace identification for a class of directed acyclic graphs. Submitted to Automatica (second review).

Chapter 4: Subspace Identification of Multilevel Directed Acyclic Graphs

• A hierarchical PO-MOESP subspace identification method for DAGs is proposed.

• A topology-reconstruction algorithm for DAGs of multiple levels or hierarchies is introduced. Exact topology reconstruction is achieved.

• We point out the existence of graphs that can be modeled with a good level of precision by a Multilevel DAG with a reduced number of interconnections (linkage reduction).

The results contained in this chapter have been submitted or published in:

Torres, P., J.W. van Wingerden, and M. Verhaegen (2013). Hierarchical PO-MOESP subspace identification for directed acyclic graphs. Accepted in International Journal of Control.

1.4 Organization of this thesis

This thesis is divided into two parts, one dedicated to the exposition of Output-Error methods and the other to Subspace Identification. The first part is devoted to the development of an efficient Output-Error identification technique for heterogeneous interconnected systems in line. All the computations of the identification routine are performed in linear complexity with respect to the number of subsystems by using SSS matrix operations. This material is contained in Chapter 2.

In the second part, PO-MOESP subspace identification methods for DAGs are introduced. Two different algorithms are developed. The first one identifies the topology of Almost-Complete DAGs by using dedicated projection matrices. It is demonstrated that any DAG can be approximated by this class with a good level of accuracy. This algorithm is presented in Chapter 3. The second method is devoted to the identification of DAGs of multiple levels or hierarchies. Exact topology reconstruction is achieved in this case. The main theoretical results together with numerical examples are presented in Chapter 4.

• Part I: Output-Error identification methods

Chapter 2. “Semi-Separable Output-Error Identification of Large 1-D Heterogeneous Interconnected Systems”. First, some background on existing identification methods for 1-D heterogeneous interconnected systems is given. Then, the derivation of the Steepest-Descent and Gauss-Newton methods in terms of SSS matrix operations is presented. Finally, some numerical examples are provided in order to show the effectiveness of the proposed approach. See also van Wingerden and Torres (2012), Torres et al. (2013).

• Part II: Subspace identification methods

Chapter 3. “Subspace Identification of Directed Acyclic Graphs”. An overview of existing identification methods for topology reconstruction and estimation in graphs is provided. Then, persistency of excitation conditions for topology reconstruction and the PO-MOESP subspace identification algorithm for DAGs are derived. Dedicated projection matrices typically used in subspace identification techniques, combined with a past instrumental variable approach for variance reduction, are exploited. The main results and contributions of this chapter have been presented in Torres et al. (2013).

Chapter 4. “Subspace Identification of Multilevel Directed Acyclic Graphs”. After some background on identification methods for Multilevel DAG structures is provided, the main theorems for hierarchical reconstruction are presented. A hierarchical PO-MOESP method for DAGs is also exposed. Refer to Torres et al. (2013).

Finally, we draw the conclusions of the thesis and provide some recommendations towards improvements and future research in Chapter 5.



Table 1.1: Contributions of the thesis.

                        Output-Error methods           Subspace Identification methods
Topology                a) 1-D String                  b) Almost-Complete DAG    c) Multilevel DAG
Structure recovery      No (structure assumed known)   Yes                       Yes
Precision (VAF)         100% for strings               100% for any DAG          100% for Multilevel DAG + DAG subset
Memory                  O(N)                           O(N²)                     O(N²)
Location in the thesis  Chapter 2                      Chapter 3                 Chapter 4


2

Semi-Separable Output-Error Identification of Large 1-D Heterogeneous Interconnected Systems

In this chapter, a new identification method for large heterogeneous spatially interconnected systems is presented. A string of different systems in state-space representation is considered. The proposed algorithm optimizes the Output-Error of the global system by using the Steepest-Descent and Gauss-Newton methods. The main contribution of this chapter is to show that both the Jacobian and Hessian matrices can be entirely expressed in terms of Sequentially Semi-Separable (SSS) matrices. Therefore, all the computations in the optimization routine can be performed with complexity that is linear in the number of subsystems. This makes it possible to obtain models for large interconnected systems at low computational cost. Finally, a numerical example is presented in order to show the effectiveness of the proposed algorithm.
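The key computational idea behind SSS arithmetic can be illustrated with a toy matrix-vector product. The sketch below assumes scalar generators for brevity (the thesis works with small matrix-valued generators and a specific parameterization not reproduced here): an SSS matrix admits an O(N) product via one forward and one backward recursion, instead of the O(N²) cost of a dense product.

```python
import numpy as np

def sss_matvec(D, P, R, Q, U, W, V, u):
    """O(N) product y = A u for A with A_ii = D_i,
    A_ij = P_i R_{i-1}...R_{j+1} Q_j (i > j) and A_ij = U_i W_{i+1}...W_{j-1} V_j (i < j)."""
    N = len(u)
    y = D * u
    g = 0.0                                # forward recursion: strictly lower part
    for i in range(N):
        y[i] += P[i] * g
        g = R[i] * g + Q[i] * u[i]
    h = 0.0                                # backward recursion: strictly upper part
    for i in reversed(range(N)):
        y[i] += U[i] * h
        h = W[i] * h + V[i] * u[i]
    return y

# Check against an explicitly assembled dense matrix for a small N.
rng = np.random.default_rng(0)
N = 6
D, P, R, Q, U, W, V = (rng.standard_normal(N) for _ in range(7))
A = np.diag(D)
for i in range(N):
    for j in range(i):                     # lower-triangular generator products
        prod = P[i]
        for k in range(j + 1, i):
            prod *= R[k]
        A[i, j] = prod * Q[j]
    for j in range(i + 1, N):              # upper-triangular generator products
        prod = U[i]
        for k in range(i + 1, j):
            prod *= W[k]
        A[i, j] = prod * V[j]
u = rng.standard_normal(N)
print(np.allclose(sss_matvec(D, P, R, Q, U, W, V, u), A @ u))  # True
```

The same recursive structure is what allows the Jacobian and Hessian computations of this chapter to stay linear in N.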

2.1 Introduction

From a theoretical point of view, the identification of spatially interconnected systems has attracted considerable attention in recent years. This interest comes from the necessity to provide models for large scale systems suitable for control implementations, D'Andrea and Dullerud (2003); Rice (2010); Rice and Verhaegen (2009, 2008). One approach available in the literature consists of discretizing the Partial Differential Equations (PDEs) associated with the process. An example of this procedure is presented in Scholte and D'Andrea (2003), where the equations of motion of a structural beam are discretized by using the finite difference method, resulting in an interconnected state-space model. Also for heat problems, finite difference, finite volume and finite element discretizations will result in the aforementioned model structure (see Versteeg and Malalasekera (2007)). Other applications for which this method was used are large telescope mirror modeling, Jiang et al. (2006), vehicle platooning, Horowitz and Varaiya (2000), image processing, Roesser (1975), and irrigation networks, Cantoni et al. (2007).

Even though these models are obtained from first principles, there is still a large group of systems whose interconnected models cannot be easily obtained by hand, e.g. wind farms, Sanderse (2009). Moreover, the coefficients of the PDEs are uncertain parameters that can hardly be estimated and/or validated in a realistic scenario.

In addition to this, the identification of large interconnected systems is computationally very expensive. The system is usually approximated by a large (and finite) number of interconnected state-space equations. If the number of interconnected systems is N and the order of every state is n, then the system matrix describing the global input-state-output behavior is nN × nN. Thus, most of the matrix operations will require n³N³ computations in the case of dense matrices. This makes classical identification techniques prohibitively expensive, Verhaegen and Verdult (2007).

In order to cope with this issue, only a small number of algorithms have been proposed in recent years. In Fraanje and Verhaegen (2005), the spatial canonical approach for identification of interconnected systems was introduced. The main step of the procedure consists in decoupling the state-space equation of every subsystem by choosing the interconnection state equal to a delayed version of the inputs and outputs. On the other hand, Haber and Verhaegen (2011) propose an identification method for interconnected systems that exploits the spatially decaying (SD) structure of the Markov parameters. Under some rank assumptions, the states of every subsystem can be recovered as a linear combination of the inputs and outputs. Even though the complexity of the algorithms mentioned above is O(n^3 N), they have the disadvantage that they start from particular cases of the state-space parameterization presented in D'Andrea and Dullerud (2003), so they lose generality. In particular, they do not allow linear combinations of the inputs to be included in the interconnection variables.

On the other hand, within Prediction-Error (PE) methods, an identification approach for LPV (Linear Parameter-Varying) spatially interconnected systems in input-output form based on Least-Squares (LS) estimation is presented in Mukhtar et al. (2010). This approach has also been extended to Box-Jenkins models in Mukhtar et al. (2011) by using a recursive Instrumental Variable (IV) technique in order to reduce the variance of the estimates. Besides, controller synthesis procedures have also been developed in Saulat and Werner (2008). The disadvantage of the first two identification methods is that both rely on SD assumptions. In particular, these papers say nothing about how to choose the size of the mask that later defines the interaction between subsystems, nor about the influence of this choice on the final performance.


Figure 2.1: Interconnected string of systems. Each subsystem Σ_s has local input u_k^(s) and output y_k^(s), receives the interconnection signals e_k^(s−1) and w_k^(s+1) from its neighbors Σ_{s−1} and Σ_{s+1}, and sends e_k^(s) and w_k^(s) in return.

With the introduction of Sequentially Semi-Separable (SSS) matrices in Chandrasekaran et al. (2003) and their application to the control of interconnected systems in Rice and Verhaegen (2009), the gap between large-scale systems and computational efficiency was reduced. As a natural consequence of the state-space parameterization introduced in D'Andrea and Dullerud (2003), the matrices of the lifted (global) system exhibit SSS structure. Then, by using the fact that the SSS matrix operations are structure preserving and admit distributed implementations, i.e., +, × and ()^{-1} can be performed with O(1·N) (for the sum) and O(n^3 N) complexity (for the rest), most of the techniques used in centralized approaches can be translated into a distributed fashion. Furthermore, by using iterative methods, it is possible to check matrix stability, calculate solutions to the Lyapunov and Riccati equations, synthesize H∞ controllers and compute QR decompositions, all with O(n^3 N) complexity, Rice and Verhaegen (2009, 2008, 2010). In the field of system identification, the SSS approach has been successfully applied as well. In Rice and Verhaegen (2011), an efficient extended Kalman filter for parameter identification over interconnected systems has been proposed. The method uses the Sequentially Semi-Separable (SSS) structure of the partial derivatives in order to estimate the parameters and the states with O(n^3 N) complexity, Rice (2010). Even though this method uses first-order derivatives to perform the estimation, nothing is said about the use of second-order methods; in particular, the computation of Hessians is not investigated, and whether they preserve the SSS structure was left as an open question.

In this chapter, we propose an efficient Output-Error identification method for heterogeneous interconnected systems. The method exploits SSS matrix operations to implement both the Steepest-Descent and Gauss-Newton optimization routines with linear complexity with respect to the number of subsystems. In particular, the computation of the Hessian is thoroughly investigated and completely expressed in terms of SSS matrices. In addition, the projected descent direction in the Gauss-Newton approach can also be computed by using SSS matrices. Unlike the work presented in Mukhtar et al. (2010) and Mukhtar et al. (2011), our method does not require the selection of spatially decaying bands, and it assumes the general form of the parameterization exposed in D'Andrea and Dullerud (2003).

The main contribution of this chapter is to show that both the Jacobian and Hessian matrices can be fully embedded in the Sequentially Semi-Separable (SSS) arithmetic. Therefore, all the computations in the optimization routine can be


performed with linear complexity (at most O(n^3 N)). All the computations associated with the Hessian are new and significant for both the Identification and the Numerical Analysis communities.

The outline of this chapter is as follows: in Sections 2.2 and 2.3, the basic notation, the problem formulation, the basics of the SSS approach and the statement of the identification problem are presented. The complete algorithm is exposed in Section 2.4, while in Section 2.5 we provide an example in order to show the potential of the proposed approach. Finally, we draw our conclusions in Section 2.6.

2.2 Notation and preliminaries

In this chapter, the string of N ∈ ℕ interconnected systems with forward and backward interconnection shown in Fig. 2.1 is considered. If s ∈ {1, ..., N} is the variable indexing every subsystem and k ∈ {1, ..., K} denotes the discrete time within the interval [1, K], then x_k^(s) ∈ R^{n×1} represents the state vector of every subsystem, u_k^(s) ∈ R^{r×1} the input vector, y_k^(s) ∈ R^{l×1} the output vector, and e_k^(s) ∈ R^{q×1}, w_k^(s) ∈ R^{q×1} the interconnection variables. For the sake of simplicity, we assume that all the systems have the same number of states, inputs, outputs and interconnection variables. In spite of this, all the derivations provided in this chapter can easily be extended to signals of different dimensions, so there is no loss of generality in this assumption. In addition, we define x_k ∈ R^{Nn×1} as the global state vector that contains all the local states stacked together. Similar definitions hold for u_k ∈ R^{Nr×1} and y_k ∈ R^{Nl×1}. In addition, η_k ∈ R^{Nl×1} is the vector that contains the white-noise sequences contaminating the outputs. The 2-norm of a given vector will be denoted by ||·||_2, and ∂/∂θ will be the column vector of partial derivatives with respect to the elements of θ ∈ R^p. Consequently, ∂/∂θ^T = (∂/∂θ)^T represents the row vector of derivatives with respect to the elements of θ. On the other hand, we define the transpose derivative of a matrix A(θ) with respect to the vector θ as follows:

    ∂A(θ)/∂θ^T = [ ∂A(θ)/∂θ(1)   ∂A(θ)/∂θ(2)   ···   ∂A(θ)/∂θ(p) ],   (2.1)

where every element of A(θ) depends on the components of θ. In this chapter, the symbol ⊗ represents the Kronecker product between matrices. Let J_K(θ) be a scalar functional depending on the vector θ (in the context of this chapter, an objective function that needs to be minimized). Then, J_K′(θ) = ∂J_K(θ)/∂θ is the Jacobian associated with J_K(θ) (i.e., the vector of partial derivatives as explained above), and J_K″(θ) = ∂²J_K(θ)/∂θ∂θ^T is the Hessian associated with J_K(θ) (the matrix containing the second partial derivatives of J_K(θ)). Finally, the symbol O(·) will be used to refer to the computational complexity of a given algorithm, i.e., the number of operations (or flops) needed to compute the solution. Formally, O(·) is a set of functions defined in the following way:


A=     A1 B1,wC2,w B1,wW2,wC3,w B1,wW2,wW3,wC4,w B2,eC1,e A2 B2,wC3,w B2,wW3,wC4,w

B3,eW2,eC1,e B3,eC2,e A3 B3,wC4,w

B4,eW3,eW2,eC1,e B4,eW3,eC2,e B4,eC3,e A4

   .

Figure 2.2:Sequentially Semi-Separable system matrix for N = 4.

Definition 1 We say that f(N) ∈ O(N^α) if there exist finite positive constants 0 < κ < ∞ and 0 < c < ∞ such that f(N) < cN^α, ∀N > κ, Rice and Verhaegen (2009).

In this framework, the function f(N) represents the number of operations needed for an algorithm to perform a desired task. Then, we will say that an 'algorithm' is 'O(N^α)' if the number of operations used by that algorithm satisfies f(N) ∈ O(N^α). For instance, if we add two matrices of size N × N element by element, the number of operations that we need is N², so the sum of matrices performed in this way is O(N²).

2.3 Problem Statement

In this chapter, the most general type of model describing the dynamics of every subsystem is considered:

    Σ_s:  [ x_{k+1}^(s) ]   [ Â_s      B̂_s      B̂_{s,e}   B̂_{s,w} ] [ x_k^(s)     ]
          [ y_k^(s)     ] = [ Ĉ_s      D̂_s      F̂_{s,e}   F̂_{s,w} ] [ u_k^(s)     ]
          [ e_k^(s)     ]   [ Ĉ_{s,e}  D̂_{s,e}  Ŵ_{s,e}   Ẑ_{s,e} ] [ e_k^(s−1)   ]
          [ w_k^(s)     ]   [ Ĉ_{s,w}  D̂_{s,w}  Ẑ_{s,w}   Ŵ_{s,w} ] [ w_k^(s+1)   ],   (2.6)

where Ŵ_{s,e} and Ŵ_{s,w} are the matrices that represent the feedthrough from the neighboring subsystems Σ_{s−1} and Σ_{s+1}. In this case, every subsystem Σ_s is allowed to be different from the others; therefore, a string of heterogeneous subsystems is considered. For the sake of simplicity, we focus on the 1D case, but equation (2.6) can also be extended to the 2D or 3D case. Similar models were introduced in the context of 2D image processing in Roesser (1975) and combinational circuits in Givone and Roesser (1972). Later, new models appeared in works on control of spatially interconnected systems, D'Andrea and Dullerud (2003), and finally they were used to develop a unified theory for efficient control design and analysis of large interconnected systems by using Sequentially Semi-Separable (SSS) matrices, Rice (2010), Rice and Verhaegen (2008), Rice and Verhaegen (2009) and Rice and Verhaegen (2010). In this chapter, the model (2.6) is used to develop an efficient identification algorithm for large interconnected systems, whose final purpose is to provide suitable models for the control synthesis algorithms of Rice and Verhaegen (2009, 2010). As in previous works, it is assumed that the interconnection variables e_k^(s−1) and w_k^(s+1) are ideal, so no delays are present. An analysis of the effects of delays in these interconnections, and how to estimate them, is out of the scope of this thesis. Besides, we


suppose that the following boundary conditions are satisfied: e_k^(0) = 0, w_k^(1) = 0, e_k^(N) = 0 and w_k^(N+1) = 0. This expresses that the first system is not interconnected with any system before it, and that the last system is not connected with any system after it.

Due to the non-zero matrices Ẑ_{s,e} and Ẑ_{s,w} in (2.6), the interconnection between the subsystems in Fig. 2.1 might be ill-posed. A discussion of the conditions that the matrices in (2.6) need to fulfill in order to guarantee well-posedness can be found in Rice and Verhaegen (2009), D'Andrea and Dullerud (2003). Therefore, if we assume that the interconnection of the N subsystems in Fig. 2.1 is well-posed, the model (2.6) for the subsystem Σ_s, ∀s ∈ {1, ..., N}, can be transformed into the Spatially Strictly-Proper form, Rice and Verhaegen (2009):

    Σ_s:  [ x_{k+1}^(s) ]   [ A_s      B_s      B_{s,e}   B_{s,w} ] [ x_k^(s)     ]
          [ y_k^(s)     ] = [ C_s      D_s      F_{s,e}   F_{s,w} ] [ u_k^(s)     ]
          [ e_k^(s)     ]   [ C_{s,e}  D_{s,e}  W_{s,e}   0       ] [ e_k^(s−1)   ]
          [ w_k^(s)     ]   [ C_{s,w}  D_{s,w}  0         W_{s,w} ] [ w_k^(s+1)   ].   (2.7)

Thus, let us consider N heterogeneous subsystems (s = 1, 2, ..., N) in Spatially Strictly-Proper form, interconnected as in Fig. 2.1 and with the boundary conditions provided in Section 2.2. Then, the global system equation can be obtained by substituting the interconnection variables into the neighboring models (this procedure is called 'lifting'):

    Σ:  [ x_{k+1} ]   [ A  B ] [ x_k ]
        [ y_k     ] = [ C  D ] [ u_k ],   (2.8)

where A ∈ R^{Nn×Nn}, B ∈ R^{Nn×Nr}, C ∈ R^{Nl×Nn} and D ∈ R^{Nl×Nr} are matrices with a special structure called Sequentially Semi-Separable (SSS), Chandrasekaran et al. (2003) (an example for the case N = 4 is depicted in Fig. 2.2). These matrices have been previously studied in the context of LTV systems, Rice (2010), and in their own right in Chandrasekaran et al. (2003). They exhibit very attractive numerical properties that can be exploited to perform efficient matrix calculations, Rice (2010). In addition, each matrix can be parameterized by 7N generators as follows:

    A = SSS(B_{s,e}, W_{s,e}, C_{s,e}, A_s, B_{s,w}, W_{s,w}, C_{s,w}),   (2.9)
    B = SSS(B_{s,e}, W_{s,e}, D_{s,e}, B_s, B_{s,w}, W_{s,w}, D_{s,w}),   (2.10)
    C = SSS(F_{s,e}, W_{s,e}, C_{s,e}, C_s, F_{s,w}, W_{s,w}, C_{s,w}),   (2.11)
    D = SSS(F_{s,e}, W_{s,e}, D_{s,e}, D_s, F_{s,w}, W_{s,w}, D_{s,w}).   (2.12)

In this case, the matrices in the argument of the symbol SSS(·) are called 'generators'. These matrices are the same ones used to parametrize every subsystem in equation (2.7). Therefore, if we assume that the sizes of the generators have the same order of magnitude, i.e. n ∼ r ∼ l ∼ q, and that the number of subsystems N in the string is much larger than the size of the generators, n ≪ N, then the resulting SSS structure of A, B, C and D can be efficiently exploited to perform


calculations in linear complexity O(N) with respect to the number of subsystems N. In particular, the class of SSS matrices is closed under the basic operations +, × and ()^{-1}, and all the computations associated with them can be performed with O(1·N) complexity (for +) and O(n^3 N) complexity (for × and ()^{-1}), Rice (2010). Besides, matrix-vector multiplication, ()^T and the QR and LU factorizations can also be performed with O(n^2 N) (for the matrix-vector product), O(1·N) (for ()^T) and O(n^3 N) complexity (for QR and LU), Chandrasekaran et al. (2003). These facts will be used later to demonstrate that the number of operations needed by our algorithms is at most O(n^3 N), by exploiting the structure of (2.9)-(2.12). On the other hand, the system (2.8) is assumed to be observable.
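To make the generator-based arithmetic concrete, the following sketch (our own Python illustration with hypothetical names, not the thesis implementation) computes y = Ax for an SSS matrix directly from generators of the form used in (2.9), using one causal and one anticausal accumulator recursion, so the cost grows linearly in N. A dense builder with the block pattern of Fig. 2.2 is included only for checking.

```python
import numpy as np

def sss_matvec(gen, x):
    """y = A x for an SSS matrix A given by its generators, in O(N) block ops.

    gen[s] is a dict with keys 'A' (diagonal block), 'Be', 'We', 'Ce'
    (strictly lower part) and 'Bw', 'Ww', 'Cw' (strictly upper part);
    x is a list of N subvectors. Hypothetical sketch, not the thesis code.
    """
    N = len(gen)
    y = [gen[s]['A'] @ x[s] for s in range(N)]
    h = None                                  # causal (eastward) accumulator
    for s in range(1, N):
        c = gen[s - 1]['Ce'] @ x[s - 1]
        h = c if h is None else gen[s - 1]['We'] @ h + c
        y[s] = y[s] + gen[s]['Be'] @ h
    g = None                                  # anticausal (westward) accumulator
    for s in range(N - 2, -1, -1):
        c = gen[s + 1]['Cw'] @ x[s + 1]
        g = c if g is None else gen[s + 1]['Ww'] @ g + c
        y[s] = y[s] + gen[s]['Bw'] @ g
    return y

def sss_to_dense(gen, n):
    """Dense Nn x Nn matrix with the block pattern of Fig. 2.2 (for checking)."""
    N = len(gen)
    A = np.zeros((N * n, N * n))
    for s in range(N):
        for t in range(N):
            if s == t:
                blk = gen[s]['A']
            elif t < s:
                blk = gen[s]['Be']
                for j in range(s - 1, t, -1):
                    blk = blk @ gen[j]['We']
                blk = blk @ gen[t]['Ce']
            else:
                blk = gen[s]['Bw']
                for j in range(s + 1, t):
                    blk = blk @ gen[j]['Ww']
                blk = blk @ gen[t]['Cw']
            A[s*n:(s+1)*n, t*n:(t+1)*n] = blk
    return A
```

The two recursions never form the full matrix, which is what allows the O(n^2 N) matrix-vector product quoted above.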

As is true for LTI systems, Verhaegen and Verdult (2007), the existence of similarity transformations of the state does not modify the input-output behavior of the system (2.8). This can be retrieved from the following lemma.

Lemma 2.1 Let T̃_s, T̃_s^{-1} ∈ R^{(n+r+2q)×(n+r+2q)} be similarity transformations applied to (2.7), with:

    T̃_s = [ T_s  0  0        0
            0    I  0        0
            0    0  T_{s,e}  0
            0    0  0        T_{s,w} ],   (2.13)

    T̃_s^{-1} = [ T_s^{-1}  0  0               0
                 0         I  0               0
                 0         0  T_{s−1,e}^{-1}  0
                 0         0  0               T_{s+1,w}^{-1} ].   (2.14)

Then the pair {y_k, u_k} associated with (2.8) remains the same ∀ T̃_s, T̃_s^{-1} and ∀k ∈ {1, ..., K}.

Proof: Apply the similarity transformations (2.13)-(2.14) to equation (2.7) ∀s ∈ {1, ..., N} and lift the system to obtain (2.8). Then the result is straightforward. □

Thus, the identification problem can now be formulated as follows:

Problem 2.1 Given the input-output sequences

    {u_k^(s), y_k^(s)},   s ∈ {1, ..., N},  k ∈ {1, ..., K},

estimate the system matrices {A_s}_{s=1}^N, {B_s}_{s=1}^N, {B_{s,e}}_{s=1}^N, {B_{s,w}}_{s=1}^N, {C_s}_{s=1}^N, {C_{s,e}}_{s=1}^N, {C_{s,w}}_{s=1}^N, {D_s}_{s=1}^N, {D_{s,e}}_{s=1}^N, {D_{s,w}}_{s=1}^N, {F_{s,e}}_{s=1}^N, {F_{s,w}}_{s=1}^N, {W_{s,e}}_{s=1}^N and {W_{s,w}}_{s=1}^N up to the set of similarity transformations given by (2.13)-(2.14).

Notice from Problem 2.1 that no measurements of the interconnection variables {e_k^(s)} and {w_k^(s)}, s ∈ {1, ..., N}, k ∈ {1, ..., K}, are available. This means that we assume that e_k^(s) and w_k^(s) are not accessible. In addition, the initial conditions


of the state can also be included in the identification problem, in the same way as the rest of the matrices (see Verhaegen and Verdult (2007)), without harming the computational complexity of the proposed algorithms.

On the other hand, the number of parameters that we need to estimate for the whole system is approximately ∼ 14n²N (assuming n ≪ N and n ∼ r ∼ l ∼ q). Compared to the full parameterization problem, where the matrices in equation (2.8) contain 4n²N² entries (i.e. counting them element-wise), the amount of memory required by the new representation is clearly lower if N is large. Indeed, the amount of memory used to store a single matrix in SSS representation is ∼ 7n²N, and it therefore increases linearly with the number of subsystems N. This allows efficient storage in our implementations, making them suitable for large-scale problems.
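As a rough illustration of these counts, the following sketch (our own; it only evaluates the formulas above under the text's simplifying assumption n = r = l = q) contrasts the linear growth of the SSS parameterization with the quadratic growth of the unstructured lifted matrices:

```python
# ~14 n^2 N parameters for the SSS parameterization versus the 4 (nN)^2
# entries of unstructured lifted A, B, C, D; back-of-the-envelope only.
def sss_parameter_count(n, N):
    # N (n^2 + 2q^2 + 4nq + 2lq + 2qr + nr + ln + lr) with r = l = q = n
    return N * 14 * n * n

def full_parameter_count(n, N):
    return 4 * (n * N) ** 2

for N in (10, 100, 1000):
    print(N, sss_parameter_count(4, N), full_parameter_count(4, N))
```

Already for N = 100 and n = 4 the structured count is more than an order of magnitude smaller, and the gap widens linearly in N.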

2.4 SSS Output-Error approach

In this section, the identification problem introduced above is solved by minimizing an Output-Error objective function. The main contribution of this chapter is to show that the Steepest-Descent and Gauss-Newton methods used in the optimization process can be entirely implemented with linear complexity with respect to N (in this case, O(n^2 N) for Steepest-Descent and O(n^3 N) for Gauss-Newton) by using efficient SSS matrix operations only. The memory usage is also optimized by using the SSS representation. First, we introduce the Output-Error optimization problem. Then, we present the Steepest-Descent and Gauss-Newton algorithms and the corresponding SSS modifications.

2.4.1 Output-Error cost function

In order to solve Problem 2.1, an Output-Error cost function is minimized, Verhaegen and Verdult (2007). In the Output-Error approach, the output of the system (2.8) is assumed to be contaminated by a white-noise vector sequence η_k, normally distributed around 0 and with a prescribed covariance matrix:

    Σ:  [ x_{k+1} ]   [ A  B ] [ x_k ]   [ 0   ]
        [ y_k     ] = [ C  D ] [ u_k ] + [ η_k ],   (2.15)

where η_k is independent of u_k.

The goal is to determine the matrices {A_s, B_s, B_{s,e}, B_{s,w}, C_s, C_{s,e}, C_{s,w}, D_s, D_{s,e}, D_{s,w}, F_{s,e}, F_{s,w}, W_{s,e}, W_{s,w}}_{s=1}^N that parametrize A, B, C and D such that the output of the model

    [ x̂_{k+1} ]   [ A  B ] [ x̂_k ]
    [ ŷ_k     ] = [ C  D ] [ u_k ]   (2.16)


approximates y_k with the smallest quadratic error. This leads to the minimization of the following unconstrained Output-Error objective function:

    J_K(θ) = (1/K) Σ_{k=0}^{K−1} ||y_k − ŷ_k(θ)||_2^2 = (1/K) E_K^T(θ) E_K(θ),   (2.17)

where

    E_K(θ) = [ y_0 − ŷ_0(θ);  y_1 − ŷ_1(θ);  ...;  y_{K−1} − ŷ_{K−1}(θ) ] = [ ε_0(θ);  ε_1(θ);  ...;  ε_{K−1}(θ) ]

and θ is the vector that contains all the parameters of the generators stacked together, θ = [θ_1^T ... θ_N^T]^T, with:

    θ_s = [ vec(A_s);  vec(B_s);  vec(B_{s,e});  vec(B_{s,w});  vec(C_s);  vec(C_{s,e});  vec(C_{s,w});  vec(D_s);  vec(D_{s,e});  vec(D_{s,w});  vec(F_{s,e});  vec(F_{s,w});  vec(W_{s,e});  vec(W_{s,w}) ].

Therefore, Problem 2.1 can be reformulated as the following unconstrained optimization problem:

Problem Description 2.1

    min_θ J_K(θ) = min_θ (1/K) Σ_{k=0}^{K−1} ||y_k − ŷ_k(θ)||_2^2,   (2.18)
    s.t. θ ∈ R^p,

where p is the number of parameters, equal to N(n² + 2q² + 4nq + 2lq + 2qr + nr + ln + lr). Notice that Problem 2.1 presents several difficulties and challenges. First of all, the parametrization introduced by the SSS representation of the matrices A, B, C and D makes the optimization problem non-convex. This means that we need to tackle the solution by using numerical algorithms, Luenberger (2003). In this case, we concentrate our efforts on the popular Steepest-Descent and Gauss-Newton methods. In particular, Gauss-Newton offers excellent properties of


convergence (quadratic, Luenberger (2003)). However, due to the non-convexity of Problem 2.1, these methods can get stuck in local minima; therefore, the selection of a good initial guess is fundamental to finding the global optimum. Secondly, due to the existence of the similarity transformations (2.13)-(2.14), there exists a manifold M(θ) in the space of parameters on which the value of the objective function J_K(θ) does not change, Verhaegen and Verdult (2007), Lee and Poolla (1997). This is a direct consequence of Lemma 2.1 (the output of system (2.8) does not change under similarity transformations of the state, so there exists an infinite number of values of the parameters in the model that give the same output). As a result, during the optimization process some descent directions could fall in the region where the objective function does not change (so after such an iteration the value of the objective function is still the same). This region corresponds to the hyperplane H(θ) tangent to the set M(θ) at the point θ, Verhaegen and Verdult (2007). In addition, there exists an infinite number of values of θ that are global minima of J_K(θ). More details about the influence of the similarity transformations on the state-space identification of LTI systems can be found in Verhaegen and Verdult (2007), and for LPV systems in Lee and Poolla (1997). In this chapter, a characterization of the tangent hyperplane H(θ) in terms of SSS matrices will be provided as well.

2.4.2 Steepest-Descent

The Steepest-Descent method changes the value of the parameters at every step in the direction of the largest decrease of the cost function (2.17). This direction is given by −J_K′(θ), Verhaegen and Verdult (2007). Hence, the Steepest-Descent algorithm updates the parameters as follows:

    θ_{i+1} = θ_i − µ J_K′(θ_i),   (2.19)

where µ ∈ [0, 1] is the step size, which can be determined by solving the following scalar optimization problem:

Problem Description 2.2

    µ = arg min_{µ ∈ [0,1]} J_K(θ_{i+1}(µ)).   (2.20)

In general, this is a difficult problem that, depending on the cost function, may have no explicit solution and can consume most of the computational effort of every iteration, Luenberger (2003). In addition, the performance of the whole algorithm can be highly sensitive to the selection of the step size, Luenberger (2003). To cope with this issue, there exist several approximate methods, known as 'line search' algorithms, that can tackle this problem with reasonable computational effort and still find local minima in the non-convex case; see Goldstein's and Armijo's rules and the golden-section method, Luenberger (2003). The implementation results presented in this chapter use an Armijo-type method that chooses the step size in such a manner that the value of the objective function of the previous iteration is not exceeded.
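A minimal sketch of such an Armijo-type backtracking rule is given below (our own generic illustration; the constants and function names are not taken from the thesis):

```python
import numpy as np

def armijo_step_size(J, theta, grad, direction, mu0=1.0, beta=0.5, c=1e-4,
                     max_backtracks=30):
    """Backtracking line search of the Armijo type: shrink the step until the
    cost decreases by at least a fraction c of what the local slope predicts.
    Generic sketch; names and defaults are our own assumptions."""
    J0 = J(theta)
    slope = float(grad @ direction)   # directional derivative, < 0 for descent
    mu = mu0
    for _ in range(max_backtracks):
        if J(theta + mu * direction) <= J0 + c * mu * slope:
            return mu
        mu *= beta
    return mu

# toy quadratic cost to illustrate the Steepest-Descent update (2.19)
J = lambda th: float(th @ th)
theta = np.array([3.0, -4.0])
grad = 2.0 * theta
mu = armijo_step_size(J, theta, grad, -grad)
theta_next = theta + mu * (-grad)     # theta_{i+1} = theta_i - mu * J_K'
assert J(theta_next) < J(theta)
```

The sufficient-decrease test guarantees that the accepted step never exceeds the cost of the previous iterate, which is exactly the property required of the step-size rule above.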

Now, we concentrate our efforts on obtaining explicit expressions for computing J_K′(θ) = ∂J_K(θ)/∂θ. We start with the following Lemma:

Lemma 2.2 Let J_K(θ) = (1/K) E_K^T(θ) E_K(θ) be the cost function defined in (2.17). Then the Jacobian J_K′(θ) is given by:

    J_K′(θ) = (2/K) Ψ_K^T(θ) E_K(θ),   (2.21)

where:

    Ψ_K(θ) = ∂E_K(θ)/∂θ^T = − [ (∂ŷ_0(θ)/∂θ^T)^T   (∂ŷ_1(θ)/∂θ^T)^T   ···   (∂ŷ_{K−1}(θ)/∂θ^T)^T ]^T   (2.22)

and

    ∂ŷ_k(θ)/∂θ^T = [ ∂ŷ_k(θ)/∂θ(1)   ∂ŷ_k(θ)/∂θ(2)   ···   ∂ŷ_k(θ)/∂θ(p) ].   (2.23)

Proof: The interested reader is referred to Verhaegen and Verdult (2007). □

From Lemma 2.2 we realize that, to compute J_K′(θ), the following sum is needed:

    Ψ_K^T(θ) E_K(θ) = Σ_{k=0}^{K−1} ( −∂ŷ_k(θ)/∂θ^T )^T ε_k(θ).   (2.24)

In particular, we need to calculate Σ_{k=0}^{K−1} ( ∂ŷ_k(θ)/∂θ(i) )^T ε_k(θ), ∀i ∈ {1, ..., p}. To this end, we have to compute the vectorial derivative of the output of the model:

    [ x̂_{k+1}(θ) ]   [ A(θ)  B(θ) ] [ x̂_k(θ) ]
    [ ŷ_k(θ)     ] = [ C(θ)  D(θ) ] [ u_k     ].   (2.25)

Then:

    X_{i,k+1}(θ) = A(θ) X_{i,k}(θ) + (∂A(θ)/∂θ(i)) x̂_k(θ) + (∂B(θ)/∂θ(i)) u_k,   (2.26)

    ∂ŷ_k(θ)/∂θ(i) = C(θ) X_{i,k}(θ) + (∂C(θ)/∂θ(i)) x̂_k(θ) + (∂D(θ)/∂θ(i)) u_k,   (2.27)

    ∀i ∈ {1, ..., p},


where X_{i,k}(θ) = ∂x̂_k(θ)/∂θ(i). Finally, the sum can be written as follows:

    Σ_{k=0}^{K−1} ( ∂ŷ_k(θ)/∂θ(i) )^T ε_k(θ) = Σ_{k=0}^{K−1} X_{i,k}^T(θ) C^T(θ) ε_k(θ)
        + Σ_{k=0}^{K−1} ( (∂C(θ)/∂θ(i)) x̂_k(θ) )^T ε_k(θ)
        + Σ_{k=0}^{K−1} ( (∂D(θ)/∂θ(i)) u_k )^T ε_k(θ).   (2.28)

According to (2.28), we would need to perform p + 1 simulations in order to calculate J_K′(θ): the p simulations from (2.26)-(2.27) and the model simulation (2.25) used to obtain the state x̂_k(θ). However, the following Lemma provides a way to calculate J_K′(θ) by implementing just two simulations.

Lemma 2.3 Let X̄_k(θ) be the adjoint state resulting from the following backward simulation with final condition X̄_{K−1}(θ) = 0:

    X̄_{k−1}(θ) = A^T(θ) X̄_k(θ) + C^T(θ) ε_k(θ),   (2.29)

and let X_{i,k}(θ) be the state appearing in equation (2.26), with initial condition X_{i,0}(θ) = 0, obtained from the following forward simulation:

    X_{i,k+1}(θ) = A(θ) X_{i,k}(θ) + W_{i,k}(θ),   (2.30)

with W_{i,k}(θ) = (∂A(θ)/∂θ(i)) x̂_k(θ) + (∂B(θ)/∂θ(i)) u_k. Then:

    Σ_{k=0}^{K−1} X_{i,k}^T(θ) C^T(θ) ε_k(θ) = Σ_{k=0}^{K−1} W_{i,k}^T(θ) X̄_k(θ).   (2.31)

Proof: see Verhaegen and Verdult (2007). □

From Lemma 2.3, the following Corollary can be obtained:

Corollary 2.1 Let J_K(θ) be the objective function defined in (2.17). Then the Jacobian J_K′(θ) of (2.17) can be calculated as follows:

    J_K′ = (2/K) ( J_1′ + J_2′ + J_3′ + J_4′ ),   (2.32)

where:

    J_1′ = Σ_{k=0}^{K−1} [ (∂A(θ)/∂θ^T)( I_p ⊗ x̂_k(θ) ) ]^T X̄_k(θ),   (2.33)
    J_2′ = Σ_{k=0}^{K−1} [ (∂B(θ)/∂θ^T)( I_p ⊗ u_k ) ]^T X̄_k(θ),   (2.34)
    J_3′ = Σ_{k=0}^{K−1} [ (∂C(θ)/∂θ^T)( I_p ⊗ x̂_k(θ) ) ]^T ε_k(θ),   (2.35)
    J_4′ = Σ_{k=0}^{K−1} [ (∂D(θ)/∂θ^T)( I_p ⊗ u_k ) ]^T ε_k(θ).   (2.36)

Proof: Substitute equation (2.31) into (2.28). Then concatenate the equations (2.28) for i ∈ {1, ..., p} in a column and take the transpose. □

In this case, x̂_k(θ) and ε_k(θ) can be acquired from equation (2.25), and X̄_k(θ) from equation (2.29). Therefore, it is possible to compute J_K′(θ) by using only two simulations and the derivatives (∂A(θ)/∂θ^T)(I_p ⊗ x̂_k(θ)), (∂B(θ)/∂θ^T)(I_p ⊗ u_k), (∂C(θ)/∂θ^T)(I_p ⊗ x̂_k(θ)) and (∂D(θ)/∂θ^T)(I_p ⊗ u_k).

2.4.3 Gauss-Newton

In the Gauss-Newton approach, the direction of maximum decrease, given by −J_K′(θ), is corrected by using the curvature information contained in the Hessian J_K″(θ). This greatly improves the speed of convergence of the algorithm compared with first-order methods like Steepest-Descent. In addition, J_K″(θ) is required to be a positive-definite matrix (hence invertible) so that the descent-direction condition is satisfied in every iteration, Luenberger (2003), Verhaegen and Verdult (2007). Thus, the update law takes the following form:

    θ_{j+1} = θ_j − µ ( J_K″(θ) )^{-1} J_K′(θ),   (2.37)

where µ ∈ [0, 1] is the chosen step size. In practice it is very difficult to compute J_K″(θ) exactly, which would produce high computational costs. Instead, approximated versions of the Hessian can be used. In general, these approximations are valid in a neighborhood of the optimum, where the second derivative of the error and the error E_K(θ) itself are weakly correlated, Verhaegen and Verdult (2007). Then, the approximation of the Hessian can be computed using the following Lemma:

Lemma 2.4 Let J_K(θ) be the objective function defined in (2.17). Then the Hessian of (2.17) can be approximated as follows:

    J_K″(θ) ≈ (2/K) Ψ_K^T(θ) Ψ_K(θ).   (2.38)

Proof: The derivation can be consulted in Verhaegen and Verdult (2007). □

In the case that the initial guess is sufficiently close to the optimum, the Gauss-Newton method converges quadratically, much faster than the Steepest-Descent approach (which converges linearly). An analysis of the convergence of these algorithms can be found in Luenberger (2003).

From the approximation of the Hessian obtained in (2.38), it is clear that the product Ψ_K^T(θ) Ψ_K(θ) has to be computed. This means that the simulation of the adjoint state cannot be used as we did for the Jacobian. Hence, we need to simulate p systems of the form:

    ∂x̂_{k+1}(θ)/∂θ(i) = A(θ) ∂x̂_k(θ)/∂θ(i) + (∂A(θ)/∂θ(i)) x̂_k(θ) + (∂B(θ)/∂θ(i)) u_k,   (2.39)

    ∂ŷ_k(θ)/∂θ(i) = C(θ) ∂x̂_k(θ)/∂θ(i) + (∂C(θ)/∂θ(i)) x̂_k(θ) + (∂D(θ)/∂θ(i)) u_k,   (2.40)

    ∀i ∈ {1, ..., p}.

However, by concatenating the p equations in a row, it is possible to express (2.39)-(2.40) as one equation in terms of the derivatives (∂A(θ)/∂θ^T)(I_p ⊗ x̂_k(θ)), (∂B(θ)/∂θ^T)(I_p ⊗ u_k), (∂C(θ)/∂θ^T)(I_p ⊗ x̂_k(θ)) and (∂D(θ)/∂θ^T)(I_p ⊗ u_k), as follows:

    ∂x̂_{k+1}(θ)/∂θ^T = A(θ) ∂x̂_k(θ)/∂θ^T + (∂A(θ)/∂θ^T)( I_p ⊗ x̂_k(θ) ) + (∂B(θ)/∂θ^T)( I_p ⊗ u_k ),   (2.41)

    ∂ŷ_k(θ)/∂θ^T = C(θ) ∂x̂_k(θ)/∂θ^T + (∂C(θ)/∂θ^T)( I_p ⊗ x̂_k(θ) ) + (∂D(θ)/∂θ^T)( I_p ⊗ u_k ).   (2.42)

Notice that ∂x̂_k(θ)/∂θ^T ∈ R^{nN×p} is the state sensitivity matrix, which is iteratively updated.
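The forward sensitivity recursion (2.41)-(2.42) can be illustrated on a small dense system. The sketch below (our own; θ collects only the entries of A, and all names are hypothetical) propagates ∂x̂_k/∂θ^T to build Ψ_K, forms the approximate Hessian (2/K)Ψ_K^T Ψ_K of (2.38), and takes a damped Gauss-Newton step in the spirit of (2.37):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, l, K = 2, 1, 1, 60
p = n * n                                   # parameters: entries of A, column-major
A_true = np.array([[0.5, 0.1], [-0.2, 0.4]])
B = rng.standard_normal((n, m)); C = rng.standard_normal((l, n))
D = rng.standard_normal((l, m)); u = rng.standard_normal((K, m))

def outputs(theta):
    A = theta.reshape(n, n, order='F')      # theta = vec(A)
    x = np.zeros(n); Y = np.zeros((K, l))
    for k in range(K):
        Y[k] = C @ x + D @ u[k]
        x = A @ x + B @ u[k]
    return Y

y = outputs(A_true.flatten(order='F'))      # noise-free data from the true A

def cost(theta):
    eps = y - outputs(theta)
    return float((eps * eps).sum() / K)

def gauss_newton_direction(theta):
    A = theta.reshape(n, n, order='F')
    x = np.zeros(n); S = np.zeros((n, p))   # S_k = dxhat_k/dtheta^T, eq. (2.41)
    Psi = np.zeros((K * l, p)); E = np.zeros(K * l)
    for k in range(K):
        Psi[k*l:(k+1)*l] = -(C @ S)         # -dyhat_k/dtheta^T, eq. (2.42)
        E[k*l:(k+1)*l] = y[k] - (C @ x + D @ u[k])
        dAx = np.kron(x[None, :], np.eye(n))  # (x^T ⊗ I_n) = dA/dtheta^T (I_p ⊗ x)
        S = A @ S + dAx
        x = A @ x + B @ u[k]
    grad = (2.0 / K) * Psi.T @ E
    hess = (2.0 / K) * Psi.T @ Psi          # J'' approximation of (2.38)
    return np.linalg.solve(hess, grad)

theta = A_true.flatten(order='F') + 0.1 * rng.standard_normal(p)
d = gauss_newton_direction(theta)
mu = min((1.0, 0.5, 0.25, 0.1), key=lambda s: cost(theta - s * d))
assert cost(theta - mu * d) < cost(theta)
```

For this zero-residual toy problem the Gauss-Newton direction is close to the exact Newton direction near the optimum, which is where the quadratic convergence noted above comes from.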

2.4.4 SSS partial derivatives

From the analysis presented before, both the Steepest-Descent and the Gauss-Newton methods can be entirely expressed in terms of:

    (∂A(θ)/∂θ^T)( I_p ⊗ x̂_k(θ) ),   (∂B(θ)/∂θ^T)( I_p ⊗ u_k ),   (∂C(θ)/∂θ^T)( I_p ⊗ x̂_k(θ) ),   (∂D(θ)/∂θ^T)( I_p ⊗ u_k ).   (2.43)

Furthermore, it is possible to show that these derivatives are also SSS matrices. This is a powerful result, because it makes it possible to fully express both methods (Steepest-Descent and Gauss-Newton) in terms of basic operations between SSS matrices. Therefore, the number of operations needed in every iteration is linear with respect to the number of subsystems N. Later on, an analysis of complexity based on flop counting will be presented.

On the other hand, this theory can easily be extended to other optimization methods like Quasi-Newton (DFP, BFGS, Broyden, etc., Luenberger (2003)), together with the Wolfe conditions for the selection of the step size.


Figure 2.3: Block structure of the tilde generators C̃_{s,e}, C̃_{s,w}, Ã_s, C̃_s, B̃_s, D̃_s, D̃_{s,e} and D̃_{s,w} appearing in Theorem 2.2: block-sparse matrices whose nonzero blocks are Kronecker products of the form x̂_s^T ⊗ I, u_s^T ⊗ I, φ_s ⊗ I and β_s ⊗ I, placed in the block columns corresponding to the parameters of the generator being differentiated.


Theorem 2.2 Let A(θ), B(θ), C(θ) and D(θ) be SSS matrices of conformable sizes. Besides, let x̂_k(θ) be the state resulting from the simulation of the system (2.25) and u_k the global input vector. Then:

    (∂A(θ)/∂θ^T)( I_p ⊗ x̂_k(θ) ) = SSS(B_{s,e}, W_{s,e}, C̃_{s,e}, Ã_s, B_{s,w}, W_{s,w}, C̃_{s,w}),

    (∂B(θ)/∂θ^T)( I_p ⊗ u_k ) = SSS(B_{s,e}, W_{s,e}, D̃_{s,e}, B̃_s, B_{s,w}, W_{s,w}, D̃_{s,w}),

    (∂C(θ)/∂θ^T)( I_p ⊗ x̂_k(θ) ) = SSS(F_{s,e}, W_{s,e}, C̃_{s,e}, C̃_s, F_{s,w}, W_{s,w}, C̃_{s,w}),

    (∂D(θ)/∂θ^T)( I_p ⊗ u_k ) = SSS(F_{s,e}, W_{s,e}, D̃_{s,e}, D̃_s, F_{s,w}, W_{s,w}, D̃_{s,w}),   (2.44)

where the matrices under the ˜· sign are depicted in Fig. 2.3.

Proof: The result is a direct consequence of the SSS structure of the matrices A(θ), B(θ), C(θ) and D(θ). For the sake of simplicity, we define the following parameter vector:

    θ_A = [ vec(A_1)^T   vec(A_2)^T   ...   vec(A_N)^T ]^T.

The same can be done for the generators B_s, B_{s,e}, B_{s,w}, C_s, C_{s,e}, C_{s,w}, D_s, D_{s,e}, D_{s,w}, F_{s,e}, F_{s,w}, W_{s,e} and W_{s,w}.

Thus, the proof consists of calculating the derivatives with respect to the parameters of every generator family separately, i.e., (∂A(θ)/∂θ_A^T)(I_{Nn²} ⊗ x̂_k(θ)), (∂A(θ)/∂θ_B^T)(I_{Nnr} ⊗ x̂_k(θ)), (∂A(θ)/∂θ_{B_e}^T)(I_{Nnq} ⊗ x̂_k(θ)), and analogously for the parameter vectors of the remaining generators B_{s,w}, C_s, C_{s,e}, C_{s,w}, D_s, D_{s,e}, D_{s,w}, F_{s,e}, F_{s,w}, W_{s,e} and W_{s,w}, with the identity matrix sized according to the number of parameters in each family. Then, we combine and 'sandwich' the columns of every matrix according to the order given by the parameter vector θ. This procedure is known in the literature as 'shuffling' and corresponds to combining the columns of the matrices in the same way as one would shuffle a deck of cards, Rice (2010).

So, at first we devote our efforts to obtaining an expression for (∂A(θ)/∂θ_A^T)(I_{Nn²} ⊗ x̂_k(θ)). We realize that:

    (∂A(θ)/∂θ_A^T)( I_{Nn²} ⊗ x̂_k(θ) ) = ∂[A(θ) x̂_k(θ)]/∂θ_A^T − A(θ) ∂x̂_k(θ)/∂θ_A^T.   (2.45)

Now we concentrate on the derivative of [A(θ) x̂_k(θ)]. To do this, the matrix-vector product between A(θ) and x̂_k(θ) is calculated explicitly. Let A(θ) be:


    A(θ) = [ A_1                    B_{1,w}C_{2,w}    B_{1,w}W_{2,w}C_{3,w}   ···
             B_{2,e}C_{1,e}         A_2               B_{2,w}C_{3,w}          ···
             B_{3,e}W_{2,e}C_{1,e}  B_{3,e}C_{2,e}    A_3                     ···
             ⋮                      ⋮                 ⋮                       ⋱ ],   (2.46)

and x̂_k(θ) the state vector divided into N components, with sizes conforming to the dimensions of the blocks of A(θ):

    x̂_k(θ) = [ x_1^T(θ)   x_2^T(θ)   ...   x_N^T(θ) ]^T.   (2.47)

Then:

    A(θ) x̂_k(θ) = [ A_1 x_1(θ) + B_{1,w}C_{2,w} x_2(θ) + B_{1,w}W_{2,w}C_{3,w} x_3(θ) + ···
                    B_{2,e}C_{1,e} x_1(θ) + A_2 x_2(θ) + B_{2,w}C_{3,w} x_3(θ) + ···
                    B_{3,e}W_{2,e}C_{1,e} x_1(θ) + B_{3,e}C_{2,e} x_2(θ) + A_3 x_3(θ) + ···
                    ⋮ ].   (2.48)

Using the identity vec(XYZ) = (Z^T ⊗ X) vec(Y) yields:

    A_s x_s(θ) = vec( A_s x_s(θ) )   (2.49)
               = ( x_s^T(θ) ⊗ I_n ) vec(A_s),   (2.50)

where s ∈ {1, ..., N} and θ_{A_s}^T = vec(A_s)^T. Calculating the derivative boils down to:

    ∂[A_s x_s(θ)]/∂θ_{A_s}^T = ∂vec( A_s x_s(θ) )/∂θ_{A_s}^T   (2.51)
        = ∂/∂θ_{A_s}^T [ ( x_s^T(θ) ⊗ I_n ) vec(A_s) ]   (2.52)
        = ( ∂x_s^T(θ)/∂θ_{A_s}^T ⊗ I_n )( I_{n²} ⊗ vec(A_s) ) I_{n²} + ( x_s^T(θ) ⊗ I_n ).   (2.53)

However, the identity matrix I_{n²} can be written as I_{n²} = [ ê_1, ê_2, ..., ê_{n²} ], where ê_i is the canonical vector that contains a 1 in position i and 0 otherwise. Then:

    ∂[A_s x_s(θ)]/∂θ_{A_s}^T = ( ∂x_s^T(θ)/∂θ_{A_s}^T ⊗ I_n )( I_{n²} ⊗ vec(A_s) ) [ ê_1, ê_2, ..., ê_{n²} ] + ( x_s^T(θ) ⊗ I_n ).   (2.54)
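The vectorization identity used above is easy to check numerically. The following snippet (our own) verifies vec(XYZ) = (Z^T ⊗ X) vec(Y) and the special case (2.50) with NumPy, where vec(·) stacks columns (order='F'):

```python
import numpy as np

rng = np.random.default_rng(3)

# general identity: vec(X Y Z) = (Z^T ⊗ X) vec(Y)
X = rng.standard_normal((3, 4))
Y = rng.standard_normal((4, 2))
Z = rng.standard_normal((2, 5))
lhs = (X @ Y @ Z).flatten(order='F')       # vec of the product, column-major
rhs = np.kron(Z.T, X) @ Y.flatten(order='F')
assert np.allclose(lhs, rhs)

# special case of (2.50): A_s x_s = (x_s^T ⊗ I_n) vec(A_s)
A = rng.standard_normal((4, 4))
x = rng.standard_normal(4)
assert np.allclose(A @ x, np.kron(x[None, :], np.eye(4)) @ A.flatten(order='F'))
```

The column-major convention matters: with NumPy's default row-major flatten the identity would fail, which is why order='F' is used throughout.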
