Design aspects of a distributed MIMD processor

THESIS

submitted for the degree of doctor in the technical sciences at the Technische Hogeschool Delft (Delft University of Technology), by authority of the Rector Magnificus, Prof. ir. B. P. Th. Veltman, to be defended in public before the Board of Deans on Tuesday 13 March 1984 at 16:00

by

HENDRICUS JOHANNES SIPS

electrical engineer, born in Amsterdam

1984

This thesis has been approved by the supervisor (promotor), Prof. dr. ir. L. Dekker.

To Annette and my parents

Chapter 1. Introduction
  1.1 Parallel processing and simulation of systems
  1.2 Analog and hybrid computers
  1.3 Digital Differential Analyzers (DDA's)
  1.4 Digital computers

Chapter 2. Parallelism in differential systems
  2.1 Ordinary differential equations
  2.2 Types of parallelism
  2.3 Parallel integration methods
  2.4 Function partitioning
  2.5 Stability characteristics of pipelined methods

Chapter 3. Parallel processor structures
  3.1 Parallel processor structures
  3.2 Shared variables
  3.3 Data flow systems
  3.4 The impact of VLSI

Chapter 4. Design concepts and implementation of a distributed MIMD processor
  4.1 Design concepts of the parallel processor
  4.2 The Delft Parallel Processor (DPP)
  4.3 Generalization of the DPP structure; Clustered Parallel Processors

Chapter 5. Programming concepts for clustered parallel processors
  5.1 Task systems
  5.2 Task graph descriptors
  5.3 Level and allocation algorithms
  5.4 Task communication
  5.5 The scheduling of conditional statements
  5.6 Iterative processes
  5.7 High level language programming constructs
  5.8 An experimental compiler and scheduler
  5.9 The distribution of recurrence systems on

Chapter 6. Bit-sequential arithmetic for parallel processors
  6.1 The algorithms
    6.1.1 Introduction
    6.1.2 Semi on-line algorithms
    6.1.3 The inner product
    6.1.4 Precision
    6.1.5 Division
    6.1.6 Square root
  6.2 The implementation
    6.2.1 Exponent handling
    6.2.2 Mantissa calculation
    6.2.3 Mantissa generation and alignment
    6.2.4 Postnormalization
    6.2.5 Division and square root
    6.2.6 General lay-out
  6.3 Efficiency of the inner product processor
  6.4 Applicability of the inner product processor

Chapter 7. Some aspects of constructing CPP systems
  7.1 Introduction
  7.2 PE communication; the data transport system
  7.3 Tag fields
  7.4 PE structure
  7.5 PE-cluster communication
  7.6 Performance of CPP systems

Chapter 8. Conclusions and discussion

References
Summary
Samenvatting (summary in Dutch)
Acknowledgment
List of symbols

Appendix A. Round-off errors and word length
Appendix B. Programming example DPP
Appendix C. Example High Level Language parallel program

Chapter 1 Introduction

1.1 Parallel processing and simulation of systems

In many applications of large-scale scientific computation, there is a fast-growing demand for computing power which cannot be met by a single sequential computer. The majority of problems in this area are either data collection and analysis problems (such as in high energy physics and seismology), or simulation problems (e.g. magnetic fusion, oil reservoir modeling, molecular dynamics). Computer simulation of systems will have a major impact on theoretical and experimental research in the near future. This will occur because in many cases the possibilities to perform experiments on actual systems are restricted by such features as the danger, the cost, or the duration of an experiment. Computer simulation of a system is an attractive alternative.

Today's simulation practice includes the study of very large (in terms of equations) and often complicated systems, requiring a considerable amount of computing power. To a certain extent, current computer equipment cannot meet all the requirements of interactive simulation. According to Dekker et al. [DEKK79], a future simulator has to contain a powerful data processing system, able to perform the following tasks:

- performing a simulation run, i.e. computational modeling, implementation, and execution of both validation and simulation experiments,

- utilizing and updating the model data base,

- performing background activities, e.g. for decision making purposes,

- aiding the experimenter using implemented simulation methodology

The computational power needed for such a simulator cannot be provided by the traditional sequential digital computer. Of course there are still improvements in the speed of the sequential computer, but physical speed limits will prevent substantial further improvements. The most viable solution is to construct a parallel processor, in which a number of processors can cooperatively work on the same computational problem. The construction of such a parallel processor as a simulator requires extensive research into the areas of parallel processor architecture, parallel algorithms, simulation methodology, and model data base management.

The subject of this thesis is the constructability and programmability of a modular and fast parallel processor. Although the primary design objectives of the parallel processor have been based on the requirements for simulation of systems, the parallel processor architecture is equally suitable for related problems (e.g. signal analysis, general matrix calculus). Simulation of systems implies the solution of problems with an irregular structure and data flow, which often contain non-linearities. This imposes far-reaching conditions on the architecture of a parallel processor for simulation. On the other hand, because of this, simulation serves as an ideal testing ground for parallel processing.

In the remainder of this Chapter an overview and a historical sketch are given of computer systems as a tool for simulation studies.

Chapter 2 discusses the occurrence of parallelism in simulation, with a focus on the solution of ordinary differential equations (ODE's). Several forms of parallelism are identified, including operator and function parallelism. An overview of research in that area is given. Attention is paid to a new class of parallel algorithms, which use function pipelining.

Parallel processing structures are the topic of Chapter 3. First parallel processors are classified according to their method of programming. Then parallel processor structures are classified as to the way they treat shared variables. The characteristics of a new class of parallel computers, called data flow computers, are analyzed.

[...] outlined. The design concepts and the architecture of an initial implementation of such a processor structure, the Delft Parallel Processor (DPP), are discussed. The architecture of the DPP is generalized in the concept of Clustered Parallel Processors (CPP).

In Chapter 5 the implementation of instruction and data streams on the parallel processor is treated. Several methods of task scheduling are considered in relation to the implementation of programming constructs such as conditional statements, iteration, and recursion. An algorithm for the distribution of task systems over a number of processor clusters is presented, in which each task system describes one or more coupled recurrence systems.

In Chapter 6 a bit-sequential arithmetic unit with $O(n)$ complexity is described, where $n$ is the word length of the operands. The arithmetic unit described can serve as a future improvement for DPP-like systems, especially when VLSI implementation is considered. The operations performed by the element are the inner product, division, and square root operation. In the arithmetic unit the absorption of the operands and the operation itself are performed in an overlapped/pipelined fashion. A performance comparison between the bit-sequential arithmetic unit and conventional pipelined arithmetic units is given.

Finally in Chapter 7 some topics for future implementations of CPP systems are treated. These topics include the communication between PE-clusters and the internal organization of processing elements.

1.2 Analog and hybrid computers

In an analog computer the dependent variables are represented by physical quantities such as current, voltage, number of rotations per second, etc., and appear in continuous form. The history of analog computing devices started with the invention by William Oughtred (1620) of the slide rule, in which the scale length is the physical analog of numbers to a logarithmic base [HUSK76]. In the beginning of the 19th century the planimeter, a form of mechanical integrator, was developed. The first concept of a differential analyzer came from W. Thomson (later Lord Kelvin), who described a harmonic analyzer, using an improved mechanical integrator, to solve a system of differential equations in a closed loop [THOM76].

Although the concept of the differential analyzer was developed by Lord Kelvin, the first successful mechanical differential analyzer was constructed by Vannevar Bush [BUSH31]. In fact, analog computers were the first tools in history which made a considerable reduction of problem calculation times possible [GOLD72]. The mechanical differential analyzer was relatively slow, but a very high accuracy could be achieved [CRAN47]. Due to their mechanical components, these differential analyzers were not very flexible.

During World War II, a great impetus was given to analog simulators and the need for fast and accurate components led to a new class of electronic and electro-mechanical devices. The first electronic analog computers were special-purpose computers. Later on, general-purpose analog computers became available, which could be programmed by means of a patchboard, interconnecting the various hardware components of the system.

Since the time that it was recognized that analog and digital computers each had their specific advantages, both systems were combined in a hybrid computer, yielding a very powerful simulator [BEKE68]. Modern hybrid computers have been improved since the arrival of autopatch computers in which the interconnections are made by means of a programmable switch matrix. Worth mentioning is the work on time-shared hybrid autopatch systems at the computer centres of the Delft University of Technology [SIPS78,BROK79] and the Technical University of Vienna [KLEIN83].

The analog computer has some basic properties which make it particularly suitable for simulation. Most systems are parallel by nature; therefore a one-to-one analogy in space and time of the implemented computer model and the system under study is preferred. The analog computer meets this objective, because of the parallel operation of all the computing components. The transfer of variables in an analog computer is realized naturally in parallel, because there is only one time set for both data transfer and data processing. Another property of the analog computer is that the ratio between physical time and data processing time (time scale factor) is fixed in the sense that this ratio is not influenced by the data processing and can easily be changed. This feature makes the analog computer very suitable for real-time processing.

The fact that the analog computer has only one algorithm for the basic operation "Integration with respect to time" makes the implementation of differential systems much less cumbersome than on a digital computer. The most attractive property of the analog computer is its very high speed. This speed is determined only by the bandwidth of the computing components and not the complexity of the problem. This speed enables real interactive simulation.

Next to these advantages, there are some disadvantages associated with analog computers, including the limited accuracy of the computing components, the small set of operations, the lack of data storage capacity, and the need for the scaling of the variables due to the fixed point number representation.

1.3 Digital differential analyzers (DDA's)

About 1950 there was a need for fast and reliable differential equation solvers in the area of missile and aircraft control [SIZE68]. Analog components at that time were not suitable because of accuracy, stability, and calibration problems. On the other hand, the general-purpose digital computer in those days was bulky and consumed too much power. Out of these problems the concept of the digital differential analyzer (DDA) arose. The principle of the DDA can be described as follows. Given a system of first order homogeneous linear differential equations

$$\underline{y}'(t) = A\,\underline{y}(t), \qquad \underline{y}(t_0) = \underline{\eta} \tag{1.1}$$

where $\underline{y}$ and $\underline{\eta}$ are $n$-vectors and $A$ is an $n \times n$ real constant matrix. This system can be solved numerically by using the Euler integration method

$$\underline{y}_{n+1} = \underline{y}_n + \Delta t\,A\,\underline{y}_n, \qquad \underline{y}_0 = \underline{\eta} \tag{1.2}$$

In the DDA equation (1.2) is solved by the following set of register transfer equations:

$$\Delta t\,\underline{F} + \underline{R} \rightarrow \Delta\underline{Y} + \underline{R} \tag{1.3}$$

$$A\,\Delta\underline{Y} \rightarrow \Delta\underline{F} \tag{1.4}$$

$$\underline{F} + \Delta\underline{F} \rightarrow \underline{F} \tag{1.5}$$

where the $\underline{F}$ registers contain the function part of (1.2), i.e. $A\,\underline{y}_n$, the $\underline{R}$ registers contain the residue of the integral increment, and $\Delta\underline{Y}$ is the integral increment.

From the formulas (1.3) to (1.5) it clearly follows that instead of the integral value $Y$, only its most significant incremental part $\Delta Y$ is transmitted. The least significant part is retained in a residue register ($R$). To simplify the multiplication in equation (1.4) the first DDA's only used binary transfer, $\Delta Y \in \{1, -1\}$, or ternary transfer, $\Delta Y \in \{1, 0, -1\}$, which limits the speed of this type of DDA. The first operational DDA using this principle was MADDIDA (Magnetic Drum Digital Differential Analyzer) [STEE50]. To increase the solution speed of the DDA, multidigit incremental transfer [GHEE70] and floating point arithmetic [ELSH70] have been proposed.
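The register transfers (1.3) to (1.5) can be illustrated with a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any actual DDA: the function name `dda_solve`, the increment weight `quantum`, and the threshold rule used to split off the ternary increment are hypothetical choices that do not come from the text.

```python
def dda_solve(A, eta, dt, steps, quantum):
    """Integrate y' = A*y, y(0) = eta, using ternary increments.

    Assumptions (not from the text): one increment has weight `quantum`,
    and an increment is emitted once the accumulated value reaches it.
    """
    n = len(eta)
    y = list(eta)  # running integral value, kept only for inspection
    # F registers hold the function part A*y_n.
    F = [sum(A[i][j] * eta[j] for j in range(n)) for i in range(n)]
    R = [0.0] * n  # residue registers
    for _ in range(steps):
        dY = [0] * n
        for i in range(n):
            acc = dt * F[i] + R[i]        # left-hand side of (1.3)
            if acc >= quantum:
                dY[i] = 1
            elif acc <= -quantum:
                dY[i] = -1
            R[i] = acc - dY[i] * quantum  # least significant part stays in R
        for i in range(n):
            # (1.4): dY[j] is 1, 0 or -1, so each term is +/- A[i][j]*quantum or zero.
            dF = sum(A[i][j] * dY[j] * quantum for j in range(n))
            F[i] += dF                    # (1.5)
            y[i] += dY[i] * quantum
    return y


# Example: y' = -y, y(0) = 1, integrated over one time unit.
y_end = dda_solve([[-1.0]], [1.0], dt=0.001, steps=1000, quantum=0.001)
print(y_end[0])  # close to exp(-1), up to quantisation and Euler error
```

In this sketch a variable can change by at most one `quantum` per iteration, which is one way to see why binary or ternary transfer limits the solution speed.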

Examples of DDA architectures are the systems described by Brafman and Renter [BRAF77], Hannington and Whitehead [HANN76], and Ameling et al. [AMEL76].

Despite many attempts, the DDA has never been accepted as a general-purpose simulation tool. Important reasons are that the incremental generation of non-linear functions is inaccurate and that the DDA architecture is directed too much toward hardware savings.

1.4 Digital computers

In 40 years the general purpose digital computer evolved from the ENIAC containing 18,000 vacuum tubes [HUSK76] to the 32-bit microcomputer containing 450,000 transistors on a 6.35 mm silicon chip [BEYE83]. As soon as the digital computer became fast enough and the appropriate language tools were developed, the digital computer became a stiff competitor in simulation of systems. Even today's minicomputers are used for small- to medium-scale simulation studies.

The main disadvantage of the conventional digital computer is its sequential nature. This implies, among other things, that the complexity of the problem determines the solution speed to a great extent. Because the increase in hardware speed will be limited, some problems cannot be solved within a reasonable time bound. To get rid of the sequential bottleneck of the conventional digital computer, parallelism has been introduced or proposed on all levels of the computer architecture.

Over the last ten years there has been extensive research into the area of parallel processing. There are a number of ways to speed up computation. One is the use of multiple pipelined arithmetic units, as applied in the Cray-1 [CRAY77] and the Cyber 205 [CDCC80], in which the arithmetic unit performing the operation is divided into a number of segments with equal processing times. An example is floating point addition, where the segments can consist of exponent comparison, exponent subtraction, fraction alignment, fraction addition, and postnormalization. Pipelining is not advantageous for the operation on the first operands, since the total processing time is equal to the sum of the processing times of the segments, but each subsequent operation on the succeeding operands is carried out in the time required to process a single segment. This approach requires the problem to be formulated in a pipelined way, i.e. no dependency should exist between the operands being operated upon simultaneously in the pipeline. Fortunately many problems can be formulated in this way. The required number of segments in a pipeline depends on the segmentation algorithm; as a consequence, for a given number of segments, the effectiveness of a pipeline depends on the vector length of the operands.
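The timing argument above can be made concrete with a small sketch. It assumes, as the text does, segments with equal processing times, and it additionally assumes that a new pair of independent operands can enter the pipeline every segment time. The function names and the choice of five segments (one per stage of the floating point addition example) are illustrative only.

```python
def sequential_time(n_ops, n_segments, t_segment=1.0):
    """Time without overlap: every operation passes through all segments."""
    return n_ops * n_segments * t_segment


def pipelined_time(n_ops, n_segments, t_segment=1.0):
    """First result after n_segments steps, then one result per step."""
    if n_ops == 0:
        return 0.0
    return (n_segments + n_ops - 1) * t_segment


def pipeline_speedup(n_ops, n_segments):
    """Ratio of non-overlapped to pipelined time, for n_ops >= 1 independent operations."""
    return sequential_time(n_ops, n_segments) / pipelined_time(n_ops, n_segments)


for vector_length in (1, 10, 1000):
    print(vector_length, round(pipeline_speedup(vector_length, 5), 2))
```

Under these assumptions a single operand pair gains nothing, and the speed up approaches the number of segments only for long vectors, which is the dependence on vector length noted in the text.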

In simulation of systems pipelining is difficult to apply [KARP81]. A system can often be described by a set of differential or difference equations, and the solution is usually found by solving nonlinear recurrence systems. The data dependency is high and often conditional looping occurs, i.e. data dependent decisions determine the data flow. Therefore, pipelining the operands gives little speed up. Only for linear recurrence systems can pipelining be successfully used by introducing some redundancy [GAJS81]. The current availability of highly pipelined computers makes techniques like piece-wise linearization important again.

An alternative to a pipelined architecture is a parallel architecture where a number of processing elements operate simultaneously within a given interconnection structure. According to Flynn [FLYN72], the parallel architectures can be classified as Single Instruction stream/Multiple Data stream (SIMD), or Multiple Instruction stream/Multiple Data stream (MIMD) processors. In both types, a large number of processing elements must be incorporated in order to achieve a speed comparable to that of a pipelined architecture.

A dedicated multiprocessor system especially designed for simulation of systems is the Applied Dynamics AD-10 [APPL81]. This system consists of a number of functionally different processors. Each processor has been optimized to perform a certain task. There is a Numerical Integration Processor (NIP) designed to perform fast numerical integration. The Memory Address Processor (MAP) and the Decision Processor (DEP) allow fast access and comparison of variable values, including function generation. The Arithmetic Processor (ARP) has been designed for the fast evaluation of expressions of the type $\pm(a \pm b)c \pm d$. The Control Processor (COP) coordinates all the other processors in the AD-10.

Chapter 2 Parallelism in Differential Systems

2.1 Ordinary differential equations

Many real world systems can be described by systems of differential equations. Here we focus on ordinary differential equations of the initial value kind. For convenience we restrict ourselves to the homogeneous case. The equations are of the form

$$\underline{y}'(t) = \underline{f}(\underline{y}(t), t), \qquad \underline{y}(t_0) = \underline{\eta} \tag{2.1}$$

where $\underline{y}$, $\underline{\eta}$ and $\underline{f}$ are $m$-vectors. The function $\underline{y}'(t)$ is the first derivative of $\underline{y}(t)$ with respect to the independent variable $t$. The elements of $\underline{y}$ are the state variables and the equations (2.1) the state equations. The elements of $\underline{f}$ are the functions (or derivatives) of the equations, and the elements of $\underline{\eta}$ are the initial values. The equations can be solved in parallel by using an analog or hybrid computer system [BEKE68]. For computation on a discrete-time computing device a numerical integration method must be applied. For the solution in a discrete-time set (2.1) is generally transformed to

$$\underline{y}_{n+1} = \underline{G}(\underline{y}_{n+1}, \ldots, \underline{y}_{n-p}, t_{n+1}, \ldots, t_{n-q}, h) \tag{2.2}$$

where $\underline{G}$ is defined by the numerical method and $t_{n+1} - t_n = h$ is the step size.
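As an illustration of the general form (2.2), a few standard integration formulas are written out below. These particular methods are textbook examples chosen here for illustration; the text itself does not single them out. The abbreviation $\underline{f}_n = \underline{f}(\underline{y}_n, t_n)$ is introduced only for this example.

```latex
% explicit Euler: p = q = 0, G does not depend on y_{n+1}
\underline{y}_{n+1} = \underline{y}_n + h\,\underline{f}_n

% two-step Adams-Bashforth: p = q = 1
\underline{y}_{n+1} = \underline{y}_n
    + \tfrac{h}{2}\left(3\,\underline{f}_n - \underline{f}_{n-1}\right)

% implicit (backward) Euler: G depends on y_{n+1}
\underline{y}_{n+1} = \underline{y}_n + h\,\underline{f}(\underline{y}_{n+1}, t_{n+1})
```

In the two explicit formulas every component of $\underline{y}_{n+1}$ depends only on values that are already known, so the $m$ components can be updated concurrently. The implicit formula first requires solving for $\underline{y}_{n+1}$.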

[...] computer is in principle independent of the number and complexity of the equations in (2.1), provided of course that enough computing components are available.

The state equations in (2.2) are inherently recurrence equations. We are interested in the speed up which can be achieved by using a parallel processor instead of a uni-processor when solving the recurrence equations (2.2) by exploiting the parallelism in the equations. The speed up is defined as the ratio $T_u : T_p$, where $T_p$ is the time in which the parallel processor solves the problem, and $T_u$ is the time in which the uni-processor solves the problem.

The parallel evaluation of linear recurrence systems (e.g. $x_{i+1} = a_i + b_i x_i$, $i = 0, 1, 2, \ldots$) has been studied quite extensively (see [KUCK77,GAJS81]). By means of elimination techniques a considerable speed up can be achieved, sometimes at the cost of an excessive number of processors. Hyafil and Kung [HYAF77] showed that the speed up of a linear recurrence system of the first order is at most $2p/3 + 1/3$, where $p$ is the number of processors. This means that even for simple linear recurrence systems we get at most roughly 70% of the maximal speed up $p$ for larger $p$. A related result of Kung [KUNG76] states that, in general, nonlinear recurrence systems can be sped up at most by a constant factor. The equations in (2.2) are in general nonlinear, so only a limited speed up can be achieved. Therefore, it is very important to exploit the inherent parallelism as much as possible. Another conclusion might be that by using piecewise linearization techniques (e.g. by means of parallel table lookup methods [DEKK79] or transformation techniques) a considerable speed up can be achieved.
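A short worked evaluation of the Hyafil and Kung bound shows where the figure of roughly 70% comes from. The symbol $S_{\max}(p)$ is introduced here only for convenience.

```latex
S_{\max}(p) = \frac{2p}{3} + \frac{1}{3},
\qquad
\frac{S_{\max}(p)}{p} = \frac{2}{3} + \frac{1}{3p}

% p = 4:    S_max = 3,   i.e. 75% of p
% p = 10:   S_max = 7,   i.e. 70% of p
% p = 100:  S_max = 67,  i.e. 67% of p
```

The attainable fraction of the ideal speed up therefore decreases toward 2/3 as the number of processors grows.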

2.2 Types of parallelism

In evaluating the system described by the equations in (2.2), two sorts of parallelism can be distinguished:

- Parallel integration. All equations can be integrated concurrently by using the integration method $\underline{G}$. In most cases the integration method will give the same workload for every equation, so that the integration part of the equations can be solved in parallel, by using an equal number of processors.

- [...] the function evaluations over an equal number of processors. However, the most time-consuming function evaluation then determines the solution time, almost linearly with the number of operations in the function.

Because of the limited speed up achievable by evaluating a complete function on a single processor, the functions must be broken down into smaller executable entities. We can go as far as the basic arithmetic operators (e.g. +, *). Any arithmetic expression $E(n)$ containing $n$ distinct variables or constants can be evaluated in $O(\log_2 n)$ time steps using $O(n)$ processors [KUCK77], assuming that two operand operators are used (Fig. 2.1). The lower bound on the processing time is given by $T_p \ge \lceil \log_2 n \rceil$.

Fig. 2.1 Expression evaluation, (a) sequential execution, (b) parallel execution.

The actual processing time is influenced by the depth of the parentheses nesting in an expression. For example, the expression $E = 2(a + 3(b + 4(c + d)))$ can be evaluated in 6 time steps by using the basic operators of addition and multiplication, while without parentheses, i.e. $E = 24c + 24d + 6b + 2a$, only 3 time steps are needed. The tree height can be reduced by using associative and commutative laws. Bear and Bovet [BEAR68] developed an optimal algorithm for doing so. Kuck and Muraoka [KUCK74] derived an upper bound on the execution time: $T_p \le \lceil \log_2 n \rceil + 2d + 1$ by using at most $\lceil n/2 - d \rceil$ processors, where d is the depth of the parentheses nesting, and only associative and commutative laws are used. When the distributive law is also included, it was shown by Brent [BREN74] that the execution time is bounded by $T_p \le 4 \log_2 n + 10(n-1)/p$, using $p \ge 1$ processors, which can independently perform the arithmetic operations of multiplication, addition, and division in unit time. This bound is a sharper upper bound than the Kuck and Muraoka bound only for strongly nested expressions such as the Horner type of polynomials.
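The effect of parentheses nesting on the number of time steps can be checked with a short sketch. It assumes an unlimited number of processors and one time step per two-operand operator; the tuple representation of the expression tree and the names `steps` and `value` are illustrative choices, not part of the cited work.

```python
def steps(expr):
    """Parallel time steps for an expression tree with unlimited processors.

    A leaf (variable or constant) is a string; an inner node is a tuple
    (operator, left, right). Each operator costs one time step.
    """
    if isinstance(expr, str):
        return 0
    _, left, right = expr
    return 1 + max(steps(left), steps(right))

def value(expr, env):
    """Evaluate the tree; names are looked up in env, other leaves are integers."""
    if isinstance(expr, str):
        return env[expr] if expr in env else int(expr)
    op, left, right = expr
    a, b = value(left, env), value(right, env)
    return a + b if op == "+" else a * b

# E = 2(a + 3(b + 4(c + d))), evaluated as written
nested = ("*", "2", ("+", "a", ("*", "3", ("+", "b", ("*", "4", ("+", "c", "d"))))))

# E = 24c + 24d + 6b + 2a, arranged as a balanced tree
expanded = ("+",
            ("+", ("*", "24", "c"), ("*", "24", "d")),
            ("+", ("*", "6", "b"), ("*", "2", "a")))

print(steps(nested), steps(expanded))  # 6 3
```

Both trees compute the same value; only the tree height, and therefore the parallel evaluation time, differs.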


It is argued by Kuck that operator parallelism results in a reasonable increase in performance for large values of n, but gives little advantage for smaller values of n [KUCK77]. He claims that expressions in programs rarely have more than 5 or 6 arguments. However, in simulation a function evaluation can be quite complicated and time-consuming. Since the longest function evaluation in fact determines the total solution speed, it is very important to shorten the function evaluation time as much as possible. Processor utilization will not always be the most important cost-effectiveness measure in the future, but rather factors such as programmability, flexibility, reduction of overhead, communication costs, etc. Operator parallelism is exactly the concept underlying data flow computers (see Chapter 3).

2.3 Parallel integration methods

Speed up of problem calculation times when solving differential systems can be obtained by distributing the equations over a number of processors. Korn [KORN72] suggested the idea of coupling a number of standard minicomputers to obtain greater speed at lower cost. The difficulty of this approach is that the total problem must be carefully divided into a number of parallel executable tasks with approximately equal execution times, otherwise the performance is degraded. For specific problems a fairly even distribution can be achieved. An example can be found in [KERC83]. However, in most cases this division process will be a tedious and time-consuming procedure.

In this section we will discuss the speed up problem in a more general way. Since the solution time of a system of differential equations is bounded by the most time-consuming function evaluation, several methods have been developed to calculate more than one solution point per integration step. In the literature these methods are called parallel integration methods.

One of the first methods was proposed by Nievergelt [NIEV64]. The idea is to divide the solution interval $[a,b]$ into N equal subintervals $[t_{i-1}, t_i]$, $i = 1, \ldots, N$, $t_0 = a$, $t_N = b$. Then by some method, a prediction $y_i^0$ of the solution $y(t_i)$ is made. Further, a number $M_i$ of values $y_{ij}$, $j = 1, \ldots, M_i$, are chosen, where the $y_{ij}$ values lie in the vicinity of $y_i^0$ (for each i). Next all the $\sum_i M_i$ initial value problems are solved simultaneously (Fig. 2.2). The final values are found by an interpolation of the end values of the unique solution branch $[t_0, t_1]$ with the set of $M_1$ branches of $[t_1, t_2]$, followed by an interpolation of the end value of $t_2$ with the set of $M_2$ branches $[t_2, t_3]$, etc., until $t_N = b$ has been reached.

Fig. 2.2 Nievergelt's method.

The error introduced by the interpolation method is dependent on the behavior of the equations. Consequently, this method will work well for linear systems, but is difficult to apply to nonlinear systems. Moreover, Nievergelt did not propose the method as a practical method for introducing parallelism in systems of differential equations, but rather to start a discussion about parallel algorithms for this type of problem.

The use of block-implicit methods has been described by Rosser [ROSS67], Shampine and Watts [SHAM69], and Worland [WORL76]. In a block-implicit method r new values are computed at the same time. Block methods can be applied as one-step methods or predictor-corrector methods. In the one-step mode, the last point in a block is used to compute the first approximations to the r values of the next block. Then implicit methods are used in calculating more precise approximations to the solution points by iteration to completion. Rosser [ROSS67] developed a number of block-implicit formulas, some of which must be applied sequentially to minimize the number of function evaluations and to speed up the convergence rate of these methods. Shampine and Watts [SHAM69] showed the convergence of these methods and studied the stability characteristics of some particular methods. An example of a 2 point block scheme is given by the following formulas [WORL76]:

$$y_{n+r,0} = y_n + r\,h\,f_n, \qquad r = 1, 2 \tag{2.3}$$

$$y_{n+1,s+1} = y_n + (h/12)\left[5 f_n + 8 f_{n+1,s} - f_{n+2,s}\right] \tag{2.4}$$

$$y_{n+2,s+1} = y_n + (h/3)\left[f_n + 4 f_{n+1,s} + f_{n+2,s}\right] \tag{2.5}$$

where $s = 0, 1, 2, \ldots$ is the iteration level and r is the number of solution points calculated in parallel (here r = 2). Method (2.3) is an explicit method used to establish a first approximation of the two successive solution points. Then two implicit methods, (2.4) and (2.5), are used to calculate more precise approximations. It must be noted that maximum stability of the method can be obtained by iteration to completion. However, this cancels a lot of the advantages gained by using a parallel method. On a parallel computer $y_{n+1,s+1}$ and $y_{n+2,s+1}$ are obtained simultaneously for each s by using 2 processors.
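A minimal sequential sketch of the 2 point block scheme may help to see which computations are independent. It assumes the standard form of the scheme, an explicit start $y_{n+r,0} = y_n + r h f_n$ followed by the correctors $y_n + (h/12)(5f_n + 8f_{n+1,s} - f_{n+2,s})$ and $y_n + (h/3)(f_n + 4f_{n+1,s} + f_{n+2,s})$, with a fixed number of iterations instead of iteration to completion. The names `block_step` and `solve` and the test equation $y' = -y$ are illustrative.

```python
import math

def block_step(f, t, y, h, iterations=2):
    """Advance two solution points at once with the 2 point block scheme."""
    fn = f(t, y)
    # Explicit first approximations for r = 1, 2
    y1, y2 = y + h * fn, y + 2 * h * fn
    for _ in range(iterations):
        # The two function evaluations are independent: one per processor.
        f1, f2 = f(t + h, y1), f(t + 2 * h, y2)
        # Both corrector updates use only values of the previous iteration level.
        y1, y2 = (y + h / 12 * (5 * fn + 8 * f1 - f2),
                  y + h / 3 * (fn + 4 * f1 + f2))
    return y1, y2

def solve(f, t0, y0, h, blocks, iterations=2):
    """Integrate over `blocks` blocks of two steps each and return the end value."""
    t, y = t0, y0
    for _ in range(blocks):
        _, y = block_step(f, t, y, h, iterations)
        t += 2 * h
    return y

approx = solve(lambda t, y: -y, 0.0, 1.0, 0.05, 10)   # integrate y' = -y to t = 1
print(approx, math.exp(-1.0))
```

Inside the iteration loop nothing computed for the first point is needed for the second point within the same iteration level, which is what allows the use of 2 processors.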

Both methods can also be adapted to predictor-corrector types of formulas. In this case all the values of the previous block are used to predict the r values of the next block. Shampine and Watts [SHAM69] showed that a proper predictor-corrector choice leads to the same asymptotic stability behavior as iteration to completion. Worland [WORL76] derived formulas for the leading local truncation error terms for the parallel method mentioned above and a sequential block method, which proved to be of the same order of magnitude.

Another approach has been followed by Miranker and Liniger [MIRA67]. The idea is to compute the predictor and the corrector simultaneously, but for different solution points, and to evaluate s points at the same time. This enables the use of 2s processors. As an example consider the following predictor-corrector formulas:

$$y^p_{n+1} = \ldots \tag{2.6}$$

$$y^c_{n+1} = y^c_n + (h/2)\left[f^p_{n+1} + f^c_n\right] \tag{2.7}$$

The computation has been diagrammed in Fig. 2.3a. It is clear that the predictor and corrector can only be computed in a sequential order. The sequence of computation is $y^p_{n+1} \to f^p_{n+1} \to y^c_{n+1} \to f^c_{n+1}$, etc. The parallel formulas are given by:

$$y^p_{n+1} = y^c_{n-1} + 2h\,f^p_n \tag{2.8}$$

$$y^c_n = y^c_{n-1} + (h/2)\left[f^p_n + f^c_{n-1}\right] \tag{2.9}$$

The computational sequence is depicted in Fig. 2.3b. Here the sequence of computation is divided into two parts: $y^p_{n+1} \to f^p_{n+1}$ and $y^c_n \to f^c_n$, which are executed on separate processors. For this method Katz, Franklin, and Sen developed an optimally stable predictor-corrector pair for the case s = 1. A problem with this method is the limited stability characteristics. Of course these characteristics could be improved by applying the corrector more than once, but then a lot of the advantages of applying parallelism are lost.

Fig. 2.3 (a) Sequential and (b) Parallel predictor-corrector schemes.
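The parallel predictor-corrector pair can be sketched sequentially as follows. The sketch assumes the formulas $y^p_{n+1} = y^c_{n-1} + 2h f^p_n$ and $y^c_n = y^c_{n-1} + (h/2)(f^p_n + f^c_{n-1})$; the single Euler step used to obtain $y^p_1$ is our own start-up assumption, and the function name is illustrative.

```python
import math

def parallel_pc(f, t0, y0, h, steps):
    """Parallel predictor-corrector for y' = f(t, y); returns y^c at t0 + steps*h."""
    yc_prev = y0                          # y^c_{n-1}, starting with n = 1
    yp = y0 + h * f(t0, y0)               # y^p_1 from one Euler step (start-up assumption)
    for n in range(1, steps + 1):
        fp = f(t0 + n * h, yp)                    # f^p_n
        fc_prev = f(t0 + (n - 1) * h, yc_prev)    # f^c_{n-1}
        # The two updates below do not depend on each other,
        # so each could run on its own processor.
        yp_next = yc_prev + 2 * h * fp            # predictor for point n+1
        yc = yc_prev + h / 2 * (fp + fc_prev)     # corrector for point n
        yp, yc_prev = yp_next, yc
    return yc_prev

result = parallel_pc(lambda t, y: -y, 0.0, 1.0, 0.01, 100)
print(result, math.exp(-1.0))
```

The predictor runs one point ahead of the corrector, so in each pass of the loop both formulas only use values that were already available at the end of the previous pass.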

Franklin [FRAN78] investigated the effectiveness of the above-mentioned methods when solved by a number of general purpose processors interconnected by means of a single or multiple bus structure. His conclusion is that the preference for one of the parallel methods is dependent on the problem class. An important parameter is the coupling factor of the equations. Moreover, the cleverness of the allocation procedure is of importance. A benchmark showed that for the two processor case the parallel methods are on the average 48 percent faster than the corresponding serial PC methods. If more than two processors are used the speed up gained by adding more processors decreases rapidly.

Considering all these parallel integration methods we can make the following remarks:

- Since more than one solution point per integration step is calculated the amount of work in the function evaluation increases drastically. Further, it is still true that the most time-consuming function evaluation bounds the solution time.

- In some formulas there is a threat of cancellation. Cancellation occurs when two terms with approximately the same value are subtracted (e.g. ?9y'_n - 72y'_{n-1}). Special care with respect to solution accuracy must be taken in evaluating these expressions, since they frequently occur in parallel integration methods.

- The limited stability characteristics of the parallel methods are of some concern when stiff systems must be solved. By using block methods the stability improves with the number of iterations performed, but then again some of the advantages of the parallel evaluation of multiple solution points are lost.

We draw here the conclusion that the parallel integration methods described above may be valuable for some classes of problems. However, in general, they do not attack the problem of the function evaluation in a proper way.

2.4 Function partitioning

In this section we will introduce a new class of parallel methods for initial value problems, which are based on function partitioning. The aim of the method is to speed up the calculation of functions by applying pipelining techniques.

Given the first order homogeneous initial value problem

$$y'(t) = f(y(t), t), \qquad y(t_0) = \eta \tag{2.10}$$

This differential equation can be solved numerically by using an appropriate integration method. Normally, the calculation of both the function f and the integration formula are implemented on a digital computer as a delay operation, that is, after supplying the input arguments it takes a certain time to produce a new approximation to y. This is illustrated in Fig. 2.4a.

Suppose we partition the whole procedure into two function blocks (see Fig. 2.4b). The first function block (block I) is implemented as a delay operation with execution time $\tau$. The output of block I is the new approximation of the state variable y; the input is the previously calculated approximation of the state variable y and a number of function values (possibly intermediate). Note that the calculation of the function f in (2.10) can be distributed over block I and block II.

Fig. 2.4 (a) Sequential computation of integration and function evaluation, (b) Parallel computation of integration and function evaluation.

We consider only integration methods which use the approximation of y calculated in the previous integration step. Therefore, the minimum expression in block I is of the form: $y_{n+1} = a\,y_n + G$, where a depends on the numerical integration method used and G is a combination of the remainder of the integration formula and the function evaluation f.

The second function block (block II) has as basic characteristics that:

- each time step $\tau$ all outputs of the function block produce a function value.

- each time step $\tau$ the input arguments of the function block are absorbed.

In general, during a certain time interval the integration step size $h = B\tau$, $t_{n+1} - t_n = h$, is fixed, where B is the time scale constant.

If the two function blocks are executed by separate pieces of hardware the operation is performed in a pipelined way, where $\tau$ is the execution time of a pipeline segment. If the original calculation took a time of $(p+1)\tau$ (Fig. 2.4a), the calculation according to Fig. 2.4b results in a speed up factor p+1.

sequential and hybrid pipelines

We will consider two forms of the function block II:

1. All function arguments have an equal delay from the inputs to the outputs of the function block. Consequently, all arguments belong to the same index of the independent variable. A function implementation having these characteristics is defined as a sequential pipeline (Fig. 2.5a). Another method is to duplicate a delay type of function implementation p times and multiplex the results (Fig. 2.5b). Since the effect of this method is the same as that of the sequential pipeline, we will use only the term 'sequential pipeline' in the remainder of the text.

2. The function arguments have a nonequal delay from the inputs to the outputs of the function block. A function implementation having these characteristics has been called by Dekker [DEKK76b, DEKK81] a hybrid pipeline (Fig. 2.5c).

Fig. 2.5 (a) sequential, (b) multiplexed, and (c) hybrid pipelining.

We assume that the operations performed in a section of the pipeline are the normal arithmetic operations on complete operands. We do not consider operations on parts of operand values, such as discussed by Andriessen [ANDR81].

The implementation of the function f in (2.10) and of the integration formula as depicted in Fig. 2.4b implies that the calculated values of the function f have at least a delay of one integration step. The delays introduced by the pipeline will cause additional errors in the solution of (2.10). We will now consider the effects of these errors on the solution accuracy and the stability characteristics of the applied numerical method. Andriessen and de Swaan Arons [ANDR83] concluded that as soon as there is one step delay in the calculation of f an error of $O(h)$ is made, which in general will not disappear. Therefore, a higher order integration method than the Euler method is useless, because the errors introduced by the delay are of the same order as the truncation order of the Euler integration method. As will be shown this is, in general, true for a hybrid pipeline function evaluation, but not necessarily for a sequential pipeline function evaluation.
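The size of the delay error can be made visible with a small numerical experiment. The sketch assumes the Euler method with a function value that is p steps old, $y_{n+1} = y_n + h f_{n-p}$, applied to the test equation $y' = \lambda y$. Filling the pipeline with exact solution values is our own start-up assumption, and the names are illustrative.

```python
import math

def pipelined_euler(lam, h, steps, p):
    """Integrate y' = lam*y, y(0) = 1 with y[n+1] = y[n] + h*f[n-p]."""
    # Exact values at t = -p*h, ..., 0 fill the pipeline (start-up assumption).
    y = [math.exp(lam * k * h) for k in range(-p, 1)]
    for _ in range(steps):
        y.append(y[-1] + h * lam * y[-1 - p])   # f[n-p] = lam * y[n-p]
    return y[-1]

def error(h, p):
    """Absolute error at t = 1 for y' = -y."""
    steps = round(1.0 / h)
    return abs(pipelined_euler(-1.0, h, steps, p) - math.exp(-1.0))

for p in (0, 1, 3):
    print(p, error(1e-3, p), error(5e-4, p))
```

In this experiment halving h roughly halves the error for every p, so the method stays first order, while the error itself grows with the delay p.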


As integration method we consider only the class of explicit linear multistep methods. For other types of formulas a similar discussion can be held. Implicit methods can also be implemented in the way described above. Implicit methods are used mostly to improve the stability characteristics of the method. Since pipelined methods have limited stability characteristics (see next section) implicit methods will in this case be less efficient.

sequential pipelines

Equation (2.10) can be solved numerically by the following linear explicit multistep method, which has been modified to include the pipeline length p:

$$\sum_{j=0}^{k} \alpha_j\, y_{n+j} = h \sum_{j=0}^{k} \beta_j\, f_{n+j-p}, \qquad n \ge 0, \quad \beta_k = 0, \quad \alpha_k \ne 0 \tag{2.11}$$

The parameters $\alpha_j$ and $\beta_j$ are the coefficients of the linear multistep method, h is the step size, and p is the pipeline length of the function block II.

To determine the coefficients of the method (2.11), $y(t+jh)$ and $f(t+(j-p)h)$ are expanded in a Taylor series about t and subtracted, yielding the local truncation error

$$L[y(t);h] = \sum_{j=0}^{k} \left[ \alpha_j\, y(t+jh) - h\,\beta_j\, f(t+(j-p)h) \right] = \sum_{q=0}^{\infty} c_q\, h^q\, y^{(q)}(t) \tag{2.12}$$

where y(t) is an arbitrary function, continuously differentiable on the considered solution interval. The terms $c_q$ are the coefficients of the Taylor series, and $y^{(q)}(t)$ is the q-th derivative of y(t). The method is of the order s if $c_0 = c_1 = \ldots = c_s = 0$ and $c_{s+1} \ne 0$, and is called consistent (see [LAMB73]) if $s \ge 1$.

The method is convergent, that is $L[y(t);h] \to 0$ when $h \to 0$, if it is consistent and the roots of $\rho(r) = 0$ satisfy $|r_i| \le 1$, $i = 0, 1, 2, \ldots, k$, and all roots with modulus one are simple, where $\rho(r)$ is the first characteristic polynomial defined by $\rho(r) = \sum_{j=0}^{k} \alpha_j r^j$.

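The coefficients of the Taylor series can be expressed in terms of the coefficients of the method:

$$c_0 = \sum_{j=0}^{k} \alpha_j, \qquad c_q = \sum_{j=0}^{k} \left[ \frac{j^q}{q!}\,\alpha_j - \frac{(j-p)^{q-1}}{(q-1)!}\,\beta_j \right], \quad q \ge 1 \tag{2.13}$$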

Two methods will now be considered.

I. $s = 1$: $\alpha_1 = 1$, $\alpha_0 = -1$, $\beta_0 = 1$

$c_2 = \tfrac{1}{2} + p$ according to (2.13)

Formula: $y_{n+1} = y_n + h\,f_{n-p}$

This is the pipelined Euler method. The method is convergent. The leading error coefficient $c_2$ increases linearly with p.

II. $s = 2$: $\alpha_2 = 1$, $\alpha_1 = -1 - a$, $\alpha_0 = a$, $\beta_1 = p(1-a) + \tfrac{1}{2}(3-a)$, $\beta_0 = p(a-1) - \tfrac{1}{2} - \tfrac{1}{2}a$

$c_3 = \tfrac{1}{2}p^2(1-a) + p + 5/12 + (1/12)a$

The leading error coefficient depends quadratically on p. By taking a = 1, this quadratic term could be removed. However, for convergence $\rho(r) = (r-1)(r-a) = 0$ must hold, leading to $-1 \le a < 1$. The choice of a also influences the stability (see Section 2.5). For a = 1/2, we get the formula

$$y_{n+2} = \tfrac{1}{2}(3y_{n+1} - y_n) + (h/4)\left[(2p+5)\,f_{n-p+1} + (-2p-3)\,f_{n-p}\right]$$

and for a = 0 we get

$$y_{n+2} = y_{n+1} + (h/2)\left[(2p+3)\,f_{n-p+1} - (2p+1)\,f_{n-p}\right]$$

which is a sequential pipelined Adams-Bashforth formula.

In general, we are interested in systems of equations such as (2.1). Because the consistency and the convergence of a method depend only on the coefficients of the method, the previously described formulas can also be applied to systems of differential equations in vector notation. It is not necessary to define one system's pipeline length; each first order equation can have its own pipeline length and integration formula.
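The coefficients of method II can be checked mechanically against the order conditions. The sketch below assumes the Taylor coefficients in the form $c_0 = \sum_j \alpha_j$ and $c_q = \sum_j [\,j^q \alpha_j/q! - (j-p)^{q-1}\beta_j/(q-1)!\,]$ for $q \ge 1$, and uses exact rational arithmetic. The function names are illustrative.

```python
from fractions import Fraction
from math import factorial

def taylor_coefficient(q, alpha, beta, p):
    """c_q for coefficient lists alpha[j], beta[j], j = 0..k, and pipeline length p."""
    if q == 0:
        return sum(alpha, Fraction(0))
    return sum(Fraction(j**q, factorial(q)) * a
               - Fraction((j - p)**(q - 1), factorial(q - 1)) * b
               for j, (a, b) in enumerate(zip(alpha, beta)))

def method_two(p, a):
    """Coefficients of the second order method II for pipeline length p and parameter a."""
    a = Fraction(a)
    alpha = [a, -1 - a, Fraction(1)]
    beta = [p * (a - 1) - Fraction(1, 2) - a / 2,   # beta_0
            p * (1 - a) + (3 - a) / 2,              # beta_1
            Fraction(0)]                            # beta_2 = 0 (explicit method)
    return alpha, beta

for p in range(4):
    alpha, beta = method_two(p, Fraction(1, 2))
    print(p, [taylor_coefficient(q, alpha, beta, p) for q in range(4)])
```

For every p the first three coefficients vanish, and the last one follows the quadratic dependence on p stated for the leading error coefficient.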

Note: After this work had been performed the author learned that Yen and Cook [YEN82] had independently described the technique of function partitioning. However, they have considered only the case p = 1.

hybrid pipelines

We will consider the effects of a hybrid pipeline function evaluation using its simplest form

$$f_{hyb} = F(y_{n-q}, y_{n-p}), \qquad p, q = 0, 1, 2, \ldots, K \tag{2.14}$$

Fig. 2.6 illustrates the hybrid pipeline function evaluation. The two input arguments of the function block II have a delay of p and q time steps to the output of the function block. To determine the coefficients of the linear multistep integration method the function $F[y(t-qh), y(t-ph)]$ has to be expanded in a two-dimensional Taylor series of the form

$$F[y(t-ph), y(t-qh)] = f[y(t)] + h\left[-p\frac{\partial}{\partial x} - q\frac{\partial}{\partial z}\right] F[y(x), y(z)]\Big|_{x=z=t} + \frac{h^2}{2!}\left[-p\frac{\partial}{\partial x} - q\frac{\partial}{\partial z}\right]^2 F[y(x), y(z)]\Big|_{x=z=t} + \ldots \tag{2.15}$$

The coefficients $c_0$ and $c_1$ are the same as in (2.12). The coefficient $c_2$ is of the form

$$c_2 = \sum_{j=0}^{k} \left\{ \tfrac{1}{2} j^2 \alpha_j - \beta_j \left[ (-p+j)\frac{\partial}{\partial x} + (-q+j)\frac{\partial}{\partial z} \right] F[y(x), y(z)] \Big/ f'[y(t)] \right\}_{x=z=t} \tag{2.16}$$

Cancellation of $c_2$ for method II (s = 2) can be accomplished only if one of the following conditions is met:

2. $\partial F/\partial x = \partial F/\partial z$

3. $\partial F/\partial x$ and $\partial F/\partial z$ are explicitly known.

Condition (1) leads to a = 1 in the general second-order explicit linear multistep method. However, the method is not convergent for a = 1, so condition (1) cannot be realized.


Fig. 2.6 Hybrid pipeline function implementation.

Conditions (2) or (3) can sometimes be met. However, in all cases the coefficients of the method become dependent on the function F. Thus, in general, we may conclude that for hybrid pipeline function implementations the Euler integration method is the only practical method to be applied.

2.5 Stability of pipelined methods

The stability characteristics of the numerical integration method are related mostly to the solution of the linear equation

$$y' = \lambda y, \qquad \lambda = \text{constant}, \quad \lambda < 0 \tag{2.17}$$

The behavior of the roots $r_i$, $i = 0, 1, 2, \ldots, p$ of the stability polynomial $S(r, \bar{h}) = 0$, $\bar{h} = \lambda h$, determines the stability characteristics of the method. The stability polynomial has been defined by

$$S(r, \bar{h}) = \rho(r) - \bar{h}\,\sigma(r) \tag{2.18}$$

where $\rho(r)$ and $\sigma(r)$ are the first and second characteristic polynomials. The system is called absolutely stable if all roots satisfy $|r_i| < 1$, $i = 0, 1, 2, \ldots, p$ (see [LAMB73]). An interval (a,b) of the real line is said to be an interval of absolute stability if the method is absolutely stable for all $\bar{h} \in (a,b)$.


stability of sequential pipelines

For the pipelined Euler method (method I) the stability polynomial is of the form

$$\bar{h} = r^{p+1} - r^p \tag{2.19}$$

The points at which the root loci of (2.19) cross the unit circle can be found by substituting $r = e^{i\theta}$. For this method we have

$$\bar{h}(\theta) = \cos((p+1)\theta) - \cos(p\theta) + i\left[\sin((p+1)\theta) - \sin(p\theta)\right] = -2\sin(\tfrac{1}{2}\theta)\left\{\sin((p+\tfrac{1}{2})\theta) - i\cos((p+\tfrac{1}{2})\theta)\right\} \tag{2.20}$$

The real axis is crossed for $\theta_1 = n \cdot 2\pi$, or $\theta_2 = (\pi \pm n \cdot 2\pi)/(2p+1)$, $n = 0, 1, 2, \ldots$, etc. At $\theta_1$, $\bar{h}(\theta_1) = 0$, while at $\theta_2$, $\bar{h}(\theta_2) = -2\sin[\tfrac{1}{2}\pi/(2p+1)]$ (n = 0 results in the smallest negative value for $\bar{h}$). These crossing points only determine the boundary of the stability interval or intervals. By computing a number of spot values it can be determined whether or not all roots within the interval lie within the unit circle. The interval of absolute stability is in this case $(-2\sin[\tfrac{1}{2}\pi/(2p+1)],\ 0)$. The results are shown in Fig. 2.7. For the second order pipelined method (method II) the stability boundary has been determined by using a numerical root-finding method.

The stability boundary $\bar{h}$ decreases exponentially with the pipeline length p. This means that if we want to have the same stability characteristics of the pipelined method compared to the non-pipelined method, the step length h must also be decreased exponentially.
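The stability boundary of the pipelined Euler method can be spot-checked by simply iterating the difference equation. The sketch assumes the boundary value $-2\sin[\tfrac{1}{2}\pi/(2p+1)]$ and the recurrence $y_{n+1} = y_n + \bar{h}\,y_{n-p}$ obtained by applying method I to $y' = \lambda y$. The function names, the constant start-up history, and the factors 0.9 and 1.1 are illustrative choices.

```python
import math

def euler_pipeline_boundary(p):
    """Left end of the interval of absolute stability, -2 sin[(pi/2)/(2p+1)]."""
    return -2.0 * math.sin(0.5 * math.pi / (2 * p + 1))

def final_magnitude(hbar, p, steps=2000):
    """Iterate y[n+1] = y[n] + hbar*y[n-p] and return the largest recent |y|."""
    y = [1.0] * (p + 1)                      # constant start-up history
    for _ in range(steps):
        y.append(y[-1] + hbar * y[-1 - p])
    return max(abs(v) for v in y[-(p + 1):])

for p in range(5):
    b = euler_pipeline_boundary(p)
    print(p, round(b, 4), final_magnitude(0.9 * b, p), final_magnitude(1.1 * b, p))
```

For each p the iteration dies out just inside the boundary and grows just outside it, which is the behavior expected from an interval of absolute stability.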

stability of hybrid pipelines

For hybrid pipeline function implementations the stability interval is, in general, hard to determine. Dekker [DEKK81] considered the numerical solution of implicit functions, where the function evaluation was implemented as a closed loop hybrid pipeline. He concluded that the stability interval can be enlarged by choosing a suitable hybrid pipeline function implementation. Kerckhoffs [KERC81] considered the stability characteristics of the equation $y_{n+1} = y_n + h\,F[y_n, G(y_{n-1})]$. According to his derivation the maximal stability interval is reached for $\partial F/\partial y = 3\,[\partial F/\partial G]\,[\partial G/\partial y]$.


Fig. 2.7 (a) Stability and (b) error characteristics of sequential pipelined methods.

Here we will draw some indicative conclusions for the function F defined in equation (2.14) on the basis of a number of specific cases. Suppose the equation (2.17) is solved by using the Euler integration method, resulting in

$$y_{n+1} = y_n + h\lambda y_n \tag{2.21}$$

We now replace the function $f(y_n) = \lambda y_n$ by the function

$$F(y_{n-q}, y_{n-p}) = a_1 \lambda y_{n-q} + a_2 \lambda y_{n-p} \quad \text{with} \quad a_1 + a_2 = 1 \tag{2.22}$$

This is the hybrid pipeline function implementation of equation (2.17) as illustrated in Fig. 2.5c. The stability polynomial becomes

$$\bar{h} = \frac{r - 1}{a_1 r^{-q} + a_2 r^{-p}} \tag{2.23}$$

(35)

Note that an arbitrary function $f(u,v)$ can be linearized around $u_0, v_0$, resulting in the same stability polynomial as (2.23) with $a_1\lambda$ being replaced by $\partial f/\partial u\,|_{u_0,v_0}$ and $a_2\lambda$ by $\partial f/\partial v\,|_{u_0,v_0}$. Substitution of $r = e^{i\theta}$ in (2.23) gives

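$\mathrm{Re}[\bar{h}(\theta)] = \{a_1[\cos((q+1)\theta) - \cos(q\theta)] + a_2[\cos((p+1)\theta) - \cos(p\theta)]\}/N$

$\mathrm{Im}[\bar{h}(\theta)] = \{a_1[\sin((q+1)\theta) - \sin(q\theta)] + a_2[\sin((p+1)\theta) - \sin(p\theta)]\}/N$

with $N = a_1^2 + a_2^2 + 2a_1a_2\cos((p-q)\theta) \qquad (2.24)$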
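As a cross-check, the boundary locus can be evaluated directly from the stability polynomial with complex arithmetic and compared with the real and imaginary parts given in (2.24). The sketch below is illustrative only; the function names `boundary_locus` and `locus_from_2_24` are hypothetical and not part of the original text, and $a_2 = 1 - a_1$ is assumed throughout.

```python
import cmath
import math


def boundary_locus(theta, a1, p, q):
    """Evaluate (2.23) at r = exp(i*theta), assuming a2 = 1 - a1."""
    a2 = 1.0 - a1
    r = cmath.exp(1j * theta)
    return (r - 1) / (a1 * r ** (-q) + a2 * r ** (-p))


def locus_from_2_24(theta, a1, p, q):
    """Evaluate the explicit real and imaginary parts of (2.24)."""
    a2 = 1.0 - a1
    n = a1 ** 2 + a2 ** 2 + 2 * a1 * a2 * math.cos((p - q) * theta)
    re = (a1 * (math.cos((q + 1) * theta) - math.cos(q * theta))
          + a2 * (math.cos((p + 1) * theta) - math.cos(p * theta))) / n
    im = (a1 * (math.sin((q + 1) * theta) - math.sin(q * theta))
          + a2 * (math.sin((p + 1) * theta) - math.sin(p * theta))) / n
    return complex(re, im)


# Compare both evaluations at an arbitrary sample point.
direct = boundary_locus(1.0, 0.75, 2, 0)
explicit = locus_from_2_24(1.0, 0.75, 2, 0)
print(abs(direct - explicit))
```

Both routes give the same complex value up to rounding, which confirms that (2.24) is (2.23) multiplied through by the complex conjugate of its denominator.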

The solution of (2.24) for arbitrary values of $a_1$, $a_2$, $p$, and $q$ is not a simple analytical function. We will analyze some specific cases.

a. $p=1$ and $q=0$. According to (2.24) the real axis is crossed when $\sin\theta = 0$ or $\cos\theta = (a_2 - a_1)/(2a_2)$. Substitution in $\mathrm{Re}[\bar{h}(\theta)]$ of (2.24) results in the following boundary points: $\bar{h}(0) = 0$, $\bar{h}(\pi) = -2/(2a_1 - 1)$, and $\bar{h}(\theta_c) = -1/(1 - a_1)$, where $\theta_c$ is the angle with $\cos\theta_c = (a_2 - a_1)/(2a_2)$. In Fig. 2.8 $a_1$ is plotted against $\bar{h}$. The hatched area is the stability interval region. This plot shows that the length of the stability interval depends on the choice of $a_1$ and $a_2$. For $a_1 = 0.75$ and $a_2 = 0.25$ the stability interval is maximal, in accordance with the above mentioned result derived by Kerckhoffs [KERC81]. For $0 < a_1 < 0.5$ and for $2 < a_1 < 2.75$ the stability interval of the hybrid pipeline lies between the values of the stability intervals of a sequential pipeline with $p=1$ and $p=0$.
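The boundary points of case a can be checked numerically. For $p=1$, $q=0$ the difference equation $y_{n+1} = y_n + \bar{h}(a_1 y_n + a_2 y_{n-1})$ has the characteristic equation $r^2 - (1 + \bar{h}a_1)r - \bar{h}a_2 = 0$, and the method is stable when both roots lie inside the unit circle. The sketch below is an illustration under these assumptions; the helper names are hypothetical, and the piecewise choice between the two boundary points in `left_endpoint` is a reading of the boundary points that is only claimed for $0 \le a_1 \le 1$.

```python
import cmath


def max_root_modulus(h_bar, a1):
    """Largest |r| of r^2 - (1 + h_bar*a1)*r - h_bar*a2 = 0, with a2 = 1 - a1."""
    a2 = 1.0 - a1
    b = -(1.0 + h_bar * a1)
    c = -h_bar * a2
    disc = cmath.sqrt(b * b - 4.0 * c)
    return max(abs((-b + disc) / 2.0), abs((-b - disc) / 2.0))


def is_stable(h_bar, a1):
    """Stable when all characteristic roots lie strictly inside the unit circle."""
    return max_root_modulus(h_bar, a1) < 1.0


def left_endpoint(a1):
    """Left end of the stability interval for p=1, q=0 (assumed for 0 <= a1 <= 1).

    For a1 <= 0.75 the crossing at theta_c is the one nearest the origin,
    otherwise the crossing at theta = pi is.
    """
    if a1 <= 0.75:
        return -1.0 / (1.0 - a1)
    return -2.0 / (2.0 * a1 - 1.0)


for a1 in (0.0, 0.5, 0.75, 1.0):
    print(a1, left_endpoint(a1))
```

With $a_1 = 1$ the interval is that of the ordinary Euler method, with $a_1 = 0$ that of the sequential pipeline with $p=1$, and the widest interval among the sampled values occurs at $a_1 = 0.75$.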

b. $a_1 = a_2 = \tfrac{1}{2}$. The expression of (2.24) for $a_1 = a_2$ can be rewritten as

$\bar{h}(\theta) = -2\sin(\tfrac{1}{2}\theta)\{\sin[\tfrac{1}{2}(p+q+1)\theta] - i\cos[\tfrac{1}{2}(p+q+1)\theta]\}/N \qquad (2.25)$

with $N = \cos[\tfrac{1}{2}(p-q)\theta]$. The real axis is crossed when

$\theta_1 = n\cdot 2\pi$ or $\theta_2 = \dfrac{(2n+1)\pi}{p+q+1} \qquad (2.26)$

Substitution of $\theta_1$ in (2.25) results in $\bar{h}(\theta_1) = 0$. Substitution of $\theta_2$ gives

$\bar{h}(\theta_2) = -2(-1)^n \sin(\tfrac{1}{2}\theta_2)/\cos[\tfrac{1}{2}(p-q)\theta_2] \qquad (2.27)$

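Now $|\cos[\tfrac{1}{2}(p-q)\theta_2]| \le 1$; therefore, $|\sin(\tfrac{1}{2}\theta_2)/\cos[\tfrac{1}{2}(p-q)\theta_2]| \ge |\sin(\tfrac{1}{2}\theta_2)|$. Since $|\sin(\tfrac{1}{2}\theta_2)|$ has a minimum value for $n=0$, a bound $\bar{h}_b(\theta_2)$ for $\bar{h}(\theta_2)$ is given by

$\bar{h}_b(\theta_2) = -2\sin[\tfrac{1}{2}\pi/(p+q+1)] \qquad (2.28)$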

From (2.28) it follows that for the case $a_1 = a_2$ the bound on the stability interval decreases as $q$ runs from 0 to $p$.
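The dependence of the bound (2.28) on $q$ can be tabulated directly. The following is a minimal sketch; the function name `bound_2_28` and the choice $p = 3$ are illustrative assumptions, not values taken from the text.

```python
import math


def bound_2_28(p, q):
    """Bound (2.28) for the case a1 = a2 = 1/2."""
    return -2.0 * math.sin(0.5 * math.pi / (p + q + 1))


p = 3  # illustrative pipeline depth (assumption)
bounds = [bound_2_28(p, q) for q in range(p + 1)]
for q, b in enumerate(bounds):
    print(q, round(b, 4))
```

The magnitude of the bound shrinks as $q$ grows, because the argument of the sine decreases with $p+q+1$. For $p = q = 0$ the expression reduces to $-2$, the stability limit of the ordinary Euler method.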

c. Other examples. For specific values of $p$ and $q$ the stability can be determined by numerically computing the root loci. In Fig. 2.9 the stability region is shown for some values of $p$ and $q$.
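One possible way to carry out such a numerical determination is to trace the boundary locus $\bar{h}(\theta)$ of (2.23) for $0 < \theta \le \pi$ and locate its crossings of the real axis. The sketch below is not the procedure used for Fig. 2.9. It assumes that the method is stable for small negative $\bar{h}$ and that stability can only change where the locus is crossed, so that the negative crossing nearest the origin marks the left end of the stability interval. All function names and the sampling density are hypothetical.

```python
import cmath
import math


def boundary_locus(theta, a1, p, q):
    """Evaluate (2.23) at r = exp(i*theta), assuming a2 = 1 - a1."""
    a2 = 1.0 - a1
    r = cmath.exp(1j * theta)
    return (r - 1) / (a1 * r ** (-q) + a2 * r ** (-p))


def real_axis_crossings(a1, p, q, samples=2000):
    """Real values of h_bar where the locus meets the real axis, 0 < theta <= pi."""
    a2 = 1.0 - a1
    thetas = [math.pi * k / samples for k in range(1, samples)]
    imag = [boundary_locus(t, a1, p, q).imag for t in thetas]
    crossings = []
    for k in range(len(thetas) - 1):
        if (imag[k] < 0) != (imag[k + 1] < 0):
            lo, hi, neg_lo = thetas[k], thetas[k + 1], imag[k] < 0
            for _ in range(60):  # bisection on the sign of Im
                mid = 0.5 * (lo + hi)
                if (boundary_locus(mid, a1, p, q).imag < 0) == neg_lo:
                    lo = mid
                else:
                    hi = mid
            h = boundary_locus(0.5 * (lo + hi), a1, p, q)
            # Discard sign changes caused by a pole instead of a real crossing.
            if abs(h.imag) < 1e-6 * max(1.0, abs(h)):
                crossings.append(h.real)
    # At theta = pi we have r = -1, so h_bar is real whenever it is finite.
    denom = a1 * (-1) ** q + a2 * (-1) ** p
    if denom != 0:
        crossings.append(-2.0 / denom)
    return crossings


def stability_interval_left(a1, p, q):
    """Negative real-axis crossing nearest the origin, or None if there is none."""
    negative = [x for x in real_axis_crossings(a1, p, q) if x < 0]
    return max(negative) if negative else None


print(stability_interval_left(0.75, 1, 0))
print(stability_interval_left(0.6, 1, 0))
```

For $p=1$, $q=0$ the results agree with the boundary points of case a, which gives some confidence in applying the same routine to other values of $p$ and $q$.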

A conclusion based on the above discussion is that hybrid pipelined function implementation can improve the stability characteristics of the Euler integration method as compared to sequential pipeline function implementation, provided that suitable choices of $a_1$ and $a_2$ are made.

For systems of first-order differential equations the stability polynomial remains of the form (2.23). However, $\bar{h} = \lambda_i h$, where $\lambda_i$ are the eigenvalues


