Szeregowanie zadań jednorodnie podzielnych w heterogenicznych systemach rozproszonych


Faculty of Mathematics and Computer Science

Joanna Berlińska

Scheduling divisible loads in heterogeneous distributed systems

Ph.D. Thesis

Supervisor: Prof. Dr. Habil. Maciej Drozdowski


I wish to express my deep gratitude to my supervisor Professor Maciej Drozdowski for his keen interest, inspiration and perfect guidance throughout the completion of this thesis. He introduced me to the exciting field of divisible load theory and motivated me to conduct original research with high standards. I am sincerely grateful for his effort in training me to become a successful researcher.

The research reported in this thesis has been financially supported by the Polish Ministry of Science and Higher Education grants N N206 372039 "Scheduling divisible loads in heterogeneous distributed systems" and N N519 188933 "New problems of scheduling theory: complexity analysis, algorithmization".

The work presented in this thesis has been also partially supported by the


1 Introduction

2 Single-Round Processing
2.1 Earlier Results
2.2 FPTAS for Problem DLS{$C_i = 0$}-Opt$_V$
2.3 FPTAS for Problem DLS{$C_i = 0$}-Opt$_T$
2.4 Communication Sequence for Problem DLS{1Round}
2.5 Approximation Algorithms for Problem DLS{1Round}
2.5.1 Problem DLS{1Round}-Opt$_V$
2.5.2 Problem DLS{1Round}-Opt$_T$
2.6 Conclusions

3 Multi-Round Processing with Limited Memory
3.1 Earlier Results
3.2 Problem Formulation
3.3 Branch & Bound Algorithm and Genetic Algorithm
3.3.1 Branch & Bound Algorithm
3.3.2 Genetic Algorithm
3.3.3 Comparison of B&B and GA
3.4 Properties of the Solutions
3.4.1 Depth of Overlap
3.4.4 Dominating Set of Processors
3.4.5 Chunk Size Saturation
3.4.6 When Is It Hard to Find a Good Solution?
3.4.7 Conclusions
3.5 Heuristics
3.5.1 Random Heuristics
3.5.2 First Free Heuristic
3.5.3 Appender Heuristics
3.5.4 Best Rate Heuristics
3.6 Comparison of the Heuristic Algorithms
3.6.1 Load Size
3.6.2 Startup Time
3.6.3 Communication Rate
3.6.4 Memory Limit
3.6.5 Computation Rate
3.6.6 Parameters Dispersion
3.6.7 Performance Dispersion
3.7 Summary

4 MapReduce Computations
4.1 Outline of MapReduce
4.2 Mathematical Model of MapReduce
4.3 Schedule Dominance Properties
4.3.1 Processing with a Single Reducer
4.3.2 Processing with Many Reducers
4.4 Scheduling Algorithms
4.4.1 Single Reducer
4.6 Summary

5 Multilayer Divisible Applications
5.1 Model of Multilayer Applications
5.2 Scheduling Algorithms
5.2.1 Load Partitioning for Reducer Layers
5.2.2 Load Partitioning for Mapper Layer
5.2.3 The Complete Load Partitioning Algorithm
5.2.4 Finishing Mapper Computations Order
5.2.5 Scheduling Communications
5.3 Computational Experiments
5.3.1 Speedup of Multilayer Applications
5.3.2 Load Distribution between Reducers
5.3.3 Load Distribution between Mappers
5.4 Summary

6 Summary and Conclusions


The progress in many disciplines of science and technology is nowadays strongly supported by computational methods. The research is often based on the results delivered by complex and time-consuming calculations. The computational power of a single computer is often insufficient. Hence, performing the computations in distributed environments like grids or clusters becomes a necessity. What is more, using a distributed computer system has many advantages. Large numbers of processors taking part in computations result in big total computing power. The system is scalable and the time needed for computations can be reduced by employing more processors. On the other hand, controlling computations in a distributed system is more complex. In order to obtain high efficiency, the distributed applications need careful scheduling of communications and computations. As the computers may be spread around the world, the communication delays may be quite big and cannot be neglected. The distributed computer system is usually heterogeneous, and consequently, the different parameters of its elements must be taken into account by the scheduling algorithms.

Divisible load theory (DLT) is a model of parallel computations which offers a realistic approach to this problem. It is mostly used to represent processing large amounts of data in distributed systems. It assumes that the input data, called load, can be divided into pieces of arbitrary sizes and these pieces can be processed independently in parallel on remote computers. The divisible load model


work of intelligent sensors was studied. In both cases, the analyzed problem was how to schedule communications and computations, so that the total time needed to process the load of a given size is as short as possible. On the one hand, using more processors reduces computation time, but on the other hand it needs more communications, which cost time. Hence, the problem is which processors should be used and what load quantities they should receive. The mathematical models proposed in the early publications were computationally tractable and reduced the scheduling problem to a set of linear equations. Later on, more complex models were developed and applied to various network topologies [16, 20, 21, 25], systems with memory limitations [12, 30, 37], computation costs [46] and others. The most general divisible load scheduling problem was proved to be NP-hard in [48]. Surveys of divisible load theory can be found, e.g., in [3, 14, 24, 45]. We discuss these results in more detail in the following sections.

There are many examples of divisible load computations, like processing measurement data [20], searching for patterns in text and database files [28], image and video processing [38, 39, 43], solving linear algebra problems [22, 32], and DNA sequence alignment [47]. As we showed in [7, 10], processing large amounts of data in the MapReduce model [23] on dedicated clusters can also be analyzed on the grounds of divisible load theory. Moreover, the computations on volunteer platforms like BOINC and distributed.net fulfill the assumptions about the divisibility and independence of the load grains. Therefore, the progress in DLT is useful in efficiently managing many real distributed applications.

The main goal of this work is the analysis of several divisible load scheduling problems in heterogeneous distributed systems and the construction of algorithms solving these problems. As the analyzed problems are known to be computationally hard, we will propose approximation algorithms and heuristics. The


schedule computations in new parallel processing environments, like the MapReduce framework. We will construct a mathematical model of such computations and propose scheduling algorithms. Performance limits of the proposed organization of computations will be investigated.

The structure of this thesis is the following. Chapter 2 is dedicated to single-round divisible load scheduling. In the single-round processing each computer receives at most one message with the data to process. The scheduling problem is which processors should take part in computations, what amounts of data they should receive and in what order. Our main contributions presented in Chapter 2 are fully polynomial time approximation schemes for two scheduling problems. These results have been already published in [6]. Extensions to more general cases are also analyzed.

Chapter 3 covers multi-round divisible load scheduling in systems with limited memory. Multi-round processing means that each processor can receive multiple messages with data to process. It is assumed that the whole load is too big to store it in the memories of the computers at the same moment. Therefore, the load must be distributed and processed in many small pieces fitting available memory buffers. We provide an experimental study of the features of near-optimum solutions, and hence, the nature of the scheduling problem. Based on these results, several groups of heuristics solving the analyzed problems are proposed. Their advantages and weaknesses are demonstrated for a wide range of changing system parameters. The experimental comparison of the proposed algorithms with the heuristics known from earlier literature shows that a big improvement in the quality of the obtained solutions has been achieved. The results contained in Chapter 3 have been published in [8, 9, 11, 12].

Chapter 4 introduces the MapReduce paradigm for parallel computations. We formulate the mathematical model of such computations and propose scheduling algorithms. Then, an experimental analysis of the MapReduce performance is provided. These results have been published in [7, 10]. It was the first time that scheduling divisible loads with precedence constraints was studied.

In Chapter 5 the problem considered in Chapter 4 is generalized. We introduce the notion of a multilayer application. An example of a multilayer application is a chain of MapReduce applications, such that one application in the chain produces input for the next application. The influence of the system parameters on the structure of the schedules is studied.

The last chapter contains a summary of all the presented results. We also propose directions for future research on the aspects of divisible load theory


In this chapter we study divisible load scheduling for single-round organization of computations. Let us start with some general assumptions about the computing environment. In this work we assume that each processor comprises a CPU, some memory and a hardware network interface (e.g. NIC and DMA). The words processor, computer and processing element will be used interchangeably, unless stated otherwise. The CPU and network interface can work in parallel, so that simultaneous computation and communication is possible. Each computer can communicate with at most one processor at a time (i.e. the so-called one-port model is used).

In Chapters 2 and 3 we consider classical divisible load scheduling problems in a star network (see Fig. 2.1). The load to be processed is initially located on processor $P_0$ called the originator, located in the center of the star. The originator is connected to a set of $m$ processors (workers) $\{P_1, \dots, P_m\}$. The originator divides the load into pieces and sends them directly to the workers. Such a logical topology can represent many parallel systems with different physical topologies, like a grid of multiprocessor supercomputers, a cluster of workstations connected via a local area network, or a set of processors sharing a bus in an SMP system. We assume that the originator only dispatches the load to the other processors and performs no computations. In the opposite case, the computational power of the originator can be represented as an additional processor. For simplicity of


analyzed. Practically, it means that the results returning time is short and can be neglected. It has been shown in [18, 28] that this simplification does not limit the generality of our considerations, as sending results back can be included in the model.

Each worker $P_i$ is described by its computing rate (inverse of speed, e.g. in seconds per byte), denoted by $A_i$. Processing load of size $\alpha$ on $P_i$ takes time $\alpha A_i$. The communication link between $P_i$ and the originator is described by startup time $S_i$ (e.g. in seconds) and communication rate (inverse of bandwidth) $C_i$. Hence, the time required to send load of size $\alpha$ to processor $P_i$ is $S_i + \alpha C_i$. We will use the notation $A_{\max} = \max_{1 \le i \le m} A_i$, $A_{\min} = \min_{1 \le i \le m} A_i$, and similarly for the other parameters. In the general case, all parameters $A_i$, $C_i$, $S_i$ are nonnegative rational numbers.
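The worker model above can be illustrated with a minimal sketch. The class name `Worker`, its method names and the numeric values are assumptions made for illustration only; exact rational arithmetic is used because the parameters are assumed to be rational.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Worker:
    """Hypothetical container for the parameters of a worker P_i."""
    A: Fraction  # computing rate: time per unit of load
    C: Fraction  # communication rate: time per unit of load
    S: Fraction  # communication startup time

    def comm_time(self, alpha):
        # Time to send load of size alpha to this worker: S_i + alpha * C_i
        return self.S + alpha * self.C

    def comp_time(self, alpha):
        # Time to process load of size alpha on this worker: alpha * A_i
        return alpha * self.A


# Illustrative values (not taken from the thesis).
workers = [Worker(Fraction(2), Fraction(1, 2), Fraction(3)),
           Worker(Fraction(5), Fraction(1), Fraction(1))]
A_max = max(w.A for w in workers)
A_min = min(w.A for w in workers)
```

Here `workers[0].comm_time(4)` evaluates $S_1 + 4 C_1$ and `workers[0].comp_time(4)` evaluates $4 A_1$ for the first illustrative worker.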

Below we formulate several single-round divisible load scheduling problems. We follow the notation used in [48], where different divisible load scheduling problems are denoted by DLS{restriction}. The restriction is the list of additional assumptions in the analyzed problem. These restrictions may be, for example:

• 1Round for single-round scheduling problems,
• $C_i = 0$ if all the bandwidths are infinite ($C_i = 0$ for all $1 \le i \le m$),
• $S_i = 0$ if there are no startup times ($S_i = 0$ for all $1 \le i \le m$).

The decision version of the general single-round divisible load scheduling


Given $m$ workers, their parameters $A_i$, $C_i$ and $S_i$ for $1 \le i \le m$, and two rational numbers $V > 0$ and $T > 0$, is it possible to process load of size $V$ within time $T$ from the moment when the originator starts sending out the load?

We also define the following two optimization problems connected with problem DLS{1Round}.

Problem 2.2. (DLS{1Round}-Opt$_V$)
Given a rational time $T > 0$, $m$ workers, their parameters $A_i$, $C_i$ and $S_i$ for $1 \le i \le m$, find the greatest rational number $V_{OPT}(T)$, such that it is possible to process load of size $V_{OPT}(T)$ within time $T$.

Problem 2.3. (DLS{1Round}-Opt$_T$)
Given a rational load size $V > 0$, $m$ workers, their parameters $A_i$, $C_i$ and $S_i$ for $1 \le i \le m$, find the smallest rational number $T_{OPT}(V) \ge 0$, such that it is possible to process the whole load $V$ within time $T_{OPT}(V)$.

Let us note that we are interested not only in finding the optimum time $T$ or the amount of load $V$, but also in constructing the optimum schedule. Constructing a schedule involves making the following decisions:

• The set $\mathcal{P}' \subseteq \mathcal{P}$ of processors participating in the computations must be chosen. Depending on the parameters of the processors and communication links, it may be unprofitable to use some of them for computations.

• The communication sequence (also called activation sequence), defining the order in which the processors receive load, must be chosen. For single-round processing, the communication sequence is a permutation of indices of processors from the set $\mathcal{P}'$.


The early publications concerning scheduling divisible loads in a star system used a simple linear communication model. All communication startup times $S_i$ were assumed to be equal to zero. The analyzed problems were DLS{1Round, $S_i = 0$} and the corresponding optimization problems. It was proved independently in [5, 13, 17, 35] that if all workers take part in the computations and finish work at the same moment, then the problem DLS{1Round, $S_i = 0$} can be solved by sorting the processors by nondecreasing $C_i$ in the activation sequence. The hypothesis that in the optimum solution all workers participate in computations and finish work simultaneously was proved in [3].

The assumption about linear communication costs usually does not hold in practice. It has a side effect that all processors can take part in the computations, no matter how many of them are available, and no matter how far from the originator they are. Hence, a more realistic affine communication model, including startup times, was introduced by Błażewicz and Drozdowski in [17]. In publication [3] it was shown that in the optimum solutions of both optimization versions of the problem DLS{1Round} all processors taking part in computations finish work at the same moment. Additionally, the authors proved that if the load size $V$ is large enough, then in any optimum solution all workers participate in the computations and they should be activated in the order of nondecreasing $C_i$.

The complexity of the single-round divisible load scheduling problem remained open until 2007. Finally, in [48] it was proved that the problem DLS{1Round, $C_i = 0$} is NP-complete. The proof was done by reduction from the NP-complete 2-Partition problem. The authors proposed pseudo-polynomial dynamic programming algorithms solving the problems DLS{1Round, $C_i = 0$}-Opt$_V$ and DLS{1Round, $C_i = 0$}-Opt$_T$. However, since pseudopolynomial algorithms are in fact exponential, it can be more useful to create polynomial approximation algorithms. The best approximation result that can be derived for NP-hard problems (unless P=NP) is a fully polynomial time approximation scheme (FPTAS). An FPTAS for an optimization problem $\Pi$ with cost function $f$ is an approximation algorithm $A$ which for any given $\varepsilon > 0$ and an instance $I$ of problem $\Pi$

• returns a solution $A(I)$ such that $|f(A(I)) - OPT(I)| \le \varepsilon |OPT(I)|$, where $OPT(I)$ is the optimum cost for instance $I$, and

• has running time polynomial in the size of $I$ and $1/\varepsilon$.

Constructing fully polynomial time approximation schemes for DLS{1Round, $C_i = 0$}-Opt$_V$ and DLS{1Round, $C_i = 0$}-Opt$_T$ is the aim of the next two sections.

2.2 FPTAS for Problem DLS{$C_i = 0$}-Opt$_V$

Let us start with an observation that if $C_i = 0$ for $1 \le i \le m$, then nothing can be gained by sending more than one message to the same processor. Hence, for the divisible load scheduling problem with $C_i = 0$ for all $i$, there always exists an optimum solution using one round only. Consequently, we can write DLS{$C_i = 0$} instead of DLS{1Round, $C_i = 0$}, because these two problems are equivalent.

We begin our considerations with the problem of optimizing the size of the load processed in a given time $T$. Similarly as in [48], we assume here that $A_i$ and $S_i$ are integer numbers. The problem can be formulated as follows.

Problem 2.4. (DLS{$C_i = 0$}-Opt$_V$)
Given a rational time $T > 0$, $m$ workers, their integer parameters $A_i$ and $S_i$ for $1 \le i \le m$, and provided that the bandwidths are infinite, find the greatest rational number $V_{OPT}(T)$, such that it is possible to process load of size $V_{OPT}(T)$ within time $T$.


Let us note that if $S_i > T$ for some processor $P_i$, then this processor cannot be used for processing load in time $T$. Therefore, we assume that $S_i \le T$ for $1 \le i \le m$. Moreover, if $A_i = 0$ for some processor $P_i$, then $P_i$ can receive and process an infinite amount of load in time $S_i$. As $S_i \le T$, the scheduling problem becomes trivial in this case. Hence, we assume that $A_i > 0$ for $1 \le i \le m$.

In order to construct an FPTAS solving Problem 2.4, we need to know in what order the processors should be activated. We will use the following proposition given in [48].

Proposition 2.1. For a given time limit $T$ and a set $\mathcal{P}' \subseteq \{P_1, \dots, P_m\}$ of workers taking part in the computations, the maximum load is processed if the workers are ordered according to nondecreasing values of $S_i A_i$ for $P_i \in \mathcal{P}'$.

Proposition 2.1 can be proved by the interchange argument: ordering the processors in $\mathcal{P}'$ according to nondecreasing $S_i A_i$ does not reduce the amount of load processed in time $T$.
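The ordering rule of Proposition 2.1 can be checked by exhaustive search on a small instance. The following is only a sketch: the function name `load_processed` and the numeric values are assumptions, $C_i = 0$ is assumed, and $T$ is chosen large enough that every worker receives a positive amount of load.

```python
from fractions import Fraction
from itertools import permutations


def load_processed(T, order, A, S):
    """Load processed by time T when workers are activated in `order` (C_i = 0)."""
    total, elapsed = Fraction(0), Fraction(0)
    for i in order:
        elapsed += S[i]                # the startup of P_i delays P_i and all later workers
        total += (T - elapsed) / A[i]  # P_i computes from `elapsed` until T
    return total


# Illustrative instance (values are assumptions, not from the thesis).
A = [Fraction(3), Fraction(1), Fraction(2)]
S = [Fraction(1), Fraction(2), Fraction(4)]
T = Fraction(20)

# Order suggested by Proposition 2.1: nondecreasing S_i * A_i.
by_rule = tuple(sorted(range(3), key=lambda i: S[i] * A[i]))
# Best order found by trying all permutations.
best = max(permutations(range(3)), key=lambda o: load_processed(T, o, A, S))
```

On this instance the order given by the rule processes as much load as the best permutation found by brute force.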

As it is known from [3] that in the optimum solution all processors taking part in computations finish work at the same moment, it follows from Proposition 2.1 that the scheduling problem can be reduced to choosing an optimum subset of processors taking part in the computations. Let us assume, without loss of generality, that $S_1 A_1 \le \dots \le S_m A_m$. We define a binary vector $\mathbf{x} = (x_1, \dots, x_m)$ as follows: $x_i = 1$ if processor $P_i$ receives some load to process (i.e. $P_i \in \mathcal{P}'$) and $x_i = 0$ in the opposite case ($P_i \notin \mathcal{P}'$). The maximum amount of load which can be processed in time $T$ using the subset of processors indicated by $\mathbf{x}$ can be obtained from the formula

$$V_{OPT}(T, \mathbf{x}) = \sum_{i=1}^{m} \frac{T x_i}{A_i} - \sum_{i=1}^{m} \sum_{j=i}^{m} \frac{x_i x_j S_i}{A_j}. \qquad (2.1)$$

The expression $\sum_{i=1}^{m} \frac{T x_i}{A_i}$ is the amount of load which could be processed in time $T$ if there were no communication delays. Communication with processor $P_i$ takes time $x_i S_i$. During this time processors $P_j$, where $j \ge i$, cannot process any load because they did not receive the input yet. Thus, $\sum_{i=1}^{m} \sum_{j=i}^{m} \frac{x_i x_j S_i}{A_j}$ is the amount of load which is lost because of communication delays (cf. [48]).

Our goal is to maximize the size $V$ of load processed in a given time $T$ as a function of a binary vector $\mathbf{x} = (x_1, \dots, x_m)$. Instead of maximizing $V(\mathbf{x})$, we will minimize the value of $-V(\mathbf{x})$. Since $x_i$ are binary variables, we have $x_i^2 = x_i$. Hence we have

$$-V(\mathbf{x}) = -\sum_{i=1}^{m} \frac{T - S_i}{A_i} x_i + \sum_{1 \le i < j \le m} S_i \frac{1}{A_j} x_i x_j. \qquad (2.2)$$

A half-product [2] is a function $f : \{0, 1\}^m \to \mathbb{R}$ of the form

$$f(\mathbf{x}) = f(x_1, \dots, x_m) = -\sum_{i=1}^{m} p_i x_i + \sum_{1 \le i < j \le m} q_i r_j x_i x_j, \qquad (2.3)$$

where $p_i$, $q_i$, $r_i$ are nonnegative constants for $1 \le i \le m$. Thus, $-V(\mathbf{x})$ is a half-product, with $p_i = \frac{T - S_i}{A_i}$, $q_i = S_i$, $r_j = \frac{1}{A_j}$.
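The equivalence between formula (2.1) and the half-product form can be verified numerically. In the sketch below the function names `v_direct` and `half_product` and the instance values are assumptions chosen for illustration. The workers are already indexed by nondecreasing $S_i A_i$.

```python
from fractions import Fraction
from itertools import product


def v_direct(T, x, A, S):
    """Formula (2.1): load processed in time T by the subset indicated by x."""
    m = len(x)
    gain = sum(Fraction(T * x[i], A[i]) for i in range(m))
    loss = sum(Fraction(x[i] * x[j] * S[i], A[j])
               for i in range(m) for j in range(i, m))
    return gain - loss


def half_product(x, p, q, r):
    """Formula (2.3): -sum_i p_i x_i + sum_{i<j} q_i r_j x_i x_j."""
    m = len(x)
    linear = sum(p[i] * x[i] for i in range(m))
    cross = sum(q[i] * r[j] * x[i] * x[j]
                for i in range(m) for j in range(i + 1, m))
    return -linear + cross


# Illustrative instance: S_i * A_i = 2, 3, 8 is nondecreasing.
T, A, S = 10, [1, 3, 2], [2, 1, 4]
p = [Fraction(T - S[i], A[i]) for i in range(3)]
q = [Fraction(s) for s in S]
r = [Fraction(1, a) for a in A]

# Subsets for which the two forms disagree (expected to be none).
mismatches = [x for x in product((0, 1), repeat=3)
              if half_product(x, p, q, r) != -v_direct(T, x, A, S)]
```

The diagonal terms $j = i$ of the double sum in (2.1) are absorbed into $p_i$ because $x_i^2 = x_i$, which is why the cross term of the half-product runs over $i < j$ only.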

An FPTAS for minimizing half-products was proposed by Badics and Boros in [2]. They assumed that the parameters $p_i, q_i, r_i$ are nonnegative integers for $1 \le i \le m$. In our case all parameters are nonnegative, but $p_i = \frac{T - S_i}{A_i}$ and $r_j = \frac{1}{A_j}$ are not necessarily integers. However, the assumption about integrality of $p_i$ and $r_i$ is used neither for proving the correctness of the Badics and Boros algorithm, nor for estimating its running time. Therefore, we can use the algorithm proposed in [2] to minimize the function $-V(\mathbf{x})$. The algorithm receives number $m$, vectors $\mathbf{p}$, $\mathbf{q}$, $\mathbf{r}$ of length $m$, and a positive approximation precision $\varepsilon < 1$. It returns a binary vector $\mathbf{x}^{\varepsilon} = (x_1^{\varepsilon}, \dots, x_m^{\varepsilon})$. For $1 \le k \le m$, let $g_k(\mathbf{x}) = -\sum_{i=1}^{k} p_i x_i + \sum_{1 \le i < j \le k} q_i r_j x_i x_j$ and $Q_k(\mathbf{x}) = \sum_{i=1}^{k} q_i x_i$. The FPTAS for minimizing half-products proposed by Badics and Boros is formulated in Algorithm 2.1 (cf. [2]).


Algorithm 2.1 MINIMIZE-HALF-PRODUCT($m$, $\mathbf{p}$, $\mathbf{q}$, $\mathbf{r}$, $\varepsilon$)
STEP 0: Let $\delta > 0$ be defined by the equation $(1 + \delta)^m = 1 + \varepsilon$, let $Q = \sum_{i=1}^{m} q_i$, $N = \lceil \frac{2m \log Q}{\varepsilon} \rceil$, $k = 0$ and $X_0 = \{()\}$.
STEP 1: Let $k = k + 1$, $X_k = \emptyset$, $t = 0$, $s = 0$,
    $L = \{(y_1, \dots, y_{k-1}, 0), (y_1, \dots, y_{k-1}, 1) \mid (y_1, \dots, y_{k-1}) \in X_{k-1}\}$.
STEP 2: while $s \le N$ do
    select $\mathbf{z} = (z_1, \dots, z_k) \in L$ for which $t \le Q_k(\mathbf{z}) < (1 + \delta)^s$ and for which $g_k(\mathbf{z})$ is the smallest among all such $\mathbf{z}$.
    Let $X_k = X_k \cup \{\mathbf{z}\}$, $t = (1 + \delta)^s$, $s = s + 1$.
  end while
STEP 3: if $k < m$ then goto STEP 1 else goto STEP 4. end if
STEP 4: Select $\mathbf{x}^{\varepsilon} \in X_m$ with the smallest $g_m(\mathbf{x}^{\varepsilon})$, return $\mathbf{x}^{\varepsilon}$.
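The steps of Algorithm 2.1 can be transcribed almost literally into Python. The sketch below is not an optimized implementation and not the code from [2]; the function names are assumptions. It uses the natural logarithm for $N$, skips intervals that contain no candidate vector, and assumes $Q > 1$ so that $\log Q$ is positive.

```python
import math
from fractions import Fraction


def g(z, p, q, r):
    """g_k(z) for a 0/1 prefix z of length k."""
    k = len(z)
    linear = sum(p[i] * z[i] for i in range(k))
    cross = sum(q[i] * r[j] * z[i] * z[j]
                for i in range(k) for j in range(i + 1, k))
    return -linear + cross


def minimize_half_product(p, q, r, eps):
    m = len(p)
    Q = sum(q)
    if Q <= 1:
        raise ValueError("this sketch assumes Q = sum(q) > 1")
    delta = (1 + eps) ** (1 / m) - 1           # STEP 0: (1 + delta)^m = 1 + eps
    N = math.ceil(2 * m * math.log(Q) / eps)
    X = [()]                                    # X_0 = {()}
    for k in range(1, m + 1):                   # STEP 1 and STEP 3
        L = [y + (b,) for y in X for b in (0, 1)]
        X, t = [], 0.0
        for s in range(N + 1):                  # STEP 2
            hi = (1 + delta) ** s
            bucket = [z for z in L
                      if t <= sum(q[i] * z[i] for i in range(k)) < hi]
            if bucket:                          # keep the best vector of this interval
                X.append(min(bucket, key=lambda z: g(z, p, q, r)))
            t = hi
    return min(X, key=lambda z: g(z, p, q, r))  # STEP 4


# Illustrative instance built from T = 10, A = (1, 3, 2), S = (2, 1, 4).
p = [Fraction(8), Fraction(3), Fraction(3)]
q = [2, 1, 4]
r = [Fraction(1), Fraction(1, 3), Fraction(1, 2)]
x_eps = minimize_half_product(p, q, r, 0.5)
```

Each set $X_k$ holds at most $N + 1$ vectors, one per interval $[t, (1 + \delta)^s)$ of values of $Q_k$, which is what keeps the number of stored partial solutions polynomial.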

It was proved in [2] that

$$f(\mathbf{x}^{\varepsilon}) \le f(\mathbf{x}^{*}) + \varepsilon |f(\mathbf{x}^{*})|, \qquad (2.4)$$

where $\mathbf{x}^{*}$ is a vector minimizing $f$, and the running time of the algorithm MINIMIZE-HALF-PRODUCT is $O(m^2 \log(\sum_{i=1}^{m} q_i)/\varepsilon)$ [2].

Based on these results, we propose Algorithm 2.2 for Problem 2.4 [6].

Theorem 2.2. Algorithm 2.2 is a fully polynomial time approximation scheme for Problem 2.4.

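Algorithm 2.2 FPTAS-OPT-V($T, m, A, S, \varepsilon$)
for $i = 1$ to $m$ do
    $p_i = \frac{T - S_i}{A_i}$
    $q_i = S_i$
    $r_i = \frac{1}{A_i}$
end for
$\mathbf{x}^{\varepsilon}$ = MINIMIZE-HALF-PRODUCT($m$, $\mathbf{p}$, $\mathbf{q}$, $\mathbf{r}$, $\varepsilon$)
return $\mathbf{x}_{FPTAS}(T, \varepsilon) = \mathbf{x}^{\varepsilon}$, $V_{FPTAS}(T, \varepsilon) = \sum_{i=1}^{m} \frac{T x_i^{\varepsilon}}{A_i} - \sum_{i=1}^{m} \sum_{j=i}^{m} \frac{x_i^{\varepsilon} x_j^{\varepsilon} S_i}{A_j}$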

Proof. Since $\mathbf{x}_{FPTAS}(T, \varepsilon)$ is returned by the MINIMIZE-HALF-PRODUCT algorithm for the function $-V(\mathbf{x})$, we get from (2.4)

$$-V_{FPTAS}(T, \varepsilon) \le -V_{OPT}(T) + \varepsilon |{-V_{OPT}(T)}|. \qquad (2.5)$$

As the amount of load $V_{OPT}(T)$ is always nonnegative, this formula can be rewritten as

$$-V_{FPTAS}(T, \varepsilon) \le -V_{OPT}(T) + \varepsilon V_{OPT}(T). \qquad (2.6)$$

Hence,

$$V_{FPTAS}(T, \varepsilon) \ge V_{OPT}(T)(1 - \varepsilon). \qquad (2.7)$$

Moreover, the running time of Algorithm 2.2 is dominated by the running time of MINIMIZE-HALF-PRODUCT, and is equal to at most $O(m^2 \log(\sum_{i=1}^{m} S_i)/\varepsilon)$, which is bounded from above by $O(m^2 (\log m + \log S_{\max})/\varepsilon)$. Hence, Algorithm 2.2 is an FPTAS for Problem 2.4.

2.3 FPTAS for Problem DLS{$C_i = 0$}-Opt$_T$

The second optimization problem we will analyze is DLS{$C_i = 0$}-Opt$_T$, which can be formulated in the following way.

Problem 2.5. (DLS{$C_i = 0$}-Opt$_T$)
Given a rational load size $V > 0$, $m$ workers, their integer parameters $A_i$ and $S_i$ for $1 \le i \le m$, and provided that the bandwidths are infinite, find the smallest rational number $T_{OPT}(V) \ge 0$, such that it is possible to process the whole load $V$ within time $T_{OPT}(V)$.

To create an approximation scheme for Problem 2.5, we will use the dual approximation algorithm approach proposed in [34]. As stated in [34], a dual approximation algorithm is an algorithm which finds a superoptimal infeasible solution of a given optimization problem. The performance of the algorithm is measured by the degree of the infeasibility of the solution, controlled by a given value $\varepsilon > 0$. We will construct a dual approximation algorithm for Problem 2.4 (DLS{$C_i = 0$}-Opt$_V$). This algorithm should accept a period of time $T$ and accuracy $\varepsilon$ ($0 < \varepsilon < 1$), and deliver a schedule processing the load of size at least $V_{OPT}(T)$ in time not longer than $T(1 + \varepsilon)$. We propose the following Algorithm 2.3 [6].

Algorithm 2.3 DUAL-OPT-V($T, m, A, S, \varepsilon$)
call FPTAS-OPT-V($T, m, A, S, \varepsilon/2$)
return $\mathbf{x}_{DUAL}(T, \varepsilon) = \mathbf{x}_{FPTAS}(T, \varepsilon/2)$, $V_{DUAL}(T, \varepsilon) = (1 + \varepsilon) V_{FPTAS}(T, \varepsilon/2)$

In order to prove that Algorithm 2.3 is a dual approximation algorithm for Problem 2.4, we will use the following fact.

Proposition 2.3. If it is possible to process load of size $V$ in time $T$ using the subset of processors indicated by a binary vector $\mathbf{x} = (x_1, \dots, x_m)$, then it is also possible to process load of size $V(1 + \varepsilon)$ in time at most $T(1 + \varepsilon)$, using the same subset of processors.


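Proof. Let $V'$ denote the maximum size of load which can be processed in time $T(1 + \varepsilon)$ using the processors indicated by the vector $\mathbf{x}$. From (2.1) we obtain

$$V' = \sum_{i=1}^{m} \frac{T(1 + \varepsilon) x_i}{A_i} - \sum_{i=1}^{m} \sum_{j=i}^{m} \frac{x_i x_j S_i}{A_j} \qquad (2.8)$$

and

$$V = \sum_{i=1}^{m} \frac{T x_i}{A_i} - \sum_{i=1}^{m} \sum_{j=i}^{m} \frac{x_i x_j S_i}{A_j}. \qquad (2.9)$$

Hence,

$$V' = (1 + \varepsilon)V + \varepsilon \sum_{i=1}^{m} \sum_{j=i}^{m} \frac{x_i x_j S_i}{A_j} \ge V(1 + \varepsilon). \qquad (2.10)$$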
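Identity (2.10) can be checked on a concrete instance. This is only a sketch: the helper name `v_max` and the numbers are assumptions, and the workers are indexed by nondecreasing $S_i A_i$ as formula (2.1) requires.

```python
from fractions import Fraction


def v_max(T, x, A, S):
    """Return (V, loss), where V is formula (2.1) and loss is its double sum."""
    m = len(x)
    gain = sum(Fraction(T) * x[i] / A[i] for i in range(m))
    loss = sum(Fraction(x[i] * x[j] * S[i], A[j])
               for i in range(m) for j in range(i, m))
    return gain - loss, loss


# Illustrative instance: the subset {P_1, P_3} of three workers.
T, eps = Fraction(10), Fraction(1, 4)
A, S, x = [1, 3, 2], [2, 1, 4], (1, 0, 1)

V, loss = v_max(T, x, A, S)
V_stretched, _ = v_max(T * (1 + eps), x, A, S)  # load processed in time T(1 + eps)
```

Stretching the time limit scales only the first sum of (2.1), while the load lost to startup times stays the same, so the processed load grows by more than the factor $1 + \varepsilon$ whenever the loss term is positive.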

Note that if $T = T_{OPT}(V)$, then by Proposition 2.3 load of size $V(1 + \varepsilon)$ can be processed in time not longer than $T_{OPT}(V)(1 + \varepsilon)$. Hence, as a corollary, we can formulate the following proposition.

Proposition 2.4. For any numbers $V \ge 0$ and $\varepsilon > 0$ we have

$$T_{OPT}(V(1 + \varepsilon)) \le T_{OPT}(V)(1 + \varepsilon). \qquad (2.11)$$

We will say that an algorithm is a fully polynomial time dual approximation algorithm for a given problem if it is a dual approximation algorithm for this problem with approximation precision $\varepsilon$ and its running time is polynomial in both the problem size and $1/\varepsilon$.

Theorem 2.5. Algorithm 2.3 is a fully polynomial time dual approximation algorithm for Problem 2.4 (DLS{$C_i = 0$}-Opt$_V$).

Proof. As $V_{DUAL}(T, \varepsilon) = (1 + \varepsilon)V_{FPTAS}(T, \varepsilon/2)$ in Algorithm 2.3, we obtain from (2.7) that

$$V_{DUAL}(T, \varepsilon) \ge (1 + \varepsilon)(1 - \varepsilon/2)V_{OPT}(T) \ge V_{OPT}(T), \qquad (2.12)$$

because $\varepsilon < 1$. Thus, the obtained solution is superoptimal. The time needed to process the load of size $V_{DUAL}(T, \varepsilon)$ is at most $T(1 + \varepsilon)$ by Proposition 2.3, as it is possible to process load of size $V_{FPTAS}(T, \varepsilon/2)$ in time $T$.

The running time of Algorithm 2.3 is determined by the call to algorithm FPTAS-OPT-V, whence it is equal to at most $O(m^2 (\log m + \log S_{\max})/\varepsilon)$.

The dual approximation algorithm 2.3 is the key element of the FPTAS solving Problem 2.5 (DLS{$C_i = 0$}-Opt$_T$), given in Algorithm 2.4.

Algorithm 2.4 FPTAS-OPT-T($V, m, A, S, \varepsilon$)
$upper = S_{\max} + V A_{\max}$
$lower = 0$
$LoBo = V A_{\min}/m$
while $(upper - lower) > \frac{\varepsilon(1 - \varepsilon)}{(2 - \varepsilon)} LoBo$ do
    $T_p = (upper + lower)/2$
    call DUAL-OPT-V($T_p, m, A, S, \varepsilon$)
    if $V_{DUAL}(T_p, \varepsilon) < V(1 + \varepsilon)$ then
        $lower = T_p$
    else
        $upper = T_p$
    end if
end while
call FPTAS-OPT-V($upper, m, A, S, \varepsilon/2$)
return $\mathbf{x} = \mathbf{x}_{FPTAS}(upper, \varepsilon/2)$, $T = upper$

The idea of Algorithm 2.4 is to find a good approximation of $T_{OPT}(V)$ with a binary search. The initial search interval $[lower, upper]$ is defined by trivial lower and upper bounds for $T_{OPT}(V)$. Then, it is iteratively narrowed to its lower or upper half, depending on the results delivered by Algorithm 2.3 for the currently examined value $T_p$. When the search interval becomes short enough, the searching procedure is finished and the vector $\mathbf{x}$ representing the subset of processors which should be used for computations is obtained by Algorithm 2.2.
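The binary search described above can be sketched in Python. To keep the example short and self-contained, the call to FPTAS-OPT-V is replaced by a hypothetical exhaustive routine `exact_opt_v`, which satisfies the $(1 - \varepsilon)$ guarantee trivially but takes exponential time; a real implementation would plug in Algorithm 2.2 instead. All names and values are assumptions, and the workers are indexed by nondecreasing $S_i A_i$.

```python
from fractions import Fraction
from itertools import product


def v_of(T, x, A, S):
    """Formula (2.1) for the subset indicated by x."""
    m = len(x)
    return (sum(T * x[i] / A[i] for i in range(m))
            - sum(x[i] * x[j] * S[i] / A[j]
                  for i in range(m) for j in range(i, m)))


def exact_opt_v(T, A, S, eps):
    """Stand-in for FPTAS-OPT-V: exhaustive search over all subsets (eps is unused)."""
    x = max(product((0, 1), repeat=len(A)), key=lambda y: v_of(T, y, A, S))
    return x, v_of(T, x, A, S)


def fptas_opt_t(V, A, S, eps, opt_v=exact_opt_v):
    m = len(A)
    upper, lower = max(S) + V * max(A), Fraction(0)
    lo_bo = V * min(A) / m
    while upper - lower > eps * (1 - eps) / (2 - eps) * lo_bo:
        T_p = (upper + lower) / 2
        _, v_fptas = opt_v(T_p, A, S, eps / 2)
        v_dual = (1 + eps) * v_fptas           # value returned by Algorithm 2.3
        if v_dual < V * (1 + eps):
            lower = T_p
        else:
            upper = T_p
    x, _ = opt_v(upper, A, S, eps / 2)
    return x, upper


# Illustrative instance (values are assumptions, not from the thesis).
A = [Fraction(1), Fraction(3), Fraction(2)]
S = [Fraction(2), Fraction(1), Fraction(4)]
V, eps = Fraction(10), Fraction(1, 4)
x, T = fptas_opt_t(V, A, S, eps)
```

For this instance the optimum time is 9, obtained with all three workers, and the search stops with a value of `upper` slightly above it.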

Theorem 2.6. Algorithm 2.4 is a fully polynomial time approximation scheme for Problem 2.5 (DLS{$C_i = 0$}-Opt$_T$).

Proof. Let us start with the observation that at the beginning of the algorithm $upper$ and $lower$ are trivial upper and lower bounds for $T_{OPT}(V)$. $LoBo$ is also a lower bound on $T_{OPT}(V)$ and it is positive, since we assumed that $A_i > 0$ for $1 \le i \le m$.

First, we will analyze the variable $upper$ in order to prove that the algorithm always returns a feasible solution. At the beginning of the algorithm we have $upper = S_{\max} + V A_{\max}$. If this value is not changed in the binary search while loop, then the algorithm FPTAS-OPT-V is called for parameters $T = upper = S_{\max} + V A_{\max}$ and approximation precision $\varepsilon/2$ at the end of executing Algorithm 2.4. The obtained schedule allows for processing the load of size at least $V$, as it is enough to choose any nonempty subset of the set $\{P_1, \dots, P_m\}$ to process $V$ units of load in time $T = S_{\max} + V A_{\max}$.

Now let us assume that the value of $upper$ is changed at least once to $T_p$. This happens only if $V_{DUAL}(T_p, \varepsilon) \ge V(1 + \varepsilon)$. Therefore, as we have in Algorithm 2.3

$$V_{DUAL}(T, \varepsilon) = (1 + \varepsilon)V_{FPTAS}(T, \varepsilon/2), \qquad (2.13)$$

there holds

$$V_{FPTAS}(upper, \varepsilon/2) = V_{DUAL}(upper, \varepsilon)/(1 + \varepsilon) \ge V \qquad (2.14)$$

at any time during the execution of Algorithm 2.4. Hence, the solution obtained by the algorithm FPTAS-OPT-T is always feasible.

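Now let us estimate the quality of the obtained solution. We will show that

$$lower < T_{OPT}(V)\left(1 + \frac{\varepsilon}{2 - \varepsilon}\right) \qquad (2.15)$$

throughout the execution of the program. Since initially $lower = 0$, this condition is true before entering the while loop. The value of variable $lower$ is changed to $T_p$ only when $V_{DUAL}(T_p, \varepsilon) < V(1 + \varepsilon)$. It follows from (2.13) that

$$(1 + \varepsilon)V_{FPTAS}(lower, \varepsilon/2) < V(1 + \varepsilon). \qquad (2.16)$$

Furthermore, from (2.7) we get

$$(1 + \varepsilon)V_{OPT}(lower)(1 - \varepsilon/2) < V(1 + \varepsilon), \qquad (2.17)$$

$$V_{OPT}(lower) < V/(1 - \varepsilon/2) \qquad (2.18)$$

and finally

$$V_{OPT}(lower) < V\left(1 + \frac{\varepsilon}{2 - \varepsilon}\right). \qquad (2.19)$$

Thus, it is impossible to process load $V(1 + \frac{\varepsilon}{2 - \varepsilon})$ in time $lower$. Hence,

$$lower < T_{OPT}\left(V\left(1 + \frac{\varepsilon}{2 - \varepsilon}\right)\right). \qquad (2.20)$$

By Proposition 2.4 we have

$$T_{OPT}\left(V\left(1 + \frac{\varepsilon}{2 - \varepsilon}\right)\right) \le T_{OPT}(V)\left(1 + \frac{\varepsilon}{2 - \varepsilon}\right), \qquad (2.21)$$

which proves that (2.15) is true during the binary search.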

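The binary search is finished when $upper \le lower + \frac{\varepsilon(1 - \varepsilon)}{(2 - \varepsilon)} LoBo$. Since $LoBo \le T_{OPT}(V)$, by (2.15) we get

$$upper \le T_{OPT}(V)\left(1 + \frac{\varepsilon}{2 - \varepsilon}\right) + \frac{\varepsilon(1 - \varepsilon)}{(2 - \varepsilon)} T_{OPT}(V) \qquad (2.22)$$

and consequently

$$upper \le T_{OPT}(V)(1 + \varepsilon). \qquad (2.23)$$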


of the problem.

The number of iterations in the binary search is at most equal to $O(\log((S_{\max} + V A_{\max})/(\frac{\varepsilon(1 - \varepsilon)}{(2 - \varepsilon)} V A_{\min}/m)))$, which is bounded from above by $O(\log m + \log S_{\max} + \log A_{\max} + \log(1/\varepsilon) + \max(\log V, \log(1/V)))$. The execution time of each iteration is $O(m^2 (\log m + \log S_{\max})/\varepsilon)$ due to calling Algorithm 2.3. Thus, the running time of the whole algorithm FPTAS-OPT-T is at most $O((\log m + \log S_{\max} + \log A_{\max} + \log(1/\varepsilon) + \max(\log V, \log(1/V))) m^2 (\log m + \log S_{\max})/\varepsilon)$.

2.4 Communi ation Sequen e for Problem

DLS{1Round}

It would be desirable to extend the approximability results presented in the preceding sections to problems DLS{1Round}-Opt$_V$ and DLS{1Round}-Opt$_T$. Note that DLS{1Round, $C_i = 0$} is a selection problem. This means that it is computationally hard to select the set $\mathcal{P}'$ of participating processors, but for a given $\mathcal{P}'$ the optimum activation sequence is known. Moreover, this feature allowed for the construction of an FPTAS selecting the set $\mathcal{P}'$ of participating processors. The main difficulty in problem DLS{1Round} is that for instances with $C_i > 0$, the optimum order of activating the processors is not known. Therefore, the scheduling problems cannot be reduced to just choosing the processors which should take part in computations. Let us recall that a general method of ordering processors should cover the following special cases:

• ordering processors according to nondecreasing values $S_iA_i$ if all $C_i$ are equal to zero,

• ordering processors according to nondecreasing values $C_i$ if all $S_i$ are equal to zero,

be processed or the time $T$ used for processing is large enough.

Let us analyze the activation sequence for a problem DLS{1Round}-Opt$_V$ instance with $m = 3$. We will compare the amounts of load which can be processed for activation sequences $\sigma' = (1, 2, 3)$ and $\sigma'' = (2, 1, 3)$. In both cases we assume that all processors finish computations at time $T$, as this is true in the optimum schedule. It is also assumed that the time $T$ is so large that all processors $P_1, P_2, P_3$ should take part in the computations in the optimum schedule. Let $\alpha'_i$, $\alpha''_i$ denote the sizes of the $i$-th piece of load sent for activation sequences $\sigma'$ and $\sigma''$, correspondingly. The sizes of the first two parts of load, sent to processors $P_1$ and $P_2$ for communication sequence $\sigma'$, are equal to

$$\alpha'_1 = \frac{T - S_1}{C_1 + A_1} \tag{2.24}$$

and

$$\alpha'_2 = \frac{T - S_1 - C_1\alpha'_1 - S_2}{C_2 + A_2}, \tag{2.25}$$

which gives

$$\alpha'_2 = \frac{A_1(T - S_1)}{(C_1 + A_1)(C_2 + A_2)} - \frac{S_2}{C_2 + A_2}. \tag{2.26}$$

Similarly, for communication sequence $\sigma''$, the sizes of the first two pieces of load, sent to processors $P_2$ and $P_1$ correspondingly, are equal to

$$\alpha''_1 = \frac{T - S_2}{C_2 + A_2} \tag{2.27}$$

and

$$\alpha''_2 = \frac{A_2(T - S_2)}{(C_1 + A_1)(C_2 + A_2)} - \frac{S_1}{C_1 + A_1}. \tag{2.28}$$

Let us observe that the time needed for sending the first two pieces of load may be different for activation sequences $\sigma'$ and $\sigma''$. Therefore, the amount of


$$t' = S_1 + S_2 + C_1\frac{T - S_1}{C_1 + A_1} + C_2\left(\frac{A_1(T - S_1)}{(C_1 + A_1)(C_2 + A_2)} - \frac{S_2}{C_2 + A_2}\right) \tag{2.29}$$

if the activation sequence is $\sigma'$, and in time

$$t'' = S_1 + S_2 + C_2\frac{T - S_2}{C_2 + A_2} + C_1\left(\frac{A_2(T - S_2)}{(C_1 + A_1)(C_2 + A_2)} - \frac{S_1}{C_1 + A_1}\right) \tag{2.30}$$

if the activation sequence is $\sigma''$. From (2.29) and (2.30) we obtain

$$\Delta t = t' - t'' = \frac{C_1A_2S_2 - C_2A_1S_1}{(C_1 + A_1)(C_2 + A_2)}. \tag{2.31}$$

Let $t'_3$ and $t''_3$ be the amounts of time used for communication and computations of processor $P_3$ for sequences $\sigma'$ and $\sigma''$. Note that

$$t''_3 - t'_3 = \Delta t. \tag{2.32}$$

Therefore,

$$\alpha''_3 - \alpha'_3 = \frac{\Delta t}{C_3 + A_3}. \tag{2.33}$$

From equations (2.24)-(2.28) and (2.33), we can compute the difference between the amounts of load processed in both schedules:

$$\Delta V = \sum_{i=1}^{3}\alpha''_i - \sum_{i=1}^{3}\alpha'_i = \frac{T(C_1 - C_2) + A_1S_1 - A_2S_2}{(C_1 + A_1)(C_2 + A_2)} + \frac{C_1A_2S_2 - C_2A_1S_1}{(C_1 + A_1)(C_2 + A_2)(C_3 + A_3)}. \tag{2.34}$$
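As a sanity check of (2.34), the two schedules can be simulated directly and compared with the closed form. The sketch below is only an illustration: the three-processor instance and the function names are hypothetical, and the simulation assumes, as in the text, that every processor finishes exactly at time $T$.

```python
def processed_load(sequence, T, A, C, S):
    """Total load processed when processors are activated in `sequence`
    and every activated processor finishes computing exactly at time T."""
    remaining = T  # time left until T when the next message starts
    total = 0.0
    for i in sequence:
        alpha = (remaining - S[i]) / (A[i] + C[i])  # chunk size for P_i
        total += alpha
        remaining -= S[i] + C[i] * alpha  # the link is busy during sending
    return total


def delta_v_closed_form(T, A, C, S):
    """Right-hand side of (2.34) for processors P_1, P_2, P_3."""
    d12 = (C[1] + A[1]) * (C[2] + A[2])
    first = (T * (C[1] - C[2]) + A[1] * S[1] - A[2] * S[2]) / d12
    second = (C[1] * A[2] * S[2] - C[2] * A[1] * S[1]) / (d12 * (C[3] + A[3]))
    return first + second


# Hypothetical instance (not from the thesis), keyed by processor index.
A = {1: 0.5, 2: 2.0, 3: 1.0}
C = {1: 0.2, 2: 0.1, 3: 0.3}
S = {1: 3.0, 2: 1.0, 3: 2.0}
T = 100.0

simulated = (processed_load((2, 1, 3), T, A, C, S)
             - processed_load((1, 2, 3), T, A, C, S))
closed = delta_v_closed_form(T, A, C, S)
print(simulated, closed)
```

Changing only `A[3]` or `C[3]` rescales the second summand, which is how the parameters of $P_3$ can influence the preferred order of $P_1$ and $P_2$.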

It can be seen that the sign of $\Delta V$ depends not only on the parameters of processors $P_1$ and $P_2$, but also on $A_3$ and $C_3$. Similarly, for $m > 3$ the order in which the first two processors should be activated depends on the parameters of all the


activate the processors, because the decision how to sequence, e.g., $P_1$, $P_2$ cannot be confined to just $P_1$, $P_2$. The first summand in formula (2.34) may suggest sorting the processors according to nondecreasing values of $TC_i + A_iS_i$. Such an algorithm would properly handle the special cases mentioned at the beginning of this section.

However, consider the following counterexample. Let $T = 700$, $m = 4$, and let the parameters of the processors be as given in Table 2.1.

Table 2.1: Processor parameters for the counterexample.

$i$    $A_i$    $C_i$    $S_i$      $TC_i + A_iS_i$ for $T = 700$
1      0.051    0.129    137.084    97.291284
2      2.146    0.050    34.487     109.009102
3      0.654    0.458    31.565     341.243510
4      1.838    0.152    32.747     166.588986

The amounts of load which can be processed for all activation sequences are given in Table 2.2. If the processors are sorted according to nondecreasing values of $TC_i + A_iS_i$, we obtain communication sequence (1,2,4,3) and the size of processed load is about 3275.0461. On the other hand, the optimum communication sequence is (2,1,4,3), which allows for processing the load of size approximately 3276.4212. Thus, the analyzed algorithm does not deliver the optimum communication sequence.
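The counterexample can be re-checked with a few lines of code. The following sketch uses the data of Table 2.1 and assumes that each activated processor finishes exactly at time $T$; the helper name `processed_load` is an illustrative choice, not taken from the thesis. It compares the sequence produced by sorting on $TC_i + A_iS_i$ with an exhaustive search over all $4! = 24$ sequences.

```python
from itertools import permutations

# Instance from Table 2.1, keyed by processor index.
T = 700.0
A = {1: 0.051, 2: 2.146, 3: 0.654, 4: 1.838}
C = {1: 0.129, 2: 0.050, 3: 0.458, 4: 0.152}
S = {1: 137.084, 2: 34.487, 3: 31.565, 4: 32.747}


def processed_load(sequence):
    """Load processed in time T for the given activation sequence."""
    remaining = T  # time left when the next message starts
    total = 0.0
    for i in sequence:
        alpha = (remaining - S[i]) / (A[i] + C[i])
        total += alpha
        remaining -= S[i] + C[i] * alpha  # startup plus transfer time
    return total


# Sorting rule suggested by the first summand of (2.34).
sorted_sequence = tuple(sorted(A, key=lambda i: T * C[i] + A[i] * S[i]))
# Exhaustive search over all activation sequences.
best_sequence = max(permutations(A), key=processed_load)

print(sorted_sequence, round(processed_load(sorted_sequence), 4))
print(best_sequence, round(processed_load(best_sequence), 4))
```

Exhaustive search is of course only viable for very small $m$; it is used here purely to confirm the table entries.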

Another approach to selecting the best communication sequence is to start from the initial sequence $(1, 2, \ldots, m)$, and improve it by changing the positions of some processors. Let us assume that it is allowed to perform two operations on the communication sequence: swap a pair of processors or move a single processor to another place in the sequence. Only the moves increasing processed load $V$ for the given schedule length $T$ can be made. However, for the instance given above, the amount of load processed for communication sequence $\sigma_1 = (1, 2, 3, 4)$ is approximately 3276.0243 (see Table 2.2). The only communication sequence for which it is possible to process larger load is the optimum sequence $\sigma_2 = (2, 1, 4, 3)$. Yet, it is impossible to obtain this solution by the moves described above, as any allowed change to $\sigma_1$ results in decreasing the amount of processed load, and hence cannot be accepted.

Table 2.2: The size $V$ of load processed for different activation sequences in the counterexample (rounded to 4 digits after the decimal point).

Sequence     Processed load $V$     Sequence     Processed load $V$
(1,2,3,4)    3276.0243              (1,2,4,3)    3275.0461
(1,3,2,4)    3264.4671              (1,3,4,2)    3265.8734
(1,4,2,3)    3272.7902              (1,4,3,2)    3275.0848
(2,1,3,4)    3274.1818              (2,1,4,3)    3276.4212
(2,3,1,4)    2135.6348              (2,3,4,1)    1963.7528
(2,4,1,3)    3102.9726              (2,4,3,1)    2097.1445
(3,1,2,4)    2040.9951              (3,1,4,2)    2044.6016
(3,2,1,4)    1963.8495              (3,2,4,1)    1792.8317
(3,4,1,2)    1879.3648              (3,4,2,1)    1776.3430
(4,1,2,3)    3104.2910              (4,1,3,2)    3103.4595
(4,2,1,3)    3078.8408              (4,2,3,1)    2076.1120
(4,3,1,2)    2021.0451              (4,3,2,1)    1920.2297
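The local-optimum argument can be illustrated by a small improvement procedure. This is a sketch under assumptions: the neighbourhood consists of all single swaps and all single-processor moves, the best improving neighbour is taken at each step, and the names `neighbours` and `local_search` are hypothetical. The instance is the one from Table 2.1.

```python
T = 700.0
A = {1: 0.051, 2: 2.146, 3: 0.654, 4: 1.838}
C = {1: 0.129, 2: 0.050, 3: 0.458, 4: 0.152}
S = {1: 137.084, 2: 34.487, 3: 31.565, 4: 32.747}


def processed_load(sequence):
    """Load processed in time T for the given activation sequence."""
    remaining, total = T, 0.0
    for i in sequence:
        alpha = (remaining - S[i]) / (A[i] + C[i])
        total += alpha
        remaining -= S[i] + C[i] * alpha
    return total


def neighbours(seq):
    """Sequences reachable by one swap or by moving one processor."""
    seq = tuple(seq)
    n = len(seq)
    result = set()
    for a in range(n):            # swap a pair of processors
        for b in range(a + 1, n):
            swapped = list(seq)
            swapped[a], swapped[b] = swapped[b], swapped[a]
            result.add(tuple(swapped))
    for a in range(n):            # move one processor elsewhere
        rest = seq[:a] + seq[a + 1:]
        for b in range(n):
            result.add(rest[:b] + (seq[a],) + rest[b:])
    result.discard(seq)
    return result


def local_search(start):
    """Accept only moves that strictly increase the processed load."""
    current = tuple(start)
    while True:
        better = [s for s in neighbours(current)
                  if processed_load(s) > processed_load(current)]
        if not better:
            return current
        current = max(better, key=processed_load)


print(local_search((1, 2, 3, 4)))
```

Started from $(1, 2, 3, 4)$ the procedure stops immediately, even though $(2, 1, 4, 3)$ processes more load, because reaching it would require at least two moves, the first of which decreases $V$.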

The above counterexample proves not only that the described type of greedy algorithms is not capable of solving our problem, but also that it is impossible to find the optimum activation sequence by simply sorting the processors according to some combination of instance parameters. Indeed, note that the communication sequence (1,2,3,4) is better than (1,2,4,3) and the sequence (2,1,4,3) is better than (2,1,3,4). This shows that depending on the amount of time left for processing on $P_3$ and $P_4$, it is better to activate one or the other processor earlier. Thus, the order in which processors $P_3$ and $P_4$ should be activated depends on the parameters of processors activated before them. Consequently, it is not possible to determine the communication sequence locally, without taking into account the sequence of other processors.

Moreover, for the above instance, the load processed by $P_1$ if it is activated first is much greater than the load processed by $P_2$ in the case when the activation sequence starts with 2. Still, in the optimum solution processor $P_2$ should receive load before $P_1$. Thus, a greedy algorithm, always appending to the communication sequence the processor which can process the greatest amount of load, also does not deliver the optimum solution.

Finally, it can be conjectured that DLS{1Round} is not a selection problem.

2.5 Approximation Algorithms for Problem DLS{1Round}

Without knowing how to order the processors taking part in the computations for problem DLS{1Round}, we are not able to create similar approximation schemes as for problem DLS{$C_i = 0$}. Therefore, we present several algorithms with approximation ratio bounded but dependent on the instance parameters.

2.5.1 Problem DLS{1Round}-Opt$_V$

The simplest method of creating a solution of problem DLS{1Round}-Opt$_V$ is to send the whole load to a single processor only. The size of the load processed by a single processor $P_i$ in time $T$ is equal to $(T - S_i)/(A_i + C_i)$. Thus, we select the processor for which this value is the greatest, as shown in Algorithm 2.5.

Algorithm 2.5 SINGLE-PROCESSOR-OPT-V($T, m, A, C, S$)
  $j = 1$
  for $i = 2$ to $m$ do
    if $(T - S_i)/(A_i + C_i) > (T - S_j)/(A_j + C_j)$ then
      $j = i$
    end if
  end for
  return $\sigma = (j)$, $V = (T - S_j)/(A_j + C_j)$
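A direct transcription of Algorithm 2.5 might look as follows. It is a sketch, not the thesis implementation: lists are 0-based (index `j` stands for $P_{j+1}$), the parameter $m$ is taken as `len(A)`, and the function name is an assumption.

```python
def single_processor_opt_v(T, A, C, S):
    """Select the single processor that processes the most load in time T.

    A, C, S are lists of equal length; index j corresponds to P_{j+1}.
    Returns the one-element activation sequence and the processed load.
    """
    def load(i):
        return (T - S[i]) / (A[i] + C[i])

    # max() keeps the first maximiser, like the strict '>' in Algorithm 2.5.
    j = max(range(len(A)), key=load)
    return (j,), load(j)


# Instance with A_i = 1, C_i = S_i = 0: the m processors together could
# process m * T, but a single processor processes only T.
m, T = 5, 10.0
sigma, V = single_processor_opt_v(T, [1.0] * m, [0.0] * m, [0.0] * m)
print(sigma, V, m * T / V)
```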


Note that in the optimum schedule at least one processor $P_i$ must process load of size at least $V_{OPT}(T)/m$ (in given time $T$). Hence, Algorithm 2.5 delivers a solution processing load of size at least $V_{OPT}(T)/m$ and is an approximation algorithm with relative performance guarantee $m$. Note that this bound is tight. Consider an instance with $A_i = 1$, $C_i = S_i = 0$ for $i = 1, \ldots, m$. In the optimum solution, all processors are activated and they process load of size $mT$. In the solution delivered by Algorithm 2.5 only one processor is activated and the size of the load is $T$. The running time of Algorithm 2.5 is $O(m)$.

The above approach can be extended by analyzing all communication sequences of length $k$ for some constant $k \le m$. Similarly as before, we observe that if the optimum solution of the problem activates at least $k$ processors, then it must contain a group of $k$ processors which together process load of size at least $kV_{OPT}(T)/m$. Hence, an algorithm enumerating all possible communication sequences of length $k$ delivers a solution with relative performance guarantee $m/k$, provided that the optimum solution of the instance of the problem uses at least $k$ processors. Unfortunately, the complexity of such an algorithm is $O(m^k)$ and it grows exponentially with the relative performance guarantee.
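The enumeration of all communication sequences of length $k$ can be sketched as below. The code is illustrative only: it assumes that every activated processor finishes at time $T$, it skips sequences in which some processor would not have time to receive any load (an assumption of this sketch), and the function names are hypothetical. It is exercised on the instance of Table 2.1 with 0-based indices.

```python
from itertools import permutations


def processed_load(sequence, T, A, C, S):
    """Load processed in time T, or None if some processor gets no load."""
    remaining, total = T, 0.0
    for i in sequence:
        if remaining <= S[i]:
            return None  # not enough time even for the startup
        alpha = (remaining - S[i]) / (A[i] + C[i])
        total += alpha
        remaining -= S[i] + C[i] * alpha
    return total


def best_sequence_of_length(k, T, A, C, S):
    """Try all O(m^k) sequences of k distinct processors, keep the best."""
    best_seq, best_v = None, 0.0
    for seq in permutations(range(len(A)), k):
        v = processed_load(seq, T, A, C, S)
        if v is not None and v > best_v:
            best_seq, best_v = seq, v
    return best_seq, best_v


# Instance from Table 2.1; index 0 stands for P_1, index 1 for P_2, etc.
T = 700.0
A = [0.051, 2.146, 0.654, 1.838]
C = [0.129, 0.050, 0.458, 0.152]
S = [137.084, 34.487, 31.565, 32.747]

for k in range(1, 5):
    print(k, best_sequence_of_length(k, T, A, C, S))
```

For $k = 1$ this reduces to Algorithm 2.5, and for $k = m$ it becomes an exhaustive search.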

Algorithm 2.5 can also be extended to a greedy Algorithm 2.6, selecting the processors in the communication sequence one by one. As long as it is possible to append a processor to the communication sequence, the processor which can process the greatest load is chosen.

Algorithm 2.6 GREEDY-OPT-V($T, m, A, C, S$)
  $\sigma = ()$
  $V = 0$
  $j = 1$
  while $j \ne 0$ do
    $j = 0$
    for $i = 1$ to $m$ do
      if $S_i < T$ and $i$ is not contained in $\sigma$ then
        if $j = 0$ or $(T - S_i)/(A_i + C_i) > (T - S_j)/(A_j + C_j)$ then
          $j = i$
        end if
      end if
    end for
    if $j \ne 0$ then
      $\sigma = \sigma|j$   {concatenation of $\sigma$ and $j$}
      $V = V + (T - S_j)/(A_j + C_j)$
      $T = T - S_j - C_j(T - S_j)/(A_j + C_j)$
    end if
  end while
  return $\sigma$, $V$

The running time of Algorithm 2.6 is $O(m^2)$. The results delivered by this algorithm are not worse than for Algorithm 2.5. Still, the performance guarantee $m$ is tight. Indeed, consider the following problem instance. Let $A_1 = 1 - \varepsilon$, $C_1 = T - 1$, $S_1 = 0$, and $A_i = T$, $C_i = 0$, $S_i = 0$ for $i = 2, \ldots, m$, where $0 < \varepsilon < 1$ is a small constant. Processor $P_1$ can process load of size $\frac{T}{A_1 + C_1} = \frac{T}{T - \varepsilon} > 1$ in time $T$. For $i \ge 2$, processor $P_i$ is capable of processing load of size $\frac{T}{A_i} = 1$ in time $T$. Hence, Algorithm 2.6 will choose processor $P_1$ to obtain the first load chunk. Sending data to processor $P_1$ will take time $T_1 = C_1\frac{T}{T - \varepsilon} = (T - 1)\frac{T}{T - \varepsilon}$. The remaining processors $P_i$ will be activated afterwards and each of them will obtain the load of size $(T - T_1)/A_i = \frac{T - (T - 1)\frac{T}{T - \varepsilon}}{T} = 1 - \frac{T - 1}{T - \varepsilon} = \frac{1 - \varepsilon}{T - \varepsilon}$. Thus, the total size of the processed load will be $V_1 = \frac{T + (m - 1)(1 - \varepsilon)}{T - \varepsilon}$.

On the other hand, if processor $P_1$ is activated as the last one, then each of processors $P_2, \ldots, P_m$ receives load of size 1. The time left for communication and computation on $P_1$ is still $T$, and $P_1$ processes load of size $\frac{T}{T - \varepsilon}$. The whole processed load has size $V_2 = m - 1 + \frac{T}{T - \varepsilon}$. Thus, we have $\frac{V_2}{V_1} = \frac{mT - \varepsilon(m - 1)}{(m - 1)(1 - \varepsilon) + T}$ and $\lim_{T \to \infty}\frac{V_2}{V_1} = m$.
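The greedy rule of Algorithm 2.6 and the tight instance discussed above can be reproduced with the following sketch. It is not the thesis code: indices are 0-based, the variable `remaining` plays the role of the shrinking $T$, and the concrete values $m = 4$, $T = 1000$, $\varepsilon = 0.01$ are illustrative assumptions.

```python
def greedy_opt_v(T, A, C, S):
    """Greedy activation: repeatedly append the processor that can process
    the most load in the time still available (0-based indices)."""
    sigma, V = [], 0.0
    remaining = T  # time left when the next message could start
    while True:
        candidates = [i for i in range(len(A))
                      if i not in sigma and S[i] < remaining]
        if not candidates:
            return tuple(sigma), V
        j = max(candidates,
                key=lambda i: (remaining - S[i]) / (A[i] + C[i]))
        alpha = (remaining - S[j]) / (A[j] + C[j])
        sigma.append(j)
        V += alpha
        remaining -= S[j] + C[j] * alpha


# Tight instance from the text with illustrative m, T and eps.
m, T, eps = 4, 1000.0, 0.01
A = [1 - eps] + [T] * (m - 1)
C = [T - 1] + [0.0] * (m - 1)
S = [0.0] * m

sigma, V1 = greedy_opt_v(T, A, C, S)
V2 = (m - 1) + T / (T - eps)  # load when P_1 is activated last
print(sigma, V1, V2, V2 / V1)
```

Increasing `T` in this instance pushes the ratio `V2 / V1` towards $m$, in line with the limit stated above.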

The quality of the results obtained by Algorithm 2.6 in comparison to

[Figure 2.2: two plots against $T$ ranging from 1E0 to 1E4. Panel a) shows the series AVG and MAX; panel b) shows the series Alg. 2.5 AVG, Alg. 2.5 WRST, Alg. 2.6 AVG and Alg. 2.6 WRST.]

Figure 2.2: Experimental results for the first set of instances (slow communication). a) Number of processors used by Algorithm 2.6. b) Quality of the solutions obtained by Algorithms 2.5 and 2.6.

between the two algorithms we tested them on sets of random instances. Each instance in the first set had $m = 100$ processors, and their parameters $A_i$, $C_i$, $S_i$ were chosen randomly from the interval $[0, 1]$. For each generated set of processors, 5 instances were created, with $T = 1, 10, 100, 1000, 10000$. The quality of the obtained solutions was measured as the quotient $\frac{V_a}{UpBo}$, where $V_a$ is the amount of load returned by the tested algorithm, and $UpBo$ is the upper bound on the size of processed load, calculated as $\sum_{i=1}^{m}\frac{T - S_i}{A_i + C_i}$. The results of the experiments are presented in Fig. 2.2. The number of processors used by the greedy Algorithm 2.6 depends on $T$ (see Fig. 2.2a). Although for each value of $T$ there were instances for which only one processor was used, the average and the maximum number of used processors (denoted by AVG and MAX in Fig. 2.2a, correspondingly) increase with $T$. Despite this, the performance of Algorithm 2.6 does not change much with growing $T$ (cf. Fig. 2.2b), both on average (denoted AVG) and in the worst case (denoted WRST). This can be explained by the fact that the processors activated as the last ones receive only very small amounts of load. Moreover, when startup times $S_i$ are small in comparison to $T$, then the amounts of load processed by a single processor or a fixed group of processors increase roughly linearly with $T$. Hence, the quality of the results obtained by both algorithms is almost constant in relation to the upper bound when $T$ grows beyond 100. Note that it is much better than the worst-case estimate $\frac{1}{m} = 0.01$. It can also be seen in Fig. 2.2b that on average Algorithm 2.6 delivers solutions about 1.5 times better than Algorithm 2.5.

Table 2.3 (second set of instances, fast communication, $T = 10000$):

          Algorithm 2.5    Algorithm 2.6
AVG       0.271073         0.855082
WRST      0.061405         0.598274
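The quality measure $V_a / UpBo$ can be computed as in the sketch below. It is a rough re-creation under stated assumptions and does not reproduce the reported numbers: it draws a single random processor set with a hypothetical seed, samples $A_i$ from $(0, 1]$ rather than $[0, 1]$ to avoid a zero denominator, and uses illustrative function names.

```python
import random


def upper_bound(T, A, C, S):
    """UpBo: every processor treated as if it had the link to itself."""
    return sum((T - s) / (a + c) for a, c, s in zip(A, C, S))


def single_processor_load(T, A, C, S):
    """Load found by the single-processor rule of Algorithm 2.5."""
    return max((T - s) / (a + c) for a, c, s in zip(A, C, S))


def greedy_load(T, A, C, S):
    """Load and number of processors used by the greedy rule (Alg. 2.6)."""
    used, total, remaining = set(), 0.0, T
    while True:
        candidates = [i for i in range(len(A))
                      if i not in used and S[i] < remaining]
        if not candidates:
            return total, len(used)
        j = max(candidates,
                key=lambda i: (remaining - S[i]) / (A[i] + C[i]))
        alpha = (remaining - S[j]) / (A[j] + C[j])
        used.add(j)
        total += alpha
        remaining -= S[j] + C[j] * alpha


rng = random.Random(0)  # hypothetical seed, for repeatability only
m = 100
A = [1.0 - rng.random() for _ in range(m)]  # in (0, 1]
C = [rng.random() for _ in range(m)]        # in [0, 1)
S = [rng.random() for _ in range(m)]        # in [0, 1)

results = {}
for T in (1, 10, 100, 1000, 10000):
    ub = upper_bound(T, A, C, S)
    v_greedy, n_used = greedy_load(T, A, C, S)
    results[T] = (single_processor_load(T, A, C, S) / ub,
                  v_greedy / ub, n_used)
    print(T, results[T])
```

Averaging such quotients over many generated processor sets would give figures comparable in kind to the AVG and WRST series.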

The above results can be explained by the fact that the communication parameters $C_i$, $S_i$ were chosen from the same range as $A_i$. The time necessary to send a chunk of data was quite big and only a small number of processors could be activated. Therefore, we created another set of instances, where parameters $C_i$ and $S_i$ were chosen randomly from the interval $[0, 0.001]$. The remaining parameters were selected as in the previous set. Since the startup times $S_i$ were very small in comparison to all used values of $T$, the quality of the obtained solutions hardly changed with $T$. Therefore, we present only the average and the worst performance of both algorithms for $T = 10000$ in Table 2.3. The number of processors used by Algorithm 2.6 was $m = 100$ for all instances in this set. Therefore, the difference between the results obtained by Algorithms 2.5 and 2.6 is greater than for the previous set of instances, for which at most 26 processors were used by the greedy algorithm. The quality of the results of both algorithms is better than for the previous instance set. On average, Algorithm 2.5 allows for processing load of size greater than 27% of the upper bound and Algorithm 2.6 greater than 85%.

We conclude that the difference in the quality of the results obtained by

good. However, this can be the effect of the used measure of quality. When communication is slow, the upper bound we calculated may be much greater than the optimum solution. If communication is fast in comparison to computations, then the results obtained by both algorithms get better. The difference between the results of Algorithms 2.5 and 2.6 is increasing and the greedy Algorithm 2.6 delivers solutions of very good quality.

2.5.2 Problem DLS{1Round}-Opt$_T$

In order to create an approximation algorithm for problem DLS{1Round}-Opt$_T$, we can, similarly as in Algorithm 2.5, consider only communication sequences of length 1. This approach is used in Algorithm 2.7.

Algorithm 2.7 SINGLE-PROCESSOR-OPT-T($V, m, A, C, S$)
  $j = 1$
  for $i = 2$ to $m$ do
    if $S_i + (A_i + C_i)V < S_j + (A_j + C_j)V$ then
      $j = i$
    end if
  end for
  return $\sigma = (j)$, $T = S_j + (A_j + C_j)V$

Note that if processor $P_i$ needs time $T$ to process the load of size $V$, then it cannot process the load of size $V/m$ faster than in time $T/m$. As in the optimum solution at least one processor has to receive load of size at least $V/m$, Algorithm 2.7 returns time $T \le mT_{OPT}(V)$. Observe that this bound is tight. Consider an instance with $A_i = 1$, $C_i = S_i = 0$ for $i = 1, \ldots, m$. In the optimum solution, all processors are activated and they process load $V$ in time $\frac{V}{m}$. In the solution delivered by Algorithm 2.7 only one processor is activated and it needs time $V$ to process the whole load. The running time of Algorithm 2.7 is $O(m)$.
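Algorithm 2.7 translates almost line by line into code. The version below is a sketch with 0-based indices (index `j` stands for $P_{j+1}$) and a hypothetical function name; the example instance is the tight one from the paragraph above, with illustrative values of $m$ and $V$.

```python
def single_processor_opt_t(V, A, C, S):
    """Select the single processor that processes load V in the least time.

    Returns the one-element activation sequence and the schedule length.
    """
    def time_needed(i):
        return S[i] + (A[i] + C[i]) * V

    # min() keeps the first minimiser, like the strict '<' in Algorithm 2.7.
    j = min(range(len(A)), key=time_needed)
    return (j,), time_needed(j)


# Tight instance: A_i = 1, C_i = S_i = 0. Using all m processors takes
# V / m, whereas a single processor needs time V.
m, V = 5, 10.0
sigma, T_single = single_processor_opt_t(V, [1.0] * m, [0.0] * m, [0.0] * m)
print(sigma, T_single, T_single / (V / m))
```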

2.6 Conclusions

In this chapter we analyzed single-round divisible load scheduling in star networks. We proposed fully polynomial time approximation schemes for problems DLS{$C_i = 0$}-Opt$_V$ and DLS{$C_i = 0$}-Opt$_T$. As a by-product, a fully polynomial time dual approximation algorithm was designed for the first problem. We also analyzed the scheduling problems in the system with finite bandwidths (i.e. when $C_i > 0$). The order in which the processors should be activated was studied as the main obstacle in creating approximation algorithms for this case. Unfortunately, we showed that some classes of processor sequencing algorithms cannot be used to solve this problem. We conjecture that constructing the optimum sequence can be computationally hard, and DLS{1Round} is not a selection problem. Finally, we proposed simple approximation algorithms giving tight relative performance guarantee $m$ for problem DLS{1Round}-Opt$_V$ and for problem DLS{1Round}-Opt$_T$.

3 Multi-Round Processing with Limited Memory

The single-round organization of computations has several disadvantages. Firstly, the communication delays may be very long, while no computations can be started until the first processor receives the whole amount of load assigned to it. Secondly, in practice the whole load $V$ is often too big to be stored in the memories of worker processors at the same time. In such a case it is impossible to create a single-round schedule. It would be more profitable to send the load in many small pieces (chunks), so that computations start earlier and the chunks fit in computer memories. Consequently, computations could interleave with communications.

In this chapter we study multi-round divisible load scheduling in systems with limited memory. We analyze the star network topology described in Chapter 2. To take into account memory limitations, we introduce one more parameter characterizing each processor $P_i$. Namely, $B_i$ is the size of the memory buffer available on $P_i$ (e.g. in bytes). Our goal is to find a schedule processing the load of a given size in the shortest possible time. As each processor can receive many messages, there are more scheduling decisions to be made than in the case of single-round processing:

• The set $\mathcal{P}' \subseteq \mathcal{P}$ of processors participating in the computations must be chosen.


much larger than the number of processors $m$.

• The communication sequence must be chosen. For multi-round processing, the communication sequence is an arbitrary sequence whose elements are indices of processors from the set $\mathcal{P}'$.

• The sizes of the load parts sent in each message must be selected.

We start our considerations with a short summary of the previous work on multi-round divisible load scheduling. In Section 3.2 we describe the mathematical model used in this chapter. As our scheduling problem is known to be computationally hard, we propose an exponential Branch&Bound algorithm and a genetic algorithm in Section 3.3. We use the genetic algorithm not only as a metaheuristic solving the scheduling problem, but also to gather information about the features of good quality solutions. The results obtained from an extensive experimental study, as well as some analytical results, are presented in Section 3.4. Based on this information, in Section 3.5 we propose several classes of scheduling heuristics. We analyze and compare them, exposing their advantages and weaknesses.

3.1 Earlier Results

Scheduling divisible loads in systems with limited memory was first analyzed in [37]. The authors considered single-round schedules only, hence they assumed that the whole load fits in the memory buffers of the workers. Other assumptions were that all processors take part in the computations and that the activation sequence is given. The communication delay model was linear ($S_i = 0$ for $1 \le i \le m$). A fast heuristic called Incremental Balancing Strategy was proposed. This algorithm did not always deliver optimum solutions, which was shown in [30].

A more general affine communication delay model was studied in [30]. A


given activation sequence. Choosing the optimum set $\mathcal{P}'$ of processors taking part in the computations in systems with limited memory and affine communication model has been shown to be NP-hard in [31] and strongly NP-hard in [4]. In [31] the authors proposed and evaluated experimentally a Branch&Bound algorithm and several heuristics for single-round scheduling with limited memory.

Multi-round divisible load scheduling with limited memory was first studied in [29]. Only the size of the chunk currently processed by a given processor was subject to the memory limit. The sizes of load parts arriving in the background of computations were not taken into account. A more detailed memory model, in which memory limits affected all chunks of data existing at a given processor, was used in [26]. A Branch&Bound algorithm and a genetic algorithm solving the analyzed scheduling problem were proposed. However, the mathematical model of memory management was simplified to make the problem more tractable. It was assumed that memory occupation decreases linearly during the computations. This simplification has been removed in [27]. We discuss it in more detail in the next section.

3.2 Problem Formulation

Before we present the mathematical model used in this chapter, let us briefly analyze different models of memory management. The simplest approach is to assume that only one load chunk may be present in the memory of a computer at a time [31, 37]. The size of a piece of data sent to processor $P_i$ cannot exceed the limit $B_i$ (cf. Fig. 3.1a). Thus, a processor cannot perform computations while receiving a new piece of load. This results in long idle times and decreases the efficiency of processing.

In [26] it was assumed that each processor can store multiple load chunks