Faculty of Mathematics and Computer Science
Joanna Berlińska
Scheduling divisible loads
in heterogeneous distributed systems
Ph.D. Thesis
Supervisor: Prof. Dr. Habil. Maciej Drozdowski
I wish to express my deep gratitude to my supervisor Professor Maciej Drozdowski for his keen interest, inspiration and perfect guidance throughout the completion of this thesis. He introduced me to the exciting field of divisible load theory and motivated me to conduct original research with high standards. I am sincerely grateful for his effort in training me to become a successful researcher.
The research reported in this thesis has been financially supported by the Polish Ministry of Science and Higher Education grants NN206372039 "Scheduling divisible loads in heterogeneous distributed systems" and N N519 188933 "New problems of scheduling theory complexity analysis, algorithmization".
The work presented in this thesis has also been partially supported by the
1 Introduction
2 Single-Round Processing
2.1 Earlier Results
2.2 FPTAS for Problem DLS{$C_i = 0$}-Opt$V$
2.3 FPTAS for Problem DLS{$C_i = 0$}-Opt$T$
2.4 Communication Sequence for Problem DLS{1Round}
2.5 Approximation Algorithms for Problem DLS{1Round}
2.5.1 Problem DLS{1Round}-Opt$V$
2.5.2 Problem DLS{1Round}-Opt$T$
2.6 Conclusions
3 Multi-Round Processing with Limited Memory
3.1 Earlier Results
3.2 Problem Formulation
3.3 Branch&Bound Algorithm and Genetic Algorithm
3.3.1 Branch&Bound Algorithm
3.3.2 Genetic Algorithm
3.3.3 Comparison of B&B and GA
3.4 Properties of the Solutions
3.4.1 Depth of Overlap
3.4.4 Dominating Set of Processors
3.4.5 Chunk Size Saturation
3.4.6 When Is It Hard to Find a Good Solution?
3.4.7 Conclusions
3.5 Heuristics
3.5.1 Random Heuristics
3.5.2 First Free Heuristic
3.5.3 Appender Heuristics
3.5.4 Best Rate Heuristics
3.6 Comparison of the Heuristic Algorithms
3.6.1 Load Size
3.6.2 Startup Time
3.6.3 Communication Rate
3.6.4 Memory Limit
3.6.5 Computation Rate
3.6.6 Parameters Dispersion
3.6.7 Performance Dispersion
3.7 Summary
4 MapReduce Computations
4.1 Outline of MapReduce
4.2 Mathematical Model of MapReduce
4.3 Schedule Dominance Properties
4.3.1 Processing with a Single Reducer
4.3.2 Processing with Many Reducers
4.4 Scheduling Algorithms
4.4.1 Single Reducer
4.6 Summary
5 Multilayer Divisible Applications
5.1 Model of Multilayer Applications
5.2 Scheduling Algorithms
5.2.1 Load Partitioning for Reducer Layers
5.2.2 Load Partitioning for Mapper Layer
5.2.3 The Complete Load Partitioning Algorithm
5.2.4 Finishing Mapper Computations Order
5.2.5 Scheduling Communications
5.3 Computational Experiments
5.3.1 Speedup of Multilayer Applications
5.3.2 Load Distribution between Reducers
5.3.3 Load Distribution between Mappers
5.4 Summary
6 Summary and Conclusions
The progress in many disciplines of science and technology is nowadays strongly supported by computational methods. The research is often based on the results delivered by complex and time-consuming calculations. The computational power of a single computer is often insufficient. Hence, performing the computations in distributed environments like grids or clusters becomes a necessity. What is more, using a distributed computer system has many advantages. Large numbers of processors taking part in computations result in big total computing power. The system is scalable and the time needed for computations can be reduced by employing more processors. On the other hand, controlling computations in a distributed system is more complex. In order to obtain high efficiency, distributed applications need careful scheduling of communications and computations. As the computers may be spread around the world, the communication delays may be quite big and cannot be neglected. The distributed computer system is usually heterogeneous, and consequently, the different parameters of its elements must be taken into account by the scheduling algorithms.
Divisible load theory (DLT) is a model of parallel computations which offers a realistic approach to this problem. It is mostly used to represent processing large amounts of data in distributed systems. It assumes that the input data, called load, can be divided into pieces of arbitrary sizes and these pieces can be processed independently in parallel on remote computers. The divisible load model
work of intelligent sensors was studied. In both cases, the analyzed problem was how to schedule communications and computations, so that the total time needed to process the load of a given size is as short as possible. On the one hand, using more processors reduces computation time, but on the other hand it needs more communications, which cost time. Hence, the problem is which processors should be used and what load quantities they should receive. The mathematical models proposed in the early publications were computationally tractable and reduced the scheduling problem to a set of linear equations. Later on, more complex models were developed and applied to various network topologies [16, 20, 21, 25], systems with memory limitations [12, 30, 37], computation costs [46] and others. The most general divisible load scheduling problem was proved to be NP-hard in [48]. Surveys of divisible load theory can be found, e.g., in [3, 14, 24, 45]. We discuss these results in more detail in the following sections.
There are many examples of divisible load computations, like processing measurement data [20], searching for patterns in text and database files [28], image and video processing [38, 39, 43], solving linear algebra problems [22, 32], and DNA sequence alignment [47]. As we showed in [7, 10], processing large amounts of data in the MapReduce model [23] on dedicated clusters can also be analyzed on the grounds of divisible load theory. Moreover, the computations on volunteer platforms like BOINC and distributed.net fulfill the assumptions about the divisibility and independence of the load grains. Therefore, the progress in DLT is useful in efficiently managing many real distributed applications.
The main goal of this work is the analysis of several divisible load scheduling problems in heterogeneous distributed systems and the construction of algorithms solving these problems. As the analyzed problems are known to be computationally hard, we will propose approximation algorithms and heuristics. The
schedule computations in new parallel processing environments, like the MapReduce framework. We will construct a mathematical model of such computations and propose scheduling algorithms. Performance limits of the proposed organization of computations will be investigated.
The structure of this thesis is the following. Chapter 2 is dedicated to single-round divisible load scheduling. In single-round processing each computer receives at most one message with the data to process. The scheduling problem is which processors should take part in computations, what amounts of data they should receive and in what order. Our main contributions presented in Chapter 2 are fully polynomial time approximation schemes for two scheduling problems. These results have already been published in [6]. Extensions to more general cases are also analyzed.
Chapter 3 covers multi-round divisible load scheduling in systems with limited memory. Multi-round processing means that each processor can receive multiple messages with data to process. It is assumed that the whole load is too big to store it in the memories of the computers at the same moment. Therefore, the load must be distributed and processed in many small pieces fitting available memory buffers. We provide an experimental study of the features of near-optimum solutions, and hence, of the nature of the scheduling problem. Based on these results, several groups of heuristics solving the analyzed problems are proposed. Their advantages and weaknesses are demonstrated for a wide range of changing system parameters. The experimental comparison of the proposed algorithms with the heuristics known from earlier literature shows that a big improvement in the quality of the obtained solutions has been achieved. The results contained in Chapter 3 have been published in [8, 9, 11, 12].
Chapter 4 introduces the MapReduce paradigm for parallel computations. We formulate the mathematical model of such computations and propose scheduling algorithms. Then, an experimental analysis of the MapReduce performance is provided. These results have been published in [7, 10]. It was the first time that scheduling divisible loads with precedence constraints was studied.
In Chapter 5 the problem considered in Chapter 4 is generalized. We introduce the notion of a multilayer application. An example of a multilayer application is a chain of MapReduce applications, such that one application in the chain produces input for the next application. The influence of the system parameters on the structure of the schedules is studied.
The last chapter contains a summary of all the presented results. We also propose directions for future research on the aspects of divisible load theory
In this chapter we study divisible load scheduling for single-round organization of computations. Let us start with some general assumptions about the computing environment. In this work we assume that each processor comprises a CPU, some memory and a hardware network interface (e.g. NIC and DMA). The words processor, computer and processing element will be used interchangeably, unless said to be otherwise. The CPU and network interface can work in parallel, so that simultaneous computation and communication is possible. Each computer can communicate with at most one processor at a time (i.e. the so-called one-port model is used).
In Chapters 2 and 3 we consider classical divisible load scheduling problems in a star network (see Fig. 2.1). The load to be processed is initially located on processor $P_0$, called the originator, located in the center of the star. The originator is connected to a set of $m$ processors (workers) $\{P_1, \dots, P_m\}$. The originator divides the load into pieces and sends them directly to the workers. Such a logical topology can represent many parallel systems with different physical topologies, like a grid of multiprocessor supercomputers, a cluster of workstations connected via a local area network, or a set of processors sharing a bus in an SMP system.
We assume that the originator only dispatches the load to the other processors and performs no computations. In the opposite case, the computational power of the originator can be represented as an additional processor. For simplicity of
analyzed. Practically, it means that the time of returning the results is short and can be neglected. It has been shown in [18, 28] that this simplification does not limit the generality of our considerations, as sending results back can be included in the model.
Each worker $P_i$ is described by its computing rate (inverse of speed, e.g. in seconds per byte), denoted by $A_i$. Processing load of size $\alpha$ on $P_i$ takes time $\alpha A_i$. The communication link between $P_i$ and the originator is described by startup time $S_i$ (e.g. in seconds) and communication rate (inverse of bandwidth) $C_i$. Hence, the time required to send load of size $\alpha$ to processor $P_i$ is $S_i + \alpha C_i$. We will use the notation $A_{\max} = \max_{1 \le i \le m} A_i$, $A_{\min} = \min_{1 \le i \le m} A_i$, and similarly for the other parameters. In the general case, all parameters $A_i$, $C_i$, $S_i$ are nonnegative rational numbers.
Below we formulate several single-round divisible load scheduling problems.
We follow the notation used in [48], where different divisible load scheduling problems are denoted by DLS{restriction}. The restriction is the list of additional assumptions in the analyzed problem. These restrictions may be, for example:
• 1Round for single-round scheduling problems,
• $C_i = 0$ if all the bandwidths are infinite ($C_i = 0$ for all $1 \le i \le m$),
• $S_i = 0$ if there are no startup times ($S_i = 0$ for all $1 \le i \le m$).
The decision version of the general single-round divisible load scheduling
Given $m$ workers, their parameters $A_i$, $C_i$ and $S_i$ for $1 \le i \le m$, and two rational numbers $V > 0$ and $T > 0$, is it possible to process load of size $V$ within time $T$ from the moment when the originator starts sending out the load?
We also define the following two optimization problems connected with problem DLS{1Round}.
Problem 2.2. (DLS{1Round}-Opt$V$) Given a rational time $T > 0$, $m$ workers, their parameters $A_i$, $C_i$ and $S_i$ for $1 \le i \le m$, find the greatest rational number $V_{OPT}(T)$ such that it is possible to process load of size $V_{OPT}(T)$ within time $T$.
Problem 2.3. (DLS{1Round}-Opt$T$) Given a rational load size $V > 0$, $m$ workers, their parameters $A_i$, $C_i$ and $S_i$ for $1 \le i \le m$, find the smallest rational number $T_{OPT}(V) \ge 0$ such that it is possible to process the whole load $V$ within time $T_{OPT}(V)$.
Let us note that we are interested not only in finding the optimum time $T$ or the amount of load $V$, but also in constructing the optimum schedule. Constructing a schedule involves making the following decisions:
• The set $P' \subseteq P$ of processors participating in the computations must be chosen. Depending on the parameters of the processors and communication links, it may be unprofitable to use some of them for computations.
• The communication sequence (also called activation sequence), defining the order in which the processors receive load, must be chosen. For single-round processing, the communication sequence is a permutation of indices of processors from the set $P'$.
The early publications concerning scheduling divisible loads in a star system used a simple linear communication model. All communication startup times $S_i$ were assumed to be equal to zero. The analyzed problems were DLS{1Round, $S_i = 0$} and the corresponding optimization problems. It was proved independently in [5, 13, 17, 35] that if all workers take part in the computations and finish work at the same moment, then the problem DLS{1Round, $S_i = 0$} can be solved by sorting the processors by nondecreasing $C_i$ in the activation sequence. The hypothesis that in the optimum solution all workers participate in computations and finish work simultaneously was proved in [3].
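The sorting rule for the linear model can be checked numerically. The following is a minimal Python sketch, not taken from the cited works; the helper names `single_round_loads` and `best_order_brute_force` are hypothetical. For a fixed activation sequence in which every worker stops computing exactly at time $T$, the next worker $P_i$ receives $\alpha_i = (T - t - S_i)/(C_i + A_i)$, where $t$ is the time the originator has already spent on earlier messages. With $S_i = 0$, the script compares all permutations of a small random instance against the order of nondecreasing $C_i$.

```python
import itertools
import random


def single_round_loads(T, order, A, C, S):
    """Loads for one activation sequence when every worker stops computing exactly at T.

    Affine model: sending alpha units to P_i takes S_i + alpha * C_i,
    and processing them takes alpha * A_i.
    """
    elapsed = 0.0  # time the originator has already spent on earlier messages
    loads = []
    for i in order:
        alpha = (T - elapsed - S[i]) / (C[i] + A[i])
        loads.append(alpha)
        elapsed += S[i] + C[i] * alpha
    return loads


def best_order_brute_force(T, A, C, S):
    """Exhaustive search over activation sequences (small m only)."""
    return max(itertools.permutations(range(len(A))),
               key=lambda order: sum(single_round_loads(T, order, A, C, S)))


random.seed(7)
m, T = 5, 100.0
A = [random.uniform(0.5, 3.0) for _ in range(m)]
C = [random.uniform(0.05, 1.0) for _ in range(m)]
S = [0.0] * m  # linear communication model: no startup times

by_c = tuple(sorted(range(m), key=lambda i: C[i]))  # nondecreasing C_i
V_by_c = sum(single_round_loads(T, by_c, A, C, S))
V_best = sum(single_round_loads(T, best_order_brute_force(T, A, C, S), A, C, S))
```

The exhaustive search is exponential in $m$ and serves only as a cross-check on small instances; with $S_i = 0$ every load is positive, so all workers indeed participate.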
The assumption about linear communication costs usually does not hold in practice. It has a side effect that all processors can take part in the computations, no matter how many of them are available, and no matter how far from the originator they are. Hence, a more realistic affine communication model, including startup times, was introduced by Błażewicz and Drozdowski in [17]. In publication [3] it was shown that in the optimum solutions of both optimization versions of the problem DLS{1Round} all processors taking part in computations finish work at the same moment. Additionally, the authors proved that if the load size $V$ is large enough, then in any optimum solution all workers participate in the computations and they should be activated in the order of nondecreasing $C_i$. The complexity of the single-round divisible load scheduling problem remained open until 2007. Finally, in [48] it was proved that the problem DLS{1Round, $C_i = 0$} is NP-complete. The proof was done by reduction from the NP-complete 2-Partition problem. The authors proposed pseudo-polynomial dynamic programming algorithms solving the problems DLS{1Round, $C_i = 0$}-Opt$V$ and DLS{1Round, $C_i = 0$}-Opt$T$. However, since pseudopolynomial algorithms are in fact exponential, it can be more useful to create polynomial approximation al-
that can be derived for NP-hard problems (unless P=NP) is a fully polynomial time approximation scheme (FPTAS). An FPTAS for an optimization problem $\Pi$ with cost function $f$ is an approximation algorithm $A$ which for any given $\varepsilon > 0$ and an instance $I$ of problem $\Pi$
• returns a solution $A(I)$ such that $|f(A(I)) - OPT(I)| \le \varepsilon |OPT(I)|$, where $OPT(I)$ is the optimum cost for instance $I$, and
• has running time polynomial in the size of $I$ and $1/\varepsilon$.
Constructing fully polynomial time approximation schemes for DLS{1Round, $C_i = 0$}-Opt$V$ and DLS{1Round, $C_i = 0$}-Opt$T$ is the aim of the next two sections.
2.2 FPTAS for Problem DLS{$C_i = 0$}-Opt$V$
Let us start with an observation that if $C_i = 0$ for $1 \le i \le m$, then nothing can be gained by sending more than one message to the same processor. Hence, for the divisible load scheduling problem with $C_i = 0$ for all $i$, there always exists an optimum solution using one round only. Consequently, we can write DLS{$C_i = 0$} instead of DLS{1Round, $C_i = 0$}, because these two problems are equivalent.
We begin our considerations with the problem of optimizing the size of the load processed in a given time $T$. As in [48], we assume here that $A_i$ and $S_i$ are integer numbers. The problem can be formulated as follows.
Problem 2.4. (DLS{$C_i = 0$}-Opt$V$) Given a rational time $T > 0$, $m$ workers, their integer parameters $A_i$ and $S_i$ for $1 \le i \le m$, and provided that the bandwidths are infinite, find the greatest rational number $V_{OPT}(T)$ such that it is possible to process load of size $V_{OPT}(T)$ within time $T$.
Let us note that if $S_i > T$ for some processor $P_i$, then this processor cannot be used for processing load in time $T$. Therefore, we assume that $S_i \le T$ for $1 \le i \le m$. Moreover, if $A_i = 0$ for some processor $P_i$, then $P_i$ can receive and process an infinite amount of load in time $S_i$. As $S_i \le T$, the scheduling problem becomes trivial in this case. Hence, we assume that $A_i > 0$ for $1 \le i \le m$.
In order to construct an FPTAS solving Problem 2.4, we need to know in what order the processors should be activated. We will use the following proposition given in [48].
Proposition 2.1. For a given time limit $T$ and a set $P' \subseteq \{P_1, \dots, P_m\}$ of workers taking part in the computations, the maximum load is processed if the workers are ordered according to nondecreasing values of $S_i A_i$ for $P_i \in P'$.
Proposition 2.1 can be proved by the interchange argument: ordering the processors in $P'$ according to nondecreasing $S_i A_i$ does not reduce the amount of load processed in time $T$. Indeed, if two workers $P_i$, $P_j$ activated one directly after the other (in this order) satisfy $S_i A_i > S_j A_j$, then swapping them changes the load lost by this pair from $S_i/A_j$ to $S_j/A_i < S_i/A_j$, while the load of every other worker stays the same.
As it is known from [3] that in the optimum solution all processors taking part in computations finish work at the same moment, it follows from Proposition 2.1 that the scheduling problem can be reduced to choosing an optimum subset of processors taking part in the computations. Let us assume, without loss of generality, that $S_1 A_1 \le \dots \le S_m A_m$. We define a binary vector $x = (x_1, \dots, x_m)$ as follows: $x_i = 1$ if processor $P_i$ receives some load to process (i.e. $P_i \in P'$) and $x_i = 0$ in the opposite case ($P_i \notin P'$). The maximum amount of load which can be processed in time $T$ using the subset of processors indicated by $x$ can be obtained from the formula
$$V_{OPT}(T, x) = \sum_{i=1}^{m} \frac{T x_i}{A_i} - \sum_{i=1}^{m} \sum_{j=i}^{m} x_i x_j \frac{S_i}{A_j}. \qquad (2.1)$$
The expression $\sum_{i=1}^{m} \frac{T x_i}{A_i}$ is the amount of load which could be processed in time
Communication with processor $P_i$ takes time $x_i S_i$. During this time processors $P_j$, where $j \ge i$, cannot process any load because they have not received the input yet. Thus, $\sum_{i=1}^{m} \sum_{j=i}^{m} x_i x_j \frac{S_i}{A_j}$ is the amount of load which is lost because of communication delays (cf. [48]).
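Proposition 2.1 and formula (2.1) can both be checked by brute force on a small instance. The Python sketch below is illustrative only, and its function names are hypothetical: `load_by_simulation` follows the schedule directly (with $C_i = 0$, worker $P_i$ computes from the end of its own startup until $T$), while `load_by_formula` evaluates (2.1). It assumes $\sum_i S_i < T$, so that every selected worker receives a positive load.

```python
import itertools
import random


def load_by_simulation(T, order, A, S):
    """C_i = 0: P_i starts computing after the startups of all earlier messages and its own."""
    elapsed, total = 0.0, 0.0
    for i in order:
        elapsed += S[i]                # startup time of the message to P_i
        total += (T - elapsed) / A[i]  # P_i computes from `elapsed` until T
    return total


def load_by_formula(T, x, A, S):
    """Formula (2.1); the index order must be nondecreasing S_i * A_i."""
    m = len(A)
    gain = sum(T * x[i] / A[i] for i in range(m))
    lost = sum(x[i] * x[j] * S[i] / A[j] for i in range(m) for j in range(i, m))
    return gain - lost


random.seed(3)
m, T = 5, 200.0
A = [random.randint(1, 9) for _ in range(m)]
S = [random.randint(1, 20) for _ in range(m)]  # sum(S) <= 100 < T: every load is positive
perm = sorted(range(m), key=lambda i: S[i] * A[i])
A, S = [A[i] for i in perm], [S[i] for i in perm]  # relabel so S_1A_1 <= ... <= S_mA_m

best_any_order = max(load_by_simulation(T, p, A, S) for p in itertools.permutations(range(m)))
sorted_order = load_by_simulation(T, range(m), A, S)
formula_matches = all(
    abs(load_by_formula(T, x, A, S)
        - load_by_simulation(T, [i for i in range(m) if x[i]], A, S)) < 1e-9
    for x in itertools.product((0, 1), repeat=m))
```

Both checks enumerate exponentially many candidates, so they are only meant as a sanity test of the ordering rule and of the loss term in (2.1).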
Our goal is to maximize the size $V$ of load processed in a given time $T$ as a function of a binary vector $x = (x_1, \dots, x_m)$. Instead of maximizing $V(x)$, we will minimize the value of $-V(x)$. Since $x_i$ are binary variables, we have $x_i^2 = x_i$. Hence we have
$$-V(x) = -\sum_{i=1}^{m} \frac{T - S_i}{A_i} x_i + \sum_{1 \le i < j \le m} S_i \frac{1}{A_j} x_i x_j. \qquad (2.2)$$
A half-product [2] is a function $f: \{0,1\}^m \to \mathbb{R}$ of the form
$$f(x) = f(x_1, \dots, x_m) = -\sum_{i=1}^{m} p_i x_i + \sum_{1 \le i < j \le m} q_i r_j x_i x_j, \qquad (2.3)$$
where $p_i$, $q_i$, $r_i$ are nonnegative constants for $1 \le i \le m$. Thus, $-V(x)$ is a half-product, with $p_i = \frac{T - S_i}{A_i}$, $q_i = S_i$, $r_j = \frac{1}{A_j}$ (note that $p_i \ge 0$ because $S_i \le T$).
An FPTAS for minimizing half-products was proposed by Badics and Boros in [2]. They assumed that the parameters $p_i$, $q_i$, $r_i$ are nonnegative integers for $1 \le i \le m$. In our case all parameters are nonnegative, but $p_i = \frac{T - S_i}{A_i}$ and $r_j = \frac{1}{A_j}$ are not integer. However, the assumption about integrality of $p_i$ and $r_i$ is used neither for proving the correctness of the Badics and Boros algorithm, nor for estimating its running time. Therefore, we can use the algorithm proposed in [2] to minimize the function $-V(x)$. The algorithm receives number $m$, vectors $p$, $q$, $r$ of length $m$, and a positive approximation precision $\varepsilon < 1$. It returns a binary vector $x^{\varepsilon} = (x^{\varepsilon}_1, \dots, x^{\varepsilon}_m)$. For $1 \le k \le m$, let $g_k(x) = -\sum_{i=1}^{k} p_i x_i + \sum_{1 \le i < j \le k} q_i r_j x_i x_j$ and $Q_k(x) = \sum_{i=1}^{k} q_i x_i$. The FPTAS for minimizing half-products proposed by Badics and Boros is formulated in Algorithm 2.1 (cf. [2]).
Algorithm 2.1 MINIMIZE-HALF-PRODUCT($m, p, q, r, \varepsilon$)
STEP 0: Let $\delta > 0$ be defined by the equation $(1+\delta)^m = 1+\varepsilon$, let $Q = \sum_{i=1}^{m} q_i$, $N = \lceil 2m \log Q / \varepsilon \rceil$, $k = 0$ and $X_0 = \{()\}$.
STEP 1: Let $k = k+1$, $X_k = \emptyset$, $t = 0$, $s = 0$, $L = \{(y_1, \dots, y_{k-1}, 0), (y_1, \dots, y_{k-1}, 1) \mid (y_1, \dots, y_{k-1}) \in X_{k-1}\}$.
STEP 2: while $s \le N$ do
    select $z = (z_1, \dots, z_k) \in L$ for which $t \le Q_k(z) < (1+\delta)^s$ and for which $g_k(z)$ is the smallest among all such $z$ (if any such $z$ exists), and let $X_k = X_k \cup \{z\}$;
    let $t = (1+\delta)^s$, $s = s+1$.
  end while
STEP 3: if $k < m$ then goto STEP 1 else goto STEP 4. end if
STEP 4: Select $x^{\varepsilon} \in X_m$ with the smallest $g_m(x^{\varepsilon})$, return $x^{\varepsilon}$.
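A compact Python sketch of the trimming idea in Algorithm 2.1 follows. It is our own illustration under stated assumptions, not the code of [2], and all function names are hypothetical. Instead of sweeping $s = 0, 1, \dots, N$, it assigns every candidate $z$ directly to the interval index $s$ with $Q_k(z) \in [(1+\delta)^{s-1}, (1+\delta)^s)$ (or $s = 0$ when $Q_k(z) < 1$) and keeps the candidate with the smallest $g_k(z)$ per interval, which is the same selection rule. It assumes the $q_i$ are nonnegative integers, as $q_i = S_i$ are here. The helper `fptas_opt_v` applies the mapping $p_i = (T - S_i)/A_i$, $q_i = S_i$, $r_i = 1/A_i$ to processors already sorted by nondecreasing $S_i A_i$.

```python
import itertools
import math
import random


def half_product(x, p, q, r):
    """f(x) = -sum_i p_i x_i + sum_{i<j} q_i r_j x_i x_j, as in (2.3)."""
    value, prefix_q = 0.0, 0
    for xi, pi, qi, ri in zip(x, p, q, r):
        if xi:
            value += -pi + ri * prefix_q  # r_j times the sum of q_i over chosen i < j
            prefix_q += qi
    return value


def minimize_half_product(p, q, r, eps):
    """Trimmed enumeration in the spirit of Algorithm 2.1; q_i must be nonnegative integers."""
    m = len(p)
    log_step = math.log(1.0 + eps) / m  # log(1 + delta), since (1 + delta)^m = 1 + eps
    states = [(0.0, 0, ())]  # (g_k(z), Q_k(z), z); X_0 = {()}
    for k in range(m):
        best = {}  # interval index s -> kept candidate with the smallest g_k
        for g, qk, z in states:
            for bit in (0, 1):
                g2 = g + bit * (r[k] * qk - p[k])
                q2 = qk + bit * q[k]
                s = 0 if q2 < 1 else math.floor(math.log(q2) / log_step) + 1
                if s not in best or g2 < best[s][0]:
                    best[s] = (g2, q2, z + (bit,))
        states = list(best.values())
    return min(states, key=lambda state: state[0])[2]  # STEP 4


def fptas_opt_v(T, A, S, eps):
    """Hypothetical helper mirroring Algorithm 2.2; A, S sorted by nondecreasing S_i*A_i."""
    p = [(T - s) / a for a, s in zip(A, S)]
    r = [1.0 / a for a in A]
    x = minimize_half_product(p, S, r, eps)
    return x, -half_product(x, p, S, r)  # V(x) = -f(x) by (2.2)


random.seed(5)
ok_half_product = True
for _ in range(40):
    m = random.randint(1, 7)
    p = [random.randint(0, 40) for _ in range(m)]
    q = [random.randint(0, 9) for _ in range(m)]
    r = [random.randint(0, 9) for _ in range(m)]
    x = minimize_half_product(p, q, r, 0.3)
    f_star = min(half_product(v, p, q, r) for v in itertools.product((0, 1), repeat=m))
    ok_half_product &= half_product(x, p, q, r) <= f_star + 0.3 * abs(f_star) + 1e-9

T, A, S = 60.0, [4, 2, 7, 3, 5, 1], [5, 9, 2, 8, 6, 12]
order = sorted(range(len(A)), key=lambda i: S[i] * A[i])
A, S = [A[i] for i in order], [S[i] for i in order]
x_eps, v_eps = fptas_opt_v(T, A, S, 0.1)
p_v = [(T - s) / a for a, s in zip(A, S)]
r_v = [1.0 / a for a in A]
v_star = max(-half_product(v, p_v, S, r_v) for v in itertools.product((0, 1), repeat=len(A)))
```

The brute-force comparisons at the end are feasible only for small $m$; they check the guarantee $f(x^{\varepsilon}) \le f(x^*) + \varepsilon|f(x^*)|$ and its consequence $V \ge (1-\varepsilon)V_{OPT}(T)$ on a toy instance.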
It was proved in [2] that
$$f(x^{\varepsilon}) \le f(x^*) + \varepsilon |f(x^*)|, \qquad (2.4)$$
where $x^*$ is a vector minimizing $f$, and the running time of the algorithm MINIMIZE-HALF-PRODUCT is $O(m^2 \log(\sum_{i=1}^{m} q_i)/\varepsilon)$ [2]. Based on these results, we propose Algorithm 2.2 for Problem 2.4 [6].
Theorem 2.2. Algorithm 2.2 is a fully polynomial time approximation scheme
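Algorithm 2.2 FPTAS-OPT-V($T, m, A, S, \varepsilon$)
  for $i = 1$ to $m$ do
    $p_i = (T - S_i)/A_i$
    $q_i = S_i$
    $r_i = 1/A_i$
  end for
  $x^{\varepsilon}$ = MINIMIZE-HALF-PRODUCT($m, p, q, r, \varepsilon$)
  return $x_{FPTAS}(T, \varepsilon) = x^{\varepsilon}$, $V_{FPTAS}(T, \varepsilon) = \sum_{i=1}^{m} \frac{T x^{\varepsilon}_i}{A_i} - \sum_{i=1}^{m} \sum_{j=i}^{m} x^{\varepsilon}_i x^{\varepsilon}_j \frac{S_i}{A_j}$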
Proof. Since $x_{FPTAS}(T, \varepsilon)$ is returned by the MINIMIZE-HALF-PRODUCT algorithm for the function $-V(x)$, we get from (2.4)
$$-V_{FPTAS}(T, \varepsilon) \le -V_{OPT}(T) + \varepsilon |-V_{OPT}(T)|. \qquad (2.5)$$
As the amount of load $V_{OPT}(T)$ is always nonnegative, this formula can be rewritten as
$$-V_{FPTAS}(T, \varepsilon) \le -V_{OPT}(T) + \varepsilon V_{OPT}(T). \qquad (2.6)$$
Hence,
$$V_{FPTAS}(T, \varepsilon) \ge V_{OPT}(T)(1 - \varepsilon). \qquad (2.7)$$
Moreover, the running time of Algorithm 2.2 is dominated by the running time of MINIMIZE-HALF-PRODUCT, and is equal to at most $O(m^2 \log(\sum_{i=1}^{m} S_i)/\varepsilon)$, which is bounded from above by $O(m^2(\log m + \log S_{\max})/\varepsilon)$. Hence, Algorithm 2.2 is an FPTAS for Problem 2.4.
2.3 FPTAS for Problem DLS{$C_i = 0$}-Opt$T$
The second optimization problem we will analyze is DLS{$C_i = 0$}-Opt$T$, which can be formulated in the following way.
Problem 2.5. (DLS{$C_i = 0$}-Opt$T$) Given a rational load size $V > 0$, $m$ workers, their integer parameters $A_i$ and $S_i$ for $1 \le i \le m$, and provided that the bandwidths are infinite, find the smallest rational number $T_{OPT}(V) \ge 0$ such that it is possible to process the whole load $V$ within time $T_{OPT}(V)$.
To create an approximation scheme for Problem 2.5, we will use the dual approximation algorithm approach proposed in [34]. As stated in [34], a dual approximation algorithm is an algorithm which finds a superoptimal infeasible solution of a given optimization problem. The performance of the algorithm is measured by the degree of the infeasibility of the solution, controlled by a given value $\varepsilon > 0$. We will construct a dual approximation algorithm for Problem 2.4 (DLS{$C_i = 0$}-Opt$V$). This algorithm should accept a period of time $T$ and accuracy $\varepsilon$ ($0 < \varepsilon < 1$), and deliver a schedule processing the load of size at least $V_{OPT}(T)$ in time not longer than $T(1+\varepsilon)$. We propose the following Algorithm 2.3 [6].
Algorithm 2.3 DUAL-OPT-V($T, m, A, S, \varepsilon$)
  call FPTAS-OPT-V($T, m, A, S, \varepsilon/2$)
  return $x_{DUAL}(T, \varepsilon) = x_{FPTAS}(T, \varepsilon/2)$, $V_{DUAL}(T, \varepsilon) = (1+\varepsilon) V_{FPTAS}(T, \varepsilon/2)$
In order to prove that Algorithm 2.3 is a dual approximation algorithm for Problem 2.4, we will use the following fact.
Proposition 2.3. If it is possible to process load of size $V$ in time $T$ using the subset of processors indicated by a binary vector $x = (x_1, \dots, x_m)$, then it is also possible to process load of size $V(1+\varepsilon)$ in time at most $T(1+\varepsilon)$, using the same subset of processors.
Proof. Let $V'$ denote the maximum size of load which can be processed in time $T(1+\varepsilon)$ using the processors indicated by the vector $x$. From (2.1) we obtain
$$V' = \sum_{i=1}^{m} \frac{T(1+\varepsilon) x_i}{A_i} - \sum_{i=1}^{m} \sum_{j=i}^{m} x_i x_j \frac{S_i}{A_j} \qquad (2.8)$$
and
$$V = \sum_{i=1}^{m} \frac{T x_i}{A_i} - \sum_{i=1}^{m} \sum_{j=i}^{m} x_i x_j \frac{S_i}{A_j}. \qquad (2.9)$$
Hence,
$$V' = (1+\varepsilon) V + \varepsilon \sum_{i=1}^{m} \sum_{j=i}^{m} x_i x_j \frac{S_i}{A_j} \ge V(1+\varepsilon). \qquad (2.10)$$
Note that if $T = T_{OPT}(V)$, then by Proposition 2.3 load of size $V(1+\varepsilon)$ can be processed in time not longer than $T_{OPT}(V)(1+\varepsilon)$. Hence, as a corollary, we can formulate the following proposition.
Proposition 2.4. For any numbers $V \ge 0$ and $\varepsilon > 0$ we have
$$T_{OPT}(V(1+\varepsilon)) \le T_{OPT}(V)(1+\varepsilon). \qquad (2.11)$$
We will say that an algorithm is a fully polynomial time dual approximation algorithm for a given problem if it is a dual approximation algorithm for this problem with approximation precision $\varepsilon$ and its running time is polynomial in both the problem size and $1/\varepsilon$.
Theorem 2.5. Algorithm 2.3 is a fully polynomial time dual approximation algorithm for Problem 2.4 (DLS{$C_i = 0$}-Opt$V$).
Proof. As $V_{DUAL}(T, \varepsilon) = (1+\varepsilon) V_{FPTAS}(T, \varepsilon/2)$ in Algorithm 2.3, we obtain from (2.7) that
$$V_{DUAL}(T, \varepsilon) \ge (1+\varepsilon)\left(1 - \frac{\varepsilon}{2}\right) V_{OPT}(T) \ge V_{OPT}(T) \qquad (2.12)$$
because $\varepsilon < 1$. Thus, the obtained solution is superoptimal. The time needed to process the load of size $V_{DUAL}(T, \varepsilon)$ is at most $T(1+\varepsilon)$ by Proposition 2.3, as it is possible to process load of size $V_{FPTAS}(T, \varepsilon/2)$ in time $T$.
The running time of Algorithm 2.3 is determined by the call to algorithm FPTAS-OPT-V, whence it is equal to at most $O(m^2(\log m + \log S_{\max})/\varepsilon)$.
The dual approximation Algorithm 2.3 is the key element of the FPTAS solving Problem 2.5 (DLS{$C_i = 0$}-Opt$T$), given in Algorithm 2.4.
Algorithm 2.4 FPTAS-OPT-T($V, m, A, S, \varepsilon$)
  $upper = S_{\max} + V A_{\max}$
  $lower = 0$
  $LoBo = V A_{\min}/m$
  while $(upper - lower) > \frac{\varepsilon(1-\varepsilon)}{2-\varepsilon} LoBo$ do
    $T_p = (upper + lower)/2$
    call DUAL-OPT-V($T_p, m, A, S, \varepsilon$)
    if $V_{DUAL}(T_p, \varepsilon) < V(1+\varepsilon)$ then $lower = T_p$
    else $upper = T_p$
    end if
  end while
  call FPTAS-OPT-V($upper, m, A, S, \varepsilon/2$)
  return $x = x_{FPTAS}(upper, \varepsilon/2)$, $T = upper$
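The binary search of Algorithm 2.4 can be sketched in Python as follows. This is an illustrative sketch, not the implementation from [6]: the function `fptas_opt_t` is hypothetical, and instead of the half-product FPTAS it receives an arbitrary routine `v_fptas(T, e)` assumed to return an achievable load of at least $(1-e)V_{OPT}(T)$. For the small usage example we plug in the exact brute-force value `exact_v_opt` (also hypothetical), which trivially meets that guarantee; a real run would call FPTAS-OPT-V instead. The test $V_{DUAL}(T_p, \varepsilon) < V(1+\varepsilon)$ is kept literally, although it is equivalent to $V_{FPTAS}(T_p, \varepsilon/2) < V$.

```python
import itertools


def exact_v_opt(T, A, S):
    """Exact V_OPT(T) for C_i = 0 by brute force over subsets; A, S sorted by S_i*A_i."""
    best = 0.0
    for x in itertools.product((0, 1), repeat=len(A)):
        elapsed, total = 0.0, 0.0
        for a, s, xi in zip(A, S, x):
            if xi:
                elapsed += s
                total += (T - elapsed) / a
        # Dropping a worker with a negative load only increases the total,
        # so the maximum is attained by a feasible subset.
        best = max(best, total)
    return best


def fptas_opt_t(V, A, S, eps, v_fptas):
    """Binary search of Algorithm 2.4; v_fptas(T, e) stands in for FPTAS-OPT-V."""
    upper, lower = max(S) + V * max(A), 0.0
    lobo = V * min(A) / len(A)  # LoBo: positive lower bound on T_OPT(V)
    while upper - lower > eps * (1 - eps) / (2 - eps) * lobo:
        t_p = (upper + lower) / 2
        v_dual = (1 + eps) * v_fptas(t_p, eps / 2)  # DUAL-OPT-V, Algorithm 2.3
        if v_dual < V * (1 + eps):
            lower = t_p
        else:
            upper = t_p
    return upper


A, S = [3, 1, 4, 2], [6, 10, 3, 8]
order = sorted(range(len(A)), key=lambda i: S[i] * A[i])
A, S = [A[i] for i in order], [S[i] for i in order]
V, eps = 40.0, 0.2


def exact_oracle(T, e):
    # An exact solver satisfies V >= (1 - e) * V_OPT(T) for every e.
    return exact_v_opt(T, A, S)


T_approx = fptas_opt_t(V, A, S, eps, exact_oracle)

# Reference value of T_OPT(V): bisection on the nondecreasing function V_OPT(T).
lo, hi = 0.0, max(S) + V * max(A)
for _ in range(100):
    mid = (lo + hi) / 2
    if exact_v_opt(mid, A, S) < V:
        lo = mid
    else:
        hi = mid
T_opt = hi
```

With the exact oracle, the returned time is feasible for load $V$ and lies within a factor $1+\varepsilon$ of the bisection-based reference value of $T_{OPT}(V)$, which is what the analysis below establishes for the FPTAS-based version.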
The idea of Algorithm 2.4 is to find a good approximation of $T_{OPT}(V)$ with a binary search. The initial search interval $[lower, upper]$ is defined by trivial lower and upper bounds for $T_{OPT}(V)$. Then, it is iteratively narrowed to its lower or upper half, depending on the results delivered by Algorithm 2.3 for the currently examined value $T_p$. When the search interval becomes short enough, the searching procedure is finished and the vector $x$ representing the subset of processors which should be used for computations is obtained by Algorithm 2.2.
for Problem 2.5 (DLS{$C_i = 0$}-Opt$T$).
Proof. Let us start with the observation that at the beginning of the algorithm $upper$ and $lower$ are trivial upper and lower bounds for $T_{OPT}(V)$. $LoBo$ is also a lower bound on $T_{OPT}(V)$ and it is positive, since we assumed that $A_i > 0$ for $1 \le i \le m$.
First, we will analyze the variable $upper$ in order to prove that the algorithm always returns a feasible solution. At the beginning of the algorithm we have $upper = S_{\max} + V A_{\max}$. If this value is not changed in the binary search while loop, then the algorithm FPTAS-OPT-V is called for parameters $T = upper = S_{\max} + V A_{\max}$ and approximation precision $\varepsilon/2$ at the end of executing Algorithm 2.4. The obtained schedule allows for processing the load of size at least $V$, as it is enough to choose any nonempty subset of the set $\{P_1, \dots, P_m\}$ to process $V$ units of load in time $T = S_{\max} + V A_{\max}$.
Now let us assume that the value of $upper$ is changed at least once to $T_p$. This happens only if $V_{DUAL}(T_p, \varepsilon) \ge V(1+\varepsilon)$. Therefore, as we have in Algorithm 2.3
$$V_{DUAL}(T, \varepsilon) = (1+\varepsilon) V_{FPTAS}(T, \varepsilon/2), \qquad (2.13)$$
there holds
$$V_{FPTAS}(upper, \varepsilon/2) = V_{DUAL}(upper, \varepsilon)/(1+\varepsilon) \ge V \qquad (2.14)$$
at any time during the execution of Algorithm 2.4. Hence, the solution obtained by the algorithm FPTAS-OPT-T is always feasible.
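Now let us estimate the quality of the obtained solution. We will show that
$$lower < T_{OPT}(V)\left(1 + \frac{\varepsilon}{2-\varepsilon}\right) \qquad (2.15)$$
throughout the execution of the program. Since initially $lower = 0$, this condition is true before entering into the while loop. The value of variable $lower$ is changed to $T_p$ only when $V_{DUAL}(T_p, \varepsilon) < V(1+\varepsilon)$. It follows from (2.13) that
$$(1+\varepsilon) V_{FPTAS}(lower, \varepsilon/2) < V(1+\varepsilon). \qquad (2.16)$$
Furthermore, from (2.7) we get
$$(1+\varepsilon) V_{OPT}(lower)(1 - \varepsilon/2) < V(1+\varepsilon), \qquad (2.17)$$
$$V_{OPT}(lower) < V/(1 - \varepsilon/2) \qquad (2.18)$$
and finally, since $1/(1 - \varepsilon/2) = 2/(2-\varepsilon)$,
$$V_{OPT}(lower) < V\left(1 + \frac{\varepsilon}{2-\varepsilon}\right). \qquad (2.19)$$
Thus, it is impossible to process load $V\left(1 + \frac{\varepsilon}{2-\varepsilon}\right)$ in time $lower$. Hence,
$$lower < T_{OPT}\left(V\left(1 + \frac{\varepsilon}{2-\varepsilon}\right)\right). \qquad (2.20)$$
By Proposition 2.4 we have
$$T_{OPT}\left(V\left(1 + \frac{\varepsilon}{2-\varepsilon}\right)\right) \le T_{OPT}(V)\left(1 + \frac{\varepsilon}{2-\varepsilon}\right), \qquad (2.21)$$
which proves that (2.15) is true during the binary search.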
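The binary search is finished when $upper \le lower + \frac{\varepsilon(1-\varepsilon)}{2-\varepsilon} LoBo$. Since $LoBo \le T_{OPT}(V)$, by (2.15) we get
$$upper \le T_{OPT}(V)\left(1 + \frac{\varepsilon}{2-\varepsilon}\right) + \frac{\varepsilon(1-\varepsilon)}{2-\varepsilon} T_{OPT}(V) \qquad (2.22)$$
and consequently, since $\frac{\varepsilon}{2-\varepsilon} + \frac{\varepsilon(1-\varepsilon)}{2-\varepsilon} = \varepsilon$,
$$upper \le T_{OPT}(V)(1+\varepsilon). \qquad (2.23)$$
of the problem.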
The number of iterations in the binary search is at most equal to $O\left(\log\left((S_{\max} + V A_{\max}) \big/ \left(\frac{\varepsilon(1-\varepsilon)}{2-\varepsilon} V A_{\min}/m\right)\right)\right)$, which is bounded from above by $O(\log m + \log S_{\max} + \log A_{\max} + \log(1/\varepsilon) + \max(\log V, \log(1/V)))$. The execution time of each iteration is $O(m^2(\log m + \log S_{\max})/\varepsilon)$ due to calling Algorithm 2.3. Thus, the running time of the whole algorithm FPTAS-OPT-T is at most $O((\log m + \log S_{\max} + \log A_{\max} + \log(1/\varepsilon) + \max(\log V, \log(1/V)))\, m^2(\log m + \log S_{\max})/\varepsilon)$.
2.4 Communication Sequence for Problem DLS{1Round}
It would be desirable to extend the approximability results presented in the preceding sections to problems DLS{1Round}-Opt$V$ and DLS{1Round}-Opt$T$. Note that DLS{1Round, $C_i = 0$} is a selection problem. This means that it is computationally hard to select the set $P'$ of participating processors, but for a given $P'$ the optimum activation sequence is known. Moreover, this feature allowed for the construction of an FPTAS selecting the set $P'$ of participating processors. The main difficulty in problem DLS{1Round} is that for instances with $C_i > 0$, the optimum order of activating the processors is not known. Therefore, the scheduling problems cannot be reduced to just choosing the processors which should take part in computations. Let us recall that a general method of ordering processors should cover special cases:
• ordering processors according to nondecreasing values $S_i A_i$ if all $C_i$ are equal to zero,
• ordering processors according to nondecreasing values $C_i$ if all $S_i$ are equal to zero,
be processed or the time $T$ used for processing is large enough.
Let us analyze the activation sequence for an instance of problem DLS{1Round}-Opt$V$ with $m = 3$. We will compare the amounts of load which can be processed for activation sequences $\sigma' = (1, 2, 3)$ and $\sigma'' = (2, 1, 3)$. In both cases we assume that all processors finish computations at time $T$, as this is true in the optimum schedule. It is also assumed that the time $T$ is so large that all processors $P_1, P_2, P_3$ should take part in the computations in the optimum schedule. Let $\alpha'_i$, $\alpha''_i$ denote the sizes of the $i$-th piece of load sent for activation sequences $\sigma'$ and $\sigma''$, correspondingly. The sizes of the first two parts of load, sent to processors $P_1$ and $P_2$ for communication sequence $\sigma'$, are equal to
$$\alpha'_1 = \frac{T - S_1}{C_1 + A_1} \qquad (2.24)$$
and
$$\alpha'_2 = \frac{T - S_1 - C_1 \alpha'_1 - S_2}{C_2 + A_2}, \qquad (2.25)$$
which gives
$$\alpha'_2 = \frac{A_1 (T - S_1)}{(C_1 + A_1)(C_2 + A_2)} - \frac{S_2}{C_2 + A_2}. \qquad (2.26)$$
Similarly, for communication sequence $\sigma''$, the sizes of the first two pieces of load, sent to processors $P_2$ and $P_1$ correspondingly, are equal to
$$\alpha''_1 = \frac{T - S_2}{C_2 + A_2} \qquad (2.27)$$
and
$$\alpha''_2 = \frac{A_2 (T - S_2)}{(C_1 + A_1)(C_2 + A_2)} - \frac{S_1}{C_1 + A_1}. \qquad (2.28)$$
Let us observe that the time needed for sending the first two pieces of load may be different for activation sequences $\sigma'$ and $\sigma''$. Therefore, the amount of
t
′
= S
1
+ S
2
+ C
1
T − S
1
C
1
+ A
1
+ C
2
(
A
1
(T − S
1
)
(C
1
+ A
1
)(C
2
+ A
2
)
−
S
2
C
2
+ A
2
)
(2.29) if a tivation sequen e isσ
′
, and intimet
′′
= S
1
+ S
2
+ C
2
T − S
2
C
2
+ A
2
+ C
1
(
A
2
(T − S
2
)
(C
1
+ A
1
)(C
2
+ A
2
)
−
S
1
C
1
+ A
1
)
(2.30) if a tivation sequen e isσ
′′
. From (2.29) and (2.30) we obtain
∆t = t
′
− t
′′
=
C
1
A
2
S
2
− C
2
A
1
S
1
(C
1
+ A
1
)(C
2
+ A
2
)
.
(2.31) Lett
′
3
andt
′′
3
be the amounts of time used for ommuni ation and omputations of pro essorP
3
for sequen esσ
′
andσ
′′
. Notethatt
′′
3
− t
′
3
= ∆t.
(2.32) Therefore,α
3
′′
− α
′
3
=
∆t
C
3
+ A
3
.
(2.33)From equations(2.24)-(2.28) and (2.33), we an ompute the dieren e between
the amountsof load pro essed in both s hedules:
∆V =
3
X
i=1
α
′′
i
−
3
X
i=1
α
i
′
=
T (C
1
− C
2
) + A
1
S
1
− A
2
S
2
(C
1
+ A
1
)(C
2
+ A
2
)
+
C
1
A
2
S
2
− C
2
A
1
S
1
(C
1
+ A
1
)(C
2
+ A
2
)(C
3
+ A
3
)
.
(2.34)It an beseenthatthe signof
∆V
dependsnot onlyontheparametersof pro es-sorsP
1
andP
2
, but alsoonA
3
andC
3
. Similarly, form > 3
the order inwhi h the rst two pro essorsshould bea tivated depends onthe parameters ofallthea tivatethe pro essors,be ausethe de isionhowtosequen e, e.g.,
P
1
,P
2
annot be onned to justP
1
,P
2
. The rst summand in formula (2.34) may suggest sorting the pro essors a ordingto nonde reasing values ofT C
i
+ A
i
S
i
. Su han algorithmwould handleproperlythespe ial ases mentionedatthe beginningofthis se tion.
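Formula (2.34) can be checked numerically. The following minimal Python sketch is not part of the original work: `processed_load` and `delta_v_formula` are hypothetical helper names. `processed_load` evaluates the single-round recurrence used above, in which each activated processor $P_j$ receives $\alpha = (t_{\text{left}} - S_j)/(C_j + A_j)$ and finishes exactly at $T$. The sketch compares the directly computed $\Delta V$ with the right-hand side of (2.34) on random three-processor instances. The parameter ranges and $T = 10000$ are assumptions, chosen so that all three processors always receive positive load.

```python
import math
import random

def processed_load(seq, T, A, C, S):
    """Total load processed by time T when processors are activated in order seq.

    Single-round affine model: the next processor P_j gets
    alpha = (t_left - S_j) / (C_j + A_j), where t_left is the time left after
    the earlier transfers, so every active processor finishes at T.
    """
    t_left, total = T, 0.0
    for j in seq:
        if t_left <= S[j]:
            continue  # assumption: a processor whose startup does not fit is skipped
        alpha = (t_left - S[j]) / (C[j] + A[j])
        total += alpha
        t_left -= S[j] + C[j] * alpha  # the link is busy during the transfer
    return total

def delta_v_formula(T, A, C, S):
    """Right-hand side of (2.34): sigma'' = (2,1,3) versus sigma' = (1,2,3)."""
    D = (C[1] + A[1]) * (C[2] + A[2])
    first = (T * (C[1] - C[2]) + A[1] * S[1] - A[2] * S[2]) / D
    second = (C[1] * A[2] * S[2] - C[2] * A[1] * S[1]) / (D * (C[3] + A[3]))
    return first + second

def check(trials=1000, T=10_000.0, seed=1):
    """Compare (2.34) with the direct difference on random instances."""
    rng = random.Random(seed)
    for _ in range(trials):
        A = {i: rng.uniform(0.1, 1.0) for i in (1, 2, 3)}
        C = {i: rng.uniform(0.1, 1.0) for i in (1, 2, 3)}
        S = {i: rng.uniform(0.0, 1.0) for i in (1, 2, 3)}
        direct = (processed_load((2, 1, 3), T, A, C, S)
                  - processed_load((1, 2, 3), T, A, C, S))
        if not math.isclose(direct, delta_v_formula(T, A, C, S),
                            rel_tol=1e-9, abs_tol=1e-7):
            return False
    return True

print(check())  # True
```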
However, consider the following counterexample. Let $T = 700$, $m = 4$, and let the parameters of the processors be as given in Table 2.1.

Table 2.1: Processor parameters for the counterexample.

i   $A_i$   $C_i$   $S_i$     $TC_i + A_iS_i$ for $T = 700$
1   0.051   0.129   137.084    97.291284
2   2.146   0.050    34.487   109.009102
3   0.654   0.458    31.565   341.243510
4   1.838   0.152    32.747   166.588986
The amounts of load which can be processed for all activation sequences are given in Table 2.2. If the processors are sorted according to nondecreasing values of $TC_i + A_iS_i$, we obtain communication sequence (1,2,4,3), and the size of processed load is about 3275.0461. On the other hand, the optimum communication sequence is (2,1,4,3), which allows for processing load of size approximately 3276.4212. Thus, the analyzed algorithm does not deliver the optimum communication sequence.
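Because $m = 4$ here, Table 2.2 can be reproduced by brute force over all $4! = 24$ activation sequences. The Python sketch below is an illustration, not the author's code; the helper name `processed_load` is hypothetical. It applies the same single-round recurrence as the analysis of $\sigma'$ and $\sigma''$, evaluates every sequence for the parameters of Table 2.1, and contrasts the sequence obtained by sorting on $TC_i + A_iS_i$ with the best one.

```python
from itertools import permutations

# Parameters of Table 2.1; keys are the processor indices 1..4.
A = {1: 0.051, 2: 2.146, 3: 0.654, 4: 1.838}
C = {1: 0.129, 2: 0.050, 3: 0.458, 4: 0.152}
S = {1: 137.084, 2: 34.487, 3: 31.565, 4: 32.747}
T = 700.0

def processed_load(seq, T, A, C, S):
    """Single-round load processed by time T for activation sequence seq."""
    t_left, total = T, 0.0
    for j in seq:
        if t_left <= S[j]:
            continue  # assumption: skip a processor whose startup no longer fits
        alpha = (t_left - S[j]) / (C[j] + A[j])  # P_j finishes exactly at T
        total += alpha
        t_left -= S[j] + C[j] * alpha            # time consumed on the link
    return total

# V for every activation sequence (cf. Table 2.2).
loads = {seq: processed_load(seq, T, A, C, S) for seq in permutations(sorted(A))}

# Candidate rule: sort processors by nondecreasing T*C_i + A_i*S_i.
by_rule = tuple(sorted(A, key=lambda i: T * C[i] + A[i] * S[i]))
best = max(loads, key=loads.get)

print(by_rule, round(loads[by_rule], 4))  # (1, 2, 4, 3) 3275.0461
print(best, round(loads[best], 4))        # (2, 1, 4, 3) 3276.4212
```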
Another approach to selecting the best communication sequence is to start from the initial sequence $(1, 2, \dots, m)$ and improve it by changing the positions of some processors. Let us assume that two operations on the communication sequence are allowed: swapping a pair of processors, or moving a single processor to another place in the sequence. Only moves that increase the processed load $V$ for the given schedule length $T$ can be made. However, for the instance given above, the amount of load processed for communication sequence $\sigma_1 = (1, 2, 3, 4)$ is approximately 3276.0243 (see Table 2.2). The only communication sequence for which it is possible to process a larger load is the optimum sequence $\sigma_2 = (2, 1, 4, 3)$. Yet, it is impossible to obtain this solution by the moves described above, as any allowed change to $\sigma_1$ decreases the amount of processed load, and hence cannot be accepted.

Table 2.2: The size $V$ of load processed for different activation sequences in the counterexample (rounded to 4 digits after the decimal point).

Sequence   Processed load $V$   Sequence   Processed load $V$
(1,2,3,4)  3276.0243            (1,2,4,3)  3275.0461
(1,3,2,4)  3264.4671            (1,3,4,2)  3265.8734
(1,4,2,3)  3272.7902            (1,4,3,2)  3275.0848
(2,1,3,4)  3274.1818            (2,1,4,3)  3276.4212
(2,3,1,4)  2135.6348            (2,3,4,1)  1963.7528
(2,4,1,3)  3102.9726            (2,4,3,1)  2097.1445
(3,1,2,4)  2040.9951            (3,1,4,2)  2044.6016
(3,2,1,4)  1963.8495            (3,2,4,1)  1792.8317
(3,4,1,2)  1879.3648            (3,4,2,1)  1776.3430
(4,1,2,3)  3104.2910            (4,1,3,2)  3103.4595
(4,2,1,3)  3078.8408            (4,2,3,1)  2076.1120
(4,3,1,2)  2021.0451            (4,3,2,1)  1920.2297

The above counterexample proves not only that the described type of greedy algorithm is not capable of solving our problem, but also that it is impossible to find the optimum activation sequence by simply sorting the processors according to some combination of instance parameters. Indeed, note that the communication sequence (1,2,3,4) is better than (1,2,4,3), while the sequence (2,1,4,3) is better than (2,1,3,4). This shows that, depending on the amount of time left for processing on $P_3$ and $P_4$, it is better to activate one or the other of these processors earlier. Thus, the order in which processors $P_3$ and $P_4$ should be activated depends on the parameters of the processors activated before them. Consequently, it is not possible to determine the communication sequence locally, without taking into account the sequence of other processors.
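The failure of improvement moves can also be checked mechanically. The Python sketch below uses the hypothetical names `neighbours` and `local_search`. It implements a hill climber that starts from $(1, 2, 3, 4)$ and accepts only swaps or single-processor moves that increase $V$. The steepest-ascent acceptance rule is an assumption; since no improving neighbour exists here, any rule that takes only improving moves stops at the same point. For the instance of Table 2.1 the climber stops immediately, although $(2, 1, 4, 3)$ is better.

```python
# Parameters of Table 2.1 (processor indices 1..4), T = 700.
A = {1: 0.051, 2: 2.146, 3: 0.654, 4: 1.838}
C = {1: 0.129, 2: 0.050, 3: 0.458, 4: 0.152}
S = {1: 137.084, 2: 34.487, 3: 31.565, 4: 32.747}
T = 700.0

def processed_load(seq):
    """Single-round load processed by time T for activation sequence seq."""
    t_left, total = T, 0.0
    for j in seq:
        if t_left <= S[j]:
            continue  # assumption: skip a processor whose startup no longer fits
        alpha = (t_left - S[j]) / (C[j] + A[j])
        total += alpha
        t_left -= S[j] + C[j] * alpha
    return total

def neighbours(seq):
    """Sequences reachable by swapping two processors or moving one processor."""
    result = set()
    n = len(seq)
    for a in range(n):
        for b in range(n):
            if a == b:
                continue
            swapped = list(seq)
            swapped[a], swapped[b] = swapped[b], swapped[a]
            result.add(tuple(swapped))
            moved = list(seq)
            moved.insert(b, moved.pop(a))  # remove at position a, reinsert at b
            result.add(tuple(moved))
    result.discard(tuple(seq))
    return result

def local_search(start):
    """Steepest-ascent hill climbing that accepts only improving moves."""
    current = tuple(start)
    while True:
        better = [s for s in neighbours(current)
                  if processed_load(s) > processed_load(current)]
        if not better:
            return current
        current = max(better, key=processed_load)

stuck = local_search((1, 2, 3, 4))
print(stuck, processed_load(stuck) < processed_load((2, 1, 4, 3)))  # (1, 2, 3, 4) True
```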
Moreover, for the above instance, the load processed by $P_1$ if it is activated first is much greater than the load processed by $P_2$ when the activation sequence starts with 2. Still, in the optimum solution processor $P_2$ should receive load before $P_1$. Thus, a greedy algorithm that always appends to the communication sequence the processor which can process the greatest amount of load also does not deliver the optimum solution.
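The greedy rule just described can be run on the same instance. In this Python sketch the function name `greedy_append` is hypothetical. At each step it appends the processor that could take the largest load in the time still available. On the data of Table 2.1 it picks $P_1$ first and ends with $(1, 3, 4, 2)$, whose load equals the Table 2.2 entry for that sequence and is below the optimum.

```python
# Parameters of Table 2.1 (processor indices 1..4).
A = {1: 0.051, 2: 2.146, 3: 0.654, 4: 1.838}
C = {1: 0.129, 2: 0.050, 3: 0.458, 4: 0.152}
S = {1: 137.084, 2: 34.487, 3: 31.565, 4: 32.747}

def greedy_append(T):
    """Append, step by step, the processor able to take the largest load now."""
    seq, total, t_left = [], 0.0, T
    remaining = set(A)
    while True:
        fits = [i for i in sorted(remaining) if S[i] < t_left]
        if not fits:
            return tuple(seq), total
        j = max(fits, key=lambda i: (t_left - S[i]) / (C[i] + A[i]))
        alpha = (t_left - S[j]) / (C[j] + A[j])
        seq.append(j)
        remaining.remove(j)
        total += alpha
        t_left -= S[j] + C[j] * alpha  # the link is busy while alpha is sent

seq, V = greedy_append(700.0)
print(seq)  # (1, 3, 4, 2): P_1 is taken first, unlike in the optimum (2, 1, 4, 3)
```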
Finally, it can be conjectured that DLS{1Round} is not a selection problem.
2.5 Approximation Algorithms for Problem DLS{1Round}

Without knowing how to order the processors taking part in the computations for problem DLS{1Round}, we are not able to create approximation schemes similar to those for problem DLS{$C_i = 0$}. Therefore, we present several algorithms whose approximation ratio is bounded but depends on the instance parameters.

2.5.1 Problem DLS{1Round}-Opt$_V$

The simplest method of creating a solution of problem DLS{1Round}-Opt$_V$ is to send the whole load to a single processor only. The size of the load processed by a single processor $P_i$ in time $T$ is equal to $(T - S_i)/(A_i + C_i)$. Thus, we select the processor for which this value is the greatest, as shown in Algorithm 2.5.

Algorithm 2.5 SINGLE-PROCESSOR-OPT-V($T, m, A, C, S$)
  j = 1
  for i = 2 to m do
    if (T − S_i)/(A_i + C_i) > (T − S_j)/(A_j + C_j) then
      j = i
    end if
  end for
  return σ = (j), V = (T − S_j)/(A_j + C_j)
Note that in the optimum schedule at least one processor $P_i$ must process load of size at least $V^{OPT}(T)/m$ (in the given time $T$). Given the whole interval $T$ alone, this processor could process at least as much. Hence, Algorithm 2.5 delivers a solution processing load of size at least $V^{OPT}(T)/m$, and it is an approximation algorithm with relative performance guarantee $m$. Note that this bound is tight. Consider an instance with $A_i = 1$, $C_i = S_i = 0$ for $i = 1, \dots, m$. In the optimum solution, all processors are activated and they process load of size $mT$. In the solution delivered by Algorithm 2.5 only one processor is activated, and the size of the load is $T$. The running time of Algorithm 2.5 is $O(m)$.

The above approach can be extended by analyzing all communication sequences of length $k$ for some constant $k \le m$. Similarly as before, we observe that if the optimum solution of the problem activates at least $k$ processors, then it must contain a group of $k$ processors which together process load of size at least $kV^{OPT}(T)/m$. Hence, an algorithm enumerating all possible communication sequences of length $k$ delivers a solution with relative performance guarantee $m/k$, provided that the optimum solution of the instance uses at least $k$ processors. Unfortunately, the complexity of such an algorithm is $O(m^k)$, and it grows exponentially as the guarantee $m/k$ is improved (i.e., as $k$ grows).
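A minimal Python sketch of Algorithm 2.5 and of the length-$k$ enumeration follows. It is an illustration under stated assumptions, not the thesis implementation. Processors are indexed from 0. `processed_load` and `best_sequence_of_length` are hypothetical names: the first evaluates a sequence with the same single-round recurrence as in the analysis of $\sigma'$ and $\sigma''$, and the second tries all $m!/(m-k)!$ ordered selections of $k$ processors via `itertools.permutations`. The usage reproduces the tight instance $A_i = 1$, $C_i = S_i = 0$.

```python
from itertools import permutations

def processed_load(seq, T, A, C, S):
    """Load processed by time T when the processors in seq are activated in order."""
    t_left, total = T, 0.0
    for j in seq:
        if t_left <= S[j]:
            continue  # assumption: skip a processor whose startup no longer fits
        alpha = (t_left - S[j]) / (C[j] + A[j])
        total += alpha
        t_left -= S[j] + C[j] * alpha
    return total

def single_processor_opt_v(T, A, C, S):
    """Algorithm 2.5: give the whole interval T to the single best processor."""
    # max() returns the first maximum, matching the strict '>' test of the listing.
    j = max(range(len(A)), key=lambda i: (T - S[i]) / (A[i] + C[i]))
    return (j,), (T - S[j]) / (A[j] + C[j])

def best_sequence_of_length(k, T, A, C, S):
    """Try every ordered selection of k processors (O(m^k) sequences)."""
    best = max(permutations(range(len(A)), k),
               key=lambda seq: processed_load(seq, T, A, C, S))
    return best, processed_load(best, T, A, C, S)

# Tight instance: A_i = 1, C_i = S_i = 0; the optimum uses all m processors.
m, T = 5, 10.0
A, C, S = [1.0] * m, [0.0] * m, [0.0] * m
print(single_processor_opt_v(T, A, C, S))      # ((0,), 10.0), i.e. V = T
print(best_sequence_of_length(m, T, A, C, S))  # ((0, 1, 2, 3, 4), 50.0), i.e. V = mT
```

With $k = m$ the enumeration is exact but costs $m!$ evaluations, which is why the text restricts $k$ to a constant.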
Algorithm 2.5 can also be extended to a greedy Algorithm 2.6, which selects the processors in the communication sequence one by one. As long as it is possible to append a processor to the communication sequence, the processor which can process the greatest load is chosen.
Algorithm 2.6 GREEDY-OPT-V($T, m, A, C, S$)
  σ = ()
  V = 0
  j = 1
  while j ≠ 0 do
    j = 0
    for i = 1 to m do
      if S_i < T and i is not contained in σ then
        if j = 0 or (T − S_i)/(A_i + C_i) > (T − S_j)/(A_j + C_j) then
          j = i
        end if
      end if
    end for
    if j ≠ 0 then
      σ = σ|j   {concatenation of σ and j}
      V = V + (T − S_j)/(A_j + C_j)
      T = T − S_j − C_j(T − S_j)/(A_j + C_j)
    end if
  end while
  return σ, V

The running time of Algorithm 2.6 is $O(m^2)$. The results delivered by this algorithm are not worse than those of Algorithm 2.5, because its first choice coincides with the processor selected by Algorithm 2.5. Still, the performance guarantee $m$ is tight. Indeed, consider the following problem instance. Let $A_1 = 1 - \varepsilon$, $C_1 = T - 1$, $S_1 = 0$, and $A_i = T$, $C_i = 0$, $S_i = 0$ for $i = 2, \dots, m$, where $0 < \varepsilon < 1$ is a small constant. Processor $P_1$ can process load of size $\frac{T}{A_1 + C_1} = \frac{T}{T - \varepsilon} > 1$ in time $T$. For $i \ge 2$, processor $P_i$ is capable of processing load of size $\frac{T}{T} = 1$ in time $T$. Hence, Algorithm 2.6 will choose processor $P_1$ to obtain the first load chunk. Sending data to processor $P_1$ will take time $T_1 = C_1\frac{T}{T - \varepsilon} = (T - 1)\frac{T}{T - \varepsilon}$. The remaining processors $P_i$ will be activated afterwards, and each of them will obtain the load of size

$$\frac{T - T_1}{A_i} = \frac{T - (T - 1)\frac{T}{T - \varepsilon}}{T} = 1 - \frac{T - 1}{T - \varepsilon} = \frac{1 - \varepsilon}{T - \varepsilon}.$$

Thus, the total size of the processed load will be $V_1 = \frac{T + (m - 1)(1 - \varepsilon)}{T - \varepsilon}$. On the other hand, if processor $P_1$ is activated as the last one, then each of processors $P_2, \dots, P_m$ receives load of size 1. The time left for communication and computation on $P_1$ is still $T$, and $P_1$ processes load of size $\frac{T}{T - \varepsilon}$. The whole processed load has size $V_2 = m - 1 + \frac{T}{T - \varepsilon}$. Thus, we have

$$\frac{V_2}{V_1} = \frac{mT - \varepsilon(m - 1)}{(m - 1)(1 - \varepsilon) + T} \qquad\text{and}\qquad \lim_{T \to \infty}\frac{V_2}{V_1} = m.$$

To compare the quality of the results obtained by Algorithm 2.6 with those of Algorithm 2.5, and to assess the difference between the two algorithms, we tested them on sets of random instances. Each instance in the first set had $m = 100$ processors, and their parameters $A_i, C_i, S_i$ were chosen randomly from the interval $[0, 1]$. For each generated set of processors, 5 instances were created, with $T = 1, 10, 100, 1000, 10000$. The quality of the obtained solutions was measured as the quotient $V_a/UpBo$, where $V_a$ is the amount of load returned by the tested algorithm, and $UpBo$ is the upper bound on the size of processed load, calculated as $\sum_{i=1}^{m}\frac{T - S_i}{A_i + C_i}$. The results of the experiments are presented in Fig. 2.2.

Figure 2.2: Experimental results for the first set of instances (slow communication), for $T$ from 1E0 to 1E4. a) Number of processors used by Algorithm 2.6 (AVG, MAX). b) Quality of the solutions obtained by Algorithms 2.5 and 2.6 (AVG, WRST for each algorithm).

The number of processors used by the greedy Algorithm 2.6 depends on $T$ (see Fig. 2.2a). Although for each value of $T$ there were instances for which only one processor was used, the average and the maximum number of used processors (denoted by AVG and MAX in Fig. 2.2a, respectively) increase with $T$. Despite this, the performance of Algorithm 2.6 does not change much with growing $T$ (cf. Fig. 2.2b), both on average (denoted AVG) and in the worst case (denoted WRST). This can be explained by the fact that the processors activated as the last ones receive only very small amounts of load. Moreover, when startup times $S_i$ are small in comparison to $T$, the amounts of load processed by a single processor or a fixed group of processors increase roughly linearly with $T$. Hence, the quality of the results obtained by both algorithms is almost constant in relation to the upper bound when $T$ grows beyond 100. Note that it is much better than the worst-case estimate $\frac{1}{m} = 0.01$. It can also be seen in Fig. 2.2b that on average Algorithm 2.6 delivers solutions about 1.5 times better than Algorithm 2.5.

Table 2.3: Quality of the solutions for the second set of instances (fast communication), $T = 10000$.

       Algorithm 2.5          Algorithm 2.6
       AVG       WRST         AVG       WRST
       0.271073  0.061405     0.855082  0.598274
The above results can be explained by the fact that the communication parameters $C_i, S_i$ were chosen from the same range as $A_i$. The time necessary to send a chunk of data was quite large, and only a small number of processors could be activated. Therefore, we created another set of instances, where parameters $C_i$ and $S_i$ were chosen randomly from the interval $[0, 0.001]$. The remaining parameters were selected as in the previous set. Since the startup times $S_i$ were very small in comparison to all used values of $T$, the quality of the obtained solutions hardly changed with $T$. Therefore, we present only the average and the worst performance of both algorithms for $T = 10000$ in Table 2.3. The number of processors used by Algorithm 2.6 was $m = 100$ for all instances in this set. Therefore, the difference between the results obtained by Algorithms 2.5 and 2.6 is greater than for the previous set of instances, for which at most 26 processors were used by the greedy algorithm. The quality of the results of both algorithms is better than for the previous instance set. On average, Algorithm 2.5 allows for processing load of size greater than 27% of the upper bound, and Algorithm 2.6 greater than 85%.
We conclude that the difference in the quality of the results obtained by […] good. However, this can be the effect of the quality measure used. When communication is slow, the upper bound we calculated may be much greater than the optimum solution. If communication is fast in comparison to computations, then the results obtained by both algorithms get better. The difference between the results of Algorithms 2.5 and 2.6 increases, and the greedy Algorithm 2.6 delivers solutions of very good quality.
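A possible Python rendering of Algorithm 2.6 follows, as a hedged sketch rather than the code used for the experiments. Processors are indexed from 0. `tight_instance` and `upper_bound` are hypothetical helpers that rebuild the tight instance discussed above and the bound $UpBo$. The random instance only mimics the setup of the first experiment ($m = 100$, parameters drawn from $[0, 1]$). The seed and $T = 1000$ are arbitrary choices, so the printed ratios are not the values behind Fig. 2.2.

```python
import random

def greedy_opt_v(T, A, C, S):
    """Sketch of Algorithm 2.6 (GREEDY-OPT-V); processors indexed from 0."""
    seq, V, used = [], 0.0, set()
    while True:
        fits = [i for i in range(len(A)) if S[i] < T and i not in used]
        if not fits:
            return seq, V
        # first processor with the largest (T - S_i)/(A_i + C_i), as in the listing
        j = max(fits, key=lambda i: (T - S[i]) / (A[i] + C[i]))
        alpha = (T - S[j]) / (A[j] + C[j])
        seq.append(j)
        used.add(j)
        V += alpha
        T -= S[j] + C[j] * alpha  # time left after the transfer to P_j

def single_processor_opt_v(T, A, C, S):
    """Value returned by Algorithm 2.5, for comparison."""
    return max((T - S[i]) / (A[i] + C[i]) for i in range(len(A)))

def upper_bound(T, A, C, S):
    """UpBo: every processor may use the whole interval T on its own."""
    return sum((T - S[i]) / (A[i] + C[i]) for i in range(len(A)))

def tight_instance(m, T, eps):
    """Instance showing that the guarantee m of Algorithm 2.6 is tight."""
    A = [1 - eps] + [T] * (m - 1)
    C = [T - 1] + [0.0] * (m - 1)
    S = [0.0] * m
    return A, C, S

m, T, eps = 10, 1e6, 0.01
A, C, S = tight_instance(m, T, eps)
_, V1 = greedy_opt_v(T, A, C, S)
V2 = m - 1 + T / (T - eps)  # P_1 activated last, as in the text
print(V2 / V1)              # close to m = 10 for large T

# Quality measure V_a / UpBo on one random instance with parameters from [0, 1).
rng = random.Random(0)
m, T = 100, 1000.0
A, C, S = ([rng.random() for _ in range(m)] for _ in range(3))
ub = upper_bound(T, A, C, S)
q_single = single_processor_opt_v(T, A, C, S) / ub
q_greedy = greedy_opt_v(T, A, C, S)[1] / ub
print(q_single, q_greedy)
```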
2.5.2 Problem DLS{1Round}-Opt$_T$

In order to create an approximation algorithm for problem DLS{1Round}-Opt$_T$, we can, similarly as in Algorithm 2.5, consider only communication sequences of length 1. This approach is used in Algorithm 2.7.
Algorithm 2.7 SINGLE-PROCESSOR-OPT-T($V, m, A, C, S$)
  j = 1
  for i = 2 to m do
    if S_i + (A_i + C_i)V < S_j + (A_j + C_j)V then
      j = i
    end if
  end for
  return σ = (j), T = S_j + (A_j + C_j)V
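For symmetry with Algorithm 2.5, here is a minimal Python sketch of Algorithm 2.7. It is an illustration only: processors are indexed from 0, and the function name is a hypothetical transliteration of the listing. It keeps the processor with the smallest completion time $S_j + (A_j + C_j)V$. The usage line is the tight instance $A_i = 1$, $C_i = S_i = 0$ discussed below.

```python
def single_processor_opt_t(V, A, C, S):
    """Sketch of Algorithm 2.7 (SINGLE-PROCESSOR-OPT-T); processors indexed from 0.

    Returns the one-processor sequence with the shortest time S_j + (A_j + C_j) V.
    """
    # min() keeps the first minimum, matching the strict '<' test of the listing.
    j = min(range(len(A)), key=lambda i: S[i] + (A[i] + C[i]) * V)
    return (j,), S[j] + (A[j] + C[j]) * V

# Tight instance A_i = 1, C_i = S_i = 0: one processor needs time V, the optimum V/m.
m, V = 4, 12.0
print(single_processor_opt_t(V, [1.0] * m, [0.0] * m, [0.0] * m))  # ((0,), 12.0)
```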
Note that if processor $P_i$ needs time $T$ to process the load of size $V$, then it cannot process the load of size $V/m$ faster than in time $T/m$. As in the optimum solution at least one processor has to receive load of size at least $V/m$, Algorithm 2.7 returns time $T \le mT^{OPT}(V)$. Observe that this bound is tight. Consider an instance with $A_i = 1$, $C_i = S_i = 0$ for $i = 1, \dots, m$. In the optimum solution, all processors are activated and they process load $V$ in time $\frac{V}{m}$. In the solution delivered by Algorithm 2.7 only one processor is activated, and it needs time $V$ to process the whole load. The running time of Algorithm 2.7 is $O(m)$.

2.6 Conclusions

In this chapter we analyzed single-round divisible load scheduling in star networks. We proposed fully polynomial time approximation schemes for problems DLS{$C_i = 0$}-Opt$_V$ and DLS{$C_i = 0$}-Opt$_T$. As a by-product, a fully polynomial time dual approximation algorithm was designed for the first problem. We also analyzed the scheduling problems in systems with finite bandwidths (i.e., when $C_i > 0$). The order in which the processors should be activated was studied as the main obstacle in creating approximation algorithms for this case. Unfortunately, we showed that some classes of processor sequencing algorithms cannot be used to solve this problem. We conjecture that constructing the optimum sequence can be computationally hard, and that DLS{1Round} is not a selection problem. Finally, we proposed simple approximation algorithms giving tight relative performance guarantee $m$ for problem DLS{1Round}-Opt$_V$ and for problem DLS{1Round}-Opt$_T$.

3 Multi-Round Processing with Limited Memory
The single-round organization of computations has several disadvantages. Firstly, the communication delays may be very long, while no computations can be started until the first processor receives the whole amount of load assigned to it. Secondly, in practice the whole load $V$ is often too big to be stored in the memories of worker processors at the same time. In such a case it is impossible to create a single-round schedule. It would be more profitable to send the load in many small pieces (chunks), so that computations start earlier and the chunks fit in computer memories. Consequently, computations could interleave with communications.
In this chapter we study multi-round divisible load scheduling in systems with limited memory. We analyze the star network topology described in Chapter 2. To take memory limitations into account, we introduce one more parameter characterizing each processor $P_i$. Namely, $B_i$ is the size of the memory buffer available on $P_i$ (e.g., in bytes). Our goal is to find a schedule processing the load of a given size in the shortest possible time. As each processor can receive many messages, there are more scheduling decisions to be made than in the case of single-round processing:
• The set $P' \subseteq P$ of processors participating in the computations must be chosen.
• […] much larger than the number of processors $m$.
• The communication sequence must be chosen. For multi-round processing, the communication sequence is an arbitrary sequence whose elements are indices of processors from the set $P'$.
• The sizes of the load parts sent in each message must be selected.

We start our considerations with a short summary of the previous work on multi-round divisible load scheduling. In Section 3.2 we describe the mathematical model used in this chapter. As our scheduling problem is known to be computationally hard, we propose an exponential Branch&Bound algorithm and a genetic algorithm in Section 3.3. We use the genetic algorithm not only as a metaheuristic solving the scheduling problem, but also to gather information about the features of good quality solutions. The results obtained from an extensive experimental study, as well as some analytical results, are presented in Section 3.4. Based on this information, in Section 3.5 we propose several classes of scheduling heuristics. We analyze and compare them, exposing their advantages and weaknesses.
3.1 Earlier Results
Scheduling divisible loads in systems with limited memory was first analyzed in [37]. The authors considered single-round schedules only, hence they assumed that the whole load fits in the memory buffers of the workers. Other assumptions were that all processors take part in the computations and that the activation sequence is given. The communication delay model was linear ($S_i = 0$ for $1 \le i \le m$). A fast heuristic called Incremental Balancing Strategy was proposed. This algorithm did not always deliver optimum solutions, as was shown in [30]. A more general affine communication delay model was studied in [30]. A […] given activation sequence. Choosing the optimum set $P'$ of processors taking part in the computations in systems with limited memory and the affine communication model has been shown to be NP-hard in [31] and strongly NP-hard in [4]. In [31] the authors proposed and experimentally evaluated a Branch&Bound algorithm and several heuristics for single-round scheduling with limited memory.

Multi-round divisible load scheduling with limited memory was first studied in [29]. Only the size of the chunk currently processed by a given processor was subject to the memory limit. The sizes of load parts arriving in the background of computations were not taken into account. A more detailed memory model, in which memory limits affected all chunks of data existing at a given processor, was used in [26]. A Branch&Bound algorithm and a genetic algorithm solving the analyzed scheduling problem were proposed. However, the mathematical model of memory management was simplified to make the problem more tractable. It was assumed that memory occupation decreases linearly during the computations. This simplification has been removed in [27]. We discuss it in more detail in the next section.
3.2 Problem Formulation
Before we present the mathematical model used in this chapter, let us briefly analyze different models of memory management. The simplest approach is to assume that only one load chunk may be present in the memory of a computer at a time [31, 37]. The size of a piece of data sent to processor $P_i$ cannot exceed the limit $B_i$ (cf. Fig. 3.1a). Thus, a processor cannot perform computations while receiving a new piece of load. This results in long idle times and decreases the efficiency of processing.
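Here is a small Python sketch that makes the cost of the one-chunk-at-a-time model concrete. It rests on assumptions that are ours, not the thesis's. A single worker $P_i$ receives load $x$ in chunks of size at most $B_i$. Each message costs $S_i + C_ib$, as in the affine model of Chapter 2, and computing a chunk takes $A_ib$. The link to this worker is assumed never to be busy with other transfers. Since the next chunk cannot arrive before the current one is processed, every chunk pays startup, transfer and computation in series. The function name `one_chunk_model_time` is hypothetical.

```python
import math

def one_chunk_model_time(x, B, A, C, S):
    """Chunks, makespan and CPU idle time of one worker under the one-chunk model.

    Hypothetical illustration: load x is sent in ceil(x / B) chunks of size <= B;
    a chunk of size b costs S + C*b to receive and A*b to process, and because
    only one chunk fits in memory, receiving and computing never overlap.
    """
    chunks = math.ceil(x / B)
    makespan = chunks * S + (C + A) * x  # every chunk: startup, transfer, compute
    idle = makespan - A * x              # time the CPU waits for data
    return chunks, makespan, idle

print(one_chunk_model_time(10.0, 4.0, 1.0, 0.5, 2.0))  # (3, 21.0, 11.0)
```

In this toy case the worker computes for only 10 of the 21 time units, which illustrates the idle times mentioned above.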
In [26] it was assumed that each processor can store multiple load chunks