
POZNAN UNIVERSITY OF TECHNOLOGY

Faculty of Computing

PhD THESIS

Xin Chen

Selected Problems of Online Scheduling on Parallel Machines

Supervisor: Malgorzata Sterna

Poznan 2014


Acknowledgments

It has been a cherished experience in my life to study at Poznan University of Technology, both intellectually and socially. Here, I thank all of the faculty, staff and students who have helped me and my family during this time.

I am deeply indebted to my supervisor dr. hab. inz. Malgorzata Sterna, whose guidance and support played a fundamental role in the completion of this thesis.

She gave me an interesting topic to study, and made my work fruitful in results.

Moreover, her attitude towards research led me to be more serious in the scientific field. Beyond academic activities, she also took care of everything in my daily life, which helped me and my family to have a wonderful time in Poland.

Special thanks are due to prof. dr. hab. inz. Jacek Blazewicz, without whose help I could not have studied in Poland. He gave me the chance to join the Ph.D. program, and supported my work. He shared his deep insights into the issues I studied, and the enthusiastic discussions with him and his team were the most important learning experience during my visit. I also express my sincere appreciation to him for inviting my family to Poland, which allowed us to stay together.

I also thank dr. Xin Han, who was my first guide in scheduling theory and gave me helpful suggestions for my research work. Especially while I was studying overseas, he gave me a lot of encouragement, which helped me to be confident enough to go further.

Especially, I thank my wife for her endless support and love. To keep the family together, she gave up her job and came to Poland with me. She spent all her time looking after me and our two kids, even when she was sick. Here, I express my love to her and our two children. I also express my love to my mother and mother-in-law, for their kind help to my family.

This work has been partially supported by "China Scholarship Council (CSC)".

I dedicate this thesis to my father.


Chapter 1

Introduction

Contents

1.1 Scheduling Problem

1.2 Problem Categories

1.2.1 Offline and Online Scheduling

1.2.2 Semi-online Scheduling

1.3 Aims and Scope of the Thesis

1.1 Scheduling Problem

The scheduling problem, one of the classical problems in combinatorial optimization and operational research, deals with the assignment of limited resources (e.g. machines) to a set of jobs (tasks), with the goal of optimizing one or more objectives [10, 67]. It was inspired by the field of mechanical manufacturing, and has since been widely studied in operational research, system science, control science, management science, computer science and so on.

Scheduling can be considered as a decision-making process, which plays an important role in the real world. The "resources" could be machines in a workshop, processing units in a computing environment, quays at a port, or rooms in a hotel, etc., while the "jobs" could be workpieces in an assembly line, executions of computer programs, ships that need to leave or enter a port, or guests in the hotel, etc. The "decision-making" is then to find an optimal strategy for assigning proper resources to jobs, taking into account possible time constraints.

The following examples illustrate scheduling problems more clearly.

Example 1.1 Printing and dyeing operations

Printing and dyeing is a traditional technique used around the world. Basically, the production process in a printing and dyeing factory consists of four phases: (i) pre-processing, which makes the cloths suitable for processing; (ii) printing, which prints the patterns onto the cloths; (iii) dyeing, which dyes the patterns with the right color; (iv) re-processing, which makes these cloths suitable for use.

In each phase, there are several specialized machines for processing cloths. So the decision-maker should decide on the assignment of devices, the order of cloths, as well as on the start time for each workpiece. The goal could be minimizing the total completion time, or minimizing the idle times of equipment, or maximizing the throughput, etc.

Example 1.2 Scheduling tasks in a distributed computing environment

Along with the development of computer science, various kinds of distributed computing environments, such as Cluster, GPU and Cloud Computing, have been built for faster or more accurate computation. The whole environment contains one (or more) central processing unit(s) and several sub-processing units, with identical or different computational abilities. When a task occurs, the scheduler (central processing unit) should assign it to one of the sub-processing units, or ask it to wait, according to some strategy.

Then, the goal of the scheduling could be minimizing the mean waiting time of the tasks, or balancing the load among these sub-processing units, etc.

These examples show that scheduling problems mostly originate from problems encountered in the real world. Investigating them, researchers can touch two major aspects: one is theory, while the other is practice. Theoreticians propose their models by observing the real world and then use mathematical methods to analyse them and to propose algorithms. On the other hand, practitioners utilize the approaches given by theoreticians, and then adjust them to the specificity of particular real cases, using experimental methods to solve the practical problem. This thesis mainly concerns theoretical aspects of scheduling.

1.2 Problem Categories

In general, scheduling problems can be divided into two basic categories: offline and online ones, depending on whether the decision-maker knows the description of jobs in advance or not.

1.2.1 Offline and Online Scheduling

To illustrate the difference between so-called offline scheduling and online scheduling, the two examples provided in Section 1.1 are recalled.

Taking into account Example 1.1, it seems to be "easy" for a decision-maker (although this problem is already intractable) to build a schedule, because all the information on jobs is available to him. It is known in advance - according to the daily/weekly/monthly/yearly production plans - e.g., how many jobs need to be executed, how much time is needed to finish one job, when a job could start its processing and when it should be completed, what a job’s importance is, and so on.

From this point of view, Example 1.2 looks a little "harder", since the central processing unit - which can be considered as the decision-maker - cannot know the complete information on tasks in advance, because these tasks are submitted by customers dynamically.


More formally speaking, in offline scheduling (e.g. [10,39,65]), all the information about the input is given in advance. The scheduler can determine the entire schedule at time zero, having all the pieces of information at his disposal. By contrast, in online scheduling (e.g. [31,38,80]), the scheduler does not know in advance how many jobs have to be processed and what their processing times are. He becomes aware of the existence of a job, as well as of its parameter values, only when this job is released and presented to him.

Within online models, two basic versions of online scheduling problems have been proposed, namely online over list and online over time. In the first version, jobs are released at time zero, but they come to the system one by one over list. The next job appears (and the information on it is provided) only after the previous one has been assigned. In the second version, each job has a release time, and the information describing it becomes known to the scheduler only at this time.

Moreover, basically speaking, there are two strict properties of pure online scheduling problems [80]. The first one is the "unknown property", which means that the scheduler knows nothing about jobs until they arrive, so this property results from "the lack of information". The second one is the "determined property", which results from the restriction of the scheduling process. It means that the assignment of a job to a machine should be determined immediately upon its arrival. The scheduler cannot change this decision after this initial assignment, e.g., reassigning this job to another machine is not allowed.
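To make the two properties concrete, the following minimal Python sketch processes jobs online over list on m identical machines. The greedy rule used here (always choosing a currently least loaded machine, usually called list scheduling) is only one possible strategy, chosen for illustration; the function name online_list_scheduling and the sample job sizes are assumptions and do not come from the models discussed above.

```python
from typing import Iterable, List, Tuple


def online_list_scheduling(jobs: Iterable[float], m: int) -> Tuple[List[int], List[float]]:
    """Assign jobs arriving one by one to m identical machines.

    Each job is placed on a currently least loaded machine as soon as it
    arrives, and this decision is never revised afterwards.
    """
    loads = [0.0] * m          # current total processing time per machine
    assignment: List[int] = []  # machine index chosen for each job, in arrival order
    for p in jobs:
        # "Unknown property": the size p becomes known only at this point.
        i = min(range(m), key=lambda k: loads[k])
        loads[i] += p
        # "Determined property": the choice is fixed immediately.
        assignment.append(i)
    return assignment, loads


# Hypothetical input sequence, used only to illustrate the behaviour.
assignment, loads = online_list_scheduling([2, 3, 4, 6], m=2)
makespan = max(loads)
```

For this sequence the greedy rule ends with machine loads 6 and 9, while a scheduler knowing all four jobs in advance could reach loads 8 and 7. This gap is a small instance of the price paid for the lack of information.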

1.2.2 Semi-online Scheduling

As underlined above, in offline scheduling problems the scheduler knows the complete information on jobs, while in online scheduling he does not know this information at all and his decisions on job execution cannot be changed any more. However, in the real world, the situation is usually not so strict. Most of the time, one meets neither an offline environment nor a pure online environment. In other words, the two requirements mentioned above can be relaxed. For example, the scheduler may not know the whole information on the majority of jobs, but he does know some of it, or the scheduler cannot reassign all the jobs, but he can reassign some of them. Such scheduling problems, which violate the two above mentioned constraints of pure online scheduling to some extent, are called semi-online scheduling problems.

Consequently, semi-online scheduling problems can be looked upon as relaxations of pure online problems, or as an intermediate state between offline and online models. Several kinds of semi-online scheduling problems were inspired by practice, which can be divided into two basic categories, according to the way in which the two characteristics of pure online scheduling are violated.

In the first category, the scheduler may know some partial information on the future jobs in advance, i.e., the first "unknown property" is relaxed. In the literature, various models have been investigated, assuming that:

(i) the total processing time (i.e. the size) of all jobs is known in advance [50]. This model is justified by the situation which arises on a shop floor where a scheduler typically accepts orders (jobs) of a targeted total size corresponding to a given time period, such as a day or a week;

(ii) the maximal processing time of jobs is known in advance [43]. This model is inspired by the situation when the largest size of workpieces is limited by the capacity of an assembly line;

(iii) jobs come to the system in non-increasing order of their processing times [71]. Such a situation appears, for example, when software evolution or iterative development is considered: the next version of the system usually needs less time to be released than the previous one;

(iv) the optimal offline criterion value is known in advance [4]. This model results from the realistic scenario of remote file transfer, where a set of files is stored in a system of m unit capacity servers, and then these files need to be sent one by one to a remote system of another m servers. The only information the remote system has about the files is that they were originally stored in m servers of unit capacity (but without the original assignment of the files). The goal is to store these files, coming online, in the remote system within the minimum capacity.

The second category of semi-online problems violates the second property - the "determined property". These models can be considered to have more "freedom" than pure online ones while scheduling jobs. For example, (i) during the scheduling, a reordering buffer with fixed capacity can be used to store jobs temporarily, so some of the jobs need not be assigned to machines immediately at their arrival [50,85]; (ii) during or after scheduling, reassigning a limited number or a bounded volume of jobs is allowed, so the initial decision can be changed partially [23, 70, 77]; (iii) during the scheduling, two or more procedures can be used in parallel, and several copies of the incoming jobs can be assigned to machines by each of these procedures, i.e., several solutions are built, and the final schedule, obtained when the input sequence ends, is the best one among them [50]. In the presented models, the scheduler has the chance to make more decisions on particular jobs, compared with the single decision which has to be made in a pure online scheduling problem.
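The effect of a reordering buffer, as in model (i) above, can be sketched in a few lines of Python. The rule below (keep the largest jobs seen so far in the buffer, send the smallest stored job to a least loaded machine on overflow, and empty the buffer largest-first at the end) is a simple heuristic assumed here purely for illustration; it is not one of the algorithms from the cited literature, and the name schedule_with_buffer is hypothetical.

```python
import heapq
from typing import Iterable, List


def schedule_with_buffer(jobs: Iterable[float], m: int, capacity: int) -> List[float]:
    """Illustrative heuristic for m identical machines and a buffer of given capacity."""
    loads = [0.0] * m
    buffer: List[float] = []  # min-heap holding at most `capacity` job sizes

    def assign(p: float) -> None:
        # Put the job on a currently least loaded machine (irrevocably).
        i = min(range(m), key=lambda k: loads[k])
        loads[i] += p

    for p in jobs:
        heapq.heappush(buffer, p)
        if len(buffer) > capacity:
            # Overflow: the smallest stored job has to leave for a machine.
            assign(heapq.heappop(buffer))

    # The input sequence has ended: empty the buffer, largest job first.
    for p in sorted(buffer, reverse=True):
        assign(p)
    return loads


# Hypothetical input sequence, used only to illustrate the behaviour.
loads_no_buffer = schedule_with_buffer([2, 3, 4, 6], m=2, capacity=0)
loads_buffer_2 = schedule_with_buffer([2, 3, 4, 6], m=2, capacity=2)
```

With capacity 0 the sketch degenerates to a pure online greedy rule and ends with loads 6 and 9. With capacity 2 the two largest jobs are postponed until the end of the sequence and the loads become 8 and 7.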

1.3 Aims and Scope of the Thesis

This thesis focuses on three selected online and semi-online scheduling problems on parallel machines.

The majority of the work is devoted to the goal of minimizing makespan (i.e. the schedule length), which is one of the most popular objective functions in scheduling theory. Many results for this performance measure - in offline, online and semi-online environments - have appeared in the literature since 1960, cf. e.g. [1,16,32,33,34,38,39].

Following the literature, this dissertation considers two kinds of semi-online problems with the makespan criterion.

(1) The first one is semi-online scheduling with reassignment. This model concerns a parallel machines environment, and jobs come online one by one over list. Once a job arrives in the system, it should be assigned immediately to one of the machines. However, after all jobs have been scheduled on the machines, the scheduler will be informed that there is no further job, and at most K already scheduled jobs can be moved from one machine to another.

(2) The next topic of this thesis is devoted to online scheduling with a buffer. The parallel machines environment is considered again, where jobs arrive one by one over list. However, during the scheduling, a reordering buffer can be used for storing jobs temporarily. When a new job arrives, the scheduler has two choices: assigning this job to some machine or storing it in the buffer. But once the job is assigned to a machine, it cannot be stored in the buffer again.

Since minimizing makespan is one of the most popular objectives in the scheduling area, the above two topics in this thesis can be considered as a supplement to this classical domain. However, besides this commonly used objective, many other criteria have also aroused wide interest, such as lateness [60], tardiness [26], the number of tardy jobs [62], etc. The late work performance measure [9], which aims to minimize the amount of work executed after a due date, also belongs to the set of criteria involving time restrictions.

(3) Although the late work criterion has been studied for nearly 30 years, no one has considered its online case. Therefore, the thesis focuses on this topic - online scheduling of jobs on parallel machines with the late work criterion - for the first time.

For each mentioned problem, the lower bound (which is the limit of an online problem showing that no method can get better performance than this bound, cf. Section 2.3) will be proven, and algorithms with better performance (which is estimated by competitive ratio, cf. Section 2.3) than the existing ones will be proposed and analysed or new approaches will be given. Besides, since no one has considered the online case of late work scheduling, some groundbreaking work will be done on determining the complexity status of the studied problem and developing a proper estimation method for this criterion function.

In more detail, the following research hypotheses can be formulated:

(1) For the first selected topic, i.e. semi-online scheduling with reassignment, two sub-topics can be considered.

(1.1) For the problem on two uniform machines, i.e. on two parallel machines which differ in their processing speed (one machine has speed factor 1 and the second one has speed factor s ≥ 1), several lower bounds depending on the machine speeds can be determined, and some algorithms with better performance than the methods known from the literature can be proposed, since some related work has already been done.

(1.2) In the two identical machines environment, in which two parallel machines have the same speed, the problem is considered with an additional hierarchical constraint. This means that each job and machine has a hierarchy, and a job can be assigned to the machine only if the job’s hierarchy is at least as high as the machine’s hierarchy. For this case the tight bound can be given (which means that the upper bound and lower bound are identical, cf. Section 2.3).

(2) For the second chosen topic, i.e. semi-online scheduling with a buffer, similarly to (1), two sub-topics - concerning the models without any constraint and under hierarchical constraint - can be investigated, respectively.

(2.1) For the problem in parallel machines environment without any constraint, the previous results available in the literature could be still improved, since the efficiency of the existing algorithms can be increased.

(2.2) For the problem on two identical machines under hierarchical constraint, a tight bound can be proven.

(3) For the problem of online scheduling on parallel machines with late work criterion, there should exist an online algorithm to solve this problem, and its performance can be determined based on the new quality estimation method.

The rest of this thesis is organized as follows.

Chapter 2 provides some basic and useful notions and definitions appearing in this thesis. First of all, some notions on computational complexity are given, since scheduling problems are among the most typical combinatorial optimization problems, and the study of any combinatorial optimization problem usually starts from checking its complexity status. Then, the most widely used notation for describing scheduling problems, i.e. the three-field notation, is introduced in the next part of this chapter. Finally, the concept of an online algorithm is explained and its classical estimation method, i.e. the competitive ratio, is introduced.

Chapter 3 surveys the work related to this thesis, concerning pure online scheduling for minimizing makespan, semi-online scheduling with reassignment and with a buffer, scheduling under hierarchy constraint, and scheduling with the late work criterion, respectively.

In Chapter 4, two sub-problems of online scheduling with reassignment with the goal of minimizing makespan are considered. The first one concerns two uniform machines, without any constraint. Several lower bounds based on the value of machine speeds are determined, then two optimal algorithms ("optimal" means that the efficiency of the algorithm matches the lower bound of the problem, cf. Section 2.3) and an algorithm improving results known from the literature are proposed. The second sub-problem investigated in this chapter is the scheduling problem under hierarchical constraint, for which the lower bound and an optimal algorithm are given.

In Chapter 5, the problem of online scheduling with a buffer on parallel machines is studied. For the sub-topic without any constraint, three algorithms are proposed, which improve the previous results given in the literature. Then, in relation to Chapter 4, the problem under hierarchical constraint is investigated again. Finally, the relationship between scheduling with reassignment and with a buffer is discussed.


Chapter 6 considers the problem of online scheduling with the late work criterion. It first studies the computational complexity of the original offline model, and then builds a suitable estimation method for its online version. Next, the pure online case, as well as two semi-online cases with the known total processing time of all jobs or the known optimal offline criterion value, are studied respectively. At the end of this chapter, some relationships between the makespan criterion and the late work criterion are listed.

Finally, the work is concluded in Chapter 7.


Chapter 2

Notions and Definitions

Contents

2.1 Computational Complexity

2.2 Description of Scheduling Problem

2.2.1 Machine Environments (α)

2.2.2 Jobs Characteristics (β)

2.2.3 Objective Functions (γ)

2.3 Online Algorithm and Competitive Analysis

This chapter collects some useful notions and definitions. First of all, some basic notions of computational complexity are given in Section 2.1. The analysis of the computational complexity of different versions of the scheduling problem always attracts researchers’ interest, since it directs the research on any combinatorial optimization problem. Then, Section 2.2 presents the most popular way of describing scheduling problems - the three-field notation. Finally, basic concepts of online algorithms and their analysis are explained in Section 2.3.

2.1 Computational Complexity

Determining the computational complexity [36,63] can be considered as the first step in investigating any combinatorial problem. Basically, there are two main kinds of combinatorial problems: optimization problems and decision problems. In optimization problems, each feasible solution has an associated value, and the goal is to find a feasible solution with the best value. For example, one can schedule jobs on machines so that the completion time is minimized. In a decision problem, by contrast, one looks only for the simple answer "yes" or "no" to the question of the existence of a solution. For example, one can ask whether there exists a feasible schedule for a scheduling problem with the completion time smaller than 100 time units.

When studying a combinatorial problem, researchers are always interested in the question of how "hard" this problem is, i.e. in the question of the computational complexity of this problem. First of all, a decision counterpart cannot be more difficult to solve than the original optimization problem, where the "decision counterpart" means formulating, for an optimization problem, the question whether a feasible solution exists with the objective value smaller or larger than a given threshold (for a minimization problem or a maximization problem, respectively).


To organize studies on open problems, a complexity classification of optimization problems and decision problems has been proposed, which is roughly presented below.

Decision problems can be divided into the following classes, cf. Figure 2.1.

Figure 2.1: Classification of decision problems

(1) Class P (Deterministic Polynomial). This class contains all the decision problems which can be solved in polynomial time, i.e., in time O(n^k) for the input size n and some constant k, by a Deterministic Turing Machine (DTM). Since the DTM is a realistic model of computation, equivalent to the existing computers, the problems in class P can be declared as "easy" problems.

(2) Class NP (Non-deterministic Polynomial). It includes all the decision problems which can be solved in polynomial time by a Non-deterministic Turing Machine (NDTM). Based on the definition of the NDTM, this class can be described as the class consisting of the problems which can be "verified" in polynomial time.

Taking into account the fact that any deterministic algorithm for a DTM can also be executed by an NDTM, class P is a subclass of NP. However, up to now, it is still unknown whether P = NP or P ≠ NP. There is a set of intractable problems in NP for which no one has discovered a polynomial time solution and, moreover, no one has proven that they cannot be solved in polynomial time. Such problems are called NP-complete problems.

(3) Class NPC (NP-complete). This class contains the problems which are "harder" than the ones in P, where "harder" means that no one has proposed polynomial time algorithms for these problems so far. More formally speaking, a problem is NP-complete if it belongs to NP and any other NP problem transforms polynomially to it. Consequently, if any NP-complete problem could be solved in polynomial time, then every problem in NP would have a polynomial time algorithm. Furthermore, if a problem in NPC can be solved by a pseudo-polynomial time algorithm (which means that the time complexity depends not only on the input size, but also on the maximal number in the input, e.g. O(d · n^k), where d is the maximal value in the input), it is called a binary NP-complete problem (NP-complete in the weak/ordinary sense). Otherwise, the problem should be proven to be unary NP-complete (NP-complete in the strong sense), e.g. by pseudo-polynomial transformation from a known strongly NP-complete problem.

(4) Class NPI. Under the hypothesis that P ≠ NP, besides P and NPC, there might exist some problems with intermediate complexity, which constitute the class NPI (NP-intermediate).
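The notion of a pseudo-polynomial time algorithm from item (3) can be illustrated with the Partition problem (can a set of positive integers be split into two parts with equal sums?), a classical problem that is NP-complete in the ordinary sense. Partition is brought in here only as an example and is not discussed in this section; the function name can_partition is likewise an assumption. The dynamic programming sketch below runs in time proportional to n times the sum of the numbers, i.e. its running time depends on the magnitude of the input values and not only on the number of values.

```python
from typing import Sequence


def can_partition(sizes: Sequence[int]) -> bool:
    """Decide whether non-negative integers can be split into two equal-sum parts."""
    total = sum(sizes)
    if total % 2 == 1:
        return False  # an odd total can never be halved
    target = total // 2
    # reachable[t] is True if some subset of the numbers seen so far sums to t.
    reachable = [True] + [False] * target
    for a in sizes:
        # Go downwards so that each number is used at most once.
        for t in range(target, a - 1, -1):
            if reachable[t - a]:
                reachable[t] = True
    return reachable[target]
```

The table has total/2 + 1 entries and is scanned once per number, so doubling every input value roughly doubles the work even though the number of values is unchanged.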

As mentioned, the concept of NP-completeness cannot be applied to optimization problems directly; however, the complexity of an optimization problem can be determined according to the classification of its decision counterpart: an optimization problem is NP-hard if its decision counterpart belongs to NPC. Furthermore, if its decision counterpart is a binary NP-complete problem, this optimization problem is said to be weakly/binary NP-hard. On the other hand, if its decision counterpart is a unary NP-complete problem, this optimization problem is said to be strongly/unary NP-hard. By contrast, if there exists a polynomial time algorithm for this optimization problem, it is said that this problem is "easy", and its decision counterpart belongs to class P.

Therefore, to investigate the complexity of a problem, researchers can touch two major aspects: (1) finding a polynomial time algorithm for the problem to declare that it is easy; or (2) proving that it is hard, i.e., it belongs to NPC (for a decision problem) or it is NP-hard (for an optimization problem).

For the second aspect, one of the common methods of proving NP-completeness is transformation. Consider a decision problem A, whose complexity is to be determined, and another problem B, which is already known to be NP-complete. If any instance (i.e., a particular input) of B (denoted as b) can be transformed into an instance of A (denoted as a), with the following characteristics:

(i) The transformation takes polynomial time;

(ii) The answers are the same, i.e., the answer for a is "yes" if and only if the answer for b is also "yes",

then, provided that A itself belongs to NP, A and B belong to the same complexity class, i.e., A is also NP-complete.

Such a procedure is called polynomial-time transformation (transformation for short).

For an optimization problem, to prove it is NP-hard, the first step is formulating a decision counterpart, and then using transformation method to prove that its decision counterpart is NP-complete (in weak or strong sense), indicating that the original optimization problem is (weakly or strongly) NP-hard.
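As a hedged illustration of the transformation method, the sketch below uses a standard textbook example that is not taken from this section: problem B is Partition and problem A is the decision counterpart of makespan minimization on two identical machines ("is there a schedule with makespan at most a given threshold?"). The function names are assumptions, and the two brute-force checkers are exponential; they serve only to confirm condition (ii) on small instances, whereas the transformation itself clearly takes polynomial time, as required by condition (i).

```python
from itertools import product
from typing import List, Sequence, Tuple


def partition_to_scheduling(sizes: Sequence[int]) -> Tuple[List[int], int, float]:
    """Transform a Partition instance b into a scheduling instance a."""
    processing_times = list(sizes)   # one job per number
    machines = 2                     # two identical machines
    threshold = sum(sizes) / 2       # asked makespan bound
    return processing_times, machines, threshold


def schedule_exists(processing_times: Sequence[int], machines: int, threshold: float) -> bool:
    """Brute force: is there an assignment with makespan <= threshold?"""
    for choice in product(range(machines), repeat=len(processing_times)):
        loads = [0] * machines
        for p, i in zip(processing_times, choice):
            loads[i] += p
        if max(loads) <= threshold:
            return True
    return False


def partition_exists(sizes: Sequence[int]) -> bool:
    """Brute force answer to the Partition question, for comparison only."""
    total = sum(sizes)
    return any(
        2 * sum(s for s, pick in zip(sizes, mask) if pick) == total
        for mask in product((False, True), repeat=len(sizes))
    )


# Hypothetical small instances on which both answers can be compared.
instances = [[1, 5, 11, 5], [1, 2, 5], [2, 3, 4, 6], [3, 1, 1, 2, 2, 1]]
answers_agree = all(
    schedule_exists(*partition_to_scheduling(b)) == partition_exists(b)
    for b in instances
)
```

The answers coincide because a schedule on two machines with makespan at most half of the total processing time exists exactly when the jobs can be split into two parts of equal total size.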

2.2 Description of Scheduling Problem

A scheduling problem [10, 24, 67] can be formalized as follows: given a set of jobs J = {J₁, ..., J_j, ..., J_n} and a set of machines M = {M₁, ..., M_i, ..., M_m}, the aim is to assign these jobs to these machines, so as to complete the whole set of jobs or a part of it (in case of scheduling with rejection) under the given constraints, and to optimize one or more objectives.

In general, machine environments can consist of a single machine, parallel machines or dedicated machines. In the first case, there is only one machine in the system to execute jobs. In the second model, several machines with the same or different speeds are available to process jobs. In the third version, each job is divided into several tasks, which should be processed by different machines.

Moreover, job J_j can be described by the following parameters.

p_j: Standard processing time (or size) of job J_j. The exact processing time of job J_j on machine M_i depends on the machine environment, cf. Subsection 2.2.1.

r_j: Release time (or ready time) of job J_j. It is the time at which job J_j arrives to the system, i.e., the earliest time at which job J_j can start its processing.

d_j: Due date of job J_j. It represents the preferred completion time for job J_j (e.g., the date promised to the customer). Sometimes, completion of a job after its due date is allowed, but a penalty will be incurred.

d̃_j: Deadline of job J_j. This parameter is similar to the due date; however, delay after the deadline is not allowed.

w_j: Weight of job J_j. It is basically a priority factor, denoting the relative importance of one job with regard to the others in the system.

To describe scheduling problems more clearly, a classical three-field notation α|β|γ [40] can be used, where α presents the machine environment in detail, β describes the details of jobs characteristics, while γ states the objective function.

2.2.1 Machine Environments (α)

As mentioned, there are three basic machine environments studied in scheduling theory, which are single machine, parallel machines and dedicated machines (shop systems). These environments can be presented in the field α, which includes two elements α = α₁α₂.

In detail, element α₁ denotes the machine type, as follows.

1: Single machine environment, where only one machine in the system can be used to process jobs. It can be considered as the simplest case among all possible machine environments and a special case of all other more difficult cases.

α₁ ∈ {P, Q, R} means that a parallel machines environment is considered, which consists of several machines in parallel, but a job J_j requires a single operation only, which may be processed on any one of these machines.

P: Identical machines in parallel. There are several machines in the system with the same speed for executing jobs. The exact processing time of job J_j is p_j.

Q: Uniform machines in parallel. Each machine M_i has a speed s_i, so the exact processing time of job J_j on M_i is equal to p_j / s_i.

R: Unrelated machines, which is a further generalization of the previous two cases and can be considered as the most complex one in the parallel environment. There are several machines and each machine M_i has a different speed for each job J_j, i.e., when processing job J_j, M_i’s speed is s_ij and the exact processing time of job J_j on M_i is p_j / s_ij.

When α₁ ∈ {F, J, O}, it means that a shop system is studied and a job J_j is divided into n_j tasks T_1j, T_2j, ..., T_{n_j j}, which should be processed by different machines, in a fixed or arbitrary order.

F: Flow shop system. There are several machines in series and each job has to be processed on each machine. All jobs have to follow the same route, i.e., they have to be processed first on machine M₁, then M₂, and so on.

J: Job shop system. Similarly to the previous case, a job is divided into several tasks which have to be processed on the machines. However, the distinction is that each job has its own predetermined route to follow.

O: Open shop system. The most flexible shop system, where the route for each job J_j on machines is not fixed, which means that the scheduler is allowed to determine a route for each job and different jobs may have different routes.

The second element in the machine environment characteristic, α₂, is a number which denotes the number of machines in the environment, e.g. "P2" denotes two identical machines and "Qm" denotes m uniform machines. However, α₂ can be omitted, which means that the number of machines is arbitrary and is one of the input parameters of the problem.
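For the parallel environments P, Q and R described above, the exact processing times can be computed directly from the standard processing times p_j and the machine speeds. The following Python sketch is only an illustration; the function names and the sample numbers are assumptions. Each function returns a table t with t[i][j] being the exact processing time of job J_j on machine M_i.

```python
from typing import List, Sequence


def times_identical(p: Sequence[float], m: int) -> List[List[float]]:
    """P: every machine processes job j in exactly p[j] time units."""
    return [[p_j for p_j in p] for _ in range(m)]


def times_uniform(p: Sequence[float], s: Sequence[float]) -> List[List[float]]:
    """Q: machine i has speed s[i], so job j takes p[j] / s[i]."""
    return [[p_j / s_i for p_j in p] for s_i in s]


def times_unrelated(p: Sequence[float], s: Sequence[Sequence[float]]) -> List[List[float]]:
    """R: machine i has speed s[i][j] for job j, so job j takes p[j] / s[i][j]."""
    return [[p_j / s[i][j] for j, p_j in enumerate(p)] for i in range(len(s))]


# Hypothetical data: two jobs with standard processing times 4 and 6.
p = [4, 6]
t_P = times_identical(p, m=2)
t_Q = times_uniform(p, s=[1, 2])
t_R = times_unrelated(p, s=[[1, 2], [4, 3]])
```

Setting all speeds to 1 in the uniform or unrelated case reproduces the identical machines table, which reflects the fact that P is a special case of Q, and Q a special case of R.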

2.2.2 Jobs Characteristics (β)

The second element of three-field notation, i.e. field β, consists of several sub-parameters β_i, separated by commas, each of which describes one of the characteristics of the input. For example, if preemption is allowed during the scheduling (which means that it is not necessary to keep a job on one machine from its start until its completion, and that the scheduler is allowed to interrupt the processing of a job at any point in time and put a different job on the machine instead), β_i can be written as "pmtn". β_i should be written as "prec" if precedence constraints are defined for jobs, which indicate that before a new job is allowed to start its processing, one or more jobs must be finished. Specifically, there are three main types of constraints: "chains" means that each job has at most one predecessor and at most one successor, "intree" means that each job has at most one successor, and "outtree" means that each job has at most one predecessor. Besides, if jobs come to the system along with release times or due dates, the corresponding symbols, i.e., r_j or d_j, and so on, must appear in the field β.

Since a wide variety of processing characteristics is considered in the literature, only the ones used in this thesis are listed below.

online over list: This parameter indicates that the problem is an online problem, where the scheduler learns the information on jobs one by one over a list, not in advance.

reassignment: This parameter means that after all the jobs have been assigned to machines, some of them can be moved from the current machine to another.

buffer: During the scheduling, a reordering buffer can be used for storing jobs temporarily.

GoS: Scheduling jobs under a grade of service provision is considered, i.e., each job and machine has a hierarchy, and job J_j can be assigned to machine M_i only when J_j’s hierarchy is larger than or equal to the hierarchy of M_i.

d_j = d: Each job J_j has a common due date d.

sum: The total processing time of all jobs is known in advance.

OPT: The optimal criterion value in the offline case is known in advance.

2.2.3 Objective Functions (γ)

The last part γ in three-field notation is an objective function (or several objective functions) to be minimized or maximized. To present these functions more clearly, more parameters of job J_j have to be introduced, cf. Figure 2.2.

Figure 2.2: Job parameters for an exemplary scheduling problem on parallel machines [73]

C_j: Completion time of job J_j, i.e., the time when J_j is finished.

F_j: Flow time, which is F_j = C_j − r_j.

L_j: Lateness, which is L_j = C_j − d_j.

D_j: Tardiness, which is D_j = max{0, C_j − d_j} = max{0, L_j}.

U_j: Unit penalty for a tardy job (used to count the number of tardy jobs), which is U_j = 0 if C_j ≤ d_j, and U_j = 1 if C_j > d_j.

E_j: Earliness, which is E_j = max{0, d_j − C_j} = max{0, −L_j}.

|L_j|: Earliness-tardiness, which is |L_j| = E_j + D_j.

Y_j: Late work, which is Y_j = min{p_j, max{0, C_j − d_j}}.
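The job parameters above can be sketched in a few lines of Python. This is only an illustration, not code from the thesis: the function name job_measures and the sample values are hypothetical, and the arguments p, r, d, C stand for the processing time p_j, release time r_j, due date d_j and completion time C_j of a single job.

```python
def job_measures(p, r, d, C):
    """Return the per-job quantities defined above for one job J_j.

    p, r, d, C are hypothetical argument names for p_j, r_j, d_j, C_j.
    """
    L = C - d                       # lateness L_j (may be negative)
    return {
        "F": C - r,                 # flow time F_j
        "L": L,
        "D": max(0, L),             # tardiness D_j
        "U": 1 if C > d else 0,     # unit penalty U_j for a tardy job
        "E": max(0, -L),            # earliness E_j
        "Y": min(p, max(0, L)),     # late work Y_j, capped by p_j
    }


late = job_measures(p=4, r=1, d=6, C=8)    # finishes 2 units after its due date
early = job_measures(p=3, r=0, d=10, C=7)  # finishes 3 units before its due date
capped = job_measures(p=3, r=0, d=2, C=9)  # started after its due date
```

In the last call the tardiness is 7 while the late work is only 3, because late work can never exceed the processing time p_j.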

Then the objective functions can be defined as follows.

C_max: Makespan (length) of the schedule, which is C_max = max{C₁, C₂, ..., C_n}.

Minimizing makespan usually implies a good utilization of the machine(s).

L_max: Maximum lateness of the schedule, which is L_max = max{L₁, L₂, ..., L_n}; it measures the worst violation of the due dates.

F or F_w: Mean (weighted) flow time, which are $F = \frac{1}{n}\sum_{j=1}^{n} F_j$ or $F_w = \frac{1}{\sum_{j=1}^{n} w_j}\sum_{j=1}^{n} w_j F_j$, and can be considered as the mean (weighted) response time for a set of jobs (e.g., the time spent by a customer's requirement in the system, from its release till its finish).

D or D_w: Mean (weighted) tardiness, which are $D = \frac{1}{n}\sum_{j=1}^{n} D_j$ or $D_w = \frac{1}{\sum_{j=1}^{n} w_j}\sum_{j=1}^{n} w_j D_j$, denoting the mean delay of all jobs. They can represent the penalties for exceeding a delivery time agreed with a customer.

E or E_w: Mean (weighted) earliness, which are $E = \frac{1}{n}\sum_{j=1}^{n} E_j$ or $E_w = \frac{1}{\sum_{j=1}^{n} w_j}\sum_{j=1}^{n} w_j E_j$. In contrast to tardiness, earliness can be considered as the inventory cost caused by completing an order too early with regard to the time requested by a customer.

U or U_w: (Weighted) number of tardy jobs, which are $U = \sum_{j=1}^{n} U_j$ or $U_w = \sum_{j=1}^{n} w_j U_j$. This criterion makes the scheduler pay attention to the number of jobs being late.

Y or Y_w: Total (weighted) late work, which are $Y = \sum_{j=1}^{n} Y_j$ or $Y_w = \sum_{j=1}^{n} w_j Y_j$. They consider the penalties related to the late parts (i.e., the parts executed after the due date) of the jobs, when these jobs are finished after their due dates.
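As an illustration of how these criteria are evaluated for a given schedule, the following Python sketch computes them from the completion times of the jobs. The class Job, the function objectives and the three sample jobs are hypothetical and serve only as an example; the formulas are the ones listed above.

```python
from dataclasses import dataclass


@dataclass
class Job:
    p: float  # processing time p_j
    r: float  # release time r_j
    d: float  # due date d_j
    w: float  # weight w_j
    C: float  # completion time C_j in the schedule being evaluated


def objectives(jobs):
    """Evaluate the criteria listed above for a list of completed jobs."""
    n = len(jobs)
    W = sum(j.w for j in jobs)                 # total weight
    F = [j.C - j.r for j in jobs]              # flow times
    L = [j.C - j.d for j in jobs]              # lateness values
    D = [max(0, l) for l in L]                 # tardiness values
    E = [max(0, -l) for l in L]                # earliness values
    U = [1 if l > 0 else 0 for l in L]         # unit penalties
    Y = [min(j.p, t) for j, t in zip(jobs, D)] # late work values

    def weighted(xs):
        return sum(j.w * x for j, x in zip(jobs, xs))

    return {
        "C_max": max(j.C for j in jobs),
        "L_max": max(L),
        "F": sum(F) / n, "F_w": weighted(F) / W,
        "D": sum(D) / n, "D_w": weighted(D) / W,
        "E": sum(E) / n, "E_w": weighted(E) / W,
        "U": sum(U), "U_w": weighted(U),
        "Y": sum(Y), "Y_w": weighted(Y),
    }


jobs = [Job(p=2, r=0, d=3, w=1, C=2),
        Job(p=3, r=1, d=4, w=2, C=5),
        Job(p=4, r=0, d=5, w=1, C=9)]
result = objectives(jobs)
```

For these three sample jobs the sketch gives C_max = 9, L_max = 4, F = 5, U = 2 and Y = 5.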

Based on the above statements, a scheduling problem can be presented clearly by the three-field notation. For example, F4||C_max means scheduling jobs in a flow shop system with four machines, with the goal of minimizing makespan. It can be considered as the simplest case of the problem presented in Example 1.1.


Q|online over list|F means scheduling jobs online on uniform machines, where the aim is to minimize the mean flow time, which corresponds to the problem sketched in Example 1.2.

2.3 Online Algorithm and Competitive Analysis

Online problems began to draw researchers' attention in the mid-1980s. The basic characteristic of online problems is the "lack of information", as compared with offline problems, which have access to the full information.

The algorithms proposed to solve online (or semi-online) scheduling problems are called online (or semi-online) scheduling algorithms. To estimate the efficiency of these approaches, the most commonly used method is competitive analysis [31, 72], which indicates the difference in solution quality when the input is given online, compared with the situation when the input is given offline. In competitive analysis, the performance of an online (or semi-online) algorithm is measured by its competitive ratio, which is described as follows.

For a job sequence J and an online (or semi-online) scheduling algorithm A, let C_J^A denote the quality of the solution (for a minimization or maximization problem) produced by A, and let C_J^∗ denote the quality of the optimal solution in the offline model.

Then the competitive ratio of A is the infimum r such that for any input:

C_J^A ≤ r · C_J^∗ (for a minimization problem) or C_J^∗ ≤ r · C_J^A (for a maximization problem).

If there is an online algorithm with a competitive ratio r, this algorithm is called an r-competitive algorithm.

Generally, an online (or semi-online) scheduling problem has a lower bound ρ if no online (or semi-online) scheduling algorithm has a competitive ratio smaller than ρ. To determine the lower bound of the problem, one can use the "adversary method". This technique consists in finding a "bad job sequence" for which no online algorithm can construct a "good" solution (i.e. it cannot have a competitive ratio smaller than the lower bound) when running on this sequence. Such a job sequence is called an adversary sequence.

The classical example of an adversary sequence is the one used in the analysis of the problem P2|online over list|C_max [30]. In this sequence, the first two jobs have the same size of 1. If (Case 1) an algorithm A assigns these two jobs to the same machine, then the job sequence stops and there is no further job. Then C_J^A = 2, while the optimal offline solution has C_J^∗ = 1, obtained by assigning one job to each machine. Otherwise (Case 2), if A assigns the first two jobs to different machines, then the third job with size 2 comes, and it is the last job. In this case, C_J^A = 3, while C_J^∗ = 2, obtained by assigning the two small jobs to one machine and the big job to the other machine. Therefore, in both cases, C_J^A / C_J^∗ ≥ 1.5, and hence this problem (P2|online over list|C_max) has a lower bound of 1.5.
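The two cases of this adversary argument can be played out in a short Python sketch. It is an illustration only: the function adversary_ratio and the two sample algorithms are hypothetical names, and an online algorithm is modelled here, as an assumption, by a function that receives the current machine loads and the size of the arriving job and returns the index (0 or 1) of the chosen machine.

```python
def adversary_ratio(algorithm):
    """Run the adversary sequence for two identical machines against
    `algorithm` and return the ratio C_J^A / C_J^* it is forced into."""
    loads = [0, 0]
    for p in (1, 1):                          # the first two jobs have size 1
        loads[algorithm(tuple(loads), p)] += p
    if max(loads) == 2:                       # Case 1: both jobs on one machine
        return max(loads) / 1                 # sequence stops, optimum is 1
    loads[algorithm(tuple(loads), 2)] += 2    # Case 2: a job of size 2 arrives
    return max(loads) / 2                     # optimum is 2


def greedy(loads, p):
    """List Scheduling rule: choose the machine with the minimum load."""
    return 0 if loads[0] <= loads[1] else 1


def first_machine(loads, p):
    """A deliberately poor rule: always use machine 0."""
    return 0


ratio_greedy = adversary_ratio(greedy)
ratio_first = adversary_ratio(first_machine)
```

The greedy rule ends in Case 2 with ratio 1.5, while the rule that always uses the first machine ends in Case 1 with ratio 2; no deterministic rule plugged into this sketch can obtain a ratio below 1.5.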


Symmetrically, a problem has an upper bound r if there exists an online (or semi-online) algorithm for the problem with a competitive ratio r. To prove that a certain algorithm has a competitive ratio r, the analysis of its behaviour should consist of several cases, corresponding to the different situations which the algorithm can encounter. These particular cases should cover all the possibilities and should be mutually exclusive. For each case, it should be proven that for any input sequence of jobs, the ratio between the online solution and the optimal offline solution is at most r.

If an online (or semi-online) algorithm has a competitive ratio r equal to ρ, which means that it matches the lower bound of the problem, it is called an optimal or best possible online (or semi-online) algorithm. It is worth underlining that "optimality" does not mean that the online algorithm can give an optimal solution as the offline method would do, but indicates only that no online (or semi-online) algorithm can achieve a schedule of quality better than a certain threshold. If the problem has the same lower and upper bound (i.e., there is an optimal online algorithm for the problem), it is said that this problem has a tight bound.


Chapter 3

State of the Art

Contents

3.1 Pure Online Scheduling

3.2 Semi-online Scheduling with a Buffer

3.3 Semi-online Scheduling with Reassignment

3.4 Scheduling under Hierarchical Constraint

3.5 Scheduling with Late Work Criterion

As mentioned in the Introduction, scheduling, which is one of the earliest studied branches of combinatorial optimization and operational research, has been investigated since the 1960s. Many scientific models have been proposed based on the experience gathered in practice, concerning e.g. flexible manufacturing systems, supply chain management and transportation planning, and many fruitful results have been achieved over the past 50 years, for offline, online and semi-online environments.

This chapter gives a concise presentation of the state of the art for the problems investigated in this thesis, and it is organized as follows. In Section 3.1, the results for pure online scheduling problems are reviewed, to make comparisons possible with the semi-online versions described in the next two sections. Namely, Section 3.2 surveys the results obtained for the problem of semi-online scheduling with a buffer. Then, in Section 3.3, semi-online scheduling with reassignment is discussed. Section 3.4 shows results for the scheduling problems under hierarchical constraint, which makes the models more complex. These four sections are devoted to the scheduling goal of minimizing the makespan, while Section 3.5 discusses a different objective function, i.e. the late work criterion.

3.1 Pure Online Scheduling

To show the power of semi-online models, which can be considered as relaxations of the two basic requirements of the online model, the results for pure online scheduling problems should be introduced first.

It is accepted that the first online scheduling algorithm, named the List Scheduling algorithm (LS for short), was proposed by Graham in 1966 [38] for problem P|online over list|C_max. This method uses the greedy approach and always assigns the arriving job to the machine on which it can be started the earliest, i.e., to the machine with the minimum current load. In his paper, Graham proved that the List Scheduling algorithm has the competitive ratio 2 − 1/m, where m is the number of machines.
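The greedy rule of Algorithm LS can be sketched as follows. This is a minimal Python illustration and not Graham's original formulation: the function name list_scheduling is hypothetical, machine loads are kept in a heap, and ties are broken by the smallest machine index, which is an assumption made here. The sample instance is the well-known worst case consisting of m(m − 1) unit jobs followed by one job of size m.

```python
import heapq


def list_scheduling(jobs, m):
    """Assign each arriving job to the machine with the minimum current load.

    jobs is the list of processing times in arrival order and m is the
    number of identical machines. Returns (makespan, assignment).
    """
    loads = [(0, i) for i in range(m)]      # (current load, machine index)
    heapq.heapify(loads)
    assignment = []
    for p in jobs:
        load, i = heapq.heappop(loads)      # least loaded machine
        assignment.append(i)
        heapq.heappush(loads, (load + p, i))
    makespan = max(load for load, _ in loads)
    return makespan, assignment


m = 4
worst_case = [1] * (m * (m - 1)) + [m]      # 12 unit jobs, then one job of size 4
ls_makespan, ls_assignment = list_scheduling(worst_case, m)
opt_makespan = m                            # big job alone, unit jobs on the rest
```

On this instance LS first balances the unit jobs (load m − 1 = 3 on every machine) and then has to put the big job on top, giving makespan 2m − 1 = 7 against the optimum m = 4, i.e. exactly the ratio 2 − 1/m.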

Only about thirty years after the proposal of Algorithm LS did online problems start to draw more and more attention from various research teams. Several new upper bounds were proven, improving the previous results one by one. The first algorithm which beat LS for m ≥ 4 appeared in 1993 and was proposed by Galambos and Woeginger [35]. It has the competitive ratio 2 − 1/m − ε, where ε > 0 tends to 0 as m → ∞.

The first algorithm with a competitive ratio strictly less than 2 for any number of machines m, equal to 1.986, was constructed by Bartal et al. in 1992 [6]. Then the ratio was improved to 1.945 by Karger et al. in 1996 [49] and later, in 1999, to 1.923 by Albers [1]. Finally, the best currently known upper bound, equal to 1.9201, was proven by Fleischer and Wahl in 2000 [32].

For the lower bound of problem P|online over list|C_max, Faigle et al. [30] proved that it is equal to 3/2 and 5/3, for m = 2 and m = 3, respectively, which indicates that Algorithm LS is an optimal online algorithm for two and three machines. However, for larger m (m ≥ 4), the gap between the lower and upper bound is still open. It has been narrowed but not closed. The former lower bound was improved to 1.837 by Bartal et al. in 1994 [7], then to 1.852 by Albers in 1999 [1], to 1.854 by Gormley et al. in 2000 [37], and finally to 1.880 by Rudin in 2001 [69], which is the best currently known result for the lower bound.

For the uniform machines case (Q|online over list|C_max), the situation is much more complex than for identical machines, because the upper and lower bounds are functions of the parameters s_i and m, where s_i is the speed of machine M_i for 1 ≤ i ≤ m. For example, when m = 2 (i.e., for problem Q2|online over list|C_max), Algorithm LS has the competitive ratio

$$\begin{cases} \dfrac{2s+1}{s+1}, & 1 \le s \le \dfrac{1+\sqrt{5}}{2} \\[2mm] \dfrac{s+1}{s}, & s > \dfrac{1+\sqrt{5}}{2} \end{cases}$$

where s₁ = 1 and s₂ = s ≥ 1, and it is an optimal online algorithm [29]. For an arbitrary number of machines m, LS has the competitive ratio

$$\min_{1 \le k \le m} \left\{ \frac{\sum_{i=1}^{m} s_i + (m-k)\,s_m}{\sum_{i=k}^{m} s_i} \right\}$$ [41].
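The two expressions above can be cross-checked numerically. The Python sketch below evaluates both of them under the reading that the general formula is the minimum over k of (s_1 + ... + s_m + (m − k)s_m) / (s_k + ... + s_m), and under the assumption that the speeds are indexed in non-decreasing order, so that s_m is the fastest machine (consistent with s₁ = 1 and s₂ = s ≥ 1 in the two-machine case). The function names are hypothetical.

```python
import math

PHI = (1 + math.sqrt(5)) / 2   # threshold (1 + sqrt(5)) / 2 of the case distinction


def ls_ratio_two_machines(s):
    """Piecewise ratio of LS for two uniform machines with speeds 1 and s >= 1."""
    return (2 * s + 1) / (s + 1) if s <= PHI else (s + 1) / s


def ls_ratio_general(speeds):
    """General expression for m uniform machines.

    Assumption: speeds are sorted so that s_1 <= ... <= s_m.
    """
    s = sorted(speeds)
    m = len(s)
    total = sum(s)
    return min((total + (m - k) * s[-1]) / sum(s[k - 1:])
               for k in range(1, m + 1))


two_machine_values = [(s, ls_ratio_two_machines(s), ls_ratio_general([1, s]))
                      for s in (1.0, 1.3, PHI, 2.0, 5.0)]
identical_four = ls_ratio_general([1, 1, 1, 1])
```

For m = 2 the general expression reduces to the piecewise formula, with both branches meeting at s = (1 + √5)/2, and for identical machines (all speeds equal to 1) it reduces to 2 − 1/m, in agreement with the ratio of LS quoted earlier for identical machines.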

Then, it is natural to raise the question whether there exists an algorithm with constant competitive ratio, and the answer is positive. The first such algorithm was proposed by Aspnes et al. in 1997, with the competitive ratio equal to 8 [3].

Later, this result was improved by Berman et al. in 2000 [8], where the upper and lower bounds were determined to be 5.828 and 2.438, respectively. Recently, the lower bound of this problem was improved to θ by Ebenlendr and Sgall [25], where θ ≈ 2.564 is the solution of the equation $\int_0^1 \frac{\ln\theta}{\ln(1-\theta^{-x})}\,dx = -1$.
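The value of θ can be checked numerically. The sketch below reads the equation as the integral over [0, 1] of ln θ / ln(1 − θ^(−x)) being equal to −1, approximates the integral with the midpoint rule and locates θ by bisection. The function names, the bracket [2, 3], the number of subintervals and the number of iterations are assumptions chosen for this illustration, not taken from [25].

```python
import math


def integral(theta, n=4000):
    """Midpoint-rule approximation of the integral over [0, 1] of
    ln(theta) / ln(1 - theta**(-x))."""
    a = math.log(theta)
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        # 1 - theta**(-x) = -expm1(-a * x), computed stably for small x
        total += a / math.log(-math.expm1(-a * x))
    return total * h


def solve_theta(lo=2.0, hi=3.0, iters=40):
    """Bisection on [lo, hi], assuming integral(lo) > -1 > integral(hi)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if integral(mid) > -1.0:
            lo = mid        # integral still above -1: theta must be larger
        else:
            hi = mid
    return (lo + hi) / 2


theta = solve_theta()
```

The midpoint rule is used because it never evaluates the integrand at x = 0, where the denominator ln(1 − θ^(−x)) is unbounded (the integrand itself tends to 0 there).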

Pure online scheduling, including parallel machines online scheduling presented above, is based on two properties introduced before: "the lack of information" which