EFFECTIVE ALGORITHMS FOR MINIMIZING WEIGHTED AND UNWEIGHTED FLOW TIME IN SINGLE AND MULTIPROCESSOR ENVIRONMENTS

By

Carl Bussema III

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

Department of Computer Science

2007

ABSTRACT

EFFECTIVE ALGORITHMS FOR MINIMIZING WEIGHTED AND UNWEIGHTED FLOW TIME IN SINGLE AND MULTIPROCESSOR ENVIRONMENTS

By

Carl Bussema III

We consider two problems of online scheduling of jobs to minimize different metrics in different environments. First, we attempt to minimize the Lp norms of flow time and stretch on m identical machines. This is a hard problem for which no constant competitive online algorithms are known. We consider the case where the online algorithm is given slightly faster machines or slightly more machines than the optimal offline algorithm. Using resource augmentation, we show that the greedy algorithm Shortest Job First (SJF) is (1 + ε)-speed O(1)-competitive for minimizing the unweighted Lp norms of flow time and stretch on m identical machines. We also extend this result to apply to the Shortest Remaining Processing Time (SRPT) algorithm and to the case where SJF and SRPT are given extra machines. From that result, we derive that the greedy Highest Density First (HDF) algorithm is (1 + ε)-speed O(1)-competitive for minimizing the weighted Lp norms of flow time on m identical machines. This is the first result which proves a constant factor competitive ratio for this problem with arbitrarily small resource augmentation. Second, we also analyze the related algorithm, Highest Remaining Density First (HRDF), in a uniprocessor environment, and show that it is Ω(P)-competitive for weighted flow time without augmentation.

TABLE OF CONTENTS

1 Introduction to Scheduling
  1.1 Scheduling Overview
  1.2 Analysis Techniques
  1.3 Definitions and Notation
  1.4 Definition of Algorithms and Problems
  1.5 Previous Results and Related Work
  1.6 Our Results

2 Multiprocessor algorithms with resource augmentation
  2.1 Unweighted Lp norms of flow time and stretch
    2.1.1 Notation
    2.1.2 Proof Structure
    2.1.3 An Association Scheme for Bounding F^p(A, α)
    2.1.4 SJF
    2.1.5 SRPT
    2.1.6 Extra machines
  2.2 Weighted Lp norms of flow time
  2.3 Conclusions
3 Single machine without resource augmentation
  3.1 Introduction
  3.2 Overview
  3.3 Proof
  3.4 Conclusions

Bibliography

Chapter 1

Introduction to Scheduling

In this chapter, I will give a broad overview of the problem of scheduling, explain and define notation and terminology for the parameters that influence particular problems, define the specific problems I will cover in this thesis, summarize prior work in this field, and finally give the results that will be explored in depth in the remaining chapters.

1.1 Scheduling Overview

Scheduling is a fundamental problem that is both theoretical and practical. It is studied in many fields: computer science, as it is here; mathematics, for the algebra and problem-solving aspects; business, for the application of managing human or machine resources efficiently; and logistics, for operational environments and applications. Each field takes a slightly different approach and emphasizes a different aspect of the problem, but at the core, all study the same problems.

A schedule involves a set of jobs, or work to be done, one or more machines or processors, and an assignment for each machine of which job to run at any given time. In the theoretical model for studying scheduling problems, time may be abstracted to a series of unit-length intervals, during which each machine may work on only one job. The minimum time required to finish any given job is some number of such intervals. The speed of a machine, typically abstracted to "1", determines how much work each machine accomplishes during each time interval.

The most important variable in a problem is whether your schedule is "offline," where you have full knowledge of all jobs that will be released in the future before starting, or "online," where a schedule can only use information about jobs released at or before the current time step and has no knowledge of jobs in the future. It is easy to see that, all else being equal, an offline algorithm, having more information, would be able to do at least as well as an online algorithm for a given problem. However, the offline versions of many scheduling problems, including the ones we will study in this paper, are NP-hard.

Some online algorithms operate under the assumption that the time it will take to complete a job is known to the algorithm as soon as the job is released, a concept known as "clairvoyance," while a non-clairvoyant algorithm does not know how much time is required to complete a job. The clairvoyant setting is similar to an environment found in many applications: a file server could consider the size of the file to be delivered and the speed of the network and disks; a database server could estimate the size of the query by looking at the tables involved; a web server with dynamic pages might estimate based on a sum of a file transfer calculation and a database calculation. In all cases, online is vastly more appropriate than offline, since rarely do servers operate by taking all requests at once and then closing to new requests. All algorithms we study in this paper will be online and clairvoyant. More formal definitions can be found in Section 1.4.
1.2 Analysis Techniques

Schedules can be evaluated in many different ways, but these all divide into metrics that are machine-centric and seek to optimize the server resources, or metrics that are client-centric and seek to optimize the experience for clients requesting the service. Historically, machine-centric criteria like makespan (the total time required between the start of the first job and the end of the last job) were more commonly studied, but recently client-centric evaluations have been increasingly presented as alternatives.

Among client-centric criteria, one of the most obvious ones to use is the total flow time metric, also called wait time or response time, where the flow time of a job is its completion time minus its release time. However, algorithms that are optimal for minimizing total flow time often starve individual jobs, which is undesirable. To alleviate that problem, some recent work has focused on the Lp norms of flow time, and so we also examine this metric in more detail later. Related to flow time is the metric of stretch, which is the flow time of a job divided by the job's processing time (the actual time required to complete the job). Another client-centric metric is the weighted flow time, where a job's flow time is multiplied by the weight, or priority, of the job. As with unweighted flow time, recent work has begun to study the Lp norms of weighted flow time as well. In this paper, we will study only client-centric scheduling criteria.

The flow time may be intuitively considered the time a job spends waiting between release and completion. A large job necessarily has a large flow time, since it must wait at least as long as it would take to run the job completely. For that reason, the stretch is also considered, which may be thought of as the perceived wait time of a job: a user would expect a long job to take a long time; if that expectation is directly proportional to the length of the job, then stretch adequately encapsulates that perception. Weighted flow time measures the severity of the delay by allowing a low-priority job to be delayed a long time with minimal penalty, while a high-priority job must be run quickly to avoid falling behind.

In order to fairly evaluate any online algorithm, theorists measure an algorithm's performance against OPT, a theoretical optimal offline algorithm. OPT is assumed to have perfect knowledge of all jobs beforehand; i.e., OPT may act as if it were offline and clairvoyant, although it still may not schedule a job before its release time or use more machines than the algorithm being measured. We may not know exactly what OPT is or how it runs, but we can often determine properties about how it must behave in a particular situation, or at least determine certain minimums that it must meet. An algorithm's score for a metric is divided by OPT's score for the same metric, and this ratio is said to be the competitive ratio of the algorithm. For example, an algorithm that could always earn a flow time score no more than 10 times what OPT earns would be 10-competitive for flow time. These ratios are usually simplified using asymptotic notation.

Recently, theorists have begun studying the concept of "resource augmentation," whereby the algorithm being evaluated is allowed to use extra speed or extra machines compared to OPT. Typically, this is measured by some small augmentation factor ε < 1, and the algorithm is then said to be, for example, (1 + ε)-speed.
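The metrics and the competitive-ratio comparison just described can be made concrete with a small, entirely hypothetical example; the job parameters and the fixed schedule below are ours and are for illustration only.

    # Illustrative only: two jobs with hand-picked parameters.
    # Job i is (release r_i, size p_i, weight w_i, completion c_i) under a fixed
    # schedule that runs job 1 to completion and then job 2.
    jobs = [
        {"r": 0, "p": 4, "w": 1, "c": 4},   # job 1 runs during [0, 4)
        {"r": 1, "p": 2, "w": 3, "c": 6},   # job 2 waits, then runs during [4, 6)
    ]
    p = 2  # which Lp norm to report

    flow    = [j["c"] - j["r"] for j in jobs]            # F_i = c_i - r_i
    stretch = [f / j["p"] for f, j in zip(flow, jobs)]   # S_i = F_i / p_i
    wflow   = [j["w"] * f for f, j in zip(flow, jobs)]   # weighted flow time

    total_flow = sum(flow)                               # total (L1) flow time
    lp_flow    = sum(f ** p for f in flow) ** (1.0 / p)  # Lp norm of flow time

    print(flow, stretch, wflow, total_flow, lp_flow)
    # flow = [4, 5], stretch = [1.0, 2.5], weighted flow = [4, 15]
    # total flow = 9, L2 norm of flow time is about 6.40

Dividing a score computed this way by OPT's score on the same instance is exactly the competitive-ratio calculation described above.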
The advantage of studying this concept is threefold. First, there may be no constant-competitive algorithm without augmentation, but with just ε extra speed, we can reach a bound of O(1/ε), which is constant, so we now have an idea of a good algorithm. Second, not all algorithms are capable of utilizing the extra speed effectively: two algorithms with identical unaugmented performance may be differentiated by augmentation; perhaps only one of the two algorithms can reach constant-competitiveness with augmentation. Third, it may be realistic to operate under the assumption that a server does have extra speed available: since most servers in deployment are not constantly 100% utilized, they have extra speed available, in effect.

1.3 Definitions and Notation

We will use the following definitions and notation throughout the paper. An input instance I consists of n jobs, where job J_i is characterized by its release time r_i, its weight or priority w_i, and its size or work p_i. The density of a job is d_i = w_i/p_i. The jobs will be scheduled on m identical machines where no job can be assigned to more than one machine at any time. We assume that jobs can be preempted and migrated from one machine to another with no penalty. Finally, the remaining size of a job at any time t is its initial size minus the amount of work done on that job up to time t.

Given a scheduling algorithm A, we denote the completion time of job J_i as c_i(A). The flow time of job J_i, denoted F_i(A), is c_i(A) − r_i. The stretch of job J_i, denoted S_i(A), is F_i(A)/p_i. We then define F^p(A, I) = Σ_i (F_i(A))^p, S^p(A, I) = Σ_i (S_i(A))^p, and WF^p(A, I) = Σ_i w_i (F_i(A))^p. The Lp norms of flow time, stretch, and weighted flow time are then (F^p(A, I))^{1/p}, (S^p(A, I))^{1/p}, and (WF^p(A, I))^{1/p}, respectively. To simplify notation, we will often drop the A and I when the algorithm or input instance is clear from context.

Let M(A(I)) denote the cost of algorithm A when applied to input instance I when we focus on metric M, where M may be the Lp norm of flow time, stretch, or weighted flow time. An online algorithm A is said to be c-competitive for some metric M if M(A(I)) ≤ c · M(OPT(I)) for any input instance I. An algorithm A using resource augmentation in the form of faster processors is said to be s-speed c-competitive if the competitive ratio c can be maintained when A's processors are s times as fast as OPT's (note that c may be defined in terms of s and is not necessarily constant). Analogously, an algorithm A is said to be s-machine c-competitive if the competitive ratio c can be maintained when A has s times as many processors as OPT.

1.4 Definition of Algorithms and Problems

Shortest Job First (SJF), Shortest Remaining Processing Time (SRPT), Highest Density First (HDF), and Highest Remaining Density First (HRDF) are greedy, preemptive, migratory algorithms that always schedule the m unfinished jobs with highest priority, where the priority metric differs for each algorithm. For SJF, the priority is the initial size of the job, p_i; for SRPT, the priority is the remaining size of the job; for HDF, the priority is the density of the job, d_i; and for HRDF, the priority is the remaining density of the job, which is the weight w_i divided by the remaining size of the job. Remaining density strictly increases as a job is processed. In all algorithms, ties are broken first in favor of a running job, then arbitrarily. Furthermore, jobs are held in a central queue until assigned to a machine, and a preempted job is returned to the queue.
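A compact way to restate the four priority rules is as key functions over the set of unfinished jobs. The sketch below is ours and purely illustrative (the helper names are hypothetical, and the tie-breaking rule in favor of a running job is omitted); it is meant only to make the definitions above concrete.

    # Sketch of the priority rules from Section 1.4 (illustrative, not from the thesis).
    # Each job is a dict with initial size "p", remaining size "rem", and weight "w".
    # A smaller key value means higher priority; m machines run the m best unfinished jobs.

    def sjf_key(job):            # Shortest Job First: initial size
        return job["p"]

    def srpt_key(job):           # Shortest Remaining Processing Time: remaining size
        return job["rem"]

    def hdf_key(job):            # Highest Density First: larger w/p is better
        return -job["w"] / job["p"]

    def hrdf_key(job):           # Highest Remaining Density First: larger w/rem is better
        return -job["w"] / job["rem"]

    def select(jobs, m, key):
        """Return the (at most m) unfinished jobs an algorithm would run right now."""
        alive = [j for j in jobs if j["rem"] > 0]
        return sorted(alive, key=key)[:m]

For instance, select(jobs, m, srpt_key) returns the jobs SRPT would place on machines at the current instant; a full simulator would additionally re-evaluate the selection whenever a job is released or completed.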
We study three minimization problems: first, the unweighted Lp norms of flow time and stretch on multiple machines, using SJF and SRPT with ε resource augmentation; second, the weighted Lp norm of flow time, using HDF with ε resource augmentation; and third, the weighted (unnormalized) flow time on a single machine, using HRDF without augmentation.

We study these problems because many currently-used scheduling algorithms are used because they perform well on server-centric criteria. While server-centric algorithms may utilize server resources well, clients may receive poor performance. By focusing on a client-centric criterion like flow time, we change the emphasis from simply "use resources efficiently" to "use resources fairly while being as efficient as possible." Adding the Lp norms to the analysis further improves the fairness to clients, since no individual job may be sacrificed for the sake of all other jobs. A multiprocessor environment is becoming increasingly common, even in home desktop systems that now feature dual- or quad-core CPUs. Finally, studying the weighted flow time case allows for a more interesting environment, as might befit a corporation that may have work assigned to various tiers of priorities.

1.5 Previous Results and Related Work

In [5], Bansal and Pruhs persuasively argue that the conflicting criteria of minimizing the total wait time of an algorithm without starving individual jobs can be formalized as the weighted Lp norms of flow time. However, they consider only the uniprocessor server scheduling problem. The multiprocessor setting is important, as enterprise web and file servers often are composed of multiple machines in order to handle the large volume of requests they need to serve. In this paper, we show that HDF is an effective algorithm for the multiprocessor server scheduling problem. A key step in our proof is showing that the SJF and SRPT algorithms are effective at minimizing the unweighted Lp norms of flow time and stretch on multiple machines.

Bansal and Pruhs showed that no online algorithm can be constant competitive for the problem of minimizing the Lp norms of flow time or stretch for 2 ≤ p < ∞, even in the uniprocessor environment [4]. Thus, to achieve constant competitive ratios for the problem of minimizing the weighted Lp norm of flow time, we are forced to consider the resource augmentation technique popularized by Kalyanasundaram and Pruhs [10]. Bansal and Pruhs show that several algorithms, including SJF and SRPT, are (1 + ε)-speed O(1)-competitive algorithms for minimizing the Lp norms of flow time and stretch on a single machine [4]. Chekuri et al. then show that simple uniprocessor scheduling algorithms such as SRPT, combined with simple load balancing algorithms, are also (1 + ε)-speed O(1)-competitive for minimizing the Lp norms of flow time and stretch on m identical machines [8]. In particular, they consider an algorithm, introduced by Avrahami and Azar [1], that immediately dispatches jobs to machines and allows no migrations. Bansal and Pruhs then show that HDF and a nonclairvoyant algorithm are (1 + ε)-speed O(1)-competitive for the problem of minimizing the weighted Lp norms of flow time on a single machine [5].

If we ignore the Lp norms and focus only on total flow time, stretch, or weighted flow time, more results are known. On a single machine, Bansal and Dhamdhere give an O(W) algorithm for minimizing weighted flow time, where W is the ratio of the maximum job weight divided by the minimum job weight [3].
Note that this algorithm is constant competitive if there are only a constant number of distinct job weights. Similarly, Chekuri et al. described an algorithm that we have named HRDF and gave a lower bound of √P, where P is the ratio of the largest job size to the smallest job size [9]. In the same paper, they proved that if P is known a priori to a biased greedy algorithm, it can achieve an O(log² P)-competitive ratio.

With resource augmentation, Becchetti et al. show that HDF is (1 + ε)-speed O(1 + 1/ε)-competitive in the uniprocessor setting and (2 + ε)-speed O(1 + 1/ε)-competitive in the multiprocessor setting [7]. Phillips et al. show that an algorithm they named Preemptively-Schedule-By-Halves is a 2-speed 1-competitive algorithm on a single processor [14], and Becchetti et al. observe that this analysis also applies to HDF [7]. Bansal and Dhamdhere then show that a nonclairvoyant algorithm is also (1 + ε)-speed O(1 + 1/ε)-competitive on a single machine, matching HDF's performance [3].

If we consider total flow time, it is a well-known result that SRPT is an optimal algorithm in the uniprocessor setting, and Leonardi and Raz showed that SRPT is O(min{log n/m, log P})-competitive in the multiprocessor setting, where P is the ratio of the maximum processing time of any job divided by the minimum processing time of any job [11]. They further show that this is optimal within constant factors for any online algorithm, randomized or deterministic. Muthukrishnan et al. show that SRPT is constant competitive for minimizing stretch in the multiprocessor environment [13]. Many algorithms without migration and then with immediate dispatch were developed that had essentially the same competitive ratios within constant factors [1, 2, 6, 9].

With resource augmentation, Phillips et al. first showed that SRPT is a (2 − 1/m)-speed 1-competitive algorithm for minimizing total flow time on m identical machines [14], and McCullough and Torng improved this result to show that SRPT is in fact an s-speed 1/s-competitive algorithm for minimizing total flow time on m identical machines when s ≥ 2 − 1/m [12].

1.6 Our Results

We first extend the work of Bansal and Pruhs [4] and Chekuri et al. [8] by showing that SJF and SRPT are (1 + ε)-speed O(1 + 1/ε)-competitive for minimizing the Lp norms of flow time and stretch on m identical machines. These results extend to extra machines. Furthermore, the constant hidden in the big-O notation is somewhat better than the constant for the immediate dispatch algorithm as analyzed by Chekuri et al. We then use our SJF extra-speed result for flow time to extend the result of Bansal and Pruhs [5] by showing that HDF is (1 + ε)-speed O(1 + 1/ε²)-competitive for minimizing the weighted Lp norms of flow time on m identical machines. For the most part, our proofs are simple extensions of those found in [4, 8, 5].

Lastly, we present a lower bound of Ω(P) for HRDF on a single machine without augmentation, to show that the weighted flow time problem is significantly more difficult than the unweighted problem. This improves the bound of Ω(√P) presented in [9].

Chapter 2

Multiprocessor algorithms with resource augmentation

In Section 2.1, we will explore the popular online algorithms SJF and SRPT as augmented by faster processors or more machines, proving that each of the four algorithms is O(1/ε)-competitive for the Lp norms of unweighted flow time and stretch. All these proofs will use a common framework that we outline in subsections 2.1.2 and 2.1.3.
Following that, Section 2.2 will extend the results for SJF to show that HDF is O(1/ε²)-competitive for the weighted Lp norms of flow time when augmented with faster processors.

2.1 Unweighted Lp norms of flow time and stretch

2.1.1 Notation

We first define the following notation. Let A denote any algorithm and t denote any time. Let U(A, t) be the set of unfinished jobs for algorithm A at time t, and also the number of such jobs; which meaning of U(A, t) is intended will be clear from context. The total flow time of a schedule can be computed as ∫ U(A, t) dt. For any job J_i in U(A, t), let Age_i(t) = t − r_i and SAge_i(t) = Age_i(t)/p_i; informally, this is a "stretched age." Let Age^p(A, t) denote the sum over all jobs J_i ∈ U(A, t) of Age_i(t)^{p−1}, and let SAge^p(A, t) denote the sum over all jobs J_i ∈ U(A, t) of SAge_i(t)^{p−1}. Bansal and Pruhs observed that F^p(A) = ∫ Age^p(A, t) dt and that S^p(A) = ∫ SAge^p(A, t) dt [4] (these identities hold up to a constant factor of p, which is immaterial for our purposes).

We use V_{≤k}(t) to denote the sum of remaining sizes, or volume, of all unfinished jobs of initial size at most k for SJF or SRPT at time t that were released prior to time t. We similarly define V*_{≤k}(t) for OPT. We use P_{≤k}(t) to denote the sum of initial sizes, or processing times, of all unfinished jobs of initial size at most k for SJF or SRPT at time t that were released prior to time t. We use P_{≤k}(t_1, t_2) to denote the sum of initial sizes of all jobs of initial size at most k released during the interval [t_1, t_2).

2.1.2 Proof Structure

Our strategy for proving competitive ratios for SJF and SRPT will follow that of [8], which is a modification of the uniprocessor strategy of [4]. We focus on the Lp norms of flow time, as the proof for stretch is essentially identical. Given that F^p(A) = ∫ Age^p(A, t) dt, we would prove our result if we could bound Age^p(A, t) relative to Age^p(OPT, t) for all times t. Unfortunately, as observed in [8], this is not possible, as online algorithms can "fall behind" OPT, so that there can be times where Age^p(A, t) > 0 while Age^p(OPT, t) = 0.

To compensate for this problem, at any time t, we restrict our attention to unfinished jobs that are sufficiently old, as was done in [8]. More formally, we define U^α(A, t) = {J_i ∈ U(A, t) | Age_i(t) ≥ αp_i}, where α > 0 is a constant to be defined later. α serves as a delaying factor; a job that is not at least α times as old as its processing time is uninteresting to the schedule, since it is not yet old enough to contribute significantly to the flow time. We then define Age^p(A, t, α), the pth norm of Age with respect to α, to be the sum over all jobs J_i ∈ U^α(A, t) of Age_i(t)^{p−1}, and the pth norm of flow time with respect to α, F^p(A, α) = ∫ Age^p(A, t, α) dt. Finally, we define DIF(A) = F^p(A) − F^p(A, α), or the difference between the actual flow time of A and the flow time with respect to α.

We can then bound F^p(A) relative to F^p(OPT) if we can individually bound both F^p(A, α) and DIF(A) relative to F^p(OPT). For DIF(A), we first observe that F^p(OPT) ≥ Σ_i p_i^p, as each job J_i requires at least p_i time to finish. On the other hand, DIF(A) ≤ Σ_i (αp_i)^p, as F^p(A, α) accounts for the entire time that each job is in the system except for at most the first αp_i time units. Combining these inequalities and simplifying, we observe that DIF(A) ≤ αF^p(OPT). (Here and below, bounds of this form are to be read at the level of the Lp norms, i.e., after taking pth roots.)
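The decomposition F^p(A) = F^p(A, α) + DIF(A) can also be computed directly from a schedule's per-job flow times rather than by integrating Age^p over time. The sketch below is ours and not from the thesis (the function name is hypothetical); it works with Σ_i F_i^p, which equals p times the integral formulation, so all three quantities are scaled by the same factor and the decomposition identity is unaffected.

    # Illustrative sketch: the decomposition of Section 2.1.2 computed per job.
    # For a job with flow time F_i, the ages below alpha*p_i are the "too young" part,
    # so its contribution to F^p(A, alpha) is F_i^p minus min(alpha*p_i, F_i)^p
    # (after scaling by the common factor p, which cancels in all ratios).

    def fp_decomposition(jobs, p, alpha):
        """jobs: list of (release r_i, size p_i, completion c_i) for some schedule A."""
        fp_total, fp_alpha = 0.0, 0.0
        for r, size, c in jobs:
            flow = c - r
            fp_total += flow ** p
            cutoff = min(alpha * size, flow)        # age below which the job is too young
            fp_alpha += flow ** p - cutoff ** p     # contribution of sufficiently old ages
        dif = fp_total - fp_alpha                   # DIF(A) = F^p(A) - F^p(A, alpha)
        # DIF(A) is at most the sum of (alpha * p_i)^p, as argued above.
        assert dif <= sum((alpha * s) ** p for _, s, _ in jobs) + 1e-9
        return fp_total, fp_alpha, dif

Running fp_decomposition on any concrete schedule confirms numerically that the part of F^p(A) discarded by the α-restriction is controlled by the job sizes alone, which is exactly what the bound DIF(A) ≤ αF^p(OPT) exploits.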
2.1.3 An Association Scheme for Bounding F^p(A, α)

For F^p(A, α), we observe that for any time t, jobs that are in both U^α(A, t) and U(OPT, t) contribute equally to Age^p(A, t, α) and Age^p(OPT, t). Thus, the reason that Age^p(A, t, α) may greatly exceed Age^p(OPT, t) is the jobs that are in U^α(A, t) but not in U(OPT, t). We define the difference set D(t) as U^α(A, t) − U(OPT, t), where the specific algorithm A will always be clear from context. D(t) represents the old jobs that A has left to finish at time t but which OPT has already completed. To simplify notation, we will often use D to represent D(t). We then define the pth norm of the Age of the difference set, Age^p(D, t), to be the sum over all jobs J_i ∈ D(t) of Age_i(t)^{p−1}, and observe that Age^p(A, t, α) ≤ Age^p(OPT, t) + Age^p(D, t).

We now define an association scheme S_β similar to that of [4] to show that Age^p(D, t) is bounded relative to Age^p(OPT, t). Given 0 < β < 1, for each job J_j in U(OPT, t), we define a set of jobs S_β(j) with the following properties:

1. For all J_i ∈ S_β(j), Age_i(t) ≤ (1/β) Age_j(t).

2. For all J_i ∈ S_β(j), p_j ≤ p_i. With this constraint, the argument easily extends to Lp norms of stretch as well, and we will not cover those details here.

3. |S_β(j)| ≤ 1/β.

4. D(t) = ∪_{J_j ∈ U(OPT, t)} S_β(j).

This association requires that every job in the difference set is associated with at least one job that OPT has not finished (condition 4), and since no more than 1/β jobs from D can be associated with the same j ∈ U(OPT, t) (condition 3), the existence of such a scheme will bound the number of jobs in D. Further, since the jobs in D are within a 1/β factor of the age of the jobs they are associated with in U(OPT, t), we can bound the total Age of D as follows: if we can create such an association scheme S_β for all times t, then Age^p(D, t) ≤ (1/β)^p Age^p(OPT, t), and thus F^p(A, α) ≤ (1 + 1/β) F^p(OPT). Combining this with the earlier bound for DIF(A), we get that F^p(A) ≤ (1 + α + 1/β) F^p(OPT).

We now prove that an association scheme S_β exists for all times t when β is chosen appropriately relative to ε. For any time t, index the jobs in D(t) in increasing order of size p_i. Given time t and i ≤ |D(t)|, define t(i) = t − β(t − r_i). This time index t(i) allows us to find jobs that have an Age at least β times the age of the corresponding job J_i. To prove that an association scheme S_β exists, it is sufficient to prove that we can always allocate to each J_i ∈ D(t) a βp_i amount of work from V*(t), drawn from jobs with release time at most t(i), that was not already allocated to a lower-indexed job in D(t). We use AV(t, i), or Available Volume, to denote the amount of work from V*(t) that can be assigned to the first i jobs in D(t).

Lemma 1. For any time t and any i ≤ |D(t)|,

AV(t, i) ≥ V*_{≤p_i}(r_i) + P_{≤p_i}(r_i, t(i)) − m(t − r_i)

Proof. OPT's unfinished work at time t for jobs of length at most p_i released before t(i) is at least: OPT's unfinished work of jobs of this length as of r_i, plus the total work to be done for jobs of this length arriving in [r_i, t(i)), minus the maximal work done by OPT during [r_i, t). □

Define W(t, i) to be the amount of work that must be allocated to the first i jobs in D(t).

Lemma 2. For any time t and any i ≤ |D(t)|,

W(t, i) ≤ β(m(t − r_i) + P_{≤p_i}(r_i))

Proof. Consider the first i jobs in D(t). Some of these jobs are released at time r_i or later. By definition of D(t), OPT must finish these jobs by time t, so their total size can be at most m(t − r_i). The other jobs must be released before time r_i and are unfinished by A at time r_i. Thus we can bound their processing times by P_{≤p_i}(r_i), and the result follows. □
It suffices to show that the amount of work required for the allocation is no greater than the amount of work available. Our goal is to show that for any time t, any i ≤ |D(t)|, and an appropriate value of β,

W(t, i) ≤ β(m(t − r_i) + P_{≤p_i}(r_i)) ≤ V*_{≤p_i}(r_i) + P_{≤p_i}(r_i, t(i)) − m(t − r_i) ≤ AV(t, i).

We restate our goal as follows.

Theorem 1. Consider the online algorithms SJF and SRPT augmented with either (1 + ε)-speed processors or (1 + ε)m 1-speed processors. For any time t and any 1 ≤ i ≤ |D(t)|, there exist values of β and α such that:

(1 + β)m(t − r_i) + βP_{≤p_i}(r_i) ≤ V*_{≤p_i}(r_i) + P_{≤p_i}(r_i, t(i))    (2.1.1)

Corollary 1. SJF and SRPT are (1 + ε)-speed O(1/ε)-competitive and (1 + ε)m-machine O(1/ε)-competitive.

Proof. Our proofs in the next sections show that the following values for β and α make Theorem 1 true. For SJF or SRPT with (1 + ε)-speed processors or (1 + ε)m machines, β = ε/(6 + 6ε). α varies as follows: for SJF augmented with (1 + ε)-speed processors, α = 4/ε + 4; for SRPT with (1 + ε)-speed, α = 6/ε + 6; for SJF with (1 + ε)m machines, α = 4/ε + 8 + 6ε; and for SRPT with (1 + ε)m machines, α = 6/ε + 12 + 6ε. □

We prove Theorem 1 in the next sections, first considering SJF with faster machines.

2.1.4 SJF

We seek to prove Theorem 1 for SJF and faster machines. The first step in the proof is to take advantage of our augmentation by deriving a bound on the volume of remaining work for OPT at any time u, V*_{≤k}(u). This derivation requires new arguments beyond those from [4, 8].

Lemma 3. When SJF is given (1 + ε)-speed processors, for any time u and any job size k,

V*_{≤k}(u) ≥ (ε/(1 + ε)) P_{≤k}(u) + (1/(1 + ε)) V_{≤k}(u) − mk    (2.1.2)

Proof. We first focus on SJF's work. Define time u' < u as the most recent time (possibly 0) such that for some positive interval of time just prior to u', SJF had an idle machine or was running a job with initial size > k. By definition of u', SJF has at most m − 1 unfinished jobs of initial size at most k at time u' that were released prior to time u'. Thus, V_{≤k}(u') ≤ (m − 1)k < mk. Now, by the definition of u', every machine must remain busy until u working on jobs with initial size at most k. Thus, SJF does at least (1 + ε)m(u − u') work on jobs of initial size at most k during the interval [u', u). Finally, the amount of work of initial size at most k that arrives during the interval [u', u) is P_{≤k}(u', u). This leads to the following observation:

Fact 1. At any time u, for u' relative to u, and any job size k, the amount of unfinished work left for SJF is bounded:

V_{≤k}(u) ≤ mk + P_{≤k}(u', u) − (1 + ε)m(u − u')    (2.1.3)

We now consider OPT. OPT receives P_{≤k}(u', u) work from jobs of size at most k in the interval [u', u) and can do at most m(u − u') work during that interval, which gives the following:

Fact 2. At any time u, for u' relative to u, and any job size k, the amount of unfinished work left for OPT is bounded:

V*_{≤k}(u) ≥ P_{≤k}(u', u) − m(u − u')    (2.1.4)

Rewriting (2.1.3) as −m(u − u') ≥ (1/(1 + ε))(V_{≤k}(u) − mk − P_{≤k}(u', u)) and substituting into (2.1.4) yields

V*_{≤k}(u) ≥ (ε/(1 + ε)) P_{≤k}(u', u) + (1/(1 + ε)) V_{≤k}(u) − (1/(1 + ε)) mk.

To complete the proof, we then observe that P_{≤k}(u', u) ≥ P_{≤k}(u) − mk, as at most mk of the unfinished work for SJF at time u could have come prior to time u', by its definition. □

From here, we use (2.1.2) to substitute into (2.1.1), changing variables (u = r_i and k = p_i). We then replace P_{≤p_i}(r_i) with V_{≤p_i}(r_i), which is obviously smaller. Combining like terms and restricting β ≤ ε/(1 + ε) gives the following goal:

(1 + β)m(t − r_i) ≤ (1 − β) V_{≤p_i}(r_i) + P_{≤p_i}(r_i, t(i)) − mp_i
To make further progress, we bound from below the amount of work available to SJF on jobs of initial size at most p_i during the interval [r_i, t(i)).

Lemma 4. Consider SJF with (1 + ε)-speed processors. For any time t and any job J_i ∈ D(t),

V_{≤p_i}(r_i) + P_{≤p_i}(r_i, t(i)) ≥ (1 + ε)m(t(i) − r_i − p_i/(1 + ε))    (2.1.5)

Proof. SJF could not have worked on job J_i during [r_i, t(i)) for p_i/(1 + ε) time units, or else J_i would be completed at time t and thus not part of D(t). Thus, for at least t(i) − r_i − p_i/(1 + ε) time units during [r_i, t(i)), SJF must devote all m of its (1 + ε)-speed machines to other jobs of initial size at most p_i, and all of this work is drawn from V_{≤p_i}(r_i) + P_{≤p_i}(r_i, t(i)). □

Since (1 − β)V_{≤p_i}(r_i) + P_{≤p_i}(r_i, t(i)) ≥ (1 − β)(V_{≤p_i}(r_i) + P_{≤p_i}(r_i, t(i))) and t(i) − r_i = (1 − β)(t − r_i), applying Lemma 4, writing β = ε/(c + cε) for a constant c > 3 to be chosen later, and simplifying, it suffices to establish the following goals:

((c²ε − 3cε + c²ε² − 2cε² + ε²)/c²)(t − r_i) ≥ 2(1 + ε)p_i    (2.1.6)

((c²ε − 3cε)/c²)(t − r_i) ≥ 2(1 + ε)p_i    (2.1.7)

(t − r_i) ≥ ((2c + 2cε)/((c − 3)ε)) p_i

Note that goal (2.1.7) implies goal (2.1.6), as c²ε² − 2cε² + ε² ≥ 0, and the last goal is simply a rearrangement of (2.1.7). We can satisfy the last goal as long as we choose α ≥ (2c + 2cε)/((c − 3)ε), since every job J_i ∈ D(t) satisfies t − r_i = Age_i(t) ≥ αp_i.

We now choose a value of c that minimizes 1 + 1/β + α. For SJF, selecting c = 6 yields β = ε/(6 + 6ε) and α = 4/ε + 4, giving a total competitive ratio of 10/ε + 11 = O(1/ε).

We now compare our bounds for SJF with the bounds for IMD. In [8], a different choice of β was made for IMD with extra speed; even re-optimizing that choice yields a competitive ratio whose constant is noticeably larger than the one obtained above for SJF. While neither bound is necessarily tight, this comparison gives us some insight into the power of migration for this problem.

2.1.5 SRPT

We now prove Theorem 1 for SRPT and faster machines. This proof is very similar to that for SJF and faster machines, with only a few differences. We first observe that we can prove the same result as Lemma 3 for SRPT, but the proof is different because of the differences between the algorithms.

Lemma 5. When SRPT is given (1 + ε)-speed processors, for any time u and any job size k,

V*_{≤k}(u) ≥ (ε/(1 + ε)) P_{≤k}(u) + (1/(1 + ε)) V_{≤k}(u) − mk    (2.1.8)

Proof. We first focus on SRPT and define time u' < u similarly to before: the most recent time (possibly 0) such that for some positive interval of time ending at u', SRPT had an idle machine or was running a job with remaining size > k. From this, there can be at most m unfinished jobs released prior to time u' with remaining size at most k at time u'. For the sake of argument, say m_1 of these jobs have p_i > k while the other m_2 ≤ m − m_1 jobs have p_i ≤ k, making the total work left in the system for short jobs V_{≤k}(u') ≤ m_2 k.

Now, by the definition of u', every machine in the interval [u', u) must be busy processing jobs of remaining size at most k. In particular, any jobs that arrive in the interval [u', u) that are processed must have initial size at most k. Thus, the only work done during the interval [u', u) on jobs with initial size larger than k is any work done on the m_1 unfinished jobs at time u'. SRPT can do at most m_1 k work on these jobs during [u', u). Thus, SRPT does at least (1 + ε)m(u − u') − m_1 k work on jobs of initial size at most k during [u', u). The amount of work for jobs of initial size at most k left at time u can be no more than the work left at u', plus the work for new jobs arriving during this interval, minus the work done during the same interval.

Fact 3. At any time u, for u' relative to u, and any job size k, the amount of unfinished work left for SRPT is bounded:

V_{≤k}(u) ≤ m_2 k + P_{≤k}(u', u) − ((1 + ε)m(u − u') − m_1 k) ≤ mk + P_{≤k}(u', u) − (1 + ε)m(u − u')

The amount of work OPT has left is unchanged, since its solution is independent of the algorithm being tested. We use the above inequality, Fact 2, and the same observation that P_{≤k}(u', u) ≥ P_{≤k}(u) − mk to arrive at (2.1.8). □

As before, we bound the work accomplished by SRPT during [r_i, t(i)) on jobs of initial size at most p_i.

Lemma 6. For any job J_i ∈ D(t), we derive the following bound:

V_{≤p_i}(r_i) + P_{≤p_i}(r_i, t(i)) ≥ (1 + ε)m(t(i) − r_i − p_i/(1 + ε)) − mp_i

Proof. The maximal amount of work done during [r_i, t(i)) is (1 + ε)m(t(i) − r_i). From this amount, deduct the potential work applied to large jobs with short remaining times and the maximum potential idle time, and the result follows. □

Finishing the proof the same way as for SJF, we select β = ε/(6 + 6ε) and α = 6/ε + 6, for a ratio of 12/ε + 13 = O(1/ε). Note again that this bound is significantly better than the corresponding bound for IMD.
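The arithmetic behind these parameter choices is easy to verify mechanically. The following snippet (illustrative only; ε is an arbitrary sample value) checks that the chosen β and α give the stated totals for SJF and SRPT with (1 + ε)-speed processors.

    # Sanity check of the constants for SJF and SRPT with (1+eps)-speed processors.
    eps = 0.25                                 # arbitrary sample value of epsilon
    c = 6
    beta = eps / (c + c * eps)                 # beta = eps / (6 + 6*eps)

    alpha_sjf  = 4 / eps + 4                   # alpha for SJF with (1+eps)-speed
    alpha_srpt = 6 / eps + 6                   # alpha for SRPT with (1+eps)-speed

    ratio_sjf  = 1 + 1 / beta + alpha_sjf      # overall bound 1 + 1/beta + alpha
    ratio_srpt = 1 + 1 / beta + alpha_srpt

    assert abs(ratio_sjf  - (10 / eps + 11)) < 1e-9
    assert abs(ratio_srpt - (12 / eps + 13)) < 1e-9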
2.1.6 Extra machines

We now prove Theorem 1 for SJF and SRPT with extra machines. In this scenario, SJF-M or SRPT-M uses (1 + ε)m 1-speed machines while OPT uses m 1-speed machines. The proof structure is the same as before. Two facts change: the work SJF-M and SRPT-M have left at any time u is increased slightly, and the minimum work either algorithm does in any interval [r_i, t(i)) is slightly diminished. Modified versions of Facts 1 and 3 (combined here) and Lemmas 4 and 6 are presented below.

Fact 4. For any time u, with regard to jobs of initial size at most k, when SJF-M or SRPT-M has (1 + ε)m 1-speed processors:

V_{≤k}(u) ≤ (1 + ε)mk + P_{≤k}(u', u) − (1 + ε)m(u − u')

Proof sketch: Since we have (1 + ε)m machines instead of m machines, when we define u' as before, we get V_{≤k}(u') ≤ (1 + ε)mk instead of V_{≤k}(u') ≤ mk. The rest follows as before.

Lemma 7. Consider SJF-M. For any time t and any job J_i ∈ D(t), we derive the following bound:

V_{≤p_i}(r_i) + P_{≤p_i}(r_i, t(i)) ≥ (1 + ε)m(t(i) − r_i − p_i)

Proof. SJF-M could not have worked on job J_i during [r_i, t(i)) for p_i time units, or else J_i would be completed at time t and thus not part of D(t). Thus, SJF-M must devote all (1 + ε)m machines for at least t(i) − r_i − p_i time units to other jobs of size at most p_i during the interval [r_i, t(i)). □

Lemma 8. Consider SRPT-M. For any time t and any job J_i ∈ D(t), we derive the following bound:

V_{≤p_i}(r_i) + P_{≤p_i}(r_i, t(i)) ≥ (1 + ε)m(t(i) − r_i − 2p_i)

Proof. SRPT-M could not have worked on job J_i during [r_i, t(i)) for p_i time units, or else J_i would be completed at time t and thus not part of D(t). Thus, SRPT-M must devote all (1 + ε)m machines for at least t(i) − r_i − p_i time units to other jobs of remaining size at most p_i during the interval [r_i, t(i)). Of this work, at most (1 + ε)mp_i could be devoted to jobs of initial size > p_i, and the result follows. □

Putting everything back together, for SJF-M we select β = ε/(6 + 6ε) and α = 4/ε + 8 + 6ε, so the total ratio is 10/ε + 15 + 6ε = O(1/ε). For SRPT-M, we select β = ε/(6 + 6ε) and α = 6/ε + 12 + 6ε, giving a ratio of 12/ε + 19 + 6ε = O(1/ε). The corresponding values of β and α chosen for IMD with extra machines in [8] again lead to a larger overall constant.

2.2 Weighted Lp norms of flow time

We now consider the weighted Lp norms of flow time. Recall that each job J_i is now said to have a positive weight w_i, where a higher weight represents a higher priority, and the weighted flow time of a job is w_i F_i, where F_i is the job's unweighted flow time. We follow the proof technique used in [5] for a single machine, extending it now to multiple machines. We adopt their notations of OPT(M, I, s) as meaning the value of the optimal solution for metric M on input instance I given s-speed processors, and M(A, I, s) as meaning the worst-case solution for metric M under algorithm A on input instance I with s-speed processors.

Given an input instance I, we define I' as follows: for each J_i ∈ I, create w_i identical jobs in I', each of size p_i/w_i, weight 1, and release time r_i. Denote these as J'_{i,1}, J'_{i,2}, ..., J'_{i,w_i}, and let X_i = {J'_{i,1}, ..., J'_{i,w_i}}.
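A minimal sketch of this instance transformation I → I' (our own illustration; it assumes integer weights, as the construction requires, and the function name is hypothetical):

    # Each weighted job (r_i, p_i, w_i) with integer weight w_i is replaced by w_i
    # unit-weight copies of size p_i / w_i released at the same time.

    def split_instance(I):
        """I: list of (release, size, weight) tuples with positive integer weights."""
        I_prime = []
        for r, p, w in I:
            X_i = [(r, p / w, 1) for _ in range(w)]   # w copies, size p/w, weight 1
            I_prime.extend(X_i)
        return I_prime

    # Example: one job of size 6 and weight 3 becomes three unit-weight jobs of size 2.
    print(split_instance([(0, 6, 3)]))    # [(0, 2.0, 1), (0, 2.0, 1), (0, 2.0, 1)]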
Lemma 9. Given an input instance I and the related instance I' as defined above,

OPT(F^p, I', 1) ≤ OPT(WF^p, I, 1)

Proof. Let S be a schedule which minimizes the Lp norm of weighted flow time for I. Given S, we schedule I' as follows: for any time t, for a given machine a' under OPT(I'), work on a job in X_i if and only if J_i is being processed on the corresponding machine a under S at t. Thus all jobs in X_i are finished when J_i finishes, so no job in X_i has a higher unweighted flow time than J_i's flow time F_i. The w_i jobs in X_i each contribute at most F_i^p to F^p(I'), for a total of at most w_i F_i^p, which is exactly J_i's contribution to WF^p(I), and OPT can do no worse. □

From Theorem 1, we know that SJF is (1 + ε)-speed O(1/ε)-competitive for the Lp norms of unweighted flow time on multiple machines. Thus,

Fact 5. F^p(SJF, I', 1 + ε) = O(1/ε) · OPT(F^p, I', 1).

Now we relate HDF on I with (1 + ε) speed to SJF on I' with 1 speed.

Lemma 10. For every job J_i ∈ I and time t, if J_i is alive under HDF with (1 + ε)-speed, at least εw_i/(1 + ε) of the jobs in X_i are alive at t under SJF with 1 speed.

Proof. Suppose t is the earliest time when J_i is alive under HDF and there are fewer than εw_i/(1 + ε) jobs from X_i alive under SJF. Since J_i is alive under HDF with (1 + ε)-speed processors, HDF has spent less than p_i/(1 + ε) time on J_i, whereas SJF has spent more than that amount of time on jobs in X_i. So there must exist some time t' ∈ [r_i, t) when HDF was not running J_i on any machine and SJF was processing at least one job in X_i on some machine. Therefore, there exist m jobs J_{j_1}, ..., J_{j_m} with strictly higher density than J_i being processed by HDF at t'. Thus, the jobs in X_{j_1}, ..., X_{j_m} would all have smaller sizes than those in X_i. Since SJF works on X_i at time t', it must have finished all the jobs in X_{j_1}, ..., X_{j_m} by t', and since J_{j_1}, ..., J_{j_m} are alive under HDF at t', this contradicts the assumption of the minimality of t. □

Corollary 2. HDF is a (1 + ε)-speed O(1/ε²)-competitive algorithm for minimizing the weighted Lp norms of flow time on m identical machines.

Proof. We get WF^p(HDF, I, 1 + ε) ≤ (1 + 1/ε)^p F^p(SJF, I', 1) from Lemma 10. From Fact 5, we see that WF^p(HDF, I, 1 + ε) = O(1/ε²)^p · OPT(WF^p, I, 1), and the result follows. □

2.3 Conclusions

Our results indicate that SJF, SRPT, and HDF are reasonable algorithms for multiprocessor server scheduling applications, given that they are (1 + ε)-speed O(1)-competitive algorithms for the scheduling problems considered. Furthermore, combining our results with those from [8] provides some insight into the power of migration for the unweighted Lp norms of flow time and stretch. However, several open questions remain. For example, can an algorithm that does not use migrations (and with immediate dispatch) be effective for the weighted Lp norms of flow time? Also, can new proof techniques prove a tighter bound on the competitive ratio for HDF? In particular, is HDF (1 + ε)-speed O(1/ε)-competitive for minimizing the weighted Lp norms of flow time?

Chapter 3

Single machine without resource augmentation

3.1 Introduction

The problem of optimizing weighted flow time is in many ways related to the problem of optimizing unweighted flow time. In the latter problem, SRPT and SJF are similar algorithms, but SRPT is optimal for minimizing unweighted flow on a single machine and SJF is not. Our goal was to prove that a natural analogue of SRPT, namely HRDF, or Highest Remaining Density First, is superior to a natural analogue of SJF, namely HDF, or Highest Density First, for minimizing weighted flow on a single machine.

As a reminder, HRDF is a greedy algorithm that always runs the job with the highest remaining density (weight divided by remaining processing time) that has been released at or before the current time step. Ties among jobs are broken arbitrarily. In contrast, HDF does not consider the remaining time of a job, and always runs the job with the highest density.
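To see concretely how the two priority rules can diverge, consider the following small check; the job parameters are hypothetical and anticipate the construction of Section 3.3.

    # HDF compares initial densities w/p; HRDF compares remaining densities w/rem.
    # Hypothetical parameters: P = 100, eps = 0.01.
    P, eps = 100.0, 0.01

    j1 = {"w": 2 + eps, "p": P, "rem": P / 2}     # long job, half finished by HRDF
    j2 = {"w": 4 + eps, "p": P, "rem": P}         # newly released, denser job

    hdf_prefers_j2  = j2["w"] / j2["p"]   > j1["w"] / j1["p"]     # 0.0401 > 0.0201
    hrdf_prefers_j1 = j1["w"] / j1["rem"] > j2["w"] / j2["rem"]   # 0.0402 > 0.0401

    print(hdf_prefers_j2, hrdf_prefers_j1)        # True True

Because remaining density rises as a job is processed, HRDF can be induced to stay with a half-finished job that HDF would abandon; the next sections exploit exactly this behavior.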
We define W as the ratio of the highest weight among all jobs to the lowest weight among all jobs; likewise, we define P as the ratio of lengths between the longest job and the shortest job. It is known that HDF is Ω(P)-competitive and HRDF is Ω(√P)-competitive for minimizing weighted flow on a single machine [9], so we sought to prove that HRDF is actually O(√P)-competitive, but found that HRDF is actually Ω(P)-competitive instead.

3.2 Overview

In broad terms, HRDF can be forced to repeatedly make bad decisions about which job to run when given a choice. This proof has similarities to the proof that SRPT can be arbitrarily bad for flow time on multiple machines without augmentation. The idea will be to release two jobs with nearly identical densities, one "short" job and one "long" job with twice the length and a slightly higher density than the short one. HRDF will prioritize the long job, but OPT can finish the short job and be ready to start another job when HRDF is only half done with the long job. This new job can have a density that is nearly double the initial density of the initial "long" job, and HRDF will still keep running the initial job. As soon as OPT finishes this new job, another job can be released with even higher density, and so on. If all the jobs except the initial "short" job are the same length, HRDF will always be about half a job behind OPT. Eventually, a stream of extremely dense jobs can be released when OPT finishes a high-density job, leaving HRDF to carry a very high weight job while OPT only carries the initial low-weight long job.

3.3 Proof

At time 0, release two jobs: j_0 with length P/2 and weight 1, and j_1 with length P and weight 2 + ε, where ε is an arbitrarily small positive constant. Since HRDF will work on j_1, at time P/2, j_1 will have a remaining density of (4 + 2ε)/P. At this point, another job (j_2) of size P and weight 4 + ε (density (4 + ε)/P) can be released. OPT, which ran j_0 to start, is immediately ready to run j_2. HRDF continues the job with higher remaining density, j_1, finishing it at time P, and then begins j_2. At time 3P/2, OPT finishes j_2, so we can release another job. Again we make this job denser than before; since HRDF is half done with j_2, this new job may have density (8 + ε)/P. This pattern can repeat as long as there are higher-weight jobs available, i.e., until a job j_Y with weight W = 2^Y + ε is released. The release schedule for the first phase is summarized in Table 3.1.

    job    release time        size    weight
    j_0    0                   P/2     1
    j_1    0                   P       2 + ε
    j_2    P/2                 P       4 + ε
    j_i    P/2 + P(i − 2)      P       2^i + ε
    j_Y    P/2 + P(Y − 2)      P       W = 2^Y + ε

    Table 3.1: Summary of the release schedule for phase 1

At time t = PY − P/2, OPT will have only job j_1, with weight 2, unfinished (we drop the ε terms here as insignificant). Meanwhile, HRDF will have j_0 with weight 1 and job j_Y with weight W unfinished. Job j_0 will be unstarted, while j_Y will be half-finished.

Starting at time t, a stream of very dense jobs, each with weight 2W/P + ε and length 1, is released at unit time steps. Each job is immediately run by both algorithms. This stream continues for X time units, where X is a positive constant. With an appropriate choice of Y and X, it is easy to see that the dominant term for both algorithms will be their flow time from t to t + X. During this period, HRDF always has W weight in unfinished jobs, while OPT always has 2W/P weight in the same. Thus it follows that HRDF is Ω(P)-competitive for weighted flow.
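The bookkeeping in this construction can be spot-checked numerically. The sketch below is ours, with arbitrarily chosen P, ε, and Y; it simplifies the accounting and does not simulate the two schedules in full. It verifies the remaining-density comparison that keeps HRDF on the wrong job throughout phase 1, and then compares the unfinished weight carried by each algorithm during the final stream.

    # Numeric walk-through of the phase-1 release schedule (Table 3.1), illustrative only.
    # At each release time, HRDF's half-finished previous job must still have higher
    # remaining density than the newly released job, so HRDF never switches.
    P, eps, Y = 64.0, 1e-6, 10

    for i in range(2, Y + 1):
        w_prev, w_new = 2.0 ** (i - 1) + eps, 2.0 ** i + eps
        rem_density_prev = w_prev / (P / 2)     # previous job is half finished by HRDF
        density_new      = w_new / P            # new job j_i is untouched
        assert rem_density_prev > density_new   # HRDF keeps running the previous job

    # During the final stream of unit jobs, HRDF still carries j_Y (weight W) and j_0,
    # while OPT carries only j_1 (weight ~2) plus one stream job of weight 2W/P + eps.
    W = 2.0 ** Y + eps
    hrdf_backlog = W + 1
    opt_backlog  = 2 + eps + (2 * W / P + eps)
    print(hrdf_backlog / opt_backlog)           # grows like P/2 as W/P grows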
3.4 Conclusions

Since we have shown HRDF to be no better than Ω(P)-competitive, a ratio matched by a simple greedy algorithm that always runs the heaviest job without regard to size [9], we begin to see that it is significantly harder to come up with a simple greedy algorithm when there are multiple criteria influencing the evaluation metric. Finding a good upper bound for HRDF, possibly with resource augmentation, remains an open problem. As SJF and SRPT are both (1 + ε)-speed O(1/ε)-competitive for unweighted flow time on a single machine, and HDF matches that for weighted flow time on one machine, might HRDF also be proved to be (1 + ε)-speed O(1/ε)-competitive on a single machine? Without augmentation, it is easy to show that SRPT outperforms SJF, but is the same true for HRDF with respect to HDF?

Bibliography

[1] Nir Avrahami and Yossi Azar. Minimizing total flow time and total completion time with immediate dispatching. In Proc. 15th Symp. on Parallel Algorithms and Architectures (SPAA), pages 11-18. ACM, 2003.

[2] Baruch Awerbuch, Yossi Azar, Stefano Leonardi, and Oded Regev. Minimizing the flow time without migration. In Proc. 31st Symp. Theory of Computing (STOC), pages 198-205. ACM, 1999.

[3] Nikhil Bansal and K. Dhamdhere. Minimizing weighted flow time. In Proc. 14th Symp. on Discrete Algorithms (SODA), pages 508-516. ACM/SIAM, 2003.

[4] Nikhil Bansal and Kirk Pruhs. Server scheduling in the Lp norm: A rising tide lifts all boats. In Proc. 35th Symp. Theory of Computing (STOC), pages 242-250. ACM, 2003.

[5] Nikhil Bansal and Kirk Pruhs. Server scheduling in the weighted ℓp norm. In Proc. 6th Latin American Symposium on Theoretical Informatics (LATIN), pages 434-443, 2004.

[6] L. Becchetti, S. Leonardi, and S. Muthukrishnan. Scheduling to minimize average stretch without migration. In Proc. 11th Symp. on Discrete Algorithms (SODA), pages 548-557. ACM/SIAM, 2000.

[7] Luca Becchetti, Stefano Leonardi, Alberto Marchetti-Spaccamela, and Kirk Pruhs. Online weighted flow time and deadline scheduling. In RANDOM-APPROX, volume 2129 of Lecture Notes in Computer Science, pages 36-47. Springer, 2001.

[8] Chandra Chekuri, Ashish Goel, Sanjeev Khanna, and Amit Kumar. Multi-processor scheduling to minimize flow time with ε resource augmentation. In Proc. 36th Symp. Theory of Computing (STOC), pages 363-372. ACM, 2004.

[9] Chandra Chekuri, Sanjeev Khanna, and An Zhu. Algorithms for minimizing weighted flow time. In Proc. 33rd Symp. Theory of Computing (STOC), pages 84-93. ACM, 2001.

[10] Bala Kalyanasundaram and Kirk Pruhs. Speed is as powerful as clairvoyance. Journal of the ACM, 47:214-221, 2000.

[11] Stefano Leonardi and Danny Raz. Approximating total flow time on parallel machines. In Proc. 29th Symp. Theory of Computing (STOC), pages 110-119. ACM, 1997.

[12] Jason McCullough and Eric Torng. SRPT optimally uses faster machines to minimize flow time. In Proc. 15th Symp. on Discrete Algorithms (SODA), pages 343-351. ACM/SIAM, 2004.

[13] S. Muthukrishnan, R. Rajaraman, A. Shaheen, and J. E. Gehrke. Online scheduling to minimize average stretch. In Proc. 40th Symp. Foundations of Computer Science (FOCS), pages 433-443. IEEE, 1999.

[14] Cynthia Phillips, Cliff Stein, Eric Torng, and Joel Wein. Optimal time-critical scheduling via resource augmentation. Algorithmica, pages 163-200, 2002.