— _— ‘— = —_ — — — — '— — — — ‘— — — lIBRARY Michigan State University This is to certify that the thesis entitled ”we/L {76‘5de leg/IM/r'; Ana? 5')" presented by I? ”I“ has been accepted towards fulfillment of the requirements for Megan in 0,ij M64, *- SéMWCSV/(L: . 7%]: K Major professor Date 2/§./gg ’ 0-7639 MS U is an Affirmative Action/Equal Opportunity Institution MSU LIBRARIES —:_. RETURNING MATERIALS: Place in book drop to remove this checkout from your record. FINES will be charged if book is returned after the date stamped below. NETWORK SYSTEM RELIABILITY ANALYSIS By FENG HSU A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE Department of Statistics and Probability December, 16 1987 ABSTRACT NETWORK SYSTEM RELIABILITY ANALYSIS By Feng Hsu The reliable performance of a network system for a mission under various conditions is of utmost importance in many industrial, military, and everyday life situations. Main effort of this thesis research have been devoted to develop algorithms and optimization methods for network reliability problems. In chapter 1, two algorithms so-called MDT and DBM methods are presented respectively for qualitative and quantitative evaluation of large networks and the relevant theorems are proved. The three sections in chapter 2 are spent on the optimility problems of network reliability, in which a useful optimal redundancy allocation method is presented and in addition, two useful mesures called the maintenance importance ‘ ( MIs,t(Xi) ) and the diagnosis importance ( DIs,t(Xi) ) are presented for deriving optimal policies used in network reliability maintenance and failure diagnosis. Examples and comments are included correspondingly to each of the topics in the thesis. To my parents, Mr. Ziwei Xu, and Mrs. Shensu Xie, and a special gift on the 60'th birthday of my father ACKNOWLEDGMENT I would like to express my heartly thanks to my major advisor, professor Martin Fox for his greatly appreciated guidance, suggestions, comments, carefully readings and crrections, which has made it possible for me to complete this research project. Thanks are also expressed to my graduate committee members, professor R. Schlueter and professor Soumen Ghosh for thier very helpful comments and suggestions. ii Abstract Chapter Section 1.1 1.2 Section 2.1 2.2 Chapter Section 1.1 1.2 Section 2.1 2.2 2.3 2.4 REFERENCES CONTENTS Network reliability analysis Qualitative analysis through fault-tree and MDT method Theorems and definitions Algorithms and examples Quantitative analysis by DBM method DBM method and its application Algorithms and examples Some optimal policies for network system reliability Redundancy policy for series-paralell system under resource constraints using L-M method Problem formulation and solution Example Some optimal policies used for reliability maintenance and failure diagnosis without cost constraint Maintenance importance Diagnosing importance Example Comments and More Examples on Failure Diagnosis Policy 10 1A 17 22 27 27 28 3O 33 33 35 36 Al 45 Chapter 1 Network Reliability Analysis Section 1 Qualitative Analysis of Network Reliability Through Fault-tree and MDT Method In this section an efficient approach is presented for qualitative analysis of network reliability through PTA (Fault- tree Technique) and finding MCS (Minimal Cut Sets) by the MDT (Modified Dual-graph Transformation) method. It is generally accepted that for the purpose of qualitative analysis for networks (or any complex system) one must establish its Fault- tree by first finding its MCS. Therefore the enumeration of all MCS seperating a specified node pair is a fundamental step in reliability evaluation of network systems which are frequently encountered in communications, electronics, computers and power systems, etc. It is well know in graph theory [1-2] that the algorithms existing for enumerating MCS are not satisfactory so far, while the algorithms for finding MPS (Minimal Path Sets) are much more time efficient. Hence, based on the principle of graph theory, a so called MDT method is presented below for enumerating the MCS of networks by employing any of the existing efficient MPS algorithms [2, 3-5]. P0 1.1 Theorems and Definitions Definition 1 If, for any two vertices in a graph G, there is at least one path between them, then G is said to be a connected graph. Definition 2 A network is a connected graph, which has a finite number of vertices and a finite number of edges. Let C(V, E) be the notation of a network, where V and E are the sets of vertices and edges in the network, respectively. Definition 3 Consider a network G(V,E) drawn in the plane in such a way that each vertex vi 6 V is represented by a point, each edge is represented by a continous curve connecting the two points which represent its end vertices and no two curves, which represent edges, share any points except in their ends. Such a drawing is called a Plane Network. Throughout the reminder of this chapter, we assume that all networks discussed are plane networks. Definition A Definition 5 Definition 6 Definition 7 Let C(V,E) be a connected plane graph, K be an edge set. Then K C E is called a cutset if it is a minimal seperating set of edges, ie, the removal of K from G interrupts its connectivity, but no proper subset of K has this property. In other words, a cutset seperates G into two connected components. The network C(v, E) is said to be the Dual of a connected grapn C(V, E), if there is a l - l correspondence f: E a E, such that a set of edges T forms a simple circuit in G if and only if f(T) ( the corresponding set of edge in C ) froms a cut set in C. A point V1 is a Boundary Point if there exists a curve connecting vi to the exterior of the convex hull of the network which does not intersect the network. An edge of a network G is a Boundary Edge if each point on the edge is a boundary point. Definition 8 A cycle C of a network G is a Boundary Cycle if at least one boundary edge exists in C. Theorem 1 Let G be a network and x be a boundary vertex of G and assume that 3 at least one cycle through x. Then, 3 at least one boundary cycle through x. Proof Suppose the theorem is false. Take an arbitary .cycle CO through x and let e be an edge in O CO. Then 3 a point yO 6 e0 which is not a boundary point. Every curve connecting yO to the exterior of the convex hall of the network intersects the network. Then, 3 a cycle C1 ¢ Co through x which includes e0 and a subset of this set of intersections. This construction can be repeated, each time creating a new cycle by taking en an edge of Cn and 01’ C2, ...Cn distinct. Thus, 3 an infinite sequence of cycles leading to a contradiction. Definition 9 PI 01 Let C(V, E) be a plane network and let C(V, E) be a dual of G. If s and t are two boundary vertices of G, (s, t E V) then, by drawing two half curves from s and t respectively, the exterior of the network G is divided into two parts. Let s' and t' be points,, one on each side. of the two parts of the exterior of G given above, and let C(V, E) be a dual of G with s' 6 v . Form a network 6*(V*, E*) as as follows: 3% ._ V = V U {t'} Find a boundary cycle through 5' with a e' ~ . boundary edge x - s', e' E E and replace such e' e' x - s' by x - t', so that the new graph is a .plane graph. Repeat 2 for as many cycles of C as possible without destroying any path from s' to t' which * already existed. The new network C is called a Modified Dual Network of G. a . Figure 2 G is a modified dual graph of C (Figure 1): S a" i *9 $\ 1, 5 . , . t 0£ 7) o; 9(- 'f, i L 18 Ix; 2‘; - * C Figure l C Figure 2 * Construction of the above modified network G from G is illustrated in Figure 3. r-- "-‘ (c) 3rd stage Figure 3 In the following theorem a notation s will be used to represent the equivalency between any two edge sets. Theorem 2. Let S be the set of all MCS seperating s and t in the network G. Let 6* be a modified dual network of G and P' be the set of all MP3 from s' to t’ in C*. Then we have S a P'. Proof . Let S be the set of all MCS's and T be the set of all cycles in a usual dual network C of C. Thus, each Sk E S is equivalent to some rk 6 T , That is, Sk a rk e T , where rk passes through at least one vertex v' with v' E V and v' in the interior of C. Morever, S and t are also seperated by r see k' (Figure 4). If we suppose that M is the set of all such rk 6 T , we then have: M s S (1) Let P’k to be any minimal path of 6* which goes from s’ to t'. If s' and t' are merged into one point say 3', then any path from s' to t' will become a cycle of C through 5' and all edges containing 5' are edges contained in each MPS * of C . That is P'k is equivalent to a cycle rk of G. C) Thus, P k E rk E T (2) Since rk a Sk E S, it follows that a P k 6 M For V P'k E P', it follows that P' E M (3) 0 On the contrary, if each r, e M and s and 1 )‘c . t' are the source and sink of G , respectively, then we can break each cycle rk such that both 3' and t' are to be connected by edges of each r without disturbing the 1 - l correspondence k of edges between E and E' as given in Definition 5, and meanwhile, either 5 or t of G is no longer inside any cycle rk of G Hence, r is a minimal path of Gx by which k s and t is separated. Hence, rk E M I I = I and P k E P = rk ~ P k Q P = P'k e M Q P' So, M Q P' (4) is proved to be true. the theorem then follows from (1). (3), (4) # ‘0 'X 'x . —>- 5 X: t 7: 1" (Figure 4) Definition 10 A Fault-tree is a system logical diagram composed of logical operational symbols which explicitly shows the logical interactions between each basic components of the system, and gives all the possible system failure information that may be existed in the system. [6-10] By applying the above Theorems, the problem of finding a given network's MCS simply becomes a problem of enumerating the minimal path of its modified dual graph. Therefore, any of the efficient path-finding algorithms can be used to find the minimal cut sets of a given network and, then, the Fault-tree of such a network can be easily obtained by simply plugging all the MCS into a tree with each of its leaves represent a possible failure event, which equivalently corresonding to a MCS. (see Figure 5) lo 1.2 Algorithm and Examples Algorithm * (1) Obtain the modified dual network G of G. (2) Enumerate all the path sets P' between s' and t' .in G}. and then get the MCS S of C by theorem 2, ie, S = P’. (3) Drawing the fault-tree of the network by applying all the MCS obtained in step (2). The construction of fault tree stated in step (3) can be very time consuming if a large complex network is encountered. In recent years much effort has been spent to develop algorithms and techniques, for computer-aided automatic synthesis of fault trees of large networks or any complex systems, and several algorithms have been presented [11-13]. However, for a network with small number of components, the corresponding fault tree can be easily drawn. That is, as is shown in Figure 5 we can first start out from the source (vertex 5') and let 5' be the 'root' of the tree, and then draw a tree rooted at s' by representing all the edges in each of the MCS as an individual path from s' to a 'leaf' (ie, vertex t') respectively. ll For example, the MCS and the fault-tree (Figure 5) of network in Figure 2 is obtained by using above algorithm: L \ %- (7t7t35) }; ,, X; ’ (Xx. web) ('3': 3(5 X; XI) M 03.0)19) 7‘1 (Xaflfii’l‘e’h) (XI. Xt, Yr) ' (mum (Figure 5) The above fault-tree is a simple version which uses node and 'branches' instead of using logical OR , AND symbols respectively. A formal fault-tree of the same network is given below. see (Figure 5'): 12 O . L/ g».— ‘ Figure 5' The above fault tree (Figure 5') is logically equivalent to the simple version (Figure 5), the only difference is that Figure 5' uses the logical AND, OR symbols (see Figure 5") to represent the relationship between components instead of using nodes and 'branches' in Figure 5. Since the formal fault tree is easier to construct by using a computer, it is more widely used than the simple version. (he—s..- (AnB-=C) (AUB-=C) AND gate OR gate Figure 5" 14 Section 2 Quantitative Analysis by DBM Method A meaningful masure of the reliability and availability of a network system is the terminal reliability between a given pair of vertices, defined as the probability that there exists at least one path between these two nodes. This parameter depends on .both the network topology and the reliability of all elements composing the network and is relevant inot only for network analysis but also for synthesis of reliable networks.[14] In recent years much effort has been spent to develop algorithms [15-17] for computating the network reliability. However the difficulty is that the complexity of the algorithms remains exponential with the increase of number of elements, therefore the need remains to widen the class of tractable networks. It is well known that for the quantitative evaluation of system failure probability (reliability), one must transfer the structure function (a Boolean S-O-P form) into the disjoint-S-O- P (sum of product) form. J. M Cargal has shown [18] that the crucial problem of determining the network reliability is to assess the desired probability with much less effort than is currently extended and higher algebra is most likely the avenue for such a task. To attain the same object by trying the DBM - a 15 Disjoint Boolean Manipulation method in this section, we have the very helpful approach of calculating the network reliability. The substance of the DBM method is the minimization principle of a Boolean function in disjoint sense. In other words, the advantage of using the DBM method is that it minimizes the number of Boolean terms of sytem's logical expression while eliminating any logical redundencies. The ordinary minimization principle of Boolean functions is to minimize the total number of symbols of intersection and union manipulations. Using DBM we can not only simplyfy the computation of network reliability but also make the design of switching circuits much efficient and easier. For example, by the ordinary Boolean principle A U B is minimal for an OR operations with two inputs, but A U A'B or B U B'A is much more convenient for probability computations. The difference between these principles seems to be very little when n is small, but it may become very large eventually when n is large. In fact the expression of system reliability given by the sum of events X + X + ... + Xn (these X l 2 1 MP8 in the network or any compex system that can be constructed can express a MCS or logically by Fault-tree technique) may contain 2n_1 terms, but that of disj01nt sum event X1 + X1 X2 + ... + X 1...X n-lxn contains only n terms. Rosenthal has shown [19] that computing the probability of a desired event (system success or failure) for an arbitry system structure function, like computing the reliability of a general network, is NP difficult, ie, the amount of computation increases exponentially with the number of components in the network. 2.1 DBM Method and Its Application (1) Nomenclature UB(X)=UB(X1,...Xn) denotes a usual Boolean expression, an event-expression for system success or failure that contains only Boolean variables and operations. DB(X)»DB(X1,...Xn) denotes a disjoint Boolean expression that can be used directly for computing the system reliability. X denotes an event. denotes the indicator of Xi. (2) Assumptions: 1. A network system S which consists of n-independent elements could be expressed as: S = {el,e2,...en ). We assume that each of these elements has two states: l7 functioning and failure. The state of ei could be characterized by the binary variable xi as following: 0 (failure) 1 (functioning) Thus, x1 is the indicator of the event that the ith component functions. Definition 11 The system structure function denoted by 9(x1, x2,...xn) is a Boolean function which is the indicator of the event of system functioning. Definition 12 a system failure function denoted by W(x1, x2,...xn) is a Boolean function which is the indicator of the event of system failure. Let xri be the indicator of the event that the ith component of the rth MP5 is functioning. The x are x .,x and, for ri n 1... r # s, it is possible to have x . a x for some i and j. r1 sj If we enumerate the MCS and MP3 of S, the structure function and the failure function will be written, respectively, as: ¢(x) - ¢(xl,x2,....xn) = Z eri xri e rth MPS r1 r(x) = w(xl,x2 ..... xn) = E gxqj xqj e qth MCS Let P = (p1,p2,...pn) denote the reliability vector of components 18 e1,e2,...en. Because of the assumption that the components are independent of one another, the domain of g and of W are extended by setting: 9(P) = El¢1 W(P) = E[W(X)] Thus, ¢(P) - ¢(pl,p2,....pn) = X Ilpri is the'probability of the r i system functioning. W(P) - \I!(pl,p2 ..... pn) = X Hqu is the probability of the qj system failing. Where, fiCP) = 1 - 0(P) since ¢(x) = 1 - W(x). 2 In graphs only arcs may be faulty. Nodes have 0 failure probability. The reliability of each arc is given. 3. Component (arc) failures are independent. 4. Given a network and two of its particular nodes, 3 and t, all simple paths from (or all minimal cut sets seperating) s and t are known or can be found. (3) 19 Operations (UBM): Logical interaction 1. AND operation 2. OR operation 3. NAND operation 4. NOR operation Using Venn diagrams OBM rules AB A+B (AB)'= A'+B’ (A+B)'= A'B' the OBM and interpreted as following: ...-M-..— - -—.—...--—_— . _ ~—-—~_--— — -— ..——.--__ __ ...—...c‘ . ~, - - -..—...— ..--—--. ...—- -——..--._--.~..q L c- a--. —— -... q“...- —- ...—H—— -_. .-—-- ‘._—_. A+B by UDM B -- uu_¢..~ 0...... o -w-... ...-.7 (Figure 6) Rules & comparision between DBM and usual DBM rules AB 'A+A'B or B+B'A (Fig 6) (AB)’- A'+AB' or (AB)'= B'+BA' (Fig 7) (A+B)'= (A+A'B)'= A'B’ or (A+B)'= (Bl-B'A)'=== A'B' UBM can ‘be rules A+B by DBM (Sa+S ”Sb shows the reTation of area) ’7) FJ (AB)' by UDM (AB)' by DBM (Figure 7) According to the addition and multiplication theorems of probability, it is easy to see that: The rules of the two types are equivalent completely for the calculation of probability. Althrough the expressions of DB algebra are non- unique, they are equivalent to each other. Using DBM to decompose the logical structure of system, the disjoint Boolean function can be obtained directly from the logical structure of the network (or any systems). By extending, absorbing and summarizing the structure functions using the logical identities such as: A + AB - A AB + AB = AB AA - A AB + AB'= A AA'= 0 the disjoint S-O-P expression of the system can be eventually obtained. 2.2 Algorithm and Examples (4) Examples: A given network is shown in Figure 8 and its fault-tree shown in Figure 9: (Figure 8) (Figure 9) 1. Using Fratta's algorithm: [20] OB(X) a ABB + ABE + ABCD + ABDF + AHB + AHE + AHDC + AHDF - AB + AHE + AHDC + AHDF. Simplify the above structure function into the disjoint-S-O-P form by the following steps: a. (AB)'(AHE + AHDC + AHDF) a (A'+ B')(AHE + AHDC + AHDF) = B'AHE + B'AHDC + B'AHDF 23 b. (B'AHE)'(B'AHDC + B'AHDF) = (B + A'+ H'+ E')(B’AHDC + B'AHDF) : E'B'AHDC + E’B'AHDF c. (E'B'AHDC)'(E'B'AHDF) = (E + B + A' + H' + D' +C')E'B'AHDF = C'E'B'AHDF Finally, the disjoint S-O-P form is: UB(X) = AB + B'AHE + E’B'AHDC + C'E'B'AHDF (l) 2. Using the DBM method: The disjoint S-O-P form can be obtainted directly (in one step) from the system structure diagram (in either network or Fault-tree form): DB(X) = A(B+B'H)[B+B'E + B'E'D(C+C'F)] = ABB + ABB'E + ABB'E'DC + ABB'E'DC'F + AB'HB + AB'HB'E + AB'HB'E'DC + AB'HB'E'DC'F = AB + AB'ME + AB'HE'DC + AB'HE'DC'F (2) Obviously, (l) a (2) It is easy to see that in the DBM approch the amount of computation is reduced significantly, and the computer implementation of calculating the reliabilities between any given pair of boundary vertices of a network,_can be more easily performed in a shorter & unique program. Suppose the element reliabilities of the given network are as follows: pa=0.9 pbnO.7 pch.8 pdw0.97 Faro-95 ph=0.7 pf=0.8 Then the system reliability of the network is: RS = O.9*O.7 + O.9*O.3*O.7*O.95 + 0.9*O.3*O.7*0.05*O.97*O.8 +0.9*0.3*0.7*0.05*O.97*O.2*O.8 - 0.818 ' # Combining section 1 and section 2 we can form an algorithm for reliability evaluation (quantitatively & quantitatively) of the network systems: Algorithm * 1. Apply the MDT algorithm to given network and get G of C and all the MCS in the system. 2. Construct the system structre function ¢(x) using all the MCS information obtained in step 2 (or by applying the DBM method). 3. Reduce ¢(x) first by general boolean identities ie: I") (fl X+XY=X XY+XYuXY XY+XY'=X XX=X XX'=O . Apply the DBM rules directly to the simpfied ¢(x) from 3. Any more redundent terms? if yes, go to 4. if no go to 6. Input all the failure probabilities of every component Xi' Calculate the reliability directly for the whole system and then stop. Chapter 2 Some Optimal Policies for System Reliability Section 1 Redundency Policy for Series-paralell System Under Resource Constraints Using L-M Method This section is devoted to discuss the problem of allocating redundent components subject to 'resource constraints so as to optimize some measure of system performance. Since the 19503, many models and solution procedures have been developed for various of these problems. The solution procedures can be divided into two categries: heuristic and exact methods. While useful heuristics are easy to implement and have modest computational requirements, they do not guarantee optimal solutions. All known methods which guarantee optimal solutions are enumerative, such as integer and dynamic programming. These exact methods have computational requirements that grow exponentially with the size of conponents. Tillman & Frank [21] provide an excellent survey of research on the problem. The optimization problem presented here is a rather simple and useful method in practice, which gives approximately the optimal solution for minimizing the whole system cost under the constraint that the whole system must meet the designed reliability requirement. This approach rather than just 27 maximazing the system reliability subject to element constraints, is most commonly used [22—24]. 1.1 Problem Formulation and Solution Suppose there are n different stages (subsystems) coneeted serialy with each other, . where the ith (i=1 ..... n) subsystem consist of mi identical conponents connected paralelly with each other. Assume also the compoent failures are independent. Given that the failure probability is qi for any element in the ith subsystem and that its cost is Ci’ the problem is to find an optimal allocation of the elements for each‘ subsystem which will minimizing the total system cost under the constraint that the whole system must be at least as reliable as is required by the designer. Thus let R0, be the minimum reliability requirement the ' whole system must meet. That is: Evaluate the decision variables mi such that n Minimizes f(mi) = min E: Cimi (1) =1 n . Subject to R = |”“| (1 - q. )2 R (2) and m , m ,...m are inte ers. l 2 n g where Ci’ qi and R0 are given. Since, the higher the system reliability required, the more element redundencies need to be added on to each of the subsystems. so the solution (m1,m2,...mn) which minimizes the total cost must also meet the requirement that R 2 R0. However, a little increase in R0 will result in a big cost for the system, Therefore, to make the problem simpler we can get an approximation for the problem by making the following two assumptions: a. Use equality (m) in (2) rather than inequality (2) b. Take each ”i as a continous variable (vector) and the logrithm in both sides of (l) and round off solutions at the very end. c. Following assumptions (a) & (b) we can easily apply the Lagrange multiplier method to solve the problem. [‘3 LC) Let H(ml,m2 ...... mu, 7) n mi __ X clu + 7 (21nd - qi ) - 10%) 1=1 m. 1 6H -qi lnqi 5;. = Ci + 7 ( —————— a? ) = O (l = l, 2, n) i l l - qi mi Ci 9- (3) i Ci+7lnqi Ci Then, m1 = 1n ( C.+7lnq. ) / lnqi (A) i i From (2) & (3) and (A) we get: 7“ -{ R lhI(C +71nq )/|_|1nq } = 0 (5) 0 . i i . i 1 l we can get the solution 70 of (5) and then by (A) we geti ) / lnqi (6) 1.2 Example The system is given as shown in (Figure 10), it consists of three subsystems, and each of m different which is composed of ml, m2, 3 elements parallel. (components in a Solution: 30 a subsystem aSsumed all the same) i . z , . _.__._::::::}”__- li_{“‘_:::}... ..ir-’*-j___ (Fig 10) The failure probabilities of the three kinds elements are assumed to be 0.02, 0.01, 0.03 and their cost are $3000, $4000, $5000 respectively. The best policy is sought for allocating those elements so as to minimize the total cost under the condition that the system reliability must no less than 0.99. q1 = 0.02, q2 = 0.01, q3 = 0.03. CI = 3 a 4, c3 - 5 (unit) find (m1, m2. m3) (l) (2) 31 st. min{3ml + Amz + Sm3} _ mi subject to | | (1 - q. ) 2 0.99 i=1 By (5). 13 + 30372 - 2977 + 94 = O solving the equation yields 70 = ~99.806. Then, by (6), ml = 1.24), 1112 = 1.032, m3 = 1.215 Rounding up, we finally get: m1 = m2 = m3 a 2 system reliability is 2 2 2 [1-(0.02) ][l-(0.01) }[l-(0.03) ] = .9986 > .99 80 (ml, m3) = (2, 2, 2) is the optimal policy for m2, allocating the three different kinds of elements to the system, which yields a total cost of (3 + 4 + 5)*2 = 24 (1000 dollars) to the system. This can be verified as following that reduction any of the elements mi will violate the system reliability constraint : If m2 = 1, system reliability is [l-(O.02)2][l-(0.Ol)1][1-(0.03)2] = 0.988 < 0.99 If m3 = 1, system reliability is I) [l-(0.02)“][1-(0.01)2][1-(0.03)1] = 0.969 < 0.99 On the other hand, suppose we want the total cost be $22 which is less than $24. let m1 = 3, m2 = 2 m3 = 1, 32 Then the system reliability becomes [l—(0.02)3][l-(0.01)2][l-(0.03)1] a 0.9699 < 0.99 which violates the R0 constraint. This has verified that the privous solution for the problem is truly a optimal solution. Section 2 Some Optimal Polices Used for Reliability Maintenance And Failure Diagnosing Without Cost Constraint Based on the results from chapter 1, once we have the disjoint S~0-P form of system failure function (or success function), we can also develop some useful mesures which can be used in determining the optimal polices for system reliabiliy maintenance and repairing diagnosis. In this section two measures that may be useful in network reliability maintainence and failure diagnosis will be discussed [25-26]. 2.1 Maintenance Importance: { MIS t(xi) } For a functioning network system, the primary interest for the system operator and the maintenance engineer is to know which one of the components among the system is more important to the performance of the whole system. In other words, we want to know the optimal policy which gives a best time scheduling (or priorities) for each of the components in a routine maintenance project which will substantially reduce the risk of system break-downs. However, without considering the cost, the main effort is to determine the dependencies of the network reliability on the reliability of a individul element. Therefore it is quite natural that the maintenance importance of the components should be defined as follows: HIS t(Xi) = Pr{ system failed I component Xi failed ) This is the conditional probability of system failure (no path from s to t exists anymore) given that component Xi has failed. As already described in chapter 1 (section 2), HIS t(Xi) is the expectation of the failure function given that xi=0, ie, ~ Y a n = M15 t ii) E[&(x) | xi 0] (2) Similarly, 0(x) = 1 - W(x) so that, E[o(x) | xi=0] = 1 - HIS t(Xi) (3) 2. 2 34 Diagnosing Importance: { DI (Xi) ). For repairing a failed network system, one must first find out which one of the elements is most likely to be the failed one which has caused the break-down of 'the whole system. In other words, in order to fix the system in a lowest cost and shortest time, one must know the optimal priority ordering for cheking those components. Therefore, a so-called Diagnosing Importance DIS t<1 - p > / Mp) s,t a = 0. 549 DIS t(B) = M15,t(B)(l - pb) / W(p) = 0.372(1-0.7)/0.182 = 0.613 DIs,t(C) = MIs,t(C)(1 - pc) / w(p) r 0.183(1-0.8)/0.182 = 0.201 Similarly, DIs,t(D) = MIs,t(D)(1 - pd) / W(p) = 0.0314 . g - T = / DIs,t(E) MIs,t(E)(1 pe) / V(p) 0.0984 DIs,t(H) = MIs,t(H)(l - ph) / 0(p) = 0.609 = . - r = DIs,t(F) MIS,t(F)(l pf) / W\p) 0.201 From the above results: (a) If the system is functioning, the maintenance importances for the components are listed as follows: 37 (ie, list of MIS t(Xi)from largest to smallest) D lst : component A 2nd : component B 3rd : component H 4th : component E 5th : component D 6th : component C or F Obviously, for the functioning system the optimal maintenance scheduling for each of the components should follow the priorities as below: lst maxinumH-‘II 01.)} ie, component A . s,t l 1 After finishing examination and repair of component A the next one will be: maxi mum ( M I Xi)’ given A is ok} which can be i s,t( evaluated as maximum(W(xi=0, xa=l)} = 0.3025 i ie, component B After finishing component B, and so on we finally get 3rd component E 4th component H 5th component D 6th component C or F It can be noticed that this actual optimal maintenance scheduling for the given system is just slightly different from the list of maintenance importances obtained before. If the system has failed, the diagnosis importances for each .of the components are listed below: (ie, list of DIq t(Xi) from largest to smallest) lst : coponent B 2nd : component H 3rd : component A 4th : component C or F 5th : component E 6th : component D From above list, we can decide that component B ( with largest DIS (Xi) ) . should be the one to check first. However, the optimal diagnosis policy for checking out the failed component is more complicated than that of the maintenance cases described above, and which will depend on different conditions of the system. Suppose that on checking component B it was found that it did not fail. Then the next one (or the only one for this particular example) to check is: maximum{ DI i s,t (Xi’ given B is ok) } which is obviously component A . That is, for this particular example if system has failed and first found out B did not failed, then we can immediately decide that component A must be the failed one which has causted the failure of the system. However, for diagnosing the failure of a general network, if we first find out component Xj did not failed, then the next component needs to check will be: ‘maximum{ DIs,t(xi’ given kj is ok) } (1 # j) max:mum{ DIs,t(Xi I xj=l) } maximum { Pr(Xi failed I system failed, Xj 0k) } i Let in denote the event that component Xi failed F denote the event that system failed Then, DI (x. I x.=1) s,t i j Pr( FX | F , xj=l } i s Pr( F ., F , x. = l } x1 s jg Pr( F , x. S J l } fl Pr{ F3 | F ., x. l } P{ F ., x. = 1 } x1 3 xi 3 Pr{ F5 , xj = 1 } Pr( F8 | in, xj = 1 } Pr( in) Pr{ F5 | xj a 1 } M1s t( xi | xj = 1 ) ( 1 - pi) 7 W . = l ( X3 ) W( Xi = 0, xi = l ) ( 1 - pi ) 0(x. = l) J 40 Thus, by repeatedly using formula (5) we can always find a priority list for the purpose of system failure diagnosis. That is: maximum{ DIS i a. L. (Xi’ given Xj is ok)‘} (i # j) W(xi-0,xj=1)(1-Pi) = maximum( } (i ¢ j). i @(x. e l) J 2.4 Comments and More Examples on Failure Diagnosis Policy Since the profit related optimal diagnosis policy for a certain system depends on the actual conditions of the system, it is not easy to determine an optimal diagnosis strategy which yields a minimal expected total cost. The following two examples are employed to show that no simple optimal diagnosis strategy exists for general network systems. Example 1 Consider a two components (A1 and A2) series system given below ( Figure 11): ,4“ 5’, A, t’. W'm Figure 11 Here, p., denote the reliability and diagnosing cost of i ci component Ai respectively. Let F denote the event that 41 system failed and Ai denote the event that component Ai failed. Then, we have ql P(Al I F) = q1 + plq2 q P(A2 I F) n 2 (11 + plq2 Suppose that the system has failed and let the possible diagnosis policies be: Policy 1: check component A1 first Policy 2: check component A2 first Let CP1 and CPQ denote the expected total costs involved in policy 1 and policy 2 respectively. Then we have: c 0 2‘2 CPl _ c1 + + q1 p1‘12 C q l l CP2 — c2 + + q1 p1q2 C q ‘ C q 2 2 l l and, CP1 - CP2 = C1 - c2 + ql +p1Q2 = (clplq2 - C2pqu) / [q1 + plqzl. C1 p2q1 Obviously, if -—+ < , then CP1 < CP2 and C2 pqu policy 1 is better. Otherwise, policy 2 is better. Example 2 For the three component system given below (Figure 12), 5 C.) I r—-———— A . C, I “J“? __lon-{"”“'“}~___fi_, plm_lMlmn Figure 12 q P(A | F) - 1 ; q1 + p19233 . q (q + p q ) Pq2q3 CPl w c1 + q1 + plq2q3 Policy 2: Check component B first. Then, C 9 ,, 1 l CE2 c2 + . q1 + p1q2q3 Policy 3: Check component C first. Then, C q l l r) = 013 C3 + (11 + plq2q3 Let Cm = min( c2, c3 ), and c q :7. C + ...-1;-.. m CPm = min( 0P2, CP3 ) . q1 + p1q2q3 Then, the numerator of CP1 — CPm is: C1p1q2q3 + (C2 + C3>q2q3 ' Cm(ql + p1q2q3)° If Cm = c2, the numerator becomes: C1p1q2q3 '-62q1(1 ‘ qzq3) + C3q2q3' Set c3 = kc2 ( k 2 1) Then, this quantity is: Clp1q2q3 ‘ C2[q1(l - q2q3) - kq2q3] = f(ci’pi’k)' If f < 0 than policy 1 is optimal. 44 Otherwise, policy 2 is optimal. These examples show that we cannot find a general easy algorithm for the failure diagnosis of general networks. [5} [6] REFERENCES Barlow R. E., Fussell J.B., Singpuwalla N.D. (ed) "Reliability and fault tree analysis", SIAM, 1975 Shimon Even, Graph Algorithms, Computer science press, Inc. Rockville, Maryland 20850,]985 K.K. Aggorwal, Suresh Rai, "Reliability evaluation in computer communication network", IEEE Trans. Reliability, R-30, 1981 Nakagawa. H. Baysian, "Decomposition method for computing the reliability of an orented network", IEEE Trans. Reliability, R-25, No.2, 1976 John A. Buzacott, Clement C. Feng, "An algorithm for symbolic .reliability computation with path-sets or cut-sets", IEEE Trans. Reliability, vol R-36, No.1, 1987 R. G. Bennetts, "On the analysis of fault trees", IEEE Trans. Reliability, vol R-24, No.3, 1975 Dean B. Wheeler, "Fault tree analysis using hit manipulation", [10] 0"! ’4 [13} [14] [15] [16] IEEE Trans. Reliability, vol R-26, No.2, 1977 Barlow R. E., Proschan F., Statistical Theory of Reliability and Life Testing, N.Y. Holt, Rinehart and Winston, 1975 Shi Dinghua, "A unified algorithm for computer-aided fault- tree analysis", Scientia Sinica (Series A) vol 27, No.1, 1984 Feng Hsu, Shi Dinghua, "Problems and development of fault tree analysis", Report on 4th National China Symposium of Mathema- tical Theory of Reliability, Shanghai, Nov. 1-6,1983, Acta Automatica Sinica, No.4, 1984 P. Camarda, F. Corsi, "An efficient simple algorithm for fault tree automatic synthesis from the reliability graph", IEEE Trans. Reliability, vol R-27, No.3, 1978 S.A. Lapp, C.J. Powers, "Computer aided synthesis of fault- trees", IEEE Trans. Reliability, vol R-26, 1977 Feng Hsu, Shi Dinghua,”Computer-aided fault tree construction" A selected Paper on 2nd China national conference on reliabi- lity, China science press, vol.2, Hay, 1985 H. Frank, I.T. Frisch, "Analysis and design of survivable networks", IEEE Trans. Communication Technol., vol COM-18, pp501-519, October, 1970 C. Singh, S. Asgarpoor, "Reliability evaluation of flow networks using delta-star transformations", IEEE Trans. Relia- bility, vol R-35, No.4, 1986 Luig, Fratta, Ugo G. Montanari, "A recursive method based on case analysis for computing network terminal reliability", IEEE Trans. Communication, vol COM-26, No.8, 1978 [17] [18] to H I." 127-} {231' [25] Liao Jiongsheng, "A new approach for fault tree analysis", Scientia Sinica (Series A), vol. 25, No.9, 1982 Cargal, J.M., "An alternative fault-tree algebra", IEEE Trans. Reliability, R-29, No.33, 1980 Rosenthal, A., "Reliability and fault tree analysis", SIAM, 1975, ppl35-152 Luigi Fratta, "A boolean algebra method for computing the terminal reliability in a communication network", IEEE Trans. Circuit Theory, vol CT-20, No.3, May, 1973 Frank A. Tillman, “Optimization techniques for system relia- bility with redundancy - a review", IEEE Trans. Reliability, vol R-26, No.3, 1977 Yuji Nakagawa, Kyoichi Nakashima, "A heuristic method for determining optimal reliability allocation”, IEEE Trans. Reliability, vol R-26, No.3, 1977 Robert L. Bulfin, "Optimal allocation of redundant components for large systems", IEEE Trans. Reliability, vol R-34, No.3, 1985 Yoshio Hattori, "Reliability optimization with multiple properties and integer variables", IEEE Trans. Reliability, vol R-28, No.1, 1979 Martin Fox, Dorian Feldman, NOTES ON PROBABILITY, Department of Statistics and Probability, Michigan State University, August, 1986 Frederick S. Hillier, Gerald J. Lieberman, Introduction to Operations Research, Stanford University, January,1980 "'TIIIIIIIIIIIITHIIIEIIHIIIIIIIEs