A MATHEMATICAL ANALYSIS OF THE EFFECTS OF COMPONENT FAILURES ON SYSTEM PERFORMANCE

By Richard Charles Dubes

AN ABSTRACT OF A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Department of Electrical Engineering

1962

ABSTRACT

The requirements of modern technology necessitate extremely large and complex systems. As the number of components increases, so does the chance that the system's operational worth will be degraded because of component failures. This thesis proposes a mathematical model, called the General Model, which analyzes the effects of such failures on a system's operational worth, as measured by reliability and utility functions. This model differs from existing models in that it includes the effects of both drift and catastrophic failures on the performance of redundant systems.

The General Model separates the analysis into a set of drift problems by defining states for a system. A system changes states when a component fails catastrophically. Since a system's drift properties change when a component fails, a drift analysis is made for each state.

A system is classified as either loaded or non-loaded, depending on whether or not the catastrophic failure of a redundant component changes the failure rates of other components. Techniques for finding the probability distribution of the system's states are presented for both cases. The components can fail catastrophically in either one or two modes.

Two algorithms are presented for finding reliability expressions when a system is non-loaded and drift is neglected. These algorithms are applicable to any active redundant system for any component failure distribution. The first algorithm applies when components have one failure mode; the second, when two failure modes are allowed.
A procedure for simplifying the resulting reliability expressions is also presented.

Two types of component drift are discussed. In the first, the amount of drift depends on the length of time the component operates; in the second, the drift is a function of the states the system passes through and how long it remains in each state. Techniques for determining these drift properties and for relating them to system drift properties are provided.

ACKNOWLEDGMENTS

The author wishes to express his gratitude to Dr. H. E. Koenig for his guidance in the preparation of this thesis. He also wishes to thank Dr. J. H. Stapleton, Dr. M. B. Reed, and Dr. G. P. Weeg for the fruitful discussions concerning this thesis.

CONTENTS

ACKNOWLEDGMENTS
LIST OF ILLUSTRATIONS
LIST OF APPENDICES
Chapter
I. INTRODUCTION
   Component Failures
   Classification of Systems
   Neutralizing Failures
II. MATHEMATICAL MODELS OF FAILURE ANALYSES
   Foundations of the General Model
   Existing Models
III. CATASTROPHIC FAILURES
   One Catastrophic Failure Mode
   Two Catastrophic Failure Modes
IV. DRIFT EFFECTS
   The Drift Problem
   Distributions of System Parameters at One Time
   Time Variations
   Conclusions
LIST OF REFERENCES

LIST OF ILLUSTRATIONS

Figure
1. Example of a State Table and Corresponding Reliability Diagram
2. Examples of Reliability Diagrams
3. Reliability Diagrams
4. Example of a State Table, Transition Chart and Reliability Diagram
5. Example of a Reliability Diagram and State Table
6. Markov Transition Chart Corresponding to Fig. 5
7. Partial Reliability Table for Fig. 3A
8. Tabulations of a Simplification Procedure for Fig. 7
9. Example of a State Table and Corresponding Reliability Diagram
10. Example of a Reliability Diagram
11. Example of Step 1 of the Two-Failure Algorithm
12. Example of Step 2 of the Two-Failure Algorithm
13. Histogram of Measured Data
14. Power Supply System and Corresponding State Table and Transition Diagram

LIST OF APPENDICES

Appendix
A. Some Basic Definitions from Probability Theory
B. Continuous Parameter Markov Chains with a Finite Number of States
C. The Application of Markov Chains to the Analysis of Loaded Systems
D. Justification of the Two-Failure Algorithm
E. Derivation of an Approximation to the Mean and Variance of a Function of Random Variables

CHAPTER I

INTRODUCTION

The technology that sends man into space and propels missiles from continent to continent has an insatiable appetite for more complex and more sophisticated systems.^1 National security necessitates the gains in operational worth that result from larger systems. However, these gains are offset by an increase in the chance of system failure, a consequence of using more components. The need for more reliable systems requires no further elaboration.

Any scheme that is directed at decreasing the chance of system failure is, by common usage, termed a reliability program.^2
An interesting history of the efforts in this direction has been presented by Ryerson (6). As discussed by Chorafas (7, Chapter #0), a successful reliability program begins with specifications and proceeds through design, development, manufacture, and field testing. Component failure data is fed back from each stage in a continual re-design cycle.

^1 For example, Koenig (1) has stated that the SAGE continental defense system contains over 1,252,000 individual tubes, diodes, resistors, capacitors, inductors, and printed circuits. Perry (2) reports that the NIKE-HERCULES missile system consists of about 1,500,000 individual parts.

^2 A wide variety of reliability programs have been proposed in the literature, e.g., by Saltz (3), Greene (4), and Dreste (5).

A most important step in any reliability program is the failure analysis which predicts, on the basis of component failure data,^1 whether the system meets its reliability objectives. This thesis presents a mathematical model for accomplishing such a failure analysis analytically.

^1 The precision of any prediction of system reliability rests on the amount and accuracy of component failure data. Statistical sampling plans for obtaining this data will not be presented in this thesis. An excellent exposition of such plans has been given by Epstein (8).

For purposes of a failure analysis, a system is presumed to have been optimized on the basis of criteria prescribed by specifications, such as the mean-squared error for a control system or the signal-to-noise ratio for a transmitter. Suitable measures of these performance criteria are assumed to be defined so that a system's operational worth can be judged quantitatively. Sufficient data on the failure tendencies of the components are also assumed to be available. The failure analysis presented in this thesis uses these data, along with information on component connections, to determine the measures of operational worth.

Component failures

Component failures are ordinarily separated into two classes (e.g., see Feyerherm (9)) called catastrophic, or chance, and drift, or degradation, failures. The distinguishing features of these classes are discussed in this section.

Connor (10) describes catastrophic failure as follows.

"Chance (catastrophic) failure will include those physical occurrences (without a central tendency in time) which are beyond the engineering resolution of normal techniques for product design, manufacture and operation, and cause cessation of performance requiring corrective maintenance."

When a component fails catastrophically, its characteristics change suddenly and a discontinuity in operation occurs, rendering the component operationally useless. The term "failure" usually refers to catastrophic failure, which is the only type of failure considered in the majority of books and papers on reliability.

A component's chance of failing catastrophically is determined by two factors: the component's inherent strength and the environment. The former is built into the component by the manufacturing process. Uncontrollable variations in raw materials and fabrication techniques cause variations in the susceptibility of presumably identical components to failure. This inherent strength can change with time as the component deteriorates with age.

The environment is the totality of all random and deterministic factors in the operating medium which can cause changes in the characteristics of a component. If failure data is to be used to predict failure tendencies, it must be taken under the same environment that will be experienced in actual use.

In contrast to the spontaneity of catastrophic failure, drift is a gradual transition to less desirable component properties.
The point at which a component fails because of drift depends on the sensitivity of system performance to that component's properties, on the drift characteristics of other components, and on the amount of drift in system performance that can be tolerated.

The drift failure of a system, which is determined in terms of prescribed tolerance limits, can be caused by a relatively slight drift in a group of components in the same direction or by extreme drift in a single component. Also, two components might drift in opposite directions so as to compensate for one another. For these reasons, the drift failure of a component cannot be uniquely related to that for a system. Deleterious drift in one situation can be tolerable in another.

Classification of systems

The mathematical techniques which are applied to a failure analysis depend to a great extent on the manner in which the system is used. Systems are classified, by usage, as intermittent or continuous and as repairable or non-repairable.

A continuous system is one which is turned on and allowed to operate until it fails and in which components can fail catastrophically at any time. That is, the failure-causing environment is present whenever the system is operating. Examples of continuous systems are airplanes and missiles during flight, the control systems driving a scanning radar antenna, and the generators supplying a power distribution system.

An intermittent system, on the other hand, is one whose periods of use and non-use alternate, usually in a random fashion. Such systems are often subsystems which operate at the command of a larger system. Examples are the braking system in an automobile and the arithmetic units in a digital computer. An excellent failure analysis of intermittent systems has been presented by Flehinger (11).

In a repairable system, components are replaced when they fail or according to a prescribed maintenance policy.
The time needed to locate, replace, and repair the failed component is included in the failure study. In a non-repairable system, each component operates until it or the system fails; both component and system failures are permanent.

In this thesis, all systems are assumed to be continuous and non-repairable.

Neutralizing failures

The severity of component drift can often be reduced by proper design, especially in electronic circuits. The two standard design procedures for effecting this reduction are called the "worst case" method, discussed by Ashcraft and Hochwald (12) and Suran (13), and "marginal checking", examined by Chorafas (7, p. 370), Patterson (14), Brown and Dennis (15), and Drenick (16). In both methods, those component parameters to which the system is most sensitive are set so as to affect system performance in the worst possible manner. The system is designed to operate satisfactorily under these hostile conditions. However, as pointed out by Hellerman and Racite (17), these methods have two serious shortcomings. First, the probability that all critical component characteristics will simultaneously attain limiting values in actual operation is extremely small, so that the components have unnecessarily tight tolerances; second, they provide no measure of relative reliability, so that comparative costs for different situations cannot be weighed against increases in operational worth.

A non-redundant system is one in which the catastrophic failure of any component causes system failure; i.e., every component is essential to proper operation. The effects of catastrophic failure in a non-redundant system can be reduced in either of two ways: (1) by using more reliable components, (2) by using redundancy. With systems containing a large number of components, the first is often ineffective. However, Davis (18) cites a case when redundancy is impractical because of space, weight, and cost considerations.
On the other hand, Luebbert (19) contends that redundancy is the only realistic approach to the failure problem.

Redundant components are components which are extraneous if all other components are working properly, but which can replace catastrophically failed components without interrupting the operation of a system. Three types of redundancy are distinguished, called active, standby, and majority redundancy.

If a redundant component is used continuously from the time the system is energized until failure, the system contains active redundancy. If a failed component is automatically replaced by a new component, using a detection and switching apparatus, the system contains standby redundancy. Flehinger (20) has presented a very thorough analysis of standby redundant systems.

These two types of redundancy are not both applicable to all situations. For instance, the total resistance of a set of paralleled resistors changes when the separate resistors fail. However, two electrical generators can form an active redundant system, if they are paralleled and both have the proper loading capacity. Alternatively, they can be used in standby, if a detection and switching mechanism is available. Both active and standby redundancy are used with continuous systems, whether repairable or non-repairable.

Majority, or voting, redundancy was introduced by Von Neumann (21) and extended by Depian and Grisamore (22). The primary application of majority redundancy is to gate circuits in digital computers, which are intermittent systems. A single gate is replaced by a set of identical gates whose binary outputs are polled by a majority device. The output of this device is designed to agree with the majority of inputs.

One of the few design procedures for constructing a redundant system was presented by Moore and Shannon (23, 24) in their classical study of "crummy" relays. Active redundancy is used, but the systems are intermittent.
Only active redundant systems are treated in this thesis, although the failure analysis can easily be extended to standby redundant systems.

CHAPTER II

MATHEMATICAL MODELS OF FAILURE ANALYSES

Every analytical paper treating the influence of component failure on system performance introduces, at least implicitly, a reference frame, or set of "ground rules", which acts as a model and upon which the paper's computations are based. In the great majority of such models, failure types, distributions, or component interconnections are specified, and are used in the computation of numerical figures of merit. A comprehensive mathematical model which contains these detailed models as special cases is outlined in the first section.

This broadly based mathematical model is termed the General Model. It embraces many types of failure analyses yet permits direct computation of figures of merit. With this model, three features are incorporated into figures of merit which are not apparent in any model available in the literature. First, both the drift and catastrophic failure tendencies of a system are combined; second, the drift characteristics of components in redundant systems are included; third, the changes in the failure-causing environment of a component caused by the failure of other components are taken into account.

The salient aspects of three important models presented in the literature and their relationships to the General Model are presented in the second section.

Foundations of the General Model

The General Model contains a set of definitions which provides the framework for statistically describing the components of a system, for relating these descriptions to system behavior, and for evaluating corresponding figures of merit. Definitions of the mathematical terms used to construct the General Model are given in the first part of this section. In the second, some measures of operational effectiveness are defined.
The procedural technique for using the General Model to carry out a failure analysis is presented in the third part of this section.

Definitions

Part parameter

For the purpose of analysis, a system can be broken into units, usually called subsystems and components. The most basic units which can be defined are called irreducible multiterminal components,^1 each of which is completely characterized mathematically by a terminal representation. Any coefficient in the corresponding terminal equation is called a part parameter if it satisfies the following two conditions.

i) It is a random variable with a non-trivial distribution.
ii) It is not a function of other random variables.

Any through or across variable which satisfies these two conditions is also termed a part parameter.

^1 This notation has been introduced by Koenig and Blackwell (25, p. 109).

The set of part parameters defined for a system must exhibit all the inherent statistical characteristics of the system's building blocks. Each part parameter is represented by a stochastic process,^1 written $(B_i(t),\ t \in T_i)$, where $T_i$ is a linear point set representing usage times. The values of $B_i(t)$, for any $t$, are separated into two mutually exclusive sets, labelled $C_i(t)$ and $D_i(t)$.

The set $C_i(t)$ is assumed to contain at most two points, $c_{i1}$ and $c_{i2}$, which are extreme values of $B_i(t)$, both of which connote complete breakdown or catastrophic failure. A part parameter is said to have failed at time $t$ if $B_i(t) \in C_i(t)$. Since many part parameters may be associated with a component, the relationship between failure in the real sense and a part parameter assuming an extreme value is often nebulous. Thus, a physical interpretation of part parameter failure will not be made. Failure is permanent, so that if $B_i(t) = c_{ij}$, then $B_i(\tau) = c_{ij}$ for all $\tau > t$.

The set $D_i(t)$ contains all possible drift values. If $B_i(t) \in D_i(t)$, the part parameter may or may not be operating properly, depending on tolerance limits.
The set of all part parameters is represented by the vector $\mathbf{B}(t) = (B_1(t), \ldots, B_k(t))$.

^1 The definition of a stochastic process and the notation used are given in Appendix A.

Parameter state variable

A convenient format for expressing the distribution of $B_i(t)$ is afforded by an auxiliary random variable $X_i(t) = X(B_i(t))$, called the parameter state variable, or the state variable for the part parameter $B_i(t)$. The range of this random variable is defined in Eq. (1).

\[
X_i(t) =
\begin{cases}
0 & \text{if } B_i(t) \in D_i(t) \\
1 & \text{if } B_i(t) = c_{i1} \\
2 & \text{if } B_i(t) = c_{i2}
\end{cases}
\tag{1}
\]

The numerical values assigned are, of course, entirely arbitrary. The distribution of $B_i(t)$ can then be prescribed by the cumulative distribution function $F_{B_i}(b;t)$ in Eq. (2).

\[
F_{B_i}(b;t) = \sum_{x=0}^{2} F_{B_i/X_i}(b/x;t)\, p_{X_i}(x;t)
\tag{2}
\]

In Eq. (2), $F_{B_i/X_i}(b/x;t) = P(B_i(t) \le b \,/\, X_i(t) = x)$ and $p_{X_i}(x;t)$ is the density function of $X_i(t)$.

The convenience of this notation becomes apparent when the individual terms are considered. $F_{B_i/X_i}(b/0;t)$ represents the drift characteristics of the part parameter. $F_{B_i/X_i}(b/1;t)$ and $F_{B_i/X_i}(b/2;t)$ are step functions, each with a saltus of one. The density factor expresses the relative probability of catastrophic and drift values.

System state variable

If the drift characteristics of a redundant system are to be studied at a certain time, both the part parameters which have not failed and their connections must be known. This information is supplied by an auxiliary random variable, called the system state variable, denoted by $X(t)$. Depending on the order in which the part parameters fail and the type of redundancy, different drift situations can arise. A one-to-one correspondence is established between the values of the system state variable and all possible drift situations, or states. Specific values for the system state variable are assigned in later sections when active redundancy is considered. In the following discussion, the system state variable is assumed to have distinct values $x_1, \ldots, x_m$.
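As an illustration of the mixture in Eq. (2), the following Python sketch evaluates the cumulative distribution function of a single part parameter at one fixed time. Everything numerical here is an assumption made for the example and does not come from the thesis: the drift distribution is taken to be normal, the two catastrophic values `c1` and `c2` are hypothetical (they could stand for a shorted and an open resistor), and the state probabilities are simply supplied by the caller.

```python
import math


def drift_cdf(b, mean=100.0, std=5.0):
    """Assumed drift distribution F(b/0;t): a normal CDF (hypothetical choice)."""
    return 0.5 * (1.0 + math.erf((b - mean) / (std * math.sqrt(2.0))))


def step_cdf(b, c):
    """CDF of a value fixed at c: a step function with a saltus of one at c."""
    return 1.0 if b >= c else 0.0


def part_parameter_cdf(b, p_state, c1=0.0, c2=1.0e6):
    """Eq. (2): sum over x = 0, 1, 2 of F(b/x;t) * p(x;t) at one fixed time t.

    p_state = (p0, p1, p2) holds the probabilities of the drift state, of
    failure at c1, and of failure at c2. c1 and c2 are hypothetical values.
    """
    p0, p1, p2 = p_state
    if abs(p0 + p1 + p2 - 1.0) > 1e-12:
        raise ValueError("state probabilities must sum to one")
    return p0 * drift_cdf(b) + p1 * step_cdf(b, c1) + p2 * step_cdf(b, c2)
```

Calling `part_parameter_cdf(100.0, (0.9, 0.06, 0.04))` weights the drift term by 0.9 and adds the full step at `c1`, which shows how the catastrophic values contribute jumps to an otherwise continuous distribution.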
System parameters

A system parameter is any function of part parameters used to make a quantitative judgement on the excellence of performance. Each system parameter is represented by a stochastic process, $(A_i(t),\ t \in T)$. The set of all system parameters is denoted by the vector $\mathbf{A}(t) = (A_1(t), \ldots, A_n(t))$. The relationship between the $i$th system parameter, the set of part parameters $\mathbf{B}(t)$, and the system state variable $X(t)$ is given in functional notation by Eq. (3).

\[
A_i(t) = h_i(\mathbf{B}(t), X(t); t) = h_{ij}(\mathbf{B}(t)) \quad \text{for all } t \text{ such that } X(t) = x_j
\tag{3}
\]

The form of Eq. (3) indicates that each system parameter can become a function of different part parameters when the system changes its state; i.e., when part parameters fail.

The (continuous) density function for the joint distribution of the system parameters is expressed by Eq. (4).

\[
f_{\mathbf{A}}(\mathbf{a};t) = \sum_{j=1}^{m} f_{\mathbf{A}/X}(\mathbf{a}/x_j;t)\, p_X(x_j;t)
\tag{4}
\]

The cumulative distribution function corresponding to Eq. (4) is denoted by $F_{\mathbf{A}}(\mathbf{a};t)$.

If a large system is divided into a set of components, the contribution of each component to over-all operation is expressed by a set of component parameters. They play the same role for the components as the system parameters play for the system. The system parameters for the large system are then expressed in terms of these component parameters. If some of the components are labelled as subsystems, their parameters are called subsystem parameters.

Value functions

All system characteristics to be evaluated quantitatively are contained in the system parameters. The means for computing relative numerical values for these characteristics are provided by value functions. These functions are the figures of merit by which the system's operational effectiveness is judged, using tolerance sets as criteria.
A tolerance set, denoted by $S_i(t)$ for the $i$th system parameter, is that set of values of system parameter $A_i(t)$ such that the statement "$A_i(t) \in S_i(t)$ at time $t$" means that the portion of system performance judged by $A_i(t)$ is completely satisfactory at time $t$. The tolerance sets are defined in terms of design objectives and limits imposed by specifications.

The possible value functions are many and varied. This thesis is concerned with two of these, called reliability and utility, which are defined below.

Reliability

The standard definition of the reliability of a system has been given by many authors (e.g., Bazovsky (26, p. 11)). "Reliability is the probability of a device performing its purpose adequately for the period of time intended under the operating conditions encountered." Adequate performance is taken to mean performance within the tolerance sets. Then, the reliability, $R(t_0,t_1)$, of a system for the time interval $(t_0,t_1)$, $t_0 \le t_1$, is defined by Eq. (5).

\[
R(t_0,t_1) = P\big(A_i(\tau) \in S_i(\tau),\ \text{all } t_0 \le \tau \le t_1 \text{ and all } i\big)
\tag{5}
\]

Utility

The drift of a system parameter just outside its tolerance set might be much less serious than an extreme value. This effect is included by defining a utility function, written $u_i(A_i;t)$, for each system parameter. The range of each parameter utility function is arbitrarily taken as $[0,1]$ with the maximum value shown below.

\[
u_i(A_i;t) = 1 \quad \text{if } A_i(t) \in S_i(t)
\]

This assigns maximum utility when the system is operating as well as can be expected, as far as the $i$th system parameter is concerned. When $A_i(t)$ assumes an extreme value, $u_i$ might be taken as zero. The values of $u_i$ for system parameter values between these limits might be taken as a non-increasing function of $A_i(t)$.

Some system parameters might reflect more essential properties than others. This is taken into account with a weighted sum of parameter utility functions, called the system utility function $U(\mathbf{A};t)$, defined in Eq. (6).
\[
U(\mathbf{A};t) = \sum_{i=1}^{n} w_i(t)\, u_i(A_i;t)
\tag{6}
\]

The following conditions are assumed to be satisfied in Eq. (6).

\[
\sum_{i=1}^{n} w_i(t) = 1 \quad \text{and} \quad w_i(t) \ge 0, \quad \text{any } t \text{ and } i = 1, \ldots, n.
\]

The weights are made functions of time so that the relative importance of the various parameter utility functions can be changed in time. The system utility function has range $[0,1]$.

Measures of the distributions of the parameter utility function and of the system utility function, such as the mean and variance, serve as numerical values of utility.

Failure analysis using the General Model

A simple illustration of a multi-loop control system will show how the operation of a system is viewed in terms of the General Model. Both the forward and feedback loops might contain redundant components.

A set of system parameters, such as response time and gain margin, is defined to measure the operational effectiveness of the system. These parameters are functions of component characteristics such as plant constants, amplifier gains, and gear friction. All such characteristics which are assumed to be random variables are designated as part parameters.

The system is turned on at time zero. The part parameters drift from their initial distribution as the system operates, causing changes in the distributions of the system parameters. When a part parameter fails, the system changes states. The effect of this failure could be minor, if proper redundancy is available, or it could remove a feedback loop. This, in turn, could simply degrade system performance or could cause instability and system failure, depending on the type of failure and number of feedback loops. The amount by which the system's operational worth is decreased depends on the tolerance sets assigned to the system parameters and on the choice of value functions.

By a failure analysis is meant the determination of value functions whose numerical values depend on usage time and on the distributions of part parameters.
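The weighted sum of Eq. (6) can be sketched in a few lines of Python. The sketch rests on assumptions that are not in the thesis: each tolerance set is taken to be an interval, and the parameter utility is assumed to fall off linearly to zero over a hypothetical distance `falloff` outside that interval. The thesis only requires that the utility be one inside the tolerance set and suggests zero at extreme values.

```python
def parameter_utility(a, lower, upper, falloff):
    """Hypothetical u_i: 1 inside the tolerance set [lower, upper],
    decreasing linearly to 0 at a distance `falloff` outside it."""
    if lower <= a <= upper:
        return 1.0
    distance = (lower - a) if a < lower else (a - upper)
    return max(0.0, 1.0 - distance / falloff)


def system_utility(values, tolerances, weights):
    """Eq. (6): U = sum over i of w_i * u_i(A_i), evaluated at one fixed time.

    values     -- observed system parameter values A_i
    tolerances -- one (lower, upper, falloff) triple per system parameter
    weights    -- non-negative weights w_i that sum to one
    """
    if any(w < 0.0 for w in weights) or abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must be non-negative and sum to one")
    return sum(
        w * parameter_utility(a, lo, hi, fall)
        for w, a, (lo, hi, fall) in zip(weights, values, tolerances)
    )
```

For instance, with two illustrative parameters (say a response time inside its tolerance interval and a gain margin somewhat outside its own), the second term contributes only a fraction of its weight, so the system utility drops below one without collapsing to zero.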
The five steps below outline such a failure analysis on the basis of the General Model.

1. Select the part and system parameters. Because of analytical complexity, all component characteristics may not be designated as part parameters. The choice depends on the inherent variability in the component characteristics and on the sensitivity of system performance to these variations. The set of system parameters must express all facets of system operation which are to be examined, as dictated by design objectives.

2. Determine the distribution of the system state variable. Both the range and distribution of the system state variable must be found. This step is the main topic of Chapter III.

3. Perform a drift analysis for each state. The joint distribution of the system parameters for each state of the system is found. Since the system changes states when a part parameter fails, the forms of the system parameters change, as shown in Eq. (3). Chapter IV covers the execution of this step.

4. Combine the drift and catastrophic failure effects. The results of Steps 2 and 3 are combined, using Eq. (4), to find the joint distribution of the system parameters.

5. Determine the value functions. The joint distribution derived in Step 4 is used to form the reliability and utility functions, both of which depend on usage time and part parameter distributions.

In summary, the cornerstone of a failure analysis using the General Model is the concept of separating the analysis into a set of problems, each of which can be solved independently. The system state variable provides the means both for this separation and for the recombination of these problems into the over-all analysis, and it applies the proper weight to each result. That is, drift failures in more probable situations are weighted more heavily than those in less probable situations. The catastrophic failure tendencies of the part parameters determine the distribution of the system state variable.
Thus, both types of failures are packaged together so that operational effectiveness can be judged on the basis of all pertinent factors.

Although the five basic steps are used in all failure analyses, their execution can take on many forms. The entire analysis may be accomplished analytically or a computer-simulated sampling plan can be set up. The computational details depend on the interaction between part parameters, the types of failure distributions, and the forms of the system parameters. Also, the analysis might be simplified into finding certain measures of the distributions of system parameters.

It should be emphasized that the General Model is not limited to reliability and utility value functions, but provides the foundation for calculating any analytical value function.

Existing models

In this section, the salient features of three mathematical models from the literature having wide applications are presented, along with their relation to the General Model. However, before these three models are discussed, some other work on model building deserves mention.

Two papers by Rosenblatt (27, 28) present, somewhat qualitatively, some features which are desirable in a model. A series of simple models is given along with methods for expressing physical events in probability language.

An early approach to an over-all failure study was presented by Benner and Meredith (29). The assumptions made include normally distributed performance parameters, series systems, and independently failing components. This frequently-referenced paper presented ideas which were quite novel at the time of its publication.

A theoretical exposition of the factors causing component failure has been given by Birnbaum (30). The physical condition of a structure is considered as a stochastic process whose properties depend on environment, usage, and observed samples. However, the model is not directly applicable to the calculation of value functions.
Many facets of a failure analysis are discussed in a paper by Drenick (16). Most of the techniques presented are applications of renewal theory. However, cost minimization procedures are given for the marginal checking technique. Several figures of merit are used. A rather quantitative method for handling drift is considered. This paper is more a collection of techniques than a comprehensive working model.

Renewal models

A brief outline of the aims of renewal theory applicable to failure analyses is presented below. The following assumptions are made.

1. All system components must be operating properly for the system to be operating properly.
2. Removal of a system component is caused either by failure during operation or by a predetermined maintenance policy.
3. Upon removal, a component is immediately replaced by a new and statistically identical one.

According to Drenick (31), the system may be viewed as a number of sockets into each of which a component is inserted. At each socket, a renewal process develops, i.e., an unending sequence of removals and replacements. Corresponding to each socket, there is a parent population from which a component is drawn each time a replacement is required. The over-all failure pattern for the system is then a superposition of many such renewal processes. The derivation of this failure pattern's properties is the main objective of a renewal study.

Two functions, called the renewal rate and the expected number of replacements, relate the renewal process at each socket and the system removal pattern. These functions are found from the solution to differential equations and depend on the parameters of the component failure distributions.

The above assumptions imply that a component failure can be detected, located, and corrected in a negligible period of time; i.e., the system is always operating.
Implied here are the assumptions that system failure can be traced to one particular component and that replacement of that component restores the system to proper working order. Failure caused by component drift is not allowed, so that this theory is directly applicable only to catastrophic failures.

The case when all components have the same lifetime distributions has been studied in some detail by Flehinger and Lewis (32). One interpretation of this is that all system components have the same parent population. Detailed solutions to the resulting renewal equation, using mean-time-to-failure and survival probability as figures of merit, are also presented for various hazard functions.

The system failure pattern when the renewal processes for all the sockets are not identical has been studied by Drenick (31) in an approach motivated by the central limit theorem. An asymptotic expression for survival probability of the form exp(-t) is derived under certain conditions. An interesting feature of this approach is that the asymptotic expression becomes more accurate as the number of sockets increases. A restrictive assumption is that of independence of the renewal processes for the various sockets.

It is stretching a point to say that the renewal model is a special case of the General Model. If each part parameter is allowed only two values, and if the three assumptions are applied, a study of the General Model implies a study of the renewal model. However, both the mathematical techniques used and the aims of the investigations are different. Although some features of the renewal model at the component level will be incorporated into the General Model, the viewpoint taken is that the two models cover different areas of study and should not be combined.

Markov chain models

If drift is neglected and if all component failure rates are constant, the theory of stationary Markov chains¹ can be applied directly to a failure study.
A most interesting analytical model using this approach has been presented by Barlow and Hunter in two papers (33, 34). The first paper (33) states the mathematical and probabilistic basis of the model. The second (34) is a special case which covers a type of optimal redundancy for series-parallel systems along with maintenance and checking policies. The optimal redundancy study has also been published in a separate paper (35). The main features of the Barlow-Hunter model are discussed below.

¹An outline of the properties of continuous parameter Markov chains is presented in Appendix B.

The essential description of a system is given in terms of a stochastic process which describes the system's state. The number of states is determined by the number of components, their connections, and the number of states each component can assume. In all examples presented, each system component can, at any time, be in one of two or three states. One corresponds to "in use"; the others, to "in repair", where both open and short circuit types of failures are allowed. Time-to-failure and time-to-repair are both assigned exponential distributions so that the well-developed theory of stationary Markov chains can be used. Independence of component failures need not be assumed. This technique is particularly applicable to repairable systems containing active redundancy in which drift is neglected.

Two figures of merit are used, called reliability and efficiency. Both are defined in terms of a gain function g[X(t)] whose domain is the state space of the stochastic process and whose range is contained in the real line. Speaking qualitatively, the value of the gain function should be relatively large for favorable states of the system and relatively small for unfavorable states. The following definition of reliability is used by Barlow and Hunter.
    R(t) = E[g(X(t))] = \int g[X(t,\omega)] \, dP(\omega)

A particularly useful gain function is expressed in terms of a class A of favorable states of the system.

    G[X(t)] = 1  if X(t) \in A
            = 0  otherwise

With the gain function G, the R(t) function becomes the probability that the system is in a favorable state at time t.

    R(t) = \int_{A} dP(\omega)

The second figure of merit, called the efficiency and denoted by Eff, is defined below.

    Eff = \int \int g[X(t,\omega)] \, dP(\omega) \, dF(t)

In this expression, F(t) is the distribution function for a time-based distribution which could reflect the presence of an environmental threat in time. If this distribution is continuous and uniform on [0,T], Eff is the time average of R(t) over that interval.

Two observations should be made concerning the definitions of reliability used by Barlow and Hunter.

1. Since reliability is here defined as the expected value of a random variable, it is not, in general, a probability. No restrictions have been placed on the range of g[X(t)], so its value is not limited to the unit interval, or even to positive values. If the gain function G given above is always used, this objection is eliminated.
2. Even with the special gain function, the R(t) function is not reliability in the usually-accepted sense, since it is not a function of a time interval. The objection here is not to the usefulness of the figure of merit, but to its name.

The Barlow-Hunter Model is a special case of the General Model. Assuming that each component can be characterized by a single part parameter, the two models are related by properly interpreting the two notations. If drift is neglected, the parameter state variables completely describe the failure tendencies of the part parameters. Thus, the parameter state variable and the state variable for each component used by Barlow and Hunter are synonymous. The three values assigned to the parameter state variable in Eq. (1) correspond to the allowable states for each component in the Barlow-Hunter Model.
The states of the system are the same in both models. The gain function and the utility function serve the same end. The R(t) function and the efficiency are then measures of utility.

Another approach using Markov chains has been presented by Rohn (36). Standby, rather than active, redundancy is the main concern of this paper. The figure of merit used is called "fractional up-time", which is the fraction of time the system is operable during a time interval. Drift failure is neglected and only one mode of catastrophic failure is allowed. Variations in the operational worth of the various standby components are included. Both time-to-failure and time-to-repair are assigned exponential distributions. Ashar (37) has also studied this problem using Markov chains, but from a more general viewpoint.

Roberson's model

One of the few models in which both catastrophic and drift failures are included in a unified analysis has been presented by Roberson (38). The essential features of this model are discussed below.

System performance is measured by a vector E of scalar quantities representing variables such as voltage, pressure, and counter readings. Ideal, or perfect, system performance is defined by a value of E, denoted by E*. System error, e, is defined as the norm of (E - E*) and measures the system's operational worth in terms of a value function which is a non-increasing function of system error. The value function to which most of Roberson's paper is devoted is called "p-error", denoted by e_p, and is defined as the solution to the following equation.

    P(e \le e_p) = p,   (0 \le p \le 1)

The distribution of the system error is defined in terms of the zero-failure probability R, the inherent accuracy, A(e'), and the failed accuracy, F(e').

    R     = P(no component failure)
    A(e') = P(e \le e', if no component failure)
    F(e') = P(e \le e', if one or more component failures)

The zero-failure probability R equates system failure to failure of any component.
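Because the error distribution is a mixture of the two conditional distributions, P(e ≤ e') = R A(e') + (1-R) F(e'), the p-error can be found numerically once A and F are known. The following is a minimal sketch in Python, not Roberson's procedure: the bisection solver, the function names, and the two exponential accuracy functions are all illustrative assumptions.

```python
import math

def p_error(p, R, A, F, tol=1e-10):
    """Solve p = R*A(e) + (1-R)*F(e) for the error level e by bisection.

    A and F are assumed to be continuous, non-decreasing distribution
    functions of a non-negative system error.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")

    def mixture(e):
        return R * A(e) + (1.0 - R) * F(e)

    lo, hi = 0.0, 1.0
    while mixture(hi) < p:      # widen the bracket until it contains e_p
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixture(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical accuracy functions (assumed for illustration only).
def inherent(e):
    return 1.0 - math.exp(-e / 0.5)   # error when no component has failed

def failed(e):
    return 1.0 - math.exp(-e / 5.0)   # error after one or more failures

R = 0.9
e_90 = p_error(0.90, R, inherent, failed)
print(round(e_90, 4))
```

Bisection is used only because the mixture is monotone in e; any root finder would serve equally well.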
The parameters of A are called accuracy parameters and describe the drift properties of the system when all components are operating properly. Different degrees of operational worth are inserted for varying degrees of system failure by means of the parameters of F, called failure severity parameters. The p-error satisfies the following equation.

    p = R A(e_p) + (1-R) F(e_p)

The value of p establishes a performance level. If a value of p is specified, the value of e_p can be found from the above equation. Thus, the relationship between probable system error and the system performance level can be derived. Roberson discusses in some detail the existence of and bounds on solutions for e_p.

The main limitation on this model is that it is applicable only to series systems. This is implied by the zero-failure probability R, which includes all catastrophic tendencies. This measure also allows for only one mode of catastrophic failure.

Roberson's model is a special case of the General Model. The vector E corresponds to the part parameters; system error e serves as the system parameter. If norm (E - E*) is identically zero when E = E*, the tolerance set on e is the single point zero. Using a single set of values as ideal performance provides a direct relationship between the tolerance set on the system parameter and limits on the part parameters. However, it seems more desirable to speak of a range of values as satisfactory performance. The distribution constants for the distributions of the part and system parameters serve the same purpose as the accuracy and failure severity parameters used by Roberson. The p-error and the utility function measure the same characteristics of system behavior.

CHAPTER III

CATASTROPHIC FAILURES

Methods for determining the range and distribution of system state variables for systems containing active redundancy are presented in this chapter.
As a by-product of this study, techniques for finding reliability expressions for active redundant systems in which drift is neglected are derived. Two results are presented which have not been given in the literature. First, an algorithm for deriving reliability expressions for non-loaded systems is given which is independent of failure distributions and component connections. When properly interpreted, this algorithm can be used when the part parameters have one or two failure modes. Second, a method for determining the reliability of any loaded system composed of exponentially failing part parameters is described.

In the first section, systems in which each part parameter has one catastrophic failure mode are discussed; in the second, two failure modes are allowed.

One catastrophic failure mode

As was discussed previously, the values of the part parameter B_i(t) are segregated into two sets, D_i(t) and C_i(t), which are interpreted as drift and catastrophic values, respectively. In this section, C_i(t) is assumed to be a one-point set c_i. The value c_i is either of the two extreme values allowed for B_i(t). For notational convenience, these two extreme values are referred to as the open and short modes of failure.

Range of the system state variable

Before the states of a system, or values of the system state variable, can be enumerated, the concept of catastrophic system failure must be defined. A system has failed (catastrophically) when it is considered to be operationally useless. This state could correspond to a system parameter value far outside its tolerance set or to an extreme value of a value function. When in this state, the system is shut down; no further part parameter failures can occur. A physical interpretation of system failure can be made only on a particular system.

The states of a system are designated in terms of the parameter state variables. A numbering system for accomplishing this is presented below.
The base r representation of any (base 10) non-negative integer N_10 can be written as M_k M_{k-1} --- M_1 if Eq. (1) is satisfied.

    (1)  N_{10} = \sum_{i=1}^{k} M_i r^{i-1},   0 \le M_i \le r-1

In particular, if r = 2 and if Eq. (2) is satisfied, then M_k M_{k-1} --- M_1 is the binary representation of N_10.

    (2)  N_{10} = \sum_{i=1}^{k} M_i 2^{i-1},   M_i = 0 or 1

The range of a parameter state variable X_i(t) was defined in Eq. (II.1). In this section, C_i1 = c_i and C_i2 is a set of measure zero. Assuming that the system is turned on at t = 0, at which time no part parameter has failed, state numbers are assigned by the following two rules.

1. If, at time t, the system has not failed, let M_i = X_i(t). Let the number N_10 obtained from Eq. (2) be the state number at t.
2. Consider all numbers formed as in Rule 1 for which the system is failed. Let the minimum of these be the state number corresponding to system failure.

The binary representation of each state number will be written as a k-digit number, assuming the system has k part parameters. If a system is in a non-failed state at time t, such a representation exhibits the values of the parameter state variables at t; the left-most binary digit, or bit, is the value of X_k(t), the second from the left, X_{k-1}(t), and so forth, so that the right-most bit is the value of X_1(t).

Two questions must be answered before the two rules can be used to enumerate the states of a particular system: (1) which sets of part parameters can fail before the system fails; (2) which sets of part parameters can be in the failed state at the same time? This information can be presented in either of two forms: by a state table or by a reliability diagram.

The state table exhibits all sets of failed part parameters and the corresponding state numbers in tabular form. A state table has k+1 column headings, one for each parameter state variable and one for the system state variable.
Although the headings can be listed in arbitrary order, it is convenient to write them in the order X_k(t) X_{k-1}(t) --- X_1(t) X(t). The numbers 0 to 2^k - 1 are listed in binary form under the first k headings and the state numbers are determined by the two rules. If the system is operational for the values in a particular row, the state number of the system is the base 10 equivalent of that row's binary number. All rows which represent impossible states are omitted. An example of a state table and corresponding reliability diagram is given in Fig. 1, in which all failures are assumed to be of the open variety.

The state table can be generated in the following manner. Consider any set of values for the parameter state variables. If X_i(t) = 0, let B_i(t) be its nominal value; if X_i(t) = 1, let B_i(t) = c_i. For each possible set of part parameter values, determine if the system has failed. Then, apply the two rules for finding state numbers. This is repeated until the state table is completed. A set of values can be ignored if the corresponding event has measure zero; i.e., the failure of one part parameter might effectively remove other part parameters from the system by shutting down some subsystem. This process can be programmed on a digital computer.

    X_4(t)  X_3(t)  X_2(t)  X_1(t)  X(t)
      0       0       0       0      0
      0       0       0       1      1
      0       0       1       0      2
      0       0       1       1      3
      0       1       0       0      3
      0       1       0       1      3
      0       1       1       0      3
      1       0       0       0      3
      1       0       0       1      3
      1       0       1       0      3

Fig. 1.--Example of a state table and corresponding reliability diagram.

A reliability diagram contains the same information as a state table, except that it does not number the states explicitly. It merely exhibits schematically the relation between part parameter and system failures. A step-by-step procedure for constructing a reliability diagram will not be given. Rather, a set of rules by which an existing diagram can be interpreted is presented, followed by some examples.
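The state-table procedure described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the thesis's own program. The function names are hypothetical, and the structure function `fig1_system` (B1 and B2 in parallel, in series with B3 and B4, all failing open) is an assumption inferred from the entries of the Fig. 1 state table. Rule 2 is applied to the reachable failed sets only.

```python
from collections import deque

def state_table(k, operational):
    """Return {binary row as an integer: state number} for every possible row.

    `operational(bits)` takes a tuple (X1, ..., Xk) of 0/1 values and returns
    True when the system still works with those part parameters failed.
    Bit i of the integer (least significant first) holds X_{i+1}.
    """
    def bits_of(n):
        return tuple((n >> i) & 1 for i in range(k))

    reachable, queue = {0}, deque([0])
    while queue:
        n = queue.popleft()
        if not operational(bits_of(n)):
            continue                  # system is shut down: no further failures
        for i in range(k):
            m = n | (1 << i)          # part parameter i+1 fails next
            if m != n and m not in reachable:
                reachable.add(m)
                queue.append(m)

    failed = [n for n in reachable if not operational(bits_of(n))]
    failed_state = min(failed) if failed else None      # Rule 2
    return {n: (n if operational(bits_of(n)) else failed_state)   # Rule 1
            for n in sorted(reachable)}

def fig1_system(x):
    """Assumed structure: (B1 parallel B2) in series with B3 and B4."""
    x1, x2, x3, x4 = x
    return (x1 == 0 or x2 == 0) and x3 == 0 and x4 == 0

table = state_table(4, fig1_system)
for n, state in table.items():
    print(format(n, "04b"), state)
```

Rows that can never occur, such as 1100, are dropped automatically because the search stops expanding a row once the system has failed.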
As used in this thesis, a reliability diagram is a collection of blocks, one for each part parameter, and lines drawn between two vertices. The lines connect the blocks so that each block is contained in at least one path between the vertices. The block for each part parameter is labelled with that parameter's subscript. The reliability diagram is interpreted by means of the following rules.

1. If B_i(t) = c_i, i.e., X_i(t) = 1, the ith block is either erased from the diagram (open failure) or replaced by a solid line (short failure), depending on the single mode of catastrophic failure assigned to B_i(t).
2. The system is in the failed state either if a solid line exists between the two vertices, or if there is no path consisting of blocks and lines between them. Otherwise, the system is operational.

Some simple examples are shown in Fig. 2.

Fig. 2.--Examples of reliability diagrams

In Fig. 2A, system operation ceases if any of the three part parameters fails open, or if all fail short. The situation is reversed in Fig. 2B; all must fail open or any one can fail short for system failure. Fig. 2C is a type of bridge diagram. If all part parameters can fail in the open mode, system operation ceases if all part parameters in any of the following sets are failed: (B1,B2), (B4,B5), (B1,B3,B5), (B2,B3,B4), (B1,B2,B3), (B3,B4,B5). It should be noted that, for example, the event (B_i(t) = c_i, all i = 1,2,3,4) has measure zero.

If all part parameters in Fig. 2C can fail only in the short mode, the catastrophic failures of all the part parameters in any of the following sets would cause system failure: (B1,B4), (B2,B5), (B2,B3,B4), (B1,B3,B5), (B2,B3,B5), (B1,B3,B4).

The diagrams in Fig. 2 have one feature in common: only two lines are connected to each block. Such diagrams will be called "two-line" diagrams. However, all active redundancy cannot be represented by this type of diagram, as shown in Fig. 3.

Fig.
3.--Reliability diagrams

At first glance, Fig. 3A and Fig. 3B appear to represent the same system. However, blocks Q and S are in parallel in Fig. 3B, but are not in Fig. 3A. Also, if all part parameters except B1, B5, and B6 failed open, the system represented by Fig. 3B is operable, while that represented by Fig. 3A has failed.

It might seem that the diagram in Fig. 3A can be transformed into an equivalent two-line diagram, such as that in Fig. 3C, since they have the same blocks in the paths between the vertices. However, the two diagrams do not produce the same reliability number.

The type of diagram drawn in Fig. 3A is, then, more general than a two-line diagram. It sometimes has a closer relationship to block diagrams for physical systems and should be easier to construct. However, in general, there is no direct correspondence between the topology of a reliability diagram and those of signal flow graphs, linear graphs, or schematic diagrams.

The primary difference between the variety of reliability diagrams proposed in the literature (39, 40) and those used in this thesis is the interpretation given each block. In this thesis, the blocks represent part parameters, while in the usual diagrams, they represent components.

Distribution of the system state variable

The concept of a lifetime random variable is presented first, followed by a discussion of loading effects. The process of state changing is then discussed. This subsection is concluded with a review of the analytical techniques available for deriving the density function for the system state variable.

Lifetime random variables

The catastrophic failure tendencies of a part parameter are often exhibited by means of an auxiliary random variable, called the time-to-failure, or mortality, or lifetime random variable, denoted by L. The range of each lifetime variable is the usage time of the system. The event [L_i = t] means that the ith part parameter fails at time t.
The range of the parameter state variable for part parameter B_i(t) is expressed in terms of the lifetime variable in Eq. (3).

    (3)  X_i(t) = 0  if L_i > t     (no failure up to time t)
                = 1  if L_i \le t   (failure at or before t)

The alternate expression for the range of a parameter state variable in Eq. (3) is introduced to show how the well-known concept of time-to-failure ties in with the General Model. The lifetime variable is especially useful when the reliabilities of systems in which drift is neglected are studied. The relationship between the distributions of part parameters, their state variables, and lifetime variables is given in Eq. (4).

    (4a)  P(L_i \le t) = P(B_i(T) = c_i, some T \le t)
                       = P(X_i(T) = 1, some T \le t)
                       = P(X_i(t) = 1) = p_{X_i}(1;t)

    (4b)  P(L_i > t) = P(B_i(T) \ne c_i, all T \le t)
                     = P(X_i(T) = 0, all T \le t)
                     = P(X_i(t) = 0) = p_{X_i}(0;t)

Loading effects

The technique used to find the distribution of a system state variable depends to a great extent on whether the lifetime distributions change after the system's state changes. When a part parameter fails, the stresses on the remaining part parameters might increase, thus increasing the chance of their failing. The amount by which a part parameter is weakened could depend both on which states the system has passed through and on how long the system has remained in each of these states. Such effects are referred to as loading effects.

A system is classified either as non-loaded or as loaded. The set of all states for which the system is operational is denoted by Z_0. If, for all x* \in Z_0, the joint density function p_{\bar X}(x*;t) of the parameter state variables is given by Eq. (5), the system is said to be non-loaded. The binary representation of x* is written as x_k* --- x_1*.

    (5)  p_{\bar X}(x^*;t) = \prod_{i=1}^{k} p_{X_i}(x_i^*;t)

It should be noted that the parameter state variables in a non-loaded system are not statistically independent, since no part parameter can fail after the system has failed.
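Equations (3) and (5) can be checked numerically for a small non-loaded system. The sketch below draws a lifetime for each part parameter, forms the parameter state variables by Eq. (3), and compares the observed frequency of each operational configuration with the product in Eq. (5). The three-parameter system (B1 and B2 in parallel, in series with B3, open failures only), the exponential lifetimes, and the hazard values are assumptions chosen for illustration.

```python
import math
import random

def simulate(hazards, operational, t, trials=100_000, seed=1):
    """Estimate P(parameter state variables = x*) for each operational x*."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(trials):
        lifetimes = [rng.expovariate(h) for h in hazards]
        x = tuple(1 if L <= t else 0 for L in lifetimes)    # Eq. (3)
        # With open failures only, a configuration that is operational at t
        # implies the system never failed earlier, so shut-down is irrelevant.
        if operational(x):
            counts[x] = counts.get(x, 0) + 1
    return {x: c / trials for x, c in counts.items()}

def product_formula(hazards, x, t):
    """Right-hand side of Eq. (5) for exponential lifetimes."""
    prob = 1.0
    for h, xi in zip(hazards, x):
        survive = math.exp(-h * t)
        prob *= (1.0 - survive) if xi else survive
    return prob

def operational(x):
    """Assumed structure: (B1 parallel B2) in series with B3."""
    x1, x2, x3 = x
    return (x1 == 0 or x2 == 0) and x3 == 0

HAZARDS = [0.5, 0.8, 0.2]   # hypothetical failure rates
T = 1.0
estimates = simulate(HAZARDS, operational, T)
for x, value in sorted(estimates.items()):
    print(x, round(value, 3), round(product_formula(HAZARDS, x, T), 3))
```

The agreement holds only on Z_0; the failed configurations are lumped into a single state and are not described by the product.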
The operation of a non-loaded system can be viewed in terms of the following illustration. Consider each part parameter as being operated at a separate test station, rather than in the system. However, let the failure-causing environment of each part parameter be the same as would be experienced in the system. As the part parameters fail, consult either the reliability diagram or the state table to determine when the system fails. Shut off all test stations at the instant of system failure.

A non-loaded system can be considered as one in which failures are caused mainly by the system's external environment, such as vibration, humidity, and temperature. In a loaded system, failure is also caused by the task the system is performing, e.g., by the currents, forces, or pressures associated with the various part parameters.

The failure pattern in a loaded system is obviously more complicated than in a non-loaded system. When loading is a factor, the illustration cited above is changed. The environment at each test station varies with time and depends on which part parameters have failed, the order in which they failed, and how long each part parameter had operated.

The state-changing process

It will be assumed that each lifetime variable is of the continuous type. Then, the event in which two or more part parameters fail simultaneously has measure zero. Thus, a system can go from state i to state j without passing through intermediate states if and only if the binary representations of i and j differ by one bit, assuming that neither state i nor state j is the failed state. Since repair is not allowed, the bit must be a 0 in i and a 1 in j, which means that the part parameter corresponding to that bit has failed. If state j is the failed state, its binary representation may differ from that of i in any number of bits because a common state number was assigned to all situations implying system failure. For example, in Fig.
1, state 1 goes into state 3 if any one of its 0 bits changes to a 1.

When finding a system state variable's distribution, it is often convenient to record all possible inter-state transitions on what will be called a transition chart. A one-to-one correspondence is established between the possible states of the system and a set of nodes. If a one-step transition from state i to state j is possible, a directed line is drawn from node i to node j. All possible transitions are so recorded. An example of a state table and transition chart is given in Fig. 4, in which all failures are in the open mode.

    X_4(t)  X_3(t)  X_2(t)  X_1(t)  X(t)
      0       0       0       0      0
      0       0       0       1      1
      0       0       1       0      2
      0       0       1       1      3
      0       1       0       0      4
      0       1       0       1      3
      1       0       0       0      3
      1       0       0       1      3
      1       0       1       0      3
      1       1       0       0      3

Fig. 4.--Example of a state table, transition chart and reliability diagram.

Distribution of the system state variable for non-loaded systems

A technique for deriving the density function of the system state variable for non-loaded systems is given below. The set of all state numbers for which the system is not failed is denoted by Z_0; the failed state, by Z_1. The density function of the lifetime random variable for the ith part parameter is denoted by f_i.

If x* \in Z_0, where x_k* x_{k-1}* --- x_1* is its binary representation, the value of the system state variable's density function, p_X, is given at x* by Eq. (6), using Eq. (5).

    (6)  p_X(x^*;t) = p_{\bar X}(x^*;t) = \prod_{i=1}^{k} p_{X_i}(x_i^*;t)

The individual factors in Eq. (6) are related to lifetime distributions by Eq. (7) on the basis of Eq. (4).

    (7a)  p_{X_i}(1;t) = P(L_i \le t) = \int_{0}^{t} f_i(z) \, dz

    (7b)  p_{X_i}(0;t) = P(L_i > t) = \int_{t}^{\infty} f_i(z) \, dz

The value of the density function at the failed state Z_1 is given in Eq. (8).
    (8)  p_X(Z_1;t) = 1 - \sum_{x^* \in Z_0} \prod_{i=1}^{k} p_{X_i}(x_i^*;t)

Equations (7) and (8) prescribe the distribution of the system state variable for any non-loaded system, regardless of the form of the lifetime distributions or of component connections.

As an example, the density function of the state variable for the non-loaded system whose reliability diagram and state table are shown in Fig. 5 will be computed. All part parameters are assumed to fail in the open mode. The distribution of the ith lifetime random variable, i = 1,2,3, is assumed to be exponential; the density function f_i is as shown in Eq. (9).

    (9)  f_i(z) = H_i \exp(-H_i z),  if z \ge 0 and the system is operational
                = 0,                 if z < 0 or if the system is failed

The constant H_i is called the hazard or the failure rate.

    X_3(t)  X_2(t)  X_1(t)  X(t)
      0       0       0      0
      0       0       1      1
      0       1       0      2
      0       1       1      2
      1       0       0      4
      1       0       1      2
      1       1       0      2

Fig. 5.--Example of a reliability diagram and state table

The following expressions are obtained from Eq. (7).

    p_{X_i}(0;t) = \exp(-H_i t)
    p_{X_i}(1;t) = 1 - \exp(-H_i t)

Equation (6) is used to find the density function of the system state variable at the points of Z_0 = {0,1,4}.

    p_X(0;t) = p_{X_3}(0;t) p_{X_2}(0;t) p_{X_1}(0;t)
             = \exp[-(H_1+H_2+H_3)t]

    p_X(1;t) = p_{X_3}(0;t) p_{X_2}(0;t) p_{X_1}(1;t)
             = \exp[-(H_2+H_3)t] - \exp[-(H_1+H_2+H_3)t]

    p_X(4;t) = p_{X_3}(1;t) p_{X_2}(0;t) p_{X_1}(0;t)
             = \exp[-(H_1+H_2)t] - \exp[-(H_1+H_2+H_3)t]

The last value of the density function is found from Eq. (8).

    p_X(2;t) = 1 + \exp[-(H_1+H_2+H_3)t] - \exp[-(H_2+H_3)t] - \exp[-(H_1+H_2)t]

Distribution of the system state variable for loaded systems

Determining the distribution of the state variable of a loaded system can be a very difficult problem, since the failure times of the part parameters depend on a variety of factors. Only the case described by the following conditions will be considered.

1. At any time, the lifetime distributions for all non-failed part parameters depend only on the state the system is in at that time, and not on any state the system was in prior to that time.
2.
As long as the system has not failed, the lifetime variable for each non-failed part parameter has the exponential distribution shown in Eq. (9). The hazard for each such distribution can change when the system's state changes.

These conditions reduce the problem of finding the system state variable's distribution to an application of stationary Markov chains. The notation and terminology used in Appendix B will be translated to fit the present situation.

The system state variable is described by a continuous parameter, stationary Markov process {X(t), 0 \le t < \infty} having a finite number of states. The states, although non-negative integers, do not necessarily include all numbers in the sequence 0,1,2,...,2^k. At time zero, the system is in state 0, since no part parameters are assumed to be failed when the system is turned on. The distribution at t = 0 is thus given by Eq. (10).

    (10)  p_X(x;0) = P(X(0) = x) = 1,  x = 0
                                 = 0,  x \ne 0

This implies that, at any time, the value of the density function at any state j is expressed by Eq. (11).

    (11)  p_X(j;t) = p_{0j}(t)

As explained in Appendix B, there are essentially two methods for finding the required transition probabilities p_{0j}(t). The first is to use the algorithm implied by Eq. (B.14). Equation (11) then becomes Eq. (12).

    (12)  p_X(j;t) = \sum_{n=0}^{\infty} {}_{n}p_{0j}(t)

Either of the separate induction processes given in Eq. (B.12) or Eq. (B.13) can be used to find the terms in Eq. (12). This method is particularly useful for simple topologies, such as those in Figs. 3, 4, and 5.

The second way to proceed is to use either the backward system of equations in Eq. (B.17) or the forward system in Eq. (B.18), together with the initial conditions of Eq. (10). The solutions to either of these sets of differential equations are the transition probabilities. This method is more applicable in large systems or where repair is allowed.
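As a rough illustration of the Markov-chain route, the sketch below computes the state distribution for the three-parameter system of Fig. 5. It does not use the thesis's Appendix B equations; it substitutes uniformization, a standard numerical technique for continuous-parameter chains. The transition intensity from state i to state j is assumed to be the hazard, in state i, of the part parameter whose failure produces j, summed when several failures lead to the failed state. The hazard values and the doubling of the survivor's hazard in the loaded case are hypothetical.

```python
import math

def state_probabilities(rates, start, t, terms=200):
    """Distribution at time t of a continuous-parameter Markov chain.

    `rates[i][j]` is the transition intensity from state i to state j.
    Uniformization: p(t) = sum_n Poisson(n; L*t) * p(0) * P**n, where
    P = I + Q/L and L is the largest total exit rate.
    """
    states = sorted(set(rates) | {j for row in rates.values() for j in row})
    exit_rate = {s: sum(rates.get(s, {}).values()) for s in states}
    L = max(exit_rate.values()) or 1.0
    pi = {s: 1.0 if s == start else 0.0 for s in states}
    result = {s: 0.0 for s in states}
    weight = math.exp(-L * t)             # Poisson probability of zero jumps
    for n in range(terms):
        for s in states:
            result[s] += weight * pi[s]
        nxt = {s: pi[s] * (1.0 - exit_rate[s] / L) for s in states}
        for i, row in rates.items():
            for j, q in row.items():
                nxt[j] += pi[i] * q / L
        pi = nxt
        weight *= L * t / (n + 1)         # next Poisson weight
    return result

H1, H2, H3 = 0.1, 0.2, 0.3                # hypothetical hazards
T = 2.0

# Non-loaded case: hazards do not depend on the state (states 0, 1, 4; failed 2).
non_loaded = {0: {1: H1, 4: H3, 2: H2}, 1: {2: H2 + H3}, 4: {2: H1 + H2}}
# Loaded case (assumed): the surviving redundant parameter's hazard doubles.
loaded = {0: {1: H1, 4: H3, 2: H2}, 1: {2: H2 + 2 * H3}, 4: {2: 2 * H1 + H2}}

p_non_loaded = state_probabilities(non_loaded, 0, T)
p_loaded = state_probabilities(loaded, 0, T)
print({s: round(v, 4) for s, v in p_non_loaded.items()})
print({s: round(v, 4) for s, v in p_loaded.items()})
```

With state-independent hazards the result should agree with the closed-form values obtained from Eqs. (6) and (8) for Fig. 5, which gives a convenient check on the chain's construction. The fixed number of series terms is adequate only while the product of the largest exit rate and t stays moderate.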
Either of these systems of equations can be programmed directly for the digital computer since they are in normal form¹. Analytical solutions can be obtained by using Laplace transform techniques and matrix algebra, or signal flow graphs.

¹Digital computer solutions to differential equations in normal form have been studied by Wirth (41).

To use either of these two methods for finding transition probabilities, the q's in Appendix B must be expressed in terms of the lifetime distributions of the part parameters. This is discussed in Appendix C and the results are summarized below. The lifetime density function, f_ir, for the part parameter B_r(t) when the system is in state i is given by Eq. (C.1).

    (C.1)  f_{ir}(z) = H_{ir} \exp(-H_{ir} z),  if z \ge 0 and i \in Z_0 and X_r(t) = 0

[...]

    (17a)  p_A(0;t) = \sum_{x^* \in Z_0} \prod_{i=1}^{k} p_{X_i}(x_i^*;t)

The summation in Eq. (17a) extends over all sets of parameter state variable values which are binary representations of state numbers in Z_0.

    (17b)  p_A(1;t) = 1 - p_A(0;t) = \sum_{x \notin Z_0} \prod_{i=1}^{k} p_{X_i}(x_i;t)

In Eq. (17b), the sum extends over all sets of parameter state variable values which are not binary representations of state numbers in Z_0, even if some of these sets of values have measure zero.

An algorithm for computing either p_A(0;t) or p_A(1;t), called the one-failure algorithm, is now presented. The one-failure algorithm is based on Eq. (17) and exploits the analogy between a reliability diagram and a combinational switching circuit schematic. The following three steps define the one-failure algorithm.

1. Construct a reliability table. The form of the reliability table, which corresponds to a truth table, is the same as that of a state table, except that the heading for the system state variable is replaced in a reliability table by a heading for A(t). All 2^k sets of values for the parameter state variables are listed, even though some are sets of measure zero.
2. Compute one term of Eq.
(17a) from each row of the reliability table for which A(t) = 0, or one term of Eq. (17b) from each row for which A(t) = 1. If one of these rows is x_k' --- x_1', the corresponding term in Eq. (17) is given below.

    \prod_{i=1}^{k} p_{X_i}(x_i';t)

In this computation, either all A(t) = 0 rows or all A(t) = 1 rows are used, not both.
3. Add all terms computed in Step 2. If the A(t) = 0 rows were used, the expression for p_A(0;t) is found from Eq. (17a); if the A(t) = 1 rows were used, p_A(1;t) is computed as indicated by Eq. (17b).

A general method for simplifying the transmittance expression of a two-terminal combinational switching circuit has been discussed by Caldwell (42, p. 145) and is called the Quine-McCluskey method. The variation of this technique presented below is used to simplify the expression for p_A(0;t) or for p_A(1;t) obtained from the one-failure algorithm.

The Quine-McCluskey method starts with a truth table representing the standard sum form of a transmittance and derives its prime implicants. In the variation presented below, an expression for p_A(0;t) or p_A(1;t) is obtained from the reliability table which is simpler in the sense that it involves fewer terms and fewer factors in some terms than the expression resulting from the one-failure algorithm. This method is based on the following identity.

    p_{X_i}(0;t) + p_{X_i}(1;t) = 1,   any i = 1,...,k

The analogous expression from Boolean algebra is Y + Y' = 1. The Quine-McCluskey method also uses the identity Y + Y = Y, for which there is no analogy in terms of density functions, so that the method cannot be applied in toto. This explains why a complete analogy cannot be drawn between a switching circuit schematic and a reliability diagram.
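The three steps of the one-failure algorithm translate almost directly into code. The following sketch applies them to the bridge diagram of Fig. 2C with open-mode failures. The path list is taken from the short-mode failure sets quoted earlier for that figure (the sets of blocks that form a complete path between the vertices), while the function names and the failure probability used are illustrative assumptions.

```python
from itertools import product

def one_failure_reliability(k, operational, p_fail):
    """Sum Eq. (17a) over the A(t) = 0 rows of the reliability table.

    `operational(x)` maps a row (X1, ..., Xk) of 0/1 values to True when
    A(t) = 0; `p_fail[i]` is the probability p_Xi(1;t) that part
    parameter i+1 has failed by time t.
    """
    total = 0.0
    for x in product((0, 1), repeat=k):        # Step 1: all 2**k rows
        if operational(x):                     # Step 2: use the A(t) = 0 rows
            term = 1.0
            for xi, q in zip(x, p_fail):
                term *= q if xi else (1.0 - q)
            total += term                      # Step 3: add the terms
    return total

# Paths between the two vertices of the bridge diagram (Fig. 2C).
PATHS = [(1, 4), (2, 5), (1, 3, 5), (2, 3, 4)]

def bridge_operational(x):
    """Open-mode failures: the system works if some path has no failed block."""
    return any(all(x[b - 1] == 0 for b in path) for path in PATHS)

q = 0.1                                        # hypothetical p_Xi(1;t) for every i
reliability = one_failure_reliability(5, bridge_operational, [q] * 5)
print(round(reliability, 5))
```

When every part parameter has the same survival probability p = 1 - q, the sum should reproduce the familiar bridge-network polynomial 2p^2 + 2p^3 - 5p^4 + 2p^5, which is a useful check that the rows were classified correctly.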
That is, in a switching circuit, the transmittances [...]

Fig. 8.--Tabulations of a simplification procedure for Fig. 7

The purpose of the one-failure algorithm and simplification procedure is to provide an almost rote method, which is always applicable, for finding the reliability of a non-loaded system when drift is neglected. As with most universal techniques, short-cut methods exist for certain connection schemes. An example of such a scheme is the series-parallel diagram discussed by Radner (40), Lipp (43), Moskowitz (44), Bazovsky (26) and others. These authors also use a form of Bayes' theorem which can reduce the work involved in obtaining reliability expressions.

A different approach has been taken by Weiss and Kleinerman (45). This early work studies the set of all paths in a reliability diagram which is drawn with directed lines. If the corresponding "path reliabilities" are combined properly, the system's reliability can be found. This technique seems to have been discarded by later works in the area, probably because of the unnecessary labor involved. This work also lets the reliability of each component be a random variable, which is not in accord with the standard definition.

The analogy between reliability diagrams and switching circuit schematics has been noted by many authors, particularly by Moskowitz (44). This paper utilizes concepts similar to signal flow graphs and equivalent circuits from a non-probabilistic viewpoint. Components are represented by lines and their connections, by nodes. Such a representation is inherently a two-line diagram.
Each line is assigned a constant, corresponding to an average reliability over a time interval. A technique is then given for finding an average value for the system's reliability. As Moskowitz recognizes, each line on his diagram must correspond to a different component for his technique to be valid. This rules out an equivalent diagram for non-two-line diagrams such as that in Fig. 3C.

Reliability of loaded systems

Some of the results for non-loaded systems are now extended to loaded systems. Drift failures are again neglected so that each part parameter can assume only two values. The discussion of loaded systems diverges from that for non-loaded systems after Eq. (14) since, in a loaded system, the joint distribution of the parameter state variables cannot be expressed as the product of marginal distributions. If the expression for the density function of the system parameter $A(t)$ given in Eq. (II.4) is combined with Eq. (13), Eq. (18) is obtained.

(18)  $p_A(0;t) = \sum_{Z_0} p_X(x;t)$

The sum in Eq. (18) extends over all values $x \in Z_0$, the set of non-failed states. As discussed previously, if the lifetimes of all part parameters have exponential distributions of the form shown in Eq. (C.1), the theory of stationary Markov chains can be used to find the density function of the system state variable. Under these conditions, Eq. (11) can be used with Eq. (18) to produce Eq. (19). The reliability of loaded systems is found from Eq. (14) and Eq. (19).

(19)  $p_A(0;t) = \sum_{j \in Z_0} p_{0j}(t)$

Of course, Eq. (18) is also valid for non-loaded systems. However, the one-failure algorithm does not explicitly use the system state variable and is much easier to apply.

In an interesting paper, Balaban (46) has discussed, among other things, the effects of loading on systems containing active redundancy. This paper is one of the very few which explicitly discusses these effects.
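Eq. (19) can be illustrated numerically for a small loaded system. The sketch below assumes two identical parts in active parallel: each fails at rate `h` while both operate, and the survivor fails at the higher rate `h_loaded` once its partner has failed. The state numbering, the rate values, the function names and the use of a fourth-order Runge-Kutta integration are all assumptions made for this illustration; the thesis itself points to Laplace transforms, matrix algebra or signal flow graphs for analytical solutions.

```python
import math


def state_probabilities(rates, p_init, t_end, steps=2000):
    """Integrate the forward equations dp_j/dt = sum_i p_i q_ij - q_j p_j.

    rates[i][j] is the assumed transition rate q_ij from state i to state j.
    Classical fourth-order Runge-Kutta with a fixed step is used.
    """
    n = len(p_init)

    def deriv(p):
        d = [0.0] * n
        for i in range(n):
            for j in range(n):
                if i != j and rates[i][j]:
                    flow = p[i] * rates[i][j]
                    d[i] -= flow
                    d[j] += flow
        return d

    p = list(p_init)
    dt = t_end / steps
    for _ in range(steps):
        k1 = deriv(p)
        k2 = deriv([x + 0.5 * dt * k for x, k in zip(p, k1)])
        k3 = deriv([x + 0.5 * dt * k for x, k in zip(p, k2)])
        k4 = deriv([x + dt * k for x, k in zip(p, k3)])
        p = [x + dt * (a + 2 * b + 2 * c + d) / 6.0
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
    return p


h, h_loaded, t = 0.01, 0.03, 50.0  # assumed failure rates and mission time

# Index 0: both parts operating; 1 and 2: exactly one part failed; 3: system failed.
rates = [
    [0.0, h, h, 0.0],
    [0.0, 0.0, 0.0, h_loaded],
    [0.0, 0.0, 0.0, h_loaded],
    [0.0, 0.0, 0.0, 0.0],
]
p = state_probabilities(rates, [1.0, 0.0, 0.0, 0.0], t)

# Eq. (19): sum the probabilities of the non-failed states.
reliability_numeric = p[0] + p[1] + p[2]

# Closed form for this particular chain, valid when h_loaded != 2h.
reliability_closed = math.exp(-2 * h * t) + (2 * h / (2 * h - h_loaded)) * (
    math.exp(-h_loaded * t) - math.exp(-2 * h * t))
```

Setting `h_loaded = h` recovers the non-loaded case, so the same routine can be used to see how much loading reduces the reliability.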
However, this study applies only to two elements in parallel, both of which have exponential failure distributions. Although formulas in terms of general failure densities are given, the failure process must have the stationary property before the formulas are valid. Only exponentially distributed lifetime variables possess this property. Some simple reliability diagrams with exponentially failing components are also discussed by Bazovsky (26, p. 134) in which loads are shared by the components.

Two catastrophic failure modes

The results of the previous section are now extended to systems in which part parameters can fail catastrophically in either of two modes. In this section, the set of catastrophic values $C_i(t)$ for at least one part parameter in a system is taken as a two point set. However, a part parameter cannot take on both catastrophic values during one period of operation. That is, if $B_i(t) = c_{ij}$, then $B_i(\tau) = c_{ij}$, all $\tau \ge t$, for $j = 1,2$. This section is subdivided in the same manner as was the previous section.

Since two failure modes are distinguished for part parameters, two catastrophic failure modes are also distinguished for the system. For convenience, these modes are called the open and short modes of system failure.

The range of the system state variable

As before, the range of a system state variable is defined in terms of parameter state variables. If $r = 3$ in Eq. (1), and if Eq. (20) is satisfied, $M_k M_{k-1} \cdots M_1$ is the base three representation of $N_{10}$.

(20)  $N_{10} = \sum_{i=1}^{k} M_i 3^{i-1}$, $0 \le M_i \le 2$

Assuming that all part parameters are operational at $t = 0$, when the system is turned on, state numbers are assigned by the following rules.

1. If, at time $t$, the system has not failed, let $M_i = X_i(t)$. Let the number $N_{10}$ obtained from Eq. (20) be the state number at $t$. The set of all such numbers is denoted by $Z_0$.

2. Let $y_1,\dots,y_j$ be all numbers obtained as in Rule 1, for which the system is in the failed-open state. Then $Z_{11} = \min(y_1,\dots,y_j)$ is the state number when the system has failed in the open mode.

3. If the word "open" is replaced by "short" in Rule 2, then $Z_{12} = \min(y_1,\dots,y_j)$ is the state number when the system has failed in the short mode. The set of values $(Z_{11}, Z_{12})$ is denoted by $Z_1$.

The base three representation of a state number in $Z_0$, when written with $k$ digits, exhibits the values of the parameter state variables.

All possible states of a system can be conveniently displayed in the form of a state table. The only change from the state table discussed in the previous section is the base in which the state numbers are written. This information can also be recorded schematically on a reliability diagram which is interpreted in the same manner as was discussed previously. An example is given in Fig. 9.

The distribution of the system state variable

Lifetime random variables are discussed first, followed by techniques for obtaining the required distributions.

Lifetime random variables

The possibility of three values for each parameter state variable implies that the distribution of $X_i(t)$ cannot be specified in terms of $L_i$ alone. The mode of failure is also important. This additional information is included by defining the two functions, $K_{i1}(t)$ and $K_{i2}(t)$ in Eq. (21).

(21)  $K_{i1}(t) = P(B_i(t) = c_{i1} \mid L_i \le t)$,  $K_{i2}(t) = P(B_i(t) = c_{i2} \mid L_i \le t)$

If $L_i \le t$, the $i$th part parameter must have failed in one of the two modes so that the following conditions must be satisfied.

$K_{i1}(t) + K_{i2}(t) = 1$;  $K_{ij}(t) \ge 0$, all $t$, $j = 1,2$

The relationship between a parameter state variable, lifetime variable, and values of a part parameter is given in Eq. (22), in which the (continuous) density function of $L_i$ is denoted by $f_i$.

Fig. 9.--Example of a state table and corresponding reliability diagram.
(22a)  $p_{X_i}(0;t) = P(B_i(\tau) \in D_i(\tau),\ 0 \le \tau \le t) = P(L_i > t) = \int_t^{\infty} f_i(z)\,dz$

(22b)  $p_{X_i}(1;t) = P(B_i(t) = c_{i1} \mid L_i \le t)\,P(L_i \le t) = K_{i1}(t) \int_0^t f_i(z)\,dz$

(22c)  $p_{X_i}(2;t) = P(B_i(t) = c_{i2} \mid L_i \le t)\,P(L_i \le t) = K_{i2}(t) \int_0^t f_i(z)\,dz$

As an example, the distribution of a parameter state variable when the lifetime random variable has the exponential distribution in Eq. (9) will be found. The relative frequencies of short and open failures are assumed to remain constant with time so that the total failure rate, $H_i$ in Eq. (9), is the sum of two constant failure rates.

$H_i = H_{i1} + H_{i2}$

The 1 and 2 subscripts imply open and short failure rates, respectively. Then, the K functions are shown below.

$K_{i1}(t) = H_{i1}/H_i$;  $K_{i2}(t) = H_{i2}/H_i$

Under these conditions, the density function in Eq. (22) can be written as in Eq. (23)¹.

(23a)  $p_{X_i}(0;t) = \exp(-H_i t)$

(23b)  $p_{X_i}(1;t) = (H_{i1}/H_i)[1 - \exp(-H_i t)]$

(23c)  $p_{X_i}(2;t) = (H_{i2}/H_i)[1 - \exp(-H_i t)]$

¹ Care must be taken when combining two failure rates. A conceptual error, such as that made by Bazovsky (26, p. 138) in his discussion of switch failures, can easily creep into a development. Bazovsky lets a switch have a constant fail-open rate $H_o$ and a constant fail-short rate $H_s$. He then makes the following statements.
$q_o$ = Probability of failing open = $1 - \exp(-H_o t)$
$q_s$ = Probability of failing short = $1 - \exp(-H_s t)$
$r$ = Probability of no failure = $1 - (q_o + q_s) = \exp(-H_o t) + \exp(-H_s t) - 1$
Now, if $t > \max(\ln 2/H_o,\ \ln 2/H_s)$, both exponentials are less than 1/2, so that $r < 0$, which, of course, violates the definition of the probability of an event. The error here is in the assumption that the two possible failure events are independent and mutually exclusive.

The state-changing process

The state-changing process is essentially the same as that discussed in the previous section. The system leaves state $i$, $i \in Z_0$, if a 0 digit in the base three representation of $i$ changes to a 1 or to a 2, which implies that the corresponding part parameter has failed. The system cannot leave either state in $Z_1$, nor can it return to any state. Transition charts are drawn in the same manner as in the previous subsection.

Distribution of the system state variable for non-loaded systems

For state numbers in $Z_0$, the density function of the system state variable can be found from Eq. (6) by using Eq. (22) for the separate factors. The value of this density function for the two states in $Z_1$ is not as easy to compute. The sum of these two terms must satisfy Eq. (24).

(24)  $\sum_{Z_1} p_X(x;t) = 1 - \sum_{Z_0} \prod_{i=1}^{k} p_{X_i}(x_i;t)$

One is tempted to compute the separate terms on the left of Eq. (24) from formulas similar to Eq. (6), using those rows in the state table for which $X(t) = Z_{11}$ or $Z_{12}$. However, the sum of the two terms computed in this manner might not satisfy Eq. (24). If all $3^k$ possible rows were entered on the state table, regardless of whether some are sets of measure zero, Eq. (24) would be satisfied, but there is no assurance that the values of the density function so obtained would correspond to the actual failure probabilities.

At present, the density function's values at the shut down states can be found only when each parameter state variable has the distribution given in Eq. (23). This situation is discussed with loaded systems. Of course, for simple cases, such as all blocks in the reliability diagram in series and parallel, the density function can always be computed.

Distribution of the system state variable for loaded systems

As in the previous section, only a special case is discussed, namely when the state variable of each part parameter has the distribution given in Eq. (23). The failure rates $H$ in this distribution may change when the system changes states.

Again, the theory of continuous parameter, stationary Markov chains is applied directly to this problem. The initial conditions of the process are given by Eq. (10). The density function can be evaluated from Eq. (11) by either of the two techniques discussed previously.
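The two-mode distribution of Eq. (23), and the way the naive combination criticized in the footnote breaks down, can be checked with a short calculation. The sketch below is illustrative only: the function names and the numerical failure rates are assumptions, not values from the thesis.

```python
import math


def parameter_state_distribution(h_open, h_short, t):
    """Return (p(0;t), p(1;t), p(2;t)) as in Eq. (23) for constant failure rates."""
    h_total = h_open + h_short               # H_i = H_i1 + H_i2
    p_failed = 1.0 - math.exp(-h_total * t)  # P(L_i <= t)
    p0 = math.exp(-h_total * t)              # Eq. (23a)
    p1 = (h_open / h_total) * p_failed       # Eq. (23b), open mode
    p2 = (h_short / h_total) * p_failed      # Eq. (23c), short mode
    return p0, p1, p2


def naive_no_failure(h_open, h_short, t):
    """The erroneous r = 1 - (q_o + q_s) discussed in the footnote."""
    q_open = 1.0 - math.exp(-h_open * t)
    q_short = 1.0 - math.exp(-h_short * t)
    return 1.0 - (q_open + q_short)


h_open, h_short, t = 0.02, 0.01, 100.0  # assumed rates and time
p0, p1, p2 = parameter_state_distribution(h_open, h_short, t)
r_naive = naive_no_failure(h_open, h_short, t)

# The three values from Eq. (23) always sum to one, while the naive
# expression is negative here because both exponentials are below 1/2.
total = p0 + p1 + p2
```

With these numbers Eq. (23a) gives exp(-3) for the probability of no failure, whereas the naive expression is negative and so cannot be a probability.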
The relationships between the q's, which determine the transition probabilities, and the lifetime distributions are shown below. The density function $f_{ij}$ of the lifetime variable associated with the $j$th part parameter when the system is in state $i$ is assumed to be of the form shown in Eq. (25).

(25)  $f_{ij}(z) = 0$ if $z < 0$ or $i \in Z_1$;  $f_{ij}(z) = 0$ if $X_j(t) \ne 0$ and $i \in Z_0$;  $f_{ij}(z) = H_{ij}\exp(-H_{ij}z)$ otherwise

In Eq. (25), $H_{ij} = H_{ij1} + H_{ij2}$, where $H_{ij1}$ is the open failure rate and $H_{ij2}$ the short failure rate.

If $X_{r_m}(t)$, $m = 1, 2, \dots, n$, are all 0 when the system is in state $i$, then $q_i$ is given by Eq. (26).

(26)  $q_i = \sum_{m=1}^{n} H_{i r_m}$ if $i \in Z_0$;  $q_i = 0$ if $i \in Z_1$

If the failure of $B_r(t)$ causes the system to go from state $i$ to state $j$, $i,j \in Z_0$, $q_{ij}$ is one of two failure rates. If $X_r(t) = 1$ in the base three representation of $j$, $q_{ij}$ is given by Eq. (27a).

(27a)  $q_{ij} = H_{ir1}$

If $X_r(t) = 2$ in the base three representation of $j$, $q_{ij}$ is given by Eq. (27b).

(27b)  $q_{ij} = H_{ir2}$

If $i \in Z_0$, $j = Z_{11}$ and if the open failure of any of the part parameters $B_{r_1}(t),\dots,B_{r_v}(t)$ causes the system to go from $i$ to $j$ in one step, $q_{ij}$ is given by Eq. (28a).

(28a)  $q_{ij} = \sum_{m=1}^{v} H_{i r_m 1}$

If $i \in Z_0$, $j = Z_{12}$ and if the short failure of any of the part parameters $B_{r_1}(t),\dots,B_{r_h}(t)$ causes the system to go from $i$ to $j$ in one step, $q_{ij}$ is given by Eq. (28b).

(28b)  $q_{ij} = \sum_{m=1}^{h} H_{i r_m 2}$

Reliability

Special techniques for finding reliability expressions when drift is neglected are now presented. For notational simplicity, all part parameters are allowed to have both failure modes. Non-loaded and loaded systems are discussed separately.

Reliability of non-loaded systems

Since drift failures are neglected, the system parameter $A(t)$ defined in Eq. (13) is again the only system parameter of interest. Only two values of $A(t)$ need be defined, even though the set $Z_1$ in Eq. (13) contains two points, since the reliability value function does not distinguish the modes of system failure. The expression for reliability in Eq.
(14), that for the density function of $A(t)$ in Eq. (17), and the one-failure algorithm are all directly applicable to this case. The only difference is that in the previous section, base 2 numbers are used in the reliability table and in this section, base 3 numbers are used.

The simplification procedure discussed previously would have to be extended to cover this case. Three, rather than two, base three representations have to be combined to eliminate a factor.

The one-failure algorithm, although applicable here, requires $3^k$ rows in the reliability table, where $k$ is the number of part parameters. A second algorithm, called the two-failure algorithm, is now presented which requires only $2^{k+1}$ rows and allows direct utilization of the simplification procedure.

The two-failure algorithm is described by the following three steps. Its theoretical basis is given in Appendix D.

1. Assume that no part parameter can fail in the short mode. Find an expression for either of two functions, called $p^{(1)}(0;t)$ and $p^{(1)}(1;t)$, by using the one-failure algorithm and simplification procedure. Only 0's and 1's are listed in the reliability table. The bits in the unchecked numbers resulting from the simplification procedure are interpreted as follows.
   i) 1 becomes $p_{X_i}(1;t)$
   ii) 0 becomes $1 - p_{X_i}(1;t)$, not $p_{X_i}(0;t)$

2. Assume that no part parameter can fail in the open mode. Find an expression for either of two functions, called $p^{(2)}(0;t)$ and $p^{(2)}(1;t)$, by using the one-failure algorithm and simplification procedure. The reliability table contains only 0's and 2's. The results of the simplification procedure are interpreted as follows.
   i) 2 becomes $p_{X_i}(2;t)$
   ii) 0 becomes $1 - p_{X_i}(2;t)$, not $p_{X_i}(0;t)$

3. Determine $p_A(0;t)$ from one of the following relations.

   $p_A(0;t) = p^{(1)}(0;t) + p^{(2)}(0;t) - 1$

   $p_A(0;t) = 1 - p_A(1;t) = 1 - [p^{(1)}(1;t) + p^{(2)}(1;t)]$

A method similar to the two-failure algorithm has been used by Price (47), without proof, to compute the reliability of parallel components.
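The three steps of the two-failure algorithm can be sketched as follows and compared against a direct enumeration of all 3^k rows. The example system (two parts in parallel that fail open only if both parts are open, and fail short if either part shorts), the function names and the probabilities are assumptions chosen for illustration; the general justification of the algorithm is the one the thesis places in Appendix D.

```python
from itertools import product


def mode_failure_probability(k, fails, p_mode):
    """One-failure algorithm restricted to a single failure mode.

    A bit of 1 means 'failed in this mode' (probability p_mode[i]);
    a bit of 0 means 'not failed in this mode' (probability 1 - p_mode[i]).
    """
    total = 0.0
    for row in product((0, 1), repeat=k):
        if fails(row):
            term = 1.0
            for bit, p in zip(row, p_mode):
                term *= p if bit else (1.0 - p)
            total += term
    return total


def two_failure_reliability(k, fails_open, fails_short, p_open, p_short):
    p1_fail = mode_failure_probability(k, fails_open, p_open)    # Step 1: p(1)(1;t)
    p2_fail = mode_failure_probability(k, fails_short, p_short)  # Step 2: p(2)(1;t)
    return 1.0 - (p1_fail + p2_fail)                             # Step 3


def brute_force_reliability(k, system_ok, p_open, p_short):
    """Direct sum over all 3**k rows (0 = operating, 1 = open, 2 = short)."""
    total = 0.0
    for row in product((0, 1, 2), repeat=k):
        if system_ok(row):
            term = 1.0
            for x, po, ps in zip(row, p_open, p_short):
                term *= (1.0 - po - ps, po, ps)[x]
            total += term
    return total


# Assumed example: two parts in parallel.
def fails_open(row):
    return all(row)  # every part open


def fails_short(row):
    return any(row)  # any part shorted


def system_ok(row):
    return 2 not in row and not all(x == 1 for x in row)


p_open, p_short = [0.10, 0.05], [0.02, 0.03]
r_two_failure = two_failure_reliability(2, fails_open, fails_short, p_open, p_short)
r_brute = brute_force_reliability(2, system_ok, p_open, p_short)
```

Both routes give the same value for this example, while the two-failure route visits 2 · 2^k = 2^(k+1) rows instead of 3^k.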
The situation when all blocks in a reliability diagram have identical failure distributions has been studied by Barlow and Hunter (34). The diagram studied is a series and parallel connection of subdiagrams, each of which is constructed from blocks connected in series and parallel. In this case, the reliability can be written directly, using a multinomial distribution.

As an example, an expression for $p_A(0;t)$ will be found for the system whose reliability diagram is given in Fig. 10, using the two-failure algorithm.

Fig. 10.--Example of a reliability diagram

The details of Step 1 of the two-failure algorithm are shown in Fig. 11. Only those rows for which $A(t) = 1$ are listed in the reliability table. The following expression results from this step. For convenience, $p_{X_i}(x;t)$ is written as $p_i(x)$.

$p^{(1)}(1;t) = p_4(1)[p_1(1) + p_3(1) - p_1(1)p_3(1)] + [1 - p_4(1)]\,p_2(1)p_1(1)$
$\qquad = p_2(1)p_1(1) + p_4(1)[p_1(1) + p_3(1) - p_1(1)p_3(1) - p_2(1)p_1(1)]$

Fig. 11.--Example of Step 1 of the two-failure algorithm. Rows with $A(t) = 1$, written $X_4 X_3 X_2 X_1$: 0011, 0111, 1001, 1011, 1100, 1101, 1110, 1111.

The details of Step 2 are shown in Fig. 12. Only those rows for which $A(t) = 0$ are listed in the reliability table. The following expression is obtained.

$p^{(2)}(0;t) = [1 - p_4(2)][p_3(2)(1 - p_1(2)) + 1 - p_3(2)] + p_4(2)[1 - p_2(2)][1 - p_1(2)]$
$\qquad = 1 - p_3(2)p_1(2) - p_4(2)[p_1(2) + p_2(2) - p_1(2)p_2(2) - p_1(2)p_3(2)]$

Fig. 12.--Example of Step 2 of the two-failure algorithm. Rows with $A(t) = 0$, written $X_4 X_3 X_2 X_1$: 0000, 0002, 0020, 0022, 0200, 0220, 2000, 2200.

The results of Step 3 are obtained as follows.

$p_A(0;t) = 1 - p^{(1)}(1;t) + p^{(2)}(0;t) - 1 = p^{(2)}(0;t) - p^{(1)}(1;t)$

Loaded systems

If the lifetime variables of all part parameters have the distribution shown in Eq. (25), the theory of stationary Markov chains can be used to find $p_A(0;t)$ in Eq. (14).
Equation (19) is again used for this purpose, on the basis of Eq. (25) through Eq. (28).

CHAPTER IV

DRIFT EFFECTS

Two types of drift in redundant systems are discussed in this chapter, neither of which has been treated in the literature. In the first type, the drift characteristics of each non-failed part parameter depend on the length of time the system operates; in the second, the drift characteristics depend both on the states the system assumes and on the length of time the system remains in each state. Both types are applicable to loaded and to non-loaded systems.

The first section in this chapter outlines the particular drift problem to be discussed. Methods for finding the joint distribution of the system parameters at a given time are presented in the second section. These methods provide a basis for the third section, which deals with time variations in the distributions of part and system parameters.

The drift problem

If the joint distribution of the system parameters is known at any time, the reliability and utility value functions can be found. However, for definiteness, only the measure of utility defined by Eq. (1a) will be treated in this chapter. The techniques presented can be extended to cover other value functions.

(1a)  $Q(t) = P(A_i(t) \in S_i(t),\ \text{all } i = 1,\dots,n) = \int_{S(t)} f_{\bar A}(\bar a;t)\,d\bar a$

The function $Q(t)$, sometimes called pointwise reliability, is the probability that the system is operating satisfactorily at time $t$. The notation $S(t)$ represents the tolerance sets for the system parameters $\bar A(t)$.

In order to use the General Model to find expressions for $Q(t)$, Eq. (II.4) is substituted into Eq. (1a) with the result shown in Eq. (1b).

(1b)  $Q(t) = \sum_{x} p_X(x;t) \int_{S(t)} f_{\bar A \mid X}(\bar a \mid x;t)\,d\bar a$

If $x \in Z_1$, i.e., if the system is in a failed state, the corresponding integral in Eq. (1b) is assumed to be zero for any tolerance sets. Since, by definition, the system is operationally useless after it has failed, this is not a restrictive assumption.
Equation (1b) then becomes Eq. (1c).

(1c)  $Q(t) = \sum_{Z_0} p_X(x;t) \int_{S(t)} f_{\bar A \mid X}(\bar a \mid x;t)\,d\bar a$

The sum in Eq. (1c) extends over all $x \in Z_0$, the set of non-failed states.

When the system has not failed, those part parameters which have not failed and which have not been removed from the system because of the failure of other part parameters are called the operational part parameters. When the system is in state $x \in Z_0$, the set of operational part parameters is denoted by $\bar B_x(t)$. The members of this set change when the system changes states, as do the forms of the system parameters. Thus, in order to evaluate the integrals in Eq. (1c), the joint distribution of the system parameters for every state in $Z_0$ must be found at time $t$. These joint distributions can be derived by the techniques discussed in the next section if the joint distribution of the operational part parameters for every state in $Z_0$ is known at time $t$. Methods for deriving these joint distributions from drift data are presented in the last section.

Distributions of system parameters at one time

This section is devoted to a review of the available techniques for studying the joint distribution of the system parameters at a particular time. Specifically, the distribution of the A's in Eq. (2) will be studied, where $\bar B = (B_1,\dots,B_k)$ represents a set of operational part parameters.

(2)  $A_i = v_i(B_1,\dots,B_k)$, $i = 1,\dots,n$

The first two subsections present techniques for finding the entire distributions of functions of random variables, the first using analytical methods and the second, Monte Carlo techniques. The methods available for finding certain measures of distributions are taken up in the third subsection. Some special techniques for the important special case of linear functions of independent random variables are presented in the last subsection.

Analytical techniques for finding complete distributions

The exact form of the joint distribution of the B's in Eq.
(2) is assumed known; i.e., the density or cumulative distribution function describing this joint distribution is given in closed, analytical form. The $v_i$ functions are also assumed to be known.

The first technique to be discussed has been presented by many authors, e.g., Cramér (48, p. 292). This technique is first presented for the case when $n = k$ in Eq. (2) and applies only when all B's are continuous random variables. The following two conditions are assumed to be satisfied for all (vector) values $\bar b$ of the set of random variables $\bar B = (B_1,\dots,B_k)$ for which the joint density function of $\bar B$ is non-zero.

A) The functions $v_i$ are everywhere unique and continuous and have continuous partial derivatives $\partial v_i/\partial b_j$, $i,j = 1,\dots,k$.

B) The relationships in Eq. (2) define a one-to-one correspondence between the points $\bar b$ and $\bar a$, which are values of $\bar B$ and $\bar A$, so that the inverse relationships written as Eq. (3) exist, where the $w_i$ are unique.

(3)  $B_i = w_i(A_1,\dots,A_k)$, $i = 1,\dots,k = n$

The functions in Eq. (2) and inverses in Eq. (3) are written in terms of values of the random variables in Eq. (4).

(4a)  $a_i = v_i(b_1,\dots,b_k)$, $i = 1,\dots,k = n$

(4b)  $b_i = w_i(a_1,\dots,a_k)$

The probability element of the joint distribution function of the A's is expressed in terms of that for the B's in Eq. (5).

(5)  $f_{\bar A}(\bar a)\,d\bar a = |J|\,f_{\bar B}(\bar b)\,d\bar a$

The values $\bar b$ in the right-hand side of Eq. (5) must be replaced by values $\bar a$ from the inverse transformation in Eq. (4b). The function $|J|$ is the absolute value of the Jacobian of the transformation defined by Eq. (4a).

$J = \dfrac{\partial(w_1,\dots,w_n)}{\partial(a_1,\dots,a_n)}$

The Jacobian can also be expressed as follows, where, in the result, Eq. (4b) is used.

$J^{-1} = \dfrac{\partial(v_1,\dots,v_n)}{\partial(b_1,\dots,b_n)}$

For the special case when $n = 1$, the transformation in Eq. (4a) is written in Eq. (6a).

(6a)  $a_1 = v_1(b_1,\dots,b_k)$;  $a_i = b_i$, $i = 2,\dots,k$

If conditions A and B are satisfied, the unique inverse transformation given in Eq. (6b) exists.
(6b)  $b_1 = w_1(a_1,\dots,a_k)$;  $b_i = a_i$, $i = 2,\dots,k$

The Jacobian of this transformation becomes the following.

$J = \dfrac{\partial w_1(a_1,\dots,a_k)}{\partial a_1} = \left[\dfrac{\partial v_1(b_1,a_2,\dots,a_k)}{\partial b_1}\right]^{-1}$

In this expression, $b_1$ is replaced, after differentiation, by $w_1(a_1,\dots,a_k)$. Then, Eq. (5) becomes the following.

$f_{\bar A}(\bar a)\,d\bar a = |J|\,f_{B_1,B_2,\dots,B_k}(w_1(a_1,\dots,a_k),a_2,\dots,a_k)\,d\bar a$

The (marginal) distribution of $A_1$ is then found by integration as shown in Eq. (7).

(7)  $f_{A_1}(a_1) = \int_{a_n} \cdots \int_{a_2} f_{\bar A}(\bar a)\,da_2 \cdots da_n$

The extension to the case when $n > 1$, but $n \ne k$, is obvious.

A second technique for finding the joint distribution of the A's in Eq. (2) has been discussed by Middleton (49, p. 21). The characteristic function of the vector variable $\bar A$ is expressed in terms of that for $\bar B$. The joint density function of $\bar A$ is then found by an inverse Fourier transform.

In almost all applications of the above techniques, the resulting density functions are of non-standard form, and are not tabulated. However, in a few special cases, they are of recognizable form. For example, the sum of independent, normally distributed random variables is itself normally distributed. The chi-squared, Student's t and F distributions were all derived from certain combinations of random variables. However, these special forms have only a limited application to the present problem.

Monte Carlo techniques for finding complete distributions

As applied to the problem at hand, "Monte Carlo" is a synonym for "sampling by computer". A method using this approach for approximating the density function of a known function of random variables has been presented by Hellerman and Racite (17). Their procedure, called "synthetic sampling", approximates the density function of $A$ in Eq. (8).

(8)  $A = v(B_1,\dots,B_k)$

The cumulative distribution function for each of the random variables $B_1,\dots,B_k$ is assumed known. Each $B$ is a continuous random variable so that there exists a unique inverse of these distribution functions for any number between 0 and 1.
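The inverse of a cumulative distribution function mentioned here can be made concrete for one simple case. The sketch below assumes an exponentially distributed variable with a hypothetical rate parameter; the function names and numbers are illustrative assumptions, and Python's built-in pseudo-random generator stands in for the random number generator of the text.

```python
import math
import random


def exponential_cdf(b, rate):
    """F(b) = 1 - exp(-rate * b) for b >= 0, and 0 otherwise."""
    return 1.0 - math.exp(-rate * b) if b >= 0.0 else 0.0


def exponential_inverse_cdf(r, rate):
    """Unique b with F(b) = r, valid for 0 <= r < 1."""
    return -math.log(1.0 - r) / rate


rate = 0.02                # assumed rate, so the mean is 1/rate = 50
rng = random.Random(0)     # seeded so the run is repeatable

# random() is uniform on [0, 1); each draw is mapped through the inverse
# distribution function to give one sample of the random variable.
samples = [exponential_inverse_cdf(rng.random(), rate) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)
```

For distributions without a closed-form inverse, the same idea applies with a numerical root search or a tabulated inverse.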
A random number generator¹ is assumed to be programmed in the digital computer. The first computational step is obtaining $k$ numbers $r_1,\dots,r_k$ from the random number generator. Each of these is used to compute one inverse of a cumulative distribution function so that $k$ numbers $b_1,\dots,b_k$ are found, where $b_i = F_i^{-1}(r_i)$. That is, a random sample from the distribution of each $B_i$ is obtained. One value of $A$ is then computed from the b's using Eq. (8). The whole process is repeated as often as is necessary until a number of values of $A$ are obtained. Each is interpreted as a random sample from the distribution of $A$. These samples are then grouped in intervals on the axis representing values of $A$. The proportion of numbers in each interval is plotted in histogram fashion to approximate the density function of $A$.

¹ A random number generator is a set of computations which will produce random samples from a distribution that is uniform on the closed interval [0,1]. They have been discussed, for example, by Taussky and Todd (50).

Some generalizations can be made on the technique described above. The B's need not be continuous random variables. If one is discrete, its cumulative distribution function is a step function with jumps at each point for which the density function is non-zero. A one-to-one correspondence between the saltus at each jump and a subinterval of the [0,1] interval can be made. Thus, these subintervals can be related to mass points of the distribution.

Obviously, if the distribution of more than one function of the B's is desired, each set of numbers $(r_1,\dots,r_k)$ is used to compute a value for all functions of interest.

Techniques for finding measures of a distribution

In situations where the complete distribution of a function is not easily found, certain measures of the distribution can be useful in a failure analysis. Some of these measures and methods for estimating them are discussed in this subsection.

Moments

The moments, $\mu_1, \mu_2, \dots$
of a distribution having cumulative distribution function $F$ are defined in Eq. (9).

(9)  $\mu_j = \int_{-\infty}^{\infty} x^j\,dF(x)$

If $\mu_0 = 1$, and if the moments $\mu_1, \mu_2, \dots$ corresponding to a distribution function $F(x)$ are all assumed to be finite, and if the series shown below is absolutely convergent for some $r > 0$, then $F(x)$ is the only distribution function having these moments.

$\sum_{j=0}^{\infty} (\mu_j/j!)\,r^j$

If $C(s)$ is the characteristic function for the distribution defined by $F(x)$, then $C(s)$ is given by the following, where $i$ is the complex operator.

$C(s) = \sum_{j=0}^{\infty} (\mu_j/j!)\,(is)^j$

These relationships between the moments, distribution function, and characteristic function were pointed out by Cramér (48, p. 174).

Tchebycheff's inequality

It is seldom easy to find or estimate all the moments of a distribution. Fortunately, many reliability computations require only an estimate of the percentage of the density function which is within certain limits. Such estimates can be made from approximations to the mean and variance of a distribution by means of the Tchebycheff inequality given in Eq. (10), which has been discussed by Cramér (48, p. 183).

(10)  $P(|Y - m| \ge ks) \le 1/k^2$, $k > 0$

In Eq. (10), $Y$ is a random variable having (finite) mean $m$ and variance $s^2$. Thus, the probability that the value of a random variable lies in the range $m \pm 4.5s$ is at least 0.95, no matter what the distribution of the random variable, as long as it has finite mean and variance.

The estimate provided by the inequality in Eq. (10) is, in many cases, quite conservative. For example, if $Y$ has a normal distribution, the following is true.

$P(|Y - m| \ge 3s) \approx 0.0027$

However, Eq. (10) produces an estimate of 0.111. Nevertheless, Eq. (10) provides one of the only analytical means for using estimates of $m$ and $s$ to place bounds on probabilities.

Estimation of the mean and variance

Methods for estimating the mean, $m$, and variance, $s^2$, for a random variable $A$ of the form shown in Eq. (8) are now considered.
The first method to be discussed is called the "propagation of error" method. A very thorough review of this technique has been given by Murphy (51), in which the relationships between cumulants of $A$ and the B's are studied. A simplified form has been presented by Heyne (52) for functions of two variables.

The following results are derived in Appendix E. The mean of $A$ is estimated from Eq. (E.5), in which $m_i = E(B_i)$.

(E.5)  $m \doteq v(m_1,\dots,m_k)$

The variance of $A$ is estimated from Eq. (E.6), where $s_r^2$ is the variance of $B_r$ and $v_r$ is the partial derivative of $v$ with respect to $B_r$, evaluated at the means of the B's.

(E.6)  $s^2 \doteq \sum_{r=1}^{k} v_r^2(m_1,\dots,m_k)\,s_r^2$

Equation (E.6) rests on the following three assumptions.

1) Equation (E.5) is satisfied.
2) The B's are statistically independent.
3) The higher order terms in Eq. (E.1) are neglected.

If condition 2 is not satisfied, the covariance terms in Eq. (E.4) must be added. If condition 3 is not satisfied, covariance terms resulting from the higher order terms in the Taylor expansion in Eq. (E.1) must also be added. If condition 1 is not satisfied, Eq. (E.6) is essentially useless. In this case, one must resort to numerical integration of the expression for the mean and second moment, since for any random variable $A$, the variance, if it exists, is given by the following expression.

$s^2 = \mu_2 - m^2$

The discussion of a Monte Carlo technique in the previous subsection suggests another possible approach to the problem of approximating $m$ and $s^2$. Since random samples of $A$ are obtained in the Monte Carlo technique, methods of statistical inference could be used for estimating $m$ and $s^2$. A review of such techniques is beyond the scope of this thesis.

Linear functions of independent random variables

The very important special case of linear functions of independent random variables is now discussed. Of course, all techniques described previously also apply here, but some special techniques are of interest.
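The propagation of error estimates in Eq. (E.5) and Eq. (E.6) can be compared with estimates obtained by synthetic sampling. The sketch below does this for an assumed function of two independent, normally distributed part parameters; the function (the parallel combination of two resistances), the means and variances, the central-difference approximation to the partial derivatives and all function names are illustrative assumptions rather than material from the thesis.

```python
import random


def partial_derivatives(v, means, rel_step=1e-5):
    """Central-difference estimates of v_r evaluated at the means."""
    grads = []
    for r, m_r in enumerate(means):
        step = rel_step * max(1.0, abs(m_r))
        up, down = list(means), list(means)
        up[r] = m_r + step
        down[r] = m_r - step
        grads.append((v(*up) - v(*down)) / (2.0 * step))
    return grads


def propagation_of_error(v, means, variances):
    """Eq. (E.5) for the mean and Eq. (E.6) for the variance."""
    m_est = v(*means)
    grads = partial_derivatives(v, means)
    var_est = sum(g * g * s2 for g, s2 in zip(grads, variances))
    return m_est, var_est


def synthetic_sampling(v, means, variances, n_samples, seed):
    """Sample mean and variance of A from independent normal B's."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_samples):
        b = [rng.gauss(m, s2 ** 0.5) for m, s2 in zip(means, variances)]
        values.append(v(*b))
    mean = sum(values) / n_samples
    var = sum((a - mean) ** 2 for a in values) / (n_samples - 1)
    return mean, var


def parallel_combination(b1, b2):
    # Assumed example function v(B1, B2).
    return b1 * b2 / (b1 + b2)


means = [100.0, 200.0]   # assumed m_1, m_2
variances = [4.0, 16.0]  # assumed s_1^2, s_2^2

m_est, var_est = propagation_of_error(parallel_combination, means, variances)
mc_mean, mc_var = synthetic_sampling(parallel_combination, means, variances, 50000, seed=1)
```

Here the standard deviations are only 2 percent of the means, so the neglected higher order terms are small and the two approaches agree closely; with wider spreads the difference between them gives a rough indication of how much assumption 3 matters.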
The importance of this case stems from the fact that non-linear functions can sometimes be approximated by linear functions, using Eq. (E.1) as shown in Eq. (11).

(11)  $v(B_1,\dots,B_k) \doteq J_0(\bar m) + \sum_{r=1}^{k} v_r(\bar m)\,B_r$

The following notation is used in Eq. (11).

$J_0(\bar m) = v(m_1,\dots,m_k) - \sum_{r=1}^{k} m_r v_r(m_1,\dots,m_k)$

$v_r(\bar m) = v_r(m_1,\dots,m_k)$

The $v_r(\bar m)$ factors are sometimes called sensitivity terms since they can also be obtained from the differential of $v$.

For convenience, the function shown in Eq. (12), rather than that in Eq. (11), will be discussed. In Eq. (12), the $\zeta_i$ are mutually independent random variables.

(12)  $A = \sum_{i=1}^{k} \zeta_i$

Two analytical techniques for finding the distribution of $A$ have been presented by Cramér (48, p. 158). The first is concerned with characteristic functions. If $C_i(s)$ is the characteristic function of $\zeta_i$ and $C_A(s)$ that of $A$, then they are related by Eq. (13).

(13)  $C_A(s) = \prod_{i=1}^{k} C_i(s)$

This method has been used, for example, by Depian and Grisamore (22).

The second method uses the convolution integral, shown symbolically in Eq. (14).

(14)  $F_A(x) = F_1(x) * F_2(x) * \cdots * F_k(x)$

In Eq. (14), $F_i(x)$ is the cumulative distribution function of $\zeta_i$. The star symbols are interpreted as follows.

$F_1(x) * F_2(x) = \int_{-\infty}^{\infty} F_1(x - z)\,dF_2(z) = F_2(x) * F_1(x)$

It must be emphasized that Eq. (13) and Eq. (14) are valid only if the $\zeta_i$ are independent random variables, related to $A$ by Eq. (12).

An interesting method for approximating the distribution of $A$ in Eq. (12) has been presented by Gray (53) for the case when all $\zeta_i$ are continuous random variables. Piecewise linear approximations to the density functions of the $\zeta$'s are made and an algorithm for approximating the density function of $A$, section by section, is given.

Time variations

Methods for finding the joint distribution of a set of operational part parameters, knowing their drift characteristics, are presented in this section.
If this joint distribution is derived for every non-failed state of the system at time $t$, the techniques of the previous section can be applied to find the joint distribution of the system parameters, and $Q(t)$ can be computed from Eq. (10). The drift phenomenon is described qualitatively in the first subsection. The second and third subsections treat independent and dependent drift, respectively.

The drift phenomenon in a part parameter

As outlined by Golubjatnikov (54), the variations in the value of a part parameter due to drift can be segregated into three categories, viz., initial variations, short term drift, and long term drift.

Initial variations account for differences in the values of several new part parameters from the same lot. Such variations are caused by the manufacturing process and are usually centered about a rated, or nominal, value. A normal distribution is often used to describe this variation, with the mean as the nominal value.

Short and long term drift describe non-catastrophic changes in the value of a part parameter in time. Long term drift is a gradual deterioration resulting in eventual failure. The limiting or end-of-life value depends on the particular system and on the tolerance sets assigned to system parameters. Short term drift, on the other hand, is a random fluctuation about some norm, caused by random changes in the environment. This norm includes the effects of long term drift. The percentage of variation caused by short term drift is usually smaller than that caused by long term drift.

Independent drift

Independent drift refers to the situation when the drift characteristics of the operational part parameters, at any time, depend only on the length of time the system has operated. That is, the environment which causes the drift does not change as long as the system is operating. The sets of operational part parameters are, at any time, assumed to be statistically independent random variables.
If $B_x^*(t) = (B_{r_1}(t),\ldots,B_{r_h}(t))$ is a set of operational part parameters, their joint density function, for all $t$ such that $X(t) = x$, is shown in Eq. (15).

$$f_{B_x^*}(b;t) = \prod_{i=1}^{h} f_{B_{r_i}}(b_i;t) \tag{15}$$

The drift characteristics for a single operational part parameter $B_i(t)$ will now be discussed. These characteristics are expressed as the sum of two random variables in Eq. (16).

$$B_i(t) = B_i(0) + B_i^{(d)}(t) \tag{16}$$

The system is assumed to be turned on at $t = 0$ so that, in Eq. (16), $B_i(0)$ is a random variable representing initial variation. The random variable $B_i^{(d)}(t)$ represents the drift of $B_i(t)$ from its initial value at the end of a time interval of length $t$ and includes both short and long term drift. For any time $t$ for which $B_i(t)$ is operational, $B_i(0)$ and $B_i^{(d)}(t)$ are assumed to be statistically independent, an assumption that has been made by Xavier et al. (55). Two methods for determining the distributions of the separate terms in Eq. (16) are now considered.

Empirical method

The first method relies upon measured data and is used to find the distribution of $B_i(t)$ at one particular time. This method has been suggested by Xavier et al. (55) and is now quantified in terms of the General Model.

To find the distribution of $B_i(0)$, a group of components from the same lot are placed on test simultaneously. The value of $B_i(0)$ is measured on all components and the data are plotted in histogram fashion as shown in Fig. 13. The ordinate of Fig. 13 represents the percentage of values of $B_i(0)$ falling within the various intervals. A (discrete) density function can be generated from the histogram as follows. The possible values of $B_i(0)$ are denoted by $b_{ij}$.

Fig. 13.--Histogram of measured data

$$b_{ij} = \tfrac{1}{2}(e_j + e_{j-1}), \quad j = 1,\ldots,K$$

If $p_{ij}$ is the ordinate of the histogram for the interval $(e_{j-1}, e_j)$, then the density function for $B_i(0)$ is given in Eq. (17).

$$p_{B_i}(b_i;0) = \begin{cases} p_{ij} & \text{if } b_i = b_{ij},\ j = 1,\ldots,K \\ 0 & \text{otherwise} \end{cases} \tag{17}$$

The distribution of the change from the initial value at time $t$, $B_i^{(d)}(t)$, is often known from specifications. If it is not, the above technique can be used to measure it. In any event, the (discrete) density function for this distribution is shown in Eq. (18).

$$p_{B_i}^{(d)}(b_i;t) = \begin{cases} p_{ij}^{(d)} & \text{if } b_i = b_{ij}^{(d)},\ j = 1,\ldots,k \\ 0 & \text{otherwise} \end{cases} \tag{18}$$

The distribution of $B_i(t)$ at time $t$ is found by using Eq. (16), Eq. (17), and Eq. (18) with Eq. (14). The resulting density function is shown in Eq. (19).

$$p_{B_i}(b_i;t) = \begin{cases} \displaystyle\sum_{j=1}^{K} p_{B_i}^{(d)}(b_i - b_{ij};t)\, p_{ij} & \text{if } b_i = b_{ij} + b_{ir}^{(d)},\ j = 1,\ldots,K;\ r = 1,\ldots,k \\ 0 & \text{otherwise} \end{cases} \tag{19}$$

Analytical method

If the variation in time of the pointwise reliability, $Q(t)$, is desired, the above method cannot be used; rather, the distributions of $B_i(0)$ and $B_i^{(d)}(t)$ are approximated by continuous distributions which are assigned from experience. The distribution parameters for the assigned distributions are assumed to be known functions of time. For instance, both $B_i(0)$ and $B_i^{(d)}(t)$ might be assigned normal distributions; the mean and variance of $B_i(0)$ would be constant while those of $B_i^{(d)}(t)$ could be monotonically increasing functions of time. Equation (14) is again used to find the distribution of $B_i(t)$, as shown by the density function in Eq. (20).

$$f_{B_i}(b_i;t) = \int_{-\infty}^{\infty} f_{B_i}^{(d)}(b_i - z;t)\, f_{B_i}(z;0)\, dz \tag{20}$$

Example

As an example, the pointwise reliability, $Q(t)$, is computed for the d-c power supply system shown in Fig. 14. The following assumptions are made.

i) Neither voltage supply can drift; the drift of $B_1(t)$ and $B_2(t)$ is independent.
ii) Failure of any part parameter removes the corresponding supply from the system; one failure mode is allowed.
iii) The system is loaded.

Fig. 14 circuit labels: $B_1(t) = g_1(t)$, $B_2(t) = g_2(t)$, $B_3(t) = e_1$,
$B_4(t) = e_2$, $R_L = 50$ ohms, $e_1 = e_2 = e = 100$ volts.

State table (each row lists the four part parameter state values followed by the system state):

0 0 0 0    0
0 0 0 1    1
0 0 1 0    2
0 0 1 1    3
0 1 0 0    4
0 1 1 0    3
1 0 0 0    8
1 0 0 1    3
1 1 0 0    3

$Z_0 = (0,1,2,4,8)$, $Z_1 = (3)$

Transition diagram: not recoverable from the source.

Fig. 14.--Power supply system and corresponding state table and transition diagram.

iv) The load current, $i_L$, is the only system parameter.

The system parameter is expressed in terms of the part parameters as follows, using the notation of Eq. (II.3).

$$h_0(B(t)) = \frac{e\,[B_1(t) + B_2(t)]}{1 + R_L[B_1(t) + B_2(t)]}$$

$$h_1(B(t)) = h_4(B(t)) = \frac{e\,B_2(t)}{1 + R_L B_2(t)}$$

$$h_2(B(t)) = h_8(B(t)) = \frac{e\,B_1(t)}{1 + R_L B_1(t)}$$

System state variable

The distribution of the system state variable for the loaded system is stated below.

$$p_X(0;t) = p_{00}(t) = \exp(-q_0 t)$$

$$p_X(j;t) = p_{0j}(t) = \frac{q_{0j}}{q_j - q_0}\left[\exp(-q_0 t) - \exp(-q_j t)\right], \quad j = 1,2,4,8$$

The numerical failure rates are given below in dimensions of (hours)$^{-1}$, using the notation of Eq. (C.1).

$$q_{01} = H_{01} = 2\cdot 10^{-5}, \quad q_{02} = H_{02} = 3\cdot 10^{-5}, \quad q_{04} = H_{03} = 10^{-4}, \quad q_{08} = H_{04} = 5\cdot 10^{-5}$$

$$q_0 = H_{01} + H_{02} + H_{03} + H_{04} = 2\cdot 10^{-4}$$

$$q_1 = H_{12} + H_{14} = q_4 = H_{42} + H_{44} = 3.5\cdot 10^{-4}$$

$$q_2 = H_{21} + H_{23} = q_8 = H_{81} + H_{83} \quad \text{[value illegible in source]}$$

The value of this density function for the states in $Z_0$ is given below at different times.

t (hours)    0    100       500       1000     5000
p_X(0;t)     1    0.9802    0.9048    0.8187   0.3679
p_X(1;t)     0    0.00195   0.00863   0.0152   0.0265
p_X(2;t)     0    0.00126   0.0131    0.0229   0.0397
p_X(4;t)     0    0.00970   0.0432    0.0760   0.1327
p_X(8;t)     0    0.00210   0.02185   0.0382   0.0662

Drift analysis

The drift properties of the two conductances are assumed to be the same and are given below, using the notation of Eq. (16). Both $B_i(0)$ and $B_i^{(d)}(t)$, $i = 1,2$, have normal distributions, the former with mean $m_0$ and standard deviation $s_0$ and the latter with mean $m_d(t)$ and standard deviation $s_d(t)$. The following values are in mhos.

$$m_0 = 1, \quad s_0 = 0.0333$$

$$m_d(t) = 0.5\left[\exp(-2\cdot 10^{-3}\, t) - 1\right]$$

$$s_d(t) = 0.05\left[1 - \exp(-8\cdot 10^{-3}\, t)\right]$$

The mean decreases and the standard deviation increases in time.
For all values of time, the $3s$ limits are positive so that negative values of conductance are, for all practical purposes, eliminated.

The tolerance set is taken as $S(t) = (1.95, \infty)$. To compute $Q(t)$, the integrals in Eq. (10) are expressed in terms of cumulative distribution functions.

$$\int_{1.95}^{\infty} f_{A/X}(a/x;t)\, da = 1 - F_{A/X}(1.95/x;t)$$

In state 0, $A(t) = h_0(t)$. The sum $B_1(t) + B_2(t)$ has a normal distribution with mean $m(t)$ and standard deviation $s(t)$.

$$m(t) = 2[m_0 + m_d(t)]$$

$$s(t) = \left[2 s_0^2 + 2 s_d^2(t)\right]^{1/2}$$

The cumulative distribution function of $A$ at any time $t$ can thus be expressed as follows.

$$1 - F_{A/X}(1.95/0;t) = 1 - P\left[\frac{B_1(t) + B_2(t) - m(t)}{s(t)} \le \frac{0.78 - m(t)}{s(t)}\right] = \Phi\left[\frac{m(t) - 0.78}{s(t)}\right] \equiv w_0(t)$$

In the above expression, $\Phi$ is the cumulative distribution function of the standardized normal distribution, for which tables are available.

Since the drift properties of the conductances are identical, the values of the integrals in Eq. (10) are the same for states 1, 2, 4, 8.

$$1 - F_{A/X}(1.95/j;t) = \Phi\left[\frac{\tfrac{1}{2} m(t) - 0.78}{\left(\tfrac{1}{2}\right)^{1/2} s(t)}\right] \equiv w_1(t), \quad j = 1,2,4,8$$

Numerical values for $w_0(t)$ and $w_1(t)$ are tabulated below. The source gives only four of the five entries for $w_0(t)$.

t         0        100      500      1000     5000
w_0(t)    1.0000   1.0000   1.0000   1.0000
w_1(t)    1.0000   0.9986   0.0281   0.0002   0.0000

Finally, $Q(t)$ is computed from Eq. (10) as shown below.

$$Q(t) = w_0(t)\, p_X(0;t) + w_1(t)\left[p_X(1;t) + p_X(2;t) + p_X(4;t) + p_X(8;t)\right]$$

t       0    100      500      1000     5000
Q(t)    1    0.9952   0.9074   0.8218   0.3679

It should be noted that if the catastrophic failure effects were neglected, the operational effectiveness of the system would be judged by $w_0(t)$; if drift were neglected, it would be judged by the reliability, $R(0,t)$, which is the sum of the density function values of the system state variable at $t$ for all non-failed states. In particular, $R(0, 5000) = 0.6330$. Both these procedures give optimistic predictions when compared to $Q(t)$.

Dependent drift

The second type of drift to be discussed is called dependent drift.
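The pointwise reliability computation of the preceding example can be sketched in a few lines. This is an illustration only: the function names are hypothetical, the circuit constants and the tolerance limit are the ones quoted in the example, and the state probabilities and the value of $w_1$ passed in at the end are the tabulated entries for $t = 100$ hours rather than values computed from a drift model.

```python
import math

E_VOLTS = 100.0    # supply voltage e from the example
R_LOAD = 50.0      # load resistance R_L in ohms
I_MIN = 1.95       # lower limit of the tolerance set S(t) = (1.95, inf)


def load_current(conductance):
    """System parameter h(B) = e*G / (1 + R_L*G) for total conductance G."""
    return E_VOLTS * conductance / (1.0 + R_LOAD * conductance)


def conductance_threshold(i_min):
    """Smallest conductance G for which load_current(G) exceeds i_min."""
    return i_min / (E_VOLTS - R_LOAD * i_min)


def std_normal_cdf(z):
    """Cumulative distribution function of the standardized normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_in_tolerance(mean_g, std_g):
    """P(load current > I_MIN) when the conductance is normal(mean_g, std_g)."""
    return std_normal_cdf((mean_g - conductance_threshold(I_MIN)) / std_g)


def pointwise_reliability(w0, w1, p_state0, p_single_supply):
    """Q(t) = w0*p_X(0;t) + w1*sum of p_X(j;t) over the one-supply states."""
    return w0 * p_state0 + w1 * sum(p_single_supply)


# Tabulated entries for t = 100 hours, taken from the example.
Q_100 = pointwise_reliability(1.0, 0.9986, 0.9802,
                              [0.00195, 0.00126, 0.00970, 0.00210])

if __name__ == "__main__":
    print(conductance_threshold(I_MIN))
    print(prob_in_tolerance(1.0, 0.0333))  # one conductance at t = 0
    print(Q_100)
```

The threshold function reproduces the value 0.78 used in the expressions for $w_0(t)$ and $w_1(t)$, since a load current above 1.95 is equivalent to a total conductance above 0.78 mho.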
In this case, the drift characteristics of the part parameters depend not only on which states the system assumes but also on the length of time the system remains in each state. For any $x \in Z_0$ and any $t$, Eq. (15) is again assumed to be satisfied so that the drift properties of each part parameter can be investigated separately.

To find the distribution of an operational part parameter, given that the system is in state $x \in Z_0$, each possible sequence of states through which the system can pass, starting at state 0 and ending at state $x$, must be investigated. This is accomplished by defining an auxiliary random variable, called the sequence variable and denoted by $Y_x$, for each state of the system in $Z_0$.

A system is said to assume sequence $(\mathbf{x},\mathbf{t};x,t)$, where $\mathbf{x} = (x_1,\ldots,x_N)$ and $\mathbf{t} = (t_0,t_1,\ldots,t_N)$, if the state-to-state transitions, beginning in state 0 and ending in state $x$, occur in the following order. The system goes from state 0 to state $x_1$ at time $t_0$, from $x_1$ to $x_2$ at $t_1$, and so on, from $x_{N-1}$ to $x_N$ at $t_{N-1}$, and from $x_N$ to $x$ at $t_N$, and the system remains in state $x$ from $t_N$ to $t$, where $0 < t_0 < t_1 < \cdots < t_N < t$. The sequence variable is assigned a unique value for each sequence the system can assume.

Assuming that $B_i(t)$ is operational when the system is in state $x$, its distribution is expressed in terms of the sequence variable for state $x$ by Eq. (21).

$$f_{B_i/X}(b_i/x;t) = \sum_{y} \int_{\mathbf{t}} f_{B_i/Y_x}(b_i/y;t)\, f_{Y_x}(y;t)\, d\mathbf{t} \tag{21}$$

Each term in Eq. (21) represents one sequence which the system can assume in going from state 0 to state $x \in Z_0$. The value $y$ of $Y_x$ specifies the exact transition times for the sequence. Integration is over all possible transition times, $0 \le t_0 < \cdots < t_N < t$.

The conditional density term $f_{B_i/Y_x}$ is found from an extension of the independent drift development. The amount by which part parameter $B_i(t)$ drifts when the system is in state $j$ for a time interval of length $T$ is denoted by the random variable $B_{ji}^{(d)}(T)$.
This is the amount of change in $B_i(t)$ from its value when the system enters state $j$. If the system assumes the sequence $((x_1,\ldots,x_N),(t_0,\ldots,t_N);x,t)$, the relationship between the value of $B_i(t)$ at time $t$ and the drift variables is given by Eq. (22).

$$B_i(t) = B_i(0) + B_{0i}^{(d)}(t_0) + B_{x_1 i}^{(d)}(t_1 - t_0) + \cdots + B_{x_N i}^{(d)}(t_N - t_{N-1}) + B_{xi}^{(d)}(t - t_N) \tag{22}$$

The drift in each state is assumed to be independent of the value of $B_i(t)$ when the system enters that state. Thus, the random variables on the right of Eq. (22) are statistically independent for any set of times, $0 < t_0 < \cdots < t_N < t$.

If the system assumes the sequence used in Eq. (22), the distribution of $B_i(t)$ is found from Eq. (14) as shown by the density function in Eq. (23). For convenience, the density function of $B_{ji}^{(d)}(T)$ is written as $f_{ji}(z;T)$ and that of $B_i(0)$ as $f_i(z;0)$.

$$f_{B_i/Y_x}(b_i/y,\mathbf{t};t) = \int_{\mathbf{z}} f_{xi}\Big(b_i - \sum_{j=0}^{N+1} z_j;\ t - t_N\Big)\, f_x^{(d)}(\mathbf{z};\mathbf{t})\, d\mathbf{z} \tag{23}$$

The following notation is used in Eq. (23).

$$f_x^{(d)}(\mathbf{z};\mathbf{t}) = f_{x_N i}(z_{N+1};\ t_N - t_{N-1}) \cdots f_{0i}(z_1;t_0)\, f_i(z_0;0)$$

$$d\mathbf{z} = dz_{N+1}\, dz_N \cdots dz_1\, dz_0$$

The integration is over all values of the $z$'s for which the corresponding density functions are non-zero.

Since the drift variables in Eq. (23) are independent, the central limit theorem can be used to approximate the distribution of $B_i(t)$ by a normal distribution. If the distributions of the variables in Eq. (22) are near-normal, this would be a good approximation. However, for arbitrary distributions, the number of variables must be large before this approximation is valid.

The distribution of the sequence variable for a particular state is now discussed. The lifetime variables for all part parameters are assumed to have exponential distributions, which are given in Eq. (III.9) for non-loaded systems and in Eq. (C.1) for loaded systems. To minimize the notational complexity, each part parameter is assumed to have only one failure mode.
If the system assumes the sequence $((x_1,\ldots,x_N),(t_0,\ldots,t_N);x,t)$, the part parameter that fails at time $t_j$ is denoted by $B_{s_j}(t)$, $j = 0,1,\ldots,N$.

If the system is non-loaded, $f_{Y_x}(y;t)\,d\mathbf{t}$ is computed as follows.

i) The probability that $B_{s_j}(t)$ fails at time $t_j$, $j = 0,1,\ldots,N$, is the following.

$$\prod_{j=0}^{N} H_{s_j}\, \exp\Big(-\sum_{j=0}^{N} H_{s_j} t_j\Big)\, d\mathbf{t}$$

ii) The probability that no part parameter in $B_x^*(t)$ fails to time $t$ is the following.

$$\exp\Big(-t \sum_{j=1}^{h} H_{r_j}\Big)$$

Both events must be realized so that $f_{Y_x}(y;t)\,d\mathbf{t}$ is the product of the two terms.

If the system is loaded, the failure rates become functions of the states and the expression for $f_{Y_x}(y;t)\,d\mathbf{t}$ becomes more complex, as shown below.

i) The probability that $B_{s_j}(t)$ fails at time $t_j$ is the following, where $x_0$ represents state 0.

$$\prod_{j=0}^{N} H_{x_j s_j}\, \exp\Big(-\sum_{j=0}^{N} H_{x_j s_j} t_j\Big)$$

ii) The probability that no part parameter in $B_x^*(t)$ fails to time $t_0$ is the following.

$$\exp\Big(-t_0 \sum_{j=1}^{h} H_{0 r_j}\Big)$$

iii) The probability that no part parameter in $B_x^*(t)$ fails from time $t_j$ to $t_{j+1}$, $j = 0,1,\ldots,N-1$, is the following.

$$\exp\Big(-(t_{j+1} - t_j) \sum_{i=1}^{h} H_{x_j r_i}\Big)$$

iv) The probability that no part parameter in $B_x^*(t)$ fails from time $t_N$ to $t$ is the following.

$$\exp\Big(-(t - t_N) \sum_{i=1}^{h} H_{x r_i}\Big)$$

The product of all terms above is the required density function.

In stating the expressions above, it was assumed that no part parameter is removed from a system because of the failure of other part parameters. If this assumption is not satisfied, extra factors must be inserted into some of the terms in Step iii.

Conclusions

The failure analysis used to analytically predict the susceptibility of a redundant system to failure is dependent on three factors, viz., the failure state, the figures of merit, and the available time and money. The General Model furnishes a practical means for accomplishing a failure analysis at the level of sophistication and complexity that is dictated by these factors.
If only catastrophic failure data is available, the one-failure and two-failure algorithms permit direct computation of reliability when the system is non-loaded. If drift data is also available, the General Model partitions the failure analysis into a set of drift problems which are re-united on the basis of catastrophic failure tendencies. A higher level of complexity is introduced when the system is loaded and when the drift is dependent. Correspondingly, both the amount of required data and the analytical complexity increase.

Although only non-repairable systems are considered in this thesis, the theory of Markov chains used with loaded systems is directly adaptable to repairable systems. The difficulty of the drift problem is magnified in repairable systems, especially if the drift is dependent.

Two extensions of the work in this thesis would be of great value. Firstly, the distribution of the system state variable for loaded systems should be studied when the lifetimes do not have exponential distributions. Secondly, methods for finding the joint distribution of operational part parameters from their drift properties are needed when they are not statistically independent.

APPENDIX A

SOME BASIC DEFINITIONS FROM PROBABILITY THEORY

These definitions are taken from Doob (56).

Definition: Probability set function P

Let $\Omega$ be some basic space; let there be a certain collection of sets of points of $\Omega$ which are called measurable sets. The class of measurable sets is assumed to be a Borel field. A (set) function $P$ is assumed to be defined for all measurable sets and is a probability measure, i.e., $P$ is completely additive, non-negative, and $P(\Omega) = 1$. The number $P(A)$ is called the probability or the measure of the (measurable) set $A$ of points $\omega$, or $\omega$ points.
Definition: Random variable

A (real) function $x$, defined on a space of $\omega$ points, is called a (real) random variable if there exists a probability measure $P$ defined on the $\omega$ sets and if, for every real number $a$, the inequality $x(\omega) \le a$ delineates a measurable $\omega$ set; i.e., $F(a)$ is defined for all real $a$.

$$F(a) = P[x(\omega) \le a]$$

Thus, a (real) random variable is a (real) measurable function.

Definition: Stochastic process

A stochastic process is any family of random variables $[x(t),\ t \in T]$. The notation used is that $x(t)$ is the observation at time $t$ and $T$ is the time range involved. The term "stochastic process" is usually applied only when the process involves infinitely many random variables. Historically, this term has been reserved for families of random variables with some simple relationship between the variables.

Definition: Sample function of a stochastic process

If $[x(t),\ t \in T]$ is a stochastic process, a function of $t \in T$ obtained by fixing $\omega$ in $x(t,\omega)$ and letting $t$ vary is called a sample function of the process, where $x(t,\omega)$ is the value of the random variable $x(t)$ at the point $\omega$.

Definition: Markov process and Markov chain

A (strict sense) Markov process is a stochastic process $[x(t),\ t \in T]$ which satisfies the following condition. For any integer $n \ge 1$, if $t_1 < t_2 < \cdots < t_n$ are parameter values, the conditional probability distribution of $x(t_n)$ relative to $x(t_1),\ldots,x(t_{n-1})$ is the same as that relative to $x(t_{n-1})$ in the sense that for each (real) $a$, the following is satisfied with probability 1.

$$P[x(t_n) \le a \,/\, x(t_1),\ldots,x(t_{n-1})] = P[x(t_n) \le a \,/\, x(t_{n-1})]$$

Whenever Markov processes are discussed, strict sense is assumed. If the values of $x(t)$ for any $t$ are discrete in nature, the Markov process is called a Markov chain.

Definition: Separable stochastic processes

Let $[x(t),\ t \in T]$ be a real stochastic process with linear parameter set $T$. Let $\mathcal{A}$ be a system of linear Borel sets.
Then, the process is called separable relative to $\mathcal{A}$ if the following condition is true. Let $\{t_j\}$ be a sequence of parameter values; let $\Lambda$ be an $\omega$ set of probability zero; let $I$ be any open interval. Then, the $\omega$ sets

$$[\omega: x(t,\omega) \in B,\ t \in IT], \qquad [\omega: x(t_j,\omega) \in B,\ t_j \in IT],$$

where $B \in \mathcal{A}$, differ by at most a subset of $\Lambda$. If $\mathcal{A}$ is the class of all (finite or infinite) closed intervals, the process is called separable.

If $T$ is an infinite sequence, the stochastic process $[x(t),\ t \in T]$ is called a discrete random process or a random sequence. The general sample is a sequence. If $T$ is an interval, the stochastic process is called a continuous parameter process. The general sample is a function of $t$ defined on an interval.

Markov chains which are random sequences are studied in detail by Feller (57). Both random sequences and continuous parameter Markov processes are considered by Bartlett (58). The main interest here is the continuous parameter process, considered in Appendix B.

APPENDIX B

CONTINUOUS PARAMETER MARKOV CHAINS WITH A FINITE NUMBER OF STATES

This discussion is taken mainly from Doob (56).

Let $[x(t),\ 0 \le t < \infty]$ be a (continuous parameter) Markov chain such that the random variables $x(t)$ assume the values $1,2,\ldots,N$. Each value is called a state of the chain. If $P(x(s,\omega) = i) > 0$, the probability, given state $i$ at time $s$, that state $j$ is found at time $t$ is called a transition probability and written as $p_{ij}(s,t)$.

$$p_{ij}(s,t) = P(x(t,\omega) = j \,/\, x(s,\omega) = i), \quad t \ge s$$

Two fundamental properties of transition probabilities are stated in Eq. (B.1) and Eq. (B.2). Let $P(x(s,\omega) = i) > 0$.

$$p_{ij}(s,t) \ge 0, \qquad \sum_j p_{ij}(s,t) = 1 \tag{B.1}$$

$$p_{ik}(s,u) = \sum_j p_{ij}(s,t)\, p_{jk}(t,u), \quad 0 \le s < t < u \tag{B.2}$$

Equation (B.2) is a special case of the Chapman-Kolmogorov equation. The sum is over those values of $j$ for which $p_{jk}(t,u)$ is defined.

Definition: Markov transition matrix function

Let $P(s,t)$ be the matrix with typical element $p_{ij}(s,t)$. If $P(s,t)$ satisfies Eq. (B.1) and Eq. (B.2), it is called a Markov transition matrix function.
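The two defining properties, Eq. (B.1) and Eq. (B.2), can be checked numerically for a simple case. The sketch below is an illustration under stated assumptions, not an example from the thesis: it uses a hypothetical two-state chain (index 0 = operating, index 1 = failed and absorbing) whose cumulative hazard is taken as $a t^2$ with an arbitrary value of $a$, so that the transition probabilities depend on both $s$ and $t$.

```python
import math

HAZARD_SCALE = 1.0e-7  # hypothetical value of a in the cumulative hazard a*t**2


def cumulative_hazard(t):
    """Assumed cumulative hazard of the operating state."""
    return HAZARD_SCALE * t * t


def transition_matrix(s, t):
    """P(s, t) for the two-state chain; rows and columns are 0 and 1."""
    survive = math.exp(-(cumulative_hazard(t) - cumulative_hazard(s)))
    return [[survive, 1.0 - survive],
            [0.0, 1.0]]


def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def satisfies_b1(p, tol=1e-12):
    """Eq. (B.1): non-negative entries and unit row sums."""
    return all(min(row) >= 0.0 and abs(sum(row) - 1.0) <= tol for row in p)


def max_abs_diff(a, b):
    """Largest elementwise difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


S, T, U = 100.0, 1000.0, 3000.0  # any times with 0 <= s < t < u
DIRECT = transition_matrix(S, U)
COMPOSED = matmul(transition_matrix(S, T), transition_matrix(T, U))  # Eq. (B.2)

if __name__ == "__main__":
    print(satisfies_b1(DIRECT), max_abs_diff(DIRECT, COMPOSED))
```

The composed matrix agrees with the direct one to rounding error, which is the content of Eq. (B.2) for this chain.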
A sufficient condition for the existence of a Markov process is now stated. Given a Markov transition matrix function, there exists a corresponding Markov process $[x(t),\ 0 \le t < \infty]$ which is obtained as follows.

i) Choose any initial probability distribution.

$$P(x(0,\omega) = i), \quad i = 1,2,\ldots,N$$

ii) For every finite $t$ set, $0 = t_0 < t_1 < \cdots < t_n$, define the following.

$$P(x(t_0,\omega) = a_0,\ x(t_1,\omega) = a_1,\ \ldots,\ x(t_n,\omega) = a_n)$$
$$= P(x(0,\omega) = a_0)\, P(x(t_1,\omega) = a_1 / x(0,\omega) = a_0) \cdots P(x(t_n,\omega) = a_n / x(t_{n-1},\omega) = a_{n-1})$$
$$= P(x(0,\omega) = a_0)\, p_{a_0 a_1}(0,t_1) \cdots p_{a_{n-1} a_n}(t_{n-1},t_n)$$

If, for each pair $(i,j)$, the transition probability $p_{ij}(s,t)$ depends only on $t - s$ whenever $P(x(s,\omega) = i) > 0$, then the Markov process is said to have stationary transition probabilities. The two fundamental properties given in Eq. (B.1) and Eq. (B.2) then become those in Eq. (B.3) and Eq. (B.4).

$$p_{ij}(t) \ge 0, \qquad \sum_j p_{ij}(t) = 1, \quad t > 0 \tag{B.3}$$

$$p_{ik}(s+t) = \sum_j p_{ij}(s)\, p_{jk}(t), \quad s,t > 0 \tag{B.4}$$

Definition: Stationary Markov transition matrix function

Let $P(t)$ be the matrix with typical element $p_{ij}(t)$. If $P(t)$ satisfies Eq. (B.3) and Eq. (B.4), it is called a stationary Markov transition matrix function.

A Markov chain defined in terms of a stationary Markov transition matrix function, with some initial probability distribution, will be called a stationary Markov chain. Only this type of chain will be discussed.

Solutions for the transition probabilities are discussed below. Two general approaches may be used. The first considers equations resulting from the differentiation of Eq. (B.4). The resulting solutions are usually presented in matrix form. The second method, summarized below, attacks the problem from the viewpoint of the sample function. It is felt that this second method leads to greater understanding because of its more direct probabilistic interpretation.

Definition: Step function

A function $g(t)$ will be called a step function if it satisfies all of the following conditions.
i) $g(t)$ has only finitely many points of discontinuity in every finite closed interval;
ii) $g(t)$ is identically constant in every open interval of continuity points;
iii) If $t_0$ is a point of discontinuity, either of the inequalities in Eq. (B.5) is satisfied.

$$g(t_0-) \le g(t_0) \le g(t_0+), \qquad g(t_0-) \ge g(t_0) \ge g(t_0+) \tag{B.5}$$

Definition: Jump of a function

A function $g(t)$ is said to have a jump at the point $t_0$ if it is discontinuous there, and the one sided limits $g(t_0-)$ and $g(t_0+)$ exist and satisfy one of the inequalities in Eq. (B.5).

Before considering solutions for the transition probabilities, some useful properties concerning stationary Markov chains are given. The following assumptions are used. If $P(t) = [p_{ij}(t)]$ is a stationary Markov transition matrix function, it is assumed that the condition in Eq. (B.6) is satisfied, where $p_{ij}(t)$ is continuous when $t = 0$.

$$\lim_{t \to 0} p_{ij}(t) = 1 \quad \text{for } i = j \tag{B.6}$$

The following conclusions are evident from Eq. (B.6).

i) $\lim_{t \to 0} p_{ij}(t) = 0$ for $i \ne j$.
ii) $p_{ij}(t)$ is continuous for all $t$.
iii) If $j \ne i$, $p_{ij}(t)$ either vanishes identically or never vanishes, except when $t = 0$.

Theorem B.1

Let $[p_{ij}(t)]$ be a stationary Markov transition matrix function. Then $\lim_{t \to \infty} p_{ij}(t)$ exists for all $i,j$ and the limit is approached exponentially fast.

Theorem B.2

Let $[p_{ij}(t)]$ be a stationary Markov transition matrix function satisfying Eq. (B.6). Then, the limit in Eq. (B.7) exists for all $i$.

$$\lim_{t \to 0} \frac{1 - p_{ii}(t)}{t} = q_i \tag{B.7}$$

Furthermore, if $[x(t),\ 0 \le t < \infty]$ is a separable process determined by $[p_{ij}(t)]$, together with an initial distribution, Eq. (B.8) is satisfied.

$$P[x(\tau,\omega) \equiv i,\ t_0 \le \tau \le t_0 + a \,/\, x(t_0,\omega) = i] = \exp(-q_i a) \tag{B.8}$$

Theorem B.3

Let $[p_{ij}(t)]$ be a stationary Markov transition matrix function satisfying Eq. (B.6). Then, the limits in Eq. (B.9) exist and satisfy Eq. (B.10).
$$\lim_{t \to 0} \frac{p_{ij}(t)}{t} = q_{ij}, \quad i \ne j \tag{B.9}$$

$$\sum_{j \ne i} q_{ij} = q_i \tag{B.10}$$

If, in addition, $[x(t),\ 0 \le t < \infty]$ is a separable process determined by $[p_{ij}(t)]$ together with an initial distribution, then there exists with probability 1 a sample function discontinuity which is a jump. In fact, there exists a first and a last discontinuity which are jumps.

Theorem B.4

The sample functions of a separable Markov chain with a finite number of states having stationary transition probabilities satisfying Eq. (B.6) are almost all step functions.

Equations (B.7), (B.8), and (B.9) express three crucial points. From Eq. (B.7), the probability of no transition is seen to be asymptotically linear as $t$ approaches 0. From Eq. (B.8), the probability of no transition in some interval is seen to be a decreasing exponential function of the length of the interval. Finally, Eq. (B.9) shows that the probability of transition from one state to any other is also asymptotically linear as $t$ approaches 0. These facts are used to derive the equations describing the transition probabilities.

Let ${}_1p_{ik}(t)$ be the probability, given $x(t_0,\omega) = i$, that $x(t_0+t,\omega) = k$ and the transition from $i$ to $k$ is accomplished in exactly one step. This implies, from Theorem B.4, that the sample function is a step function which takes on the two values $i$ and $k$ and that there is only one jump discontinuity in the sample function. This jump point is labelled $s$. Then, ${}_1p_{ik}(t)$ can be evaluated by the following considerations.

i) $x(\tau,\omega) \equiv i$ to the jump point $s$. From Eq. (B.8), the probability of this event is $\exp(-q_i s)$.
ii) There is a jump to $k$ at the jump point $s$. From Eq. (B.9), the probability of this is $q_{ik}\,ds$.
iii) $x(\tau,\omega) \equiv k$ from $s$ to $t$. From Eq. (B.8), this probability is $\exp[-q_k(t-s)]$.
iv) The probability of realizing all three of these events, for any $s$ in the interval $(0,t)$, is given below.
$${}_1p_{ik}(t) = \begin{cases} \displaystyle\int_0^t q_{ik}\, \exp[-q_i s - q_k(t-s)]\, ds, & k \ne i \\ 0, & k = i \end{cases}$$

Now let ${}_np_{ik}(t)$ be the probability, given $x(t_0,\omega) = i$, that $x(t_0+t,\omega) = k$ and that the transition from $i$ to $k$ has been accomplished in exactly $n$ steps. An induction process is used to derive this probability. However, two methods can be used to state the induction process, depending on how the sample function is viewed. In both methods, Eq. (B.11), which is based on Eq. (B.8), is used.

$${}_0p_{ik}(t) = \begin{cases} 0, & k \ne i \\ \exp(-q_i t), & k = i \end{cases} \tag{B.11}$$

The first jump method

Here, the sample function is viewed as having a first jump at $s$. The following considerations lead to the general term of the induction process.

i) The probability of no jump until $s$ is $\exp(-q_i s)$ from Eq. (B.8).
ii) The probability of a jump at $s$ from state $i$ to any state $j \ne i$ is $q_{ij}\,ds$ from Eq. (B.9).
iii) The probability of going from $j$ to the final state of interest $k$ in exactly $n$ steps is ${}_np_{jk}(t-s)$.
iv) The above three events can be realized for any state $j \ne i$.

Thus, the general term of the induction process is given by Eq. (B.12).

$${}_{n+1}p_{ik}(t) = \sum_{j \ne i} \int_0^t q_{ij}\, \exp(-q_i s)\, {}_np_{jk}(t-s)\, ds, \quad n \ge 0 \tag{B.12}$$

The last jump method

In this method, the last jump of the sample function is assumed to take place at $s$. The general term of the induction process is derived as shown below.

i) The probability of going from the initial state $i$ to any other state $j \ne k$ in exactly $n$ steps is ${}_np_{ij}(s)$.
ii) The probability of jumping to state $k$ from state $j \ne k$ at the last jump $s$ is $q_{jk}\,ds$.
iii) The probability of no jump between $s$ and $t$ is $\exp[-q_k(t-s)]$.
iv) The above events can be realized for any state $j \ne k$.

Thus, the general term of the induction process is given by Eq. (B.13).

$${}_{n+1}p_{ik}(t) = \sum_{j \ne k} \int_0^t q_{jk}\, \exp[-q_k(t-s)]\, {}_np_{ij}(s)\, ds, \quad n \ge 0 \tag{B.13}$$

Since, by Theorem B.4, almost all sample functions are step functions, a general expression for any stationary transition probability is given by Eq. (B.14).

$$p_{ik}(t) = \sum_{n=0}^{\infty} {}_np_{ik}(t) \tag{B.14}$$
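The one-step probability above has a simple closed form when $q_i \ne q_k$, obtained by carrying out the integration over the jump point $s$:

$${}_1p_{ik}(t) = \frac{q_{ik}}{q_k - q_i}\left[\exp(-q_i t) - \exp(-q_k t)\right]$$

The following sketch compares this closed form with a direct midpoint-rule evaluation of the integral. The rates and the time are hypothetical values chosen for illustration, and the function names are not from the thesis.

```python
import math


def one_step_closed_form(q_i, q_k, q_ik, t):
    """Closed form of the one-step probability; requires q_i != q_k."""
    return q_ik / (q_k - q_i) * (math.exp(-q_i * t) - math.exp(-q_k * t))


def one_step_numeric(q_i, q_k, q_ik, t, n=20_000):
    """Midpoint-rule integral over the jump point s in (0, t).

    The integrand is: stay in i until s, jump to k at s, stay in k until t.
    """
    h = t / n
    total = 0.0
    for m in range(n):
        s = (m + 0.5) * h
        total += q_ik * math.exp(-q_i * s - q_k * (t - s))
    return total * h


# Hypothetical rates in (hours)^-1 and a hypothetical time in hours.
Q_I, Q_K, Q_IK, T_END = 2.0e-4, 3.5e-4, 1.0e-4, 5000.0

if __name__ == "__main__":
    print(one_step_closed_form(Q_I, Q_K, Q_IK, T_END))
    print(one_step_numeric(Q_I, Q_K, Q_IK, T_END))
```

When $q_i = q_k$ the closed form is undefined as written and the limit $q_{ik}\, t \exp(-q_i t)$ would have to be used instead.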
The equations for either of the induction processes together with Eq. (B.14) provide an explicit algorithm by which, given the $q_i$'s and $q_{ij}$'s, the transition probabilities can be computed. Specifically, the two algorithms are represented by Eq. (B.15) and Eq. (B.16).

$$p_{ik}(t) = \delta_{ik} \exp(-q_i t) + \sum_{j \ne i} \int_0^t q_{ij}\, \exp(-q_i s)\, p_{jk}(t-s)\, ds \tag{B.15}$$

Equation (B.16) is obtained from Eq. (B.11), Eq. (B.13), and Eq. (B.14).

$$p_{ik}(t) = \delta_{ik} \exp(-q_i t) + \sum_{j \ne k} \int_0^t q_{jk}\, \exp[-q_k(t-s)]\, p_{ij}(s)\, ds \tag{B.16}$$

In Eq. (B.15) and Eq. (B.16), $\delta_{ik}$ is the Kronecker delta function defined below.

$$\delta_{ik} = \begin{cases} 1, & i = k \\ 0, & i \ne k \end{cases}$$

An alternate method of solution is found from the derivatives of Eq. (B.15) and Eq. (B.16).

The set of equations in Eq. (B.17), called the "backward system", results from differentiating Eq. (B.15).

$$p'_{ik}(t) = -q_i\, p_{ik}(t) + \sum_{j \ne i} q_{ij}\, p_{jk}(t); \quad i,k = 1,\ldots,N \tag{B.17}$$

The set of equations in Eq. (B.18), called the "forward system", results from differentiating Eq. (B.16).

$$p'_{ik}(t) = -p_{ik}(t)\, q_k + \sum_{j \ne k} q_{jk}\, p_{ij}(t); \quad i,k = 1,\ldots,N \tag{B.18}$$

The initial conditions for both systems are shown in Eq. (B.19).

$$p_{ij}(0) = \begin{cases} 1, & i = j \\ 0, & i \ne j \end{cases} \tag{B.19}$$

From Eq. (B.17) or Eq. (B.18) together with Eq. (B.19), it can be seen that the $q_i$'s and $q_{ij}$'s determine the transition probabilities uniquely. It can be shown that Eq. (B.17) and Eq. (B.18) with initial conditions in Eq. (B.19) have a unique solution satisfying Eq. (B.1) and Eq. (B.2) if Eq. (B.10) is satisfied.

The $q_i$'s and $q_{ij}$'s themselves must be known. However, they should be deducible either experimentally or theoretically using Eq. (B.7) and Eq. (B.9). That is, from Eq. (B.7), the probability of no transitions in time $dt$ is $1 - q_i\,dt$, where second order terms are neglected. Similarly, the probability of a transition from $i$ to $j$ in time $dt$ is $q_{ij}\,dt$, again neglecting second order terms.
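The forward system of Eq. (B.18), with the initial conditions of Eq. (B.19), can also be integrated numerically. The sketch below is one possible way to do so and is not a method given in the thesis: it writes the system in matrix form, with $-q_i$ on the diagonal and $q_{ij}$ off the diagonal, and advances it with a classical fourth-order Runge-Kutta step. The three-state chain and its rates are hypothetical, and states are indexed from 0 for convenience.

```python
import math


def rate_matrix(rates, n_states):
    """Matrix with q_ij off the diagonal and -q_i on it, using Eq. (B.10)."""
    q = [[0.0] * n_states for _ in range(n_states)]
    for (i, j), rate in rates.items():
        q[i][j] = rate
    for i in range(n_states):
        q[i][i] = -sum(q[i][j] for j in range(n_states) if j != i)
    return q


def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def add_scaled(x, y, c):
    """Elementwise x + c*y."""
    return [[xv + c * yv for xv, yv in zip(rx, ry)] for rx, ry in zip(x, y)]


def solve_forward(q, t_end, steps):
    """Integrate the forward system dP/dt = P*Q from P(0) = I (Eq. B.19)."""
    n = len(q)
    p = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    h = t_end / steps
    for _ in range(steps):
        k1 = matmul(p, q)
        k2 = matmul(add_scaled(p, k1, h / 2.0), q)
        k3 = matmul(add_scaled(p, k2, h / 2.0), q)
        k4 = matmul(add_scaled(p, k3, h), q)
        for i in range(n):
            for j in range(n):
                p[i][j] += h / 6.0 * (k1[i][j] + 2.0 * k2[i][j]
                                      + 2.0 * k3[i][j] + k4[i][j])
    return p


# Hypothetical chain: 0 -> 1, 0 -> 2, 1 -> 2, with state 2 absorbing.
RATES = {(0, 1): 1.0e-4, (0, 2): 1.0e-4, (1, 2): 3.5e-4}
Q = rate_matrix(RATES, 3)
P_5000 = solve_forward(Q, 5000.0, 1000)

if __name__ == "__main__":
    for row in P_5000:
        print(row)
```

For this chain the first row can be checked against the closed forms $p_{00}(t) = \exp(-q_0 t)$ and $p_{01}(t) = [q_{01}/(q_1 - q_0)][\exp(-q_0 t) - \exp(-q_1 t)]$, since state 1 can only be reached from state 0 in one step.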
APPENDIX C

THE APPLICATION OF MARKOV CHAINS TO THE ANALYSIS OF LOADED SYSTEMS

In this appendix, the $q$'s which describe a Markov chain are expressed in terms of the lifetime distributions of part parameters for loaded systems. The part parameters are assumed to have one failure mode. The lifetime density function $f_{ir}$ for part parameter $B_r(t)$ when the system is in state $i$ is assumed to be of the form shown in Eq. (C.1).

$$f_{ir}(z) = \begin{cases} H_{ir} \exp(-H_{ir} z) & \text{if } z \ge 0 \text{ and } i \in Z_0 \text{ and } X_r(t) = 0 \\ 0 & \text{if } z < 0 \end{cases} \tag{C.1}$$

[A portion of the source text is missing here.]

$$p_{ij}(t) = \sum_{n=0}^{\infty} {}_np_{ij}(t)$$

Since $i \ne j$, ${}_0p_{ij}(t) = 0$. Since the system has only a finite number of states and since a state cannot be reached after the system has left it, this sum has only a finite number of terms, say $N$. Then, $q_{ij}$ can be expressed as follows.

$$q_{ij} = \lim_{t \to 0} \frac{1}{t} \sum_{n=1}^{N} {}_np_{ij}(t) = \sum_{n=1}^{N} \lim_{t \to 0} \left[\frac{{}_np_{ij}(t)}{t}\right]$$

The term ${}_1p_{ij}(t)$ is computed below. The dummy variable $s$ is some time for which the system is in state $i$, but which is less than $t$.

i) The probability that the part parameter $B_{r_1}(t)$ fails at time $s$ is the following.

$$f_{ir_1}(s)\, ds = H_{ir_1} \exp(-H_{ir_1} s)\, ds$$

ii) The probability that no other part parameter fails to time $s$ is the following.

$$\prod_{m=2}^{n} \int_s^{\infty} f_{ir_m}(z)\, dz = \exp(-H_1 s), \qquad H_1 = \sum_{m=2}^{n} H_{ir_m}$$

iii) The probability that no other part parameter fails from $s$ to $t$ is the following.

$$\prod_{m=2}^{n} \int_{t-s}^{\infty} f_{jr_m}(z)\, dz = \exp[-H_2(t-s)], \qquad H_2 = \sum_{m=2}^{n} H_{jr_m}$$

These three events can be realized for any $s$.

$${}_1p_{ij}(t) = \int_0^t H_{ir_1} \exp(-H_3 s)\, \exp[-H_2(t-s)]\, ds$$

$$H_3 = H_{ir_1} + H_1 = \sum_{m=1}^{n} H_{ir_m} \ne H_2$$

$${}_1p_{ij}(t) = \frac{H_{ir_1}}{H_2 - H_3}\left[\exp(-H_3 t) - \exp(-H_2 t)\right]$$

The first term in the expression for $q_{ij}$ can now be found.

$$\lim_{t \to 0} \frac{{}_1p_{ij}(t)}{t} = \lim_{t \to 0} \frac{H_{ir_1}}{H_2 - H_3}\left[-H_3 \exp(-H_3 t) + H_2 \exp(-H_2 t)\right] = H_{ir_1} \tag{C.3}$$

The following can be shown.

$$\lim_{t \to 0} \frac{{}_np_{ij}(t)}{t} = 0 \quad \text{for } n \ge 2$$

Thus, $q_{ij}$ is given by Eq. (C.3).
$$q_{ij} = H_{ir_1}$$

That is, if non-failed state j can be reached from state i in one step, $q_{ij}$ is the failure rate of the part parameter whose failure would cause the transition from i to j.

If $j = Z_1$ and if j can be reached from i in one step, there may be a set of part parameters such that if any one fails, the system goes from i to $Z_1$. If this set of part parameters is $B_{r_1},\ldots,B_{r_v}$, $q_{ij}$ is given by Eq. (C.4).

$$q_{ij} = \sum_{m=1}^{v} H_{ir_m} \tag{C.4}$$

APPENDIX D

JUSTIFICATION OF THE TWO-FAILURE ALGORITHM

When all part parameters can fail in either of two modes, and when drift is neglected, the two-failure algorithm presented in Chapter III can be used to find reliability expressions. The theoretical justification for this algorithm is demonstrated in this appendix.

The four terms used in the two-failure algorithm are defined below. The system is assumed to have k part parameters.

$$p^{(1)}(1;t) = P(A(t) = 1 \mid X_j(t) \ne 2,\ j = 1,\ldots,k)$$

$$p^{(1)}(0;t) = P(A(t) = 0 \mid X_j(t) \ne 2,\ j = 1,\ldots,k)$$

$$p^{(2)}(1;t) = P(A(t) = 1 \mid X_j(t) \ne 1,\ j = 1,\ldots,k)$$

$$p^{(2)}(0;t) = P(A(t) = 0 \mid X_j(t) \ne 1,\ j = 1,\ldots,k)$$

Since A(t) can assume only the values 0 and 1, Eq. (D.1) is obtained.

$$p^{(1)}(1;t) = 1 - p^{(1)}(0;t) \tag{D.1a}$$

$$p^{(2)}(1;t) = 1 - p^{(2)}(0;t) \tag{D.1b}$$

Equation (D.2) is obtained from Theorem D.1, given below.

$$p_A(1;t) = p^{(1)}(1;t) + p^{(2)}(1;t) \tag{D.2}$$

Equation (D.2) and the definition of a density function produce the following expression.

$$p_A(0;t) = 1 - p_A(1;t) = 1 - [p^{(1)}(1;t) + p^{(2)}(1;t)]$$

The following is found by substituting Eq. (D.1) into this relation.

$$p_A(0;t) = p^{(1)}(0;t) + p^{(2)}(0;t) - 1$$

The last two equations are used in the final step of the two-failure algorithm. Thus, the proof of Theorem D.1 completes the justification of this algorithm.

Theorem D.1

Let the value of the density function of A(t) at A(t) = 1 be given by Eq. (III.17b).

$$p_A(1;t) = \sum_{Z_1} \prod_{i=1}^{k} p_{X_i}(x_i;t) \tag{III.17b}$$

The sum in Eq.
(III.17b) extends over all possible sets of parameter state variable values for which the system is failed, even if some are sets of measure zero.

Let $p^{(1)}(1;t)$ and $p^{(2)}(1;t)$ be computed as prescribed in Steps 1 and 2 of the two-failure algorithm. Then, Eq. (D.2) is satisfied.

$$p_A(1;t) = p^{(1)}(1;t) + p^{(2)}(1;t) \tag{D.2}$$

Proof:

The following notation is used in this proof. A 2-table is a (partial) reliability table which lists all possible sets of parameter state variable values for which A(t) = 1 when each part parameter has two failure modes; i.e., each row represents one term in Eq. (III.17b). A 1-open table is the table used to compute $p^{(1)}(1;t)$ in Step 1 of the two-failure algorithm. Similarly, a 1-short table is used in Step 2 to compute $p^{(2)}(1;t)$.

The following relations are defined.

$$p_A(1;t) = p_A^{(1)}(1;t) + p_A^{(2)}(1;t)$$

$$p_A^{(1)}(1;t) = \sum_{1} \prod_{i=1}^{k} p_{X_i}(x_i;t) \tag{D.3a}$$

$$p_A^{(2)}(1;t) = \sum_{2} \prod_{i=1}^{k} p_{X_i}(x_i;t) \tag{D.3b}$$

The subscript 1 on the sum in Eq. (D.3a) means that the sum extends over all possible sets of parameter state variable values for which the system is failed open; the subscript 2 in Eq. (D.3b) covers short system failures. Each row in the 2-table is included in one and only one sum in Eq. (D.3).

The method of proof is to show, on the basis of the reliability tables, that Eq. (D.4) is satisfied.

$$p_A^{(1)}(1;t) = p^{(1)}(1;t) \tag{D.4a}$$

$$p_A^{(2)}(1;t) = p^{(2)}(1;t) \tag{D.4b}$$

Proof for Eq. (D.4a)

For simplicity, the method of proof is demonstrated for the reliability diagram of Fig. (III.5). The table below shows entries in the 1-open and 2-tables which produce corresponding terms on the two sides of Eq. (D.4a). Only that portion of the 2-table for which the system is failed open is shown.

Entry    1-open table (for p^(1)(1;t))    2-table (for p_A^(1)(1;t))
         X3(t)   X2(t)   X1(t)            X3(t)   X2(t)   X1(t)
  1        1       1       1                1       1       1
  2        0       1       1                0       1       1
                                            2       1       1
  3        1       0       1                1       0       1
                                            1       2       1
  4        1       1       0                1       1       0
                                            1       1       2
  5        0       1       0                0       1       0
                                            0       1       2
                                            2       1       0
                                            2       1       2

The following table shows the terms which are computed from the separate entries in the preceding table.
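Before the individual terms are listed, the row correspondence can also be checked by brute-force enumeration. The sketch below is only an illustration under stated assumptions: the structure function `failed_open` (part 2 in series with the parallel pair of parts 1 and 3) is inferred from the rows of the state table rather than taken from Fig. (III.5), the probabilities in `P` are hypothetical, and Step 1 is assumed to replace every non-open entry by $1 - p_i(1)$, as the terms tabulated in this appendix indicate.

```python
from itertools import product

# Hypothetical probabilities p_i(x) for part i; x = 0 non-failed, 1 failed open, 2 failed short.
P = {1: (0.90, 0.06, 0.04), 2: (0.85, 0.10, 0.05), 3: (0.80, 0.15, 0.05)}

def failed_open(x1, x2, x3):
    """Assumed structure: part 2 in series with the parallel pair of parts 1 and 3."""
    return x2 == 1 or (x1 == 1 and x3 == 1)

def open_rows(values):
    """All rows (x1, x2, x3) over the given state values for which the system is failed open."""
    return [row for row in product(values, repeat=3) if failed_open(*row)]

def p_open_two_table():
    """Eq. (D.3a): sum of three-state products over the failed-open rows of the 2-table."""
    return sum(P[1][x1] * P[2][x2] * P[3][x3] for x1, x2, x3 in open_rows((0, 1, 2)))

def p_open_one_table():
    """Assumed Step 1: open / not-open rows, with 1 - p_i(1) used for every 0 entry."""
    total = 0.0
    for row in open_rows((0, 1)):
        term = 1.0
        for i, x in zip((1, 2, 3), row):
            term *= P[i][1] if x == 1 else 1.0 - P[i][1]
        total += term
    return total

print(len(open_rows((0, 1))), len(open_rows((0, 1, 2))))
print(p_open_one_table(), p_open_two_table())
```

Under these assumptions the 1-open table has five rows and the failed-open part of the 2-table has eleven, matching the table above, and the two sums agree up to rounding, which is the content of Eq. (D.4a) for this example.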
For convenience, $p_{X_i}(j;t)$ is written as $p_i(j)$. The terms from the 1-open table are computed by Step 1 of the two-failure algorithm; those from the 2-table, by Eq. (III.17b). The following identity is used.

$$p_i(0) + p_i(1) + p_i(2) = 1$$

Entry    p^(1)(1;t)                      p_A^(1)(1;t)
  1      p3(1) p2(1) p1(1)               p3(1) p2(1) p1(1)
  2      [1-p3(1)] p2(1) p1(1)           [p3(0)+p3(2)] p2(1) p1(1)
                                           = [1-p3(1)] p2(1) p1(1)
  3      p3(1) [1-p2(1)] p1(1)           p3(1) [p2(0)+p2(2)] p1(1)
                                           = p3(1) [1-p2(1)] p1(1)
  4      p3(1) p2(1) [1-p1(1)]           p3(1) p2(1) [p1(0)+p1(2)]
                                           = p3(1) p2(1) [1-p1(1)]
  5      [1-p3(1)] [1-p1(1)] p2(1)       p3(0) [p1(0)+p1(2)] p2(1) + p3(2) [p1(0)+p1(2)] p2(1)
                                           = [1-p3(1)] [1-p1(1)] p2(1)

This proof hinges on the fact that if the system is failed open when all the part parameters in a certain set are failed open and the remaining part parameters are non-failed, then the system is failed open no matter what states the remaining part parameters assume. This can be seen by noting that, when the system is failed open, its reliability diagram is separated into two parts which cannot be joined together by assuming that any of the remaining part parameters have failed short. This fact is used to make the correspondences in the first table. For instance, in entry 3, the state of $B_2(t)$ cannot change the fact that the system is failed open. Similarly, in entry 5, the system is failed open for any states of $B_1(t)$ and $B_3(t)$.

A similar proof can be stated for Eq. (D.4b).

APPENDIX E

DERIVATION OF AN APPROXIMATION TO THE MEAN AND VARIANCE OF A FUNCTION OF RANDOM VARIABLES

In this appendix, an approximation to the variance, $s_A^2$, and mean, $m_A$, of the random variable A defined in Eq. (IV.8) is derived for use with the Tchebycheff inequality.

The first step involves the Taylor series expansion of the function $v(b_1,\ldots,b_k)$ about a point $(b_1,\ldots,b_k) = (z_1,\ldots,z_k)$, which is given in Eq. (E.1).

$$v(b_1,\ldots,b_k) = v(z_1,\ldots,z_k) + \sum_{j=1}^{\infty} \frac{1}{j!} \left[ \sum_{r=1}^{k} (b_r - z_r) \frac{\partial}{\partial b_r} \right]^j v(z_1,\ldots,z_k) \tag{E.1}$$

The notation in Eq. (E.1) is interpreted as follows.

$$\left[ \sum_{r=1}^{k} h_r \frac{\partial}{\partial b_r} \right]^j v(b_1,\ldots,b_k) = {\sum}' \frac{j!}{c_1!\, c_2! \cdots c_k!} \left[ \prod_{r=1}^{k} (h_r)^{c_r} \right] \frac{\partial^j v(b_1,\ldots,b_k)}{\partial b_1^{c_1}\, \partial b_2^{c_2} \cdots \partial b_k^{c_k}}$$

The prime on the summation sign indicates that the sum extends over all sets of values $(c_1,c_2,\ldots,c_k)$ for which the following is satisfied.

$$\sum_{i=1}^{k} c_i = j \quad \text{and} \quad 0 \le c_i \le j$$

In the second term on the right of Eq. (E.1), the arguments $(z_1,\ldots,z_k)$ are used with v to signify that the partial derivatives are evaluated at the point $(z_1,\ldots,z_k)$.

The Taylor series expansion of the function in Eq. (IV.8) about the point $(B_1,\ldots,B_k) = (m_1,\ldots,m_k)$ is given by Eq. (E.1) if $z_i$ is replaced by $m_i$, where $m_i$ is the mean of $B_i$. If all terms for which $j \ge 2$ are neglected, Eq. (E.2) is obtained.

$$v(B_1,\ldots,B_k) - v(m_1,\ldots,m_k) \approx \sum_{r=1}^{k} (B_r - m_r)\, v_r(m_1,\ldots,m_k) \tag{E.2}$$

In Eq. (E.2), the following notation is used.

$$v_r(m_1,\ldots,m_k) = \frac{\partial}{\partial b_r} v(b_1,\ldots,b_k)$$

The partial derivative is evaluated at the point $(m_1,\ldots,m_k)$.

In order to approximate the variance of the random variable A, the number $s_A^2$ is defined by the expectation in Eq. (E.3).

$$s_A^2 = E\{[v(B_1,\ldots,B_k) - v(m_1,\ldots,m_k)]^2\} \tag{E.3}$$

Equation (E.4) results from substituting Eq. (E.2) into Eq. (E.3) and taking the expectation.

$$s_A^2 \approx \sum_{r=1}^{k} v_r^2(m_1,\ldots,m_k)\, s_r^2 + 2\, {\sum}'' v_{r_1}(m_1,\ldots,m_k)\, v_{r_2}(m_1,\ldots,m_k)\, \mathrm{Cov}(B_{r_1}, B_{r_2}) \tag{E.4}$$

In Eq. (E.4), $s_r^2$ is the variance of $B_r$, and the double prime implies that the sum extends over all terms for which the following is satisfied.

$$r_2 > r_1 \quad \text{and} \quad 1 \le r_1, r_2 \le k$$

If the B's are statistically independent, all covariance terms are zero.

In order to use Eq. (E.4) as an estimate of the variance of A, the approximation in Eq. (E.5) must be made.

$$v(m_1,\ldots,m_k) \approx E[v(B_1,\ldots,B_k)] \tag{E.5}$$

Equation (E.5) is exactly satisfied if v is a linear function. For some non-linear functions, this might produce an extremely poor approximation, as shown by the following example. If B has a normal distribution with mean zero and unit variance, then the left side of Eq. (E.5) is zero for the function $A = B^2$. However, A has a chi-squared distribution with one degree of freedom. Thus, the right side of Eq. (E.5) is unity.
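The first-order estimates can be illustrated with a short numerical sketch. It evaluates Eq. (E.5) for the mean and Eq. (E.4) with all covariance terms set to zero for the variance, using central differences for the partial derivatives $v_r$. The function names, the step size `h`, and the two test functions are hypothetical choices made only for this illustration.

```python
def first_order_moments(v, means, variances, h=1e-5):
    """Mean estimate from Eq. (E.5) and variance estimate from Eq. (E.4), independent B's."""
    mean_est = v(*means)
    var_est = 0.0
    for r, s2 in enumerate(variances):
        up, down = list(means), list(means)
        up[r] += h
        down[r] -= h
        v_r = (v(*up) - v(*down)) / (2.0 * h)  # central-difference estimate of dv/db_r
        var_est += v_r ** 2 * s2
    return mean_est, var_est

def linear(b1, b2):
    """A linear function, for which the first-order estimates are exact."""
    return 2.0 * b1 - 3.0 * b2

def square(b):
    """The function A = B^2 from the example above."""
    return b * b

print(first_order_moments(linear, [1.0, 2.0], [0.04, 0.09]))
print(first_order_moments(square, [0.0], [1.0]))
```

For the linear function the estimates coincide with the exact mean and variance. For $A = B^2$ with B of zero mean and unit variance, both estimates are zero, whereas a chi-squared variable with one degree of freedom has mean 1 and variance 2, which shows how poor the approximation can be when the first derivative vanishes at the mean.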
Equation (E.5) has been used by various authors, e.g., Whiteman (59), Dreste (5), and Krohn (60). Dreste and Krohn suggest using Eq. (E.5) under the assumption "...that component-part value ranges are not large relative to the nominal value..."

Assuming that the B's are statistically independent, that the higher order terms in Eq. (E.1) can be neglected, and that Eq. (E.5) is satisfied, the estimate for the variance is obtained from Eq. (E.4) as shown in Eq. (E.6). The estimate for the mean is given by Eq. (E.5).

$$s_A^2 \approx \sum_{r=1}^{k} v_r^2(m_1,\ldots,m_k)\, s_r^2 \tag{E.6}$$

LIST OF REFERENCES

The following abbreviations are used in this list.

Proc. FNSRQC. Proceedings of the Fifth National Symposium on Reliability and Quality Control in Electronics (I.R.E.). 1959.
Proc. SNSRQC. Proceedings of the Sixth National Symposium on Reliability and Quality Control in Electronics (I.R.E.). 1960.
IRE TRQC. Transactions of the Professional Group on Reliability and Quality Control (I.R.E.).
Proc. NYUCRT. Proceedings of the New York University-Industry Conference on Reliability Theory. 1958.

1. Koenig, P. Transistor reliability and Air Force requirements. Proc. of Transistor Reliability Symposium. New York University Press. 1956.
2. Perry, D. Component reliability: what's needed? Missiles and Rockets. 2:50. July, 1959.
3. Saltz, M. Methods for evaluating reliability growth and ultimate reliability during development of a complex system. Proc. FNSRQC. 89-98.
4. Greene, E. Designing and building reliability into control systems. District Conference Paper, AIEE Empire Tri-District Meeting. May, 1962.
5. Dreste, F. Circuit design concepts for high reliability.
6. Ryerson, C. The reliability and quality control field from its inception to the present. Proc. IRE. 50:1321-1338. May, 1962.
7. Chorafas, D. N. Statistical Processes and Reliability Engineering. D. Van Nostrand Co., Princeton, N. J. 1960.
8. Epstein, B. Estimation from life test data. IRE TRQC. RQC-9:104-107. April, 1960.
9. Feyerherm, M. Basic reliability considerations in electronics. Proc. FNSRQC. 119-125.
10. Connor, J. Prediction of reliability. Proc. SNSRQC. 134-154.
11. Flehinger, B. System reliability as a function of system age; effects of intermittent component usage and periodic maintenance. Op. Res. 8:30-44. Jan.-Feb., 1960.
12. Ashcraft, W. and Hochwald, W. Design by worst-case analysis; a systematic method to approach specified reliability requirements. IRE TRQC. RQC-10:15-21. Nov., 1961.
13. Suran, J. Effects of circuit design on system reliability. IRE TRQC. RQC-10:12-18. March, 1961.
14. Patterson, 4. Design techniques for upgrading the reliability of weapon systems during flight-readiness check out. IRE WESCON Conv. Rec., pt. 6. 3-9. 1958.
15. Brown, G. and Dennis, R. Electronics reliability. Missiles and Rockets. 2:172-176. Oct., 1957.
16. Drenick, R. Mathematical aspects of reliability problems. Soc. Indus. and Applied Math. J. 8:125-149. March, 1960.
17. Hellerman, L. and Racite, M. Reliability technique for electronic circuit design. IRE TRQC. PGRQC-14:9-16. Sept., 1958.
18. Davis, H. Report on a reliability program for an analog computer. Proc. FNSRQC. 79-82.
19. Luebbert, W. Dynamic failure control for military electronics. IRE TRQC. PGRQC-10:43-42. June, 1957.
20. Flehinger, B. Reliability improvement through redundancy at various switching levels. IRE National Conv. Rec., pt. 6. 137-151. 1958.
21. Von Neumann, J. Probabilistic logics and the synthesis of reliable organisms from unreliable components. Automata Studies (Annals of Mathematical Studies No. 34). 43-98. 1956.
22. Depian, L. and Grisamore, N. Reliability using redundancy concepts. IRE TRQC. RQC-9:53-60. April, 1960.
23. Moore, E. and Shannon, C. Reliable circuits using less reliable relays, part I. J. of Franklin Institute. 262:191-208. Sept., 1956.
24. Moore, E. and Shannon, C. Reliable circuits using less reliable relays, part II. J. of Franklin Institute. 262:281-297. Oct., 1956.
25. Koenig, H. E. and Blackwell, W. A. Electromechanical System Theory. McGraw-Hill Book Co., New York. 1961.
26. Bazovsky, I. Reliability Theory and Practice. Prentice-Hall, Englewood Cliffs, N. J. 1961.
27. Rosenblatt, J. On prediction of system performance from information on component performance. Proc. of Western Joint Computer Conf. 85-93. 1957.
28. Rosenblatt, J. On prediction of system behavior. IRE TRQC. RQC-9:23-28. Dec., 1960.
29. Benner, A. and Meredith, B. Designing reliability into electronic circuits. Proc. N.E.C. X:137-145. 1954.
30. Birnbaum, Z. Life length of materials as a stochastic process. Proc. NYUCRT. 1-6.
31. Drenick, R. The failure law of complex equipment. Soc. Indus. and Applied Math. J. 8:680-690. Dec., 1960.
32. Flehinger, B. and Lewis, P. Two-parameter lifetime distributions for reliability studies of renewal processes. Proc. NYUCRT. 7-22.
33. Barlow, R. and Hunter, L. Mathematical models for system reliability (first part). Sylvania Technologist. 12:16-31. Jan., 1960.
34. Barlow, R. and Hunter, L. Mathematical models for system reliability (second part). Sylvania Technologist. 12:55-65. April, 1960.
35. Barlow, R. and Hunter, L. Criteria for determining optimum redundancy. IRE TRQC. RQC-9:73-77. April, 1960.
36. Rohn, W. Reliability prediction for complex system. Proc. FNSRQC. 381-388.
37. Ashar, K. Probabilistic model of system operation with a varying degree of spares and service facilities. Op. Res. 8:707-718. Sept.-Oct., 1960.
38. Roberson, R. An approach to system performance prediction. J. Franklin Institute. 268:85-105. Aug., 1959.
39. Blanton, H. Reliability prediction technique for use in the design of complex systems. IRE National Conv. Rec., pt. 10. 68-79. 1957.
40. Radner, R. Limit distributions of failure time for series-parallel systems. Proc. NYUCRT. 163-186.
41. Wirth, J. Time domain models of physical systems and existence of solutions. Ph. D. Thesis. Michigan State University. 1962.
42. Caldwell, S. H. Switching Circuits and Logical Design. John Wiley and Sons, New York. 1958.
43. Lipp, J. Topology of switching elements vs. reliability. IRE TRQC. PGRQC-10:21-33. June, 1957.
44. Moskowitz, F. Analysis of redundancy networks. AIEE Trans., pt. 1. 77:627-632. Nov., 1958.
45. Weiss, G. and Kleinerman, M. On the reliability of networks. Proc. N.E.C. X:128-136. 1954.
46. Balaban, H. Some effects of redundancy on system reliability. Proc. SNSRQC. 388-402.
47. Price, H. Reliability of parallel electronic components. IRE TRQC. RQC-9:35-39. April, 1960.
48. Cramer, H. Mathematical Methods of Statistics. Princeton University Press. 1946.
49. Middleton, D. An Introduction to Statistical Communication Theory. McGraw-Hill Book Co., New York. 1960.
50. Taussky, O. and Todd, J. Generation and testing of pseudo-random numbers. Symposium on Monte Carlo Methods. John Wiley and Sons, New York. 15-28. 1954.
51. Murphy, R. Some statistical techniques in setting and obtaining tolerances.
52. Heyne, J. On an analytical design technique. IRE National Conv. Rec., pt. 6. 121-122. 1958.
53. Gray, H. Application of piecewise approximation to reliability and statistical design. IRE Proc. 47:1226-1231. July, 1959.
54. Golubjatnikov, O. Systematic approach to transistorized equipment design. Proc. FNSRQC. 21-35.
55. Xavier, M., Schneider, L. and Gottfried, P. Utilization of component part reliability information in circuit design. IRE TRQC. PGRQC-14:60-68. Sept., 1958.
56. Doob, J. L. Stochastic Processes. John Wiley and Sons, New York. 1952.
57. Feller, W. An Introduction to Probability Theory and Its Applications, Vol. 1. 2nd Ed. John Wiley and Sons, New York. 1957.
58. Bartlett, M. S. An Introduction to Stochastic Processes. Cambridge University Press. 1960.
59. Whiteman, I. Reliability starts with the design. Proc. FNSRQC. 98-102.
60. Krohn, C. Reliability analysis techniques. Proc. IRE. 48:179-192. Feb., 1960.