INVERTER RELIABILITY IN PV SYSTEMS: STATE-SPACE MODELING AND BAYESIAN ANALYSIS

By

Josué Sánchez

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Electrical and Computer Engineering (Master of Science)

2025

ABSTRACT

The push for cleaner energy sources, coupled with declining costs, has driven the massive deployment of solar photovoltaic (PV) systems in electric grids worldwide. At the heart of any PV system is the inverter, the device that converts the DC power captured by solar cells into AC power suitable for grid use. In recent years, reliability concerns have emerged regarding inverters, with multiple reports identifying central and string inverters as the primary cause of most forced outages in PV systems. Inverter failures significantly hinder energy production, potentially reducing it to zero. Thus, estimating the reliability of these devices is crucial for forecasting the long-term performance of PV systems. In this thesis, we develop a state-space reliability model to characterize the failure behavior of string inverters, using a limited and heterogeneous failure dataset from residential and commercial PV systems in the U.S. Despite the data constraints, the proposed model successfully captures both the decreasing and the increasing failure rate behaviors observed in the data. Additionally, we derive an exponential approximation of the model, enabling system-level reliability evaluation via Markov Reward Models (MRM). To address the uncertainty inherent in limited datasets, we adopt a Bayesian framework, which is better suited for uncertainty quantification under data scarcity. This approach allows us to compute credible intervals on expected energy production by propagating parameter uncertainty through the MRM.
Our findings indicate that, although parameter uncertainty is non-negligible, its impact on expected long-term energy yield remains limited, primarily because inverters are replaced relatively quickly compared with their average time to failure. Lastly, since the failure rate is an important quantity for reliability optimization and risk assessment, we establish a method for detailed failure rate estimation that provides deeper insight into the failure process. Without relying on major assumptions about the shape of the failure rate, the resulting estimates confirm the hypothesized bathtub-like failure rate behavior.

To my parents, my brothers, and Ana

ACKNOWLEDGEMENTS

I am profoundly grateful to my advisor, Dr. Joydeep Mitra, whose invaluable guidance, extensive knowledge, and insightful advice have been instrumental to my academic development. I am also deeply thankful to Dr. Mohammed Ben-Idris for serving on my committee and for generously sharing his expertise throughout various research endeavors, which have greatly enriched my academic experience. My sincere appreciation goes to Dr. Shanelle Foster for serving on my committee. Special thanks to Argonne National Laboratory for sponsoring part of this research and providing the failure dataset, and to Dr. Shijia Zhao for his valuable feedback and thoughtful guidance. I am truly grateful to my family for their constant support and for always cheering me on throughout my journey at MSU. To my beloved Ana, thank you for your endless encouragement, patience, and love.

TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER 1 INTRODUCTION
1.1 Motivation & Challenges
1.2 Thesis Contributions
1.3 Thesis organization
CHAPTER 2 PV SYSTEM RELIABILITY
2.1 PV system as a multistate system
2.2 Reliability block diagram
CHAPTER 3 RELIABILITY MODELING OF INVERTERS
3.1 Developing reliability models
3.2 Description of the failure database
3.3 Non-parametric models
3.4 Piecewise exponential distribution
3.5 Maximum likelihood estimator
3.6 Fitting failure distributions
3.7 Proposed model
3.8 Results and discussion
CHAPTER 4 SUBSYSTEM RELIABILITY MODEL
4.1 Markov Regenerative Process
4.2 Exponential approximation
4.3 Subsystem model
4.4 Markov Reward Models
4.5 Rewards for estimating energy yield
4.6 Results and discussion
CHAPTER 5 BAYESIAN INVERTER RELIABILITY MODELING
5.1 Basics of Bayesian analysis
5.2 Rethinking the PWE distribution under Bayes
5.3 Interval-based failure rate analysis
CHAPTER 6 CONCLUSIONS
BIBLIOGRAPHY

LIST OF TABLES
Table 1.1 Weibull parameters for annual failure rates reported in [1]
Table 3.1 Comparison of distributions according to AIC
Table 5.1 Initial priors
Table 5.2 Posterior summary statistics using normal prior
Table 5.3 Posterior summary statistics using gamma prior
Table 5.4 Posterior summary statistics using exponential prior

LIST OF FIGURES
Figure 1.1 Residential grid-tied PV system [2]
Figure 2.1 Reliability block diagram of subsystems
Figure 3.1 Histogram of failures
Figure 3.2 Kaplan-Meier estimate
Figure 3.3 Cumulative failure rate
Figure 3.4 Visual identification of change points
Figure 3.5 Survivor function of each failure distribution estimated
Figure 3.6 Proposed reliability model
Figure 3.7 System availability
Figure 4.1 Exponential approximation of a deterministic transition for different coefficients of variation
Figure 4.2 Exponential inverter model
Figure 4.3 Availability of both inverter models
Figure 4.4 Subsystem model with string inverter
Figure 4.5 Subsystem model with microinverter
Figure 4.6 Yearly energy production of both system configurations for varying inverter mean time to repair (A = 40 days, B = 60 days, and C = 105 days)
Figure 5.1 Comparison of the survivor function with the Kaplan-Meier estimate for a normal prior
Figure 5.2 Comparison of the survivor function with the Kaplan-Meier estimate for a gamma prior
Figure 5.3 Comparison of the survivor function with the Kaplan-Meier estimate for an exponential prior
Figure 5.4 Expected energy production with a 40-day mean time to inverter repair
Figure 5.5 Expected energy production with a 60-day mean time to inverter repair
Figure 5.6 Expected energy production with a 105-day mean time to inverter repair
Figure 5.7 MLE estimation of PWE distribution: change points every year
Figure 5.8 MLE estimation of PWE distribution: change points every 100 days
Figure 5.9 Bayesian estimation of PWE distribution: change points every year
Figure 5.10 Bayesian estimation of PWE distribution: change points every 100 days
Figure 5.11 Survivor function of PWE distribution with change points placed every 100 days
Figure 5.12 Failure rate estimates of PWE distribution with change points placed every 100 days
Figure 5.13 Survivor function of PWE distribution with change points placed every 50 days
Figure 5.14 Failure rate estimates of PWE distribution with change points placed every 50 days

CHAPTER 1 INTRODUCTION

Electric grids are facing two important challenges: an ever-increasing demand for electricity and growing pressure to reduce their reliance on fossil-fuel generation amid global warming concerns. The combined effect of these two challenges, along with declining costs, is driving the integration of more and more renewable energy resources (RERs), both to reduce the carbon footprint of electricity generation and to ensure sufficient supply to meet the load. Among the available RERs, solar photovoltaic (PV) is the most widely deployed. According to the International Energy Agency (IEA), around 407–446 GW of solar PV was installed globally in 2023. In the U.S. alone, 32 GW were installed, closing the year with a cumulative installed capacity of 137.5 GW [3]. Utility-scale installations make up most of the newly installed capacity, but residential installations have been growing steadily [4].

Commercial and residential PV systems give customers the opportunity to reduce their bills and to gain more control over their electric supply. However, as more of these systems are installed, concerns about their reliability and how it affects their expected performance have become critical. Moreover, the inverter, one of the most critical components of the system, has been identified as the most failure-prone [2, 5–8]. Since PV systems generate electricity at essentially no marginal cost as long as there is sunlight, the fraction of time the system is fully functional (i.e., its availability) has a direct impact on the return on investment.
Reliability analysis of these systems requires improved models and comprehensive approaches that consider the effects of component failures, along with other significant factors such as solar panel degradation. This thesis develops a data-driven state-space reliability model that characterizes the failure behavior of string inverters and integrates it into an analytical framework based on Markov Reward Models. A Bayesian approach accounts for parameter uncertainty, making it possible to establish credible bounds on the expected energy production. Additionally, the thesis establishes a method for detailed failure rate estimation, which provides deeper insight into the failure process and confirms a bathtub-like failure behavior.

Figure 1.1 Residential grid-tied PV system [2]

1.1 Motivation & Challenges

A PV system is made up of several components, including solar panels, mounting structures, energy meters, inverters, and protective devices. Most small PV systems are grid-tied, meaning they operate while connected to the electric grid. The basic configuration of a grid-tied PV system is shown in Figure 1.1. The system design is determined by the type of inverter used. Residential and commercial PV systems can be designed to work with string inverters or microinverters, but they typically use string inverters because of their lower cost [2].

Several studies have analyzed failure records of PV systems and concluded that inverters are responsible for the majority of forced outages. In [7], failure data were collected by Sandia National Laboratories from PV systems of various sizes in the U.S. and grouped into four portfolios. Portfolio D consists mostly of small systems of up to 500 kW. In all portfolios, inverter failures are the main cause of maintenance tickets, accounting for 50% of all failures in portfolio D. Gunda et al.
[1] analyze a dataset of 55,000 inverter-related maintenance records and apply machine learning techniques to classify each entry into a distinct failure mode. However, because of data constraints, the failure distributions they provide describe only the time until the first failure. The parameters of the distributions for some failure modes are shown in Table 1.1. The shape parameter of all distributions is less than one, which indicates a failure rate that decreases with time. It is worth noting that no information about the rating of the inverters is provided, and all systems under study were located in the U.S.

Table 1.1 Weibull parameters for annual failure rates reported in [1]

Inverter subsystem      Shape (α)   Scale (β)
Communications          0.69        3.29
Ground Faults           0.77        3.60
Heat Mgmt. Systems      0.93        3.35
IGBTs                   0.81        6.01

A study conducted by Golnas [5] analyzed energy production in 600 PV systems over a period of 27 months and found that inverter-associated outages accounted for 36% of the energy not produced. Furthermore, the study compares central-, string-, and microinverter-based systems using a normalized metric called tickets per inverter-year, showing string systems to be far more reliable than central inverters but with 3 to 4 times more tickets per inverter-year than microinverters.

The largest PV system failure database found in the literature is presented in [8]. The dataset covers 100,000 PV systems in the U.S., with over 7 GWdc of total capacity, collected over five years. Again, inverters top the list in the number of failures for all three system types (residential, commercial, and utility). However, the impact of inverter failures on energy production is significantly higher in residential systems than in commercial and utility systems. The authors argue that this is because larger systems are supervised and monitored more closely.
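The link between the Weibull shape parameter and the trend of the failure rate follows from the Weibull hazard function with shape α and scale β, h(t) = (α/β)(t/β)^(α−1): for α < 1 the exponent is negative, so h(t) falls as t grows. The Python sketch below evaluates this formula for illustrative parameters (a shape below one, equal to one, and above one, with a common scale of 3.3); these are not fitted values from [1].

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Weibull failure rate h(t) = (shape/scale) * (t/scale)**(shape - 1), t > 0."""
    if t <= 0:
        raise ValueError("t must be positive")
    return (shape / scale) * (t / scale) ** (shape - 1.0)


if __name__ == "__main__":
    times = [0.25, 1.0, 2.0, 5.0]
    # Illustrative parameters: shape < 1 (decreasing), = 1 (constant), > 1 (increasing).
    for shape in (0.7, 1.0, 2.0):
        rates = [weibull_hazard(t, shape, 3.3) for t in times]
        print(shape, [round(h, 4) for h in rates])
```

With shape 1.0 the hazard reduces to the constant 1/β, which is the exponential case assumed by CTMC models.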
Additionally, residential systems may be leased by the property owner from a third party, which means that the property and system owners need to coordinate repair schedules, potentially increasing the time needed to resolve hardware issues.

According to [9], inverters fail often as a consequence of three factors. First, inverters must carry out a multitude of functions with little to no redundancy and under harsh environmental conditions. Second, the pressure to reduce costs forces some manufacturers to rely on cheaper materials that may not provide the durability required to achieve the expected life of the component. Lastly, inverters are operated through software that is sometimes updated remotely, and these updates can cause malfunctions when not enough testing is carried out prior to rollout.

The literature provides evidence that inverters are the most fragile component in PV systems and that their failures significantly affect energy production, particularly in residential systems. When a string inverter fails, energy production is severely hindered, even dropping to zero [1]. Therefore, accurate modeling of inverter failures is crucial to estimate the reliability of PV systems. Such modeling enables not only a better understanding of how inverters fail, but also a more accurate assessment of the impact of reliability on the economics of PV systems [2].

Reliability studies of PV systems have long relied on two kinds of analysis: analytical and simulation-based. The simulation-based approach is conducted by means of Monte Carlo Simulation (MCS), which easily accommodates effects that cannot be modeled analytically. Nevertheless, accurate reliability models are still required to obtain correct results, which has been difficult to achieve in power systems because of the lack of data [10]. Moreover, only steady-state numerical results can be obtained from MCS.
Ignoring transient behavior can lead to an under- or overestimation of the risk associated with system failure [11].

Most analytical models used for PV system reliability analysis rely on Continuous-time Markov Chains (CTMCs) to derive expressions for system availability and other performance metrics [12–14]. An inherent assumption of CTMCs is that the failure rate of components remains constant. CTMCs are commonplace in power system reliability because failure data are usually presented as Mean Time to Failure (MTTF) and Mean Time to Repair (MTTR). However, assuming a constant failure rate limits the capacity of these models to account for the effects of aging or the occurrence of premature failures, potentially leading to inaccurate estimates of system availability [10].

The constant failure rate assumption may not be appropriate for inverters for two reasons. First, inverters, like any other power converter, are built from capacitors and power switches, which are known both to wear out and to fail prematurely, causing their failure rate to be non-constant [10]. Second, previous research has analyzed inverter failure data and estimated distributions with time-varying failure rates for various failure modes [1, 7]. As seen in Table 1.1, the shape parameters of these distributions indicate that inverters are less likely to fail later in life than just after being put into operation, that is, a decreasing failure rate.

The impacts of non-constant failure rates have been addressed extensively in the power system reliability literature, both to model the long-term effects of aging [15–17] and to account for component condition [18–20]. These works focus on long-term performance rather than on the dynamic failure behavior, even though the latter also provides considerable insight into how a component fails. Moreover, very few studies in power systems have examined components whose failures are followed by replacement, such as inverters.
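For reference, the simplest CTMC used in such studies is a two-state (up/down) model with a constant failure rate λ = 1/MTTF and a constant repair rate μ = 1/MTTR. Starting in the up state, its point availability has the well-known closed form A(t) = μ/(λ+μ) + λ/(λ+μ)·e^(−(λ+μ)t). The Python sketch below evaluates it; the MTTF and MTTR values are hypothetical and chosen only for illustration.

```python
import math


def two_state_availability(t: float, mttf: float, mttr: float) -> float:
    """Point availability A(t) of a two-state CTMC that starts in the up state.

    Assumes a constant failure rate lam = 1/mttf and a constant repair
    rate mu = 1/mttr (all times in the same unit).
    """
    lam = 1.0 / mttf
    mu = 1.0 / mttr
    steady = mu / (lam + mu)  # long-run (steady-state) availability
    transient = lam / (lam + mu) * math.exp(-(lam + mu) * t)
    return steady + transient


if __name__ == "__main__":
    # Hypothetical values in days: MTTF of 10 years, MTTR of 60 days.
    mttf, mttr = 10 * 365.0, 60.0
    for t in (0.0, 30.0, 365.0, 5 * 365.0):
        print(f"A({t:7.1f} d) = {two_state_availability(t, mttf, mttr):.5f}")
```

Because λ is fixed, no choice of parameters lets this model reproduce the decreasing early-life failure rate discussed above; A(t) simply decays monotonically toward its steady-state value.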
In [21], the authors propose using a Semi-Markov Process (SMP) to model the time-varying failure rate due to aging, along with a CTMC for random failures. They calculate the total component availability as the product of the individual availabilities. The SMP employs a Weibull distribution for the time to failure, and both models assume an exponential time to repair. While the SMP demonstrates high accuracy, it requires numerically solving a system of integral equations in the time domain, which makes it unsuitable for an analytical whole-system analysis. Lifetime data for the models were derived from Stress-Strength Analysis (SSA) and electrothermal modeling.

Sangwongwanich and Blaabjerg [22] approach the reliability modeling of fault-tolerant power converters using the method of stages. The method consists of approximating the Weibull distribution with an Erlang distribution. The resulting model comprises several stages, each with a constant failure rate, that together behave like a Weibull distribution. Even though any distribution can, in principle, be approximated with the method of stages, difficulties arise when the failure data exhibit both increasing and decreasing failure rates. In [22], only increasing failure rate data are present because the authors generate the lifetime data using electrothermal modeling, which accounts only for aging, an increasing failure rate process.

Peyghami et al. [10] compare CTMC, method of stages, and SMP to model the availability of components subject to increasing, constant, and decreasing failure rates. Although the method of stages approximates wear-out failures very well, it behaves similarly to the CTMC when modeling premature failures, missing the sharp decrease in availability caused by a decreasing failure rate. The results show that only the SMP accurately captures the behavior of decreasing failure rates. However, it has the same limitations as the SMP in [21].
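The limitation of a pure Erlang approximation can be made concrete. An Erlang-k chain (k exponential stages in series, each with rate r) has failure rate h(t) = r·(rt)^(k−1)/(k−1)! divided by Σ_{j=0}^{k−1} (rt)^j/j!, which is constant for k = 1 and increasing for k > 1; its coefficient of variation is 1/√k ≤ 1. It therefore cannot represent a decreasing failure rate. The Python sketch below evaluates this hazard; the stage counts and the 10-year mean are illustrative choices, not values from [22].

```python
import math


def erlang_hazard(t: float, k: int, mean: float) -> float:
    """Failure rate of an Erlang-k distribution with the given mean.

    Models k exponential stages in series (method of stages), each with
    rate r = k / mean, so the mean time to absorption equals `mean`.
    """
    r = k / mean
    x = r * t
    # Density f(t) and survivor S(t) share the factor exp(-x), which cancels
    # in h = f / S, so it is omitted for numerical stability.
    numerator = r * x ** (k - 1) / math.factorial(k - 1)
    denominator = sum(x ** j / math.factorial(j) for j in range(k))
    return numerator / denominator


def erlang_cv(k: int) -> float:
    """Coefficient of variation of an Erlang-k distribution (always <= 1)."""
    return 1.0 / math.sqrt(k)


if __name__ == "__main__":
    mean = 10.0  # hypothetical mean time to failure (years)
    for k in (1, 3):
        rates = [erlang_hazard(t, k, mean) for t in (0.5, 2.0, 5.0, 10.0)]
        print(k, [round(h, 4) for h in rates], round(erlang_cv(k), 3))
```

Capturing a decreasing failure rate with phases requires mixtures of exponential stages in parallel (hyperexponential-type structures) rather than stages in series.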
The analyses in [10] are theoretical, and no failure distributions are developed from actual data.

1.2 Thesis Contributions

To date, reliability modeling of inverters has primarily focused on those used in large-scale installations, which differ from those in smaller systems. The reasons for this focus include the lack of sufficient data to derive a robust model and the perceived lower importance of high reliability in small systems. However, as noted in [2, 8], reliability can significantly impact the long-term performance of small systems because of the typical delays in resolving failures, making it imperative to develop accurate, data-driven models that account for their actual behavior. Previous research has often assumed that there is no decreasing failure rate stage in the life of power converters, which may not be accurate for inverters. Moreover, critical aspects of reliability modeling, such as parameter uncertainty and distribution fitting, have received little attention, possibly because the focus has been on generating large samples of lifetime data from first principles, which cannot account for the hidden defects responsible for premature failures.

To bridge these gaps in the literature, the contributions of this work can be summarized as follows:

• We propose a methodology to develop a data-driven state-space reliability model tailored to equipment that exhibits both increasing and decreasing failure rates.
• The methodology is applied to a limited dataset of non-repairable string inverter failures collected from residential and commercial PV systems in the U.S., where data scarcity and variability motivate the use of flexible, probabilistic modeling techniques.
• We derive an exponential approximation of the inverter model, facilitating the analytical comparison of expected energy production between systems designed with string inverters and microinverters.
• A Bayesian approach is adopted to quantify the uncertainty of parameter estimates and is seamlessly integrated into a reliability analysis based on Markov Reward Models.
• Leveraging Stan, we propose a detailed failure rate estimation procedure that provides estimates with a granularity of up to 50 days.

1.3 Thesis organization

The remainder of the thesis is organized as follows. Chapter 2 describes the reliability modeling of small PV systems. Chapter 3 explains the concepts of failure data analysis and describes the methodology used to develop the inverter reliability model. Chapter 4 introduces Markov Regenerative Processes and Markov Reward Models and provides the mathematical basis for the exponential approximation of the inverter model. Chapter 5 adopts a Bayesian approach to the failure data analysis. Lastly, Chapter 6 presents the conclusions.

CHAPTER 2 PV SYSTEM RELIABILITY

In this chapter, we analyze small PV systems from a reliability perspective, modeling them as multistate systems. We narrow our focus to residential PV systems for simplicity; however, the concepts presented here can be extended to commercial systems. We begin by providing the necessary definitions to formalize the analysis of multistate systems. Next, we examine the components of the two system designs found in residential PV systems, string inverter and microinverter systems, and discuss their respective reliability models. Finally, the reliability of the system is represented graphically using a reliability block diagram.

2.1 PV system as a multistate system

Any system composed of binary or multistate components can be represented as a multistate system (MSS), since the overall performance or efficiency level of the system depends on the availability of its constituent units. If the MSS can occupy K distinct states, and g_i denotes the total system performance in state i (i ∈ {1, . . . , K}), then the MSS performance rate at time t, G(t), is a random variable taking values in the set g = {g_1, . . . , g_K} [23].

Different procedures have been employed in the reliability literature to analyze MSS, and they can be classified into state-space and non-state-space methods. Non-state-space methods are typically used to analyze non-repairable MSS, although, under the assumption of independent failures and repairs, they can be used for repairable systems as well. These methods include reliability block diagrams, reliability graphs, fault-tree analysis, and state-space enumeration. State-space methods, on the other hand, have higher modeling power and are not limited by the assumption of independent failures and repairs. Continuous-time Markov Chains, Semi-Markov Processes, and Petri nets are among the most widely used state-space methods [24].

To develop an MSS reliability model, it is necessary to understand how the individual components behave and how they affect the performance of the system [23]. Therefore, a reliability model must be devised for each component, and an analysis method chosen accordingly.

Grid-tied residential PV systems consist of the following components [25]:

• Solar panels: interconnected packages of solar cells that convert sunlight into DC electricity.
• Inverter: recognized as the brain of the installation, it takes the DC electricity coming from the panels and converts it into AC electricity that can be used to feed loads or be fed back to the grid.
• Energy meter: keeps track of how much energy is fed back to the grid and how much is drawn from it; there could be two energy meters, depending on the compensation scheme used.
• Wiring, electrical boxes, and protective devices: provide connectivity between system components and protection against faults.

We will now focus on each component of the PV system, its reliability model, and how system performance is affected when it fails.
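As a small illustration of these MSS definitions, write G(t) for the performance rate at time t, taking value g_i with probability p_i. Two common measures are the expected performance E[G(t)] = Σ p_i g_i and the probability that performance meets a demand w, P[G(t) ≥ w]. The Python sketch below computes both for a hypothetical three-state system; the performance levels and probabilities are invented for illustration and are not PV data.

```python
from typing import Sequence


def expected_performance(g: Sequence[float], p: Sequence[float]) -> float:
    """E[G(t)] = sum_i p_i * g_i for an MSS with state probabilities p at time t."""
    if abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("state probabilities must sum to 1")
    return sum(gi * pi for gi, pi in zip(g, p))


def demand_availability(g: Sequence[float], p: Sequence[float], w: float) -> float:
    """P[G(t) >= w]: probability that the MSS performance meets demand w."""
    return sum(pi for gi, pi in zip(g, p) if gi >= w)


if __name__ == "__main__":
    # Hypothetical MSS: full output, half output, and zero output (kW).
    g = [4.0, 2.0, 0.0]
    p = [0.95, 0.03, 0.02]
    print(expected_performance(g, p))       # expected output in kW
    print(demand_availability(g, p, 2.0))   # P[output >= 2 kW]
```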
2.1.1 Solar panels

In string systems, solar panels are connected in series to form a string. Each string is then connected to one optimizer, whose job is to maximize the energy production of the string. In contrast, in microinverter systems, each solar panel has its own optimizer, and the production of the panels is aggregated on an AC bus. The way the panels are connected determines the impact of their failures on system performance. If a solar panel in a string fails, the whole string fails unless a bypass diode provides an alternate path for the current. In contrast, if a solar panel in a microinverter system fails, only its output is lost and the rest of the system continues functioning [26].

Past research has found that solar panels seldom fail, making them one of the most reliable components in the PV system [1, 5, 7]. In [8], solar panel failures accounted for about 0.1% of all hardware failures. However, since a system contains many solar panels, the impact of their failures still needs to be accounted for. A simple two-state model (functional and failed) with an exponential time to failure for each solar panel is sufficient to describe this behavior; such models are commonplace in the literature [12, 13, 25–27]. For the string systems in this work, we assume that a bypass diode is present, so the failure of one panel does not cause the failure of the entire string. Under this assumption, and given that all solar panels in a system have the same rating, the states associated with solar panel failures can be grouped by the number of functional panels to reduce the number of system states. For example, a system with two solar panels, each with two states (functional and failed), has three states instead of four: both panels functional, one functional, or none functional. The failure rate of solar panels is drawn from [28], where the authors report a mean time to failure of 65,789,474 hours, or 7,510.2 years.
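The state reduction and the unit conversion above can be checked with a few lines of Python. The sketch assumes identical, independent panels with exponential lifetimes (the two-state model just described), a 365-day year (8,760 hours), and no repairs, so the probability that exactly k of n panels are functional at time t is binomial. The 10-panel, 25-year case is hypothetical.

```python
import math

HOURS_PER_YEAR = 8760.0  # assumes a 365-day year


def states_without_grouping(n: int) -> int:
    """Each of n two-state panels tracked individually: 2**n states."""
    return 2 ** n


def states_with_grouping(n: int) -> int:
    """Identical panels grouped by count of functional panels: n + 1 states."""
    return n + 1


def prob_k_functional(k: int, n: int, t_hours: float, mttf_hours: float) -> float:
    """P[exactly k of n panels functional at t], no repair, exponential lifetimes."""
    survive = math.exp(-t_hours / mttf_hours)  # single-panel survivor function
    return math.comb(n, k) * survive**k * (1.0 - survive) ** (n - k)


if __name__ == "__main__":
    mttf_h = 65_789_474.0                 # panel MTTF reported in [28]
    print(mttf_h / HOURS_PER_YEAR)        # MTTF in years
    print(1.0 / mttf_h)                   # constant failure rate per hour
    print(states_without_grouping(2), states_with_grouping(2))
    n, t = 10, 25 * HOURS_PER_YEAR        # hypothetical: 10 panels, 25 years
    print([round(prob_k_functional(k, n, t, mttf_h), 6) for k in (n, n - 1)])
```

The grouping reduces the panel state space from exponential (2^n) to linear (n + 1) growth, which is what keeps the subsystem models tractable.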
2.1.2 Inverters

Inverters convert DC into AC, which makes them indispensable for the operation of the PV system. As with solar panels, the way the system is designed and wired determines how inverter failures impact the system. The failure of a central, string, or microinverter results in the loss of the energy production of all panels connected to it. This behavior suggests a series dependence between solar panels and inverters, regardless of the inverter type; the only difference is the aggregated energy production lost, which is noticeably lower for microinverter failures. Microinverters are highly reliable, with Enphase advertising a mean time between failures of 600 years [29]. This duration is still far from that of solar panels, but it is significantly longer than that of string and central inverters. Furthermore, field studies confirm that microinverters fail significantly less frequently than central and string inverters [8]. Microinverters are more reliable because their components have significantly lower power-handling requirements than those of string and central inverters [2]. Consequently, as for solar panels, a simple two-state model with an exponential time to failure is justified.

Unlike microinverters, string and central inverters are prone to failures, and each failure results in the loss of production from several solar panels, which makes an accurate reliability model crucial. We address the reliability model of string inverters in the next chapter.

2.1.3 Balance-of-system components

In well-designed and properly installed systems, failures originating from energy meters, wiring, disconnects, and other hardware required to transfer power are unlikely [8]. Lastly, grid failures would not significantly impact the long-term performance of PV systems because most outages are resolved in a few hours at most [30].
Furthermore, grid and energy meter failures affect the reliability of the system in the same way regardless of its architecture, which makes it possible to compare the reliability of two different inverter configurations without including them in the model. Therefore, only inverter and solar panel failures are considered in the reliability diagram of the system.

2.2 Reliability block diagram

The PV system is decomposed into several subsystems. The decomposition must respect the dependence between components and the way repairs are carried out. String systems can be decomposed into subsystems, each comprising one string inverter and the solar panels connected to it. Usually, only the inverter performance is monitored in these systems, which leaves solar panel failures to be addressed during maintenance visits. Conversely, when an inverter fails, the monitoring system notifies the owner, prompting its repair. Under these assumptions, each subsystem can be considered independent.

Microinverter systems are more reliable, and information about their performance is usually available, so the repair of both microinverters and solar panels can be initiated promptly. Therefore, these systems can be decomposed into multiple subsystems, each comprising a microinverter and the panels connected to it.

Given how the subsystems are defined, the system can produce electricity as long as at least one solar panel is functioning and the inverter or microinverter to which it is connected has not failed. Thus, the subsystems are aggregated in parallel to build the complete representation of the PV system. If a two-state model of DC or AC disconnects, the energy meter, or grid failures were added, it would be placed in series with the parallel connection of the subsystems because, if any of these fail, energy production drops to zero.
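Under the independence assumptions stated above, the block diagram translates into a simple availability calculation: within a subsystem, the inverter is in series with a parallel group of panels; subsystems are combined in parallel; and an optional balance-of-system block sits in series with the whole. The Python sketch below computes the probability that the system produces any electricity at all. The availability values and system sizes are hypothetical, and the result says nothing about how much energy is produced, which is what the multistate treatment addresses.

```python
from typing import Sequence


def parallel(avails: Sequence[float]) -> float:
    """Availability of independent blocks in parallel: at least one is up."""
    prob_all_down = 1.0
    for a in avails:
        prob_all_down *= 1.0 - a
    return 1.0 - prob_all_down


def series(avails: Sequence[float]) -> float:
    """Availability of independent blocks in series: all must be up."""
    prob_all_up = 1.0
    for a in avails:
        prob_all_up *= a
    return prob_all_up


def subsystem(a_inverter: float, a_panel: float, n_panels: int) -> float:
    """Inverter in series with n identical panels in parallel."""
    return series([a_inverter, parallel([a_panel] * n_panels)])


def system(subsystems: Sequence[float], a_bos: float = 1.0) -> float:
    """Subsystems in parallel, optionally in series with a balance-of-system block."""
    return series([a_bos, parallel(subsystems)])


if __name__ == "__main__":
    # Hypothetical: 2 string subsystems (10 panels each) vs 20 microinverter subsystems.
    string_sys = system([subsystem(0.98, 0.9999, 10)] * 2)
    micro_sys = system([subsystem(0.999, 0.9999, 1)] * 20)
    print(round(string_sys, 6), round(micro_sys, 6))
```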
The reliability block diagram of the string inverter or microinverter subsystem is shown in Figure 2.1.

Figure 2.1 Reliability block diagram of subsystems

CHAPTER 3 RELIABILITY MODELING OF INVERTERS

In this chapter, we describe the reliability modeling of string inverters based on failure data. This approach, known as the actuarial approach, assumes that all relevant information about the failure of a component is captured by a time-to-failure distribution f(t). Reliability characteristics such as the failure rate λ(t) can be derived directly from f(t); for example, λ(t) = f(t)/R(t), where R(t) is the survivor function.

We analyze the failure data and propose a piecewise exponential distribution to model the observed failures. This distribution also connects naturally with state-space modeling, which enables the development of a reliability model. Finally, we present numerical results that highlight the importance of accounting for the time-varying failure rate exhibited by inverters.

3.1 Developing reliability models

Reliability modeling of most components, except those in safety-critical systems, involves characterizing how they fail and how they are brought back into operation after a failure. In our case, the failure of an inverter leads to its replacement; thus, the component returns to a good-as-new condition. Under this scenario, the failure-repair process can be considered an alternating renewal process, and the failure and repair distributions can therefore be established independently [24].

The failure distribution of a component is typically obtained from life tests, in which the time until failure of each tested component is recorded. In many cases, however, component lifetimes are observed while the components are in operation; this kind of measurement campaign is referred to as field data collection.
Moreover, when failures are recorded from components in operation, a significant portion of them will not fail before the end of data collection; for these components, only a lower bound on the lifetime is obtained instead of the lifetime itself. This is called censored data in the reliability literature [31].

The procedure for fitting failure models recommended in most reliability textbooks is as follows [32]:

1. Construct a histogram of the failures.
2. Calculate descriptive statistics.
3. Assess the empirical failure rate.
4. Make use of any prior knowledge of the failure process.
5. Select suitable candidate models.
6. Estimate parameters.
7. Perform a goodness-of-fit test.

Steps 1 through 4 provide insights that narrow down the number of candidate distributions considered in step 5. Step 6 consists of estimating the parameters of each candidate model, usually by means of maximum likelihood estimation, and comparing the models according to criteria such as the Akaike Information Criterion (AIC). Finally, in step 7, it is determined whether it is reasonable to assume that the data came from the hypothesized distribution [31, 32].

A reliability model of the component can be developed from the estimated failure distribution using different approaches. The simplest is to use a Semi-Markov Process (SMP) with only two states (up and down), where the sojourn time in the up state follows the estimated failure distribution [10]. An alternative approach is to use a continuous-time Markov chain (CTMC) with two or more transient states and one absorbing state to approximate the non-exponential failure distribution. Distributions defined in this way form a family called phase-type (PH) distributions [10, 22, 24]. Some authors have shown that PH approximations can be inaccurate, especially in the transient period, when the failure rate is decreasing [10].
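Steps 6 and 7 can be illustrated with the simplest candidate model. For an exponential distribution with rate λ, the censored likelihood (failures contribute their density, censored units their survivor function, as formalized later in Equation 3.6) is maximized in closed form by λ̂ = d / Σ t_i, where d is the number of failures and the sum runs over all failure and censoring times. The minimal Python sketch below computes that estimate and the corresponding AIC = 2k − 2 ln L̂. The data are made up for illustration; a real comparison would repeat this for every candidate from step 5 and keep the model with the lowest AIC.

```python
from math import log


def exponential_censored_mle(durations, events):
    """MLE of the exponential rate from right-censored data.

    durations: time on test of each unit (failure or censoring time)
    events: 1 if the unit failed, 0 if it was censored
    """
    d = sum(events)                # number of observed failures
    exposure = sum(durations)      # total time on test
    return d / exposure


def exponential_censored_loglik(rate, durations, events):
    """ln L = sum of ln f(t) over failures + sum of ln R(t) over censored units."""
    # ln f(t) = ln(rate) - rate*t  and  ln R(t) = -rate*t
    return sum(e * log(rate) - rate * t for t, e in zip(durations, events))


def aic(loglik, n_params):
    """Akaike Information Criterion: lower values indicate a better model."""
    return 2 * n_params - 2 * loglik


if __name__ == "__main__":
    # Hypothetical lifetimes in days (event = 0 marks a censored unit).
    durations = [120.0, 340.0, 800.0, 1500.0, 2000.0, 2000.0]
    events = [1, 1, 1, 0, 0, 0]
    lam = exponential_censored_mle(durations, events)
    ll = exponential_censored_loglik(lam, durations, events)
    print(f"rate = {lam:.3e} per day, AIC = {aic(ll, 1):.2f}")
```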
3.2 Description of the failure database

Data were collected from residential and commercial PV systems located in the Midwestern U.S. The earliest installation date of the systems under study was 2013, and the inverter capacities range from 5 to 100 kWac. A total of 193 failures were registered. Failures were identified from current and voltage signals when possible, and from the component changelog of the monitoring system when there was no communication during the failure. The histogram of the failures is depicted in Figure 3.1; more than half of the failures occurred within 1000 days, while only 28 failures were recorded beyond 2400 days.

Figure 3.1 Histogram of failures

The failure database is complemented by censored data from 234 inverters that are currently communicating with the monitoring platform. The censored data in this case are multiply right-censored; that is, the components were placed in operation at different times, and only some of them failed while the data were being collected [32].

3.3 Non-parametric models

Fitting a failure distribution to data involves less trial and error if the data give some indication of the underlying failure behavior. In light of this, non-parametric representations of the failure data provide valuable guidance for choosing better models. The only assumption made by these representations is that the cumulative distribution function (CDF), F(t), is continuous and monotonically increasing [31]. Two important non-parametric representations are discussed next: the Kaplan-Meier estimator and the cumulative failure rate.

3.3.1 Kaplan-Meier estimator

The Kaplan-Meier estimator provides an estimate of the survivor function R(t) from censored data [31].
The Kaplan-Meier estimator is defined as:

$$ \hat{R}(t) = \prod_{j \in J_t} \frac{n_j - 1}{n_j} \tag{3.1} $$

where R̂(t) is the estimate of the survivor function at time t, t_(j) represents a failure time, J_t denotes the set of all indices j for which t_(j) < t, and n_j is the number of items functioning and under observation immediately before time t_(j), j = 1, 2, ..., n. When the largest recorded lifetime is censored, the estimator is undefined beyond that time.

3.3.2 Cumulative failure rate

The cumulative failure rate estimator, also known as the Nelson-Aalen estimator, is used to empirically determine the cumulative failure rate Z(t) in the presence of censored data [31]. The estimator is defined as:

$$ \hat{Z}(t) = \sum_{j \in J_t} \frac{1}{n_j} \tag{3.2} $$

Z(t) can be interpreted as the accumulation of the failure rate over time, and its concavity indicates the shape of the failure rate. If Z(t) is concave upward, the failure rate is increasing, whereas downward concavity indicates a decreasing failure rate. A bathtub-shaped failure rate would appear as Z(t) being concave downward, followed by a transition to a concave upward section. From the cumulative failure rate, it is possible to ascertain whether the data could be well represented by a Weibull distribution, if the failure rate is decreasing, or by a normal, lognormal, or Weibull distribution, if it is increasing [32]. However, this estimator does not provide the numerical value of the failure rate itself.

3.3.3 Non-parametric analysis of the data

Figure 3.2 depicts the Kaplan-Meier estimate, which can be thought of as three line segments with different slopes. The changes in slope occur at around 1100 and 2500 days. The cumulative failure rate depicted in Figure 3.3 initially has a concave downward section, but changes abruptly to a linear section at around 2500 days. The former indicates a decreasing failure rate, in agreement with Figure 3.1.
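Equations 3.1 and 3.2 are simple enough to compute directly. The pure-Python sketch below, offered as an illustration with made-up data rather than the code used for Figures 3.2 and 3.3, evaluates both estimators just after each distinct failure time. It generalizes the formulas slightly to allow d_j tied failures at the same time, using the factor (n_j − d_j)/n_j and the increment d_j/n_j, which reduce to Equations 3.1 and 3.2 when d_j = 1. Units censored exactly at a failure time are counted as still at risk, which is the usual convention.

```python
from collections import Counter


def km_and_na(durations, events):
    """Kaplan-Meier survivor and Nelson-Aalen cumulative hazard estimates.

    durations: failure or censoring time of each unit
    events: 1 for an observed failure, 0 for a right-censored unit
    Returns a list of (t_j, R_hat, Z_hat) evaluated just after each
    distinct failure time t_j.
    """
    failures = Counter(t for t, e in zip(durations, events) if e == 1)
    r_hat, z_hat, out = 1.0, 0.0, []
    for t_j in sorted(failures):
        n_j = sum(1 for t in durations if t >= t_j)  # at risk just before t_j
        d_j = failures[t_j]                           # failures at t_j
        r_hat *= (n_j - d_j) / n_j                    # product term of (3.1)
        z_hat += d_j / n_j                            # sum term of (3.2)
        out.append((t_j, r_hat, z_hat))
    return out


if __name__ == "__main__":
    # Hypothetical data: times in days, event 0 = still operating (censored).
    for t_j, r, z in km_and_na([2, 3, 3, 5, 6, 8], [1, 1, 0, 1, 0, 1]):
        print(f"t = {t_j}: R_hat = {r:.4f}, Z_hat = {z:.4f}")
```

For a dataset of this chapter's size, an established implementation such as the Kaplan-Meier and Nelson-Aalen fitters in lifelines would normally be preferred; the sketch only shows what those estimators compute.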
The linear section beyond 2500 days is more difficult to analyze because there are not enough data beyond that point. Nevertheless, the data clearly exhibit two different failure rate behaviors, which are not easy to model with standard continuous distributions. We will show that this kind of behavior can be modeled using a piecewise exponential distribution.

Figure 3.2 Kaplan-Meier estimate

Figure 3.3 Cumulative failure rate

3.4 Piecewise exponential distribution

A random variable T is piecewise exponentially (PWE) distributed if its hazard rate function is piecewise constant with a total of r change points d_k, where 1 ≤ k ≤ r [33]. The hazard and survivor functions are given by Equations 3.3 and 3.4, respectively.

$$ h(t) = \begin{cases} \lambda_1, & t < d_1 \\ \lambda_2, & d_1 \le t < d_2 \\ \quad \vdots & \\ \lambda_{r+1}, & t \ge d_r \end{cases} \tag{3.3} $$

$$ S(t) = \begin{cases} e^{-\lambda_1 t}, & t < d_1 \\ e^{(\lambda_2 - \lambda_1) d_1 - \lambda_2 t}, & d_1 \le t < d_2 \\ \quad \vdots & \\ e^{\left[\sum_{i=1}^{r} (\lambda_{i+1} - \lambda_i) d_i\right] - \lambda_{r+1} t}, & t \ge d_r \end{cases} \tag{3.4} $$

The parameters of the PWE distribution are usually estimated by maximum likelihood, which involves determining the locations of the change points and the hazard rates. If the change points are known, the likelihood function becomes smooth, and the hazard rates can be found by solving the maximization problem. When the change points are unknown, on the other hand, they can be estimated by brute force or by fitting lines to the empirical survivor function plotted on a logarithmic scale [33]. A brute-force approach involves assuming a number of change points and estimating the parameters for numerous combinations of possible locations, which becomes tedious when several change points are considered. In this work, the change points are determined visually from the survivor function plotted on a logarithmic scale.

3.5 Maximum likelihood estimator

The Maximum Likelihood Estimator (MLE) of a parametric distribution f(t | θ) is found by determining the parameter values (θ_1, ..., θ_k) that maximize the likelihood function [32], given by Equation 3.5:

$$ L(\theta_1, \ldots, \theta_k) = \prod_{i=1}^{n} f(t_i \mid \theta_1, \ldots, \theta_k) \tag{3.5} $$

where i = 1, 2, ..., n indexes the failure observations. Because of the multiplicative nature of Equation 3.5, it is easier to maximize the logarithm of the likelihood function, which has the same maximizer. Equation 3.5 is modified to account for multiply censored observations, resulting in Equation 3.6. The factor R(t_i; θ) is the probability that a censored unit survived beyond its censoring time, so each censored observation contributes its survivor function value rather than its density.

$$ L(\theta) = \prod_{i \in U} f(t_i; \theta) \prod_{i \in C} R(t_i; \theta) \tag{3.6} $$

where U and C are the sets of indices corresponding to failure and censored observations, respectively.

3.6 Fitting failure distributions

In this section, we estimate the parameters of different failure distributions and assess how well each fits the data. The parameters are identified using the lifelines library in Python. For the PWE distribution, the change points must first be identified. In Figure 3.4, the Kaplan-Meier estimate is plotted with the vertical axis on a logarithmic scale. Initially, we identified two changes in slope, at 1100 and 2500 days, shown as red dots in Figure 3.4. However, the best fit was obtained when two more change points were added, at 130 and 2900 days, depicted as green dots in Figure 3.4.

Figure 3.4 Visual identification of change points

We now compare the candidate distributions. A qualitative assessment of fit is provided by Figure 3.5, where the survivor function of each estimated distribution is compared to the Kaplan-Meier estimate. The PWE distribution is the only one whose 95% confidence bounds completely enclose the Kaplan-Meier estimate, a visual indication of a good fit.
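Once the change points are fixed, the censored PWE likelihood (Equation 3.6 with the hazard of Equation 3.3) has a well-known closed-form maximizer: each rate λ_k equals the number of failures observed in interval k divided by the total time all units spent at risk in that interval. The pure-Python sketch below, with hypothetical data and change points, computes those estimates and evaluates the survivor function of Equation 3.4 by accumulating hazard interval by interval. It is an illustration only; the results in this chapter were obtained with lifelines.

```python
from math import exp


def pwe_mle(durations, events, change_points):
    """Closed-form MLE of PWE hazard rates for known change points.

    Each rate = failures in the interval / total exposure in the interval.
    """
    edges = [0.0] + sorted(change_points) + [float("inf")]
    rates = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Time each unit spent at risk inside [lo, hi)
        exposure = sum(max(0.0, min(t, hi) - lo) for t in durations)
        failures = sum(1 for t, e in zip(durations, events) if e and lo <= t < hi)
        # The MLE is undefined with zero exposure; 0.0 is a placeholder here.
        rates.append(failures / exposure if exposure > 0 else 0.0)
    return rates


def pwe_survivor(t, rates, change_points):
    """S(t) = exp(-cumulative hazard), equivalent to Equation 3.4."""
    edges = [0.0] + sorted(change_points) + [float("inf")]
    cum_hazard = 0.0
    for lam, lo, hi in zip(rates, edges[:-1], edges[1:]):
        if t <= lo:
            break
        cum_hazard += lam * (min(t, hi) - lo)
    return exp(-cum_hazard)


if __name__ == "__main__":
    # Hypothetical failure/censoring times in days and one change point.
    durations = [40.0, 90.0, 300.0, 900.0, 1200.0, 1500.0]
    events = [1, 1, 0, 1, 0, 0]
    rates = pwe_mle(durations, events, [130.0])
    print(rates, pwe_survivor(1000.0, rates, [130.0]))
```

In practice, the change points passed to such a routine would come from the log-scale Kaplan-Meier plot described above.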
Quantitatively, the quality of fit can be assessed using the AIC, presented in Table 3.1; the PWE distribution achieves the lowest AIC value and is therefore the best model among the six proposed.

Table 3.1 Comparison of distributions according to AIC

Distribution         AIC
Weibull              3496
Loglogistic          3506
Exponential          3500
PWE                  3470
Lognormal            3520
Generalized gamma    3491

The Wald test is applied to assess the statistical significance of the parameters of the PWE distribution. The p-values obtained for each parameter of the distribution are smaller than 0.05, which means that the null hypothesis that each rate equals the baseline value of 1 can be rejected. In other words, the rates are statistically significant.

3.7 Proposed model

The reliability literature has consistently favored state-space modeling over non-state-space approaches because it offers greater modeling power and can represent dependencies among the components of a system [34]. Therefore, if the model is to be used in a whole-system analysis where dependencies between components may exist, a state-space modeling approach should be adopted.

We propose a state-space reliability model in which the PWE distribution becomes a sequence of states, and we use a Semi-Markov process as the model formalism to describe its dynamic behavior. The following subsections present the mathematical basis of our approach.

3.7.1 Semi-Markov process

A Semi-Markov process is a generalization of the CTMC in which the requirement of exponentially distributed sojourn times is relaxed, while the process still satisfies the Markov property at the transition instants [24]. The SMP is completely characterized by its kernel matrix Q(t) and the initial state of the process. Each element Q_ij(t) represents the probability of a one-step transition from state i to state j during the interval [0, t].

Figure 3.5 Survivor function of each failure distribution estimated
Consider a process in state 0 that can move to two different states, 1 and 2. Q_01(t) is then given by Equation 3.7:

$$ Q_{01}(t) = \Pr\{(T_{0,1} \le t) \cap (T_{0,2} > T_{0,1})\} \tag{3.7} $$

where Pr{·} denotes the probability of the event enclosed in braces, and T_{0,1} and T_{0,2} are random variables denoting the times to transition from state 0 to states 1 and 2, respectively; the transition to state 1 occurs within [0, t] only if it happens by time t and before the competing transition to state 2. The unconditional sojourn time distribution in state i can be obtained from Q(t) using Equation 3.8:

$$ F_i(t) = \sum_{j=1}^{K} Q_{ij}(t) \tag{3.8} $$

where j = 1, 2, ..., K denote the system states. The conditional time-dependent state probabilities can be found by solving the following system of integral equations [24]:

$$ \theta_{ij}(t) = \delta_{ij}\,[1 - F_i(t)] + \sum_{k=1}^{K} \int_{0}^{t} \frac{dQ_{ik}(\tau)}{d\tau}\, \theta_{kj}(t - \tau)\, d\tau \tag{3.9} $$

where δ_ij is the Kronecker delta, equal to 1 if i = j and 0 otherwise. θ_ij(t) is the probability that the process is in state j at time t, given that it started in state i.

3.7.2 Deriving the state-space model

The PWE distribution can be viewed as a sequence of J transient states, where J = r + 1, and one absorbing state F. From each transient state except the last, the process can transition either to the failure state or to the next state in the sequence. The time until a transition to failure, T_iF, i = 1, 2, ..., J, is exponentially distributed with parameter λ_k, where state i corresponds to the k-th interval of the PWE distribution (k = i), which spans from d_{k−1} to d_k. The CDF of T_iF is given by Equation 3.10. Similarly, the time until a transition to the next state in the sequence, T_ij with j = i + 1, is deterministic, with the CDF given by Equation 3.11.

$$ F_{iF}(t) = \begin{cases} 0, & t < 0 \\ 1 - e^{-\lambda_k t}, & t \ge 0 \end{cases} \tag{3.10} $$

$$ F_{ij}(t) = \begin{cases} 0, & t < T_k \\ 1, & t \ge T_k \end{cases} \tag{3.11} $$

where T_k = d_k − d_{k−1}, with T_1 = d_1 when k = 1. Since the process can only start from the first state of the sequence, the probabilities of interest have the form θ_1j(t).
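Combining Equations 3.10 and 3.11 gives the kernel entries of this chain explicitly. The short derivation below is a worked step added for clarity. It uses the standard competing-risks form of the one-step kernel, in which the transition to a given state must occur by time t and before every competing transition, with the same notation as above (state i occupies the k-th interval, k = i) and the unit step function U(·):

$$
\begin{aligned}
Q_{iF}(t) &= \int_0^{t} \left[1 - F_{ij}(\tau)\right] dF_{iF}(\tau) = \int_0^{\min(t,\,T_k)} \lambda_k e^{-\lambda_k \tau}\, d\tau = 1 - e^{-\lambda_k \min(t,\,T_k)}, \\
Q_{ij}(t) &= \Pr\{T_{ij} \le t,\; T_{iF} > T_{ij}\} = e^{-\lambda_k T_k}\, U(t - T_k), \qquad j = i + 1, \\
F_i(t) &= Q_{iF}(t) + Q_{ij}(t) = \begin{cases} 1 - e^{-\lambda_k t}, & t < T_k \\ 1, & t \ge T_k \end{cases}
\end{aligned}
$$

For the last state J, which has no deterministic exit, Q_JF(t) = 1 − e^{−λ_J t}. The probability of reaching state k + 1 is the product of the survival terms, $\prod_{i=1}^{k} e^{-\lambda_i T_i} = e^{-\sum_{i=1}^{k} \lambda_i (d_i - d_{i-1})}$, which equals S(d_k) from Equation 3.4, so the chain reproduces the PWE survivor function at every change point.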
3.7.3 State probabilities

The probability of being in any state at time t can be found by solving Equation 3.9 in the Laplace domain. These equations can be written in matrix form as Equation 3.12 [24]:

$$ \theta(s) = \left(I - \mathbf{Q}(s)\right)^{-1} \left( \frac{1}{s} I - \mathbf{F}(s) \right) \tag{3.12} $$

where F(s) is the diagonal matrix of the F_i(s), and:

$$ \theta_{ij}(s) = \int_0^{\infty} e^{-st}\, \theta_{ij}(t)\, dt \tag{3.13} $$

$$ Q_{ik}(s) = \int_0^{\infty} e^{-st}\, dQ_{ik}(t) \tag{3.14} $$

$$ F_i(s) = \int_0^{\infty} e^{-st}\, F_i(t)\, dt \tag{3.15} $$

To obtain the time-domain expression of θ(s), the inverse Laplace transform is applied. A closed-form solution for θ(t) may not exist; however, the integral that defines the inverse Laplace transform can be evaluated numerically.

Without a transition out of the failure state, the probability of each transient state of the SMP is nonzero only during the interval in which the process can have reached that state but cannot yet have moved on to the next state in the sequence. Thus, the individual time-dependent probabilities have no meaning on their own in this case. However, adding the probabilities of all transient states yields the survivor function. The survivor function is derived analytically by solving the system of equations (3.16), because, without a transition out of the failure state, the singularity of Q(t) prevents the use of Equation 3.12:

$$ \begin{cases} \theta_{jF}(s) = Q_{j(j+1)}(s)\, \theta_{(j+1)F}(s) + \dfrac{1}{s}\, Q_{jF}(s), & j = 1, \ldots, J - 1 \\[6pt] \theta_{JF}(s) = \dfrac{1}{s}\, Q_{JF}(s) \end{cases} \tag{3.16} $$

where the factor 1/s is the transform of θ_FF(t) = 1 for the absorbing state. The equations can be back-substituted to obtain an expression for θ_1F(s) solely in terms of the Laplace variable s. The inverse Laplace transform then yields θ_1F(t), the probability of failure by time t, and the survivor function follows as 1 − θ_1F(t).

3.8 Results and discussion

The PWE distribution fitted in Section 3.6 can be converted into an SMP with five transient states and one absorbing state. The matrix Q(t) has dimensions 6 × 6, and the initial state is the first state of the sequence. The state-space model is represented by the diagram in Figure 3.6.
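One widely used scheme for numerically evaluating an inverse Laplace transform is the Euler algorithm of Abate and Whitt, which sums the Bromwich integral as an alternating Fourier series and accelerates it with Euler (binomial) averaging. The pure-Python sketch below is a generic implementation with our own choice of M = 15, a value suited to double-precision arithmetic; it is not the code used to produce the results in this chapter. It is checked against the unavailability of a two-state model, U(s) = λ / [s(s + λ + μ)], whose inverse, U(t) = λ/(λ + μ) · (1 − e^{−(λ+μ)t}), is known in closed form.

```python
import math
from math import comb


def euler_inversion(F, t, M=15):
    """Numerically invert a Laplace transform F(s) at time t > 0.

    Abate-Whitt Euler algorithm:
        f(t) ~ (10^(M/3) / t) * sum_{k=0}^{2M} eta_k * Re F(beta_k / t)
    with beta_k = M ln(10)/3 + i*pi*k and eta_k = (-1)^k * xi_k.
    """
    # Euler-summation weights xi_k for k = 0..2M
    xi = [0.5] + [1.0] * M + [0.0] * M
    xi[2 * M] = 2.0 ** (-M)
    for k in range(1, M):
        xi[2 * M - k] = xi[2 * M - k + 1] + 2.0 ** (-M) * comb(M, k)
    total = 0.0
    for k in range(2 * M + 1):
        beta = M * math.log(10) / 3 + 1j * math.pi * k
        total += (-1) ** k * xi[k] * F(beta / t).real
    return 10 ** (M / 3) / t * total


if __name__ == "__main__":
    lam, mu = 0.002, 0.05  # hypothetical failure and repair rates per day
    U = lambda s: lam / (s * (s + lam + mu))
    for t in (10.0, 30.0, 100.0):
        exact = lam / (lam + mu) * (1 - math.exp(-(lam + mu) * t))
        print(t, euler_inversion(U, t), exact)
```

Fourier-series inversions like this one typically converge slowly near discontinuities, such as the step changes in the SMP survivor function, so they work best on smooth quantities like availability or at time points away from the jumps.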
Solving Equation 3.16 yields a time-domain expression consisting of exponential terms multiplied by Heaviside step functions, which describes the same survivor function as the fitted PWE distribution. Thus, the failure process is accurately reproduced by the SMP.

Next, we use the model to find the availability of the inverter. To do this, we include a transition out of the failure state, with the time to restoration assumed to be exponentially distributed with rate μ. In this scenario, Q(t) is full rank, and Equation 3.12 can be applied. The element θ_16(s), which represents the unavailability of the inverter, is extracted, and its time-domain representation is obtained numerically using Euler's method [35]. The availability of the proposed model, along with that of a two-state Markov model, is plotted in Figure 3.7. The failure rate of the two-state model is the MLE of an exponential distribution fitted to the data.

Figure 3.6 Proposed reliability model

Figure 3.7 System availability

As shown in Figure 3.7, the availability of the proposed model initially decreases sharply, reflecting a high likelihood of premature failures, which implies a decreasing failure rate. This is followed by an increase in availability and a subsequent dip, suggesting a period of rising failure rate, before availability finally stabilizes at a steady-state value. The two decreasing epochs in availability demonstrate the model's capability to capture the varying failure rate in the data, as confirmed by the cumulative failure rate. In contrast, the two-state CTMC model exhibits much simpler behavior, with availability quickly stabilizing at a steady state. As the mean time to repair increases, the error incurred by using a two-state model becomes more significant.
This may not be an issue for large-scale facilities, but it can affect the economic feasibility of residential and commercial systems, which are not monitored as closely and where repairs can take longer [8].

The accuracy of the inverter model comes with an important caveat: because the model is not memoryless, a whole-system analysis requires a modeling formalism that retains some memory of where the process is in time. To illustrate this, consider a system comprising one inverter and one solar panel, where the solar panel is represented by a two-state exponential model. The process starts from state 1 of the inverter and the up state of the solar panel. If the solar panel fails, the clock resets, and the progress toward state 2 of the inverter sequence is lost. After the solar panel failure, the inverter restarts in state 1 as if no time had elapsed, leading to an incorrect representation of its failure behavior. In the next chapter, we introduce a modeling formalism capable of dealing with the non-memoryless nature of our model, the Markov Regenerative Process.

CHAPTER 4 SUBSYSTEM RELIABILITY MODEL

Up to this point, we have discussed the reliability of PV systems and reduced each system to several subsystems connected in parallel, each consisting of solar panels and the inverter to which they are connected. In the previous chapter, we developed a reliability model for string inverters that captures both premature and wear-out failures, and discussed why the model structure cannot be directly applied to a system analysis under the Semi-Markov framework. In this chapter, we introduce the Markov Regenerative Process and elaborate on how to apply it to the solar panel-inverter subsystem. Thereafter, we use phase-type distributions to find an exponential representation of the string inverter model. Lastly, we describe the Markov Reward Process and its use for estimating the energy yield of PV systems.
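The clock-reset problem raised at the end of Chapter 3 can be illustrated numerically. The Monte Carlo sketch below, in pure Python with hypothetical rates and a single change point (none of these values come from the fitted model), samples the inverter's PWE lifetime exactly by inverting the cumulative hazard, and then repeats the experiment with the inverter age reset to zero at every solar panel failure. Panel failures are assumed to be instantaneous Poisson events that only reset the inverter clock. With a high early failure rate, the resets keep the inverter in its infant-mortality phase, so the apparent mean time to inverter failure becomes much shorter than the true one (about 646 days for these rates).

```python
import random


def sample_pwe(rates, change_points, rng):
    """Draw one lifetime from a PWE distribution (all rates assumed > 0).

    Inverse cumulative-hazard method: find t with H(t) = E, E ~ Exp(1).
    """
    target = rng.expovariate(1.0)
    edges = [0.0] + list(change_points) + [float("inf")]
    for lam, lo, hi in zip(rates, edges[:-1], edges[1:]):
        if lam * (hi - lo) >= target:     # cumulative hazard reaches E here
            return lo + target / lam
        target -= lam * (hi - lo)
    raise ValueError("rates must be positive")


def lifetime_with_resets(rates, change_points, reset_rate, rng):
    """Inverter lifetime when its age is (wrongly) reset at each panel failure."""
    elapsed = 0.0
    while True:
        t_inv = sample_pwe(rates, change_points, rng)  # lifetime from age zero
        t_pan = rng.expovariate(reset_rate)            # next panel failure
        if t_inv < t_pan:
            return elapsed + t_inv
        elapsed += t_pan                               # clock reset here


def mean_lifetime(rates, change_points, n, seed, reset_rate=None):
    """Sample mean lifetime, with or without clock resets."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        if reset_rate is None:
            total += sample_pwe(rates, change_points, rng)
        else:
            total += lifetime_with_resets(rates, change_points, reset_rate, rng)
    return total / n


if __name__ == "__main__":
    rates, cps = [0.01, 0.001], [50.0]  # hypothetical per-day rates
    print("true mean:      ", mean_lifetime(rates, cps, 20_000, seed=1))
    print("with resets:    ", mean_lifetime(rates, cps, 20_000, seed=1, reset_rate=1 / 30))
```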
4.1 Markov Regenerative Process

Markov Regenerative Processes (MRGPs) are a class of stochastic models that can represent processes with transitions that do not satisfy the Markov property. MRGPs generalize many stochastic models, such as SMPs and CTMCs. The key concept behind MRGPs is that the Markov property holds for a certain group of states, called regeneration states [24]. The embedded time points {T_n, n ≥ 0} are the regeneration time points (RTPs) of the system; that is, the process probabilistically restarts every time it reaches one of these points.

An MRGP is defined by two matrices: the global kernel K(t) and the local kernel E(t). The global kernel describes the occurrence of the next RTP and is given by Equation 4.1. The local kernel, on the other hand, describes the state transitions within a regeneration interval, before the process reaches the next RTP; each element of E(t) is given by Equation 4.2.

$$ K_{ij}(t) = P\{Y_1 = j,\ T_1 \le t \mid Y_0 = i\} \tag{4.1} $$

$$ E_{ij}(t) = P\{Z(t) = j,\ T_1 > t \mid Y_0 = i\} \tag{4.2} $$

where Z(t) is the state of the process at time t, Y_n is the state of the process at the n-th RTP, and i, j are states within the state space S. Using the global and local kernels, the transition probability over (0, t] can be found by solving the generalized Markov renewal equation, Equation 4.3:

$$ V_{ij}(t) = E_{ij}(t) + \sum_{k \in S} \int_{0}^{t} dK_{ik}(y)\, V_{kj}(t - y) \tag{4.3} $$

where V_ij(t) is the transition probability. As for SMPs, Equation 4.3 can be converted to the Laplace domain to obtain a matrix equation that can be solved analytically if the kernels can be expressed in closed form. The resulting matrix equation is Equation 4.4:

$$ \hat{V}(s) = \left[I - \hat{K}(s)\right]^{-1} \hat{E}(s) \tag{4.4} $$

Solving Equation 4.4 involves numerous symbolic operations just to find V̂(s), which must then be numerically inverted to find V(t). This procedure becomes computationally expensive as the state space grows, necessitating an alternative solution approach.
A more efficient way to handle Equation 4.3 is to use an exponential approximation of the inverter model so that the subsystem model can be expressed as a CTMC, which is easier to analyze than an MRGP [24].

4.2 Exponential approximation

Finding an exponential approximation of a non-exponential stochastic process is called Markovization. This procedure entails replacing each non-exponential distribution with a phase-type distribution, representable as a CTMC with one absorbing state. In this way, any discrete-state non-exponential stochastic process can be approximated by an equivalent CTMC over an expanded state space [24].

4.2.1 Phase-type distributions

Phase-type (PH) distributions model a non-exponential distribution as the time until absorption of a Markov process with one absorbing state. PH distributions of sufficiently high order n can approximate any distribution function on [0, ∞) as closely as desired [24]. The components of a PH distribution are:

1. State space: a set of transient states 1, 2, ..., n and one absorbing state.
2. Initial probability vector (α): the probability of starting in each transient state.
3. Sub-generator matrix (T): the matrix of transition rates among the transient states.

Different techniques have been proposed in the literature for fitting PH distributions, including fitting mixtures of Erlang distributions with moment-matching techniques, MLE, and Expectation-Maximization with various classes of PH distributions. Computer programs such as PhFit have been developed to fit PH distributions to continuous distributions and experimental data [24].

4.2.2 Deterministic transitions

The inverter model developed in the previous chapter exhibits non-exponential behavior because of the deterministic transitions between success states. Each deterministic time to transition can be modeled as a PH distribution by representing it as an Erlang random variable, which is the sum of k i.i.d. exponential random variables.
Because it is the sum of k exponential random variables with the same rate parameter, an Erlang random variable is equivalent to a succession of k exponential stages, all with the same rate. The mean and coefficient of variation (CV) of an Erlang random variable are given in Equations 4.5 and 4.6, respectively:

$$ \mu = k / \lambda \tag{4.5} $$

$$ CV = 1 / \sqrt{k} \tag{4.6} $$

where k is the number of states of the exponential approximation and λ is the transition rate between states. Equation 4.6 reveals an important property of the Erlang-based approximation: its quality depends on the number of states. A deterministic variable has zero variance and hence a CV of zero, which an Erlang distribution reaches only in the limit k → ∞. However, for practical purposes, accurate representations can be achieved without too many states [36].

To find the exponential approximation, both the mean and either the CV or k must be specified. The mean is simply the time at which the transition occurs, and the CV (or k) is set to achieve the desired accuracy. The goodness of the approximation can then be assessed by plotting the CDF, which is found by solving the Kolmogorov differential equation for the transition rate matrix A and the initial state vector p_0, given in Equations 4.7 and 4.8, respectively:

$$ A = \begin{bmatrix} Q_u & a^T \\ 0 & 0 \end{bmatrix} \tag{4.7} $$

$$ p_0 = \begin{bmatrix} 1 & 0 & \cdots & 0 \end{bmatrix} \tag{4.8} $$

where Q_u is a k × k matrix whose rows have only two nonzero entries, a_ii = −λ and a_i,i+1 = λ (the last row has only a_kk = −λ). The column vector a^T is found as −Q_u e^T, where e^T is a column vector with all entries equal to 1. The initial state vector p_0 is a 1 × (k + 1) vector. The Kolmogorov differential equation is written as [23]:

$$ \frac{dp(t)}{dt} = p(t) A \tag{4.9} $$

Figure 4.1 depicts the exponential approximation of a deterministic transition at 100 days for different values of CV.
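The construction in Equations 4.7 to 4.9 is easy to reproduce. The pure-Python sketch below builds the generator A for a k-stage Erlang approximation of a fixed delay and solves Equation 4.9 by uniformization, a standard technique we chose for this illustration (the thesis does not state which solver was used). The probability of the absorbing state is the approximate CDF of the delay. The delay and k values are illustrative.

```python
import math


def erlang_generator(k, delay):
    """Generator A (Eq. 4.7) of a k-stage Erlang approximating a fixed delay."""
    lam = k / delay                       # from Eq. 4.5: mean = k / lam
    A = [[0.0] * (k + 1) for _ in range(k + 1)]
    for i in range(k):
        A[i][i] = -lam
        A[i][i + 1] = lam                 # stage k feeds the absorbing state
    return A


def transient_probabilities(A, p0, t, tol=1e-12, max_terms=100_000):
    """Solve dp/dt = p A (Eq. 4.9) by uniformization.

    Uses exp(-rate*t) directly, so it is meant for moderate rate*t (< ~700).
    """
    n = len(A)
    rate = max(-A[i][i] for i in range(n))      # uniformization rate
    # P = I + A / rate is a stochastic matrix
    P = [[(1.0 if i == j else 0.0) + A[i][j] / rate for j in range(n)] for i in range(n)]
    weight = math.exp(-rate * t)                # Poisson probability of 0 jumps
    v, result, cumulative, m = list(p0), [weight * x for x in p0], weight, 0
    while 1.0 - cumulative > tol and m < max_terms:
        m += 1
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= rate * t / m                  # Poisson probability of m jumps
        cumulative += weight
        result = [r + weight * x for r, x in zip(result, v)]
    return result


def erlang_cdf(t, k, delay):
    """Probability that the approximated transition has occurred by time t."""
    A = erlang_generator(k, delay)
    p0 = [1.0] + [0.0] * k
    return transient_probabilities(A, p0, t)[-1]


if __name__ == "__main__":
    for k in (4, 25, 100):  # CV = 0.5, 0.2, 0.1
        print(k, [round(erlang_cdf(t, k, 100.0), 3) for t in (80.0, 100.0, 120.0)])
```

As Equation 4.6 predicts, increasing k lowers the CDF before the 100-day delay and raises it afterward, so the curve approaches a step at 100 days.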
As Equation 4.6 indicates, Figure 4.1 shows that the accuracy of the approximation improves as the CV decreases, that is, as the number of states increases.

Figure 4.1 Exponential approximation of a deterministic transition for different coefficients of variation

4.2.3 Exponential inverter model

We now describe the procedure for finding the exponential approximation of the inverter model. First, we find the exponential approximation of each deterministic transition. Then, we aggregate these approximations to form the state space of the inverter model. Figure 4.2 depicts the transition rate diagram of the exponential inverter model. Each state within a group has a transition to failure with the rate given by the corresponding interval of the PWE distribution, whereas the transition from one transient state to the next occurs at the rate determined by the approximation of the deterministic transition. Choosing k = 25 for the first and last transitions and k = 100 for the rest provides a good compromise between accuracy and complexity. Figure 4.3 shows the availability of both the exponential model and the SMP. The maximum difference between the two responses is 0.3%. Therefore, we conclude that the exponential approximation captures the non-exponential behavior of the inverter model very well.

Figure 4.2 Exponential inverter model

Figure 4.3 Availability of both inverter models

4.3 Subsystem model

Having found an exponential approximation of the inverter model, we can now build a subsystem model. The transition rate diagram in Figure 4.4 shows the state space of the subsystem model with a string inverter and the failure transitions between states. Horizontal transitions denote the progression through the states of the inverter model, whereas vertical transitions denote changes in the number of functioning solar panels. The system moves from each row to the next with the transition rate given in Equation 4.10:

$$ \lambda_i = n_i \times \lambda_{sp}, \qquad i = 1, \ldots, N \tag{4.10} $$

where n_i is the number of functioning solar panels in state i and λ_sp is the failure rate of a single solar panel.

Figure 4.4 Subsystem model with string inverter

Figure 4.5 Subsystem model with microinverter

The repair transitions are not shown in the diagram. Solar panel repairs take place only during scheduled maintenance. This would be represented by a transition from each state with a failed solar panel to the first state of the column in which that state is located, with a rate equal to the reciprocal of the mean time to maintenance. In case of an inverter repair, the process moves horizontally to the left, arriving at the first state of the row. The rate of this transition is determined by how long it takes to solve inverter-related hardware issues.

The subsystem model with microinverter is shown in Figure 4.5. Since component-level information is available, a failure of either the solar panel or the microinverter triggers a repair action that restores the system to the state in which all components are functional. All repair transitions are equal, and their rate is the reciprocal of the time it takes to repair the inverter.

4.4 Markov Reward Models

A Markov Reward Model (MRM), or Markov Reward Process, is a mathematical framework for analyzing stochastic systems that satisfy the Markov property and accumulate a reward by staying in a state or transitioning to another [23]. In the context of a PV system, MRMs provide a flexible framework for estimating the expected energy yield. An MRM consists of the following components:

1. State space (S): a finite or countable set of states s_1, s_2, s_3, ..., s_K.
2. Transition rate matrix (A): a matrix that describes the instantaneous rates of transition between states.
3. Reward matrix (R): a matrix that contains the rewards the process obtains by staying in its current state or transitioning to another. The reward accrued per unit of time while the system is in state s_i is r_ii.
Alternatively, the reward earned by transitioning from state i to state j is denoted r_ij.

The total expected reward accumulated until time t when starting from state i is v_i(t). It is obtained by solving the following system of differential equations with zero initial conditions:

$$ \frac{dv_i(t)}{dt} = z_i + \sum_{j=1}^{K} q_{ij}\, v_j(t), \qquad i = 1, \ldots, K \tag{4.11} $$

$$ z_i = r_{ii} + \sum_{\substack{j=1 \\ j \ne i}}^{K} q_{ij}\, r_{ij}, \qquad i = 1, \ldots, K \tag{4.12} $$

where q_ij are the entries of the transition rate matrix A.

4.5 Rewards for estimating energy yield

For a PV system, the reward matrix specifies the energy production in each state per unit of time. Energy production in a given state is determined by the number of functioning solar panels and whether the inverter is operational. The off-diagonal entries of the reward matrix are zero, since no reward is associated with transitions between states. Equation 4.13 gives the amount of energy produced in each state:

$$ E_i = \delta \times \min\left(P_{in},\ P_{sp} \times (N - M)\right) \times E_{avg} \tag{4.13} $$

where E_i is the energy production of the i-th state; δ is an indicator variable equal to 1 if the inverter is operational and 0 otherwise; P_in is the inverter rated power; P_sp is the solar panel rated power; N is the total number of solar panels; M is the number of failed solar panels in the i-th state; and E_avg is the average yearly energy output of a perfectly reliable PV system per kWac installed.

The reward matrix can also be set up to account for time-dependent changes in the system. For example, the rewards can be decreased sequentially to account for solar panel degradation, providing a more comprehensive description of system performance.

4.6 Results and discussion

We now compare the long-term performance of two 8 kW systems, one with a string inverter and the other with microinverters. The system configurations under study are:

1. One 8 kW string inverter with twenty 400 W solar panels connected to it.
2. Twenty 400 W microinverters, each connected to one 400 W solar panel.
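Equations 4.11 to 4.13 can be combined in a few lines. The pure-Python sketch below assigns the reward of Equation 4.13 to each state and integrates Equation 4.11 with a fixed-step fourth-order Runge-Kutta scheme (our choice of integrator; any ODE solver would do). As a check, it uses the smallest possible model, a two-state inverter with all twenty 400 W panels working, whose accumulated reward has a closed form. The 1375 kWh/kW yearly yield is the value stated for this study, while the failure and repair rates are hypothetical.

```python
import math


def state_energy(inverter_up, n_panels, failed_panels, p_inv_kw, p_sp_kw, e_avg):
    """Yearly energy of a state, Eq. 4.13 (kWh/yr when e_avg is in kWh/kW/yr)."""
    delta = 1.0 if inverter_up else 0.0
    return delta * min(p_inv_kw, p_sp_kw * (n_panels - failed_panels)) * e_avg


def accumulated_reward(Q, rewards, t_end, steps=10_000):
    """Integrate dv/dt = z + Q v (Eq. 4.11) from v(0) = 0 with classical RK4.

    Q: generator matrix (rows sum to zero), rates per year
    rewards: reward rate r_ii of each state; with no transition rewards, z = r
    """
    n, h = len(Q), t_end / steps

    def f(v):
        return [rewards[i] + sum(Q[i][j] * v[j] for j in range(n)) for i in range(n)]

    v = [0.0] * n
    for _ in range(steps):
        k1 = f(v)
        k2 = f([v[i] + h / 2 * k1[i] for i in range(n)])
        k3 = f([v[i] + h / 2 * k2[i] for i in range(n)])
        k4 = f([v[i] + h * k3[i] for i in range(n)])
        v = [v[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) for i in range(n)]
    return v


def two_state_reward(r, lam, mu, t):
    """Closed-form accumulated reward of an up/down model starting up."""
    s = lam + mu
    return r * (mu * t / s + lam / s ** 2 * (1 - math.exp(-s * t)))


if __name__ == "__main__":
    r_up = state_energy(True, 20, 0, 8.0, 0.4, 1375.0)  # 11000 kWh/yr
    lam, mu = 0.2, 365 / 40                               # hypothetical, per year
    v = accumulated_reward([[-lam, lam], [mu, -mu]], [r_up, 0.0], 10.0)
    print(v[0], two_state_reward(r_up, lam, mu, 10.0))
```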
No shading effects are assumed, so both systems perform identically when failures are neglected; any difference in performance is therefore due to reliability. The yearly energy production is 1375 kWh/kW, and solar panel degradation is assumed to be 0.5% per year.

Figure 4.6 Yearly energy production of both system configurations for varying inverter mean time to repair (A = 40 days, B = 60 days, and C = 105 days)

The energy production of both systems is shown in Figure 4.6 for different values of the inverter mean time to repair. For the microinverter system, this is the time to repair both the solar panel and the microinverter. As expected, the higher reliability of microinverters gives them higher energy production overall.

The production of the string system is heavily influenced by the decreasing and then increasing failure rate of the inverter. As a result, the gap in energy production between the two systems narrows during the first 5 to 6 years and then widens until the difference settles. Another key takeaway from Figure 4.6 is that the energy production of the string system is very sensitive to the inverter mean time to repair. Microinverter systems, by contrast, are so reliable that their energy production remains essentially unaffected even for longer repair times. These repair times were selected from [8] and represent the mean, the 75th percentile, and the maximum of the recorded times to resolve inverter-related hardware issues.

CHAPTER 5 BAYESIAN INVERTER RELIABILITY MODELING

In Chapter 3, we established time-to-failure distributions using maximum likelihood estimation, as is customary in traditional reliability and lifetime data analysis. In this approach, the notion of probability is frequency-based; that is, it derives from the understanding of probability as the limiting relative frequency of an event in a repeated series of identical trials.
An alternative to this notion is Bayesian probability, which differs from the frequency-based paradigm by treating probability as a subjective assessment of the state of knowledge. This philosophical difference enables the use of information that would otherwise remain unused, thereby enhancing model building. In Bayesian models, parameters are treated as random variables about which we make probabilistic statements. This provides a robust framework for quantifying uncertainty, which is especially valuable when data are scarce.

In this chapter, we revisit the identification of the PWE distribution from a Bayesian perspective for two reasons. First, we aim to analyze the impact of parameter uncertainty on the expected energy production and to provide credible intervals for these estimates. Second, we aim to analyze the failure rate as a key indicator of the failure process. To do so, we employ a PWE distribution with evenly spaced change points, placed close enough together to capture how the process varies over time.

5.1 Basics of Bayesian analysis

In Bayesian analysis, probability models are composed of two parts: the prior and the likelihood. The prior represents our knowledge about the parameters before any data are used. The likelihood function, on the other hand, is obtained from the sampling distribution, which describes the probability of observing the data given a set of parameter values. These two elements give rise to the posterior distribution through Bayes' theorem, shown in equation 5.1 [37]:

$$p(\theta \mid y) = \frac{f(y \mid \theta)\, p(\theta)}{m(y)} \tag{5.1}$$

where p(θ|y) is the posterior distribution, p(θ) is the prior density, f(y|θ) is the likelihood, and m(y) = ∫ f(y|θ) p(θ) dθ is the marginal density of the data.

5.1.1 Priors

For every parameter θ_i in a Bayesian model, we need to provide a prior. The prior encapsulates an initial plausibility assignment for each possible value the parameter can take.
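Equation 5.1 can be made concrete with a grid approximation. The sketch below assumes a single constant failure rate λ (per day) and hypothetical data: d failures over a total exposure of a unit-days. The likelihood is exponential, λ^d e^(−λa), and the prior is normal, N(4.1096 × 10^-4, (1.5 × 10^-4)^2), truncated to positive values by the grid. The marginal density m(y) is obtained by numerical integration over the grid, which is feasible here only because there is a single parameter.

```python
import numpy as np
from scipy.stats import norm


def grid_posterior(d, a, prior_mean, prior_sd, grid):
    """Posterior density of a constant failure rate on a uniform grid via Eq. (5.1)."""
    log_lik = d * np.log(grid) - grid * a            # log f(y | lambda), up to a constant
    log_prior = norm.logpdf(grid, loc=prior_mean, scale=prior_sd)
    log_unnorm = log_lik + log_prior
    weights = np.exp(log_unnorm - log_unnorm.max())   # rescale to avoid underflow
    dx = grid[1] - grid[0]
    m_y = weights.sum() * dx                          # m(y), up to the same rescaling
    return weights / m_y


# Hypothetical data: 10 failures over 30,000 inverter-days of exposure.
grid = np.linspace(1e-6, 1.5e-3, 3000)
post = grid_posterior(d=10, a=30_000.0, prior_mean=4.1096e-4, prior_sd=1.5e-4, grid=grid)
dx = grid[1] - grid[0]
post_mean = np.sum(grid * post) * dx
```

The dropped likelihood constant and the rescaling cancel when dividing by m(y), which is why only the relative shape of the integrand matters.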
Priors can also serve other purposes, such as constraining the parameters to a reasonable range or taking advantage of any insight we may have about what the correct value should be. Priors can be informative or noninformative. The former are used when the parameter is known to be more likely to take certain values, whereas the latter are used when very little is known about the parameter. An example of a noninformative prior is a uniform distribution from zero to one when estimating a population proportion. Since priors can and will affect the posterior, it is necessary to perform a prior sensitivity check to gauge how robust the posterior is to changes in the priors [38].

5.1.2 Likelihood

The likelihood specifies the plausibility of observing each sample in the dataset, and it is expressed in terms of the model parameters. For instance, if we are interested in the probability of obtaining heads in a series of coin tosses, we would choose a binomial probability mass function as the likelihood function. The likelihood function is what Bayesian analysis has in common with its frequency-based counterpart. It is also the reason why the results of both approaches become very similar as the sample size grows [38].

5.1.3 Posterior

Obtaining the posterior consists of updating the beliefs contained in the priors. This update is carried out by applying equation 5.1. Closed-form expressions for p(θ|y) are generally available only when conjugate priors are used, since computing m(y) is analytically intractable in most cases. Fortunately, Markov Chain Monte Carlo (MCMC) methods can be used to sample from complex posteriors and generate sequences of parameter values upon which inferences can be based.
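As an example of the conjugate case mentioned above, a gamma prior Gamma(α, β) on a constant failure rate combined with an exponential likelihood yields a gamma posterior in closed form: Gamma(α + d, β + a). Here d is the number of failures and a is the total exposure time.

The sketch below uses hypothetical numbers. The values α = 7.5 and β = 18,250 days are chosen only because they give a prior mean of about 4.11 × 10^-4 per day and a standard deviation of about 1.5 × 10^-4. SciPy parameterizes the gamma distribution by shape and scale, so the scale is 1/β.

```python
from scipy.stats import gamma


def gamma_exponential_update(alpha, beta, d, a):
    """Conjugate update for a constant failure rate: Gamma(alpha, beta) -> Gamma(alpha + d, beta + a)."""
    return alpha + d, beta + a


alpha0, beta0 = 7.5, 18_250.0                   # hypothetical prior (rate in 1/day)
alpha1, beta1 = gamma_exponential_update(alpha0, beta0, d=12, a=40_000.0)

posterior = gamma(a=alpha1, scale=1.0 / beta1)  # SciPy uses shape and scale = 1/rate
post_mean = posterior.mean()                    # equals alpha1 / beta1
low, high = posterior.interval(0.95)            # equal-tailed 95% credible interval
```

Because the PWE likelihood factorizes over intervals, independent gamma priors on the λ_i would be updated interval by interval in the same way. With normal priors, such as those used later in this chapter, conjugacy is lost and sampling becomes necessary.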
Probabilistic programming tools such as Stan have built-in samplers that can draw a sample from the posterior for a wide variety of models. All the effort can therefore be devoted to finding appropriate priors and writing out the likelihood function, if it is not already implemented in the chosen tool [39].

5.1.4 Parameter uncertainty

The frequency-based way of thinking about parameter uncertainty relies on the sampling distribution of the estimator being used. However, deriving analytical expressions of the sampling distribution is not always possible. In these cases, inferences about the parameter must rely on asymptotic results, which approximate the sampling distribution of the estimator when the sample size is "large." For a large sample size, the MLE of a parameter θ is approximately normally distributed. Its mean equals θ, and its variance equals the negative reciprocal of the second derivative of the log-likelihood evaluated at the MLE. Confidence bounds derived in this way are not probability statements about parameter uncertainty. They are statements about repeated sampling, which may not be meaningful in many scenarios, especially those where data are limited [37].

Unlike frequency-based confidence bounds, Bayesian credible intervals do provide probability statements about parameter uncertainty [38]. Additionally, these probability statements can be easily propagated through complex models, a task that is difficult and sometimes impossible with frequency-based confidence intervals [37].

5.2 Rethinking the PWE distribution under Bayes

Bayesian probability offers a framework to incorporate uncertainty into the reliability and performance analyses conducted so far. The expected energy production is a fixed value once the parameters have been specified; however, if there is uncertainty in the parameters, it becomes a random variable.
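The contrast drawn in Section 5.1.4 can be illustrated with a constant failure rate and hypothetical data. For the exponential log-likelihood d log λ − λa, the MLE is λ̂ = d/a. Its asymptotic variance, the negative reciprocal of the second derivative at the MLE, is λ̂²/d. With few failures, the resulting normal-approximation interval can even extend below zero.

The Bayesian part of the sketch draws from a conjugate gamma posterior Gamma(α + d, β + a), with a hypothetical prior α = 7.5, β = 18,250 days. It then pushes every draw through a derived quantity to obtain a credible interval for that quantity. The derived quantity is the steady-state availability μ/(λ + μ) of a hypothetical two-state model with a 40-day mean time to repair.

```python
import numpy as np
from scipy.stats import norm


def exponential_mle_wald(d, a, level=0.95):
    """MLE of a constant rate and its normal-approximation (Wald) interval."""
    lam_hat = d / a
    se = np.sqrt(lam_hat**2 / d)  # -1 / (d^2 loglik / d lam^2) at the MLE = lam_hat^2 / d
    z = norm.ppf(0.5 + level / 2.0)
    return lam_hat, (lam_hat - z * se, lam_hat + z * se)


# Hypothetical sparse data: 2 failures over 10,000 inverter-days.
d, a = 2, 10_000.0
lam_hat, (wald_low, wald_high) = exponential_mle_wald(d, a)   # wald_low < 0 here

# Bayesian: sample the conjugate gamma posterior and propagate each draw.
rng = np.random.default_rng(42)
alpha0, beta0 = 7.5, 18_250.0                                 # hypothetical prior
lam_draws = rng.gamma(shape=alpha0 + d, scale=1.0 / (beta0 + a), size=20_000)
mu = 1.0 / 40.0                                               # repair rate, 1/day
availability = mu / (lam_draws + mu)                          # derived quantity per draw
avail_low, avail_med, avail_high = np.quantile(availability, [0.025, 0.5, 0.975])
```

Availability here is a monotone function of a single rate, so plugging in the rate's quantiles would give essentially the same interval. With several parameters, propagating every joint draw through the model is the more general route.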
Using Bayesian credible intervals, we can therefore provide an interval estimate for the expected energy production.

5.2.1 Building the Model

In this section, we assume that the change points are located as in the model of Chapter 3; hence, the only parameters are the failure rates. Although the MLEs of the parameters are known, it is desirable to introduce as little bias as possible into the priors, because priors can heavily influence the posterior. As an initial prior, we assume a bathtub-shaped behavior, summarized in Table 5.1. The lowest failure rate is chosen as the inverse of ten years (1/3650 ≈ 2.7397 × 10^-4 per day), a reference value used by most manufacturers to establish warranties for string inverters. The standard deviation is set large enough to allow the sampler to explore the search space.

Table 5.1 Initial priors (failure rates per day; the second argument of N is the variance)

Parameter | Prior
λ1 | N(4.1096 × 10^-4, 2.25 × 10^-8)
λ2 | N(3.4246 × 10^-4, 2.25 × 10^-8)
λ3 | N(2.7397 × 10^-4, 2.25 × 10^-8)
λ4 | N(3.4246 × 10^-4, 2.25 × 10^-8)
λ5 | N(4.1096 × 10^-4, 2.25 × 10^-8)

The likelihood function of the PWE distribution is shown in equation 5.2 [40]:

$$h(t) = \lambda_i, \quad t \in I_i, \quad i = 1, \dots, r+1,$$

$$L(t, \lambda) = \prod_{i=1}^{r+1} l_i = \prod_{i=1}^{r+1} \lambda_i^{d_i} \exp(-\lambda_i a_i), \qquad d_i = \sum_j x_{ij}, \qquad a_i = \sum_j \left(t_{ij} - t_{i-1,j}\right) \tag{5.2}$$

where I_i is the interval (t_{i−1}, t_i] and l_i is the likelihood contribution of all data points through λ_i. The quantity d_i is the number of failures in interval i, with x_ij = 1 if unit j fails in interval i and 0 otherwise. The quantity a_i is the exposure time in interval i, where t_ij − t_{i−1,j} is the time unit j spends in that interval.

5.2.2 Estimation results

The model is estimated in RStan, the R interface to Stan, a C++ library for Bayesian inference that uses the No-U-Turn sampler [39]. The estimation summary is shown in Table 5.2. The marginal posteriors of all parameters have lower standard deviations than the priors, whose standard deviation is √(2.25 × 10^-8) = 1.5 × 10^-4. This indicates that uncertainty was reduced by the Bayesian update.
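Equation 5.2 depends on the data only through d_i and a_i. The sketch below computes these sufficient statistics from right-censored times and evaluates the log-likelihood. It assumes that times and change points are in days, that the event indicator is True for a failure and False for a censored unit, and that the last interval extends to infinity. The function names are illustrative and are not taken from RStan or Stan.

```python
import numpy as np


def pwe_stats(times, failed, change_points):
    """Failures d_i and exposure a_i in each interval I_i = (t_{i-1}, t_i]."""
    times = np.asarray(times, dtype=float)
    failed = np.asarray(failed, dtype=bool)
    edges = np.concatenate(([0.0], np.asarray(change_points, dtype=float), [np.inf]))
    n_int = len(edges) - 1
    d = np.zeros(n_int)
    a = np.zeros(n_int)
    for i in range(n_int):
        lo, hi = edges[i], edges[i + 1]
        # Time each unit spends inside (lo, hi]: zero if it left before lo.
        a[i] = np.clip(times - lo, 0.0, hi - lo).sum()
        # Failures observed inside (lo, hi].
        d[i] = np.sum(failed & (times > lo) & (times <= hi))
    return d, a


def pwe_loglik(lam, d, a):
    """Log of Eq. (5.2): sum_i [d_i log(lambda_i) - lambda_i a_i]."""
    lam = np.asarray(lam, dtype=float)
    return float(np.sum(d * np.log(lam) - lam * a))


# Toy example: three units, one censored, change points at 100 and 200 days.
d, a = pwe_stats([50.0, 150.0, 250.0], [True, False, True], [100.0, 200.0])
# d = [1, 0, 1], a = [250, 150, 50]
ll = pwe_loglik([0.01, 0.01, 0.01], d, a)
```

The exposures add up to the total observed time (450 days here), which is a quick sanity check on the interval bookkeeping.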
Table 5.2 Posterior summary statistics using normal prior

Parameter | Mean | SD | 2.5% | 25% | 50% | 75% | 97.5%
λ1 | 5.429E-04 | 8.056E-05 | 3.945E-04 | 4.868E-04 | 5.393E-04 | 5.958E-04 | 7.116E-04
λ2 | 3.218E-04 | 3.210E-05 | 2.622E-04 | 2.996E-04 | 3.206E-04 | 3.428E-04 | 3.877E-04
λ3 | 1.923E-04 | 2.887E-05 | 1.401E-04 | 1.722E-04 | 1.908E-04 | 2.109E-04 | 2.530E-04
λ4 | 5.906E-04 | 1.003E-04 | 4.032E-04 | 5.217E-04 | 5.870E-04 | 6.562E-04 | 7.975E-04
λ5 | 4.961E-04 | 1.180E-04 | 2.774E-04 | 4.136E-04 | 4.934E-04 | 5.747E-04 | 7.335E-04

It is important, however, to verify to what extent the reduction in posterior standard deviation was influenced by the choice of priors. To assess the sensitivity of the posterior to changes in the priors, we explore two more candidate prior distributions with the same expected value: gamma and exponential. The gamma prior can be considered a small perturbation, whereas the exponential prior represents a large one. For the gamma distribution, the parameters α and λ have been set to match the mean and to give a standard deviation close to that of the normal priors. The exponential distribution is specified by its mean alone. The posterior summary statistics using each prior are shown in Tables 5.3 and 5.4.
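One way to set up these alternative priors is moment matching. The sketch below assumes that the gamma shape and rate reproduce both the mean and the standard deviation (1.5 × 10^-4) of each normal prior in Table 5.1, which is stricter than the text requires. It also assumes that the exponential prior has rate 1/mean. The code uses SciPy's shape-scale parameterization, so scale = 1/rate.

```python
import numpy as np
from scipy.stats import gamma, expon

prior_means = np.array([4.1096e-4, 3.4246e-4, 2.7397e-4, 3.4246e-4, 4.1096e-4])  # Table 5.1
prior_sd = np.sqrt(2.25e-8)                                                      # = 1.5e-4


def gamma_from_moments(mean, sd):
    """Shape and rate of a gamma distribution with the given mean and standard deviation."""
    shape = (mean / sd) ** 2
    rate = mean / sd**2
    return shape, rate


shapes, rates = gamma_from_moments(prior_means, prior_sd)
gamma_priors = [gamma(a=s, scale=1.0 / r) for s, r in zip(shapes, rates)]
expon_priors = [expon(scale=m) for m in prior_means]   # exponential: mean = scale = 1/rate
```

An exponential prior has a standard deviation equal to its mean, roughly two to three times that of the normal priors here. This is consistent with treating it as the larger perturbation.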
Table 5.3 Posterior summary statistics using gamma prior

Parameter | Mean | SD | 2.5% | 25% | 50% | 75% | 97.5%
λ1 | 5.464E-04 | 8.859E-05 | 3.882E-04 | 4.836E-04 | 5.418E-04 | 6.032E-04 | 7.341E-04
λ2 | 3.185E-04 | 3.152E-05 | 2.598E-04 | 2.964E-04 | 3.174E-04 | 3.393E-04 | 3.825E-04
λ3 | 1.891E-04 | 2.819E-05 | 1.384E-04 | 1.695E-04 | 1.875E-04 | 2.072E-04 | 2.487E-04
λ4 | 6.578E-04 | 1.364E-04 | 4.187E-04 | 5.608E-04 | 6.481E-04 | 7.445E-04 | 9.485E-04
λ5 | 4.977E-04 | 1.384E-04 | 2.660E-04 | 3.972E-04 | 4.858E-04 | 5.834E-04 | 8.031E-04

Table 5.4 Posterior summary statistics using exponential prior

Parameter | Mean | SD | 2.5% | 25% | 50% | 75% | 97.5%
λ1 | 5.826E-04 | 1.020E-04 | 4.010E-04 | 5.104E-04 | 5.769E-04 | 6.478E-04 | 7.986E-04
λ2 | 3.177E-04 | 3.242E-05 | 2.571E-04 | 2.953E-04 | 3.166E-04 | 3.387E-04 | 3.845E-04
λ3 | 1.862E-04 | 2.824E-05 | 1.346E-04 | 1.666E-04 | 1.848E-04 | 2.045E-04 | 2.453E-04
λ4 | 8.144E-04 | 1.857E-04 | 4.939E-04 | 6.824E-04 | 7.995E-04 | 9.306E-04 | 1.219E-03
λ5 | 6.064E-04 | 2.305E-04 | 2.450E-04 | 4.375E-04 | 5.779E-04 | 7.421E-04 | 1.147E-03

For λ1 to λ3, the summary statistics are very similar across the three priors, which indicates that the estimates for these intervals are not sensitive to the choice of prior. The noticeable differences appear in the last two intervals under the exponential prior. There, the posterior of λ4 shifts upward (mean 8.144E-04 versus 5.906E-04 with the normal prior), and the posterior of λ5 is more right-skewed than under the other priors. This behavior can be attributed to the likelihood being less influential in these intervals than in the rest, so that the prior largely sets the shape of the posterior.

Figure 5.1 Comparison of the survivor function with the Kaplan-Meier estimate for a normal prior

Another way of assessing model quality is to plot the survival function with its 95% credible bounds on top of the Kaplan-Meier estimate, as shown in Figures 5.1 to 5.3. For all three priors, the survival functions look identical up to 2500 days, but they behave differently at the tail. The normal prior yields the smoothest tail, although it leaves a portion of the Kaplan-Meier estimate outside the 95% credible bounds. The tail obtained with the gamma prior is slightly less smooth, yet its bounds completely enclose the Kaplan-Meier estimate.
Lastly, the exponential prior yields the roughest tail, with higher failure rate estimates in the last two intervals. The tail robustness shown by the distribution with normal priors is desirable, and measures are usually taken to ensure it [33]. We choose the model with the normal prior as the final model, although the gamma prior could also have been chosen.

Figure 5.2 Comparison of the survivor function with the Kaplan-Meier estimate for a gamma prior

Figure 5.3 Comparison of the survivor function with the Kaplan-Meier estimate for an exponential prior

5.2.3 Expected energy production with parameter uncertainty

Accounting for parameter uncertainty using a Bayesian time-to-failure model is straightforward. From the generated posterior samples, we can select the sets of parameters that define the credible region of interest. For simplicity, we consider a two-sided, equal-tailed 95% credible interval. We thus have one set of parameters for the 2.5% bound and another for the 97.5% bound. We then compute the expected energy production for each of these two sets of parameters, following the same procedure outlined in Chapter 4. The numerical results presented next are based on the case study of Chapter 4. Figures 5.4 to 5.6 depict the expected energy production of the test system for inverter mean times to repair of 40, 60, and 105 days.

Figure 5.4 Expected energy production with a 40-day mean time to inverter repair.

Figure 5.5 Expected energy production with a 60-day mean time to inverter repair.

Figure 5.6 Expected energy production with a 105-day mean time to inverter repair.

Figures 5.4 to 5.6 suggest that, even though there is significant uncertainty associated with the model parameters, its effect on the expected energy production of the system is limited. This is because repairs are completed quickly relative to the inverter's mean time to failure. Nevertheless, it is worth noting how easily parameter uncertainty can be accounted for in this Markov-based reliability analysis when a Bayesian approach is used for time-to-failure modeling.

5.3 Interval-based failure rate analysis

The failure rate is a crucial metric for non-repairable components for two reasons. First, it provides an intuitive interpretation of the failure process. Second, the failure rate at time t multiplied by a short interval Δt approximates the conditional probability of failure within (t, t + Δt] given survival up to t. Knowing its value can therefore contribute to more effective risk management [41].

The objective of the failure modeling so far has been to find a compact model for integration into a system reliability model. The behavior of the failure rate has therefore been simplified as much as possible to reduce model complexity. Now, we aim to capture the failure rate behavior with arbitrary granularity. Fitting such a model using MLE is challenging because only limited information is available to estimate the failure rate in each interval.

For instance, we use MLE to fit a yearly failure rate model with change points every year from the first to the eighth year. Figure 5.7 illustrates the goodness of fit of this model in terms of the survival function. It is evident that the model overestimates the failure rates, yielding an overly pessimistic outlook. To further illustrate, consider a model with change points placed every 100 days; its fit is depicted in Figure 5.8 and, as expected, it performs even worse.

MLE is unstable here because it uses only local information to estimate the failure rate in each interval. Since the data are spread unevenly, it cannot provide reasonable estimates in every interval, especially in those where data are very scarce. This instability could be overcome by including a mechanism for exchanging information across intervals or by implementing some form of regularization.
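The instability of per-interval MLE can be seen directly. Maximizing equation 5.2 interval by interval gives λ̂_i = d_i/a_i. This estimate is zero wherever no failures were observed and very large wherever a failure falls in an interval with little exposure.

The sketch below uses a small hypothetical dataset. It computes these local estimates, the corresponding PWE survival function S(t) = exp(−H(t)), and a Kaplan-Meier estimate for comparison, in the spirit of Figures 5.7 and 5.8. All function names are illustrative.

```python
import numpy as np


def pwe_stats(times, failed, change_points):
    """Failures d_i and exposure a_i per interval (t_{i-1}, t_i], last interval open-ended."""
    edges = np.concatenate(([0.0], change_points, [np.inf]))
    pairs = list(zip(edges[:-1], edges[1:]))
    d = np.array([np.sum(failed & (times > lo) & (times <= hi)) for lo, hi in pairs], float)
    a = np.array([np.clip(times - lo, 0.0, hi - lo).sum() for lo, hi in pairs])
    return d, a


def pwe_survival(t, lam, change_points):
    """S(t) = exp(-sum_i lambda_i * exposure_i(t)) for a piecewise exponential model."""
    edges = np.concatenate(([0.0], change_points))
    widths = np.append(np.diff(edges), np.inf)
    exposure = np.minimum(np.clip(np.asarray(t, float)[:, None] - edges, 0.0, None), widths)
    return np.exp(-exposure @ lam)


def kaplan_meier(times, failed):
    """Kaplan-Meier survival estimate at each distinct failure time."""
    event_times = np.unique(times[failed])
    surv, s = [], 1.0
    for u in event_times:
        at_risk = np.sum(times >= u)
        deaths = np.sum((times == u) & failed)
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return event_times, np.array(surv)


# Hypothetical data (days); False marks a censored unit.
times = np.array([50.0, 150.0, 250.0, 420.0])
failed = np.array([True, False, True, True])
cps = np.array([100.0, 200.0, 300.0, 400.0])

d, a = pwe_stats(times, failed, cps)
lam_mle = np.divide(d, a, out=np.full_like(a, np.nan), where=a > 0)  # local information only
# lam_mle = [1/350, 0, 1/150, 0, 1/20]: zeros in empty intervals and a spike at the end
s_model = pwe_survival(np.array([0.0, 100.0, 425.0]), lam_mle, cps)
km_t, km_s = kaplan_meier(times, failed)
```

The final interval, with only 20 days of exposure and one failure, receives a rate of 0.05 per day. This illustrates how sparse intervals can drive an overly pessimistic tail.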
Using a simple Bayesian approach, as demonstrated in the previous section, does provide a good model for yearly intervals, as shown in Figure 5.9. However, decreasing the interval spacing to 100 days results in a model with wide and noisy credible bounds, as seen in Figure 5.10. Moreover, a model with this many change points requires the specification of numerous priors, increasing the modeling burden.

Figure 5.7 MLE estimation of PWE distribution: change points every year

Figure 5.8 MLE estimation of PWE distribution: change points every 100 days

Figure 5.9 Bayesian estimation of PWE distribution: change points every year

Figure 5.10 Bayesian estimation of PWE distribution: change points every 100 days

5.3.1 Relating Consecutive Intervals

Using a Bayesian framework and probabilistic programming tools like Stan, we can model complex interactions between model parameters and obtain stable estimates of the failure rate for any granularity of change point placement. Essentially, we aim to relate failure rates across intervals in a way that reflects the underlying nature of the failure process: a gradual degradation.

We propose relating consecutive rates using equation 5.3. Assuming that each rate is normally distributed with a mean equal to the previous rate discourages large variations. Information now flows across intervals, strengthening the failure rate estimates for intervals that lack sufficient data to produce a reliable estimate from local data alone. Another advantage of this approach is that only one prior is needed, for the initial failure rate, which reduces the complexity of the prior sensitivity analysis.

$$\lambda_i \sim \mathcal{N}(\lambda_{i-1}, \sigma^2), \qquad i = 2, \dots, r+1, \tag{5.3}$$

where σ is a tuning parameter.

5.3.2 Model with Change Points Every 100 Days

The change points are now placed every 100 days up to 3100 days; thus, the model includes a total of 31 change points. One aspect to keep in mind is that there are intervals where no failures occurred.
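The random-walk prior of equation 5.3 changes only the prior term of the log-posterior. The sketch below writes that log-posterior: a normal prior on λ1, normal increments with scale σ, and the log of equation 5.2. It then samples the log-posterior with a simple random-scan Metropolis algorithm.

This sampler is a swapped-in, easy-to-read stand-in for the No-U-Turn sampler used through RStan, and it mixes far more slowly. The synthetic data, iteration count and step size are hypothetical. Rates are restricted to positive values, and the truncation constants of the normal terms are not included, so the implied prior is the joint random-walk density restricted to positive rates.

```python
import numpy as np

LOG_2PI = np.log(2.0 * np.pi)


def norm_logpdf(x, mean, sd):
    return -0.5 * ((x - mean) / sd) ** 2 - np.log(sd) - 0.5 * LOG_2PI


def log_posterior(lam, d, a, mu0=4.1096e-4, sd0=1.5e-4, sigma=1e-4):
    """Log of Eq. (5.2) plus a normal prior on lambda_1 and the random walk of Eq. (5.3)."""
    if np.any(lam <= 0.0):
        return -np.inf
    log_lik = np.sum(d * np.log(lam) - lam * a)
    log_prior = norm_logpdf(lam[0], mu0, sd0) + np.sum(norm_logpdf(lam[1:], lam[:-1], sigma))
    return log_lik + log_prior


def metropolis(d, a, n_iter=20_000, step=5e-5, seed=0):
    """Random-scan Metropolis: perturb one rate at a time with a symmetric normal proposal."""
    rng = np.random.default_rng(seed)
    lam = np.full(len(d), 4.1096e-4)
    lp = log_posterior(lam, d, a)
    draws = np.empty((n_iter, len(d)))
    for it in range(n_iter):
        i = rng.integers(len(d))
        prop = lam.copy()
        prop[i] += step * rng.standard_normal()
        lp_prop = log_posterior(prop, d, a)
        if np.log(rng.uniform()) < lp_prop - lp:
            lam, lp = prop, lp_prop
        draws[it] = lam
    return draws


# Synthetic right-censored data and change points every 100 days up to 3100 days.
rng = np.random.default_rng(1)
fail = rng.exponential(scale=3000.0, size=400)
cens = rng.uniform(0.0, 3100.0, size=400)
times, failed = np.minimum(fail, cens), fail <= cens
cps = np.arange(100.0, 3101.0, 100.0)            # 31 change points -> 32 intervals
edges = np.concatenate(([0.0], cps, [np.inf]))
pairs = list(zip(edges[:-1], edges[1:]))
d = np.array([np.sum(failed & (times > lo) & (times <= hi)) for lo, hi in pairs])
a = np.array([np.clip(times - lo, 0.0, hi - lo).sum() for lo, hi in pairs])

draws = metropolis(d, a)[5_000:]                  # discard warm-up draws
lam_med, lam_low, lam_high = np.quantile(draws, [0.5, 0.025, 0.975], axis=0)
```

Intervals without failures still receive finite, smoothly varying rate estimates here, because the random-walk term ties each rate to its neighbors.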
The prior for the initial rate is N(4.1096 × 10^-4, 2.25 × 10^-8), similar to what we assumed for the compact model, and σ is 1 × 10^-4. The survivor function, plotted alongside the Kaplan-Meier estimate, is depicted in Figure 5.11. The model exhibits good visual agreement with the Kaplan-Meier estimate and has tighter, smoother credible bounds than the independent-prior model of Figure 5.10. The failure rate estimates are shown in Figure 5.12.

5.3.3 Model with change points every 50 days

The granularity of the change points can be increased further without compromising the quality of the resulting distribution. We now present results for change points placed every 50 days, a more demanding task due to the increased number of intervals with zero failures. The prior for λ1 and the value of σ are the same as in the previous case.

Figure 5.11 Survivor function of PWE distribution with change points placed every 100 days

Figure 5.12 Failure rate estimates of PWE distribution with change points placed every 100 days

Figure 5.13 Survivor function of PWE distribution with change points placed every 50 days

Figure 5.14 Failure rate estimates of PWE distribution with change points placed every 50 days

Figures 5.13 and 5.14 depict the survivor function plotted against the Kaplan-Meier estimate and the failure rate estimates, respectively. They confirm that robust estimates can also be obtained with this approach for change points every 50 days.

CHAPTER 6 CONCLUSIONS

In this thesis, three main topics have been discussed: developing reliability models from scarce and censored data, converting a non-exponential reliability model into a Markov model amenable to conventional reliability analysis, and leveraging Bayesian probability to account for parameter uncertainty and provide detailed estimates of failure rates.

Using field data instead of test data comes with many challenges, such as insufficient sample sizes and censoring, which complicate the modeling process.
Moreover, when the data do not follow typical behavior, the probability distributions commonly employed in power systems reliability, such as the Weibull, exponential, or log-normal, cannot be used. To overcome this, we proposed using a piecewise exponential distribution, more commonly found in settings such as clinical trials. Its piecewise-constant failure rate can approximate decreasing, increasing, or bathtub-shaped behavior, which makes it better suited to the data at hand. We found that it not only provided a good fit but also relates to viewing the failure process as a progression through discrete stages, which is the basis of state-space reliability methods.

However, unlike most reliability models in the power systems literature, a model based on the PWE distribution is not memoryless. This necessitates a more advanced modeling framework than continuous-time Markov chains, so the model was framed as a semi-Markov process. By solving the Markov renewal equation, we demonstrated how the availability of the string inverter changes over time when an exponential time to restore is assumed. This, however, raised the concern that a whole-system model could not be framed as a semi-Markov process, because there is no global clock to track progress.

The conversion of a non-exponential process into an exponential model is referred to as Markovization, which is based on the use of phase-type distributions. Using these distributions, we approximated each deterministic transition of the semi-Markov process as a series of exponentially distributed stages, all with the same rate. The sum of these exponential stages follows an Erlang distribution. Using the expressions for the mean and variance of an Erlang distribution, we determined the rate corresponding to a given number of stages, which ultimately controls the accuracy of the approximation. Numerical results show that a model with 252 states provides a sufficiently accurate representation for long-term analysis.
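The Erlang approximation summarized above can be checked numerically. Replacing a deterministic holding time T by k exponential stages in series, each with rate k/T, preserves the mean T and gives a variance of T²/k. The coefficient of variation, 1/√k, therefore shrinks as stages are added.

The sketch below uses a hypothetical T and k. It builds the sub-generator of the stage chain and recovers both moments from the phase-type formula E[X^n] = n! α(−S)^(−n) 1, where α is the initial distribution concentrated on the first stage.

```python
import numpy as np


def erlang_subgenerator(T, n_stages):
    """Sub-generator of n_stages transient stages in series, each left at rate n_stages / T."""
    rate = n_stages / T
    return -rate * np.eye(n_stages) + rate * np.eye(n_stages, k=1)


def phase_type_moments(S):
    """Mean and variance of the absorption time when starting in the first stage."""
    N = np.linalg.inv(-S)               # expected time spent in each stage
    ones = np.ones(S.shape[0])
    m1 = (N @ ones)[0]                  # 1! * alpha (-S)^-1 1
    m2 = 2.0 * (N @ N @ ones)[0]        # 2! * alpha (-S)^-2 1
    return m1, m2 - m1**2


T, n_stages = 1000.0, 25                # hypothetical: 1000-day holding time, 25 stages
mean, var = phase_type_moments(erlang_subgenerator(T, n_stages))
# mean is 1000 days and var is 1000**2 / 25 = 40000 days**2 (std 200 days)
```

Because the standard deviation falls only as T/√k, tightening the approximation quickly multiplies the number of stages. This is the state-space growth that the choice of stage count must balance.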
In Chapter 2, we briefly discussed the modeling of small PV systems, focusing on residential systems because of their simple design and the economic rationale behind using string inverters. We proposed decomposing the system model into independent subsystems, each comprising one string inverter and the solar panels connected to it, in line with how monitoring systems operate and how repairs are carried out. Similarly, for microinverter systems, which are recognized for their higher reliability, we argued that a subsystem can consist of a microinverter and its connected solar panel. This arrangement not only simplified our analysis but also let us compare the reliability of string inverters with that of microinverters.

Having developed a model for the string inverter, we constructed a fully memoryless subsystem model suitable for Markov-based reliability analysis. Although reliability is traditionally compared in terms of availability, it is more appropriate to compare the string inverter and microinverter subsystems in terms of expected energy production. For this purpose, we used Markov Reward Models in which the rewards represent energy production. Our results indicate that the energy production of microinverter systems is not sensitive to changes in the mean time to repair, unlike that of string inverter systems.

Chapter 5 adopted a Bayesian approach to the estimation procedure outlined in Chapter 3. This allowed us to provide credible intervals for the energy production of the string inverter subsystem without any major changes to the modeling framework itself. Applied to the case study of Chapter 4, the results show that, although there is significant uncertainty in the model parameters, most of it is absorbed by the promptness of the repairs. Another strength of Bayesian probability is the flexibility of model representation it allows, which enables the modeling of complex interactions between model parameters.
Leveraging this approach, we obtained failure rate estimates with minimal prior information for arbitrarily placed change points, confirming the bathtub-shaped behavior previously suggested by the cumulative failure rate. This analysis is valuable for risk assessment and reliability optimization. It also serves as an initial method to describe the failure process, facilitating subsequent simplifications based on the observed behavior.

Data on the operation of small PV systems are not widely available, which limits our understanding of their actual reliability. In this work, we attempted to characterize the behavior of string inverters using a small dataset collected from a single region in the U.S. However, this dataset is by no means representative of all string inverters, which constrained the depth of the insights we could draw. To support more conclusive research in this area, a more comprehensive data collection platform must be established.

A robust reliability assessment would require access to additional information, such as voltage and current waveforms, environmental parameters like temperature and humidity, and the internal temperature of the device. Moreover, greater effort must be made to accurately categorize failure causes, which would enhance reliability analysis and enable root-cause investigations that improve failure modeling. Such information would deepen our understanding of component reliability. It would also allow failure data to be used for prognostic purposes, ultimately enabling the prediction of failures and the implementation of corrective measures to prevent them.

BIBLIOGRAPHY

[1] T. Gunda, S. Hackett, L. Kraus, C. Downs, R. Jones, C. McNalley, M. Bolen, and A. Walker, “A machine learning evaluation of maintenance records for common failure modes in PV inverters,” IEEE Access, vol. 8, pp. 211610–211620, 2020.

[2] T. J. Formica, H. A. Khan, and M. G.
Pecht, “The effect of inverter failures on the return on investment of solar photovoltaic systems,” IEEE Access, vol. 5, pp. 21336–21343, 2017.

[3] D. Feldman, J. Zuboy, K. Dummit, D. Stright, M. Heine, S. Grossman, and R. Margolis, “Spring 2024 Solar Industry Update,” National Renewable Energy Laboratory, Tech. Rep. NREL/PR-7A40-90042, Jun. 2024.

[4] Solar Energy Industries Association, “Solar Market Insight Report Q1 2024,” 2024, accessed: 2024-07-16. [Online]. Available: https://www.seia.org/us-solar-market-insight

[5] A. Golnas, “PV System Reliability: An operator’s perspective,” IEEE Journal of Photovoltaics, vol. 3, no. 1, pp. 416–421, 2013.

[6] T. Gunda and R. Homan, “Evaluation of component reliability in photovoltaic systems using field failure statistics,” Sandia National Lab. (SNL-NM), Albuquerque, NM (United States), Tech. Rep., Sep. 2020. [Online]. Available: https://www.osti.gov/biblio/1660804

[7] G. T. Klise, O. Lavrova, and R. L. Gooding, “PV System Component Fault and Failure Compilation and Analysis,” Sandia National Lab. (SNL-NM), Albuquerque, NM (United States), Tech. Rep., Feb. 2018. [Online]. Available: https://www.osti.gov/biblio/1424887

[8] D. C. Jordan, B. Marion, C. Deline, T. Barnes, and M. Bolinger, “PV field reliability status—Analysis of 100 000 solar systems,” Progress in Photovoltaics: Research and Applications, vol. 28, no. 8, pp. 739–754, 2020. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/pip.3262

[9] T. Doyle, R. Desharnais, and M. Mills-Price, “2019 PV Inverter Scorecard,” PVEL LLC, Technical Report, 2019. [Online]. Available: https://www.pvel.com/inverter-scorecard/

[10] S. Peyghami, M. Fotuhi-Firuzabad, and F. Blaabjerg, “Reliability evaluation in microgrids with non-exponential failure rates of power units,” IEEE Systems Journal, vol. 14, no. 2, pp. 2861–2872, 2020.

[11] J. Cheng, Y. Tang, and M.
Yu, “The reliability of solar energy generating system with inverters in series under common cause failure,” Applied Mathematical Modelling, vol. 68, pp. 509–522, 2019. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0307904X18305687

[12] M. Theristis and I. A. Papazoglou, “Markovian reliability analysis of standalone photovoltaic systems incorporating repairs,” IEEE Journal of Photovoltaics, vol. 4, no. 1, pp. 414–422, 2014.

[13] S. V. Dhople, A. Davoudi, P. L. Chapman, and A. D. Domínguez-García, “Integrating photovoltaic inverter reliability into energy yield estimation with Markov models,” in 2010 IEEE 12th Workshop on Control and Modeling for Power Electronics (COMPEL), 2010, pp. 1–5.

[14] X. Yu and A. M. Khambadkone, “Reliability analysis and cost optimization of parallel-inverter system,” IEEE Transactions on Industrial Electronics, vol. 59, no. 10, pp. 3881–3889, 2012.

[15] W. Li, “Incorporating aging failures in power system reliability evaluation,” IEEE Transactions on Power Systems, vol. 17, no. 3, pp. 918–923, Aug. 2002. [Online]. Available: http://ieeexplore.ieee.org/document/1033745/

[16] H. Kim and C. Singh, “Reliability modeling and simulation in power systems with aging characteristics,” IEEE Transactions on Power Systems, vol. 25, no. 1, pp. 21–28, 2010.

[17] P. Jirutitijaroen and C. Singh, “The effect of transformer maintenance parameters on reliability and cost: a probabilistic model,” Electric Power Systems Research, vol. 72, no. 3, pp. 213–224, 2004. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0378779604001129

[18] H. Toftaker and I. B. Sperstad, “Integrating component condition in long-term power system reliability analysis,” Jun. 2022. [Online]. Available: https://www.techrxiv.org/doi/full/10.36227/techrxiv.19977497.v1

[19] J. H. Jürgensen, Condition-based Failure Rate Modelling for Individual Components in the Power System.
Stockholm: KTH Royal Institute of Technology, 2016.

[20] H. Kim and C. Singh, “Reliability Modeling and Simulation in Power Systems With Aging Characteristics,” IEEE Transactions on Power Systems, vol. 25, no. 1, pp. 21–28, Feb. 2010. [Online]. Available: http://ieeexplore.ieee.org/document/5247018/

[21] S. Peyghami, F. Blaabjerg, and P. Palensky, “Incorporating power electronic converters reliability into modern power system reliability analysis,” IEEE Journal of Emerging and Selected Topics in Power Electronics, vol. 9, no. 2, pp. 1668–1681, 2021.

[22] A. Sangwongwanich and F. Blaabjerg, “Reliability assessment of fault-tolerant power converters including wear-out failure,” in 2022 IEEE Applied Power Electronics Conference and Exposition (APEC), 2022, pp. 300–306.

[23] A. Lisnianski, I. Frenkel, and Y. Ding, Multi-state System Reliability Analysis and Optimization for Engineers and Industrial Managers. London: Springer London, 2010. [Online]. Available: http://link.springer.com/10.1007/978-1-84996-320-6

[24] K. S. Trivedi and A. Bobbio, Reliability and Availability Engineering: Modeling, Analysis, and Applications. Cambridge University Press, 2017.

[25] A. Garro and F. Barrara, “Reliability Analysis of Residential Photovoltaic Systems,” RE&PQJ, vol. 9, no. 1, 2011. [Online]. Available: https://www.repqj.com

[26] A. M. Mustafa, W. A. Omran, Y. G. Hegazy, and M. Abu-Elnaga, “Reliability assessment of grid connected photovoltaic generation systems,” in 2015 International Conference on Renewable Energy Research and Applications (ICRERA), 2015, pp. 1543–1549.

[27] S. V. Dhople and A. D. Dominguez-Garcia, “Estimation of photovoltaic system reliability and performance metrics,” IEEE Transactions on Power Systems, vol. 27, no. 1, pp. 554–563, 2012.

[28] M. Perdue and R. Gottschalg, “Energy yields of small grid connected photovoltaic system: effects of component reliability and maintenance,” IET Renewable Power Generation, vol. 9, no. 5, pp. 432–437, 2015. [Online]. Available: https://ietresearch.onlinelibrary.wiley.com/doi/abs/10.1049/iet-rpg.2014.0389

[29] Enphase, “Reliability of Enphase Microinverters,” Enphase Energy, Inc., Technical Report, 2021. [Online]. Available: https://enphase.com/download/reliability-enphase-microinverters-tech-brief

[30] U.S. Energy Information Administration, “EIA Form 861: Annual Electric Power Industry Report,” U.S. Energy Information Administration, Data Report, 2022. [Online]. Available: https://www.eia.gov/electricity/data/eia861/

[31] M. Rausand and A. Høyland, System Reliability Theory: Models, Statistical Methods, and Applications, 2nd ed. Wiley-Interscience, 2004.

[32] C. Ebeling, An Introduction to Reliability and Maintainability Engineering. McGraw-Hill, 2004.

[33] T. Xu and R. Wen, “PWEXP: An R Package Using Piecewise Exponential Model for Study Design and Event/Timeline Prediction,” 2024. [Online]. Available: https://arxiv.org/abs/2404.17772

[34] K. Gaurav, V. Kumar, and B. K. Singh, “Dependability analysis of a system using state-space modeling techniques: A systematic review,” IEEE Transactions on Reliability, vol. 72, no. 4, pp. 1340–1354, 2023.

[35] A. Cohen, Numerical Methods for Laplace Transform Inversion. Springer US, 2007.

[36] M. Colledani, A. Ratti, and C. Senanayake, “An approximate analytical method to evaluate the performance of multi-product assembly manufacturing systems,” Procedia CIRP, vol. 33, pp. 357–363, 2015, 9th CIRP Conference on Intelligent Computation in Manufacturing Engineering - CIRP ICME ’14. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2212827115007258

[37] M. S. Hamada, A. G. Wilson, C. S. Reese, and H. F. Martz, Bayesian Reliability, 1st ed. Springer New York, NY, 2008.

[38] R. McElreath, Statistical Rethinking: A Bayesian Course with Examples in R and Stan, 2nd ed. Chapman & Hall/CRC, 2020.

[39] A. Gelman, J. B.
Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin, Bayesian Data Analysis, 3rd ed. Chapman and Hall/CRC, 2025. [40] D. Gamerman, “Bayes estimation of the piece-wise exponential distribution,” IEEE Transactions on Reliability, vol. 43, no. 1, pp. 128–131, 1994. [41] M. Finkelstein, Failure Rate Modelling for Reliability and Risk. Springer London, 2008. 61