Analyzing Asset Failures


Simulation modeling can improve O&M and capital-planning processes.

Fortnightly Magazine - May 2008

Electric utilities are faced with the challenge of managing a range of aging distribution assets that are critical to system reliability. They also are threatened with potentially huge costs as they seek to replace these assets over the coming years to maintain reliability. Making intelligent decisions about asset maintenance and replacement requires accurate information about the failure patterns of these assets over time. However, most data elements that could shed light on such patterns—asset condition, joint use, maintenance patterns, or results of stratified inspection—are not widely available. Still, utilities must forecast capital and O&M spending requirements each year, regardless of their understanding of such asset failures.

In addition to these gaps in data, a lack of tools and processes makes it difficult to support budget-allocation decisions. For the most part, capital funding decisions are being made using the simple but potentially inaccurate forecasting method of taking the average of asset failures over a certain period of time. Existing replacement or maintenance strategies don’t necessarily match the costs and reliability that will be experienced in several years.

To truly understand how an asset class fails over time, it’s essential that utility companies capture and store historical data about asset failures. A database with critical asset attribute elements can provide insight into the pattern of failures and establish the framework for a probabilistic model that can replicate the failure patterns over time in a simulated environment. Such an approach can help utilities in their efforts to develop effective O&M and capital replacement strategies.

Predicting Survival

Engineers often try to generate survivor curves for assets. If a utility can track an asset group from the time the assets were placed into service until the last of that asset-year group was removed from service, then analyses such as survivor curves can be applied to assets of that group.

However, survivor curves generally yield unrealistic models, because of difficulties in defining and capturing the data that would enable a cradle-to-grave analysis of asset-failure patterns. Further, by the time a survivor curve has been generated for an asset, newer technologies are usually replacing that asset.

Nevertheless, basic asset-life characteristics, such as when an asset was put in service and when it failed, generally are available. At a strategic level, this basic data allows a robust analysis of failure patterns and provides insight for predicting how much capital will be required to replace failed assets. The key requirement is assessing the probabilistic nature of the failure; then one must understand the differences in mitigation options, and the associated reliability and financial impacts. This allows asset managers to clearly communicate and support their asset-replacement requirements to senior management and to regulators.

Condition-based asset modeling provides a way to scrutinize asset failure patterns and can enhance a probabilistic model. However, the benefits of this type of modeling can’t be realized without historical failure data. Condition-assessment programs largely have been viewed as O&M expenses subject to elimination or reduction to meet cost-cutting mandates. As a result, asset-condition data is unreliable, leaving utilities to make estimates and assumptions about replacement parameters, condition-based hazard functions and asset conditions.

Asset Failure Curves

Given the lack of cradle-to-grave data and information on asset conditions, a suitable alternative is to gather asset inventory information that includes, at a minimum, installation dates and, for failed assets, the corresponding failure dates. Indeed, many of the available asset data sets are structured in this way.

Establishing a clear definition of failure for an asset class is the initial step in developing an understanding of when assets fail. Analyzing the data in terms of in-service and failure dates allows a utility to see how asset failures are distributed by age, and at what age most failures are occurring. A unique failure frequency probability curve can be created for each asset class (see Figure 1).
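
As a minimal sketch of how such a curve might be computed, the Python below takes records of installation year and, where applicable, failure year, and returns the share of assets that failed at each age. The function name, the record layout, and the choice of denominator (assets that reached a given age, rather than total inventory) are assumptions made for illustration and are not taken from any particular utility's data model.

```python
from collections import Counter

def failure_frequency_by_age(records, as_of_year):
    """Return {age: failures at that age / assets that reached that age}.

    records: iterable of (install_year, failure_year), with failure_year
             set to None for assets still in service (assumed layout).
    as_of_year: the year the inventory snapshot was taken.
    """
    exposed = Counter()  # assets that reached each age
    failed = Counter()   # assets that failed at each age
    for install_year, failure_year in records:
        last_year = failure_year if failure_year is not None else as_of_year
        last_age = last_year - install_year
        for age in range(last_age + 1):
            exposed[age] += 1
        if failure_year is not None:
            failed[last_age] += 1
    return {age: failed[age] / exposed[age] for age in sorted(exposed)}

# Hypothetical records: two failures, two assets still in service.
records = [(2000, 2003), (2000, None), (2001, 2003), (2002, None)]
curve = failure_frequency_by_age(records, as_of_year=2005)
for age, freq in curve.items():
    print(age, round(freq, 3))
```

With real data the same calculation would be run separately for each asset class, and ages with very few exposed assets would deserve caution because a single failure can dominate the ratio.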

Where possible, the asset-failure data are stratified according to multiple asset characteristics to create a separate curve for each subset of the failure. As an example, transformers as a class have a unique pattern of failure, but one failure frequency curve generalizes the analysis. If the manufacturer or load of the transformer can be shown to have a correlation to failures that is statistically valid in stratifying the data set, then multiple failure-frequency curves can be created from an initial single failure frequency curve for transformers. This in turn allows for a more detailed and robust analysis of failures and forecasting.
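
One simple way to check whether a candidate attribute is worth stratifying on is a chi-square test of independence between the attribute and the failure outcome. The sketch below compares two groups (say, two transformer manufacturers) and is only one of several tests that could be used; the function names and the counts are hypothetical, and the 3.841 threshold is the standard 5 percent critical value for one degree of freedom.

```python
CRITICAL_VALUE_5PCT_DF1 = 3.841  # chi-square critical value, 1 degree of freedom

def chi_square_2x2(failed_a, total_a, failed_b, total_b):
    """Chi-square statistic for a 2x2 table of group versus failed/survived.

    Assumes both outcomes occur at least once across the two groups;
    otherwise an expected count is zero and the statistic is undefined.
    """
    observed = [
        [failed_a, total_a - failed_a],
        [failed_b, total_b - failed_b],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat

def worth_stratifying(failed_a, total_a, failed_b, total_b):
    """True if the two groups' failure shares differ at the 5 percent level."""
    stat = chi_square_2x2(failed_a, total_a, failed_b, total_b)
    return stat > CRITICAL_VALUE_5PCT_DF1

# Hypothetical counts: 30 of 100 units failed in group A, 10 of 100 in group B.
print(chi_square_2x2(30, 100, 10, 100), worth_stratifying(30, 100, 10, 100))
```

A result above the threshold suggests the attribute separates the population into groups with different failure behavior, which supports building a separate curve for each group.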

Such a failure analysis yields strong results if the historical data establishes trends of failures based on the data itself and not assumed parameters. These failure probabilities are captured over time on an annual basis, providing updated data and allowing for a probabilistic model that is extremely easy to use and produces results that are easily communicated to decision makers.

Weibull Analysis

A potential weakness in failure-frequency analysis is that newer asset technologies have not aged enough to provide thorough insight into the pattern of failure associated with the failure frequencies. A Weibull probability distribution analysis can help address this weakness.

While not easy to explain to those who don’t understand probability distributions or statistics, the Weibull distribution has been used by engineers for many years to model the randomness of asset failures and understand the probabilities, risk and mitigation possibilities (see Figure 2). A traditional two-parameter Weibull distribution can be configured to capture infant mortality of equipment, attrition (random events that can put an asset out of service, such as vehicles hitting a pole or cable dig-in), or failure due to aging. However, the Weibull distribution typically captures only one of these failure modes at a time. A unique attribute of the Weibull distribution is that the sum of Weibull distributions can be modeled as one distribution by adjusting the parameters and ensuring the relative importance of each element is captured as well. A four-parameter Weibull distribution allows the entire life cycle of an asset to be modeled, including infant mortality, attrition, and age.
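
To make the three failure modes concrete, the sketch below evaluates the hazard (instantaneous failure rate) of a two-parameter Weibull, h(t) = (shape / scale) * (t / scale)^(shape - 1). A shape below 1 gives a falling hazard (infant mortality), a shape of 1 gives a constant hazard (attrition), and a shape above 1 gives a rising hazard (aging). Adding the three hazards is one common way to combine competing failure modes; it is not necessarily the four-parameter form described above, and every parameter value here is hypothetical.

```python
def weibull_hazard(age, shape, scale):
    """Instantaneous failure rate of a two-parameter Weibull at age > 0."""
    return (shape / scale) * (age / scale) ** (shape - 1)

# Hypothetical (shape, scale) pairs, chosen only to illustrate the modes.
COMPONENTS = {
    "infant_mortality": (0.5, 200.0),  # shape < 1: hazard falls with age
    "attrition": (1.0, 100.0),         # shape = 1: hazard is constant
    "aging": (4.0, 50.0),              # shape > 1: hazard rises with age
}

def combined_hazard(age):
    """Sum of the component hazards, treating the modes as competing risks."""
    return sum(weibull_hazard(age, shape, scale)
               for shape, scale in COMPONENTS.values())

for age in (1, 15, 60):
    print(age, round(combined_hazard(age), 4))
```

With these illustrative values the combined hazard is high for very young assets, lowest in mid-life, and highest at advanced ages, which is the familiar bathtub shape.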

The Weibull distribution can be used to smooth failure-frequency curves and extrapolate the probability of failure for an asset class into later years, even when data for those years are missing, as they are for newer technology. An example of this is XLP cable. Because it is one of the newer underground cable technologies, no XLP cable assets have reached 80 years of in-service time. Forecasting the failures of these assets based on historical failures ignores the fact that some of these cable elements could continue to survive long after the failure-frequency curve analysis indicates. The four-parameter distribution takes this into account.
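
The smoothing and extension step can be sketched as a straight-line fit in log-log space, since the two-parameter Weibull hazard satisfies log h(t) = log(shape) - shape * log(scale) + (shape - 1) * log(t). The code below is a simplified illustration that fits a single failure mode by ordinary least squares and ignores ages with no observed failures; it is not the four-parameter fit described above, and the input rates are synthetic values generated for the example.

```python
import math

def fit_weibull_hazard(ages, rates):
    """Least-squares fit of log(rate) against log(age).

    Returns (shape, scale) of a two-parameter Weibull hazard.
    Ages or rates that are not positive are skipped.
    """
    points = [(math.log(a), math.log(r))
              for a, r in zip(ages, rates) if a > 0 and r > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    shape = slope + 1.0
    # intercept = log(shape) - shape * log(scale), solved for scale
    scale = math.exp((math.log(shape) - intercept) / shape)
    return shape, scale

def weibull_hazard(age, shape, scale):
    """Instantaneous failure rate at a given age."""
    return (shape / scale) * (age / scale) ** (shape - 1)

# Synthetic observations for ages 5 to 30 only (shape 3, scale 60 assumed).
observed_ages = list(range(5, 31))
observed_rates = [weibull_hazard(a, 3.0, 60.0) for a in observed_ages]

shape, scale = fit_weibull_hazard(observed_ages, observed_rates)
rate_at_80 = weibull_hazard(80, shape, scale)  # beyond the observed ages
print(round(shape, 3), round(scale, 3), round(rate_at_80, 4))
```

Real failure frequencies are noisy, so a fit of this kind would normally be checked against the raw curve before its extension to older ages is trusted.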

Utilities have a choice in modeling aging asset-failure patterns. They can use simple failure frequency curves, derived purely from available data showing a ratio of failures to total inventory. An advantage of this method is that it’s relatively easy to explain. Alternatively, they can use a fitted Weibull curve, which adds a theoretical dimension and is more complicated than a simple failure-frequency curve. This approach can yield greater precision, but the results easily can be misunderstood in discussions with decision makers.

Creating a Simulation Model

Understanding asset failure patterns can provide insight by itself, but generally doesn’t provide a vehicle for analyzing scenarios to mitigate failures. Taking failure analysis and Weibull distribution curves a step further, operators might consider the financial and reliability implications of run-to-fail strategies and proactive replacement approaches. These considerations require the capability to evaluate differing strategies, and, just as important, to compare the strategies to find the best solution. A rigorous asset-management and life-cycle analysis methodology depends on the ability to accurately evaluate competing asset-management strategies.

This type of temporal analysis can be accomplished with a probabilistic discrete-event simulation model. Such a model incorporates a time clock, so at the start of a simulation run, the model understands the characteristics of an asset and can age it over a given time horizon. By incorporating failure-frequency probabilities or set Weibull probability distributions, a simulation model can track failures, mitigate these failures in the model environment with evaluated strategies, and predict future spending and reliability impacts.
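
The core of such a model can be sketched in a few lines. The example below uses a fixed annual time step rather than a full event queue, which is a simplification of discrete-event simulation, and it assumes that every failed asset is replaced with a new one in the same year. The function name, the failure-probability function, and the starting inventory are all hypothetical.

```python
import random

def simulate_run_to_fail(ages, failure_prob_by_age, years, seed=0):
    """Age an inventory year by year, replacing assets only when they fail.

    ages: starting age of each asset in the inventory.
    failure_prob_by_age: function mapping an age to an annual failure
        probability (for example, read from a failure-frequency curve).
    Returns (failures per year, final ages).
    """
    rng = random.Random(seed)  # seeded so a run can be repeated
    ages = list(ages)
    failures_per_year = []
    for _ in range(years):
        failures = 0
        for i, age in enumerate(ages):
            if rng.random() < failure_prob_by_age(age):
                failures += 1
                ages[i] = 0        # failed asset replaced with a new one
            else:
                ages[i] = age + 1  # surviving asset is one year older
        failures_per_year.append(failures)
    return failures_per_year, ages

def example_failure_prob(age):
    """Hypothetical curve: 1 percent a year, rising after age 30."""
    return min(1.0, 0.01 + 0.005 * max(0, age - 30))

start_ages = [5] * 400 + [25] * 400 + [45] * 200
failures, final_ages = simulate_run_to_fail(start_ages, example_failure_prob, years=20)
print(failures)
```

Because the outcome is random, a single run is only one possible future; in practice the simulation would be repeated with many seeds and the results averaged or reported as a range.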

A simulation model differs considerably from standard forecast techniques. It plays out the aging process and parallel events, such as replacement and maintenance, accordingly moving assets back and forth along the age spectrum. That allows users to predict failures far into the future, and also the age profile of the resulting inventory at different points in time, as well as financial and reliability implications associated with various failure-mitigation strategies.

For example, a run-to-failure strategy can defer some capital costs, but can impair system-reliability statistics, such as SAIDI and SAIFI. A proactive replacement strategy might show sharp near-term financial spikes in capital spending, but the programmed replacements would improve reliability as the assets are replaced prior to failure.
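
For reference, SAIFI is the total number of customer interruptions divided by the number of customers served, and SAIDI is the total customer interruption duration divided by the number of customers served. The sketch below computes both from a list of outage events; the event layout and the figures are hypothetical.

```python
def saifi(events, customers_served):
    """Average number of interruptions per customer served.

    events: list of (customers_interrupted, duration_minutes).
    """
    return sum(customers for customers, _ in events) / customers_served

def saidi(events, customers_served):
    """Average interruption minutes per customer served."""
    total_minutes = sum(customers * minutes for customers, minutes in events)
    return total_minutes / customers_served

# Hypothetical year: two outages on a system serving 1,000 customers.
events = [(100, 60), (50, 120)]
print(saifi(events, 1000), saidi(events, 1000))
```

In a simulation, each modeled asset failure would contribute an event of this kind, so a strategy that reduces in-service failures lowers both indices.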

A simulation model can allow operators to evaluate a mixture of different strategies, in effect allowing them to experiment with asset-management programs. For example, a manager might consider reactively replacing failed assets with a new asset class, while at the same time proactively replacing a certain portion of the total inventory each year and doing maintenance on yet another portion.
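
A mixed strategy of this kind can be sketched by adding a proactive step to an annual simulation loop. In the example below, a fixed share of the inventory (the oldest assets first) is replaced each year before failures are drawn. The cost figures, the oldest-first rule, and the function name are assumptions made for illustration, and the count of in-service failures stands in as a rough proxy for reliability impact.

```python
import random

def simulate_strategy(ages, failure_prob, years, proactive_fraction,
                      planned_cost=1.0, emergency_cost=1.5, seed=0):
    """Simulate yearly planned replacements plus replace-on-failure.

    proactive_fraction: share of the inventory replaced each year,
        oldest assets first (0.0 means pure run-to-fail).
    planned_cost, emergency_cost: hypothetical unit costs.
    Returns one dict per year with planned count, failures, and cost.
    """
    rng = random.Random(seed)
    ages = list(ages)
    results = []
    for year in range(1, years + 1):
        n_planned = int(proactive_fraction * len(ages))
        ages.sort(reverse=True)        # oldest assets first
        for i in range(n_planned):
            ages[i] = 0                # planned replacement
        failures = 0
        for i, age in enumerate(ages):
            if rng.random() < failure_prob(age):
                failures += 1
                ages[i] = 0            # replaced after failing in service
            else:
                ages[i] = age + 1
        cost = n_planned * planned_cost + failures * emergency_cost
        results.append({"year": year, "planned": n_planned,
                        "failures": failures, "cost": cost})
    return results

def example_failure_prob(age):
    """Hypothetical curve: 1 percent a year, rising after age 30."""
    return min(1.0, 0.01 + 0.005 * max(0, age - 30))

start_ages = [5] * 400 + [25] * 400 + [45] * 200
for fraction in (0.0, 0.02):
    run = simulate_strategy(start_ages, example_failure_prob, 20, fraction)
    print(fraction, sum(r["failures"] for r in run), sum(r["cost"] for r in run))
```

Running the same inventory under several replacement fractions gives a side-by-side view of how spending and in-service failures trade off over the planning horizon.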

In evaluating possible strategies, the two measures that utilities are most concerned with are cost and reliability. The dynamic interplay among failures, replacements and maintenance merits a view of these measures over several years. When they incorporate adequate field data, asset-failure curves and Weibull distribution analysis can provide insight into O&M strategies and aid in predicting failure rates and capital costs. Simulation models can provide additional insights into the projected rise and fall of costs over time, the evolution of a utility’s asset age profile, and the consequent improvement or degradation of reliability parameters like SAIFI and SAIDI. Using such tools, utility operators can conduct a more precise evaluation of possible maintenance and replacement strategies.

Ultimately, asset life-cycle analysis supports rate-case strategies for cost recovery. An approach that incorporates knowledge of failure-frequency patterns and enables modeling of different asset replacement and life-extension strategies can allow utilities to better understand and explain the implications of implementing different strategies for maintaining and replacing their aging distribution assets, in terms of both financial and reliability risks.