Why this book still matters

In 1978, Nowlan and Heap proved – with United Airlines data – that the old assumption "the older a machine gets, the more likely it is to fail" is wrong for most equipment. Their report created Reliability-Centered Maintenance, but it was written for aviation. What John Moubray did in RCM II (first edition 1991, second edition 1997) was translate that research into a method any industrial site can run: mining, petrochemical, pharma, utilities, manufacturing. He and the Aladon network applied it on hundreds of sites in dozens of countries.

Three decades on, the book holds up for a simple reason: it isn't about technology, it's about decision logic. Sensors, IIoT and AI have changed how we collect condition data – they haven't changed the questions you must answer before spending a maintenance dollar. RCM II is where those questions were written down properly. Modern RCM standards (SAE JA1011) essentially formalize Moubray's framework.

Core idea

Maintenance is not about preserving equipment. It's about preserving what the equipment does. You don't maintain a pump – you maintain its ability to deliver 400 m³/h to the boiler. Once you frame it that way, everything else in the method follows.

The 7 questions of RCM

The whole method is a disciplined walk through seven questions, asked in order, about any physical asset in its operating context. If your maintenance plan can't answer them, it's guesswork.

  1. Functions – What are the functions and associated performance standards of the asset in its present operating context?
  2. Functional failures – In what ways can it fail to fulfil its functions?
  3. Failure modes – What causes each functional failure?
  4. Failure effects – What happens when each failure occurs?
  5. Failure consequences – In what way does each failure matter?
  6. Proactive tasks – What can be done to predict or prevent each failure?
  7. Default actions – What should be done if a suitable proactive task cannot be found?

Notice the order. Tasks come last. Most PM programs are built the other way around – someone starts from a list of tasks (often copied from the OEM manual) and works backwards to justify them. RCM II forbids that: you can't choose a task until you know which failure mode it addresses and why that failure matters.
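
The "tasks come last" rule can be turned into a quick audit of an existing PM list. What follows is a minimal sketch, not anything prescribed by the book: the PmTask record and its field names are hypothetical, and it only checks that each task can name the failure mode it addresses (question 3) and the consequence that justifies it (question 5).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PmTask:
    """One line of a PM program. All field names are hypothetical."""
    description: str
    failure_mode: Optional[str] = None  # answer to question 3
    consequence: Optional[str] = None   # answer to question 5


def unjustified(tasks: List[PmTask]) -> List[PmTask]:
    """Return tasks that cannot say which failure mode they address or why it matters."""
    return [t for t in tasks if not t.failure_mode or not t.consequence]


# Illustrative data only
tasks = [
    PmTask("Replace mechanical seal every 2 years"),
    PmTask("Monthly vibration check on P-101",
           failure_mode="Coupling misaligned",
           consequence="operational"),
]

for task in unjustified(tasks):
    print("No failure mode or consequence on record:", task.description)
```

A task flagged here is not necessarily wrong; it simply has not yet been through questions 1 to 5.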

The vocabulary that fixed our conversations

Before RCM, "failure" meant whatever the speaker wanted it to mean. Moubray's definitions are now the industry standard – and most failed FMECA workshops fail because teams skip them.

Term | Definition | Example (boiler feed pump P-101)
Function | What the user wants the asset to do, with a quantified performance standard. | Deliver 400 m³/h of feedwater at 80 bar.
Functional failure | Inability to meet a performance standard. One function can fail in several ways. | Delivers less than 400 m³/h; delivers nothing at all.
Failure mode | The event that causes the functional failure – specific enough to act on. | Impeller worn by erosion; mechanical seal leaks; coupling misaligned.
Failure effect | What happens when the mode occurs: evidence, downtime, damage, safety impact. | Flow drops over weeks; boiler trips; 6 h to replace impeller.
Failure consequence | Why it matters: hidden, safety/environmental, operational, or non-operational. | Operational – lost production while the standby pump carries the load.
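
To see why the quantified performance standard matters, here is a small sketch that checks a flow reading for P-101 against its function. The class and function names are our own assumptions for illustration, and only the flow part of the standard is checked (the 80 bar pressure requirement is left out to keep it short).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Function:
    """A function with a quantified performance standard (hypothetical model)."""
    description: str
    min_flow_m3h: float


def functional_failure(function: Function, measured_flow_m3h: float) -> Optional[str]:
    """Return the functional failure a flow reading represents, or None if the standard is met."""
    if measured_flow_m3h <= 0:
        return "total: delivers nothing at all"
    if measured_flow_m3h < function.min_flow_m3h:
        return f"partial: delivers less than {function.min_flow_m3h:g} m³/h"
    return None


p101 = Function("Deliver feedwater to the boiler", min_flow_m3h=400.0)

for reading in (420.0, 350.0, 0.0):
    print(reading, "->", functional_failure(p101, reading))
```

Without the 400 m³/h figure, the 350 m³/h reading could not be classified at all, which is the point of defining the function before the failure.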

Field tip

The phrase "in its present operating context" carries half the value of the book. The same pump model needs a different strategy as a duty pump in a refinery than as a standby pump in a water plant. Copying maintenance plans between assets without comparing contexts is one of the most common – and expensive – mistakes in industry.

The six failure patterns

The most quoted diagram in reliability. Studies in civil aviation (Nowlan & Heap, reproduced and popularized by Moubray) showed that complex equipment follows six distinct patterns of conditional probability of failure over time (simply put: how likely a survivor is to fail now, at each age) – and most of them have no wear-out zone at all.

  • A (4%) – Bathtub: infant mortality + wear-out
  • B (2%) – Constant, then distinct wear-out
  • C (5%) – Steady increase, no wear-out point
  • D (7%) – Quick rise, then constant
  • E (14%) – Random: constant at all ages
  • F (68%) – Infant mortality, then constant

Only ~11% of failure modes (A + B + C) benefit from an age limit. ~89% (D + E + F) show no age-related wear-out: fixed-interval overhauls cannot prevent them, and pattern F means overhauls can even cause failures.

Axes: y = conditional probability of failure, x = operating age. Percentages from the UAL aircraft study (Nowlan & Heap, 1978).
Fig. 1 — The six failure patterns. Redrawn by Rob Reliability after Nowlan & Heap (1978) and Moubray, RCM II (1997). Percentages are from the original civil aviation study; your plant's mix will differ, but the message survives: age is a poor predictor for most failure modes.
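
The 11% / 89% split quoted with the figure is plain addition over the six percentages. A short check, using only the numbers given above (the variable names are ours):

```python
# Share of failure modes per pattern, in percent, as quoted from the UAL study
patterns = {"A": 4, "B": 2, "C": 5, "D": 7, "E": 14, "F": 68}

age_related = sum(patterns[p] for p in "ABC")      # patterns with a wear-out trend
not_age_related = sum(patterns[p] for p in "DEF")  # no age-related wear-out

print(f"Age-related: {age_related}%  Not age-related: {not_age_related}%")
```

Swapping in failure-mode shares from your own plant data is a quick way to see how far your mix departs from the aviation figures.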

Think of your car: you replace the tyres when the tread wears down, because tyres genuinely wear out with use (pattern B). But you'd never replace the radio every three years "just in case" – it either fails randomly or it doesn't (pattern E). The aviation study's uncomfortable finding is that most complex equipment behaves like the radio, not the tyres – yet most PM programs are built as if everything were a tyre.

The implication shook the industry: if most failures are not age-related, then most time-based overhauls and replacements are wasted – or worse, harmful, because intrusive maintenance reintroduces infant mortality (pattern F). The rational response is to detect failures developing (condition monitoring), design them out, or – when consequences are tolerable – deliberately let them happen. That is also why the P-F curve became central to modern strategy: see our P-F & D-I-P-F summary.
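
A small numerical sketch shows why age-based replacement buys nothing when failures are random. It assumes a Weibull life distribution, which is our modelling choice for illustration and not something the aviation study prescribes; the shape and scale values are made up. With shape β = 1 the chance of failing in the next 100 hours is the same for a new item as for an old one, so swapping old for new changes nothing. With β = 3 that chance grows with age, and an age limit starts to make sense.

```python
import math


def conditional_failure_prob(age: float, window: float, beta: float, eta: float) -> float:
    """P(fail within `window` | survived to `age`) for a Weibull(beta, eta) life distribution."""
    def survival(t: float) -> float:
        return math.exp(-((t / eta) ** beta))

    return 1.0 - survival(age + window) / survival(age)


# Hypothetical parameters: characteristic life 1000 h, look-ahead window 100 h
for beta, label in [(1.0, "random, constant hazard"), (3.0, "wear-out")]:
    probs = [conditional_failure_prob(age, 100.0, beta, 1000.0) for age in (0.0, 500.0, 1000.0)]
    print(f"beta={beta} ({label}):", [round(p, 3) for p in probs])
```

A shape below 1 would give the opposite trend, a falling probability with age, which is the infant-mortality behaviour that intrusive overhauls reintroduce.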

Consequences decide, not failures

The second big idea of RCM II: a proactive task is only worth doing if it deals with the consequences of the failure better than living with them. Moubray sorts every failure mode into four consequence categories, each with its own decision rule.

Failure mode, e.g. "mechanical seal leaks"

Is the failure evident to operators under normal circumstances?

  • NO (hidden) – Hidden: protective devices, standby systems, trips, alarms. No direct impact until needed. Rule: failure-finding, a periodic test to check the device still works; redesign if risk is intolerable.
  • YES (evident) – Safety / Environment: could hurt someone or breach a regulation. Rule: the task must reduce risk to a tolerable level, otherwise redesign is compulsory.
  • YES (evident) – Operational: affects output, quality, customer service or operating cost + repair. Rule: the task must cost less than the consequences it prevents (production + repair), else run to failure.
  • YES (evident) – Non-operational: only the direct cost of repair. Rule: the task must cost less than the repair it avoids, usually run to failure.
Fig. 2 — Simplified consequence logic, redrawn after the RCM II decision diagram (Moubray, 1997, ch. 5–8). The full book version also sequences which proactive task types to consider: on-condition first, then scheduled restoration, then scheduled discard.
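
The simplified logic of Fig. 2 is compact enough to write down as two functions. This is a sketch of the figure as summarized here, not of the full RCM II decision diagram: all names, the three yes/no inputs and the cost figures are our own assumptions.

```python
from dataclasses import dataclass


@dataclass
class FailureModeAssessment:
    """Yes/no answers for one failure mode (hypothetical field names)."""
    evident_to_operators: bool
    safety_or_environmental: bool
    affects_operations: bool


def consequence_category(a: FailureModeAssessment) -> str:
    """Sort a failure mode into one of the four consequence categories."""
    if not a.evident_to_operators:
        return "hidden"
    if a.safety_or_environmental:
        return "safety/environmental"
    if a.affects_operations:
        return "operational"
    return "non-operational"


def decide(category: str, task_reduces_risk: bool = False,
           task_cost: float = 0.0, cost_avoided: float = 0.0) -> str:
    """Apply the decision rule attached to each category in the simplified figure."""
    if category == "hidden":
        return "failure-finding test; redesign if the risk is intolerable"
    if category == "safety/environmental":
        # Risk, not money, decides this branch
        return "do the task" if task_reduces_risk else "redesign (compulsory)"
    # Operational and non-operational branches are purely economic
    return "do the task" if task_cost < cost_avoided else "run to failure"


# Illustrative case: seal leak on P-101, assumed evident and operational
seal_leak = FailureModeAssessment(evident_to_operators=True,
                                  safety_or_environmental=False,
                                  affects_operations=True)
category = consequence_category(seal_leak)
print(category, "->", decide(category, task_cost=5_000, cost_avoided=60_000))
```

Note that cost never enters the safety branch: the function ignores task_cost there by design, mirroring the figure.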

Two details make this logic sharper than most criticality matrices in use today:

  • Hidden failures get their own branch. A pressure relief valve that is stuck closed costs nothing – until the day it's needed. Moubray showed that hidden failures of protective devices can account for up to half the failure modes on a modern plant, and gave us failure-finding intervals as the systematic answer.
  • Safety is not traded against money. For safety and environmental consequences, the question is "does the task reduce risk to a tolerable level?" – never "is it cost-effective?". If no task does, redesign is compulsory, not optional.
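
To make failure-finding intervals concrete, here is a sketch using the widely quoted linear approximation for a single protective device: interval ≈ 2 × allowed unavailability × MTBF of the device. It follows from the average unavailability of a periodically tested device being roughly interval / (2 × MTBF), and it only holds when the interval is short compared with the MTBF. The function name and the input values are hypothetical; check the book and your own risk criteria before using it for a real device.

```python
def failure_finding_interval(mtbf_years: float, required_availability: float) -> float:
    """Approximate test interval (years) for one protective device.

    Linear approximation: interval = 2 * unavailability * MTBF.
    Only reasonable when the result is small relative to the MTBF.
    """
    unavailability = 1.0 - required_availability
    return 2.0 * unavailability * mtbf_years


# Hypothetical relief valve: MTBF 10 years, 99% availability wanted
ffi_years = failure_finding_interval(mtbf_years=10.0, required_availability=0.99)
print(f"Test roughly every {ffi_years * 12:.1f} months")
```

Tightening the availability target shortens the interval in direct proportion, which is why the required availability should come from the risk you can tolerate, not from the calendar.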

Using RCM II in 2026

What to take as-is, and where to bring the method up to date.

Keep as-is:

  • The 7 questions and the vocabulary. Still the cleanest way to structure any maintenance strategy review or FMECA.
  • Consequence-driven task selection. The hidden-failure logic and the "safety is not negotiable" rule have aged perfectly.
  • The six failure patterns as a vaccine against overhaul-everything thinking.

Update with modern practice:

  • Workshop economics. A classical RCM II analysis is workshop-heavy (weeks per system). Streamlined approaches and AI-assisted preparation cut that dramatically without losing the logic – analyses are pre-drafted from manuals and CMMS history, and the workshop validates instead of typing.
  • Condition monitoring options. The 1997 condition-monitoring chapters predate cheap wireless sensors, IIoT platforms and ML anomaly detection. The on-condition logic stands; the toolbox is 100× richer (see the P-F / D-I-P-F summary).
  • Data over assumption. With decent CMMS history you can fit the actual failure behaviour (see the Weibull summary) instead of assuming a pattern.
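
As an illustration of "data over assumption", the sketch below fits a two-parameter Weibull distribution to failure ages by median rank regression, using only the standard library. It is one simple fitting method among several and makes strong assumptions: every item in the list has failed (no censored, still-running units, which real CMMS history almost always contains), all ages belong to the same failure mode, and the ages shown are invented.

```python
import math
from typing import List, Tuple


def fit_weibull(failure_ages: List[float]) -> Tuple[float, float]:
    """Median rank regression on complete failure data; returns (beta, eta)."""
    ages = sorted(failure_ages)
    n = len(ages)
    xs, ys = [], []
    for rank, age in enumerate(ages, start=1):
        median_rank = (rank - 0.3) / (n + 0.4)  # Benard's approximation
        xs.append(math.log(age))
        ys.append(math.log(-math.log(1.0 - median_rank)))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Least-squares line y = beta * x - beta * ln(eta)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    beta = slope
    eta = math.exp(-intercept / beta)
    return beta, eta


# Invented operating hours at failure for one failure mode
ages_h = [410.0, 950.0, 1300.0, 2100.0, 3200.0, 4800.0]
beta, eta = fit_weibull(ages_h)
print(f"shape beta = {beta:.2f}, characteristic life eta = {eta:.0f} h")
```

As a rough reading: a shape well below 1 points to infant mortality, a shape near 1 to random failures, and a shape clearly above 1 to wear-out. With a handful of points the estimate is very uncertain, so treat it as a prompt for discussion, not a verdict.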

Bottom line

If you only ever read one maintenance book, this is still the one. Read it for the logic, then deliver it with 2026 tools – that combination beats both the purists and the gadget-chasers.

References & further reading

This summary is original explanatory writing. All concepts belong to their authors – go to the sources.

  1. Moubray, J. Reliability-centred Maintenance (RCM II), 2nd edition. Butterworth-Heinemann, 1997. ISBN 978-0-7506-3358-1.
  2. Nowlan, F.S. & Heap, H.F. Reliability-Centered Maintenance. United Airlines / U.S. Department of Defense, 1978. Report AD-A066579 (free via DTIC).
  3. SAE International. JA1011 – Evaluation Criteria for Reliability-Centered Maintenance (RCM) Processes.
  4. NASA. Reliability-Centered Maintenance Guide for Facilities and Collateral Equipment, 2008 – a free, practical companion.

Disclaimer. This page is an independent educational summary written entirely in Rob Reliability's own words. It is not affiliated with, sponsored by or endorsed by the estate of John Moubray, Aladon, Elsevier or Butterworth-Heinemann. No text from the original book is reproduced; all diagrams are our own original illustrations of engineering concepts that are part of the public technical literature. Book titles, trade names and trademarks (including RCM2™) remain the property of their respective owners and are used solely to identify the work being discussed. If you are a rights holder and have any concern about this page, contact us at hello@robreliability.com and we will address it promptly.
