Chapter 17 – A Tiny Diagnosis Engine (TGDE)

The payoff chapter: take everything the TMS family gives us – assumptions, conflicts, nogoods – and turn it into a machine that figures out which part of a broken device is broken, and what to measure next. companion index

The big idea

You are handed a circuit. You apply known inputs, you measure some outputs, and one of the outputs is wrong. Something inside is faulted. Which component?

The naive answer is “try every possibility.” With n components, each either good or bad, that is 2^n candidate fault patterns to sift through. For any real device that number is hopeless. Chapter 17’s Tiny General Diagnostic Engine (TGDE) – a compact reconstruction of de Kleer and Williams’ General Diagnostic Engine (GDE) – shows how to do diagnosis without ever enumerating that lattice, by leaning on two ideas the book has been building toward all along:

Make “this component is healthy” an assumption. Give every component cᵢ a health assumption OK(cᵢ). Its normal behaviour (a gate’s truth table, a multiplier’s product) is a justification guarded by OK(cᵢ): the prediction only fires if you are assuming that component is good. This is the ATMS idea from Chapter 12, applied to hardware.
A wrong prediction is a conflict. When the values you predict (assuming some set of components are good) clash with the values you measure, the set of OK assumptions that produced the bad prediction cannot all be true at once. That inconsistent assumption set is a conflict – exactly the “nogood” of the LTMS and ATMS. Each conflict says “at least one component in this set is broken.”

Once you have a handful of conflicts, you do not need the lattice at all. A diagnosis is just a set of components that, when declared faulty, would explain every conflict – i.e. a set that intersects every conflict. That is the classic hitting-set problem, and it is the entire engine: collect conflicts as observations arrive, take their minimal hitting sets, and you have your candidate faults.

TGDE adds one more layer on top, which is what makes it feel intelligent rather than merely correct: it chooses what to measure next. Each candidate diagnosis gets a probability (from per-component prior fault rates), and TGDE picks the measurement whose outcome is expected to shrink the uncertainty the most – the one with the highest expected information gain. That is the loop: predict, find conflicts, rank diagnoses by probability, pick the most informative measurement, measure, repeat – until one diagnosis dominates.

Scope note. TGDE is not implemented in this Python package. We ship the JTMS/LTMS family (justification- and clause-based, single-context truth maintenance), not the ATMS, GDE/TGDE, interpretation construction, or the probabilistic measurement selector that TGDE rides on. So this chapter is a conceptual tour: it explains the machinery, walks the worked examples, and summarises the exercise analysis in ../exercises/ch17/README.md. The “Try it in this repo” section points you at the closest runnable shadows we do have.

Concepts, step by step

The vocabulary of model-based diagnosis

It helps to fix the standard terms (the Reiter / de Kleer-Williams framing) up front. A device is a set of n components {c₁, …, cₙ}, each with a behavioural model and a health assumption OK(cᵢ). The observations OBS are the inputs you applied plus the values you measured. Then:

A diagnosis is a way of declaring some components faulty so that the rest being good is consistent with OBS. Writing a diagnosis as the set Δ of components it calls faulted, Δ ⊆ {c₁,…,cₙ} is a diagnosis iff
```
MODEL ∧ OBS ∧ (⋀_{c∈Δ} ¬OK(c)) ∧ (⋀_{c∉Δ} OK(c))
```
is satisfiable. In a weak fault model (the usual TGDE setting) a faulted component simply imposes no constraint – “it could be doing anything” – so declaring a component faulty can only ever relax the equations, never add new ones.
A minimal diagnosis is a diagnosis Δ no proper subset of which is also a diagnosis: you cannot exonerate any of its accused components and still explain the observations.
A minimal-cardinality diagnosis has the fewest faulted components of any diagnosis. (Every minimal-cardinality diagnosis is minimal – see Exercise 2 – but not vice versa.)
A conflict (de Kleer’s nogood) is a set of components that cannot all be OK given OBS: MODEL ∧ OBS ⊢ ¬(⋀_{c∈S} OK(c)). A minimal conflict has no proper-subset conflict.

The duality that runs the engine: conflicts ⇄ diagnoses

Here is the pivot the whole chapter turns on:

Δ is a diagnosis iff Δ is a hitting set of the collection of conflicts – that is, Δ shares at least one element with every conflict. Minimal diagnoses are the minimal hitting sets of the minimal conflicts.

Why it must be true: a conflict S says “at least one component in S is bad.” If your proposed fault set Δ missed S entirely (shared no element with it), then every component of S would be outside Δ, hence declared good – but S says they cannot all be good. Contradiction. So a valid diagnosis must “hit” every conflict; and a minimal diagnosis hits them all using as few accused components as possible.

This is the engine in one sentence: GDE/TGDE never enumerates the 2^n lattice. It collects minimal conflicts (which are small) and computes their minimal hitting sets (which are also small in the common case). The lattice is exponential (Exercise 1); the conflicts and the hitting sets usually are not.

The diagnosis lattice and why we avoid it

Picture all 2^n candidate fault sets as the Boolean lattice ordered by ⊆, from “all OK” (∅) at the bottom to “all faulted” at the top. Under a weak fault model the diagnoses form an upward-closed region (a filter): if Δ is a diagnosis then so is every superset of Δ (declaring more components faulty only relaxes more constraints, so it stays consistent). That is why we only ever care about the minimal diagnoses – they are the lower frontier of the filter, and they implicitly describe all the rest.

Even that frontier can be exponential in the worst case (by Sperner’s theorem an antichain in 2^n can hold up to C(n, ⌊n/2⌋) ≈ 2ⁿ/√n sets). So we never materialise it eagerly; we generate it conflict-driven and probability-pruned.

How conflicts get discovered: guarded prediction

The conflicts do not arrive by magic; they fall out of constraint propagation over the guarded models, exactly as an ATMS would do it:

Each input value and each measured value is a fact.
Each component’s behaviour is a justification guarded by its OK assumption. A good multiplier M1 computing m1 = a·c justifies the value of m1 with OK(M1) in its support.
Propagate. Every derived value carries, in its ATMS-style label, the set of OK assumptions that were used to derive it.
When two derivations reach the same measurement point with different values – or a prediction disagrees with a measurement – the union of the OK assumptions behind the clash is a conflict. Those components cannot all be good.

That last step is the heart of it. A single faulty output, traced back through its support, hands you a ready-made conflict for free.

Single-valued predictions (why a clash is the signal)

A nice invariant (Exercise 3) keeps this honest: within a single diagnosis, each measurement point has at most one predicted value. Fixing a diagnosis fixes which components are good, and good components are deterministic functions of their inputs; propagation through them is functional, so a point gets at most one value (it may get none if it depends on a faulted component). The moment a point would get two different values, the assumed-good set is internally inconsistent – which is exactly the definition of a conflict, and means that set was never a diagnosis to begin with. Multivaluedness and diagnosis-hood are mutually exclusive: a double-valued point is precisely the engine’s alarm bell.

Maintaining the diagnoses incrementally

As measurements arrive, conflicts accumulate, and the minimal hitting sets need updating. Exercise 4 spells out the standard incremental update (Reiter’s HS-DAG / de Kleer’s hitting-set maintenance), and the key engineering choice is to store the small faulted sets, never the large good sets. A diagnosis is described by its handful of accused components, not by the n−k components it leaves alone:

D := { {} }                       # the empty fault set hits the empty conflict set
for each minimal conflict C arriving:
    new_D := {}
    for Δ in D:
        if Δ ∩ C ≠ ∅:
            new_D.add(Δ)          # Δ already hits C -- keep it
        else:
            for x in C:
                new_D.add(Δ ∪ {x}) # extend Δ to hit C, one branch per element
    D := minimal(new_D)           # keep only ⊆-minimal sets

Memory and work scale with the (small) conflict sizes and diagnosis cardinalities, not with n. Worst case it is still exponential (minimal hitting set is NP-hard), but the typical case of a few small conflicts is exactly where this representation wins.

Probabilities and choosing the next measurement

Everything so far is pure logic. TGDE’s intelligence comes from putting numbers on it:

Priors. Assume faults are independent and rare, each component faulting with small probability p. Then a k-fault diagnosis has probability ∝ pᵏ, so probability falls off exponentially with cardinality. Single faults dominate; double faults are far less likely; and so on. This is why TGDE focuses on small diagnoses.
Posterior. Each measurement is consistent with some diagnoses and rules out others. Conditioning shifts probability mass toward the survivors and (after renormalising) away from those that disagreed with what you saw.
Measurement selection by expected information gain. For each candidate measurement point, group the current diagnoses by the value they predict there. The measurement’s outcome will reveal which group is true, collapsing the rest. Score each candidate by how much it is expected to reduce the entropy of the diagnosis distribution:
```
score(point) = H(current Δ distribution) − E_outcome[ H(Δ | value at point) ]
```
Pick the highest-scoring point (or, with measurement costs, the best benefit/cost ratio). A point where all diagnoses agree teaches you nothing (zero gain); a point that cleanly splits the high-probability diagnoses in half teaches you the most. Measure it, fold the result in as a new observation, and loop.

This greedy, one-step-lookahead criterion is cheap and usually excellent – but it is greedy, and that has a cost (Exercise 7).

The examples, explained

The chapter’s running example is de Kleer’s polybox (sometimes “polycell”), the canonical model-based-diagnosis circuit. It is worth carrying in your head because every concept above lights up on it.

The polybox: the circuit

Five inputs a, b, c, d, e; three multipliers and two adders:

M1: m1 = a · c
M2: m2 = b · d
M3: m3 = c · e
A1: x  = m1 + m2
A2: y  = m2 + m3

The shared signal m2 (feeding both adders) is what makes it interesting: a fault in M2 shows up in both outputs, while M1 only affects x and M3 only affects y.

Worked example A: a single wrong output yields conflicts

Apply a = b = c = d = e = 3. Then the good-circuit predictions are m1 = m2 = m3 = 9, x = 18, y = 18. Now suppose you measure x = 10 while y = 12 matches expectation… (the precise numbers in the book differ; the shape is what matters). The mismatch at x means the components that cooperated to predict x cannot all be good:

x was predicted from m1 and m2 via A1, and m1 from M1, m2 from M2. So the support of the bad x prediction is {OK(M1), OK(A1), OK(M2)} – a conflict: at least one of M1, A1, M2 is broken.
A second derivation path (using the good output y to back out what m2 “should” be, then re-deriving x) produces another conflict that pulls in A2 and M3: roughly {OK(A1), OK(M2), OK(A2), OK(M3)}.

Those two minimal conflicts are everything the logic extracts from this single observation.

Worked example B: conflicts → minimal diagnoses by hitting sets

Take the minimal hitting sets of {M1, A1, M2} and {A1, M2, A2, M3}:

{A1} hits both (it is in each) – a single-fault diagnosis.
{M2} hits both – a single-fault diagnosis.
{M1} hits the first but not the second, so it must be paired with something from the second: {M1, A2}, {M1, M3} (and {M1, A1}, {M1, M2}, but those are subsumed by the single faults {A1}/{M2}).

So the minimal diagnoses are the single faults {A1} and {M2}, plus the double faults that combine M1 with a component from the second conflict. Notice how M2 and A1 jump out as the prime suspects: they are the components that participate in both conflicts, which is exactly what a hitting set rewards. This is the textbook polybox result, and it is recovered directly by the procedure of Exercise 9.

Worked example C: probability ranks them, then a measurement decides

With rare independent faults, the single faults {A1} and {M2} outrank every double fault (p ≫ p²), so they sit at the top of the list. To decide between them, TGDE looks for a measurement that they disagree about. The natural choice is an internal point like m2: under “M2 faulty” the value of m2 is suspect, while under “A1 faulty” m2 is fine and computable. Measuring m2 therefore splits the two leading hypotheses cleanly – maximal expected information gain – and one measurement collapses the diagnosis. That predict → conflict → hitting-set → rank → measure cycle, watched on the polybox, is TGDE.

Exercises walk-through

The full paraphrased problem statements and conceptual answers (derivations, complexity arguments, algorithm sketches, and design notes) are in ../exercises/ch17/README.md. There is no solutions.py to run – the chapter is analysis-only here. The highlights:

Ex 1 (★) – Counting diagnoses / lattice size. There are 2ⁿ candidate fault sets, and in the worst case (no constraining observations) every one is a diagnosis, so the number of diagnoses can be 2ⁿ. Even the minimal frontier can be exponential (Sperner: up to ≈ 2ⁿ/√n). The takeaway is the engine’s whole reason for being: never enumerate the lattice – go conflict-driven and prune by probability.
Ex 2 (★) – Minimal-cardinality ⟹ minimal. Clean proof by contradiction: if a least-cardinality diagnosis Δ had a proper subset diagnosis Δ′, then |Δ′| < |Δ| would beat the minimum – impossible. The converse fails: conflicts {a,b} and {a,c} give minimal diagnoses {a} (size 1) and {b,c} (size 2); both are subset-minimal but only {a} is minimal-cardinality. That gap is exactly why TGDE computes the two notions separately.
Ex 3 (★) – At most one predicted value per point per diagnosis. Covered above: a fixed diagnosis makes propagation functional, so each point gets at most one value; two values would be a conflict, ruling that set out as a diagnosis. Single-valuedness and diagnosis-hood are mutually exclusive.
Ex 4 (★★) – Compute minimal diagnoses directly over the small fault sets. The incremental hitting-set maintenance shown above (Reiter HS-DAG / de Kleer). The point of the exercise is representational: manipulate the small faulted sets, not the n−k-sized good environments the naive ATMS interpretation construction would carry.
Ex 5 (★★) – Make smallest-diagnoses cheap. Replace the exhaustive minimal-cardinality sweep with a best-first / probability-threshold strategy: pull candidate diagnoses in nondecreasing cardinality from a priority queue, and stop once the accumulated probability mass crosses a threshold (e.g. 0.99) or the next candidate’s prior drops below ε; cap cardinality at a small k_max (2 or 3 simultaneous faults). You trade completeness of the minimal-cardinality set for cost, but the entropy calculation only needs enough probability mass to pick a measurement, so the trade is almost always free in practice.
Ex 6 (★★) – Hierarchical models. Let a module (e.g. a full adder) carry a single OK(adder) assumption that is threaded into all its internal rules, so the diagnosis lattice is over modules, not leaf gates – shrinking n and the lattice dramatically. Support refinement on demand: once a module is implicated, re-instantiate it with per-gate assumptions linked by OK(module) ⇔ ⋀ OK(parts) and re-diagnose just inside it. Coarse-to-fine diagnosis: cheap module pass first, drill into suspects only.
Ex 7 (★★) – The horizon effect. Greedy one-step lookahead optimises against the current posterior, which (under rare-fault priors) is dominated by single faults. If the true fault is a double fault invisible at the single-fault-discriminating points, TGDE spends measurements crushing single-fault entropy, eliminates all single faults, and only then notices the double fault and measures the point that would have revealed it directly – using more measurements than an oracle that anticipated the double fault. The fix (multi-step lookahead or cardinality-hedging utilities) is more expensive, which is why TGDE accepts the horizon effect.
Ex 8 (★) – A misdiagnosis from the probability model, not the logic. TGDE assumes faults are independent and equiprobable, so a double fault gets ∝ p². If two components share a common cause (same power rail, same chip, an over-voltage event), reality makes P({c,d}) high, but TGDE scores it p², far below a single fault {a} that also explains the observations – so it confidently reports {a}. Every inference is sound and no diagnosis is missed; the probabilities are simply wrong. (Variant: even with independence, a too-uniform prior on a failure-prone part causes the same kind of error.) Re-running with a correct joint prior fixes it without touching a single rule.
Ex 9 (★★★) – A complete diagnostic procedure on the CLTMS. Because the CLTMS is a complete propositional reasoner, you can implement diagnosis straight from the definitions: encode each component’s behaviour as clauses guarded by OK(cᵢ), add inputs/measurements as units, and test diagnosis-hood by satisfiability of the corresponding environment. Extract minimal conflicts from the completed clause set (completeness guarantees none is missed) and take their minimal hitting sets. On the polybox this recovers the textbook {M1}, {A1}, {M2} diagnoses. Drawbacks: exponential lattice search, the (worst-case exponential) cost of computing prime implicates to make the LTMS complete, and no focusing – it does correctly what GDE does efficiently, a clean reference implementation rather than a practical engine. This is the exercise most directly connected to code we actually ship.
Ex 10 (★★★) – Choosing the best inputs (active diagnosis). When you cannot measure more internal points, choose new stimuli instead. Make an assumption for each possible value of each input cell, predict outputs over candidate input vectors, and score each vector by the same expected-information criterion. On the full adder this works beautifully – inputs are a, b, cin ∈ {0,1}, so there are only 2³ = 8 vectors, and the method reduces to classic digital test-vector generation. On the polybox it works in principle but the hint’s “one assumption per possible input value” does not scale: the cells are integer arithmetic with huge/unbounded domains, so you must discretise to a finite representative test set. Correct algorithm, tractable only on small finite input domains.

Try it in this repo

TGDE itself is not implemented here – there is no GDE, ATMS, interpretation construction, or probabilistic measurement selector in this package, and no exercises/ch17/solutions.py. So the honest answer is that Chapter 17 is conceptual in this repo, and the worked analysis lives in ../exercises/ch17/README.md.

That said, the mechanisms TGDE is built from – assumptions, propagation, contradictions, and conflict-driven pruning – are exactly what the LTMS/LTRE and CLTMS we ship do provide. Here is how to exercise the same ideas with the runnable machinery.

First the setup (from the repo root):

python -m venv .venv && . .venv/bin/activate
pip install -e ".[dev]"

1. Diagnosis in miniature. The closest GDE-shaped artifact in the repo is a one-fault lamp example. Asserting the fault disjunction and then ruling out one cause leaves the other – the single-context shadow of “conflicts narrow the candidate diagnoses”:

python examples/run_kb.py examples/kb/diagnosis.kb

which runs (../examples/kb/diagnosis.kb):

lamp off -> bulb dead | no power
lamp off
~ no power
expect bulb dead true

Asserting lamp off introduces the disjunction of faults (a tiny conflict: “bulb dead OR no power”); ruling out no power forces the remaining cause. This is hitting-set reasoning with a single conflict of size two. The genuinely multi-fault, all-at-once version – predicting across every OK-combination simultaneously – is exactly what the ATMS/GDE add and what this package does not implement.

2. The “complete reasoner” route (Exercise 9), for real. Exercise 9’s procedure builds diagnosis on top of the complete LTMS, and that layer we do ship as ../src/ltms/cltms.py. You can encode component health as guarded clauses and watch complete derive the conflicts that plain BCP would miss:

from ltms import LTMS, Label, add_formula, complete

m = LTMS()
# Health assumptions for two components feeding one observed output.
ok_m1 = m.create_node("ok_m1", assumption=True)
ok_a1 = m.create_node("ok_a1", assumption=True)
bad_x = m.create_node("x_wrong", assumption=True)   # the observed bad output

# "If both are OK, the output is right" == "x cannot be wrong if both OK".
# i.e.  ~(ok_m1 & ok_a1 & x_wrong)   ->   ~ok_m1 | ~ok_a1 | ~x_wrong
add_formula(m, ("or", ("not", ok_m1), ("not", ok_a1), ("not", bad_x)), "behavior")

# Observe the bad output.
add_formula(m, bad_x, "measurement")

complete(m)   # saturate; the conflict {ok_m1, ok_a1} is now explicit
# Reading the labels shows the conflict: ok_m1 and ok_a1 cannot both stand.

This is the conflict-extraction core of Exercise 9 in miniature: completeness guarantees the conflict is found, and its minimal hitting sets ({ok_m1} and {ok_a1}) are the single-fault diagnoses. See Chapter 13 for how complete and prime implicates work.

3. Conflict-driven, nogood-learning search. The focused half of diagnosis – commit to a hypothesis, propagate, and on a contradiction learn a nogood so the dead combination is never re-explored – is exactly what our dependency-directed search does for real (../src/ltms/dds.py):

python examples/coloring_dds.py        # map colouring via dependency-directed search
python exercises/ch10/solutions.py     # N-queens counts via dd_search

Reading dds.py alongside Exercise 9’s analysis is the best way to feel the conflict/nogood mechanics that GDE leans on, even though GDE applies them in many contexts at once rather than one at a time.

The full test suite (which covers the LTMS, CLTMS, and DD-search machinery referenced above) stays green with:

pytest

For the genuine multiple-fault, all-contexts-at-once engine, see the scope notes in Chapter 12 (ATMS) and Chapter 14: STUDY-NOTES.md §9 sketches where the ATMS would attach (via the CLTMS’s tms_env, reading ATMS-style environment labels off prime implicates), which is the layer TGDE would sit on.

Takeaways

Diagnosis is the payoff application of truth maintenance. Make “component cᵢ is healthy” an assumption OK(cᵢ); make each component’s behaviour a justification guarded by that assumption. The whole chapter is the TMS family pointed at hardware.
A wrong prediction is a conflict. The set of OK assumptions behind a prediction that disagrees with a measurement cannot all be true – that is a nogood, and it says “at least one of these components is broken.”
Conflicts and diagnoses are dual. A diagnosis is a hitting set of the conflicts; minimal diagnoses are the minimal hitting sets of the minimal conflicts. This lets GDE/TGDE skip the 2ⁿ lattice entirely – it works with the small conflicts and small fault sets instead.
Store the faults, not the good components. Minimal diagnoses are described by their few accused components; maintaining them incrementally keeps the data small (Exercise 4).
Probability turns logic into strategy. Rare independent faults make diagnosis probability fall off exponentially with cardinality, so single faults dominate; TGDE picks the next measurement by expected information gain, measuring the point that best splits the leading hypotheses.
Greedy and simple has known failure modes. One-step lookahead can waste measurements when the true fault is a higher-cardinality one (the horizon effect, Ex 7), and the independence/equal-prior assumptions can produce a confident misdiagnosis even when every inference is sound (Ex 8). These are features of the cost/accuracy trade, not bugs.
The complete-reasoner version is the clean reference; GDE is the efficient engine. Exercise 9 builds diagnosis directly on the CLTMS – correct but exponential and unfocused. GDE’s ATMS-with-priors approach does the same thing efficiently by avoiding full completion and exhaustive search.
In this repo it is conceptual. The runnable shadows are the lamp diagnosis.kb, the CLTMS for the Exercise-9 conflict-extraction route, and dependency-directed search for the nogood-learning mechanics; the full analysis is in ../exercises/ch17/.

Back to the companion index · previous: Chapter 14 — Putting the ATMS to Work · related: Chapter 12 — ATMS, Chapter 13 — CLTMS.