Chapter 17 – A Tiny Diagnosis Engine (TGDE)
The payoff chapter: take everything the TMS family gives us – assumptions, conflicts, nogoods – and turn it into a machine that figures out which part of a broken device is broken, and what to measure next. companion index
The big idea
You are handed a circuit. You apply known inputs, you measure some outputs, and one of the outputs is wrong. Something inside is faulted. Which component?
The naive answer is “try every possibility.” With n components, each either
good or bad, that is 2^n candidate fault patterns to sift through. For any real
device that number is hopeless. Chapter 17’s Tiny General Diagnostic Engine
(TGDE) – a compact reconstruction of de Kleer and Williams’ General
Diagnostic Engine (GDE) – shows how to do diagnosis without ever enumerating
that lattice, by leaning on two ideas the book has been building toward all
along:
-
Make “this component is healthy” an assumption. Give every component
cᵢa health assumptionOK(cᵢ). Its normal behaviour (a gate’s truth table, a multiplier’s product) is a justification guarded byOK(cᵢ): the prediction only fires if you are assuming that component is good. This is the ATMS idea from Chapter 12, applied to hardware. -
A wrong prediction is a conflict. When the values you predict (assuming some set of components are good) clash with the values you measure, the set of
OKassumptions that produced the bad prediction cannot all be true at once. That inconsistent assumption set is a conflict – exactly the “nogood” of the LTMS and ATMS. Each conflict says “at least one component in this set is broken.”
Once you have a handful of conflicts, you do not need the lattice at all. A diagnosis is just a set of components that, when declared faulty, would explain every conflict – i.e. a set that intersects every conflict. That is the classic hitting-set problem, and it is the entire engine: collect conflicts as observations arrive, take their minimal hitting sets, and you have your candidate faults.
TGDE adds one more layer on top, which is what makes it feel intelligent rather than merely correct: it chooses what to measure next. Each candidate diagnosis gets a probability (from per-component prior fault rates), and TGDE picks the measurement whose outcome is expected to shrink the uncertainty the most – the one with the highest expected information gain. That is the loop: predict, find conflicts, rank diagnoses by probability, pick the most informative measurement, measure, repeat – until one diagnosis dominates.
Scope note. TGDE is not implemented in this Python package. We ship the JTMS/LTMS family (justification- and clause-based, single-context truth maintenance), not the ATMS, GDE/TGDE, interpretation construction, or the probabilistic measurement selector that TGDE rides on. So this chapter is a conceptual tour: it explains the machinery, walks the worked examples, and summarises the exercise analysis in
../exercises/ch17/README.md. The “Try it in this repo” section points you at the closest runnable shadows we do have.
Concepts, step by step
The vocabulary of model-based diagnosis
It helps to fix the standard terms (the Reiter / de Kleer-Williams framing) up
front. A device is a set of n components {c₁, …, cₙ}, each with a
behavioural model and a health assumption OK(cᵢ). The observations OBS
are the inputs you applied plus the values you measured. Then:
-
A diagnosis is a way of declaring some components faulty so that the rest being good is consistent with
OBS. Writing a diagnosis as the setΔof components it calls faulted,Δ ⊆ {c₁,…,cₙ}is a diagnosis iffMODEL ∧ OBS ∧ (⋀_{c∈Δ} ¬OK(c)) ∧ (⋀_{c∉Δ} OK(c))is satisfiable. In a weak fault model (the usual TGDE setting) a faulted component simply imposes no constraint – “it could be doing anything” – so declaring a component faulty can only ever relax the equations, never add new ones.
-
A minimal diagnosis is a diagnosis
Δno proper subset of which is also a diagnosis: you cannot exonerate any of its accused components and still explain the observations. -
A minimal-cardinality diagnosis has the fewest faulted components of any diagnosis. (Every minimal-cardinality diagnosis is minimal – see Exercise 2 – but not vice versa.)
-
A conflict (de Kleer’s nogood) is a set of components that cannot all be OK given
OBS:MODEL ∧ OBS ⊢ ¬(⋀_{c∈S} OK(c)). A minimal conflict has no proper-subset conflict.
The duality that runs the engine: conflicts ⇄ diagnoses
Here is the pivot the whole chapter turns on:
Δis a diagnosis iffΔis a hitting set of the collection of conflicts – that is,Δshares at least one element with every conflict. Minimal diagnoses are the minimal hitting sets of the minimal conflicts.
Why it must be true: a conflict S says “at least one component in S is bad.”
If your proposed fault set Δ missed S entirely (shared no element with it),
then every component of S would be outside Δ, hence declared good – but S
says they cannot all be good. Contradiction. So a valid diagnosis must “hit”
every conflict; and a minimal diagnosis hits them all using as few accused
components as possible.
This is the engine in one sentence: GDE/TGDE never enumerates the 2^n
lattice. It collects minimal conflicts (which are small) and computes their
minimal hitting sets (which are also small in the common case). The lattice is
exponential (Exercise 1); the conflicts and the
hitting sets usually are not.
The diagnosis lattice and why we avoid it
Picture all 2^n candidate fault sets as the Boolean lattice ordered by ⊆,
from “all OK” (∅) at the bottom to “all faulted” at the top. Under a weak fault
model the diagnoses form an upward-closed region (a filter): if Δ is a
diagnosis then so is every superset of Δ (declaring more components faulty
only relaxes more constraints, so it stays consistent). That is why we only ever
care about the minimal diagnoses – they are the lower frontier of the filter,
and they implicitly describe all the rest.
Even that frontier can be exponential in the worst case (by Sperner’s theorem an
antichain in 2^n can hold up to C(n, ⌊n/2⌋) ≈ 2ⁿ/√n sets). So we never
materialise it eagerly; we generate it conflict-driven and probability-pruned.
How conflicts get discovered: guarded prediction
The conflicts do not arrive by magic; they fall out of constraint propagation over the guarded models, exactly as an ATMS would do it:
- Each input value and each measured value is a fact.
- Each component’s behaviour is a justification guarded by its
OKassumption. A good multiplierM1computingm1 = a·cjustifies the value ofm1withOK(M1)in its support. - Propagate. Every derived value carries, in its ATMS-style label, the set of
OKassumptions that were used to derive it. - When two derivations reach the same measurement point with different
values – or a prediction disagrees with a measurement – the union of the
OKassumptions behind the clash is a conflict. Those components cannot all be good.
That last step is the heart of it. A single faulty output, traced back through its support, hands you a ready-made conflict for free.
Single-valued predictions (why a clash is the signal)
A nice invariant (Exercise 3) keeps this honest: within a single diagnosis, each measurement point has at most one predicted value. Fixing a diagnosis fixes which components are good, and good components are deterministic functions of their inputs; propagation through them is functional, so a point gets at most one value (it may get none if it depends on a faulted component). The moment a point would get two different values, the assumed-good set is internally inconsistent – which is exactly the definition of a conflict, and means that set was never a diagnosis to begin with. Multivaluedness and diagnosis-hood are mutually exclusive: a double-valued point is precisely the engine’s alarm bell.
Maintaining the diagnoses incrementally
As measurements arrive, conflicts accumulate, and the minimal hitting sets need
updating. Exercise 4 spells out the standard
incremental update (Reiter’s HS-DAG / de Kleer’s hitting-set maintenance), and
the key engineering choice is to store the small faulted sets, never the
large good sets. A diagnosis is described by its handful of accused
components, not by the n−k components it leaves alone:
D := { {} } # the empty fault set hits the empty conflict set
for each minimal conflict C arriving:
new_D := {}
for Δ in D:
if Δ ∩ C ≠ ∅:
new_D.add(Δ) # Δ already hits C -- keep it
else:
for x in C:
new_D.add(Δ ∪ {x}) # extend Δ to hit C, one branch per element
D := minimal(new_D) # keep only ⊆-minimal sets
Memory and work scale with the (small) conflict sizes and diagnosis cardinalities,
not with n. Worst case it is still exponential (minimal hitting set is
NP-hard), but the typical case of a few small conflicts is exactly where this
representation wins.
Probabilities and choosing the next measurement
Everything so far is pure logic. TGDE’s intelligence comes from putting numbers on it:
-
Priors. Assume faults are independent and rare, each component faulting with small probability
p. Then ak-fault diagnosis has probability∝ pᵏ, so probability falls off exponentially with cardinality. Single faults dominate; double faults are far less likely; and so on. This is why TGDE focuses on small diagnoses. -
Posterior. Each measurement is consistent with some diagnoses and rules out others. Conditioning shifts probability mass toward the survivors and (after renormalising) away from those that disagreed with what you saw.
-
Measurement selection by expected information gain. For each candidate measurement point, group the current diagnoses by the value they predict there. The measurement’s outcome will reveal which group is true, collapsing the rest. Score each candidate by how much it is expected to reduce the entropy of the diagnosis distribution:
score(point) = H(current Δ distribution) − E_outcome[ H(Δ | value at point) ]Pick the highest-scoring point (or, with measurement costs, the best benefit/cost ratio). A point where all diagnoses agree teaches you nothing (zero gain); a point that cleanly splits the high-probability diagnoses in half teaches you the most. Measure it, fold the result in as a new observation, and loop.
This greedy, one-step-lookahead criterion is cheap and usually excellent – but it is greedy, and that has a cost (Exercise 7).
The examples, explained
The chapter’s running example is de Kleer’s polybox (sometimes “polycell”), the canonical model-based-diagnosis circuit. It is worth carrying in your head because every concept above lights up on it.
The polybox: the circuit
Five inputs a, b, c, d, e; three multipliers and two adders:
M1: m1 = a · c
M2: m2 = b · d
M3: m3 = c · e
A1: x = m1 + m2
A2: y = m2 + m3
The shared signal m2 (feeding both adders) is what makes it interesting: a
fault in M2 shows up in both outputs, while M1 only affects x and M3
only affects y.
Worked example A: a single wrong output yields conflicts
Apply a = b = c = d = e = 3. Then the good-circuit predictions are
m1 = m2 = m3 = 9, x = 18, y = 18. Now suppose you measure x = 10 while
y = 12 matches expectation… (the precise numbers in the book differ; the
shape is what matters). The mismatch at x means the components that
cooperated to predict x cannot all be good:
xwas predicted fromm1andm2viaA1, andm1fromM1,m2fromM2. So the support of the badxprediction is{OK(M1), OK(A1), OK(M2)}– a conflict: at least one ofM1,A1,M2is broken.- A second derivation path (using the good output
yto back out whatm2“should” be, then re-derivingx) produces another conflict that pulls inA2andM3: roughly{OK(A1), OK(M2), OK(A2), OK(M3)}.
Those two minimal conflicts are everything the logic extracts from this single observation.
Worked example B: conflicts → minimal diagnoses by hitting sets
Take the minimal hitting sets of {M1, A1, M2} and {A1, M2, A2, M3}:
{A1}hits both (it is in each) – a single-fault diagnosis.{M2}hits both – a single-fault diagnosis.{M1}hits the first but not the second, so it must be paired with something from the second:{M1, A2},{M1, M3}(and{M1, A1},{M1, M2}, but those are subsumed by the single faults{A1}/{M2}).
So the minimal diagnoses are the single faults {A1} and {M2}, plus the
double faults that combine M1 with a component from the second conflict. Notice
how M2 and A1 jump out as the prime suspects: they are the components that
participate in both conflicts, which is exactly what a hitting set rewards. This
is the textbook polybox result, and it is recovered directly by the procedure of
Exercise 9.
Worked example C: probability ranks them, then a measurement decides
With rare independent faults, the single faults {A1} and {M2} outrank every
double fault (p ≫ p²), so they sit at the top of the list. To decide between
them, TGDE looks for a measurement that they disagree about. The natural choice is
an internal point like m2: under “M2 faulty” the value of m2 is suspect,
while under “A1 faulty” m2 is fine and computable. Measuring m2 therefore
splits the two leading hypotheses cleanly – maximal expected information gain –
and one measurement collapses the diagnosis. That predict → conflict →
hitting-set → rank → measure cycle, watched on the polybox, is TGDE.
Exercises walk-through
The full paraphrased problem statements and conceptual answers (derivations,
complexity arguments, algorithm sketches, and design notes) are in
../exercises/ch17/README.md. There is no solutions.py
to run – the chapter is analysis-only here. The highlights:
-
Ex 1 (★) – Counting diagnoses / lattice size. There are
2ⁿcandidate fault sets, and in the worst case (no constraining observations) every one is a diagnosis, so the number of diagnoses can be2ⁿ. Even the minimal frontier can be exponential (Sperner: up to≈ 2ⁿ/√n). The takeaway is the engine’s whole reason for being: never enumerate the lattice – go conflict-driven and prune by probability. -
Ex 2 (★) – Minimal-cardinality ⟹ minimal. Clean proof by contradiction: if a least-cardinality diagnosis
Δhad a proper subset diagnosisΔ′, then|Δ′| < |Δ|would beat the minimum – impossible. The converse fails: conflicts{a,b}and{a,c}give minimal diagnoses{a}(size 1) and{b,c}(size 2); both are subset-minimal but only{a}is minimal-cardinality. That gap is exactly why TGDE computes the two notions separately. -
Ex 3 (★) – At most one predicted value per point per diagnosis. Covered above: a fixed diagnosis makes propagation functional, so each point gets at most one value; two values would be a conflict, ruling that set out as a diagnosis. Single-valuedness and diagnosis-hood are mutually exclusive.
-
Ex 4 (★★) – Compute minimal diagnoses directly over the small fault sets. The incremental hitting-set maintenance shown above (Reiter HS-DAG / de Kleer). The point of the exercise is representational: manipulate the small faulted sets, not the
n−k-sized good environments the naive ATMS interpretation construction would carry. -
Ex 5 (★★) – Make
smallest-diagnosescheap. Replace the exhaustive minimal-cardinality sweep with a best-first / probability-threshold strategy: pull candidate diagnoses in nondecreasing cardinality from a priority queue, and stop once the accumulated probability mass crosses a threshold (e.g. 0.99) or the next candidate’s prior drops belowε; cap cardinality at a smallk_max(2 or 3 simultaneous faults). You trade completeness of the minimal-cardinality set for cost, but the entropy calculation only needs enough probability mass to pick a measurement, so the trade is almost always free in practice. -
Ex 6 (★★) – Hierarchical models. Let a module (e.g. a full adder) carry a single
OK(adder)assumption that is threaded into all its internal rules, so the diagnosis lattice is over modules, not leaf gates – shrinkingnand the lattice dramatically. Support refinement on demand: once a module is implicated, re-instantiate it with per-gate assumptions linked byOK(module) ⇔ ⋀ OK(parts)and re-diagnose just inside it. Coarse-to-fine diagnosis: cheap module pass first, drill into suspects only. -
Ex 7 (★★) – The horizon effect. Greedy one-step lookahead optimises against the current posterior, which (under rare-fault priors) is dominated by single faults. If the true fault is a double fault invisible at the single-fault-discriminating points, TGDE spends measurements crushing single-fault entropy, eliminates all single faults, and only then notices the double fault and measures the point that would have revealed it directly – using more measurements than an oracle that anticipated the double fault. The fix (multi-step lookahead or cardinality-hedging utilities) is more expensive, which is why TGDE accepts the horizon effect.
-
Ex 8 (★) – A misdiagnosis from the probability model, not the logic. TGDE assumes faults are independent and equiprobable, so a double fault gets
∝ p². If two components share a common cause (same power rail, same chip, an over-voltage event), reality makesP({c,d})high, but TGDE scores itp², far below a single fault{a}that also explains the observations – so it confidently reports{a}. Every inference is sound and no diagnosis is missed; the probabilities are simply wrong. (Variant: even with independence, a too-uniform prior on a failure-prone part causes the same kind of error.) Re-running with a correct joint prior fixes it without touching a single rule. -
Ex 9 (★★★) – A complete diagnostic procedure on the CLTMS. Because the CLTMS is a complete propositional reasoner, you can implement diagnosis straight from the definitions: encode each component’s behaviour as clauses guarded by
OK(cᵢ), add inputs/measurements as units, and test diagnosis-hood by satisfiability of the corresponding environment. Extract minimal conflicts from the completed clause set (completeness guarantees none is missed) and take their minimal hitting sets. On the polybox this recovers the textbook{M1},{A1},{M2}diagnoses. Drawbacks: exponential lattice search, the (worst-case exponential) cost of computing prime implicates to make the LTMS complete, and no focusing – it does correctly what GDE does efficiently, a clean reference implementation rather than a practical engine. This is the exercise most directly connected to code we actually ship. -
Ex 10 (★★★) – Choosing the best inputs (active diagnosis). When you cannot measure more internal points, choose new stimuli instead. Make an assumption for each possible value of each input cell, predict outputs over candidate input vectors, and score each vector by the same expected-information criterion. On the full adder this works beautifully – inputs are
a, b, cin ∈ {0,1}, so there are only2³ = 8vectors, and the method reduces to classic digital test-vector generation. On the polybox it works in principle but the hint’s “one assumption per possible input value” does not scale: the cells are integer arithmetic with huge/unbounded domains, so you must discretise to a finite representative test set. Correct algorithm, tractable only on small finite input domains.
Try it in this repo
TGDE itself is not implemented here – there is no GDE, ATMS, interpretation
construction, or probabilistic measurement selector in this package, and no
exercises/ch17/solutions.py. So the honest answer is that Chapter 17 is
conceptual in this repo, and the worked analysis lives in
../exercises/ch17/README.md.
That said, the mechanisms TGDE is built from – assumptions, propagation, contradictions, and conflict-driven pruning – are exactly what the LTMS/LTRE and CLTMS we ship do provide. Here is how to exercise the same ideas with the runnable machinery.
First the setup (from the repo root):
python -m venv .venv && . .venv/bin/activate
pip install -e ".[dev]"
1. Diagnosis in miniature. The closest GDE-shaped artifact in the repo is a one-fault lamp example. Asserting the fault disjunction and then ruling out one cause leaves the other – the single-context shadow of “conflicts narrow the candidate diagnoses”:
python examples/run_kb.py examples/kb/diagnosis.kb
which runs (../examples/kb/diagnosis.kb):
lamp off -> bulb dead | no power
lamp off
~ no power
expect bulb dead true
Asserting lamp off introduces the disjunction of faults (a tiny conflict:
“bulb dead OR no power”); ruling out no power forces the remaining cause. This
is hitting-set reasoning with a single conflict of size two. The genuinely
multi-fault, all-at-once version – predicting across every OK-combination
simultaneously – is exactly what the ATMS/GDE add and what this package does not
implement.
2. The “complete reasoner” route (Exercise 9), for real. Exercise 9’s
procedure builds diagnosis on top of the complete LTMS, and that layer we do
ship as ../src/ltms/cltms.py. You can encode component
health as guarded clauses and watch complete derive the conflicts that plain
BCP would miss:
from ltms import LTMS, Label, add_formula, complete
m = LTMS()
# Health assumptions for two components feeding one observed output.
ok_m1 = m.create_node("ok_m1", assumption=True)
ok_a1 = m.create_node("ok_a1", assumption=True)
bad_x = m.create_node("x_wrong", assumption=True) # the observed bad output
# "If both are OK, the output is right" == "x cannot be wrong if both OK".
# i.e. ~(ok_m1 & ok_a1 & x_wrong) -> ~ok_m1 | ~ok_a1 | ~x_wrong
add_formula(m, ("or", ("not", ok_m1), ("not", ok_a1), ("not", bad_x)), "behavior")
# Observe the bad output.
add_formula(m, bad_x, "measurement")
complete(m) # saturate; the conflict {ok_m1, ok_a1} is now explicit
# Reading the labels shows the conflict: ok_m1 and ok_a1 cannot both stand.
This is the conflict-extraction core of Exercise 9 in miniature: completeness
guarantees the conflict is found, and its minimal hitting sets ({ok_m1} and
{ok_a1}) are the single-fault diagnoses. See Chapter 13 for how
complete and prime implicates work.
3. Conflict-driven, nogood-learning search. The focused half of diagnosis
– commit to a hypothesis, propagate, and on a contradiction learn a nogood so the
dead combination is never re-explored – is exactly what our dependency-directed
search does for real
(../src/ltms/dds.py):
python examples/coloring_dds.py # map colouring via dependency-directed search
python exercises/ch10/solutions.py # N-queens counts via dd_search
Reading dds.py alongside Exercise 9’s analysis is the best way to feel the
conflict/nogood mechanics that GDE leans on, even though GDE applies them in many
contexts at once rather than one at a time.
The full test suite (which covers the LTMS, CLTMS, and DD-search machinery referenced above) stays green with:
pytest
For the genuine multiple-fault, all-contexts-at-once engine, see the scope notes
in Chapter 12 (ATMS) and Chapter 14:
STUDY-NOTES.md §9 sketches where the ATMS would attach (via
the CLTMS’s tms_env, reading ATMS-style environment labels off prime
implicates), which is the layer TGDE would sit on.
Takeaways
-
Diagnosis is the payoff application of truth maintenance. Make “component
cᵢis healthy” an assumptionOK(cᵢ); make each component’s behaviour a justification guarded by that assumption. The whole chapter is the TMS family pointed at hardware. -
A wrong prediction is a conflict. The set of
OKassumptions behind a prediction that disagrees with a measurement cannot all be true – that is a nogood, and it says “at least one of these components is broken.” -
Conflicts and diagnoses are dual. A diagnosis is a hitting set of the conflicts; minimal diagnoses are the minimal hitting sets of the minimal conflicts. This lets GDE/TGDE skip the
2ⁿlattice entirely – it works with the small conflicts and small fault sets instead. -
Store the faults, not the good components. Minimal diagnoses are described by their few accused components; maintaining them incrementally keeps the data small (Exercise 4).
-
Probability turns logic into strategy. Rare independent faults make diagnosis probability fall off exponentially with cardinality, so single faults dominate; TGDE picks the next measurement by expected information gain, measuring the point that best splits the leading hypotheses.
-
Greedy and simple has known failure modes. One-step lookahead can waste measurements when the true fault is a higher-cardinality one (the horizon effect, Ex 7), and the independence/equal-prior assumptions can produce a confident misdiagnosis even when every inference is sound (Ex 8). These are features of the cost/accuracy trade, not bugs.
-
The complete-reasoner version is the clean reference; GDE is the efficient engine. Exercise 9 builds diagnosis directly on the CLTMS – correct but exponential and unfocused. GDE’s ATMS-with-priors approach does the same thing efficiently by avoiding full completion and exhaustive search.
-
In this repo it is conceptual. The runnable shadows are the lamp
diagnosis.kb, the CLTMS for the Exercise-9 conflict-extraction route, and dependency-directed search for the nogood-learning mechanics; the full analysis is in../exercises/ch17/.
Back to the companion index · previous: Chapter 14 — Putting the ATMS to Work · related: Chapter 12 — ATMS, Chapter 13 — CLTMS.