Chapter 4 – Pattern-Directed Inference Systems (TRE)

A friendly tour of TRE, the smallest forward-chaining rule engine in this project, and the substrate every truth-maintenance layer in later chapters is built on. Back to the companion index.

The big idea

A pattern-directed inference system is a rule engine that does not call rules by name. Instead, each rule advertises a pattern it cares about, and the engine automatically wakes a rule whenever a fact matching that pattern shows up. You never write “now run rule 7”; you assert facts, and the relevant rules fire themselves. This is the data-driven, forward-chaining style of inference: new data flows in, and conclusions ripple outward until nothing new can be derived.

TRE (the “Tiny Rule Engine”) is the bare-bones version of this idea. It is deliberately minimal so you can see the whole mechanism at once. In TRE, belief is nothing more than presence in the database. A fact is “believed” if and only if it has been asserted. There is no notion of why a fact is believed, no support, no retraction, and no contradictions. That austerity is the point: TRE is the floor that every later chapter stacks truth maintenance on top of. Chapters 7 onward keep TRE’s pattern-matching skeleton almost unchanged and bolt a TMS (a JTMS, then the LTMS) underneath, so that belief becomes justified belief that can change as assumptions change. Get TRE clear in your head and the rest of the book is “the same engine, but now facts can become unbelieved.”

The implementation lives in ../src/ltms/tre/engine.py. It is about 175 lines, and you can profitably read all of it. This chapter walks through what those lines do and why, then works through the exercises in ../exercises/ch04/.

Concepts, step by step

Terms, patterns, and variables

Everything the engine reasons about is a term: a tuple like ("parent", "ann", "bob") standing for the proposition ann is a parent of bob. A term is either an atom (a plain symbol or string) or a compound (a tuple whose first element is the relation name). A variable is written Var("x") and prints as ?x. A pattern is just a term that may contain variables, for example ("parent", Var("x"), Var("y")). Facts in the database are always ground (variable-free); patterns in rules may have variables that get filled in by matching. These primitives are shared verbatim with every other layer of the project (see ../src/ltms/terms.py and ../src/ltms/unify.py).

Unification: the one matching primitive

Matching a pattern against a fact is done by unification, a single shared function unify(a, b, bindings). It tries to find an assignment of variables that makes the two terms identical, returning the (extended) bindings on success or the sentinel FAIL on failure. One subtlety worth burning into memory:

An empty binding dict is a legitimate success (it means “these unify with no new bindings needed”). Success is therefore tested as result is not FAIL, never by truthiness. Testing if result: would wrongly treat a perfectly good empty-binding match as a failure.

You can see this discipline everywhere in the engine, for example in assert_, fetch, and the private _try_rule_on.

The full unifier is two-sided and includes an occurs-check (it refuses to bind ?x = (g ?x), which would create an infinite term). TRE’s rule matcher actually only needs one-sided matching, because database facts are always ground, but the project reuses the one general unifier for simplicity. The difference between one-sided matching and full two-sided unification is exactly what Exercise 3 explores.

Car-indexing: the only index TRE has

If every assertion had to be tried against every rule, the engine would be hopelessly slow. TRE uses one cheap trick: it indexes facts and rules by their leftmost symbol (in Lisp this is the car of the list, hence “car indexing”). The method is get_dbclass:

def get_dbclass(self, term, env=None):
    if is_compound(term):
        return self.get_dbclass(term[0], env)   # recurse to the leftmost symbol
    ...
    bucket = self.dbclass_table.get(term)        # one bucket per relation name
    ...

Each relation name gets a Dbclass “bucket” holding the facts that use that name and the rules triggered by that name. So ("parent", "ann", "bob") and the pattern ("parent", ?x, ?y) both live in the parent bucket, and they are the only things ever compared with one another. A variable in head position (for example trying to index the pattern (?r ann bob)) is an error unless that variable is already bound, because then there is no symbol to file it under.

The three moving parts of an engine

A Tre instance is one self-contained database (not a global). It holds:

  1. dbclass_table – the dict mapping each leftmost symbol to its Dbclass bucket of facts and rules.
  2. queue – a list of pending rule activations, each a pair (body, bindings). This is drained by run_rules.
  3. counters (rule_counter, rules_run) for naming rules and bookkeeping.

Asserting facts

assert_(fact) adds a fact, deduplicating by equality. The important detail:

def assert_(self, fact):
    bucket = self.get_dbclass(fact)
    if fact in bucket.fact_set:
        return False                     # already known: nothing to do
    bucket.fact_set.add(fact); bucket.facts.append(fact)
    for rule in list(bucket.rules):      # only on a genuinely new fact
        self._try_rule_on(rule, fact)    # ...do we test the bucket's rules
    return True

Two things to notice. First, only a genuinely new fact triggers work (the dedup check returns early). This is what keeps the engine from looping forever re-deriving the same things, and it is why forward chaining terminates on finite, monotone rule sets. Second, assert_ queues matching rules; it does not run them. Running is a separate step.

Adding rules, and why order does not matter

add_rule(trigger, body, environment) installs a rule. A rule body is just a Python callable body(bindings, tre). The crucial line:

def add_rule(self, trigger, body, environment=None):
    ...
    bucket.rules.append(rule)
    for fact in list(bucket.facts):      # back-test against facts already present
        self._try_rule_on(rule, fact)
    return rule

When you add a rule, the engine immediately tests it against facts that are already in the database. Combine this with the assert path (a new fact is tested against rules already present) and you get the heart of TRE:

Both “new fact meets old rules” and “new rule meets old facts” funnel through the single match point _try_rule_on. Therefore the order in which facts and rules arrive never changes the final set of conclusions.

This property is called order-independence or confluence, and it is the subject of Exercise 1. It is what lets you load rules and facts from many sources in any sequence and still trust the result.

_try_rule_on itself is tiny and is the only place matching happens:

def _try_rule_on(self, rule, fact):
    result = unify(fact, rule.trigger, dict(rule.environment))
    if result is not FAIL:
        self.queue.append((rule.body, result))   # enqueue, don't run

There is also a decorator form, @tre.rule(pattern), that is sugar over add_rule.

The control loop

run_rules drains the queue to quiescence (until nothing is left):

def run_rules(self):
    fired = 0
    while self.queue:
        body, bindings = self.queue.pop()   # LIFO
        body(bindings, self)                # may assert facts / add rules -> more work
        fired += 1
    return fired

A rule body can assert new facts and even install new rules, each of which can push more activations onto the queue, so the loop keeps going until the database is closed under the rules. The queue is LIFO here, but the book stresses that the order is logically irrelevant: because of order-independence, FIFO would reach the same final database (it might visit conclusions in a different sequence on the way). run_forms(facts) is a convenience batch driver that asserts each fact and runs to quiescence.

A deliberate non-feature: TRE has no loop detection. A rule like (integer ?x) => (integer (1+ ?x)) is supposed to run forever. Detecting and preventing that is not the engine’s job; writing terminating rule sets is the author’s job.

Conjunctive rules by nesting

TRE has no multi-trigger syntax. There is no built-in way to say “fire when both (parent ?x ?y) AND (ancestor ?y ?z) are present.” You get conjunction by nesting: the outer rule triggers on the first pattern, and inside its body it calls add_rule again to install a second rule watching the second pattern. The inner rule’s closure captures the bindings from the first match, so the two matches share variables. This is exactly the trick the transitive-ancestor example uses (see below), and it is the mechanism Exercise 6 exploits for “lazy” rules. The environment field on a Rule exists precisely to carry those captured bindings forward.

What TRE deliberately leaves out

No support, no justifications, no retraction, no contradiction handling, no negation. Belief is monotone: once asserted, always asserted. Every one of these omissions is filled in by a later layer. The LTMS-backed sibling LTRE (../src/ltms/ltre.py) keeps TRE’s pattern matching but replaces “present in the database” with a three-valued, retractable, justified label managed by Boolean constraint propagation. Several Chapter 4 exercises are demonstrated on LTRE because they need exactly that extra power (case analysis, negation), which TRE alone cannot provide.

The examples, explained

Worked example: deriving ancestors (transitive closure)

../examples/family_tre.py is the canonical TRE demo. It computes the ancestor relation as the transitive closure of parent using two rules:

# Base case: a parent is an ancestor.
tre.add_rule(("parent", X, Y),
             lambda b, t: t.assert_(("ancestor", b[X], b[Y])))

# Recursive case: parent(x,y) AND ancestor(y,z) => ancestor(x,z), via NESTING.
def chain(b, t):
    x = b[X]
    t.add_rule(("ancestor", b[Y], Z),
               lambda b2, t2: t2.assert_(("ancestor", x, b2[Z])))
tre.add_rule(("parent", X, Y), chain)

tre.run_forms([("parent","ann","bob"), ("parent","bob","cy"), ("parent","cy","dee")])

This single example concentrates most of the chapter’s ideas:

  • Forward chaining: facts drive rules; you never call a rule by name.
  • Pattern matching with variables: ?x, ?y, ?z bind to people.
  • Conjunctive rules by nesting: the recursive rule is two chained triggers, the inner one watching (ancestor ?y ?z) for the ?y bound by the outer.
  • Order-independence: the rules are installed before any facts, but run_forms interleaves asserts and runs; you would get the same six ancestor pairs no matter how you ordered things.

Running it derives every ancestor pair: ann of bob, cy, dee; bob of cy, dee; and cy of dee.

The same world as a knowledge-base file

../exercises/ch04/kb/ancestor.kb expresses the identical world in the project’s tiny DSL, so you can see the declarative form next to the Python form:

rule (parent ?x ?y) => (ancestor ?x ?y)
rule (parent ?x ?y) & (ancestor ?y ?z) => (ancestor ?x ?z)
parent ann bob
parent bob cy
parent cy dee
expect ancestor ann dee true
expect ancestor bob dee true

Here & is the surface syntax for the conjunctive (nested) rule, and expect lines are checked assertions. The loader (in ../src/ltms/dsl.py) compiles this to the same engine calls the Python example makes.

Exercises walk-through

The full write-ups, with paraphrased problem statements and runnable code, are in ../exercises/ch04/README.md, and every “demonstrated” exercise is a demo_* function in ../exercises/ch04/solutions.py. Highlights:

Exercise 1 – Why order-independence matters

A paper question, demonstrated in code. The payoff of confluence: rule authors never have to reason about interleavings, debugging is local (a missing conclusion means a missing rule or fact, never a timing accident), and independent knowledge modules compose without caring about load order. The demo builds the same fact and rule in two orders – rule-then-fact and fact-then-rule – and confirms the conclusion set is identical. The trade-off: you may not rely on firing order for control (no cut-like pruning); control must be declarative.

Exercise 2 – Solving a logic-textbook problem

The chosen problem is the constructive dilemma: from A ∨ B, A → C, B → C, derive C. The lesson is a clean split of labor:

  • Modus ponens is free. Asserting man(socrates) and the implication man(x) → mortal(x) into LTRE immediately makes mortal(socrates) true, because Boolean constraint propagation does unit propagation for you.
  • The dilemma is not free. It needs reasoning by cases, which plain propagation cannot do: the demo shows c_before_indirect_proof = False. Deriving C uses try_indirect_proof, which assumes ¬C, propagates, finds the clause set unsatisfiable, and concludes C (c_proved_by_cases = True, then c_now_true = True).

This is the BPS moral in miniature: a forward chainer with a TMS hands you the easy deductions directly, while case splits and assumption-based proofs need the explicit proof procedures of later chapters.

Exercise 3 – Full (two-sided) unification

Part (a) asks why a naive one-way matcher mishandles patterns with variables on both sides, like (Foo ?x ?x) versus (Foo ?y ?z). A one-way matcher treats the second pattern’s variables as constants, so it never enforces that ?y and ?z must collapse to the same variable, and it has no occurs-check.

Part (b) is satisfied by the project’s existing unify, which is Robinson’s algorithm with the occurs-check. The demo exhibits the four telling cases: unify(("Foo",?x,?x), ("Foo",?y,?z)) returns {?x: ?y, ?y: ?z} (so the two right-hand variables are forced equal); (Foo ?x ?x) vs (Foo a b) correctly fails; vs (Foo a a) succeeds; and unify(?x, ("g", ?x)) fails the occurs-check.

Exercise 4 – multi-fetch (the relational join)

multi_fetch(engine, patterns) returns every binding environment under which all the patterns match simultaneously, that is the relational join with shared variables. The implementation threads a list of environments through the patterns one at a time, substituting bindings so far into the next pattern and unifying it against the relevant Dbclass bucket. With parent(ann,bob), parent(cy,dee), employed(ann), employed(ed), the patterns [(parent ?x ?y), (employed ?x)] yield the single joint answer ?x=ann, ?y=bob (the only employed parent). Note how it reuses get_dbclass so it only scans relevant facts.

Exercise 5 – show-rule

show_rule(engine, counter) looks a rule up by the integer in its counter field and prints a readable description, deliberately showing the trigger pattern and the seed environment separately (as the exercise requires), plus the firing condition and the bucket it is indexed under. This works because the Rule dataclass exposes exactly those fields, so the object returned by add_rule carries everything needed for introspection.

Exercise 6 – More efficient AND / BICONDITIONAL introduction

Part (a) rewrites AND-introduction to look for the second conjunct only after the first is proved, avoiding wasted work. The fix is the nested-rule pattern: an outer rule triggers on proved(a), and only inside its body installs the inner rule that watches proved(b). The demo runs it both ways: when the first conjunct is unprovable the second is examined 0 times and no conjunction is made; when the first holds, the second is examined 1 time and the conjunction is asserted.

Part (b) names the assumption this optimization relies on: there must be no cross-dependence or synergy between the two sub-proofs (proving B must not be what makes A provable), the cost ordering must be stable (the first conjunct really is the cheaper or more-likely-to-fail one), and forward-chaining monotonicity must hold. It is violated if some other rule derives A from B: then a fact set supporting B should yield A ∧ B, but the lazy version, seeing A not yet proved and never examining B, never fires the chain that would have produced A, and misses the conjunction.

Exercise 7 – Blackboard systems on TRE knowledge sources

Part (a) asks which TRE design decisions must change before TRE engines could serve as blackboard knowledge sources. Two clear ones: (1) run_rules drains the whole queue to quiescence with no external scheduler or budget, whereas blackboards need opportunistic, interruptible control that can preempt one knowledge source to run a more promising one; and (2) TRE has a single private, monotonic, non-retractable database with no provenance, whereas blackboards need a shared, structured, multi-level workspace with the ability to revise and remove hypotheses (which really wants the JTMS/LTRE layer from later chapters).

Part (b) sketches a blackboard shell: each knowledge source is its own Tre engine, a shared list is the blackboard, and an explicit controller cycles the knowledge sources and moves new blackboard entries between them. The demo posts observed(fire), lets KS1 derive hypothesis(fire), copies it to KS2 alongside evidence(fire), and KS2 produces confirmed(fire). A production shell would add priority queues, per-cycle budgets, change-events, and hypothesis retraction (best done by swapping Tre for the LTMS-backed LTRE).

Try it in this repo

All commands are run from the repository root with the virtual environment active. (These were all executed while writing this chapter and produce the output shown.)

# activate the project's virtual environment
. .venv/bin/activate

# 1) Run every demonstrable Chapter 4 exercise and pretty-print the results.
python exercises/ch04/solutions.py

# 2) The canonical TRE worked example: derive ancestors by forward chaining.
python examples/family_tre.py

# 3) The same ancestor world as a declarative .kb file, with expected results.
python examples/run_kb.py exercises/ch04/kb/ancestor.kb

What you should see:

  • solutions.py prints a dict keyed by exercise. Spot-check that ex1_order_independence reports identical: True, ex2_natural_deduction shows c_before_indirect_proof: False but c_now_true: True, and ex4_multi_fetch gives joint_parent_and_employed: [('ann', 'bob')].
  • family_tre.py prints the six derived ancestor pairs.
  • run_kb.py prints ancestor ann dee: true and ancestor bob dee: true, matching the expect lines in the .kb file.

To experiment, try editing ../exercises/ch04/kb/ancestor.kb: add a new parent line and a matching expect, then re-run command 3 and watch the transitive closure grow. Or open a REPL and build an engine by hand:

from ltms.tre import Tre
from ltms.terms import Var
X = Var("x")
t = Tre()
t.add_rule(("human", X), lambda b, e: e.assert_(("mortal", b[X])))
t.assert_(("human", "socrates"))
t.run_rules()
print(t.fetch(("mortal", X)))   # [('mortal', 'socrates')]

Then swap the order (assert the fact before adding the rule) and confirm you get the same answer – that is Exercise 1, live.

Takeaways

  • TRE is the smallest useful forward-chaining engine: a database of facts and rules indexed by leftmost symbol, a queue of activations, and a loop that drains it to quiescence.
  • Belief = presence in the database. There is no support, no negation, no retraction. That austerity is what makes TRE the clean base every truth maintenance layer extends.
  • Unification is the single matching primitive, and the empty-binding-as- success rule (is not FAIL) is non-negotiable.
  • Both new-fact and new-rule paths run through one match point, which buys order-independence (confluence): same facts plus same rules give the same conclusions regardless of arrival order.
  • Conjunction is nesting: chained triggers whose inner rule captures the outer match’s bindings. This one pattern powers the recursive ancestor rule and the “lazy” optimizations of Exercise 6.
  • Several exercises (case analysis in Ex 2, negation, blackboard retraction in Ex 7) reach past what TRE can do, pointing forward to LTRE and the truth maintenance chapters where belief becomes justified and revisable.

Next, see the truth-maintenance chapters that build directly on this engine: ../src/ltms/jtre.py (JTMS-backed) and ../src/ltms/ltre.py (LTMS-backed).


This site uses Just the Docs, a documentation theme for Jekyll.