Chapter 3 – Classical Problem Solving

The chapter that teaches you to think of “solving a problem” as searching a space of states, and that gives us the vocabulary (states, operators, heuristics, beam/best-first/A*) every later chapter quietly assumes. Back to the companion index.

The big idea

Almost every interesting reasoning task can be cast in the same shape:

you start in some state,
you have a set of operators that turn one state into another,
you want to reach a state that satisfies a goal test,
and you would like to get there without examining the whole, usually astronomically large, space of states.

That last clause is the entire game. The set of reachable states is a graph (or a tree, if you ignore the fact that different operator sequences can land on the same state). Finding a goal is, in principle, just walking that graph. In practice the graph is far too big to walk blindly, so the chapter is really about how to choose which state to look at next so that you stumble onto a goal early. That choice is the difference between a search that finishes in milliseconds and one that never finishes at all.

The chapter calls its concrete program the Classical Problem Solver (CPS) and drives it with two very different example domains:

Route finding on the Boston subway. States are stations, operators are “ride one stop / change lines,” the goal is “you are standing in the destination station.” This is a small, cyclic map problem and it is the friendly case: the graph is finite, the heuristic (how far is this station from the destination, as the crow flies) is honest, and good search behaves beautifully.
Symbolic algebra (solving an equation for an unknown). A state is an equation like x + x = 6; operators are algebraic rewrite rules (x + x -> 2*x, “divide both sides by 2,” and so on); the goal is the form x = (something with no x in it). This is the harder, tree-shaped case: the branching is wide, the “distance to goal” heuristic is only a guess, and the same equation can be rewritten in circles.

One honest caveat up front, because the rest of this companion is built on running code: the CPS itself is not implemented in this repository. Porting CPS is an explicit non-goal in ../BRIEFING.md (the project is about truth maintenance, not classical search). So this chapter is the one place in the companion where the “Try it” section points you at adjacent machinery rather than a faithful re-run of the book. The full conceptual answers, derivations, and algorithm sketches live in ../exercises/ch03/README.md, and this chapter is essentially a guided tour of that material plus the ideas behind it.

Concepts, step by step

1. The state space, made concrete

Three pieces define a search problem, and it pays to name them precisely because the exercises poke at each one in turn:

State. A complete snapshot of “where you are.” For the subway, a state is a station (and, if you care about transfer cost, which line you arrived on). For algebra, a state is the current equation.
Operators. Functions state -> state (or, more usefully, state -> list of states). Each operator has an applicability test (“can I apply this here?”) and an action (“here is the resulting state”). The algebra operators come in three families described below.
Goal test. A predicate on states. For the subway: state == destination. For algebra: the equation reads x = E where E does not mention x.

A search node wraps a state with bookkeeping: a pointer to its parent (so you can reconstruct the path once you find a goal), the path cost g so far, and whatever priority the search uses to order the frontier. The set of nodes you have generated but not yet expanded is the frontier (the book often calls it the open list); the nodes you have already expanded form the closed set, if you bother to keep one.

2. The generic best-first driver

The chapter’s cleverest engineering move is to write one search procedure (call it bsolve / beam-solve) and get every named strategy out of it by changing one thing: the function that decides which frontier node is “best.” That priority function is the only knob.

The skeleton is always the same:

search(start, goal?, successors, priority):
    frontier := { node(start) }            ; ordered by priority
    loop:
        if frontier empty: return FAIL
        n := remove the best node from frontier   ; "best" per priority
        if goal?(n.state): return path(n)
        for s' in successors(n.state):
            add node(s', parent=n) to frontier

Now plug in different priority functions and different frontier disciplines and you recover the classic family:

Strategy	Orders the frontier by	Optimal?	Memory
Breadth-first	depth (shallowest first)	shortest #steps	exponential frontier
Depth-first	depth (deepest first)	no	`O(b*d)`
Uniform-cost	`g` (cost so far)	yes (cheapest)	exponential frontier
Greedy best-first	`h` (estimate to goal)	no	exponential frontier
A*	`f = g + h`	yes if `h` admissible	exponential frontier
Beam (width `b`)	`h` (or `f`), keep only best `b`	no (incomplete)	`O(b)`
Iterative deepening	depth, but re-run with growing limit	shortest #steps	`O(b*d)`
IDA*	`f`, but re-run with growing `f`-bound	yes if `h` admissible	`O(b*d)`

Here b is the branching factor (or the beam width, depending on context) and d is the goal depth. The bolded rows are the ones the chapter and its exercises lean on hardest.

3. Heuristics: `h`, admissibility, and why it matters

A heuristic h(state) is a cheap estimate of the remaining cost to a goal. A good h is what turns hopeless blind search into something fast: it tells the driver “this state looks close, look here first.”

The key property is admissibility: h is admissible if it never overestimates the true remaining cost h*. Admissibility is exactly the condition under which A* is guaranteed to return an optimal (cheapest) solution. The intuition: if h never lies upward, A* can never be fooled into committing to a suboptimal goal before it has ruled out the cheaper ones.

The two domains give you one of each:

Subway: straight-line distance (or minimum hop count) to the destination is admissible – the real route can only be as long or longer than the crow-flies distance. So A* on the subway is optimal.
Algebra: the algebra-distance heuristic counts structural features that separate the current equation from x = ... (roughly, how many times x appears and how deeply it is buried). This is not admissible, because a single rewrite can dispatch several of those features at once. Exercise 2 builds the witness. Consequence: A* on algebra is fast but not guaranteed to find the shortest derivation.

4. Beam search and its fatal flaw (and its virtue)

Beam search is best-first search with a hard memory cap: at every frontier expansion you keep only the best b nodes and throw the rest away forever. With b = 1 it degenerates into greedy hill-climbing.

The virtue is obvious: memory is O(b), a constant, no matter how big the space is. The flaw is just as important and is the subject of Exercise 1: beam search is incomplete. If the only path to the goal runs through a node that did not make the top-b cut at some step, that path is gone, and the search can fail even though a solution exists. On the subway this bites exactly at line transfers: the heuristic does not reward stepping onto the transfer line (it does not get you visibly closer in straight-line terms), so a width-1 beam greedily ignores the transfer and gets stuck. You need just enough width to keep the unglamorous-but-necessary transfer node alive alongside the greedily-preferred one.

5. Pattern matching: `?x`, `??x`, and the rewrite machinery

The algebra operators are rewrite rules, and a rewrite rule needs a pattern matcher. The chapter’s matcher, match, takes a pattern and an expression and returns a binding dictionary (which pattern variable equals which sub-expression) or :FAIL.

Two kinds of pattern variable matter:

?x – an element variable: matches exactly one item.
??x – a segment variable: matches a run of zero or more consecutive items. This is what lets a pattern like (* (?? A) x (?? B)) say “a product that contains an x somewhere, with arbitrary stuff before and after.”

Segment variables are powerful but they introduce choice: there can be many ways to split a list across several ?? variables. The book’s match is one-shot – it commits to the first consistent split it finds and never reconsiders. That is fast, but it can miss a globally consistent match that requires backtracking over an earlier ?? commitment. Exercise 7a is a beautiful, concrete instance of exactly this failure, and 7b/7c are about fixing it (return all matches; return them lazily).

This same match is, incidentally, the seed of natural-language understanding: Exercise 10 reconstructs Bobrow’s STUDENT by using segment-variable patterns to translate English sentence fragments into equations, then handing those equations to the algebra solver.

6. The three operator families

The algebra operators are not an arbitrary grab-bag; they fall into three families, distinguished by what they do to the unknown:

Isolation – strip an operator wrapped around the unknown to expose it. Applied at the top level of the equation. Example: log(arg, base) = rhs -> arg = base^rhs. This is the “undo the outermost operation on both sides” move.
Attraction – pull two separated occurrences of the unknown closer together so they can later be merged. Example: log(u,w) + log(v,w) -> log(u*v, w). The point is to reduce the number of places x appears.
Collection – merge already-adjacent occurrences into one. Example: x + x -> 2*x. After collection there is only one x to isolate.

A solve is, loosely, “attract until x appears once, collect it into a single term, then isolate.” Recognizing this shared structure is what powers Exercise 5: every operator is just a (Name, Before-pattern, After-pattern) triple plus a family label, so you can build all three with one generic constructor and even auto-generate them from a raw algebraic identity.

The examples, explained

The Boston subway: search that behaves

The subway example exists to show search working well and to make the strategy differences visible without algebraic distractions. A few lessons it is designed to deliver:

Why a heuristic helps. Uninformed search wanders; greedy/A* with straight-line distance heads roughly toward the destination and expands far fewer stations.
Why beam width matters. This is where Exercise 1 lives. Because the map has transfers, the example is the canonical demonstration that b = 1 (greedy) can fail while a slightly wider beam succeeds. The takeaway is not a magic number; it is the method: increase b until a route appears, and the smallest such b is your answer for that map.
Why cycles need handling. You can ride back and forth between two stations forever. The map is full of cycles, so a search that does not remember where it has been can loop. That motivates Exercise 3’s closed-set / dedup modification.

Symbolic algebra: search that has to be clever

The algebra example exists to show search under pressure: wide branching, a dishonest heuristic, and a structured operator set. Its lessons:

Operators as data. Each rule is a Before/After pattern pair. Solving is repeated “match a Before pattern, instantiate the matching After pattern.” This is the concrete payoff of the pattern matcher.
The heuristic is a guess, not a guarantee. algebra-distance makes the search go fast but, being inadmissible (Exercise 2), it can lead you to a longer-than-necessary derivation. The example is honest about the speed-vs-optimality trade.
Three families, one engine. The same driver handles isolation, attraction, and collection; only the operator definitions differ. Exercise 5 turns this observation into a tidy generic API.

STUDENT: natural language on top of algebra

The chapter’s most ambitious worked idea (and Exercise 10) is rebuilding STUDENT, the 1960s program that solved algebra word problems. The architecture is a two-stage pipeline:

Sentences -> equations via segment-variable rewrite rules: "twice" becomes 2 *, "X is Y" becomes X = Y, each distinct noun phrase (“Bill,” “Bill’s father”) becomes a variable, and coreference is just string identity of the normalized phrase.
Equations -> answer via the (multi-equation) algebra CPS.

The famous worked instance is the ages puzzle that ends with Bill = 8. Translate the three sentences, substitute, and the system falls out:

U = 2*F                  ; the uncle is twice the father's age
F + 2 = 3*(B + 2)        ; in two years the father is three times Bill's age
U + F + B = 92           ; their ages sum to 92

Substituting (1) into (3) gives 3*F + B = 92; from (2), F = 3*B + 4; substitute and 10*B = 80, so B = 8 (and F = 28, U = 56, which checks: 56 + 28 + 8 = 92). The lesson is that “understanding” the word problem is, in STUDENT, nothing deeper than careful pattern rewriting – which is also why STUDENT is brittle: a phrasing outside its template set simply fails to translate.

Exercises walk-through

All solutions are analysis-only (there is no solutions.py for this chapter, by design) and live in ../exercises/ch03/README.md. Highlights:

Ex 1 (minimum beam width). The answer is empirical and map-dependent, so the README gives the procedure (raise b from 1 upward; report the first width that returns a route; sanity-check the next one or two widths because tie-ordering can wobble) and the reasoning: b = 1 reliably fails at the line transfer, and the realistic minimum is small (about 2-3) – just wide enough to retain the transfer node. The defensible claim without the book’s data file loaded is “b=1 fails, minimum is small.”
Ex 2 (algebra-distance overestimates). A clean inadmissibility witness: x + x = 6. A naive feature count predicts about 3 steps (two occurrences of x plus an isolation), but x + x -> 2*x then “divide by 2” solves it in 2 steps. Since the heuristic exceeds the true cost on at least one state, it is not admissible, so A* with it is not guaranteed optimal. A stronger witness is any single identity that collapses many occurrences at once (e.g. factoring x^2 - x = 0 to x(x-1)=0).
Ex 3 (cycle detection / dedup). Add a closed hash set of canonical state keys; discard any successor already seen. Crucial subtlety: the canonical key. For algebra you must normalize (sort commutative operands, fold constants) so that syntactically different but equal equations collapse; for the subway the key is just the station. Cost: O(N) extra memory, O(1) amortized per-node time; the payoff is potentially exponential pruning of revisited subtrees. Verdict: clearly worth it on the cyclic subway map; it depends on algebra, where a near-tree rewrite space may pay the O(N) memory for little pruning (there, a cheaper “reject states already on the current path” guard at O(depth) space may be preferable).
Ex 4 (pluggable operators). Replace the hard-wired operator list with a registry that definition files append to at load time, and make setup-algebra-problem take the operator set as a parameter (defaulting to a “current set” variable). Adding operators becomes “drop in a file and load it”; switching regimes (e.g. a trig set) becomes one selector call. No existing operator code is ever edited because definitions only ever append.
Ex 5 (generic constructors + sugar). Because every operator is a (Name, Before, After) triple, write one generic make-operator and three thin wrappers (use-isolation-operator, use-attraction-operator, use-collection-operator) that differ only in the family label. 5b adds a defAlgebraOperator macro that dispatches on the family. 5c goes furthest: defAlgebraLaw takes a raw identity LHS = RHS, orients it toward a solved form, infers the occurs-in? guards from which variables appear where, classifies the family (exposes x -> isolation; reduces occurrences -> attraction; merges adjacent -> collection), and emits the operator(s). The README is candid that fully automatic classification is heuristic and may need a hint.
Ex 6 (A*). A* is “best-first keyed on f = g + h.” The README gives the graph-search version with a best-g table so a cheaper route to a state replaces a worse one, and reiterates the admissibility consequence (optimal on the subway, not on algebra).
Ex 7 (exhaustive / lazy match). 7a is the gem: it walks through exactly why the given call returns :FAIL – the segment variable A is used twice and must bind to the same run, but the one-shot matcher greedily commits A = (2) on the first product and never backtracks to discover the globally consistent A = (2 x). 7b rewrites match to return a list of all binding dictionaries (segment variables iterate over every split point; the element matcher returns a set the caller flat-maps over). 7c makes it lazy – return one match plus a thunk that resumes the search – so a backtracking rewriter pulls matches one at a time and stops early, at O(choice-stack-depth) space instead of O(number-of-matches).
Ex 8 (iterative deepening). 8a is iterative-deepening DFS: depth-limited DFS in a loop with a growing cutoff, carefully distinguishing “hit the bound” (:cutoff) from “exhausted the subtree” (:failure) so you know when the space is truly empty. Memory O(b*d), time O(b^d) with only a constant b/(b-1) overhead from re-expanding shallow levels. 8b is the empirical comparison (measure nodes expanded, peak frontier, solution length): ID-DFS wins on memory and gives shortest-step solutions like BFS, but informed search wins on raw expansions where a good heuristic exists. 8c is IDA*: same idea but bound on f = g + h instead of depth, raising the bound to the smallest f that exceeded it each round – A*’s informedness at O(b*d) memory, with the known weakness that near-distinct f-values force many tiny-increment re-traversals.
Ex 9 (lifting algebra’s limits). 9a makes the unknown configurable (thread it through as data, e.g. a *unknown* dynamic variable the guards read, instead of hard-coding x). 9b generalizes a state to a set of equations and makes the operators substitution moves – pick a variable, solve one equation for it (recursively, via the single-equation CPS), substitute into the rest, repeat. This is Gaussian elimination as search, with the substitution order as the choice points. 9c applies it to the diet/weight problem, deriving the equilibrium weight at dW = 0:
```
W_e = 7*C_fd / (7*M + N/150)   =   (1050 * C_fd) / (1050*M + N)
```
with the honest caveat that the OCR’d constants are ambiguous, so the shape is the load-bearing result.
Ex 10 (STUDENT). The English-to-equations-to-answer pipeline described in the examples section above, with the full Bill-is-8 derivation and the warning that STUDENT’s competence is entirely a function of how many phrase templates you encode.

Try it in this repo

Be honest with yourself here: Chapter 3’s CPS, the subway/beam-search code, and the algebra rewrite system are not part of this package (see ../BRIEFING.md – CPS is an explicit non-goal). So there is nothing to “run” that reproduces the book’s figures. The authoritative artifact for this chapter is the analysis in ../exercises/ch03/README.md, and there is no solutions.py.

That said, two threads of this chapter do have living counterparts in the repo, and it is worth seeing them, because they show where the chapter’s ideas reappear once truth maintenance enters the picture.

1. Search-as-state-space, in the dependency-directed search demo. The chapter’s “assign each choice, recurse, back out on failure” search is exactly the skeleton of ../examples/coloring_dds.py, which uses ../src/ltms/dds.py. The twist is that the TMS-backed version does something CPS does not: when a partial assignment fails, it learns a nogood so the same dead combination is never re-explored (dependency-directed backtracking, the smart cousin of the chronological backtracking implicit in Chapter 3’s DFS). Run it:

cd /path/to/ltms        # the repo root
. .venv/bin/activate
python examples/coloring_dds.py

You will see it find proper 3-colorings of a small graph. Read it as “the same generate-and-backtrack search of Chapter 3, but the failures teach the search something.”

2. Pattern matching and unification. Chapter 3’s match (with ?x and ??x) is a one-shot, segment-aware matcher. The package’s ../src/ltms/unify.py is the closely related unifier used throughout the later TMS/TRE machinery (?-prefixed variables, a bindings dict, a FAIL sentinel distinct from “no new bindings”). It does not have segment variables, but it is the same idea – bind pattern variables to make two terms agree – and reading it makes Exercise 7’s binding-dictionary discussion concrete. You can poke at it from a REPL:

python -c "from ltms.unify import unify; print(unify(('f', '?x', 'b'), ('f', 'a', '?y'), {}))"

3. Closest “solved with search” examples. If you want to feel search finding a model, the runnable knowledge-base files exercise the LTRE’s belief search; load one with ../examples/run_kb.py:

python examples/run_kb.py examples/kb/belief_revision.kb

None of this is CPS. But it is the honest nearest neighbor in this codebase, and it shows the through-line: Chapter 3’s “search a space of states with operators and a goal test” never goes away – by Chapter 8+ it just acquires a truth-maintenance system that remembers why states failed.

Takeaways

Problem solving = search. Pin down three things – state, operators, goal test – and you have a search problem; everything else is choosing what to expand next.
One driver, many strategies. Breadth-first, depth-first, uniform-cost, greedy, A*, beam, and the iterative-deepening variants are all the same loop with a different priority function and frontier discipline.
Heuristics buy speed; admissibility buys optimality. A non-overestimating h makes A* optimal. The subway heuristic is admissible (so A* is optimal there); algebra-distance is not (Exercise 2), so A* on algebra is fast but not provably shortest.
Beam search trades completeness for constant memory. Too small a beam and you discard the only path to the goal – exactly what happens at subway transfers with b = 1 (Exercise 1).
Cycle detection is cheap insurance on graphs. A canonical-key closed set turns a possibly-looping search into a terminating one; clearly worth it on the cyclic map, situational on the tree-like algebra space (Exercise 3).
Operators are data. The three algebra families (isolation, attraction, collection) are all (Name, Before, After) triples, which is why they can be built generically and even auto-generated from raw identities (Exercise 5).
Matching is where the choices hide. One-shot greedy matching is fast but incomplete for segment variables (Exercise 7a); real rewrite systems need all-matches or lazy matching (7b/7c).
Stack search on top of search. Systems of equations (Exercise 9b) and STUDENT (Exercise 10) both work by calling the single-problem solver recursively inside a larger search – a pattern you will see again and again.
In this repo, CPS is conceptual. The faithful artifact is ../exercises/ch03/README.md; the nearest runnable relatives are the dependency-directed search demo and the unifier, both of which carry Chapter 3’s search DNA into the truth-maintenance world.