REFACTOR

# Imhotep Refactor Plan

Date: 2026-05-22

Purpose: make Imhotep trustworthy for high-assurance frontend testing. This plan prioritizes hard structural work before polish, documentation, or convenience APIs. The goal is not only to make tests pass, but to remove classes of silent wrong answers, type-system bypasses, extraction drift, and compiler gaps that would undermine the product promise.

## North Star

Imhotep should be able to answer layout, topology, semantic, dimensional, and finite first-order logic assertions against real browser pages with boringly reproducible results.

For high-assurance use, the unacceptable failures are:

- A predicate silently returns the wrong answer because extraction data was missing, stale, reordered, or coerced to a default.
- The compiler accepts syntax that lowers to the wrong predicate, loses options, drops clauses, or changes boolean meaning.
- The runtime hides extraction, cleanup, cache, or evaluator errors behind fallback behavior.
- The public API promises expressive FOL but the internal type system relies on ad-hoc runtime shape mutations.
- Tests pass because fixture coverage is broad enough for happy paths, but not because the core invariants are enforced.

## Refactor Principles

- Remove silent wrong-answer paths before improving ergonomics.
- Prefer typed protocols over property smuggling, duck typing, or `as any` bridges.
- Prefer explicit failure diagnostics over implicit defaults when required data is missing.
- Keep public API shape stable only where it is already credible; do not preserve accidental internal APIs.
- Refactor in small, independently verified batches with build, unit tests, and hard E2E tests after each batch.
- Add characterization tests before changing behavior that is ambiguous but currently depended on by existing tests.

## Required Gates

Every hard refactor batch must pass:

- TypeScript build for touched packages.
- SDK/unit tests for affected packages.
- Hard E2E fixture: 57/57 assertions pass.
- No new `as any` in production source unless documented inline with a boundary justification.
- No new empty catches, swallowed promise rejections, or fallback-to-zero behavior for required extracted data.

Before calling this plan complete:

- Production `as any` count is either zero or every remaining instance is at an explicit external boundary with a comment and test.
- Topology, geometry, style, environment, and cache schemas are versioned and typed end-to-end.
- Dense DSL, fluent API, canonical IR, logic AST, and solver predicates share one typed predicate signature table.
- CI runs the hard fixture and cache-staleness tests, not just unit tests.

## Phase 1: Typed Assertion And Predicate Protocol

This is the first hard refactor because most later reliability work depends on knowing what the compiler and solver are actually carrying.

### Problem

The codebase still contains many production `as any` casts, especially in:

- `packages/imhotep-playwright/src/extraction.ts`
- `packages/imhotep-dsl/src/grammar.ts`
- `packages/imhotep-playwright/src/public.ts`
- `packages/imhotep-dsl/src/compiler.ts`
- `packages/imhotep-solver/src/predicates.ts`
- `packages/imhotep-dsl/src/fluent.ts`
- `packages/imhotep-playwright/src/fol-compiler.ts`

The most dangerous pattern is property smuggling: `_compoundParts`, `_compoundOperator`, `__imhotepSubject`, `options`, injected world fields, and diagnostic metadata attached to objects whose types do not declare those fields.

### Refactor

- Define a single `PredicateSpec` registry type that describes name, arity, accepted term kinds, option schema, result metric schema, required facts, and unary/binary forms.
- Generate or derive compiler validation, extraction fact requirements, and solver invocation from that spec instead of repeating string-dispatch branches.
- Replace smuggled `PredicateCall.options` patterns with a typed `PredicateOptionsByName` map.
- Replace compound relation metadata (`_compoundParts`, `_compoundOperator`, `_isCompound`) with an explicit AST node type.
- Replace `__imhotepSubject` runtime attachment with a typed fluent wrapper object that carries subject identity as a real field.
- Replace metrics `as any` casts with discriminated metric result types per predicate or predicate family.

### Acceptance Criteria

- Adding a new predicate requires one spec entry and compiler errors guide every required implementation point.
- `between`, `inStackingContext`, `clippedBy`, alias predicates, topology predicates, and size predicates all compile through the same typed route.
- Compound `.and` / `.or` behavior is represented in AST and canonical IR, not hidden on object instances.
- No production code reads `_compoundParts`, `_compoundOperator`, `__imhotepSubject`, or undeclared `options` through `as any`.
- Unit tests cover every predicate spec for arity, option validation, required facts, and unary/binary forms.

## Phase 2: Canonical World Schema Consolidation

This is the second hard refactor because extraction correctness determines whether solver answers mean anything.

### Problem

There are multiple world shapes and side channels:

- Solver `GeometryWorld` is augmented with CDP-specific `styles`, `env`, and topology subject ID fields.
- Cached JSON is coerced back to typed objects with `as unknown as`.
- Topology arrays require subject-order remapping, and missing entries can still collapse to `0` in some lookup paths.
- Visual, layout, topology, DOM, style, and environment data are not governed by one schema version.

### Refactor

- Define a versioned `ExtractedWorld` schema as the only persisted and inter-package world format.
- Split the schema into typed sections: `subjects`, `geometry`, `visualGeometry`, `dom`, `styles`, `topology`, `environment`, `provenance`, and `diagnostics`.
- Make `GeometryWorld` either an alias of the consolidated schema or a deliberate solver projection produced by a typed adapter.
- Remove ad-hoc mutations on world objects. Use constructor/adaptor functions that validate required sections.
- Model topology subject identity explicitly with typed mappings rather than transient `_topologySubjectIds` fields.
- Replace lookup helpers that default missing numeric data to `0` with `MissingFact` diagnostics or predicate-level indeterminate failures.

### Acceptance Criteria

- Cached extraction data validates against a schema version before use.
- A missing geometry/topology/style value cannot be mistaken for coordinate `0`, root subject `0`, or no clipping.
- The solver receives either a fully valid world projection or a structured extraction diagnostic.
- No production code mutates extracted world objects with undeclared fields.
- Tests include deliberately corrupted and partial cached worlds to prove failures are explicit.

## Phase 3: Required-Facts Planning Before Extraction

This is the third hard refactor because Imhotep should extract exactly the facts needed by the assertion set, and fail when required facts are unavailable.

### Problem

Required fact inference is currently spread across AST walkers, predicate string checks, compiler branches, and extraction code. This creates gaps when a predicate is added or a unary/binary form changes.

### Refactor

- Introduce a `FactPlan` produced from typed predicate specs and the fully compiled logic AST.
- Represent required facts as a closed set: geometry, visual geometry, DOM ancestry, clipping, stacking context, containing block, scroll/overflow, computed style fields, viewport, environment axes, text metrics, semantic selectors.
- Make extraction consume `FactPlan`, not predicate names.
- Make cache keys include the fact plan, schema version, selectors, environment, viewport, browser identity, and extraction mode.
- Record which facts were requested, fulfilled, skipped, approximated, or failed.

### Acceptance Criteria

- Adding a predicate without declaring required facts is a compile-time error or failing registry test.
- Cache hits are invalidated when environment, viewport, extraction mode, selector plan, or required facts change.
- Predicate evaluators can ask whether a fact is present and why it is missing.
- Compatibility diagnostics distinguish unsupported browser facts from predicate failures.

## Phase 4: Compiler Pipeline With Explicit Intermediate Representations

This is the fourth hard refactor because full expressiveness only matters if every syntax path lowers to the same semantics.

### Problem

Dense DSL, fluent API, canonical IR, FOL AST, and solver formula forms still have duplicated lowering logic and string-dispatch branches. Some parser paths warn and continue when they should produce structured diagnostics. Some assertion types are identified by duck typing rather than discriminated unions.

### Refactor

- Establish a strict pipeline:
  1. Parse or build user assertion.
  2. Validate assertion AST.
  3. Normalize to canonical assertion IR.
  4. Compile to typed logic AST.
  5. Plan facts.
  6. Extract world.
  7. Evaluate.
  8. Produce diagnostics and proof witnesses.
- Make each stage accept and return discriminated unions, not open object shapes.
- Replace dense parser silent drops with structured parser diagnostics that fail the assertion unless the caller explicitly opts into warnings.
- Replace `normalizeRuntime` style duck dispatch with explicit constructors or adapters for each public input kind.
- Consolidate relation lowering so aliases, bounds, directions, axes, spaces, tolerance, and topology references are handled in one typed table.

### Acceptance Criteria

- Dense and fluent syntax for the same assertion produce equivalent canonical IR and equivalent logic AST.
- Parser recovery never silently removes clauses from the evaluated program.
- Every switch over assertion kind is exhaustive with a `never` check.
- Golden tests cover dense DSL, fluent API, canonical IR, and FOL AST equivalence for all built-in predicates.

## Phase 5: Runtime Isolation And No Global Mutable Correctness State

This is the fifth hard refactor because high-assurance suites run many tests in parallel and must not leak state across tests.

### Problem

Several module-scope mutable values affect behavior:

- `defaultPredicatesRegistered`
- global predicate and clause registries
- global/project DSL config
- default execution context
- extraction stats hook guards
- compatibility warning flags
- trace/proof/snapshot/plan counters

Some are harmless counters, but registry and config singletons are correctness-affecting.

### Refactor

- Introduce an explicit `ImhotepRuntime` object that owns registries, config, extraction cache, diagnostics config, and counters.
- Make public helpers create a default runtime, but allow tests and consumers to pass isolated runtimes.
- Deprecate mutable global registry/config APIs in favor of runtime-scoped APIs.
- Move compatibility warning state into runtime diagnostics state.
- Make counters deterministic within a runtime and resettable for tests.

### Acceptance Criteria

- Two tests can run in the same process with different tolerances, registries, cache settings, and diagnostics options without leaking state.
- Predicate registration order and once-only guards cannot change solver behavior across tests.
- Global APIs are compatibility wrappers only and are documented as such.
- Parallel test stress cases prove runtime isolation.

## Phase 6: Extraction Lifecycle, Cleanup, And Error Propagation

This is the sixth hard refactor because browser mutation and CDP lifecycle issues are rare but catastrophic when they contaminate later tests.

### Problem

Extraction injects attributes, globals, and scripts into the page. Cleanup failures are currently logged with low visibility. Some browser-side lifecycle behavior relies on timing or post-hoc cleanup rather than a guaranteed transaction model.

### Refactor

- Treat extraction as a transaction: prepare, inject, collect, validate, cleanup, report.
- Use `try/finally` around every browser/page mutation.
- Track injected attributes and globals in a cleanup manifest.
- Promote cleanup failures to diagnostics attached to the check result, not just console logs.
- Replace lifecycle `setTimeout` cleanup with deterministic teardown where possible.
- Replace custom semaphore where feasible with an explicit queue that supports cancellation and `AbortSignal`.

### Acceptance Criteria

- A failed extraction leaves no Imhotep attributes/globals on the page, or reports exactly what could not be cleaned.
- Cancellation and timeout tests prove no dangling CDP sessions, injected globals, or pending promises remain.
- Cleanup failures are visible to the test runner.
- Repeated extraction on the same page does not accumulate attributes, globals, or cache namespace collisions.

## Phase 7: Predicate Evaluation Semantics And Missing-Fact Discipline

This is the seventh hard refactor because the solver must distinguish false from unknown/unavailable.

### Problem

Some predicate helpers and proof witnesses still default missing coordinates, gaps, topology IDs, or metrics to `0`, `Infinity`, or `NaN`. Some defaults are valid option defaults, but required extracted facts should not behave like user-supplied defaults.

### Refactor

- Separate user option defaults from extracted fact defaults.
- Introduce `FactValue<T> = present | missing | unsupported | approximated` or equivalent.
- Make predicate evaluators return structured outcomes: pass, fail, error, unsupported, indeterminate.
- Keep the public assertion result simple, but preserve detailed outcome reason in diagnostics.
- Audit all `?? 0`, `?? Infinity`, and `?? NaN` in solver and proof code and classify them as option default, diagnostic fallback, or missing-fact bug.

### Acceptance Criteria

- Missing required facts cannot produce a passing predicate.
- Diagnostics explain whether an assertion failed because the relation was false or because facts were unavailable.
- Proof witnesses do not fabricate coordinates or sizes when metrics are missing.
- Regression tests cover partial worlds for every topology and geometry predicate family.

## Phase 8: Diagnostics, Proofs, And Evidence Model

This is the eighth hard refactor because high-assurance users need actionable evidence, not just boolean failures.

### Problem

Diagnostics are already a product strength, but metric propagation and witness construction can still drift from predicate results. Some witness fields default to `0`, making missing data look like real evidence.

### Refactor

- Define typed evidence objects emitted directly by predicate evaluators.
- Make proof/witness formatting consume evidence rather than reconstructing metrics from loose maps.
- Add evidence schemas per predicate family: spatial relation, size/range, topology, overflow/clipping, stacking context, semantic/domain resolution.
- Include extraction provenance: source path, mode, cache hit/miss, schema version, browser, viewport, environment, fulfilled facts.

### Acceptance Criteria

- Every failed assertion has either predicate evidence or an explicit reason evidence is unavailable.
- Witness generation has no fallback-to-zero for required metrics.
- Snapshot/golden diagnostics tests cover every built-in predicate family.

## Phase 9: Test Harness Hardening

This is hard infrastructure work, not polish, because the refactor plan is only credible if the tests can catch regressions.

### Problem

The hard fixture exists and passes, but it is not yet a first-class CI gate. Some tests rely on generated or gitignored artifacts. Cache-staleness, partial-world, environment-change, and cleanup-failure cases need targeted coverage.

### Refactor

- Move hard E2E tests into a source-controlled package path and generate dist artifacts as part of test setup.
- Add CI jobs for hard E2E, cache-staleness, extraction cleanup, and predicate-spec conformance.
- Add equivalence tests that compare dense DSL and fluent API outputs for all predicate families.
- Add mutation-style tests for missing facts, reordered topology, stale cache schema, unknown predicate options, invalid parser tokens, and concurrent extraction.
- Add minimal performance budgets for extraction and solver evaluation so refactors do not hide pathological slowdowns.

### Acceptance Criteria

- CI fails if hard E2E assertions fall below 57/57 or if topology tests use stale cache.
- CI fails on newly introduced production `as any`, empty catches, or required-fact fallback-to-zero patterns unless allowlisted.
- Full local verification command is documented and reproducible from a clean checkout.

## Phase 10: Public API Governance After Internals Are Sound

This phase comes after the hard correctness work. It should not distract from the structural refactors above.

### Refactor

- Define explicit public, experimental, and internal exports for each package.
- Remove or hide internals that expose unstable world, predicate, or compiler shapes.
- Finalize the consumer CLI path and generated config format.
- Align README examples with the actual runtime behavior and supported expressiveness.
- Publish migration notes for deprecated global registries/config APIs.

### Acceptance Criteria

- Public docs only show APIs backed by hard E2E or golden tests.
- Internal refactors can continue without breaking documented consumer contracts.
- Package exports, README examples, CLI scaffolding, and type declarations agree.

## Suggested Execution Order

1. Predicate spec and typed assertion protocol.
2. Compound relation AST and fluent wrapper cleanup.
3. Versioned extracted world schema and cache validation.
4. Fact planning derived from predicate specs.
5. Compiler pipeline normalization and dense/fluent equivalence tests.
6. Runtime isolation and registry/config scoping.
7. Extraction transaction cleanup and visible cleanup diagnostics.
8. Missing-fact discipline in solver and proofs.
9. First-class hard E2E and structural CI gates.
10. Public API/docs/CLI cleanup.

## First Concrete Milestone

Milestone 1 should be deliberately narrow but foundational:

- Add `PredicateSpec` definitions for all built-in predicates.
- Move arity, option schema, alias behavior, unary/binary forms, and required facts into the spec table.
- Add tests that every built-in predicate has a spec and every spec has an evaluator.
- Convert one representative family end-to-end first: `between`, `atLeast`, `atMost`, and dimensional aliases.
- Convert one topology family next: `clippedBy`, `inStackingContext`, `escapeClippingChainOf`.
- Remove the corresponding string-dispatch branches only after equivalence tests prove old and new outputs match.

This milestone creates the spine for the rest of the refactor without requiring a risky rewrite of the whole compiler or solver at once.

## Things To Avoid

- Do not start with README, CLI, package export, or naming polish.
- Do not rewrite the solver before the predicate protocol is typed.
- Do not add new syntax until dense/fluent/canonical/FOL equivalence is tested.
- Do not preserve internal object shapes solely for backward compatibility unless they are documented public API.
- Do not replace `as any` mechanically if the real issue is an untyped protocol; fix the protocol.
- Do not let cache hits bypass schema, environment, viewport, extraction-mode, or fact-plan validation.

## Definition Of Trustworthy Enough

Imhotep is ready to claim high-assurance frontend testing when:

- Syntax expressiveness and runtime semantics are aligned across dense DSL and fluent API.
- Extraction either provides validated facts or emits visible diagnostics.
- Solver predicates do not pass on missing or stale facts.
- Diagnostics carry real evidence, not reconstructed fallback metrics.
- Runtime state is isolated per test context.
- CI continuously exercises unit, compiler equivalence, extraction, topology, cache, cleanup, and hard browser fixtures.