Cache keys now include a schema version prefix so that world-schema
changes automatically invalidate stale cached extraction results.
Previously, two incompatible schema versions shared the same cache
key whenever URL/selectors/env matched, silently returning stale data.
WORLD_CACHE_SCHEMA_VERSION is now exported publicly so consumers can
increment it when making schema-incompatible changes to extraction.
658 tests pass.
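A minimal sketch of the version-prefixed key, for illustration only. The
constant's value, the CacheKeyInput shape, and buildCacheKey are
assumptions; only the WORLD_CACHE_SCHEMA_VERSION name comes from this
change.

```typescript
// Hypothetical value; the real exported constant may differ.
const WORLD_CACHE_SCHEMA_VERSION = 2;

// Assumed inputs that already took part in the key before this change.
interface CacheKeyInput {
  url: string;
  selectors: string[];
  env: Record<string, string>;
}

// Prefix the key with the schema version so that bumping the version
// makes every previously written entry unreachable.
function buildCacheKey(
  input: CacheKeyInput,
  schemaVersion: number = WORLD_CACHE_SCHEMA_VERSION,
): string {
  // Sort env entries so key order in the object does not change the key.
  const env = Object.keys(input.env)
    .sort()
    .map((k) => `${k}=${input.env[k]}`);
  const payload = JSON.stringify([input.url, input.selectors, env]);
  return `v${schemaVersion}:${payload}`;
}
```

With this shape, identical URL/selectors/env under two schema versions
produce two distinct keys, so the older entry is simply never read again.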
Imhotep-core: add predicate-specs.ts with 34 PredicateSpec entries as
the single source of truth for predicate metadata (name, arity,
aliases, requiredFacts, validOptions, diagnosticCode, relationCode,
decompose rules, category flags). Lookup helpers derive all
per-predicate information from the static table.
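
The table-plus-helpers pattern can be sketched roughly as follows. The
interface is reduced, the two entries and their values are invented for
illustration, and the helper signatures are assumptions; only the helper
and field names are taken from this change.

```typescript
// Reduced, hypothetical shape. The real PredicateSpec also carries
// validOptions, relationCode, decompose rules, and category flags.
interface PredicateSpec {
  name: string;
  arity: number;
  aliases: readonly string[];
  requiredFacts: readonly string[];
  diagnosticCode: string;
}

// Illustrative entries only, not the actual 34-entry table.
const PREDICATE_SPECS: readonly PredicateSpec[] = [
  {
    name: "aspectRatio",
    arity: 1, // assumed
    aliases: ["ratio"], // assumed alias
    requiredFacts: ["borderRect"], // assumed fact name
    diagnosticCode: "IMH_EXAMPLE_ASPECT_RATIO", // made-up code
  },
  {
    name: "between",
    arity: 3, // assumed
    aliases: [],
    requiredFacts: ["borderRect"],
    diagnosticCode: "IMH_EXAMPLE_BETWEEN", // made-up code
  },
];

// Index once by canonical name and by alias.
const SPEC_BY_NAME = new Map<string, PredicateSpec>();
for (const spec of PREDICATE_SPECS) {
  SPEC_BY_NAME.set(spec.name, spec);
  for (const alias of spec.aliases) SPEC_BY_NAME.set(alias, spec);
}

// Helpers derive everything from the static table; no runtime registration.
function getPredicateRequiredFacts(name: string): readonly string[] {
  return SPEC_BY_NAME.get(name)?.requiredFacts ?? [];
}

function getPredicateDiagnosticCode(name: string): string | undefined {
  return SPEC_BY_NAME.get(name)?.diagnosticCode;
}

function isUnaryPredicate(name: string): boolean {
  return SPEC_BY_NAME.get(name)?.arity === 1;
}
```

Because the table is static data, callers need no setup step before
looking up a predicate.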
extraction.ts (3 consumers converted):
- computeRequiredFacts: replace getRequiredFactsForPredicate (global
registry) with getPredicateRequiredFacts (static spec table).
Removes registerDefaultPredicates() dependency from fact planning.
- compileCanonicalClauseToFormula: replace 4 string-branch patterns
('between'/'separatedFrom'/'atLeast'/'aspectRatio'/'inStackingContext')
with spec-driven getPredicateDecomposition() and isUnaryPredicate().
Same behavior, zero string dispatch in predicate selection.
- mapFolDiagnostic: replace PREDICATE_TO_DIAGNOSTIC_CODE (13-entry
Record) with getPredicateDiagnosticCode() from spec table.
595 SDK + 57 hard E2E tests pass.
geometry-cache.ts: replace 5 empty catch blocks with console.warn
- statSync failure, rmSync failure (x2), readCachedWorld failure,
readCachedExtractionResult failure were all silently swallowed.
Now emit context-bearing warnings so stale/corrupt caches are visible.
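
One way to express the pattern, as a hedged sketch: tryCacheOp and the
"[geometry-cache]" prefix are hypothetical names, and the actual change
may simply inline console.warn in each catch block.

```typescript
// Run a fallible cache operation. On failure, warn with context and
// return the fallback instead of swallowing the error silently.
function tryCacheOp<T>(context: string, op: () => T, fallback: T): T {
  try {
    return op();
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    console.warn(`[geometry-cache] ${context} failed: ${reason}`);
    return fallback;
  }
}

// Example: a corrupt cache payload now leaves a trace in the logs.
// The path in the context string is made up for the example.
const parsed = tryCacheOp<unknown>(
  "readCachedWorld(/tmp/example-world.json)",
  () => JSON.parse("{not json"),
  null,
);
```

The behavior on failure is unchanged (the caller still gets the
fallback); the only difference is that the failure is now visible.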
predicates.ts: replace __boxIndex as any mutation with WeakMap
- getBorderRect used (world as any).__boxIndex to cache a subject-to-
box-index map on the world object. Replaced with module-level WeakMap
that auto-collects when the world is GC'd. Eliminates 2 as any casts.
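
A minimal sketch of the WeakMap approach. The Box and World shapes and
the getBoxIndex helper are assumptions made for the example; only
getBorderRect and the WeakMap idea come from this change.

```typescript
// Hypothetical minimal shapes; the real world and box types are richer.
interface Box {
  subjectId: number;
  x: number;
  y: number;
  width: number;
  height: number;
}

interface World {
  boxes: Box[];
}

// Module-level cache keyed by the world object itself. An entry becomes
// collectable once its world is no longer reachable, and the world
// object is never mutated.
const boxIndexCache = new WeakMap<World, Map<number, number>>();

function getBoxIndex(world: World): Map<number, number> {
  const cached = boxIndexCache.get(world);
  if (cached !== undefined) return cached;

  // Build the subject-to-box-index map once per world.
  const index = new Map<number, number>();
  world.boxes.forEach((box, i) => {
    index.set(box.subjectId, i);
  });
  boxIndexCache.set(world, index);
  return index;
}

function getBorderRect(world: World, subjectId: number): Box | undefined {
  const i = getBoxIndex(world).get(subjectId);
  return i === undefined ? undefined : world.boxes[i];
}
```

Since the cache lives outside the world object, no `as any` cast is
needed to attach it.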
extraction.ts: serialize materializeSemanticSelector + debug cleanup
- 3 Promise.all sites over page.evaluate changed to sequential for..of
to eliminate DOM modification race conditions.
- 2 .catch(()=>{}) cleanup blocks now use console.debug so failed
cleanup is traceable when debugging.
- resolveViewport catch now emits console.warn on zero-viewport fallback.
648 SDK + 57 E2E tests pass.
pipeline.ts: || undefined → ?? undefined (9 occurrences)
- || converts valid subject ID 0 to undefined because 0 is falsy in JS.
This broke clause witnesses and topology references for the first subject.
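
The difference between the two operators, shown on a hypothetical
subject ID value (the function names are made up for the example):

```typescript
// `||` falls through on any falsy value, including the valid ID 0.
function subjectWithOr(id: number | null): number | undefined {
  return id || undefined;
}

// `??` falls through only on null or undefined, so 0 survives.
function subjectWithNullish(id: number | null): number | undefined {
  return id ?? undefined;
}

console.log(subjectWithOr(0)); // undefined
console.log(subjectWithNullish(0)); // 0
console.log(subjectWithNullish(null)); // undefined
```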
domain-index.ts: remove .toLowerCase() on CSS selectors
- CSS selectors are case-sensitive (IDs, class names, attribute values).
Lowercasing on lookup but not on storage (selectorIndex) meant any
selector containing uppercase characters never matched, silently
returning an empty array.
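
A small reproduction of the mismatch, under the assumption that
selectorIndex is a Map from selector string to subject IDs. The function
names and the Map shape are hypothetical.

```typescript
// Assumed shape of the index: selector string -> matching subject IDs.
const selectorIndex = new Map<string, number[]>();

// Storage keeps the selector exactly as written.
function indexSelector(selector: string, subjectId: number): void {
  const ids = selectorIndex.get(selector) ?? [];
  ids.push(subjectId);
  selectorIndex.set(selector, ids);
}

// Old behavior: lowercased on lookup only, so "#MainNav" is looked up
// as "#mainnav" and misses the stored key.
function lookupLowercased(selector: string): number[] {
  return selectorIndex.get(selector.toLowerCase()) ?? [];
}

// New behavior: look up with the same string that was stored.
function lookupExact(selector: string): number[] {
  return selectorIndex.get(selector) ?? [];
}

indexSelector("#MainNav", 0);
console.log(lookupLowercased("#MainNav")); // []
console.log(lookupExact("#MainNav")); // [ 0 ]
```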
canonical.ts: add warning when visualBoxes falls back to layout boxes
- visualBoxes ?? boxes silently substituted layout coordinates for visual
space, producing incorrect results for transform-dependent assertions.
Now emits console.warn so silent data corruption is visible.
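
Roughly, the fallback now looks like the sketch below. The type shapes,
the function name, and the warning text are assumptions; only the
visualBoxes/boxes fallback itself comes from this change.

```typescript
// Hypothetical minimal shapes for the example.
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface CanonicalInput {
  boxes: Rect[]; // layout space
  visualBoxes?: Rect[] | null; // visual space (after transforms), may be absent
}

// Same result as `visualBoxes ?? boxes`, but the substitution is announced.
function resolveVisualBoxes(input: CanonicalInput): Rect[] {
  if (input.visualBoxes != null) return input.visualBoxes;
  console.warn(
    "[canonical] visualBoxes missing; falling back to layout boxes. " +
      "Transform-dependent assertions may be evaluated in the wrong space.",
  );
  return input.boxes;
}
```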
extraction.ts: serialize materializeSemanticSelector calls (3 sites)
- Changed Promise.all over page.evaluate() to sequential for..of. While
Playwright serializes CDP calls internally, concurrent DOM-modifying
evaluate() calls create undefined execution order. Sequential resolution
eliminates theoretical race conditions for semantic selector injection.
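
The ordering difference can be illustrated without Playwright. In this
sketch, materialize is a stand-in for a DOM-modifying page.evaluate()
call, and all names are hypothetical.

```typescript
// Stand-in for an awaited page.evaluate(); records when work starts and ends.
async function materialize(selector: string, log: string[]): Promise<string> {
  log.push(`start ${selector}`);
  await Promise.resolve(); // yield once, as an awaited evaluate() would
  log.push(`end ${selector}`);
  return `resolved:${selector}`;
}

// Before: every call starts before any of them has finished.
async function materializeConcurrently(
  selectors: string[],
  log: string[],
): Promise<string[]> {
  return Promise.all(selectors.map((s) => materialize(s, log)));
}

// After: each call finishes before the next one starts.
async function materializeSequentially(
  selectors: string[],
  log: string[],
): Promise<string[]> {
  const results: string[] = [];
  for (const s of selectors) {
    results.push(await materialize(s, log));
  }
  return results;
}
```

In the sequential version the log always reads start/end, start/end, so
no two DOM modifications can interleave. The cost is that the calls no
longer overlap.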
engine.ts: include stack trace in evaluator exception diagnostics
- Catch-all converted ALL exceptions (including TypeError from programming
bugs) to IMH_EVALUATOR_EXCEPTION with just err.message. Now includes
stack trace and logs to console.warn for visibility. Distinguishes
TypeError (programming bug) from other evaluation errors.
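
A hedged sketch of the new catch-all behavior. The diagnostic shape, the
isProgrammingBug field, and the function name are assumptions; only the
IMH_EVALUATOR_EXCEPTION code and the TypeError distinction come from this
change.

```typescript
// Hypothetical diagnostic shape for the example.
interface EvaluatorDiagnostic {
  code: "IMH_EVALUATOR_EXCEPTION";
  message: string;
  stack: string | undefined;
  isProgrammingBug: boolean; // assumed field name
}

function toEvaluatorDiagnostic(err: unknown): EvaluatorDiagnostic {
  // Normalize non-Error throws so message and stack are always available.
  const error = err instanceof Error ? err : new Error(String(err));
  const diagnostic: EvaluatorDiagnostic = {
    code: "IMH_EVALUATOR_EXCEPTION",
    message: error.message,
    stack: error.stack,
    // A TypeError usually points at a bug in the evaluator itself
    // rather than at the input being evaluated.
    isProgrammingBug: error instanceof TypeError,
  };
  console.warn(`[engine] evaluator exception: ${error.stack ?? error.message}`);
  return diagnostic;
}
```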
648 SDK tests + 57 E2E hard tests pass, zero regressions.