# IMHOTEP LLM Skill Guide

Purpose: teach an LLM/operator how to use Imhotep effectively, quickly, and safely in real repos.

Audience: coding agents and engineers adding relational GUI assertions to Playwright suites.

Reading mode: action-first. Prefer concrete patterns over theory.

## 1) Mental Model

Imhotep is for relational UI behavior, not snapshot aesthetics.

Use it to answer:

1. Is element A left/right/above/below B with meaningful bounds?
2. Does this relationship hold across states and environments?
3. Do semantic selectors and layout assertions converge on stable behavior?
4. Do failures explain "why" with machine-usable diagnostics?

Do not use Imhotep as a thin wrapper around raw pixel assertions.

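To make "left of B with meaningful bounds" concrete, the sketch below evaluates that relation on plain bounding boxes. It is not Imhotep's implementation: the `Box` shape, the `leftOfWithGap` helper, and the gap definition (distance from A's right edge to B's left edge) are assumptions chosen for illustration.

```ts
// Hypothetical box shape; Imhotep's real geometry model may differ.
interface Box {
  x: number // left edge in CSS pixels
  y: number // top edge in CSS pixels
  width: number
  height: number
}

interface RelationResult {
  passed: boolean
  gap: number // measured distance from A's right edge to B's left edge
}

// Assumed semantics: A is left of B when the horizontal gap between
// A's right edge and B's left edge is at least `minGap`.
function leftOfWithGap(a: Box, b: Box, minGap = 0): RelationResult {
  const gap = b.x - (a.x + a.width)
  return { passed: gap >= minGap, gap }
}

const primary: Box = { x: 16, y: 20, width: 100, height: 40 }
const secondary: Box = { x: 128, y: 20, width: 100, height: 40 }

// The measured gap is 12px in both calls; only the bound changes.
const ok = leftOfWithGap(primary, secondary, 8)
const tooTight = leftOfWithGap(primary, secondary, 16)
```

Returning the measured gap alongside the boolean is the point of the sketch: a relation with a bound can report how far off it was, which a bare pixel comparison cannot.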
## 2) Operator Workflow (Always Start Here)

When entering a codebase:

1. Add `imhotep-playwright` and Playwright dependencies.
2. Build one passing relation and one intentional failing relation.
3. Confirm `checkAll()` returns structured diagnostics.
4. Add semantic subject assertions (`getByRole`, `getByText`) where possible.
5. Add transform-space checks (`visual` vs `layout`) for transformed UIs.
6. Add state and viewport checks only after baseline relation checks are stable.

Ship small, truthful checks first; expand breadth iteratively.

## 3) Fast Start Template

```ts
import { test, expect } from '@playwright/test'
import { imhotep } from 'imhotep-playwright'

test('layout contract', async ({ page }) => {
  await page.goto('http://localhost:3000')
  // Pass cacheDir: null to avoid geometry cache serialization crash (known issue)
  const ui = await imhotep(page, { deterministic: true, seed: 42, cacheDir: null })

  ui.expect('[data-testid="primary"]').to.be.leftOf('[data-testid="secondary"]', {
    minGap: 8,
    space: 'visual',
  })

  const result = await ui.checkAll()
  expect(result.passed).toBe(true)
})
```

## 4) API Surface You Should Use

Primary public methods:

1. `imhotep(page, options?)`
2. `ui.expect(subject)`
3. `ui.spec(source)`
4. `ui.checkAll()`
5. `ui.extract(subject)`
6. `ui.materializeState(selector, state)`
7. `ui.applyEnvironment(env)`
8. `ui.getByRole/getByText/getByLabelText/getByTestId/locator`

Property-run entry points:

1. `imhotepComponent(component, options)`
2. `imhotepStory(storyId, options)`
3. `imhotepFixture(fixturePath, options)`

## 5) Authoring Quality Ladder

### Bronze (minimum acceptable)

1. At least one relation assertion per critical screen
2. One intentional failing test proving diagnostics are actionable

### Silver (production worthy)

1. Semantic selectors for user-visible elements
2. State-aware checks (`hover`, `focus`, `active`, `disabled`, `checked`, `expanded`, `selected`, `pressed`, `visited`) for critical controls
3. Responsive checks for mobile + desktop viewports

### Gold (high confidence)

1. Space-aware checks where transforms are present
2. Property runs over meaningful prop/input domains
3. Deterministic replay workflows documented in test harness
4. CI gate on both workspace tests and fixture E2E

If tests only assert status booleans and ignore diagnostics, quality is incomplete.

## 6) Relation Checklist by Use Case

### Control Alignment

1. `leftOf/rightOf` with `minGap`
2. `alignedWith` or `centeredWithin` with tolerance where needed

### Containment and Layering

1. `inside` / `contains` for container contracts
2. `overlaps` only when overlap is intentional
3. `inStackingContext` options for layering constraints
4. `separatedFrom` for non-overlap with gap constraints

### Size Contracts

1. `atLeast('44px').wide` for accessible tap-target width
2. `atMost` and `between` for constrained layouts

### Motion/Transform UI

1. assert in the default `visual` space first
2. add explicit `space: 'layout'` where pre-transform semantics matter

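The difference between `visual` and `layout` space can be illustrated with plain arithmetic. The sketch below is not Imhotep code: `SimpleTransform` and `toVisualBox` are hypothetical, and the transform is deliberately simplified to a uniform scale about the box's top-left corner followed by a translation (real CSS transforms default to a centered origin and may rotate or skew).

```ts
interface Box {
  x: number
  y: number
  width: number
  height: number
}

// Simplified 2D transform: scale about the top-left corner, then translate.
interface SimpleTransform {
  translateX: number
  translateY: number
  scale: number
}

// Layout box: where the element sits before transforms are applied.
// Visual box: where the user actually sees it after the transform.
function toVisualBox(layout: Box, t: SimpleTransform): Box {
  return {
    x: layout.x + t.translateX,
    y: layout.y + t.translateY,
    width: layout.width * t.scale,
    height: layout.height * t.scale,
  }
}

function rightEdge(b: Box): number {
  return b.x + b.width
}

const badgeLayout: Box = { x: 0, y: 0, width: 50, height: 20 }
const labelLayout: Box = { x: 60, y: 0, width: 80, height: 20 }

// The badge is nudged right and enlarged by a transform; the label is untransformed.
const badgeVisual = toVisualBox(badgeLayout, { translateX: 10, translateY: 0, scale: 1.5 })

// Same pair of elements, two different answers depending on the space.
const leftOfInLayout = rightEdge(badgeLayout) <= labelLayout.x
const leftOfInVisual = rightEdge(badgeVisual) <= labelLayout.x
```

Here the layout-space check holds (50 ≤ 60) while the visual-space check does not (85 > 60), which is why a transformed UI needs the space stated explicitly whenever the two could disagree.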
## 7) Semantic Subject Guidance

Prefer semantic sources when they are stable and user-facing:

1. `getByRole(role, { name })`
2. `getByLabelText(label)`
3. `getByText(text)`
4. `getByTestId(id)` as a pragmatic fallback
5. `locator(css)` or raw CSS only when semantics are unavailable

Use mixed semantic + CSS references when migrating legacy suites incrementally.

## 8) Dense String Contracts

Use `ui.spec(...)` when contract sets are easier to maintain as grouped text.

Rules:

1. keep dense specs short and scoped per scenario
2. keep fluent and dense checks semantically equivalent in critical paths
3. use diagnostics from `checkAll()` to tighten ambiguous clauses

### Basic Relation Syntax

Selectors must be single-quoted strings. Relations are keywords, not method calls.

```js
// Spatial relations with gap constraints
ui.spec(`
  '[data-testid="a"]' leftOf '[data-testid="b"]' gap 8px
  '[data-testid="card"]' inside 'viewport'
  '[data-testid="header"]' above '[data-testid="content"]' gap 16px
  '[data-testid="sidebar"]' leftOf '[data-testid="main"]' gap 8px..24px
`)
```

Supported relations: `leftOf`, `rightOf`, `above`, `below`, `alignedWith`, `centeredWithin`, `inside`, `overlaps`, `contains`, `separatedFrom`.

**Fluent API only:**
- Aliases: `beside`, `nextTo`, `adjacent`, `touching`, `near`, `under`, `within`
- `space: 'layout'` / `space: 'visual'` option on relations
- `.and` / `.or` chaining on fluent relations
- State materialization: `disabled`, `checked`, `expanded`, `selected`, `pressed`, `visited`

**Dense DSL only:**
- FOL quantifiers (`forall`, `exists`) with boolean connectives (`and`, `or`, `not`, `implies`)
- `width` / `height` / `size` predicate calls with comparison operators (`>=`, `<=`, `==`, `!=`)
- Frame attachments: `in viewport:`, `in containingBlock(...):`

**Both fluent and dense DSL:**
- `contains`, `separatedFrom`
- `between` size assertions

### Gap Options

```js
ui.spec(`
  // Exact minimum gap
  '.button' leftOf '.label' gap 8px

  // Gap range (between min and max)
  '.button' leftOf '.label' gap 8px..16px
`)
```

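As a reading aid for the two gap forms, here is a minimal sketch of how such a clause could be interpreted. It is not Imhotep's parser: `parseGap`, `satisfiesGap`, and the inclusive-range interpretation are assumptions made for illustration.

```ts
// Hypothetical representation of a parsed gap clause.
interface GapConstraint {
  min: number
  max: number | null // null means no upper bound
}

// Accepts "8px" (minimum only) or "8px..16px" (range, assumed inclusive).
// Returns null for anything else, including unit-less values such as "8".
function parseGap(text: string): GapConstraint | null {
  const match = /^(\d+(?:\.\d+)?)px(?:\.\.(\d+(?:\.\d+)?)px)?$/.exec(text.trim())
  if (match === null) return null
  const min = Number(match[1])
  const max = match[2] === undefined ? null : Number(match[2])
  if (max !== null && max < min) return null
  return { min, max }
}

// Check a measured gap (in px) against a parsed constraint.
function satisfiesGap(measured: number, gap: GapConstraint): boolean {
  return measured >= gap.min && (gap.max === null || measured <= gap.max)
}

const minimumOnly = parseGap('8px')
const ranged = parseGap('8px..16px')
const missingUnit = parseGap('8') // rejected: the unit is required
```

A range is useful when elements drifting too far apart is as much a regression as them colliding; a minimum alone only guards against the latter.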
### Frame Attachments

Use `in frameName:` with indented assertions to scope relations to a specific frame.

```js
ui.spec(`
  in viewport:
    '[data-testid="a"]' leftOf '[data-testid="b"]'
    '[data-testid="modal"]' centeredWithin 'viewport'

  in containingBlock('[data-testid="parent"]'):
    '.child' inside '.parent'
`)
```

### Compound Assertions

Chain relations with `and` and `or` in dense DSL.

```js
ui.spec(`
  '.header' above '.content' and leftOf '.sidebar'
  '.modal' centeredWithin 'viewport' or inside '.container'
`)
```

### Size Assertions

```js
ui.spec(`
  // Minimum size
  '[data-testid="btn"]' atLeast 44px wide
  '[data-testid="btn"]' atLeast 44px tall

  // Maximum size
  '[data-testid="img"]' atMost 200px wide

  // Size range
  '[data-testid="img"]' between 100px and 200px wide

  // Predicate-style size checks with comparison operators
  forall $btn in buttons('.primary'):
    width($btn) >= 44
    height($btn) >= 44
`)
```

### Quantifiers

Apply `all`, `any`, or `none` to assert over multiple elements.

```js
ui.spec(`
  all '.item' above '.footer' gap 16px
  none '.error' overlaps '.success'
`)
```

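The sketch below spells out one plausible reading of the three quantifiers on plain data. The semantics shown (`all` = every subject, `any` = at least one, `none` = no subject) and the `quantify` helper are assumptions for illustration, not Imhotep's evaluator.

```ts
type Quantifier = 'all' | 'any' | 'none'

// Assumed semantics:
//   all  -> every subject satisfies the relation
//   any  -> at least one subject satisfies it
//   none -> no subject satisfies it
// `all` and `none` are vacuously true for an empty list, which is one reason
// a selector with zero matches deserves its own diagnostic.
function quantify<T>(q: Quantifier, subjects: T[], relation: (s: T) => boolean): boolean {
  switch (q) {
    case 'all':
      return subjects.every(relation)
    case 'any':
      return subjects.some(relation)
    case 'none':
      return !subjects.some(relation)
  }
}

interface Box {
  y: number
  height: number
}

const footer: Box = { y: 500, height: 60 }
const items: Box[] = [
  { y: 0, height: 100 },
  { y: 120, height: 100 },
  { y: 460, height: 100 }, // bottom edge at 560 runs past the footer's top
]

// "above ... gap 16px": the item's bottom edge is at least 16px above the footer's top.
const aboveFooter = (item: Box): boolean => footer.y - (item.y + item.height) >= 16

const allAbove = quantify('all', items, aboveFooter) // third item breaks it
const anyAbove = quantify('any', items, aboveFooter)
const noneAbove = quantify('none', items, aboveFooter)
const emptyAll = quantify('all', [], aboveFooter) // vacuously true
```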
### First-Order Logic (FOL)

Use `forall` and `exists` with boolean connectives for complex relational contracts.

```js
ui.spec(`
  // All buttons are at least 44px wide
  forall $btn in buttons('.primary'):
    width($btn) >= 44

  // Existence: at least one card contains a title
  exists $card in cards('.card'):
    descendants($card, '.title')

  // Boolean connectives: and, or, not, implies
  forall $a in elements('.a'):
    forall $b in elements('.b'):
      leftOf($a, $b) and above($a, $b)

  forall $modal in elements('.modal'):
    not overlaps($modal, '.backdrop')

  forall $x in elements('.x'):
    forall $y in elements('.y'):
      inside($x, '.container') implies leftOf($x, $y)
`)
```

Supported connectives: `and`, `or`, `not`, `implies`.

Supported domain constructors: `elements(selector)`, `buttons(selector)`, `cards(selector)`.

Nested quantifiers for multi-variable formulas: use nested `forall` blocks instead of comma-separated variables.

Supported predicates in FOL: `leftOf`, `rightOf`, `above`, `below`, `inside`, `overlaps`, `alignedWith`, `centeredWithin`, `contains`, `separatedFrom`, `width`, `height`, `size`.

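To show how a nested `forall` with `implies` evaluates, here is a small sketch over plain boxes. The predicates and `holdsForAllPairs` are hypothetical stand-ins with simplified geometry; they illustrate the logic only and are not how Imhotep evaluates a spec.

```ts
interface Box {
  x: number
  y: number
  width: number
  height: number
}

// Material implication: false only when the premise holds and the conclusion does not.
const implies = (p: boolean, q: boolean): boolean => !p || q

const leftOf = (a: Box, b: Box): boolean => a.x + a.width <= b.x
const inside = (inner: Box, outer: Box): boolean =>
  inner.x >= outer.x &&
  inner.y >= outer.y &&
  inner.x + inner.width <= outer.x + outer.width &&
  inner.y + inner.height <= outer.y + outer.height

// forall x in xs: forall y in ys: inside(x, container) implies leftOf(x, y)
function holdsForAllPairs(xs: Box[], ys: Box[], container: Box): boolean {
  return xs.every((x) => ys.every((y) => implies(inside(x, container), leftOf(x, y))))
}

const container: Box = { x: 0, y: 0, width: 200, height: 100 }
const xs: Box[] = [
  { x: 10, y: 10, width: 50, height: 20 }, // inside the container
  { x: 400, y: 10, width: 50, height: 20 }, // outside, so the implication holds trivially
]
const ys: Box[] = [{ x: 100, y: 10, width: 50, height: 20 }]

const contractHolds = holdsForAllPairs(xs, ys, container)

// With y moved to the far left, the premise holds for the first x but leftOf fails.
const contractBroken = holdsForAllPairs(xs, [{ x: 0, y: 10, width: 5, height: 20 }], container)
```

Each nested `forall` maps to one nested `every`, which is why multi-variable formulas are written as nested blocks: the formula is checked for every pair, and one counterexample pair is enough to fail it.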
### Common Mistakes and Corrections

- **Bare selectors without quotes**: Selectors must be single-quoted strings.
  ```js
  // ❌ Wrong: bare selector
  [data-testid="x"] leftOf [data-testid="y"]

  // ✅ Correct: quoted selector
  '[data-testid="x"]' leftOf '[data-testid="y"]'
  ```

- **Using `is` keyword**: The parser does not accept `is` or `have` as connecting words.
  ```js
  // ❌ Wrong: 'is' is not a valid keyword
  'a' is leftOf 'b'

  // ✅ Correct: direct relation keyword
  'a' leftOf 'b'
  ```

- **Missing gap unit**: Gap values require a unit.
  ```js
  // ❌ Wrong: missing unit
  'a' leftOf 'b' gap 8

  // ✅ Correct: gap with unit
  'a' leftOf 'b' gap 8px
  ```

- **Wrong quote style**: Use single quotes for selectors; double quotes inside are fine.
  ```js
  // ❌ Wrong: double-quoted selector
  "[data-testid='x']" leftOf "[data-testid='y']"

  // ✅ Correct: single-quoted selector with double quotes inside
  '[data-testid="x"]' leftOf '[data-testid="y"]'
  ```

## 9) Diagnostics You Should Watch

Key codes and meanings:

1. `IMH_SELECTOR_ZERO_MATCHES`: selector resolved to no elements
2. `IMH_EXTRACT_PROTOCOL_ERROR`: extraction path failed
3. relation-specific failures (example: `IMH_RELATION_LEFT_OF_FAILED`)

Operator rule:

Do not silence diagnostics; treat them as contract feedback.

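One way to treat diagnostics as contract feedback is to assert on their codes directly. The sketch below assumes diagnostics arrive as an array of objects with a string `code` field; that shape, the helper names, and the sample messages are assumptions, and only the code values themselves come from this guide.

```ts
// Assumed diagnostic shape; verify against the real payload before relying on it.
interface Diagnostic {
  code: string
  message?: string
}

// Count diagnostics per code so a test can assert on specific failure classes.
function countByCode(diagnostics: Diagnostic[]): Map<string, number> {
  const counts = new Map<string, number>()
  for (const d of diagnostics) {
    counts.set(d.code, (counts.get(d.code) ?? 0) + 1)
  }
  return counts
}

// Codes that should never be tolerated, even if the overall result passes.
const BLOCKING_CODES = ['IMH_SELECTOR_ZERO_MATCHES', 'IMH_EXTRACT_PROTOCOL_ERROR']

function blockingCodes(diagnostics: Diagnostic[]): string[] {
  const counts = countByCode(diagnostics)
  return BLOCKING_CODES.filter((code) => counts.has(code))
}

// Made-up sample payload standing in for real diagnostics.
const sample: Diagnostic[] = [
  { code: 'IMH_RELATION_LEFT_OF_FAILED', message: 'sample message' },
  { code: 'IMH_SELECTOR_ZERO_MATCHES', message: 'sample message' },
  { code: 'IMH_RELATION_LEFT_OF_FAILED' },
]

const counts = countByCode(sample)
const blockers = blockingCodes(sample)
```

In a Playwright test, an assertion such as `expect(blockers).toEqual([])` next to the pass/fail check keeps zero-match selectors from hiding behind a passing run.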
## 10) Determinism and Replay

When reproducibility matters:

1. initialize with deterministic options (`deterministic: true`, `seed`)
2. preserve failing diagnostics payloads in CI artifacts
3. rerun with the same seed before changing assertions

If a failure is flaky, first classify whether it is:

1. extraction instability,
2. real layout nondeterminism,
3. a threshold too strict for CI hardware.

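The three-way classification can be approximated from repeated runs with the same seed. The sketch below is a heuristic, not a library feature: `RunOutcome`, `classifyFlake`, the `margin` field, and the `epsilon` cutoff are all hypothetical.

```ts
// Hypothetical per-run summary for one clause, recorded across reruns with the same seed.
interface RunOutcome {
  status: 'pass' | 'fail' | 'error'
  margin?: number // measured value minus threshold, when a measurement exists
}

type FlakeClass =
  | 'stable'
  | 'extraction-instability'
  | 'layout-nondeterminism'
  | 'threshold-too-strict'

// Heuristic triage:
//   any error                                   -> extraction instability
//   identical status on every run               -> stable (not a flake)
//   mixed pass/fail, all margins within epsilon -> threshold too strict
//   mixed pass/fail otherwise                   -> real layout nondeterminism
function classifyFlake(runs: RunOutcome[], epsilon = 1): FlakeClass {
  if (runs.some((r) => r.status === 'error')) return 'extraction-instability'
  const statuses = new Set(runs.map((r) => r.status))
  if (statuses.size <= 1) return 'stable'
  const nearThreshold = runs.every((r) => r.margin !== undefined && Math.abs(r.margin) <= epsilon)
  return nearThreshold ? 'threshold-too-strict' : 'layout-nondeterminism'
}

const hoveringAtThreshold = classifyFlake([
  { status: 'pass', margin: 0.4 },
  { status: 'fail', margin: -0.3 },
  { status: 'pass', margin: 0.2 },
])

const jumpingLayout = classifyFlake([
  { status: 'pass', margin: 12 },
  { status: 'fail', margin: -30 },
])

const brokenExtraction = classifyFlake([{ status: 'pass', margin: 5 }, { status: 'error' }])

const consistentFailure = classifyFlake([
  { status: 'fail', margin: -10 },
  { status: 'fail', margin: -10 },
])
```

A consistently failing clause comes back as `stable`: it reproduces, so it should be handled as a regression or a wrong contract instead of a flake.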
## 11) CI Integration Pattern

Recommended gates:

1. `npm run build`
2. `npm test --workspaces`
3. `npx playwright test`

For local-path package evaluation in temp projects:

1. install all required local packages, not just `imhotep-playwright`
2. if symlink duplication appears, set `NODE_OPTIONS=--preserve-symlinks`

## 12) Anti-Patterns (Do Not Do This)

1. Writing only `expect(result.passed).toBe(true)` with no diagnostic assertions.
2. Converting every relation to hardcoded pixel math.
3. Ignoring transform-space semantics in transformed UIs.
4. Treating selector zero matches as acceptable in passing tests.
5. Suppressing fail-closed errors without root-cause triage.

## 13) Debugging Playbook

When a relation unexpectedly fails:

1. inspect `result.diagnostics` first
2. inspect `result.clauseResults[*].status/truth/metrics`
3. run `ui.extract(subject)` for both sides to inspect geometry/origin
4. verify state and viewport preconditions are applied
5. for transformed elements, compare `space: 'visual'` vs `'layout'`

When the status is `error` instead of `fail`:

1. suspect extraction or an unsupported path
2. verify selector materialization and runtime context
3. fail closed and do not coerce to pass

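The `fail` versus `error` split above can be captured in a small triage helper. This is a sketch under assumptions: the `ClauseResult` shape is inferred from the field names `status`, `truth`, and `metrics`, the `'pass'` status value is assumed, and `triage` is a hypothetical helper.

```ts
// Assumed shape, based on the fields named above; real payloads may carry more.
interface ClauseResult {
  status: 'pass' | 'fail' | 'error'
  truth?: boolean
  metrics?: Record<string, number>
}

interface Triage {
  failed: ClauseResult[] // contract violations: inspect metrics and geometry
  errored: ClauseResult[] // evaluation problems: inspect selectors and extraction
  verdict: 'pass' | 'fail' | 'error'
}

// Fail closed: any error outranks fail, and any fail outranks pass.
function triage(clauses: ClauseResult[]): Triage {
  const failed = clauses.filter((c) => c.status === 'fail')
  const errored = clauses.filter((c) => c.status === 'error')
  const verdict = errored.length > 0 ? 'error' : failed.length > 0 ? 'fail' : 'pass'
  return { failed, errored, verdict }
}

// Made-up sample standing in for real clause results.
const sampleClauses: ClauseResult[] = [
  { status: 'pass', truth: true, metrics: { gap: 12 } },
  { status: 'fail', truth: false, metrics: { gap: 4 } },
  { status: 'error' },
]

const report = triage(sampleClauses)
const withoutErrors = triage(sampleClauses.slice(0, 2))
```

Keeping `errored` separate from `failed` matters because the two need different fixes: a failed clause points at the layout or the contract, while an errored clause points at the test's own plumbing.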
## 14) Property Run Guidance

Use property runs for invariant classes, not one-off screenshots.

Examples:

1. minimum tap target sizes across prop combinations
2. spacing constraints across variant inputs
3. containment/alignment under generated data

For sampled runs:

1. store seed and failing case metadata
2. shrink only with oracle-preserving checks

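Storing the seed only helps if sampling is a pure function of it. The sketch below shows that idea with a small seeded generator (mulberry32) and a hypothetical `ButtonProps` domain. It does not use Imhotep's property-run entry points, and every name in it is an assumption for illustration.

```ts
// Small seeded PRNG (mulberry32) so a sampled run can be replayed from its seed.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0
  return () => {
    a = (a + 0x6d2b79f5) >>> 0
    let t = a
    t = Math.imul(t ^ (t >>> 15), t | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

// Hypothetical prop domain for a button component.
interface ButtonProps {
  size: 'sm' | 'md' | 'lg'
  labelLength: number
  disabled: boolean
}

// Generate `count` prop combinations that depend only on `seed`.
function sampleProps(seed: number, count: number): ButtonProps[] {
  const next = mulberry32(seed)
  const sizes: ButtonProps['size'][] = ['sm', 'md', 'lg']
  const cases: ButtonProps[] = []
  for (let i = 0; i < count; i++) {
    cases.push({
      size: sizes[Math.floor(next() * sizes.length)] ?? 'md',
      labelLength: 1 + Math.floor(next() * 40),
      disabled: next() < 0.5,
    })
  }
  return cases
}

interface FailureRecord {
  seed: number
  index: number
  props: ButtonProps
}

// Returns the metadata needed to replay the first failing case, or null if all pass.
function firstFailure(
  seed: number,
  count: number,
  check: (p: ButtonProps) => boolean,
): FailureRecord | null {
  const cases = sampleProps(seed, count)
  const index = cases.findIndex((p) => !check(p))
  const props = cases[index]
  return props === undefined ? null : { seed, index, props }
}

const runA = sampleProps(42, 10)
const runB = sampleProps(42, 10) // same seed, same cases
```

A `FailureRecord` is small enough to attach to a CI artifact, and rerunning `sampleProps` with the recorded seed regenerates the failing case at the recorded index.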
## 15) Contract Evolution Strategy

When tightening contracts in existing suites:

1. start with smoke relation checks per page
2. add semantic subjects gradually
3. introduce state assertions where user behavior depends on state
4. introduce responsive and transform-space assertions next
5. move shared checks into helper modules only after semantics stabilize

## 16) Documentation Pointers

1. `README.md` for usage and quickstart
2. `SKILLS.md` for authoring patterns and DSL syntax
3. `BUILD.md` for build/test/e2e commands
4. `CHANGELOG.md` for release notes and known limitations
5. `SECURITY.md` for trust boundaries

## 17) Final Rule for LLM Operators

Imhotep is valuable only when it encodes user-visible layout truths.

Ask for every critical view:

1. Which spatial relationships must always hold?
2. Which relationships change with state or viewport?
3. Which semantic subjects best represent user intent?
4. What diagnostic evidence will prove regressions quickly?

Write those assertions first. Keep them deterministic. Fail closed.