fix: harden engine, enrich failure diagnostics, close adoption gaps

- P0: CLI verify now honors test budget with seeded multi-sample
- P0: Observe sampling enforced via Math.random() gate in hook-validator
- P1: Remove misleading undici-mock-agent isolation option
- P1: Qualify reuses shared discoverRouteDetails() with warnings
- P1: Chaos/scenario config exposed via preset schema
- P1: README/docs limitations updated to current state
- P2: Nested response annotations prefer 2xx deterministically
- P2: --changed documented as heuristic in verify.md
- Add observe sink tests (sampling 0/1, sink failure non-interference)
- Add verify runs regression tests (scale, determinism, variants)
- Add configured-scenario qualify test (independent of OAuth fixture)
- Add coverageBreakdown to qualify artifacts (per-gate route coverage)
- Add production-style observe example with real sink in docs/observe.md
- Add nightly/staging vs PR gating guidance to docs/qualify.md
- Enrich VerifyFailure with formula-aware diagnostics: status:201 => 'HTTP 200', body field checks => actual values
- Remove stale observe CLI activation message
- Document outbound mocks as process-global in getting-started.md
- Refresh APOPHIS_ADOPTION_AUDIT.md with current state

903 tests pass, build clean, typecheck clean.
2026-05-21 20:39:36 -07:00
parent 55b0262799
commit d0523fcc2d
128 changed files with 4004 additions and 3631 deletions
@@ -2,7 +2,7 @@

 Run scenario, stateful, and chaos checks against non-production Fastify services.

-Qualify extends the invariant-driven approach from [Invariant-Driven Automated Testing](https://arxiv.org/abs/2602.23922) (Malhado Ribeiro, 2021) with multi-step protocol flows, stateful sequences, and controlled fault injection.
+Qualify extends invariant-driven testing with multi-step protocol flows, stateful sequences, and controlled fault injection.

 ## What Qualify Does

@@ -15,9 +15,51 @@ Qualify extends the invariant-driven approach from [Invariant-Driven Automated T

 ## When to Use It

- **Nightly CI**: Scenario and stateful checks for critical flows
- **Staging**: Protocol flow validation before production
- **Specialist teams**: Auth, billing, workflow systems
+Qualify is heavier than verify. Use it where the depth is worth the runtime cost:
+
+| Workflow | Recommended | Why |
+|---|---|---|
+| **Pull request** | No — use `verify` | `verify` is fast (<5s for typical services) and catches behavioral regressions per-route. Qualify adds multi-minute scenario/stateful/chaos runs that are too slow for PR feedback loops. |
+| **Nightly** | Yes | Full scenario, stateful, and chaos execution against staging. Catch protocol-level regressions that single-route verification cannot see. |
+| **Pre-release** | Yes | Run qualify against the exact artifact that will be promoted to production. Treat a passing qualify run as a release gate for critical flows. |
+| **Specialist workflows** | Yes | Auth flows, billing sequences, idempotency guarantees, and pagination consistency need multi-step qualification that verify cannot express. |
+| **Chaos engineering** | Nightly or ad-hoc | Chaos injection increases latency. Run it in dedicated CI slots, not on every commit. |
+
+### Quick workflow setup
+
+```javascript
+// apophis.config.js — two profiles for different cadences
+export default {
+  mode: 'qualify',
+  profiles: {
+    'nightly': {
+      name: 'nightly',
+      mode: 'qualify',
+      preset: 'deep',
+      features: ['scenario', 'stateful', 'chaos'],
+      routes: [],
+    },
+    'pre-release': {
+      name: 'pre-release',
+      mode: 'qualify',
+      preset: 'deep',
+      features: ['scenario', 'stateful'],
+      routes: [],
+    },
+  },
+  presets: {
+    deep: { timeout: 15000, chaos: false },
+  },
+}
+```
+
+- Run nightly: `apophis qualify --profile nightly`
+- Run pre-release: `apophis qualify --profile pre-release --format json-summary`
+
+For pull requests, use verify instead:
+```bash
+apophis verify --profile ci
+```

 ## Scenario Examples

@@ -246,7 +288,205 @@ export default {

 ## Gate Execution Counts

-Human output shows per-gate execution counts (scenario, stateful, chaos, adversity) so you can verify which gates actually ran.
+Human output shows per-gate execution counts (scenario, stateful, chaos) so you can verify which gates actually ran.
+
+## Custom Scenarios (config-defined)
+
+Define arbitrary multi-step scenarios directly in your `apophis.config.js` without writing code:
+
+```javascript
+// apophis.config.js
+export default {
+  mode: 'qualify',
+  scenarios: [
+    {
+      name: 'idempotency-check',
+      steps: [
+        {
+          name: 'create-order',
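+          request: {
+            method: 'POST',
+            url: '/orders',
+            // Same key as the duplicate request below, so the server can recognize the retry.
+            headers: { 'x-idempotency-key': 'dup-001' },
+            body: { product: 'widget', quantity: 3 },
+          },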
+          expect: ['status:201', 'response_body(this).id != null'],
+          capture: { orderId: 'response_body(this).id' },
+        },
+        {
+          name: 'duplicate-create',
+          request: {
+            method: 'POST',
+            url: '/orders',
+            headers: { 'x-idempotency-key': 'dup-001' },
+            body: { product: 'widget', quantity: 3 },
+          },
+          expect: ['status:200', 'response_body(this).id == "$create-order.orderId"'],
+        },
+      ],
+    },
+    {
+      name: 'pagination-flow',
+      steps: [
+        {
+          name: 'list-page-1',
+          request: { method: 'GET', url: '/items?page=1&limit=5' },
+          expect: ['status:200', 'response_body(this).items != null'],
+          capture: { firstPageCount: 'response_body(this).items.length' },
+        },
+        {
+          name: 'list-page-2',
+          request: { method: 'GET', url: '/items?page=2&limit=5' },
+          expect: ['status:200'],
+        },
+      ],
+    },
+  ],
+  profiles: {
+    'nightly': {
+      name: 'nightly',
+      mode: 'qualify',
+      preset: 'deep',
+      routes: ['POST /orders', 'GET /orders', 'GET /items'],
+    },
+  },
+  presets: {
+    deep: { name: 'deep', timeout: 15000, chaos: true },
+  },
+  environments: {
+    local: { name: 'local', allowQualify: true, allowChaos: true },
+  },
+};
+```
+
+Scenario step fields:
+
+| Field | Required | Description |
+|---|---|---|
+| `name` | yes | Human-readable step label |
+| `request.method` | yes | HTTP method (GET, POST, PUT, DELETE, PATCH) |
+| `request.url` | yes | URL path (e.g. `/orders`, `/items?page=1`) |
+| `request.body` | no | JSON request body |
+| `request.headers` | no | Custom headers (e.g. `x-idempotency-key`) |
+| `expect` | yes | APOSTL formulas; each must evaluate to a truthy value for the step to pass |
+| `capture` | no | Map of `{ key: "apostl_formula" }` — captured values are substituted via `$stepName.key` in later steps |
+
+Captured values are interpolated in subsequent step URLs, bodies, and headers using `$stepName.key` syntax.
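+
+As a sketch of URL interpolation, a hypothetical third step for `idempotency-check` could read the captured order back. It assumes the service exposes `GET /orders/:id` (that route is not part of the config above and would also need to be allowed by the profile's `routes` filter) and that `$create-order.orderId` expands inside a path segment the same way it does inside a formula string:
+
+```javascript
+// Hypothetical follow-up step, appended to the `steps` array of `idempotency-check`.
+{
+  name: 'fetch-order',
+  request: {
+    method: 'GET',
+    // Assumption: the captured id is substituted into the path before the request is sent.
+    url: '/orders/$create-order.orderId',
+  },
+  expect: ['status:200', 'response_body(this).id == "$create-order.orderId"'],
+},
+```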
+
+## Chaos Configuration
+
+Fine-tune chaos behavior via preset fields:
+
+```javascript
+presets: {
+  'chaos-lab': {
+    name: 'chaos-lab',
+    timeout: 10000,
+    chaos: true,
+    chaosStrategy: 'sample',   // 'one' | 'all' | 'sample' | 'routes'
+    chaosSampleSize: 5,        // routes to sample when strategy = 'sample'
+    chaosSampleRoutes: [       // explicit routes when strategy = 'routes'
+      'GET /api/users',
+      'POST /api/orders',
+    ],
+  },
+}
+```
+
+| Field | Default | Description |
+|---|---|---|
+| `chaosStrategy` | `'one'` | Route selection strategy |
+| `chaosSampleSize` | `3` | Routes to sample (strategy `sample`) |
+| `chaosSampleRoutes` | — | Explicit route list (strategy `routes`) |
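+
+For comparison, a preset that pins chaos to an explicit route list might look like the sketch below. The preset name `chaos-pinned` is hypothetical, and leaving out `chaosSampleSize` assumes that field is only read when the strategy is `sample`:
+
+```javascript
+presets: {
+  'chaos-pinned': {
+    name: 'chaos-pinned',      // hypothetical preset name, for illustration only
+    timeout: 10000,
+    chaos: true,
+    chaosStrategy: 'routes',   // use the explicit list below instead of sampling
+    chaosSampleRoutes: [
+      'POST /api/orders',
+    ],
+  },
+}
+```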
+
+## Artifact Interpretation
+
+Each qualify run produces an artifact JSON document. Key sections:
+
+### executionSummary
+
+```json
+{
+  "executionSummary": {
+    "totalPlanned": 15,
+    "totalExecuted": 12,
+    "totalPassed": 10,
+    "totalFailed": 2,
+    "scenariosRun": 3,
+    "statefulTestsRun": 5,
+    "chaosRunsRun": 4,
+    "chaosRoutesPlanned": 2,
+    "chaosRoutesExecuted": 2,
+    "totalSteps": 12
+  }
+}
+```
+
+Compare `totalExecuted` with `totalPlanned` to see how many planned checks actually ran; the gap comes from disabled gates, route filtering, and chaos route selection. A non-zero `totalPlanned` with zero `totalExecuted` means all gates were disabled or no routes matched.
+
+### executedRoutes / skippedRoutes
+
+```json
+{
+  "executedRoutes": ["POST /orders", "GET /orders/:id", "GET /items"],
+  "skippedRoutes": [
+    { "route": "DELETE /items/:id", "reason": "No scenario covers this route" },
+    { "route": "GET /health", "reason": "Not selected by chaos strategy: one" }
+  ]
+}
+```
+
+`executedRoutes` lists every route that had at least one scenario step, stateful command, or chaos injection. `skippedRoutes` explains why every other discovered route was excluded.
+
+### profileGates
+
+```json
+{
+  "profileGates": {
+    "scenario": true,
+    "stateful": true,
+    "chaos": false
+  }
+}
+```
+
+Shows which gates were active. Combine with `executionSummary` per-gate counts to verify each active gate produced results.
+
+### stepTraces
+
+Each entry records an individual step execution:
+
+```json
+{
+  "stepTraces": [
+    {
+      "step": 0,
+      "name": "create-order",
+      "route": "POST /orders",
+      "durationMs": 12,
+      "status": "passed"
+    }
+  ]
+}
+```
+
+Filter by `status` to isolate failures. Look at `durationMs` for performance regressions.
+
+### failures
+
+```json
+{
+  "failures": [
+    {
+      "route": "POST /orders",
+      "contract": "status:201",
+      "category": "runtime",
+      "replayCommand": "apophis replay --artifact reports/apophis/qualify-2026-05-21T...json"
+    }
+  ]
+}
+```
+
+`replayCommand` is a copy-pasteable command that replays the failing run from the stored artifact, using the same seed, for triage.

 ## Zero-Execution Guardrail