fix: harden engine, enrich failure diagnostics, close adoption gaps

- P0: CLI verify now honors  test budget with seeded multi-sample
- P0: Observe sampling enforced via Math.random() gate in hook-validator
- P1: Remove misleading undici-mock-agent isolation option
- P1: Qualify reuses shared discoverRouteDetails() with warnings
- P1: Chaos/scenario config exposed via preset schema
- P1: README/docs limitations updated to current state
- P2: Nested response annotations prefer 2xx deterministically
- P2: --changed documented as heuristic in verify.md

- Add observe sink tests (sampling 0/1, sink failure non-interference)
- Add verify runs regression tests (scale, determinism, variants)
- Add configured-scenario qualify test (independent of OAuth fixture)
- Add coverageBreakdown to qualify artifacts (per-gate route coverage)
- Add production-style observe example with real sink in docs/observe.md
- Add nightly/staging vs PR gating guidance to docs/qualify.md

- Enrich VerifyFailure with formula-aware diagnostics:
  a failed status:201 check reports the actual 'HTTP 200',
  body field checks report the actual values
- Remove stale observe CLI activation message
- Document outbound mocks as process-global in getting-started.md
- Refresh APOPHIS_ADOPTION_AUDIT.md with current state

903 tests pass, build clean, typecheck clean.
John Dvorak
2026-05-21 20:39:36 -07:00
parent 55b0262799
commit d0523fcc2d
128 changed files with 4004 additions and 3631 deletions
+245 -5
@@ -2,7 +2,7 @@
Run scenario, stateful, and chaos checks against non-production Fastify services.
Qualify extends invariant-driven testing with multi-step protocol flows, stateful sequences, and controlled fault injection.
## What Qualify Does
@@ -15,9 +15,51 @@ Qualify extends the invariant-driven approach from [Invariant-Driven Automated T
## When to Use It
- **Nightly CI**: Scenario and stateful checks for critical flows
- **Staging**: Protocol flow validation before production
- **Specialist teams**: Auth, billing, workflow systems
Qualify is heavier than verify. Use it where the depth is worth the runtime cost:
| Workflow | Recommended | Why |
|---|---|---|
| **Pull request** | No — use `verify` | `verify` is fast (<5s for typical services) and catches behavioral regressions per-route. Qualify adds multi-minute scenario/stateful/chaos runs that are too slow for PR feedback loops. |
| **Nightly** | Yes | Full scenario, stateful, and chaos execution against staging. Catch protocol-level regressions that single-route verification cannot see. |
| **Pre-release** | Yes | Run qualify against the exact artifact that will be promoted to production. Treat a passing qualify run as a release gate for critical flows. |
| **Specialist workflows** | Yes | Auth flows, billing sequences, idempotency guarantees, and pagination consistency need multi-step qualification that verify cannot express. |
| **Chaos engineering** | Nightly or ad-hoc | Chaos injection increases latency. Run it in dedicated CI slots, not on every commit. |
### Quick workflow setup
```javascript
// apophis.config.js: two profiles for different cadences
export default {
  mode: 'qualify',
  profiles: {
    'nightly': {
      name: 'nightly',
      mode: 'qualify',
      preset: 'deep-chaos',
      features: ['scenario', 'stateful', 'chaos'],
      routes: [],
    },
    'pre-release': {
      name: 'pre-release',
      mode: 'qualify',
      preset: 'deep',
      features: ['scenario', 'stateful'],
      routes: [],
    },
  },
  presets: {
    // Chaos is enabled only in the preset used by the nightly profile.
    deep: { name: 'deep', timeout: 15000, chaos: false },
    'deep-chaos': { name: 'deep-chaos', timeout: 15000, chaos: true },
  },
}
```
Run nightly: `apophis qualify --profile nightly`
Run pre-release: `apophis qualify --profile pre-release --format json-summary`
For pull requests, use verify instead:
```bash
apophis verify --profile ci
```
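If your CI system tells the job which trigger started it, a small lookup table can route each cadence from the table above to the matching command, so a PR pipeline never picks up the multi-minute qualify run by accident. This is a sketch only: the `commandFor` helper and the trigger names `pull_request`, `schedule`, and `release` are hypothetical and not part of APOPHIS. The commands are the ones shown in this section.

```javascript
// Hypothetical CI helper: maps a trigger name to the APOPHIS command for that cadence.
// Trigger names are assumptions; rename them to match your CI provider.
const COMMANDS = {
  pull_request: ['apophis', 'verify', '--profile', 'ci'],
  schedule: ['apophis', 'qualify', '--profile', 'nightly'],
  release: ['apophis', 'qualify', '--profile', 'pre-release', '--format', 'json-summary'],
};

function commandFor(trigger) {
  const argv = COMMANDS[trigger];
  if (!argv) {
    // Fail loudly instead of silently skipping verification.
    throw new Error(`No APOPHIS command configured for trigger: ${trigger}`);
  }
  return argv.join(' ');
}

console.log(commandFor('schedule')); // apophis qualify --profile nightly
```

Throwing on an unknown trigger is a deliberate choice: a new pipeline type surfaces as a CI error rather than running no checks at all.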
## Scenario Examples
@@ -246,7 +288,205 @@ export default {
## Gate Execution Counts
Human output shows per-gate execution counts (scenario, stateful, chaos) so you can verify which gates actually ran.
## Custom Scenarios (config-defined)
Define arbitrary multi-step scenarios directly in your `apophis.config.js` without writing code:
```javascript
// apophis.config.js
export default {
  mode: 'qualify',
  scenarios: [
    {
      name: 'idempotency-check',
      steps: [
        {
          name: 'create-order',
          request: {
            method: 'POST',
            url: '/orders',
            // The original request must carry the same key that the replay reuses.
            headers: { 'x-idempotency-key': 'dup-001' },
            body: { product: 'widget', quantity: 3 },
          },
          expect: ['status:201', 'response_body(this).id != null'],
          capture: { orderId: 'response_body(this).id' },
        },
        {
          name: 'duplicate-create',
          request: {
            method: 'POST',
            url: '/orders',
            headers: { 'x-idempotency-key': 'dup-001' },
            body: { product: 'widget', quantity: 3 },
          },
          expect: ['status:200', 'response_body(this).id == "$create-order.orderId"'],
        },
      ],
    },
    {
      name: 'pagination-flow',
      steps: [
        {
          name: 'list-page-1',
          request: { method: 'GET', url: '/items?page=1&limit=5' },
          expect: ['status:200', 'response_body(this).items != null'],
          capture: { firstPageCount: 'response_body(this).items.length' },
        },
        {
          name: 'list-page-2',
          request: { method: 'GET', url: '/items?page=2&limit=5' },
          expect: ['status:200'],
        },
      ],
    },
  ],
  profiles: {
    'nightly': {
      name: 'nightly',
      mode: 'qualify',
      preset: 'deep',
      routes: ['POST /orders', 'GET /orders', 'GET /items'],
    },
  },
  presets: {
    deep: { name: 'deep', timeout: 15000, chaos: true },
  },
  environments: {
    local: { name: 'local', allowQualify: true, allowChaos: true },
  },
};
```
Scenario step fields:
| Field | Required | Description |
|---|---|---|
| `name` | yes | Human-readable step label |
| `request.method` | yes | HTTP method (GET, POST, PUT, DELETE, PATCH) |
| `request.url` | yes | URL path (e.g. `/orders`, `/items?page=1`) |
| `request.body` | no | JSON request body |
| `request.headers` | no | Custom headers (e.g. `x-idempotency-key`) |
| `expect` | yes | APOSTL formulas that must all evaluate to truthy for the step to pass |
| `capture` | no | Map of `{ key: "apostl_formula" }` evaluated against this step's response; later steps reference each value as `$stepName.key` |
Captured values are interpolated into subsequent steps' URLs, bodies, headers, and `expect` formulas using `$stepName.key` syntax (for example, `$create-order.orderId`).
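To make the substitution rule concrete, here is a minimal sketch of how `$stepName.key` tokens could be resolved recursively through a step's URL, headers, body, and `expect` strings. It is not APOPHIS source code: the `interpolate` function, the token pattern (step names of letters, digits, `_` or `-`; keys of letters, digits or `_`), and the choice to throw on an unknown capture are all assumptions made for illustration.

```javascript
// Hypothetical sketch of `$stepName.key` interpolation; not APOPHIS source code.
// Assumed token shape: step names use [A-Za-z0-9_-], keys use [A-Za-z0-9_].
const TOKEN = /\$([A-Za-z0-9_-]+)\.([A-Za-z0-9_]+)/g;

function interpolate(value, captures) {
  if (typeof value === 'string') {
    return value.replace(TOKEN, (match, step, key) => {
      const stepCaptures = captures[step];
      if (!stepCaptures || !(key in stepCaptures)) {
        // Assumption: an unresolved token is an authoring error, so fail loudly.
        throw new Error(`Unknown capture ${match}`);
      }
      return String(stepCaptures[key]);
    });
  }
  if (Array.isArray(value)) {
    return value.map((item) => interpolate(item, captures));
  }
  if (value !== null && typeof value === 'object') {
    // Walk nested objects such as request bodies and header maps.
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [k, interpolate(v, captures)]),
    );
  }
  return value; // numbers, booleans, and null pass through unchanged
}

// Captures keyed by step name, as a step's `capture` map would produce them.
const captures = { 'create-order': { orderId: 'ord_42' } };
console.log(interpolate('response_body(this).id == "$create-order.orderId"', captures));
// response_body(this).id == "ord_42"
```

Note that with this token pattern any literal `$word.word` text in a string would also be treated as a capture reference.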
## Chaos Configuration
Fine-tune chaos behavior via preset fields:
```javascript
presets: {
  'chaos-lab': {
    name: 'chaos-lab',
    timeout: 10000,
    chaos: true,
    chaosStrategy: 'sample', // 'one' | 'all' | 'sample' | 'routes'
    chaosSampleSize: 5,      // routes to sample when strategy = 'sample'
    chaosSampleRoutes: [     // explicit routes when strategy = 'routes'
      'GET /api/users',
      'POST /api/orders',
    ],
  },
}
```
| Field | Default | Description |
|---|---|---|
| `chaosStrategy` | `'one'` | Route selection strategy: `'one'`, `'all'`, `'sample'`, or `'routes'` |
| `chaosSampleSize` | `3` | Routes to sample (strategy `sample`) |
| `chaosSampleRoutes` | — | Explicit route list (strategy `routes`) |
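The sketch below shows one plausible reading of the four strategies, which can help when choosing values for the preset fields above. It is an assumption, not APOPHIS's selection logic: the `selectChaosRoutes` function is hypothetical, `'one'` is modeled as a single seeded-random pick, `'sample'` as a seeded Fisher-Yates shuffle, and `'routes'` as a filter that drops listed routes that were never discovered. The `mulberry32` generator is a well-known small seeded PRNG, swapped in here so the same seed always selects the same routes.

```javascript
// Hypothetical sketch of chaos route selection; APOPHIS's real logic may differ.
// mulberry32: a small seeded PRNG returning floats in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function next() {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Seeded Fisher-Yates shuffle of a copy, then take the first `count` routes.
function sampleRoutes(routes, count, random) {
  const copy = [...routes];
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  return copy.slice(0, Math.min(count, copy.length));
}

function selectChaosRoutes(routes, preset, seed) {
  const strategy = preset.chaosStrategy ?? 'one'; // default from the table
  const random = mulberry32(seed);
  switch (strategy) {
    case 'all':
      return [...routes];
    case 'one':
      return sampleRoutes(routes, 1, random); // assumption: one seeded-random route
    case 'sample':
      return sampleRoutes(routes, preset.chaosSampleSize ?? 3, random);
    case 'routes': {
      // Keep only explicitly listed routes that were actually discovered.
      const wanted = new Set(preset.chaosSampleRoutes ?? []);
      return routes.filter((route) => wanted.has(route));
    }
    default:
      throw new Error(`Unknown chaosStrategy: ${strategy}`);
  }
}

const discovered = ['GET /api/users', 'POST /api/orders', 'GET /health'];
console.log(selectChaosRoutes(discovered, { chaosStrategy: 'sample', chaosSampleSize: 2 }, 42));
```

Seeding the selection keeps it reproducible, which matters if you expect a replayed run to inject faults into the same routes as the original.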
## Artifact Interpretation
Each qualify run produces an artifact JSON document. Key sections:
### executionSummary
```json
{
  "executionSummary": {
    "totalPlanned": 15,
    "totalExecuted": 12,
    "totalPassed": 10,
    "totalFailed": 2,
    "scenariosRun": 3,
    "statefulTestsRun": 5,
    "chaosRunsRun": 4,
    "chaosRoutesPlanned": 2,
    "chaosRoutesExecuted": 2,
    "totalSteps": 12
  }
}
```
Compare `totalExecuted` with `totalPlanned` to see how many planned checks actually ran; the difference reflects disabled gates, route filtering, and chaos route selection. A non-zero `totalPlanned` with zero `totalExecuted` means all gates were disabled or no routes matched.
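A nightly job can turn these rules into explicit warnings. The sketch below is a hypothetical check over the `executionSummary` fields shown above; the `checkExecution` name and the wording of its messages are illustrative assumptions, and it only reads fields that appear in the example.

```javascript
// Hypothetical CI check over an executionSummary object; not part of APOPHIS.
function checkExecution(summary) {
  const problems = [];
  const notRun = summary.totalPlanned - summary.totalExecuted;
  if (summary.totalPlanned > 0 && summary.totalExecuted === 0) {
    // Planned work but nothing ran: every gate was off or no route matched.
    problems.push('zero execution: all gates disabled or no routes matched');
  } else if (notRun > 0) {
    problems.push(`${notRun} of ${summary.totalPlanned} planned checks did not run`);
  }
  if (summary.totalFailed > 0) {
    problems.push(`${summary.totalFailed} checks failed`);
  }
  return problems;
}

// The executionSummary from the example above.
const summary = {
  totalPlanned: 15, totalExecuted: 12, totalPassed: 10, totalFailed: 2,
  scenariosRun: 3, statefulTestsRun: 5, chaosRunsRun: 4,
  chaosRoutesPlanned: 2, chaosRoutesExecuted: 2, totalSteps: 12,
};
for (const problem of checkExecution(summary)) console.log(problem);
```

Whether "planned but not run" should fail the job or only warn is a policy decision; chaos route selection skips routes by design, so a warning is usually the safer default.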
### executedRoutes / skippedRoutes
```json
{
  "executedRoutes": ["POST /orders", "GET /orders/:id", "GET /items"],
  "skippedRoutes": [
    { "route": "DELETE /items/:id", "reason": "No scenario covers this route" },
    { "route": "GET /health", "reason": "Not selected by chaos strategy: one" }
  ]
}
```
`executedRoutes` lists every route that had at least one scenario step, stateful command, or chaos injection. `skippedRoutes` explains why every other discovered route was excluded.
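When the skip list is long, grouping it by reason shows at a glance whether routes are missing scenarios or were simply not picked for chaos. The helpers below are a hypothetical sketch, not part of APOPHIS. They assume, as the description above implies, that every discovered route appears in exactly one of the two lists, so the two lengths together give the discovered-route total.

```javascript
// Hypothetical helpers over executedRoutes / skippedRoutes; not part of APOPHIS.
// Assumes every discovered route appears in exactly one of the two lists.
function skippedByReason(artifact) {
  const groups = new Map();
  for (const { route, reason } of artifact.skippedRoutes ?? []) {
    if (!groups.has(reason)) groups.set(reason, []);
    groups.get(reason).push(route);
  }
  return groups;
}

// Fraction of discovered routes exercised at least once.
function routeCoverage(artifact) {
  const executed = (artifact.executedRoutes ?? []).length;
  const total = executed + (artifact.skippedRoutes ?? []).length;
  return total === 0 ? 0 : executed / total;
}

// The route lists from the example above.
const artifact = {
  executedRoutes: ['POST /orders', 'GET /orders/:id', 'GET /items'],
  skippedRoutes: [
    { route: 'DELETE /items/:id', reason: 'No scenario covers this route' },
    { route: 'GET /health', reason: 'Not selected by chaos strategy: one' },
  ],
};
for (const [reason, routes] of skippedByReason(artifact)) {
  console.log(`${reason}: ${routes.join(', ')}`);
}
console.log(`route coverage: ${(routeCoverage(artifact) * 100).toFixed(0)}%`);
```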
### profileGates
```json
{
  "profileGates": {
    "scenario": true,
    "stateful": true,
    "chaos": false
  }
}
```
Shows which gates were active. Combine with `executionSummary` per-gate counts to verify each active gate produced results.
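One way to automate that cross-check is to map each gate to its counter and report active gates whose counter is zero. In this sketch the `silentGates` function is hypothetical, and the gate-to-counter mapping (`scenario` to `scenariosRun`, `stateful` to `statefulTestsRun`, `chaos` to `chaosRunsRun`) is inferred from the field names in the `executionSummary` example rather than taken from an APOPHIS reference.

```javascript
// Hypothetical mapping from profileGates keys to executionSummary counters
// (inferred from the field names in the executionSummary example).
const GATE_COUNTERS = {
  scenario: 'scenariosRun',
  stateful: 'statefulTestsRun',
  chaos: 'chaosRunsRun',
};

// Returns active gates that recorded zero executions.
function silentGates(artifact) {
  const { profileGates = {}, executionSummary = {} } = artifact;
  return Object.entries(profileGates)
    .filter(([gate, active]) => active && GATE_COUNTERS[gate])
    .filter(([gate]) => (executionSummary[GATE_COUNTERS[gate]] ?? 0) === 0)
    .map(([gate]) => gate);
}

const example = {
  profileGates: { scenario: true, stateful: true, chaos: false },
  executionSummary: { scenariosRun: 3, statefulTestsRun: 5, chaosRunsRun: 4 },
};
console.log(silentGates(example)); // []
```

An active gate that reports zero runs usually deserves the same attention as a failure, since it means the gate passed without checking anything.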
### stepTraces
Each entry records an individual step execution:
```json
{
  "stepTraces": [
    {
      "step": 0,
      "name": "create-order",
      "route": "POST /orders",
      "durationMs": 12,
      "status": "passed"
    }
  ]
}
```
Filter by `status` to isolate failures. Look at `durationMs` for performance regressions.
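Both filters fit in one small triage helper. This is a hypothetical sketch rather than an APOPHIS API: it treats any status other than `"passed"` as needing attention (only `"passed"` appears in the example above), and the 500 ms slow-step budget is an arbitrary value chosen for illustration.

```javascript
// Hypothetical triage helper over stepTraces; not part of APOPHIS.
// Assumptions: any status other than "passed" needs attention, and the
// 500 ms slow-step budget is an arbitrary example value.
function triageSteps(stepTraces, slowMs = 500) {
  const failed = stepTraces.filter((trace) => trace.status !== 'passed');
  const slow = stepTraces
    .filter((trace) => trace.durationMs > slowMs)
    .sort((a, b) => b.durationMs - a.durationMs); // slowest first
  return { failed, slow };
}

// The first trace is from the example above; the second is invented for illustration.
const stepTraces = [
  { step: 0, name: 'create-order', route: 'POST /orders', durationMs: 12, status: 'passed' },
  { step: 1, name: 'duplicate-create', route: 'POST /orders', durationMs: 740, status: 'failed' },
];
const { failed, slow } = triageSteps(stepTraces);
console.log(failed.map((t) => `${t.name} (${t.route})`));
console.log(slow.map((t) => `${t.name}: ${t.durationMs} ms`));
```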
### failures
```json
{
  "failures": [
    {
      "route": "POST /orders",
      "contract": "status:201",
      "category": "runtime",
      "replayCommand": "apophis replay --artifact reports/apophis/qualify-2026-05-21T...json"
    }
  ]
}
```
`replayCommand` is a copy-pasteable command that re-runs the exact same seed from the stored artifact, for triage.
## Zero-Execution Guardrail