feat: plugin contract e2e, qualify --changed, production observe, regressions

2026-05-22 11:05:52 -07:00
parent d0523fcc2d
commit 1de735ee08
34 changed files with 1392 additions and 122 deletions
@@ -8,11 +8,11 @@ This audit is based on code inspection plus command verification, not documentat

 ## Executive Summary

-APOPHIS has real product value. It is not just a schema wrapper: it gives Fastify teams a way to express and verify behavioral API promises that OpenAPI/JSON Schema cannot cover, especially cross-route invariants such as create/read consistency, delete semantics, auth/session flows, state transitions, idempotency, outbound dependency expectations, and replayable counterexamples.
+APOPHIS has real product value. It is not just a schema wrapper: it gives Fastify teams a way to express, verify, and observe behavioral API promises that OpenAPI/JSON Schema cannot cover, especially cross-route invariants such as create/read consistency, delete semantics, auth/session flows, state transitions, idempotency, outbound dependency expectations, and replayable counterexamples.

 I would adopt APOPHIS today as a focused behavioral verification tool for Fastify v5 ESM services. I would start with CI `verify` and a small number of high-value contracts, then expand into `qualify` and runtime observation once the team has clear operating guidance.

-I would not yet treat it as a complete production observability platform or a turnkey organization-wide release gate. The core implementation is strong, but the remaining value gap is mostly around operational maturity: standalone observe activation, deeper tests around recent CLI behavior, richer scenario authoring, and clearer release-gate recommendations.
+I would not yet treat it as a complete production observability platform or a turnkey organization-wide release gate. The core implementation is strong, but the remaining value gap is mostly around operational maturity: standalone observe process management, richer scenario authoring, and organization-specific release-gate policy.

 Adoption verdict: strong team pilot candidate, credible standardization candidate after the remaining gaps below are addressed.

@@ -34,10 +34,10 @@ Observed results:
 |---|---:|
 | Typecheck | pass |
 | Build | pass |
-| Source tests | 587 pass, 0 fail |
-| CLI tests | 311 pass, 0 fail |
+| Source tests | 590 pass, 0 fail |
+| CLI tests | 320 pass, 0 fail |
 | Docs smoke tests | 4 pass, 0 fail |
-| Total tests | 902 pass, 0 fail |
+| Total tests | 921 pass, 0 fail |

 The working tree contains many broader project changes unrelated to this audit. This document evaluates the current working tree state.

@@ -51,7 +51,7 @@ Mostly yes for behavioral verification. Partially for production observation and
 | Deterministic CI verification | Yes, materially. CLI `verify` now honors configured `runs`, uses seeded request generation, emits artifacts, supports route filters, replay metadata, and machine-readable output. |
 | Cross-route behavior | Yes for supported formula operations and route-call semantics. This is the most differentiated value. |
 | Runtime validation | Yes when the plugin is explicitly configured outside production. Production enforcement is intentionally blocked. |
-| Runtime observation | Partially. Programmatic plugin observation exists and emits non-blocking sink events with sampling. The CLI validates/report readiness but does not attach to or run a service. |
+| Runtime observation | Yes for programmatic production-safe hooks. APOPHIS emits non-blocking sink events with sampling in production when `observe.enabled` is set and sinks are configured. The CLI validates and reports readiness but does not attach to or run a service. |
 | Stateful/scenario/chaos qualification | Partially. The runner and artifacts are useful, route discovery is now shared with verify, and config supports scenarios/chaos knobs. Scenario authoring is still young and needs more real-world examples/tests. |
 | Outbound dependency mocking | Useful but intentionally process-global. The misleading scoped `undici-mock-agent` option has been removed. Teams still need careful test isolation. |
 | Team-safe onboarding | Good. The package has CLI help, init/doctor/replay/verify/qualify/observe, config validation, machine output, docs smoke tests, packaging tests, and production safety checks. |
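
The runtime validation and runtime observation rows describe two separate gates: blocking validation is off in production, while non-blocking observation stays available when `observe.enabled` is set and sinks are configured. A minimal sketch of that decision follows. The names `resolveRuntimeMode`, `PluginOptionsSketch`, and `validateRuntime`, and the use of `NODE_ENV` to detect production, are assumptions for illustration and not the APOPHIS API.

```typescript
// Hypothetical option shape; only `observe.enabled`, sinks, and sampling
// are named in the audit. Everything else here is illustrative.
interface PluginOptionsSketch {
  validateRuntime?: boolean; // assumed flag for blocking runtime validation
  observe?: { enabled: boolean; sinks: unknown[]; sampling?: number };
}

interface RuntimeMode {
  blockingValidation: boolean;
  observe: boolean;
}

function resolveRuntimeMode(
  nodeEnv: string | undefined,
  opts: PluginOptionsSketch
): RuntimeMode {
  const isProduction = nodeEnv === "production"; // assumption: NODE_ENV check
  const observe = opts.observe;
  return {
    // Blocking validation needs explicit opt-in and is never on in production.
    blockingValidation: !isProduction && opts.validateRuntime === true,
    // Observation is non-blocking, so it may run in any environment,
    // but only when enabled and at least one sink exists.
    observe: observe !== undefined && observe.enabled && observe.sinks.length > 0,
  };
}
```

Under these assumptions, `resolveRuntimeMode("production", { validateRuntime: true, observe: { enabled: true, sinks: [sink] } })` keeps observation on and blocking validation off.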
@@ -102,12 +102,17 @@ The following earlier adoption risks have been addressed in the current working
 |---|---|
 | CLI `verify` runs | `VerifyRunnerDeps` accepts `runs`; `verifyCommand()` passes resolved config; `runVerify()` executes contracts for `contractRuns`. |
 | Observe sampling | `hook-validator.ts` gates sink emission using `opts.observe.sampling` before emitting pass/violation/error events. |
+| Production observe activation | `apophisPlugin` now keeps blocking runtime validation disabled in production while allowing non-blocking observe sinks to emit pass/violation/error events. |
 | Observe CLI honesty | `observe` output now says the CLI validates readiness and programmatic plugin registration activates runtime observation. |
 | Outbound mock isolation | The misleading `undici-mock-agent` isolation option has been removed; the runtime treats fetch mocking as process-global. |
 | Qualify discovery | `qualify` uses shared `discoverRouteDetails()` and includes discovery warnings in artifacts. |
 | Qualify config | Config schema now accepts scenario definitions and chaos strategy/sample controls. |
 | Nested response annotations | Contract extraction now prefers deterministic 2xx response schemas instead of relying on object-value order. |
 | `--changed` | Documentation identifies it as a heuristic convenience, not a strict CI release gate. |
+| Plugin contracts (end-to-end) | Full pipeline: config schema, plugin registration, compose+merge in all runners, precondition→skip, auto-inject headers, source attribution (`formulaSources`), failure counting, `drainWarnings()` collection, production safety. Wired through verify, qualify (scenario/stateful/chaos), and replay. |
+| Artifact pipeline CI/CD | 6 CI-facing regression tests: json-summary parseable, ndjson-summary parseable, `--quiet` persistence, skipped field presence, exit code 0 on pass, qualify json-summary. Verify→replay round-trip test with plugin contracts. |
+| CLI output hygiene | `console.warn` output no longer bleeds into CLI output (collected through `drainWarnings`); `json-summary`→`human` format normalization bug fixed; `--quiet` no longer suppresses machine format output. |
+| Qualify --changed | `qualify` now supports the `--changed` flag with the same git-diff heuristic as `verify`. It prints the match count and exits 0 when no routes changed. |
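
The `--changed` rows describe a git-diff heuristic that reports a match count and exits 0 when nothing matches. The audit does not show the matching logic, so the sketch below is only one way such a filter could work, assuming a hypothetical route-to-source-file list. `RouteSource`, `routesTouchedBy`, and `summarizeChanged` are illustrative names, not APOPHIS functions.

```typescript
// Hypothetical mapping from a route to the files that implement it.
interface RouteSource {
  route: string;
  files: string[];
}

// Keep only routes whose source files appear in the changed-file list
// (for example, the output of `git diff --name-only`).
function routesTouchedBy(changedFiles: string[], sources: RouteSource[]): string[] {
  return sources
    .filter((s) => s.files.some((f) => changedFiles.indexOf(f) !== -1))
    .map((s) => s.route)
    .sort();
}

// Report the match count and whether the run can end early with exit code 0.
function summarizeChanged(matched: string[]): { message: string; exitEarly: boolean } {
  return {
    message: matched.length + " changed route(s) matched",
    exitEarly: matched.length === 0,
  };
}
```

Because the real filter is a heuristic, a run with zero matches says nothing about routes affected indirectly, which is why the documentation treats `--changed` as a convenience and not a strict release gate.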

 ## Remaining Adoption Gaps

@@ -121,7 +126,7 @@ The implementation supports runtime observation only when the application explic
 - Integration tests prove sink sync failures and async rejections never change route responses.
 - Integration tests prove sampling: 0 suppresses all events; sampling: 1 emits expected `contract.pass`/`contract.violation` events.

-**Still open:** A future `apophis observe --app ./app.ts` mode that activates a running service observer.
+**Still open:** A future `apophis observe --app ./app.ts` mode that imports and starts a service for local/staging smoke observation. Production observation itself is now programmatic and active through plugin registration.
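
The integration tests listed above pin down two properties: sampling gates emission before any sink runs, and sink failures never reach the route response. A minimal sketch of an emitter with both properties follows. `emitSampled`, `ObserveEventSketch`, and `SinkSketch` are hypothetical names; the real gate lives in `hook-validator.ts` and may be structured differently.

```typescript
// Hypothetical event and sink shapes for illustration only.
interface ObserveEventSketch {
  type: string; // e.g. "contract.pass" or "contract.violation"
  route: string;
}
type SinkSketch = (event: ObserveEventSketch) => void | Promise<void>;

// Returns how many sinks were invoked without a synchronous throw.
function emitSampled(
  sampling: number,
  sinks: SinkSketch[],
  event: ObserveEventSketch,
  random: () => number = Math.random
): number {
  // random() is in [0, 1): sampling 0 drops every event, sampling 1 keeps all.
  if (random() >= sampling) return 0;
  let invoked = 0;
  for (const sink of sinks) {
    try {
      // Promise.resolve covers sync and async sinks; the catch swallows
      // async rejections so they never surface as unhandled errors.
      void Promise.resolve(sink(event)).catch(() => undefined);
      invoked++;
    } catch {
      // Synchronous sink failures are swallowed; the caller is unaffected.
    }
  }
  return invoked;
}
```

The emitter never awaits a sink, so a slow or failing collector cannot delay or alter the response in this sketch.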

 ### P1: Recent `verify` Runs Behavior Now Has Regression Tests

@@ -130,7 +135,7 @@ The implementation supports runtime observation only when the application explic
 - Regression test proves `runs: 5` scales multiplicatively from `runs: 1`.
 - Regression test proves `runs: 10` is deterministic at the same seed.

-**Still open:** Variant-aware runs test (verifying run budget is per-variant or shared).
+**Completed:** Variant-aware runs regression proves the run budget is applied per variant.
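
The regression properties above (run count scales with `runs`, the same seed reproduces the same requests, and the budget applies per variant) can be illustrated with a small seeded planner. This is a sketch under stated assumptions: `planRuns` and `PlannedRun` are hypothetical, the variant names are placeholders, and the linear congruential generator stands in for whatever seeded generator APOPHIS actually uses.

```typescript
// Simple linear congruential generator (Numerical Recipes constants).
// A stand-in for the real seeded request generator.
function lcg(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

interface PlannedRun {
  variant: string;
  run: number;
  input: number; // placeholder for a generated request
}

// Each variant receives the full `runs` budget, so the plan has
// variants.length * runs entries.
function planRuns(variants: string[], runs: number, seed: number): PlannedRun[] {
  const next = lcg(seed);
  const plan: PlannedRun[] = [];
  for (const variant of variants) {
    for (let run = 0; run < runs; run++) {
      plan.push({ variant, run, input: Math.floor(next() * 1000) });
    }
  }
  return plan;
}
```

With two variants, `runs: 1` plans 2 executions and `runs: 5` plans 10, and calling `planRuns` twice with the same seed yields identical plans.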

 ### P1: Qualify Product Shape Improved

@@ -139,7 +144,7 @@ The implementation supports runtime observation only when the application explic
 - Configured-scenario qualify test added (independent of OAuth fixture routes).
 - `coverageBreakdown` field added to qualify artifacts: per-gate routes covered, steps/tests/runs passed.

-**Still open:** Clear guidance for nightly/staging use versus pull-request gating in qualify docs.
+**Completed:** `docs/qualify.md` now documents pull-request versus nightly/staging gate guidance.

 ### P1: Outbound Mocks Process-Global, Honestly Documented

@@ -210,14 +215,14 @@ High-value first contracts:
 | Fastify fit | 8/10 | Strong plugin/inject/decorator alignment; discovery order still matters. |
 | Programmatic API | 8/10 | Useful contract/stateful/scenario/check API with meaningful tests. |
 | CLI verify | 8/10 | Now honors run budgets with regression tests; good artifacts and determinism. |
-| Observe | 7/10 | Runtime sink primitives, sampling, and sink-failure-resilience exist with tests. Production-style docs added. Standalone operational story not complete. |
+| Observe | 8/10 | Production-safe non-blocking sink emission, sampling, and sink-failure-resilience exist with tests. Standalone process-management story is still future work. |
 | Qualify | 7/10 | Improved discovery/config/scenarios. Coverage breakdown in artifacts. Needs richer scenario examples and gating guidance. |
 | Outbound mocking | 7/10 | Useful and honest about process-global behavior. Docs and README explicit. True scoped mocking remains future work. |
 | Docs | 8/10 | Broad and increasingly precise. Observe and qualify docs expanded with real code examples. |
 | Packaging | 9/10 | Strong for a Node/Fastify package. |
 | Team readiness | 8/10 | Ready for pilot and selective CI use with regression-locked verification behavior. |

-Overall: 8/10 for real team pilot use. Potential 9/10 if observe gains a clearer production story and qualify gets first-class CI workflow guidance.
+Overall: 8.5/10 for real team pilot use. Potential 9/10 if standalone observe process management and richer scenario libraries become first-class.

 ## Highest-Impact Next Work

@@ -226,9 +231,13 @@ Overall: 8/10 for real team pilot use. Potential 9/10 if observe gains a clearer
 3. ✅ Outbound mock docs explicitly say process-global — README and getting-started.md updated.
 4. ✅ Qualify scenario config documented with full examples in qualify.md.
 5. ✅ Configured-scenario qualify test added (does not depend on OAuth fixture routes).
-6. Add full production-style observe example with a real collector sink implementation.
-7. Improve qualify artifact coverage summaries to distinguish route-contract, scenario, stateful, and chaos coverage more clearly.
-8. Consider true scoped outbound mocking (undici dispatcher) only if concurrent in-process dependency tests become a core promise.
+6. ✅ Full production-style observe example with real collector sink implementation added to docs/observe.md.
+7. ✅ Plugin contract support end-to-end: docs, tests, all runners wired.
+8. ✅ Artifact pipeline CI/CD regression tests: json-summary, ndjson-summary, --quiet, skipped field, exit codes.
+9. ✅ Qualify --changed implemented.
+10. Add standalone observe process management (`apophis observe --app ./app.ts`) for local/staging observation.
+11. Add route ownership / file-to-route maps for precise `--changed` filtering.
+12. Consider true scoped outbound mocking (undici dispatcher) only if concurrent in-process dependency tests become a core promise.

 ## Bottom Line