# APOPHIS Remediation And Refactor Plan Date: 2026-05-21 Source assessment: `APOPHIS_ADOPTION_AUDIT.md` Goal: resolve the adoption-blocking gaps identified in the audit and move APOPHIS from "credible pilot" to "safe team-standard tool" for Fastify v5 ESM services. ## Summary The audit findings are fixable without rewriting APOPHIS. The necessary refactor is not a large architecture replacement; it is mostly consolidation of duplicate execution paths, making discovery state explicit, and tightening product claims around observe/qualify. Recommended sequencing: 1. Fix route discovery observability and messages first. 2. Make `createFastify()` fail loudly when requested runtime registration fails. 3. Fix route filter wildcard matching. 4. Unify CLI `verify` with the programmatic PETIT execution path. 5. Decide whether `observe` is a real runtime feature now or a config-validation command for this release. 6. Expand or relabel `qualify` chaos coverage. 7. Document outbound mock isolation and introduce a scoped alternative. 8. Clean up docs, config extension points, and TypeScript loading semantics. ## Target End State APOPHIS should provide these guarantees to adopters: - If contracts exist on routes, `doctor` and `verify` can tell whether they were discovered with full schema metadata or only as schema-less fallback paths. - `apophis verify` and `fastify.apophis.contract()` exercise routes through the same request generation and contract evaluation engine. - `createFastify({ apophis: { runtime: 'error' } })` either enables runtime validation or fails loudly. - Route filters are literal-safe with only documented wildcard semantics. - `observe` docs match actual behavior, or observe emits real non-blocking events to configured sinks. - `qualify` reports actual coverage and does not imply broad chaos coverage when only one route is sampled. - Outbound mocks are clearly documented as process-global unless a scoped runtime is used. - Config remains strict for APOPHIS-owned fields while allowing documented user metadata. ## Track 1: Route Discovery Metadata Visibility Priority: P0 Risk: medium Primary files: - `src/domain/discovery.ts` - `src/infrastructure/discovery-hook.ts` - `src/cli/core/app-loader.ts` - `src/plugin/builders.ts` - `src/cli/commands/doctor/checks/routes.ts` - `src/test/integration.test.ts` - `src/test/cli/doctor-consistency.test.ts` - `docs/getting-started.md` ### Problem APOPHIS can discover route paths through `printRoutes()` after routes are registered, but it cannot recover route schemas or behavioral annotations from that fallback. Users can see "routes found" but "no contracts found" without understanding that discovery degraded. ### Refactor Path 1. Replace `discoverRoutes()` internal return flow with metadata-aware discovery. Add an internal result type: ```ts export interface DiscoveryResult { routes: RouteContract[] source: 'captured' | 'legacy-routes-array' | 'print-routes' | 'none' hasSchemaMetadata: boolean warnings: string[] } ``` 2. Keep `discoverRoutes()` for compatibility, but implement it as `discoverRouteDetails(instance).routes`. 3. In `discoverRoutesFallback()`, set `source: 'print-routes'` and `hasSchemaMetadata: false`. 4. Update `buildContract()` empty-suite error to distinguish: - no routes discovered - routes discovered without schema metadata - routes discovered with schemas but no `x-ensures`/`x-requires` 5. Add a doctor route check that reports: - pass: captured routes with schemas - warn: schema-less fallback route discovery - fail: no routes discovered 6. Update CLI `verify` artifacts/warnings to include discovery source when no contracts are found. 7. Update docs to say `printRoutes()` fallback can identify paths but cannot recover contract annotations. ### Compatibility Notes Do not remove `discoverRoutes()`. It is used widely. Add `discoverRouteDetails()` and migrate only the call sites that need diagnostics. ### Acceptance Criteria - `discoverRouteDetails()` returns `captured + hasSchemaMetadata: true` for routes captured by the plugin/discovery hook. - `discoverRouteDetails()` returns `print-routes + hasSchemaMetadata: false` for fallback routes. - `apophis doctor` warns when only schema-less fallback discovery is available. - `apophis verify` no-contract output says whether contracts are absent or unavailable due to schema-less discovery. - Existing route discovery tests still pass. ## Track 2: `createFastify()` Runtime Registration Semantics Priority: P1 Risk: low Primary files: - `src/fastify-factory.ts` - `src/test/integration.test.ts` - `docs/getting-started.md` - `README.md` ### Problem `createFastify()` silently catches all plugin registration failures when runtime validation is requested. This can leave users believing runtime validation is active when it is not. ### Refactor Path 1. Change `CreateFastifyOptions`: ```ts apophis?: { runtime?: 'off' | 'warn' | 'error' discoveryOnly?: boolean } ``` 2. Always install route discovery. 3. If `discoveryOnly === true`, skip plugin registration. 4. If `runtime` is set and not `off`, register the plugin and let failures throw. 5. Add one test that simulates plugin registration failure and asserts a thrown error when runtime is requested. 6. Add one test that `discoveryOnly: true` does not attempt runtime plugin registration. ### Acceptance Criteria - Runtime registration failures are visible. - Discovery-only behavior remains explicitly available. - Docs show `discoveryOnly` only for advanced/diagnostic use. ## Track 3: Safe Route Filter Matching Priority: P1 Risk: low Primary files: - `src/cli/commands/verify/runner.ts` - `src/infrastructure/wildcard-match.ts` - `src/test/cli/verify-ux.test.ts` ### Problem `matchRoutePattern()` builds regexes without escaping non-wildcard regex metacharacters. ### Refactor Path 1. Add a route-specific helper instead of reusing URL target semantics directly: ```ts export function matchesWildcardPattern(value: string, pattern: string): boolean ``` 2. Escape regex metacharacters first. 3. Replace escaped `\*` with `.*` and escaped `\?` with `.` only after escaping. 4. Anchor the regex. 5. Use this helper in `verify/runner.ts`. 6. Add tests for literal `.`, `+`, `(`, `)`, `[`, `]`, `$`, `^`, and `\` in route filters. ### Acceptance Criteria - Literal route filters match literal paths. - `*` and `?` retain documented wildcard behavior. - Invalid user patterns cannot throw regex syntax errors. ## Track 4: CLI Verify And PETIT Engine Unification Priority: P1 Risk: high Primary files: - `src/cli/commands/verify/runner.ts` - `src/quality/petit-runner.ts` - `src/quality/petit-command-step.ts` - `src/quality/route-filter.ts` - `src/domain/request-builder.ts` - `src/domain/schema-to-arbitrary.ts` - `src/cli/commands/verify/index.ts` - `src/test/cli/verify-ux.test.ts` - `src/test/cross-operation-support.test.ts` ### Problem The CLI `verify` path has a simpler request builder and execution loop than the programmatic contract runner. This creates inconsistent behavior and makes the main user workflow less capable than the library API. ### Refactor Path 1. Extract a shared verification core from `runPetitTests()`. Candidate API: ```ts export interface ContractExecutionOptions extends TestConfig { routeFilters?: string[] profileRoutes?: string[] failOnNoRoutes?: boolean failOnNoContracts?: boolean } export async function runContractVerification( fastify: FastifyInjectInstance, options: ContractExecutionOptions, deps?: ContractExecutionDeps, ): Promise ``` 2. Move route filtering into shared helpers. Current filtering exists in both CLI verify and PETIT route filtering. Collapse these into one module that supports: - route patterns - scope filters - profile routes - HEAD exclusion - skipped-route reporting 3. Make CLI `runVerify()` call the shared core. 4. Preserve CLI artifact shape by mapping `ContractVerificationResult` into current `VerifyRunResult`. 5. Keep the existing simple example-body execution only if explicitly needed as a `--sample example` mode. Do not make it the default if the product claim is property-based verification. 6. Add config support for `runs` in the CLI verify path. Today `verifyCommand()` passes timeout but does not fully pass the resolved preset/profile run configuration into the execution core. 7. Add tests for CLI verify that cover: - generated body values from schema constraints - querystring generation - path parameter generation - variants - non-POST bodies where Fastify route semantics allow them - route-level timeouts - extension build-request hooks, if verify should support them ### Suggested Intermediate Step Before full unification, replace CLI `buildRouteRequest()` with `buildRequest()` plus `convertSchema()` sampling. This reduces behavior drift while the larger shared core is extracted. ### Acceptance Criteria - A route tested by `fastify.apophis.contract({ seed, runs })` and `apophis verify --seed` uses the same request generation semantics. - CLI verify respects configured `runs`. - CLI verify can exercise query, path, body, headers/variants, and cross-operation contracts. - Golden CLI output remains stable except for intentional added diagnostics. ## Track 5: Observe Mode Decision And Implementation Priority: P0 Risk: medium if docs-only, high if implementing runtime observe Primary files: - `src/cli/commands/observe/index.ts` - `src/cli/commands/observe/validator.ts` - `src/plugin/index.ts` - `src/infrastructure/hook-validator.ts` - `src/types/core.ts` - `docs/observe.md` - `README.md` ### Problem Docs imply production runtime visibility and drift detection. Current code validates observe configuration and prints what would be activated. ### Path A: Honest Docs For Current Release This is the safer short-term path. 1. Rename docs language from "observe activates runtime visibility" to "observe validates runtime-observe readiness". 2. Make `observe` command output explicitly say "no runtime observer is started by this command". 3. Keep `observe --check-config` as the canonical current behavior. 4. Add README "Current Limitations" entry for observe. Acceptance criteria: - No docs claim request telemetry is emitted unless a tested implementation exists. - Platform teams can tell observe is a readiness/config command. ### Path B: Implement Real Observe This is the product-complete path. 1. Define sink interface: ```ts export interface ObserveSink { emit(event: ObserveEvent): void | Promise } ``` 2. Define event schema: ```ts interface ObserveEvent { type: 'contract.pass' | 'contract.violation' | 'contract.error' route: string method: string statusCode: number durationMs: number formula?: string error?: string sampled: boolean timestamp: string } ``` 3. Extend plugin options with observe config: ```ts observe?: { enabled?: boolean sampling?: number blocking?: false sinks?: ObserveSink[] } ``` 4. Reuse runtime validation hook evaluation, but in observe mode never throw and never delay responses waiting for sinks. 5. Implement bounded async sink dispatch: - fire-and-forget by default - max queue length - dropped-event counter - sink failure isolation 6. Add integration tests: - passing contract emits pass event when sampled - failing contract emits violation event and response remains successful - sink throw does not affect response - sampling 0 emits nothing - sampling 1 emits deterministically in tests Recommended decision: do Path A immediately, then Path B as a dedicated feature milestone. ## Track 6: Qualify Chaos Coverage And Cleanup Outcomes Priority: P1 Risk: medium Primary files: - `src/cli/commands/qualify/runner.ts` - `src/cli/commands/qualify/chaos-handler.ts` - `src/cli/commands/qualify/index.ts` - `src/cli/core/config-loader.ts` - `src/types/formula.ts` - `src/infrastructure/cleanup-manager.ts` - `src/test/cli/qualify-signal.test.ts` ### Problem Qualify chaos currently picks one route deterministically. Cleanup failures are represented but not wired to real cleanup outcomes. ### Refactor Path 1. Extend `ChaosConfig`: ```ts strategy?: 'one' | 'all' | 'sample' | 'routes' sampleSize?: number routes?: string[] ``` 2. Implement `selectChaosRoutes(routes, chaosConfig, seed)` as a pure helper. 3. Run chaos for all selected routes and return `chaosResults: ChaosRunResult[]` instead of a single optional result. 4. Preserve a compatibility field temporarily if needed: ```ts chaosResult?: ChaosRunResult chaosResults: ChaosRunResult[] ``` 5. Update artifacts and renderers to report: - planned chaos route count - executed chaos route count - applied chaos count - skipped chaos routes and reasons 6. Wire `CleanupManager` into qualify runner instead of simulated empty cleanup failures. 7. Update docs to describe default strategy. Recommended default: `sample` with `sampleSize: 1`, explicitly labeled as sampled. ### Acceptance Criteria - Qualify artifacts can prove exactly which routes were chaos-tested. - Users can request all-route chaos explicitly. - Cleanup failures reflect actual cleanup manager failures. - Existing deterministic seed tests remain stable. ## Track 7: Outbound Mock Isolation Priority: P1 Risk: medium Primary files: - `src/infrastructure/outbound-mock-runtime.ts` - `src/infrastructure/production-safety.ts` - `src/quality/petit-runner.ts` - `src/quality/stateful-runner.ts` - `docs/quality.md` - `docs/getting-started.md` ### Problem Outbound mocks patch `globalThis.fetch`. This is simple but process-global and unsafe for parallel suites. ### Refactor Path 1. Document current global behavior immediately. 2. Add a runtime mode: ```ts isolation?: 'global-fetch' | 'undici-mock-agent' ``` 3. Implement `undici-mock-agent` for consumers that use undici/fetch-compatible clients. 4. Add a global install guard with owner metadata: ```ts activeRuntimeId: string installedAt: Error stack or timestamp ``` Error messages should say which runtime currently owns the global patch. 5. Add tests for overlapping installs and restore order. 6. Add docs with recommended test-runner settings: - run outbound mock tests serially - isolate by process - prefer scoped MockAgent where possible ### Acceptance Criteria - Users are warned when they choose global mocks. - Overlapping global installs fail with actionable diagnostics. - Scoped mock path exists for teams that can use it. ## Track 8: Documentation Maturity Alignment Priority: P2 Risk: low Primary files: - `README.md` - `docs/getting-started.md` - `docs/observe.md` - `docs/qualify.md` - `docs/verify.md` - `docs/troubleshooting.md` ### Problem Docs sometimes describe intended platform behavior rather than current implementation behavior. ### Refactor Path 1. Add `Current Limitations` to README: - route discovery fallback loses schemas - observe is config readiness unless real observe is implemented - qualify chaos default is sampled unless configured otherwise - outbound mocks are process-global unless scoped mode is used 2. Add `Recommended Integration`: - use `createFastify()` for new apps - register APOPHIS/discovery before routes for existing apps - run `doctor` and confirm schema-backed discovery 3. Update `getting-started.md` to avoid saying nested response annotations are selected by actual status code unless implemented. 4. Add a migration note for teams with already-registered routes. 5. Keep docs smoke tests updated. ### Acceptance Criteria - No doc makes a runtime/coverage claim that lacks implementation and tests. - Quickstart remains short but links to limitations. - Docs smoke tests pass. ## Track 9: TypeScript Entrypoint Loading Semantics Priority: P2 Risk: medium Primary files: - `src/cli/core/app-loader.ts` - `src/cli/core/config-loader.ts` - `src/test/cli/init.test.ts` - `docs/cli.md` - `docs/troubleshooting.md` ### Problem The loader says TypeScript entrypoints require `tsx`, but it uses dynamic import directly. This is environment-sensitive. ### Refactor Path Choose one policy. Policy A: JS-only installed CLI. - Reject `.ts` app/config entrypoints unless `process.execArgv` includes a TS loader. - Error tells users to export a JS app entrypoint or run through `tsx`. Policy B: Built-in TS loading. - Detect `.ts` entrypoints. - Use `tsx` programmatically or spawn a subprocess through `tsx`. - Keep dependency/devDependency implications clear. Recommended policy: A for now. It is simpler, honest, and avoids magic loader behavior in installed CLIs. ### Acceptance Criteria - Installed CLI behavior is deterministic. - `.ts` entrypoint errors are clear and tested. - Docs match actual supported path. ## Track 10: Config Extensibility Namespace Priority: P2 Risk: low Primary files: - `src/cli/core/config-loader.ts` - `src/test/cli/config-validation.test.ts` - `docs/cli.md` ### Problem Strict unknown-key rejection is useful, but teams need a place for internal metadata. ### Refactor Path 1. Add top-level `metadata?: object` to config schema. 2. Allow `x-*` keys at top-level and inside profiles/presets/environments without validation beyond JSON object compatibility. 3. Document that APOPHIS will never interpret `metadata` or `x-*` fields unless promoted in a future major version. 4. Keep all APOPHIS-owned fields strict. ### Acceptance Criteria - Unknown typo like `rouets` still fails. - `metadata.owner = 'platform'` passes. - `x-team-policy` passes. - Tests cover top-level and nested metadata. ## Cross-Cutting Test Plan Run after each track: ```bash npm run typecheck npm run build ``` Run before merging a track: ```bash npm run test:src npm run test:cli npm run test:docs ``` Add targeted tests for each track before refactoring where feasible. For high-risk tracks, add characterization tests first. ## Suggested Milestones ### Milestone 1: Safety And Honesty Tracks: - Track 1: discovery visibility - Track 2: `createFastify()` loud failures - Track 3: route filter escaping - Track 8 Path A docs updates for current limitations Outcome: fewer silent failures; docs align with current behavior. ### Milestone 2: Verify Quality Tracks: - Track 4: CLI verify/PETIT unification - Track 9: TypeScript loading policy - Track 10: config metadata namespace Outcome: the main CLI workflow reflects the real engine and is easier to adopt in teams. ### Milestone 3: Platform Features Tracks: - Track 5 Path B: real observe implementation - Track 6: qualify chaos coverage and cleanup outcomes - Track 7: outbound mock isolation Outcome: production/platform-facing claims become technically defensible. ## Work Breakdown Estimate | Track | Size | Risk | Suggested Owner | |---|---:|---|---| | Discovery metadata visibility | M | Medium | Core/Fastify integration | | `createFastify()` semantics | S | Low | Core/Fastify integration | | Route filter escaping | S | Low | CLI | | CLI verify/PETIT unification | L | High | Quality engine + CLI | | Observe decision/docs | S | Low | Docs + CLI | | Real observe implementation | L | High | Runtime/platform | | Qualify chaos coverage | M | Medium | CLI quality | | Outbound mock isolation | M | Medium | Runtime/testing infra | | Docs maturity alignment | S | Low | Docs | | TypeScript loading policy | M | Medium | CLI | | Config metadata namespace | S | Low | CLI config | ## Dependency Graph - Track 1 should happen before Track 4, because verify unification needs clear discovery diagnostics. - Track 3 can happen anytime. - Track 2 can happen anytime. - Track 5 Path A should happen immediately if Path B will not ship in the same release. - Track 6 can happen independently of Track 4 but should reuse shared route filtering after Track 4 if possible. - Track 7 can happen independently but affects PETIT/stateful/qualify if scoped mocks are introduced. - Track 8 depends on decisions from Tracks 5, 6, and 7. - Track 9 should happen before making stronger CLI adoption claims. - Track 10 can happen anytime. ## Release Recommendation For the next release, do not attempt every deep refactor at once. Ship a release focused on safety, honesty, and mainline CLI correctness: 1. Discovery visibility and doctor warning. 2. `createFastify()` loud failure semantics. 3. Safe route filter matching. 4. Docs current-limitations update. 5. Initial CLI verify request-generation convergence, even if full PETIT unification takes another release. Then schedule real observe, full verify unification, qualify coverage, and outbound mock isolation as dedicated milestones with their own acceptance tests.