"I've spent the last decade telling you that observability is how you understand production. So when someone shows me a framework that claims to 'test production behavior' without a single trace span, I get... concerned."
"APOPHIS is ambitious. It wants to embed contracts in your Fastify schemas, generate property-based tests, inject chaos, and validate runtime behavior. That's a lot of 'wants to.' Let me show you what it actually does, what it breaks, and what it teaches us about the boundary between testing and observability."
---
## The Demo: A Production-Like Distributed System
I built an order service with circuit breakers, retries, and an inventory dependency. Here's what APOPHIS did:
**Test 1 (Normal):** 8 passed, 0 failed. Good.
**Test 2 (Chaos):** FAILED — because chaos requires `NODE_ENV=test`. In production-like environments, chaos is hard-disabled.
**Test 4 (Circuit breaker open):** 8 passed, 0 failed. But here's the thing — APOPHIS didn't actually verify the circuit breaker tripped. It just checked the contract held.
This is the first red flag: **APOPHIS verifies contracts, not resilience.**
---
## Assessment: Seven Production Concerns
### 1. Observability Integration: D+ (Can you trace contract failures to production issues?)
**The Problem:** APOPHIS has zero observability integration.
- No OpenTelemetry spans for contract evaluation
- No correlation IDs between test failures and production traces
- Pino logger wrapper exists but only logs at `debug` level
- Chaos events are buried in test diagnostics, not structured logs
**The Code:** `src/infrastructure/logger.ts:11-15` configures Pino with `level: 'warn'` and disables it by default in production. There is no trace context propagation.
**What this means:** When a contract fails in CI, you cannot trace that failure to a production incident. When a production incident occurs, you cannot check if APOPHIS would have caught it. The loop is broken.
**What I'd want:** Every contract evaluation should create a span. Every chaos injection should emit an event. Every violation should include a `trace_id` so you can correlate with production telemetry.
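A minimal sketch of what "one span per contract evaluation" could look like. It uses a tiny in-memory span recorder as a stand-in for a real tracer such as OpenTelemetry; `evaluateWithSpan`, `RecordedSpan`, and the attribute names are hypothetical and are not part of APOPHIS.

```typescript
// Hypothetical sketch: none of these names are APOPHIS APIs.
// A real integration would hand spans to a tracer such as OpenTelemetry
// instead of collecting them in an in-memory array.

interface RecordedSpan {
  name: string;
  traceId: string;
  attributes: Record<string, string | boolean>;
  status: "ok" | "error";
}

interface ContractOutcome {
  passed: boolean;
  traceId: string;
}

const recordedSpans: RecordedSpan[] = [];

// Wraps one contract evaluation in a span and returns the trace id with the
// result, so a violation report can be joined against other telemetry.
function evaluateWithSpan(
  route: string,
  contractLabel: string,
  traceId: string,
  check: () => boolean,
): ContractOutcome {
  let passed = false;
  try {
    passed = check();
  } catch {
    passed = false; // an evaluation error is reported as a violation
  }
  recordedSpans.push({
    name: "contract.evaluate",
    traceId,
    attributes: {
      "contract.route": route,
      "contract.label": contractLabel,
      "contract.passed": passed,
    },
    status: passed ? "ok" : "error",
  });
  return { passed, traceId };
}

// Example: a failing contract whose report carries the caller's trace id.
const contractOutcome = evaluateWithSpan(
  "POST /orders",
  "response status is 201", // placeholder label, not APOSTL syntax
  "4bf92f3577b34da6a3ce929d0e0e4736", // example 32-hex-character trace id
  () => false,
);
```

The point of returning `traceId` alongside `passed` is that the CI failure message and the span share one key, which is what makes the correlation possible.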
---
### 2. Chaos Engineering Features: F (How realistic are the failure modes?)
**Critical bugs that make chaos mode unusable:**
**Bug 1: Two-level probability is mathematically broken.**
If you set `probability: 0.5` and `delay.probability: 0.5`, actual delay rate is **0.25**, not 0.5. Users will misconfigure. Chaos Monkey, Gremlin, and Toxiproxy all use single-level probability for a reason.
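The arithmetic is easy to check. The sketch below assumes the two gates are independent draws, which is what the 0.25 figure implies; the `TwoLevelChaosConfig` shape and both function names are illustrative, not APOPHIS types.

```typescript
// Hypothetical config shape; field names follow the example in the text.
interface TwoLevelChaosConfig {
  probability: number; // outer gate: is chaos considered for this request?
  delay: { probability: number }; // inner gate: is a delay injected?
}

// Both independent gates must pass, so the rates multiply.
function effectiveDelayRate(config: TwoLevelChaosConfig): number {
  return config.probability * config.delay.probability;
}

// Inner probability needed to reach a target overall delay rate.
function requiredInnerProbability(targetRate: number, outer: number): number {
  if (outer <= 0 || targetRate > outer) {
    throw new RangeError("target rate is unreachable with this outer probability");
  }
  return targetRate / outer;
}

const observedRate = effectiveDelayRate({
  probability: 0.5,
  delay: { probability: 0.5 },
}); // 0.25

const innerNeeded = requiredInnerProbability(0.5, 0.5); // 1
```

In other words, a user who wants half of all requests delayed while the outer gate is 0.5 has to set the inner probability to 1, which is not what the config suggests at a glance.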
**Bug 2: Time-seeded RNG fallback breaks determinism.**
My first suspect was `corruption.ts:47` in `corruptJsonField`:
```typescript
// corruption.ts:47
const idx = Math.floor(rng.next() * entries.length)
```
This uses the passed RNG, so that line is fine. The problem is the fallback: when `rng` is undefined, it falls back to `new SeededRng(Date.now())`, which is seeded with the current time and is therefore non-deterministic across runs.
Two other spots looked suspicious but hold up. `makeInvalidJson` at line 61 doesn't take an RNG at all, but it just slices JSON. `BUILTIN_STRATEGIES` at line 107 uses one engine per suite while `executeWithChaos` is called per request; the RNG simply advances across requests, so that is fine within a suite.
What remains broken is the seeded reproducibility test, which is flaky: with `probability: 0.5`, there's a 25% chance both runs skip injection entirely.
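One way to rule out a `Date.now()` fallback is to make the seed mandatory. A minimal sketch, using the mulberry32 generator as a stand-in; `createRng`, `pickIndex`, and `sampleIndices` are hypothetical and are not APOPHIS's `SeededRng`.

```typescript
interface Rng {
  next(): number; // uniform in [0, 1)
}

// mulberry32: a small 32-bit generator, used here only for illustration.
function createRng(seed: number): Rng {
  if (!Number.isInteger(seed)) {
    throw new Error("chaos seed must be an explicit integer");
  }
  let state = seed >>> 0;
  return {
    next(): number {
      state = (state + 0x6d2b79f5) >>> 0;
      let t = state;
      t = Math.imul(t ^ (t >>> 15), t | 1);
      t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
      return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
    },
  };
}

// The RNG parameter is required, so there is no silent time-based fallback.
function pickIndex(rng: Rng, length: number): number {
  return Math.floor(rng.next() * length);
}

function sampleIndices(seed: number, length: number, count: number): number[] {
  const rng = createRng(seed);
  return Array.from({ length: count }, () => pickIndex(rng, length));
}

// Two runs with the same seed produce the same sequence of choices.
const runA = sampleIndices(42, 10, 5);
const runB = sampleIndices(42, 10, 5);
```

Printing the seed in every failure message then lets anyone replay the exact run.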
**Bug 3: No per-route granularity.**
Chaos is all-or-nothing. You cannot disable chaos for `/health` while enabling it for `/orders`. In any shared environment, you want to protect health checks and OAuth callbacks.
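A sketch of what per-route granularity could look like, assuming a hypothetical `exclude` option that accepts exact paths and `/*` prefixes. Neither the option nor the functions exist in APOPHIS today.

```typescript
// Hypothetical config: `exclude` is not an existing APOPHIS option.
interface RouteChaosConfig {
  probability: number;
  exclude: string[]; // exact paths, or prefixes written as "/prefix/*"
}

function isExcluded(path: string, patterns: string[]): boolean {
  return patterns.some((pattern) =>
    pattern.endsWith("/*")
      ? path === pattern.slice(0, -2) || path.startsWith(pattern.slice(0, -1))
      : path === pattern,
  );
}

// Excluded routes get probability 0; everything else keeps the suite default.
function chaosProbabilityFor(path: string, config: RouteChaosConfig): number {
  return isExcluded(path, config.exclude) ? 0 : config.probability;
}

const routeChaos: RouteChaosConfig = {
  probability: 0.3,
  exclude: ["/health", "/oauth/*"],
};

const healthRate = chaosProbabilityFor("/health", routeChaos); // 0
const ordersRate = chaosProbabilityFor("/orders", routeChaos); // 0.3
```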
**Bug 4: No resilience verification.**
The chaos tests check that injection happened (`injected: true`), not that the system handled it gracefully. There's no measurement of:
- Retry counts
- Circuit breaker state transitions
- Recovery time
- Error propagation depth
**What this means:** Chaos mode is a toy, not a tool. It injects failures but doesn't verify your system survives them.
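For contrast, here is a sketch of a resilience assertion: it counts retries, records breaker transitions, and checks how many calls actually reached the failing dependency. `ObservedBreaker` and `runWithRetries` are illustrative stand-ins, not APOPHIS features or the demo service's code.

```typescript
type BreakerState = "closed" | "open";

// A deliberately tiny circuit breaker that records its state transitions.
class ObservedBreaker {
  state: BreakerState = "closed";
  readonly transitions: BreakerState[] = ["closed"];
  private failures = 0;
  private readonly threshold: number;

  constructor(threshold: number) {
    this.threshold = threshold;
  }

  call<T>(fn: () => T): T {
    if (this.state === "open") throw new Error("circuit open");
    try {
      const value = fn();
      this.failures = 0;
      return value;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.threshold) {
        this.state = "open";
        this.transitions.push("open");
      }
      throw err;
    }
  }
}

interface ResilienceReport {
  attempts: number;
  succeeded: boolean;
  transitions: BreakerState[];
}

function runWithRetries(
  breaker: ObservedBreaker,
  fn: () => unknown,
  maxAttempts: number,
): ResilienceReport {
  let attempts = 0;
  let succeeded = false;
  while (attempts < maxAttempts && !succeeded) {
    attempts += 1;
    try {
      breaker.call(fn);
      succeeded = true;
    } catch {
      // keep retrying until the attempt budget runs out
    }
  }
  return { attempts, succeeded, transitions: [...breaker.transitions] };
}

// A dependency that always fails, standing in for an injected fault.
let dependencyCalls = 0;
const failingInventory = (): never => {
  dependencyCalls += 1;
  throw new Error("inventory unavailable");
};

const resilience = runWithRetries(new ObservedBreaker(3), failingInventory, 5);
```

Here the caller makes five attempts, but only three reach the dependency before the breaker opens. That difference is the behavior a chaos test should assert on, rather than `injected: true`.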
---
### 3. Production Fidelity: C (Do contracts reflect actual user behavior?)
**What works:**
- Property-based testing with fast-check generates edge cases manual tests miss
- Category system (constructor/mutator/observer/destructor) aligns with DDD aggregates
**What's broken:**
- Category inference (`src/domain/category.ts:10-48`) hardcodes exact path matches like `/health`, `/ping`, `/login`. Any variation (`/api/health`, `/v1/health`) is misclassified as non-utility.
- APOSTL formula language has no arithmetic operators. You cannot write `total == quantity * 10`.
- No support for realistic traffic patterns, load profiles, or user journeys
- Contracts are static — they don't evolve based on production traffic analysis
**What this means:** Your contracts test what you *think* users do, not what they *actually* do. Without production telemetry feedback, contracts drift from reality.
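The exact-match problem in category inference has a cheap mitigation: classify on the last path segment instead of the whole path. This is a sketch under that assumption; `isUtilityRoute` and `UTILITY_SEGMENTS` are hypothetical names, not the contents of `src/domain/category.ts`.

```typescript
// Segment names taken from the paths mentioned above.
const UTILITY_SEGMENTS = new Set(["health", "ping", "login"]);

// Treats a route as a utility route when its final segment is a known name,
// so versioned or prefixed variants are classified the same way.
function isUtilityRoute(path: string): boolean {
  const withoutQuery = path.split("?")[0] ?? "";
  const segments = withoutQuery.split("/").filter((s) => s.length > 0);
  const last = segments[segments.length - 1];
  return last !== undefined && UTILITY_SEGMENTS.has(last);
}

const plainHealth = isUtilityRoute("/health"); // true
const prefixedHealth = isUtilityRoute("/api/health"); // true
const versionedHealth = isUtilityRoute("/v1/health"); // true
const orders = isUtilityRoute("/orders"); // false
```

The trade-off is over-matching: a domain route that happens to end in `login` would also be classified as a utility route, so an explicit per-route override is still worth having.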
---
### 4. Operational Burden: C- (Will this slow down CI/CD?)
**Performance numbers from the codebase:**
- Route discovery: ~0.5µs per route
- Formula parsing: ~5µs per formula (cached)
- Incremental cache: 13-20x speedup for unchanged routes
- 11K routes: ~39ms discovery, 1.4s total overhead
**But:**
- Runtime hooks (`preHandler`, `onSend`) run on EVERY request in production
- Formula parsing happens on first request per route (cold start penalty)
- Extension registry has 475 lines with topological sorting, health checks, redaction
- 915-line hand-rolled charCodeAt parser is unmaintainable
**What this means:** For high-traffic APIs, the runtime hook overhead is non-trivial. The incremental cache helps CI, but the framework complexity increases maintenance burden.
---
### 5. Flake Detection: B- (Is this solving the right problem?)
- Only runs in `NODE_ENV=test` — won't catch flakes in staging
- 4 reruns by default may be slow for large suites
- Reruns WITHOUT chaos, so chaos-induced flakiness is masked
- The real problem: chaos mode itself is non-deterministic because of the RNG seeding issues described under Bug 2
**What this means:** Flake detection solves a real problem but the implementation needs work. More importantly, it shouldn't be needed if chaos mode were deterministic.
---
### 6. Contract Testing vs Observability: COMPLEMENT, NOT REPLACE
**This is the philosophical core of my assessment.**
APOPHIS wants to be both a testing framework AND a production guardrail. But these are different jobs:
- **Contract testing** catches API drift and schema violations at test time. It's about "did we build what we agreed to?"
- **Observability** catches runtime behavior, performance, and user experience. It's about "what's actually happening?"
APOPHIS runtime hooks (`src/infrastructure/hook-validator.ts`) attempt to bridge this gap by validating contracts on every request. But:
- They throw 500 errors in production for formula parse errors
- They add overhead to every request
- They don't integrate with production telemetry
**The right model:** Contracts in CI/CD. Observability in production. Feedback loops between them.
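If runtime validation stays on in production anyway, it should at least fail safe. The sketch below is framework-agnostic and does not use Fastify's hook API or APOPHIS's validator; `makeFailSafe`, `Validator`, and `Reporter` are hypothetical names.

```typescript
interface ValidationOutcome {
  status: "passed" | "violated" | "skipped";
  reason?: string;
}

type Validator = (payload: unknown) => ValidationOutcome;
type Reporter = (message: string) => void;

// Wraps a validator so that its own failures (for example a formula parse
// error) are reported and skipped instead of surfacing as a 500 to the user.
function makeFailSafe(validate: Validator, report: Reporter): Validator {
  return (payload) => {
    try {
      const outcome = validate(payload);
      if (outcome.status === "violated") {
        report(`contract violation: ${outcome.reason ?? "unknown"}`);
      }
      return outcome;
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      report(`contract check skipped: ${message}`);
      return { status: "skipped", reason: message };
    }
  };
}

// Example: a validator that crashes, as a parse error would.
const reportedMessages: string[] = [];
const brokenValidator: Validator = () => {
  throw new Error("formula parse error");
};
const safeValidator = makeFailSafe(brokenValidator, (message) => {
  reportedMessages.push(message);
});

const safeOutcome = safeValidator({ orderId: 1 });
```

The request proceeds, the failure is recorded, and the reporter is the natural place to attach telemetry.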
---
### 7. Plugin Contract System: B (Does it help or hurt in production?)
**What works:**
- Built-in contracts for common Fastify plugins (`src/domain/plugin-contracts.ts:176-212`)
- Pattern matching for route applicability (`/api/**` matches `/api/users`)
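For reference, route pattern matching of this kind can be sketched in a few lines. The semantics here are an assumption, not confirmed against APOPHIS: `*` matches exactly one path segment and `**` matches any remainder. `patternToRegExp` is a hypothetical helper.

```typescript
// Converts a route pattern to a RegExp under the assumed semantics:
// "*" matches one path segment, "**" matches the rest of the path.
function patternToRegExp(pattern: string): RegExp {
  // Escape every regex metacharacter except "*".
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  const source = escaped
    .replace(/\*\*/g, "\u0000") // protect "**" before handling single "*"
    .replace(/\*/g, "[^/]+")
    .replace(/\u0000/g, ".*");
  return new RegExp(`^${source}$`);
}

function routeMatches(pattern: string, path: string): boolean {
  return patternToRegExp(pattern).test(path);
}

const apiUsers = routeMatches("/api/**", "/api/users"); // true
const nestedOrder = routeMatches("/users/*/orders", "/users/42/orders"); // true
const otherPrefix = routeMatches("/api/**", "/apix/users"); // false
```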
**What's concerning:**
- 220 lines for registry + composition, adds cognitive load
- No phase-aware testing (can't actually test `onRequest` vs `onSend` separately)
- `console.warn` for missing extensions, which is noisy in production
- No way to validate that plugins actually implement the hooks they claim
**What this means:** Plugin contracts are a good idea for large codebases with many plugins. But the implementation is complex for v1.1, and the value isn't fully realized without phase-aware testing.
---
## Tweet Thread
```
1/ I just spent a day with APOPHIS, a contract-driven testing framework for Fastify.
It's ambitious. It's also broken in ways that matter for production systems.
2/ The good: Schema-embedded contracts with property-based test generation.
Fast-check arbitraries from JSON Schema. Stateful sequences. Incremental caching.
```
2. Make runtime hooks fail-safe (never crash production for contract violations)
3. Add OpenTelemetry integration for trace correlation
4. Simplify extension system or provide higher-level APIs
5. Fix APOSTL to support arithmetic and common string operations
**When it might work:**
- Small APIs with simple CRUD operations
- Teams already using Fastify and comfortable with schema-driven development
- Projects where property-based testing provides high value
- When used WITHOUT runtime validation in production (only in CI)
**The framework needs a v2.0 that either:**
- Simplifies dramatically (drop chaos, drop extensions, focus on core contract testing)
- OR invests heavily in safety guarantees, observability integration, and deterministic chaos
As it stands, APOPHIS is a promising research project that teaches us a lot about the boundary between testing and observability — but it doesn't safely cross that boundary yet.
---
*Assessment by Charity Majors, co-founder Honeycomb.io*