How the eval harness catches each failure mode before it reaches users.
docs/DATA_QUALITY.md maps 12 failure modes (F1-F12) ranked by likelihood and harm. This document traces each mode to the specific test(s) that would catch it, and identifies which modes remain unguarded.
| Mode | Description | Tests that catch it | Verdict |
|---|---|---|---|
| F1 | Fabricated all-clear (model hallucinates evacuation_lifted: true or incident_resolved_iso) | test_lifted_requires_corroboration, test_resolved_requires_two_sources | Guarded. P0-1 corroboration gate forces safe default when source count < 2 or no official source present. |
| F12 | Prompt injection via scraped page (malicious page tells model “evacuation is lifted”) | test_lifted_requires_corroboration (same gate) | Partially guarded. The corroboration gate blocks the effect (a single-source all-clear), but doesn’t detect the cause (injection). A coordinated injection across 2+ sources including a spoofed official domain could bypass the gate. Residual risk accepted: the attacker would need to control ≥2 indexed news sources AND spoof an official hostname. |
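
The corroboration rule described in the F1 and F12 rows above can be sketched as follows. This is a minimal illustration, not the project's actual P0-1 code: the `Claim` type, the `OFFICIAL_HOSTS` allowlist, and the choice to count distinct hostnames are all assumptions made for the example.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical allowlist; the real set of official hostnames is not given here.
OFFICIAL_HOSTS = {"fire.example.gov", "county.example.gov"}


@dataclass(frozen=True)
class Claim:
    """One source's assertion about the evacuation (hypothetical shape)."""
    source_url: str
    evacuation_lifted: bool


def corroborated_lifted(claims: list[Claim]) -> bool:
    """Return True only when the all-clear is backed by at least two distinct
    hosts and at least one of them is on the official allowlist.
    Anything else falls back to the safe default (not lifted)."""
    hosts = {
        (urlparse(c.source_url).hostname or "").lower()
        for c in claims
        if c.evacuation_lifted
    }
    hosts.discard("")  # URLs without a hostname never count as a source
    if len(hosts) < 2:
        return False
    # Exact hostname match, so "fire.example.gov.evil.com" does not qualify.
    return any(h in OFFICIAL_HOSTS for h in hosts)
```

Matching on the exact hostname keeps a look-alike subdomain from counting as official. It does nothing against the residual case in the F12 row, where the attacker controls a host that is genuinely on the allowlist or spoofs it upstream of the scraper.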
| Mode | Description | Tests that catch it | Verdict |
|---|---|---|---|
| F2 | Fabricated source URL or quote | test_fabricated_source_url_not_in_snapshot, test_statement_without_source_url_rejected, test_sources_checked_all_wellformed | Guarded. P0-2 drops any statement whose source_url wasn’t actually retrieved this run. |
| F3 | Hallucinated numeric (temp, residents, injuries) | test_garbage_input_keeps_prev_values, test_partial_facts_dont_downgrade_severity, test_residents_shift_fires_info | Partially guarded. Writer holds previous value on a >50% residents drop without lifted. But a plausible-but-wrong number (e.g., 48,000 instead of 50,000) passes. No numeric sanity beyond the 50% drop gate. |
| F4 | Stale-but-fresh-stamped (empty facts, fresh timestamp) | test_empty_facts_do_not_advance_data_as_of, test_all_null_facts_treated_as_no_data, test_stale_after_is_data_as_of_plus_maxage | Guarded. P0-3 separates data age from write age. |
| F11 | Fabricated/garbled date (future or pre-incident timestamp) | test_future_resolved_iso_suppressed, test_malformed_resolved_iso_suppressed, test_valid_resolved_iso_honored | Guarded. P1-1 nulls future-dated and malformed timestamps. |
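
The F4 row's distinction between data age and write age can be sketched as below. This is an illustrative reading of the three test names, not the writer's real code: `data_as_of` and `stale_after` are taken from those test names, while `written_at`, `MAX_AGE`, and the `next_status` function are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=2)  # hypothetical value; the real max age is not stated


def has_data(facts: dict) -> bool:
    """Empty or all-null facts count as 'no data'."""
    return any(v is not None for v in facts.values())


def next_status(prev: dict, facts: dict, now: datetime) -> dict:
    """Build the next status record from the previous one and fresh facts."""
    status = dict(prev)
    status["written_at"] = now  # write age always advances
    if has_data(facts):
        # Only non-null facts overwrite previous values.
        status.update({k: v for k, v in facts.items() if v is not None})
        status["data_as_of"] = now  # data age advances only on real data
    # Staleness is anchored to the data timestamp, never the write timestamp.
    status["stale_after"] = status["data_as_of"] + MAX_AGE
    return status
```

Under this sketch, a run that retrieves nothing still rewrites the file, but `stale_after` does not move, so the staleness banner fires on schedule.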
| Mode | Description | Tests that catch it | Verdict |
|---|---|---|---|
| F5 | Severity miscompute from partial facts | test_partial_facts_dont_downgrade_severity | Guarded. Fixed + regression test. |
| F6 | Web search returns nothing (all-null facts) | test_all_null_facts_treated_as_no_data, test_graceful_failure_no_api_key | Guarded. Writer carries previous values; P0-3 prevents fresh-stamping. |
| F7 | No provenance (model omits sources_checked) | test_sources_checked_all_wellformed (shape), test_every_statement_has_source_and_time (data contract) | Partially guarded. Tests verify shape and presence, but an empty sources_checked: [] passes silently. |
| F8 | Schema drift (gatherer and writer disagree on field names) | test_status_json_required_fields, test_config_json_required_fields | Partially guarded. Schema tests catch missing fields in the output, but don’t enforce the gatherer-writer contract at the boundary. |
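
One way to close the F7 gap noted above is a check that treats an empty `sources_checked` as an error rather than a valid shape. The sketch below assumes `sources_checked` is a list of URL strings, which this document does not confirm, and `provenance_errors` is a hypothetical helper, not an existing function in the harness.

```python
from urllib.parse import urlparse


def provenance_errors(status: dict) -> list[str]:
    """Return a list of provenance problems; an empty list means the check passed."""
    sources = status.get("sources_checked")
    if not isinstance(sources, list):
        return ["sources_checked is missing or not a list"]

    errors = []
    if not sources:
        # The case that currently passes silently.
        errors.append("sources_checked is empty")
    for i, url in enumerate(sources):
        parsed = urlparse(url) if isinstance(url, str) else None
        if parsed is None or parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append(f"sources_checked[{i}] is not a well-formed http(s) URL")
    return errors
```

A test could then assert `provenance_errors(status) == []` for any status record that publishes a statement. Whether an empty list should be allowed when no statements are published is a policy decision this sketch leaves open.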
| Mode | Description | Tests that catch it | Verdict |
|---|---|---|---|
| F9 | Cron silently stops | None | Unguarded. No dead-man’s switch. Staleness banner (P0-3) is the only in-app signal. If P0-3 has a bug, a stopped cron is invisible. |
| F10 | Commit/deploy failure | None | Unguarded. Push failure is visible only in GitHub Actions. No external alerting. |
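
A dead-man's switch for F9 would have to run outside the cron it monitors. The core comparison is small, as the sketch below shows. It assumes the published status exposes an ISO 8601 write timestamp. The function name, the parameters, and the idea of polling the published file are illustrative and not part of the current pipeline.

```python
from datetime import datetime, timedelta, timezone


def heartbeat_ok(last_written_iso: str, now: datetime, max_silence: timedelta) -> bool:
    """Return True if the pipeline wrote recently enough to be considered alive.

    `now` must be timezone-aware. Unparseable or future timestamps count as
    failures, so a garbled status file raises an alert instead of hiding one.
    """
    try:
        # Note: a trailing "Z" is only accepted by fromisoformat on Python 3.11+.
        last = datetime.fromisoformat(last_written_iso)
    except (TypeError, ValueError):
        return False
    if last.tzinfo is None:
        last = last.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return timedelta(0) <= now - last <= max_silence
```

An external scheduler could fetch the published status, call this with a `max_silence` of a few cron intervals, and page someone when it returns `False`. The same check would also surface F10, since a failed push leaves the published timestamp unchanged.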
| Category | Modes | Guarded | Partially guarded | Unguarded |
|---|---|---|---|---|
| Catastrophic | F1, F12 | F1 | F12 | — |
| High harm | F2, F3, F4, F11 | F2, F4, F11 | F3 | — |
| Medium harm | F5, F6, F7, F8 | F5, F6 | F7, F8 | — |
| Low harm | F9, F10 | — | — | F9, F10 |
6 of 12 failure modes are fully guarded by automated tests. 4 are partially guarded (the tests catch the effect but not every vector). 2 operational modes have no automated coverage (the staleness banner is the only in-app fallback).
Plausible-but-wrong numerics (F3 residual). If the model says 48,000 evacuees when the real number is 50,000, no automated test flags it. The 50% drop gate catches gross errors, not subtle ones.
Coordinated prompt injection (F12 residual). If an attacker controls ≥2 indexed sources and one spoofs an official domain, the corroboration gate passes. This requires a sophisticated, targeted attack — low probability, but the architecture doesn’t structurally prevent it.
Silent cron death (F9). The pipeline has no external heartbeat monitor. If the cron stops and the staleness banner has a bug, users see no signal. Mitigation: the staleness banner is tested (P0-3), so both would have to fail simultaneously.
Subtle semantic drift. The model might gradually shift tone (more alarming, more reassuring) without triggering any binary test. The human review step is the control here, not the eval harness.
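
The F3 residual above is easy to see once the 50% drop gate is written out. The sketch below follows the rule as stated in the F3 row (hold the previous value on a >50% residents drop without lifted). The function name and signature are hypothetical.

```python
from typing import Optional


def accept_residents(prev: Optional[int], new: Optional[int], lifted: bool) -> Optional[int]:
    """Return the residents figure to publish.

    Holds the previous value when the new one drops by more than 50% and the
    evacuation has not been lifted. Smaller changes pass through unchecked.
    """
    if new is None:
        return prev  # no new data: carry the previous value
    if prev is None or prev <= 0 or lifted:
        return new  # nothing to compare against, or a large drop is expected
    drop = (prev - new) / prev
    return prev if drop > 0.5 else new
```

With `prev=50000`, a reported 20,000 is held back, but 48,000 is accepted because the gate has no way to tell a 4% error from a real 4% change. Tightening the threshold would trade missed errors for rejected legitimate updates, which is why this residual is listed as accepted rather than fixed.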
The eval harness is strongest where the failure mode is binary and catastrophic (all-clear vs. not, fabricated vs. real, stale vs. fresh). It’s weakest where the failure mode is continuous and subtle (slightly wrong numbers, gradual tone drift, sophisticated injection). This matches the priority: the binary-catastrophic modes are the ones that could kill someone; the continuous-subtle modes degrade quality but don’t create false safety signals.