Boring Demo Generator
Precision is cute; contingency is the point. Give me a claim, I’ll hand you a demo that survives a fluorescent room and a hostile audience.
Tip: paste this plan into your changelog. If someone can’t reproduce it after coffee, it wasn’t boring enough.
Make hype survive daylight.
Give me a falsifiable claim, a real context, and a timer. I’ll spit out a boring, reproducible demo plan that doesn’t need narration or vibes.
Short version:
- Claim + Context = what you’re proving and where it breaks first.
- Environment = how honest you’re being about reality.
- Risk tolerance = guardrails + refusal behaviors.
- Time-box = demo size (forces decisions).
- Metrics + Constants = what you watch vs what you freeze.
- Artifacts = receipts you’ll publish so a stranger can rerun it.
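The fields above can be sketched as one plain input record. This is a hypothetical schema for illustration (the field and value names are invented, not the tool’s actual API):

```python
from dataclasses import dataclass, field

# Hypothetical input record for a demo plan; names mirror the
# form fields described above, not any real storage format.
@dataclass
class DemoPlanInput:
    claim: str                  # falsifiable sentence you're proving
    context: str                # where the claim breaks first
    environment: str            # "lab", "staging", or "live"
    risk_tolerance: str         # "low", "medium", or "high"
    time_box_minutes: int       # caps demo scope, forces decisions
    metrics: list = field(default_factory=list)    # watched live
    constants: list = field(default_factory=list)  # frozen, no fiddling
    artifacts: list = field(default_factory=list)  # receipts to publish

# Example values are made up for the sketch.
plan = DemoPlanInput(
    claim="Detector flags an empty shelf within 2 s under aisle lighting",
    context="Fluorescent light, fixed 1080p camera, 1.2 m aisle",
    environment="staging",
    risk_tolerance="low",
    time_box_minutes=20,
    metrics=["p95 latency", "FN/FP counts"],
    constants=["model hash", "camera height"],
    artifacts=["config snapshot", "one-page receipt"],
)
```

Anything not listed in `constants` is, per the Context rule below, a variable.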
What each field actually does
- Claim – The falsifiable sentence the whole plan orbits. Weak claim ⇒ weak plan. Write it like a bug you can close.
- Context – Constraints that make the claim non-magical (lighting, device, language, aisle width). If it’s not here, it’s “variable.”
- Environment
- Lab/Sim → permissive, fast iteration, more “controls,” less credibility.
- Staging/Prod Clone → realistic constraints, fewer excuses.
- Live Floor (Guardrails On) → harsh light; adds explicit safety lines.
- Risk tolerance
- Low → inserts “fail-safe / show refusal” language and spotter/kill-switch lines.
- Medium → allows narrow ranges, retries with notes.
- High → you’re signing for paperwork; plan says that out loud.
- Time-box – Caps scope. The generator builds a smallest-falsifier path that fits your minutes.
- Metrics – What gets logged live (latency, FN/FP, near-misses, confidence drift). Omit this and you’ll narrate instead of prove.
- Hold these constant – The “no fiddling” list (model hash, firmware, camera height…). If it wiggles here, the result’s trash.
- Artifacts (checkboxes) – Appends receipt steps to the plan:
- Signed at capture → filename + timestamp + operator, up front.
- Config snapshot → model/version/settings/hash, reproducibility anchor.
- Raw media + logs → audit trail, not vibes.
- Hashes → immutability receipts.
- One-page receipt → the shareable proof (links everything).
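The “Hashes” checkbox above is plain content addressing: hash the config snapshot and raw media, publish the digests, and any later edit is detectable. A minimal sketch with the standard library (file contents and config keys are made up):

```python
import hashlib
import json

def sha256_hex(data: bytes) -> str:
    """Immutability receipt: any later edit to the bytes changes the hash."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical config snapshot -- the reproducibility anchor.
# Sorted keys make the serialization, and therefore the hash, stable.
config = {"model": "detector-v3", "threshold": 0.72, "firmware": "1.4.2"}
config_bytes = json.dumps(config, sort_keys=True).encode()

receipt = {
    "config_hash": sha256_hex(config_bytes),
    "clip_hash": sha256_hex(b"raw demo footage bytes"),
}

# A stranger re-hashes the published artifacts; a mismatch means
# something changed after capture.
assert receipt["config_hash"] == sha256_hex(config_bytes)
```

The one-page receipt then just links these digests next to the media and logs they cover.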
Pass / Fail (how the tool judges you)
- Pass: A skeptical stranger can watch the clip and match counts to the log without narration.
- Fail: Hidden knob twists, one-off “it worked once,” or any unlogged intervention.
- Borderline calls resolve down, not up: if a surprise happens and you didn’t stop to note it, that’s a fail. You moved the goalposts.
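The judging rules above reduce to a small predicate. A sketch of the logic, not the tool’s actual implementation:

```python
def judge(counts_match_log: bool,
          unlogged_interventions: int,
          unnoted_surprises: int) -> str:
    """Pass/fail per the rules above: any unlogged intervention or
    unnoted surprise is a fail, no matter how good the clip looks."""
    if unlogged_interventions > 0 or unnoted_surprises > 0:
        return "fail"  # hidden knob twists and moved goalposts
    return "pass" if counts_match_log else "fail"

assert judge(True, 0, 0) == "pass"   # clip matches log, nothing hidden
assert judge(True, 1, 0) == "fail"   # worked, but via an unlogged tweak
assert judge(True, 0, 1) == "fail"   # surprise happened, nobody noted it
```

Note the asymmetry: matching counts can’t rescue a run with hidden interventions, but clean logging can’t rescue counts that don’t match.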
Outputs & buttons
- Generate Plan → builds Setup → Procedure → Pass/Fail → Safety → Receipt Chain.
- Export Markdown / JSON → paste into changelogs/PRs.
- Share link → encodes the inputs in the URL so teammates can open the page with fields prefilled.
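The share link is ordinary query-string encoding: inputs go into the URL, and decoding them recovers the prefilled fields. A sketch with `urllib.parse`; the domain, path, and parameter names are invented:

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical share-link shape; real parameter names may differ.
fields = {
    "claim": "flags empty shelf in 2s",
    "env": "staging",
    "risk": "low",
    "timebox": "20",
}
link = "https://example.com/demo-gen?" + urlencode(fields)

# Round trip: a teammate's browser decodes the query string and
# prefills the form with the same inputs.
decoded = {k: v[0] for k, v in parse_qs(urlsplit(link).query).items()}
assert decoded == fields
```

Because the state lives in the URL itself, the link works with no account and no server-side storage.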
Presets (use these when you don’t want to think)
- Smoke test (kills hype fast)
- Env: Lab/Sim · Risk: Low · Time-box: 15 min
- Metrics: latency; failure counts; refusal behavior
- Artifacts: Signed at capture, Config snapshot, One-pager
- Stakeholder demo (credible, safe)
- Env: Staging/Prod Clone · Risk: Low · Time-box: 20–30 min
- Metrics: p95 latency; TP/FP/FN; interventions
- Artifacts: + Raw media + logs
- Precision benchmark (nerd fight ready)
- Env: Staging · Risk: Medium · Time-box: 30 min
- Metrics: PR curve deltas; confidence drift; throttle events
- Artifacts: + Hashes for inputs/outputs; dataset slice manifest
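The presets above are just frozen field values. One way to hold them as plain data (the structure is an illustration; values are copied from the list above, with the stakeholder time-box at its upper bound):

```python
# Presets as data: each is a bundle of prefilled field values.
PRESETS = {
    "smoke_test": {
        "env": "lab", "risk": "low", "time_box": 15,
        "metrics": ["latency", "failure counts", "refusal behavior"],
    },
    "stakeholder_demo": {
        "env": "staging", "risk": "low", "time_box": 30,
        "metrics": ["p95 latency", "TP/FP/FN", "interventions"],
    },
    "precision_benchmark": {
        "env": "staging", "risk": "medium", "time_box": 30,
        "metrics": ["PR curve deltas", "confidence drift",
                    "throttle events"],
    },
}

def load_preset(name: str) -> dict:
    # Return a copy so callers can tweak fields without
    # mutating the shared preset.
    return dict(PRESETS[name])

assert load_preset("smoke_test")["time_box"] == 15
```

Loading a preset and then editing one field is the intended flow: start from a known-good bundle, change only what your claim requires.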
Mental model
Freeze fewer things than you want, log more than you think, and make the result boring enough to survive a hostile room. If you need to narrate, you didn’t prove it.