Boring Demo Generator
Precision is cute; contingency is the point. Give me a claim, I’ll hand you a demo that survives a fluorescent room and a hostile audience.
Tip: paste this plan into your changelog. If someone can’t reproduce it after coffee, it wasn’t boring enough.
Make hype survive daylight.
Give me a falsifiable claim, a real context, and a timer. I’ll spit out a boring, reproducible demo plan that doesn’t need narration or vibes.
Short version:
- Claim + Context = what you’re proving and where it breaks first.
- Environment = how honest you’re being about reality.
- Risk tolerance = guardrails + refusal behaviors.
- Time-box = demo size (forces decisions).
- Metrics + Constants = what you watch vs what you freeze.
- Artifacts = receipts you’ll publish so a stranger can rerun it.
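The fields above can be sketched as one plain input record. This is a hypothetical schema for illustration (the field and value names are invented, not the tool’s actual API):

```python
from dataclasses import dataclass, field

# Hypothetical input record for a demo plan; names mirror the
# form fields described above, not any real storage format.
@dataclass
class DemoPlanInput:
    claim: str                  # falsifiable sentence you're proving
    context: str                # where the claim breaks first
    environment: str            # "lab", "staging", or "live"
    risk_tolerance: str         # "low", "medium", or "high"
    time_box_minutes: int       # caps demo scope, forces decisions
    metrics: list = field(default_factory=list)    # watched live
    constants: list = field(default_factory=list)  # frozen, no fiddling
    artifacts: list = field(default_factory=list)  # receipts to publish

# Example values are made up for the sketch.
plan = DemoPlanInput(
    claim="Detector flags an empty shelf within 2 s under aisle lighting",
    context="Fluorescent light, fixed 1080p camera, 1.2 m aisle",
    environment="staging",
    risk_tolerance="low",
    time_box_minutes=20,
    metrics=["p95 latency", "FN/FP counts"],
    constants=["model hash", "camera height"],
    artifacts=["config snapshot", "one-page receipt"],
)
```

Anything not listed in `constants` is, per the Context rule below, a variable.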
What each field actually does
- Claim – The falsifiable sentence the whole plan orbits. Weak claim ⇒ weak plan. Write it like a bug you can close.
- Context – Constraints that make the claim non-magical (lighting, device, language, aisle width). If it’s not here, it’s “variable.”
- Environment
- Lab/Sim → permissive, fast iteration, more “controls,” less credibility.
- Staging/Prod Clone → realistic constraints, fewer excuses.
- Live Floor (Guardrails On) → harsh light; adds explicit safety lines.
- Risk tolerance
- Low → inserts “fail-safe / show refusal” language and spotter/kill-switch lines.
- Medium → allows narrow ranges, retries with notes.
- High → you’re signing for paperwork; plan says that out loud.
- Time-box – Caps scope. The generator builds a smallest-falsifier path that fits your minutes.
- Metrics – What gets logged live (latency, FN/FP, near-misses, confidence drift). Omit this and you’ll narrate instead of prove.
- Hold these constant – The “no fiddling” list (model hash, firmware, camera height…). If it wiggles here, the result’s trash.
- Artifacts (checkboxes) – Appends receipt steps to the plan:
- Signed at capture → filename + timestamp + operator, up front.
- Config snapshot → model/version/settings/hash, reproducibility anchor.
- Raw media + logs → audit trail, not vibes.
- Hashes → immutability receipts.
- One-page receipt → the shareable proof (links everything).
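The “Hashes” checkbox above is plain content addressing: hash the config snapshot and raw media, publish the digests, and any later edit is detectable. A minimal sketch with the standard library (file contents and config keys are made up):

```python
import hashlib
import json

def sha256_hex(data: bytes) -> str:
    """Immutability receipt: any later edit to the bytes changes the hash."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical config snapshot -- the reproducibility anchor.
# Sorted keys make the serialization, and therefore the hash, stable.
config = {"model": "detector-v3", "threshold": 0.72, "firmware": "1.4.2"}
config_bytes = json.dumps(config, sort_keys=True).encode()

receipt = {
    "config_hash": sha256_hex(config_bytes),
    "clip_hash": sha256_hex(b"raw demo footage bytes"),
}

# A stranger re-hashes the published artifacts; a mismatch means
# something changed after capture.
assert receipt["config_hash"] == sha256_hex(config_bytes)
```

The one-page receipt then just links these digests next to the media and logs they cover.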
Pass / Fail (how the tool judges you)
- Pass: A skeptical stranger can watch the clip and match counts to the log without narration.
- Fail: Hidden knob twists, one-off “it worked once,” or any unlogged intervention.
- Borderline calls resolve down, not up: if a surprise happens and you didn’t stop to note it, that’s a fail. You moved the goalposts.
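The judging rules above reduce to a small predicate. A sketch of the logic, not the tool’s actual implementation:

```python
def judge(counts_match_log: bool,
          unlogged_interventions: int,
          unnoted_surprises: int) -> str:
    """Pass/fail per the rules above: any unlogged intervention or
    unnoted surprise is a fail, no matter how good the clip looks."""
    if unlogged_interventions > 0 or unnoted_surprises > 0:
        return "fail"  # hidden knob twists and moved goalposts
    return "pass" if counts_match_log else "fail"

assert judge(True, 0, 0) == "pass"   # clip matches log, nothing hidden
assert judge(True, 1, 0) == "fail"   # worked, but via an unlogged tweak
assert judge(True, 0, 1) == "fail"   # surprise happened, nobody noted it
```

Note the asymmetry: matching counts can’t rescue a run with hidden interventions, but clean logging can’t rescue counts that don’t match.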
Outputs & buttons
- Generate Plan → builds Setup → Procedure → Pass/Fail → Safety → Receipt Chain.
- Export Markdown / JSON → paste into changelogs/PRs.
- Share link → encodes the inputs in the URL so teammates can open the page with fields prefilled.
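The share link is ordinary query-string encoding: inputs go into the URL, and decoding them recovers the prefilled fields. A sketch with `urllib.parse`; the domain, path, and parameter names are invented:

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical share-link shape; real parameter names may differ.
fields = {
    "claim": "flags empty shelf in 2s",
    "env": "staging",
    "risk": "low",
    "timebox": "20",
}
link = "https://example.com/demo-gen?" + urlencode(fields)

# Round trip: a teammate's browser decodes the query string and
# prefills the form with the same inputs.
decoded = {k: v[0] for k, v in parse_qs(urlsplit(link).query).items()}
assert decoded == fields
```

Because the state lives in the URL itself, the link works with no account and no server-side storage.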
Presets (use these when you don’t want to think)
- Smoke test (kills hype fast)
- Env: Lab/Sim · Risk: Low · Time-box: 15 min
- Metrics: latency; failure counts; refusal behavior
- Artifacts: Signed at capture, Config snapshot, One-pager
- Stakeholder demo (credible, safe)
- Env: Staging/Prod Clone · Risk: Low · Time-box: 20–30 min
- Metrics: p95 latency; TP/FP/FN; interventions
- Artifacts: + Raw media + logs
- Precision benchmark (nerd fight ready)
- Env: Staging · Risk: Medium · Time-box: 30 min
- Metrics: PR curve deltas; confidence drift; throttle events
- Artifacts: + Hashes for inputs/outputs; dataset slice manifest
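The presets above are just frozen field values. One way to hold them as plain data (the structure is an illustration; values are copied from the list above, with the stakeholder time-box at its upper bound):

```python
# Presets as data: each is a bundle of prefilled field values.
PRESETS = {
    "smoke_test": {
        "env": "lab", "risk": "low", "time_box": 15,
        "metrics": ["latency", "failure counts", "refusal behavior"],
    },
    "stakeholder_demo": {
        "env": "staging", "risk": "low", "time_box": 30,
        "metrics": ["p95 latency", "TP/FP/FN", "interventions"],
    },
    "precision_benchmark": {
        "env": "staging", "risk": "medium", "time_box": 30,
        "metrics": ["PR curve deltas", "confidence drift",
                    "throttle events"],
    },
}

def load_preset(name: str) -> dict:
    # Return a copy so callers can tweak fields without
    # mutating the shared preset.
    return dict(PRESETS[name])

assert load_preset("smoke_test")["time_box"] == 15
```

Loading a preset and then editing one field is the intended flow: start from a known-good bundle, change only what your claim requires.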
Mental model
Freeze fewer things than you want, log more than you think, and make the result boring enough to survive a hostile room. If you need to narrate, you didn’t prove it.