Tiny, replayable tables from stamped runs
Purpose. Present compact, stamp-replayable results for each chosen task. Replace the example numbers with your stamped CSV replays; keep tables minimal and comparable across vendors.
Decoding rerank (short answers; beam or candidates)
Dataset: 500 prompts, temperature 0.7, beam width 5. Selector A: argmax(prob). Selector B: RSI (or RSI_env) bands.
| Metric | Baseline (A) | SSM-AI (B) | Δ (B−A) |
|---|---|---|---|
| First-pass correctness (%) | 61.8 | 66.4 | +4.6 pp |
| Over-confident errors in A+/A++ (%) | 22.3 | 8.7 | −13.6 pp |
| Mean retries per prompt | 0.42 | 0.29 | −31% |
| Latency p50 / p95 (s) | 1.8 / 3.9 | 1.7 / 3.7 | −0.1 / −0.2 |
| Band histogram, A++/A+/A0/A−/A-- (%) | 9/28/52/8/3 | 12/34/47/5/2 | n/a |
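The Δ column uses two conventions: an absolute difference for percentage metrics and latency, and a relative change for counts such as retries. A minimal sketch of both, checked against the example numbers in this table; the helper names `delta_abs` and `delta_rel` are illustrative and not part of SSM-AI.

```python
def delta_abs(a: float, b: float, ndigits: int = 1) -> float:
    """Absolute difference B - A in the metric's own units (e.g. percentage points)."""
    return round(b - a, ndigits)


def delta_rel(a: float, b: float) -> int:
    """Relative change (B - A) / A, rounded to a whole percent."""
    return round(100.0 * (b - a) / a)


# Example rows from the decoding-rerank table: (metric, baseline A, SSM-AI B).
rows = [
    ("First-pass correctness (%)", 61.8, 66.4),
    ("Over-confident errors in A+/A++ (%)", 22.3, 8.7),
]
for name, a, b in rows:
    print(f"{name}: {delta_abs(a, b):+.1f} pp")

# Counts are reported as relative change instead.
print(f"Mean retries per prompt: {delta_rel(0.42, 0.29):+d}%")
```

Stating which convention each row uses keeps tables comparable when other vendors replay the same stamps.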
RAG QA (top-k docs + cite integrity)
Selector A: baseline retrieval score. Selector B: RSI pooling of doc alignments (support − penalties).
| Metric | Baseline (A) | SSM-AI (B) | Δ (B−A) |
|---|---|---|---|
| Exact Match / F1 (%) | 48.2 / 63.1 | 50.7 / 65.0 | +2.5 / +1.9 pp |
| Valid citations (%) | 71.0 | 79.6 | +8.6 pp |
| Off-topic responses (%) | 11.4 | 7.2 | −4.2 pp |
| Tokens per solved task (k) | 8.9 | 7.6 | −15% |
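This section does not spell out the pooling rule behind Selector B, so the following is only a sketch under stated assumptions: each document's alignment is `tanh(support − penalty)`, and alignments are pooled as a weighted mean in atanh space and mapped back with `tanh`, which keeps the result strictly inside (−1, 1). The function names and the constants `EPS_A` and `EPS_W` are hypothetical.

```python
import math

EPS_A = 1e-6   # assumed clamp margin so that |a| < 1 before atanh
EPS_W = 1e-12  # assumed guard against a zero total weight


def clamp(a: float) -> float:
    """Keep an alignment strictly inside (-1, 1)."""
    return max(-1.0 + EPS_A, min(1.0 - EPS_A, a))


def doc_alignment(support: float, penalty: float) -> float:
    """Assumed mapping from (support - penalty) evidence to a bounded alignment."""
    return math.tanh(support - penalty)


def pool_alignments(alignments, weights=None) -> float:
    """Weighted mean in atanh space, mapped back with tanh (assumed pooling rule)."""
    if weights is None:
        weights = [1.0] * len(alignments)
    u = sum(w * math.atanh(clamp(a)) for a, w in zip(alignments, weights))
    w_total = sum(weights)
    return math.tanh(u / max(w_total, EPS_W))


# Two hypothetical top-k document sets for one question: (support, penalty) per doc.
set_x = [doc_alignment(1.2, 0.1), doc_alignment(0.8, 0.3)]
set_y = [doc_alignment(1.5, 1.4), doc_alignment(0.2, 0.9)]
best = max([("x", set_x), ("y", set_y)], key=lambda kv: pool_alignments(kv[1]))
print("selected set:", best[0])
```

Only the selector changes here; the retrieved documents and their classical scores stay the same, which is what a paired A/B comparison requires.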
Tool loop (agent micro-workflow: parse → call → verify)
Policy: act on bands of RSI_env (A++/A+/A0/A−/A--).
| Metric | Baseline (A) | SSM-AI band policy (B) | Δ (B−A) |
|---|---|---|---|
| Bad escalations per 1000 | 14.1 | 8.9 | −37% |
| Time-to-first-correct (s) | 12.4 | 10.2 | −18% |
| Retries per task | 0.61 | 0.44 | −28% |
| Calls per solved task | 2.8 | 2.3 | −18% |
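A band policy of this kind can be sketched as a lookup from RSI_env to a band and from the band to an action. The cut points (±0.60, ±0.90) and the action names below are assumptions made for illustration; this section defines neither, so substitute the values recorded in your manifest.

```python
# Assumed band cut points; not taken from this section.
HI, MID = 0.90, 0.60


def band(rsi_env: float) -> str:
    """Map a bounded RSI_env value to one of the five bands (ASCII labels)."""
    if rsi_env >= HI:
        return "A++"
    if rsi_env >= MID:
        return "A+"
    if rsi_env > -MID:
        return "A0"
    if rsi_env > -HI:
        return "A-"
    return "A--"


# Hypothetical actions for the parse -> call -> verify loop.
ACTIONS = {
    "A++": "proceed",
    "A+": "proceed",
    "A0": "verify",
    "A-": "retry",
    "A--": "escalate",
}


def next_action(rsi_env: float) -> str:
    """Pick the next loop action from the band of RSI_env."""
    return ACTIONS[band(rsi_env)]


for value in (0.95, 0.70, 0.10, -0.75, -0.95):
    print(f"{value:+.2f} -> {band(value)} -> {next_action(value)}")
```

Because the policy reads only the band, the tool outputs themselves are untouched, and the baseline arm can run the same loop with a fixed action in place of `next_action`.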
Reporting notes (must hold).
• Collapse parity: phi((m,a)) = m everywhere (classical values unchanged).
• Boundedness: |a|<1, |RSI|<1, |RSI_env|<1.
• Determinism: replaying from stamps (same manifest) reproduces the tables exactly, or within dtype tolerance for floating-point values.
• Paired A/B: identical inputs; only selector/policy differs.
• Band transparency: publish band histogram for RSI or RSI_env.
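The notes above can be turned into small assertions that run before a table is published. This is a sketch, not a reference implementation: the function names, the data shapes (lists of `(m, a)` pairs, dicts of metric values), and the default tolerance are all assumptions.

```python
import math
from collections import Counter

BANDS = ("A++", "A+", "A0", "A-", "A--")


def phi(pair):
    """Collapse map: drop the alignment lane and return the classical value m."""
    m, _a = pair
    return m


def check_collapse(classical, paired) -> bool:
    """Collapse parity: phi((m, a)) == m for every value."""
    return len(classical) == len(paired) and all(
        phi(p) == m for m, p in zip(classical, paired)
    )


def check_bounded(values) -> bool:
    """Boundedness: every a, RSI, or RSI_env value has magnitude below 1."""
    return all(abs(v) < 1.0 for v in values)


def same_inputs(ids_a, ids_b) -> bool:
    """Paired A/B: both arms saw exactly the same inputs, in the same order."""
    return list(ids_a) == list(ids_b)


def tables_match(t1: dict, t2: dict, abs_tol: float = 1e-9) -> bool:
    """Determinism: a replayed table equals the original within tolerance."""
    return t1.keys() == t2.keys() and all(
        math.isclose(t1[k], t2[k], rel_tol=0.0, abs_tol=abs_tol) for k in t1
    )


def band_histogram(bands) -> dict:
    """Band transparency: whole-percent share of each band."""
    counts = Counter(bands)
    n = max(len(bands), 1)
    return {b: round(100 * counts[b] / n) for b in BANDS}


paired = [(61.8, 0.42), (48.2, -0.10)]
assert check_collapse([61.8, 48.2], paired)
assert check_bounded(a for _m, a in paired)
print(band_histogram(["A+", "A+", "A0", "A--"]))
```

Running these checks on both arms before computing deltas catches an unpaired or unbounded run early.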
One-line takeaway. With fixed manifests and stamped logs, the lane raises correctness and reduces waste (retries, tokens, calls) in these example tables while keeping classical outputs identical via phi((m,a)) = m: small, audit-ready wins that anyone can replay.