SSM-AI – Appendix E — Vendor Bake-off Protocol (E4–E6)

Tie-breaks, significance in u-space, and a 10-step runbook.

E4) Tie-breakers & significance (simple, portable)

Primary rank. By RSI_pool_env (or RSI_pool for intrinsic capability).
Tie-break 1. Higher fraction of A++, then A+.
Tie-break 2. Lower tokens and lower lat_ms at equal RSI_pool_env.
Significance (bootstrap in u-space). Convert each decision to u := atanh(RSI_env). On each of N draws (e.g., N=1000), resample each vendor's u-values with replacement and compute mean(u_A) - mean(u_B). Report a two-sided p and a 95% CI on the difference in u-space; optionally map the CI ends back via tanh for display.

# Bootstrap difference in u-space (sketch)
import random
from statistics import fmean

def diff_ci(uA, uB, N=1000):
    diffs = []
    for _ in range(N):
        # Resample each vendor independently, with replacement
        sA = fmean(random.choices(uA, k=len(uA)))
        sB = fmean(random.choices(uB, k=len(uB)))
        diffs.append(sA - sB)
    diffs.sort()
    lo = diffs[int(0.025*N)]
    hi = diffs[min(int(0.975*N), N - 1)]
    # Two-sided p: double the smaller tail count, capped at 1
    tail = min(sum(d <= 0 for d in diffs), sum(d >= 0 for d in diffs))
    p = min(1.0, 2*tail / N)
    return (lo, hi, p)  # CI in u-space; display can show tanh(lo), tanh(hi)
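
The primary rank and the two tie-breaks at the top of E4 can be folded into a single sort key. The sketch below is illustrative only: the VendorSummary record and its field names are hypothetical, rounding RSI_pool_env to six decimals to decide what counts as "equal" is an assumption (the protocol does not state a tolerance), and so is comparing tokens before lat_ms within tie-break 2.

```python
from dataclasses import dataclass

@dataclass
class VendorSummary:  # hypothetical record; field names are illustrative
    name: str
    rsi_pool_env: float
    frac_app: float   # share of decisions in band A++
    frac_ap: float    # share of decisions in band A+
    tokens: float
    lat_ms: float

def rank_key(v, ndigits=6):
    # Primary: higher RSI_pool_env (rounded so near-identical values tie; assumption).
    # Tie-break 1: higher A++ share, then higher A+ share.
    # Tie-break 2: lower tokens, then lower lat_ms (this order is an assumption).
    return (-round(v.rsi_pool_env, ndigits), -v.frac_app, -v.frac_ap, v.tokens, v.lat_ms)

def rank(vendors):
    # Best vendor first
    return sorted(vendors, key=rank_key)
```

Negating the "higher is better" fields lets one ascending sort handle all five criteria without a custom comparator.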

E5) Worked mini-example (calculator-fast, gated per decision)

# Vendor A (3 decisions). RSI = [0.604368, 0.291313, 0.910425], w=1
u = [0.700000, 0.300000, 1.530000]
U_pool = 2.530000 ; W_pool = 3
RSI_pool = tanh(2.53/3) = tanh(0.843333) ≈ 0.687571  → band A+

# Vendor B (3 decisions). RSI = [0.691069, 0.462117, 0.291313], w=1
u = [0.850000, 0.500000, 0.300000]
U_pool = 1.650000 ; W_pool = 3
RSI_pool = tanh(1.65/3) = tanh(0.550000) ≈ 0.500520  → band A0

# Apply gate per decision, g = 0.80, mode "mul", then pool in u-space
A: RSI_env = [0.483494, 0.233050, 0.728340]
u_env ≈ [0.527534, 0.237412, 0.925183]
U_env ≈ 1.690129 ; W=3
RSI_pool_env = tanh(1.690129/3) = tanh(0.563376) ≈ 0.510478  → A0

B: RSI_env = [0.552855, 0.369694, 0.233050]
u_env ≈ [0.622484, 0.388069, 0.237412]
U_env ≈ 1.247965 ; W=3
RSI_pool_env = tanh(1.247965/3) = tanh(0.415988) ≈ 0.393545  → A0

# Band distributions on RSI_env (g=0.80)
A: A+ : 1 ; A0 : 2 ; A++/A-/A-- : 0
B: A+ : 1 ; A0 : 2 ; A++/A-/A-- : 0

Takeaway: A leads both ungated (0.687571 vs 0.500520) and gated (0.510478 vs 0.393545).
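
For readers who prefer to check the arithmetic in code, the following sketch recomputes the pooled values of the mini-example. It assumes unit weights (w=1, as in the example), the multiplicative gate RSI_env = g * RSI, and a value for eps_w, which the text names but does not fix. The function name pool_unit_weights is hypothetical. The printed values should agree with the figures above up to rounding of the six-decimal inputs.

```python
import math

EPS_W = 1e-12  # assumed value; the text only names eps_w

def pool_unit_weights(rsi, g=1.0):
    # Gate each decision (mode "mul"), map to u-space, then pool with w = 1.
    u = [math.atanh(g * r) for r in rsi]
    U, W = sum(u), float(len(u))
    return math.tanh(U / max(W, EPS_W))

rsi_a = [0.604368, 0.291313, 0.910425]
rsi_b = [0.691069, 0.462117, 0.291313]

for name, rsi in (("A", rsi_a), ("B", rsi_b)):
    ungated = pool_unit_weights(rsi)
    gated = pool_unit_weights(rsi, g=0.80)
    print(name, round(ungated, 6), round(gated, 6))
```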

E6) Run procedure (10 steps, copy-paste)

1) Freeze manifest and publish knobs_hash.
2) Select traffic or eval set; fix seeds.
3) Enable stamps and ledger (per Appendix C).
4) Run Vendor A and Vendor B with identical prompts/tools and the same manifest.
5) For each decision, log RSI, RSI_env := gate(RSI), band, U_dec := atanh(RSI), W_dec := w.
6) Roll up per bucket: U_pool := SUM U_dec, W_pool := SUM W_dec,
   RSI_pool := tanh(U_pool / max(W_pool, eps_w)).
7) Repeat steps 5–6 with per-decision RSI_env in place of RSI to obtain RSI_pool_env (gated readiness).
8) Produce band histograms, cost/lat overlays, and bootstrap CI in u-space.
9) Stamp a one-line summary per bucket:
   "SSMBO|bucket=decode_hour_14|A.U=...|A.W=...|A.RSI_pool=...|
    B.U=...|B.W=...|B.RSI_pool=...|g=...|mode=mul|manifest=knobs_hash"
10) Publish a one-page table per surface: RSI_pool_env, band shares, tokens, lat_ms, CI.
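
Steps 5, 6 and 9 can be sketched as two small helpers. This is a minimal illustration, not the reference implementation: rollup follows the definitions in steps 5 and 6 literally, the example uses unit weights, the eps_w value and the number formatting in the stamp are assumptions, and "knobs_hash" is left as the placeholder from step 9.

```python
import math

EPS_W = 1e-12  # assumed value; step 6 only names eps_w

def rollup(decisions):
    """decisions: list of (rsi, w) pairs with |rsi| < 1.
    Returns (U_pool, W_pool, RSI_pool) for one bucket."""
    U_pool = sum(math.atanh(rsi) for rsi, _ in decisions)  # U_dec := atanh(RSI)
    W_pool = sum(w for _, w in decisions)                  # W_dec := w
    return U_pool, W_pool, math.tanh(U_pool / max(W_pool, EPS_W))

def stamp(bucket, a, b, g, manifest, mode="mul"):
    """Format the step-9 summary line from two (U, W, RSI_pool) tuples."""
    return (f"SSMBO|bucket={bucket}"
            f"|A.U={a[0]:.6f}|A.W={a[1]:g}|A.RSI_pool={a[2]:.6f}"
            f"|B.U={b[0]:.6f}|B.W={b[1]:g}|B.RSI_pool={b[2]:.6f}"
            f"|g={g:.2f}|mode={mode}|manifest={manifest}")

a = rollup([(0.604368, 1), (0.291313, 1), (0.910425, 1)])
b = rollup([(0.691069, 1), (0.462117, 1), (0.291313, 1)])
line = stamp("decode_hour_14", a, b, g=0.80, manifest="knobs_hash")
print(line)
```

Running rollup a second time on gate(RSI) instead of RSI gives the gated figures for step 7.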

One-line takeaway. Rank by the same bounded chooser, compute significance in u-space, and publish a stamped, reproducible bake-off — order-invariant, shard-safe, and classical m remains untouched via phi((m,a)) = m.
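
The order-invariance and shard-safety claims rest on U and W being plain sums, so partial results from separate shards can be added in any order before the final tanh. A small check under assumed unit weights is sketched below; the helper names partial, merge and finish are hypothetical, and the eps_w value is an assumption.

```python
import math
import random

def partial(rsi_values):
    # Per-shard partial sums with unit weights: (U, W)
    return (sum(math.atanh(r) for r in rsi_values), float(len(rsi_values)))

def merge(parts):
    # Shards combine by adding U and W component-wise
    return (sum(p[0] for p in parts), sum(p[1] for p in parts))

def finish(U, W, eps_w=1e-12):  # eps_w value is an assumption
    return math.tanh(U / max(W, eps_w))

rsi = [0.604368, 0.291313, 0.910425, 0.691069, 0.462117]
whole = finish(*partial(rsi))

# Shuffle, split into two shards, pool each, then merge
shuffled = rsi[:]
random.Random(0).shuffle(shuffled)
sharded = finish(*merge([partial(shuffled[:2]), partial(shuffled[2:])]))

# Equal up to floating-point rounding
print(math.isclose(whole, sharded, rel_tol=1e-12))
```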
