Tie-breaks, significance in u-space, and a 10-step runbook.
E4) Tie-breakers & significance (simple, portable)
- Primary rank. By RSI_pool_env (or RSI_pool for intrinsic capability).
- Tie-break 1. Higher fraction of A++, then A+.
- Tie-break 2. Lower tokens and lower lat_ms at equal RSI_pool_env.
- Significance (bootstrap in u-space). Convert each decision to u := atanh(RSI_env). Resample both vendors with replacement N times (e.g., N=1000) and compute mean(u_A) - mean(u_B) for each resample. Report a two-sided p and a 95% CI on the difference in u-space; optionally map the CI ends back via tanh for display.
# Bootstrap difference in u-space (sketch)
import random
from statistics import mean

def diff_ci(uA, uB, N=1000):
    diffs = []
    for _ in range(N):
        sA = mean(random.choices(uA, k=len(uA)))  # resample with replacement
        sB = mean(random.choices(uB, k=len(uB)))
        diffs.append(sA - sB)
    diffs.sort()
    lo = diffs[int(0.025 * N)]
    hi = diffs[min(int(0.975 * N), N - 1)]
    # two-sided p: twice the smaller tail fraction, capped at 1
    tail = min(sum(d <= 0 for d in diffs), sum(d >= 0 for d in diffs))
    p = min(1.0, 2 * tail / N)
    return (lo, hi, p)  # CI in u-space; display can show tanh(lo), tanh(hi)
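The ranking rule in the bullets above can be sketched as a single sort key. This is a minimal illustration, not the protocol's reference implementation. The VendorSummary fields, the rounding used to decide when two RSI_pool_env values count as equal (ndigits=6), the choice to compare tokens before lat_ms, the token and latency numbers, and the third vendor "C" are all assumptions made for the example. Only the two RSI_pool_env values for A and B come from the worked example in E5.

```python
from dataclasses import dataclass


@dataclass
class VendorSummary:
    # Hypothetical per-vendor summary record; field names are illustrative.
    name: str
    rsi_pool_env: float  # primary rank, higher is better
    frac_a_pp: float     # share of decisions in band A++
    frac_a_p: float      # share of decisions in band A+
    tokens: float        # lower is better
    lat_ms: float        # lower is better


def rank_key(v, ndigits=6):
    # Rounding defines "equal RSI_pool_env"; ndigits is an assumption.
    # Negated fields sort descending; tokens and lat_ms sort ascending.
    return (
        -round(v.rsi_pool_env, ndigits),
        -v.frac_a_pp,
        -v.frac_a_p,
        v.tokens,
        v.lat_ms,
    )


def rank(vendors):
    return sorted(vendors, key=rank_key)


# Placeholder cost/latency values; "C" is a made-up vendor that ties with A.
vendors = [
    VendorSummary("A", 0.510478, 0.0, 1 / 3, tokens=1200, lat_ms=850),
    VendorSummary("B", 0.393545, 0.0, 1 / 3, tokens=900, lat_ms=700),
    VendorSummary("C", 0.510478, 0.0, 1 / 3, tokens=1100, lat_ms=900),
]
ranked = [v.name for v in rank(vendors)]
print(ranked)
```

Here A and C tie on the primary rank and on band shares, so the lower token count decides between them; B ranks last on RSI_pool_env alone.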
E5) Worked mini-example (calculator-fast, gated per decision)
# Vendor A (3 decisions). RSI = [0.604368, 0.291313, 0.910425], w=1
u = [0.700000, 0.300000, 1.530000]
U_pool = 2.530000 ; W_pool = 3
RSI_pool = tanh(2.53/3) = tanh(0.843333) ≈ 0.687571 → band A+
# Vendor B (3 decisions). RSI = [0.691069, 0.462117, 0.291313], w=1
u = [0.850000, 0.500000, 0.300000]
U_pool = 1.650000 ; W_pool = 3
RSI_pool = tanh(1.65/3) = tanh(0.550000) ≈ 0.500520 → band A0
# Apply gate per decision, g = 0.80, mode "mul", then pool in u-space
A: RSI_env = [0.483494, 0.233050, 0.728340]
   u_env ≈ [0.527534, 0.237412, 0.925183]
   U_env ≈ 1.690129 ; W=3
   RSI_pool_env = tanh(1.690129/3) = tanh(0.563376) ≈ 0.510478 → A0
B: RSI_env = [0.552855, 0.369694, 0.233050]
   u_env ≈ [0.622484, 0.388069, 0.237412]
   U_env ≈ 1.247965 ; W=3
   RSI_pool_env = tanh(1.247965/3) = tanh(0.415988) ≈ 0.393545 → A0
# Band distributions on RSI_env (g=0.80)
A: A+ : 1 ; A0 : 2 ; A++/A-/A-- : 0
B: A+ : 1 ; A0 : 2 ; A++/A-/A-- : 0
Takeaway: A leads both ungated (0.687571 vs 0.500520) and gated (0.510478 vs 0.393545).
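The arithmetic above can be re-checked with a few lines of Python. This is a sketch under two assumptions: gate mode "mul" is taken to mean RSI_env = g * RSI, which is inferred from the numbers in the example rather than from a stated definition, and all weights are 1 as in the example. The helper names pool, gate_mul, and gated_pool are hypothetical.

```python
import math


def pool(us, ws):
    # RSI_pool := tanh(U_pool / W_pool), with U_pool = sum(u), W_pool = sum(w)
    return math.tanh(sum(us) / sum(ws))


def gate_mul(rsi, g):
    # Assumed meaning of mode "mul": scale RSI by the gate value g.
    return g * rsi


def gated_pool(us, ws, g):
    # Gate each decision in RSI space, map back to u-space, then pool.
    u_env = [math.atanh(gate_mul(math.tanh(u), g)) for u in us]
    return pool(u_env, ws)


u_A = [0.70, 0.30, 1.53]
u_B = [0.85, 0.50, 0.30]
w = [1, 1, 1]
g = 0.80

rsi_pool_A = pool(u_A, w)
rsi_pool_B = pool(u_B, w)
rsi_pool_env_A = gated_pool(u_A, w, g)
rsi_pool_env_B = gated_pool(u_B, w, g)

print(f"ungated: A={rsi_pool_A:.6f}  B={rsi_pool_B:.6f}")
print(f"gated:   A={rsi_pool_env_A:.6f}  B={rsi_pool_env_B:.6f}")
```

The printed values should agree with the table above to within rounding of the intermediate steps.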
E6) Run procedure (10 steps, copy-paste)
1) Freeze manifest and publish knobs_hash.
2) Select traffic or eval set; fix seeds.
3) Enable stamps and ledger (per Appendix C).
4) Run Vendor A and Vendor B with identical prompts/tools and the same manifest.
5) For each decision, log RSI, RSI_env := gate(RSI), band, U_dec := atanh(RSI), W_dec := w.
6) Roll up per bucket: U_pool := SUM U_dec, W_pool := SUM W_dec,
RSI_pool := tanh(U_pool / max(W_pool, eps_w)).
7) Repeat with per-decision RSI_env → RSI_pool_env (gated readiness).
8) Produce band histograms, cost/lat overlays, and bootstrap CI in u-space.
9) Stamp a one-line summary per bucket:
"SSMBO|bucket=decode_hour_14|A.U=...|A.W=...|A.RSI_pool=...|
B.U=...|B.W=...|B.RSI_pool=...|g=...|mode=mul|manifest=knobs_hash"
10) Publish a one-page table per surface: RSI_pool_env, band shares, tokens, lat_ms, CI.
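Steps 5, 6, and 9 can be sketched as a small rollup over logged decisions. The record layout (dicts with bucket, vendor, U_dec, W_dec), the value of eps_w, and the number formatting in the stamp are assumptions; the source fixes only the field order of the stamp line. The sample records reuse the u values from the E5 example, and "knobs_hash" stands in for the real manifest hash.

```python
import math
from collections import defaultdict

EPS_W = 1e-12  # assumed value for eps_w; the source does not state one


def rollup(records):
    """Sum U_dec and W_dec per (bucket, vendor) and finish with tanh."""
    acc = defaultdict(lambda: [0.0, 0.0])
    for r in records:
        slot = acc[(r["bucket"], r["vendor"])]
        slot[0] += r["U_dec"]
        slot[1] += r["W_dec"]
    return {
        key: (U, W, math.tanh(U / max(W, EPS_W)))
        for key, (U, W) in acc.items()
    }


def stamp(bucket, pooled, g, mode, manifest):
    # Field order follows the one-line summary in step 9;
    # the decimal formatting is an assumption.
    UA, WA, RA = pooled[(bucket, "A")]
    UB, WB, RB = pooled[(bucket, "B")]
    return (
        f"SSMBO|bucket={bucket}"
        f"|A.U={UA:.6f}|A.W={WA:g}|A.RSI_pool={RA:.6f}"
        f"|B.U={UB:.6f}|B.W={WB:g}|B.RSI_pool={RB:.6f}"
        f"|g={g:.2f}|mode={mode}|manifest={manifest}"
    )


bucket = "decode_hour_14"
records = [
    {"bucket": bucket, "vendor": "A", "U_dec": u, "W_dec": 1}
    for u in (0.70, 0.30, 1.53)
] + [
    {"bucket": bucket, "vendor": "B", "U_dec": u, "W_dec": 1}
    for u in (0.85, 0.50, 0.30)
]

pooled = rollup(records)
line = stamp(bucket, pooled, g=0.80, mode="mul", manifest="knobs_hash")
print(line)
```

Running the same rollup on the per-decision atanh(RSI_env) values instead of atanh(RSI) gives RSI_pool_env for step 7.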
One-line takeaway. Rank by the same bounded chooser, compute significance in u-space, and publish a stamped, reproducible bake-off. The pooling is order-invariant and shard-safe, and classical m remains untouched via phi((m,a)) = m.
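The order-invariance and shard-safety claims follow from the fact that pooling only needs the sums U and W, and sums can be merged in any order. A minimal check, assuming the same tanh(U / max(W, eps_w)) finish as in step 6 and an assumed eps_w of 1e-12: the six u values from the E5 example are pooled together purely for illustration, shuffled, split into three arbitrary shards, and merged. In exact arithmetic the two results are identical; in floating point they agree up to rounding, which is why the comparison uses math.isclose. The helper names are hypothetical.

```python
import math
import random


def partial(us, ws):
    # Per-shard partial sums (U, W); this is all a shard needs to report.
    return (math.fsum(us), math.fsum(ws))


def merge(parts):
    # Merging shards is just adding their partial sums.
    return (math.fsum(p[0] for p in parts), math.fsum(p[1] for p in parts))


def finish(U, W, eps_w=1e-12):
    return math.tanh(U / max(W, eps_w))


u = [0.70, 0.30, 1.53, 0.85, 0.50, 0.30]
w = [1.0] * len(u)

# Pool everything in one pass.
whole = finish(*partial(u, w))

# Shuffle, split into three uneven shards, pool each, then merge.
rng = random.Random(0)
idx = list(range(len(u)))
rng.shuffle(idx)
shards = [idx[:2], idx[2:5], idx[5:]]
parts = [partial([u[i] for i in s], [w[i] for i in s]) for s in shards]
merged = finish(*merge(parts))

print(math.isclose(whole, merged, rel_tol=1e-12))
```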