Fair, bounded, reproducible comparisons across vendors/models.
Purpose. Standardize cross-vendor/model evaluations using the same observation-only math and stamped ledgers. Classical numbers remain untouched (phi((m,a)) = m). Selection and reporting use bounded alignment and order-invariant fusion.
E1) Scope & prerequisites (freeze before you run)
- Manifest freeze (non-negotiable). Fix: eps_a, eps_w, weights policy, combine_policy="M2", division policy, band thresholds, lens params (Unit, c), and gate mode ("mul" or "u_scale"). Publish knobs_hash.
- Traffic & sets. Choose exactly one: shadow_traffic (live mirror) or frozen_eval_set (static prompts/queries/docs).
- Randomness. Fix decoding seeds; log seeds in stamps.
- Stamping. Every decision emits a one-line stamp and a ledger row (per Appendix C).
- Observation-only. No post-hoc calibration per vendor. Classical m is never altered inside SSM-AI (phi((m,a)) = m).
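One way to make the manifest freeze checkable is to serialize the frozen knobs canonically and hash them, so every stamp can carry the same knobs_hash. The sketch below assumes a JSON manifest hashed with SHA-256; the helper name knobs_hash(), the key spellings, and all values are hypothetical placeholders, not the document's defaults.

```python
import hashlib
import json


def knobs_hash(manifest: dict) -> str:
    """Hypothetical helper: deterministic hash of a frozen manifest.

    sort_keys plus fixed separators make the serialization independent of
    dict insertion order, so identical knobs always yield an identical hash.
    """
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Illustrative placeholder values only; freeze your own before the run.
manifest = {
    "eps_a": 1e-6,
    "eps_w": 1e-12,
    "weights_policy": "uniform",
    "combine_policy": "M2",
    "division_policy": "strict",
    "band_thresholds": [0.75, 0.25, -0.25, -0.75],
    "lens": {"Unit": 1.0, "c": 1.0},
    "gate_mode": "mul",
    "traffic": "frozen_eval_set",
}

print(knobs_hash(manifest)[:12])
```

Any change to a single knob produces a different hash, which is what makes a published knobs_hash useful for auditing a bake-off after the fact.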
E2) What to log per decision (minimum row)
iso_utc, svc, req_id, item_id, RSI, w, g, RSI_env, band,
U_dec := atanh(RSI), W_dec := w, manifest := knobs_hash, seed, stamp
# Optional overlays: tokens, lat_ms, cost_unit, and any classical metric m your stack already emits
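The minimum row above maps naturally onto an immutable record. This is a sketch under stated assumptions: DecisionRow and make_row() are hypothetical names, the stamp layout is a placeholder for the Appendix C format, and clamping RSI with eps_a before atanh is an assumed use of that knob (it keeps U_dec finite when RSI = ±1).

```python
import math
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DecisionRow:
    """Minimum per-decision ledger row; field names follow the list above."""
    iso_utc: str
    svc: str
    req_id: str
    item_id: str
    RSI: float
    w: float
    g: float
    RSI_env: float
    band: str
    U_dec: float   # atanh(RSI), RSI clamped by eps_a (assumption)
    W_dec: float   # equals w
    manifest: str  # knobs_hash of the frozen manifest
    seed: int
    stamp: str


def make_row(svc, req_id, item_id, rsi, w, g, rsi_env, band,
             knobs_hash, seed, eps_a=1e-6):
    """Hypothetical helper: build one ledger row plus its one-line stamp."""
    a = max(-1.0 + eps_a, min(1.0 - eps_a, rsi))  # keep atanh finite
    iso = datetime.now(timezone.utc).isoformat()
    # Placeholder stamp layout; use the Appendix C format in practice.
    stamp = (f"{iso} svc={svc} req={req_id} item={item_id} RSI={rsi:.4f} "
             f"g={g:.3f} band={band} manifest={knobs_hash} seed={seed}")
    return DecisionRow(
        iso_utc=iso, svc=svc, req_id=req_id, item_id=item_id,
        RSI=rsi, w=w, g=g, RSI_env=rsi_env, band=band,
        U_dec=math.atanh(a), W_dec=w,
        manifest=knobs_hash, seed=seed, stamp=stamp,
    )


row = make_row("vendor_a", "req-001", "item-17", rsi=0.62, w=1.0, g=0.9,
               rsi_env=0.9 * 0.62, band="A+",
               knobs_hash="knobs-hash-placeholder", seed=7)
print(row.stamp)
```

Storing U_dec and W_dec directly in the row means later pooling only needs sums over ledger columns, with no re-reading of raw RSI values.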
E3) How to aggregate per vendor (bounded, order-invariant)
Ungated pool (intrinsic capability).
U_pool := SUM U_dec
W_pool := SUM W_dec
RSI_pool := tanh( U_pool / max(W_pool, eps_w) )
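The ungated pool can be sketched directly from these three lines. This minimal version follows the document's U_dec := atanh(RSI) and W_dec := w, assumes unit weights in the example, and treats clamp() with eps_a as an assumption; pool() and rsi_pool() are hypothetical names.

```python
import math


def clamp(a, eps_a=1e-6):
    """Keep a inside (-1, 1) so atanh is finite (assumed use of eps_a)."""
    return max(-1.0 + eps_a, min(1.0 - eps_a, a))


def pool(decisions, eps_a=1e-6):
    """Return (U_pool, W_pool) from (RSI, w) pairs: U_dec = atanh(RSI), W_dec = w."""
    U = sum(math.atanh(clamp(rsi, eps_a)) for rsi, _w in decisions)
    W = sum(w for _rsi, w in decisions)
    return U, W


def rsi_pool(U, W, eps_w=1e-12):
    """Map the pooled u-space total back into the bounded range (-1, 1)."""
    return math.tanh(U / max(W, eps_w))


decisions = [(0.80, 1.0), (0.30, 1.0), (-0.20, 1.0), (0.65, 1.0)]
U_pool, W_pool = pool(decisions)
print(round(rsi_pool(U_pool, W_pool), 4))

# Shards: pool each part, add the (U, W) totals, map back once.
U1, W1 = pool(decisions[:1])
U2, W2 = pool(decisions[1:])
print(round(rsi_pool(U1 + U2, W1 + W2), 4))
```

Both printed values agree: because U and W are plain sums, the order of decisions and the shard split change nothing beyond floating-point rounding.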
Gated pool (live readiness). Apply the gate per decision, then fuse in u-space.
# Gate each decision
"mul" : RSI_env := g * RSI
"u_scale" : RSI_env := tanh( g * atanh(RSI) )
# Then pool the gated decisions
U_env := SUM atanh(RSI_env)
RSI_pool_env := tanh( U_env / max(W_pool, eps_w) )
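The gated pool differs only in that each decision passes through the frozen gate mode before entering u-space, while the denominator stays W_pool. A sketch, assuming g lies in [0, 1] (so gating can only shrink |RSI|) and using hypothetical helper names gate() and gated_pool():

```python
import math


def clamp(a, eps_a=1e-6):
    """Keep a inside (-1, 1) so atanh is finite (assumed use of eps_a)."""
    return max(-1.0 + eps_a, min(1.0 - eps_a, a))


def gate(rsi, g, mode, eps_a=1e-6):
    """Apply the frozen gate mode to one decision's RSI."""
    if mode == "mul":
        return g * rsi
    if mode == "u_scale":
        return math.tanh(g * math.atanh(clamp(rsi, eps_a)))
    raise ValueError(f"unknown gate mode: {mode!r}")


def gated_pool(decisions, mode, eps_a=1e-6, eps_w=1e-12):
    """RSI_pool_env from (RSI, w, g) triples: gate first, then fuse in u-space."""
    U_env = 0.0
    W_pool = 0.0
    for rsi, w, g in decisions:
        rsi_env = gate(rsi, g, mode, eps_a)
        U_env += math.atanh(clamp(rsi_env, eps_a))
        W_pool += w
    return math.tanh(U_env / max(W_pool, eps_w))


decisions = [(0.80, 1.0, 1.0), (0.30, 1.0, 0.5), (-0.20, 1.0, 0.9)]
for mode in ("mul", "u_scale"):
    print(mode, round(gated_pool(decisions, mode), 4))
```

With g = 1 for every decision, both modes reduce to the ungated pool; with g = 0 a decision contributes nothing to U_env but still counts toward W_pool.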
Band distribution. Count/fraction of A++/A+/A0/A-/A-- over RSI_env.
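Counting bands is a simple threshold lookup over RSI_env. The cut points below are hypothetical placeholders, not the document's band thresholds; substitute the values frozen in your manifest. band_of() and band_distribution() are illustrative names.

```python
from collections import Counter

# Hypothetical lower cut points, highest band first; replace with the
# band thresholds frozen in the manifest.
BANDS = [("A++", 0.75), ("A+", 0.25), ("A0", -0.25), ("A-", -0.75)]
ORDER = ["A++", "A+", "A0", "A-", "A--"]


def band_of(rsi_env):
    """Return the first band whose lower cut point rsi_env reaches."""
    for name, lower in BANDS:
        if rsi_env >= lower:
            return name
    return "A--"


def band_distribution(values):
    """Map each band to (count, fraction) over a list of RSI_env values."""
    counts = Counter(band_of(v) for v in values)
    n = len(values)
    return {b: (counts[b], counts[b] / n if n else 0.0) for b in ORDER}


print(band_distribution([0.9, 0.5, 0.0, -0.5, -0.9, 0.8]))
```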
Cost/latency overlays. Report medians/means alongside RSI_pool_env (never mix them into the bounded index).
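A report that keeps overlays beside the bounded index, rather than folded into it, can be as simple as the sketch below. overlay_report() and its key names are hypothetical; only lat_ms and cost_unit come from the optional overlay fields in E2.

```python
import statistics


def overlay_report(rsi_pool_env, lat_ms, cost_unit):
    """Hypothetical report: the bounded index and overlays sit side by side."""
    return {
        "RSI_pool_env": rsi_pool_env,  # from (U, W) only; overlays never enter it
        "lat_ms_median": statistics.median(lat_ms),
        "lat_ms_mean": statistics.fmean(lat_ms),
        "cost_unit_median": statistics.median(cost_unit),
        "cost_unit_mean": statistics.fmean(cost_unit),
    }


print(overlay_report(0.42, [100, 200, 900], [1.0, 2.0, 3.0]))
```

Reporting both median and mean makes heavy-tailed latency visible without touching RSI_pool_env.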
Notes (do these, avoid those).
- Always pool in u-space (atanh) to preserve order/shard invariance.
- Never average directly in a-space; only (U,W) may be merged across shards.
- Collapse parity holds at every step: phi((m,a)) = m.
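The difference between merging (U,W) and averaging in a-space shows up as soon as shards have unequal sizes. This sketch assumes unit weights (w = 1) and inputs already inside (-1, 1); uw() and phi() are illustrative names, with phi() simply returning m as the collapse rule states.

```python
import math


def uw(values):
    """(U, W) for a shard with unit weights: U = sum of atanh(a), W = count."""
    return sum(math.atanh(a) for a in values), float(len(values))


shard_1 = [0.9, 0.8, 0.7]
shard_2 = [-0.1]

# Correct: add the (U, W) totals across shards, then map back once.
U1, W1 = uw(shard_1)
U2, W2 = uw(shard_2)
merged = math.tanh((U1 + U2) / (W1 + W2))

# Wrong: average the per-shard bounded values in a-space.
naive = (math.tanh(U1 / W1) + math.tanh(U2 / W2)) / 2


def phi(pair):
    """Collapse parity: the classical value m passes through unchanged."""
    m, _a = pair
    return m


print(round(merged, 4), round(naive, 4))
```

The a-space average gives the single-decision shard as much influence as the three-decision shard, so its result depends on how the data happened to be sharded; the (U,W) merge does not.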
One-line takeaway. Freeze the manifest, stamp every decision, and compare vendors on the same bounded chooser: pool (U,W) in u-space for intrinsic capability (RSI_pool) and gated readiness (RSI_pool_env), which keeps the comparison order-invariant, shard-safe, and reproducible while m stays pristine.