SSM-AI – Reproducibility: Five-Step Replay from Stamps (6.5)

From stamps to results: a five-step replay

Inputs (minimal).
manifest.json, decisions.csv (stamped logs), and task gold labels. All numbers are ASCII decimals. Each value travels as a pair (m, a) of classical value m and alignment a, and classical outputs remain unchanged by design: phi((m,a)) = m.

Step 1 — Clamp & map (per decision / per signal).
Map signed contrasts to bounded alignments; clamp before atanh.

# lane maps (signed contrast -> bounded alignment)
a_in  := tanh(-c * e_in)
a_out := tanh(+c * e_out)

# guards
a_in  := clamp(a_in,  -1+eps_a, +1-eps_a)
a_out := clamp(a_out, -1+eps_a, +1-eps_a)

# u-space
u_in  := atanh(a_in)
u_out := atanh(a_out)
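
Step 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the reference implementation. The names `clamp` and `map_lanes` and the default `eps_a` are hypothetical, and a real replay would read `c` and `eps_a` from manifest.json.

```python
import math


def clamp(x: float, lo: float, hi: float) -> float:
    """Clip x into the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def map_lanes(e_in: float, e_out: float, c: float, eps_a: float = 1e-6) -> tuple[float, float]:
    """Step 1 for one signal: signed contrasts -> bounded alignments -> clamp -> u-space.

    The eps_a default is a hypothetical placeholder; the manifest value should be used.
    """
    # lane maps: the in-lane is sign-flipped, the out-lane is not
    a_in = math.tanh(-c * e_in)
    a_out = math.tanh(+c * e_out)
    # guards: float tanh can round to exactly +/-1.0, which atanh rejects
    a_in = clamp(a_in, -1 + eps_a, 1 - eps_a)
    a_out = clamp(a_out, -1 + eps_a, 1 - eps_a)
    # u-space: for unclamped values this simply undoes the tanh
    return math.atanh(a_in), math.atanh(a_out)
```

For small contrasts, `map_lanes(0.2, 0.5, c=1.0)` returns values very close to (-0.2, 0.5). For a very large contrast such as `e_out = 100`, the clamp keeps `u_out` finite (about 7.25 with `eps_a = 1e-6`) instead of raising a math domain error.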

Step 2 — Chooser (RSI).
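Fuse in u-space. Because every fused quantity is a plain weighted sum, the result is invariant to item order and to sharding by construction (up to floating-point rounding).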

U_in  := sum_i w_i * u_in_i
V_out := sum_j w_j * u_out_j
W_in  := sum_i w_i
RSI   := tanh( (V_out - U_in) / max(W_in, eps_w) )

# zero-evidence guard
if W_in == 0: RSI := 0
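
A minimal Python sketch of the chooser follows. It assumes that the in- and out-lanes share one weight list, as in the long-form CSV where each row carries a single `w`; the function name `fuse_rsi` and the `eps_w` default are hypothetical. Using `math.fsum` is a design choice made here, not something the source prescribes. It returns the correctly rounded sum, so reordering the inputs yields the identical float.

```python
import math
from typing import Sequence


def fuse_rsi(w: Sequence[float], u_in: Sequence[float], u_out: Sequence[float],
             eps_w: float = 1e-12) -> float:
    """Step 2 chooser: weighted u-space sums, then squash back into (-1, 1).

    Assumes shared weights for both lanes; eps_w default is hypothetical.
    """
    W_in = math.fsum(w)
    # zero-evidence guard, checked before any division
    if W_in == 0:
        return 0.0
    # fsum is correctly rounded, so the result does not depend on summation order
    U_in = math.fsum(wi * ui for wi, ui in zip(w, u_in))
    V_out = math.fsum(wi * uo for wi, uo in zip(w, u_out))
    return math.tanh((V_out - U_in) / max(W_in, eps_w))
```

Shuffling `(w, u_in, u_out)` together leaves the result unchanged, and an empty or all-zero weight list returns 0.0.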

Step 3 — Calm gate (alignment-only).
Apply gate to alignment only; m stays pristine.

# either simple multiply
RSI_env := g_t * RSI

# or curvature-preserving mode
RSI_env := tanh( g_t * atanh(RSI) )
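
Both gate options can be sketched in Python as shown. The mode labels `"multiply"` and `"curvature"` and the `eps_a` default are hypothetical. The clamp before `atanh` in the curvature mode is an added guard: a float `tanh` can round to exactly 1.0, and `atanh(1.0)` raises an error.

```python
import math


def gate(rsi: float, g_t: float, mode: str = "multiply", eps_a: float = 1e-6) -> float:
    """Step 3 calm gate: rescales the alignment only; m is never touched.

    Mode names and the eps_a default are hypothetical labels for the two options above.
    """
    if mode == "multiply":
        return g_t * rsi
    if mode == "curvature":
        # clamp first so atanh stays finite even if RSI rounded to +/-1.0
        r = max(-1 + eps_a, min(1 - eps_a, rsi))
        return math.tanh(g_t * math.atanh(r))
    raise ValueError(f"unknown gate mode: {mode!r}")
```

For 0 < g_t < 1, the multiply mode shrinks every value by the same factor. The curvature mode scales the evidence before the squash, so values near ±1 are damped relatively less. For example, with RSI = 0.99 and g_t = 0.5, multiply gives 0.495 while curvature gives about 0.87.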

Step 4 — Banding (A++/A+/A0/A-/A--).
Use fixed thresholds (with optional hysteresis). Band on RSI_env if the manifest declares it, otherwise on RSI; below, x denotes the banded value.

A++: x >= +0.90
A+ : +0.60 <= x < +0.90
A0 : -0.60 <  x < +0.60
A- : -0.90 <  x <= -0.60
A--: x <= -0.90
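
The thresholds translate directly into Python, with each edge inclusive or exclusive exactly as listed. The source leaves hysteresis optional and does not define it. The `band_hysteresis` rule below is therefore a hypothetical illustration: it keeps the previous band (for example from a `band_prev` column) while x stays within that band's interval widened by a margin `h` whose default is also made up.

```python
import math

# Band intervals as listed above (edge inclusivity is handled in band()).
INTERVALS = {
    "A++": (0.90, math.inf),
    "A+": (0.60, 0.90),
    "A0": (-0.60, 0.60),
    "A-": (-0.90, -0.60),
    "A--": (-math.inf, -0.90),
}


def band(x: float) -> str:
    """Step 4: fixed thresholds, no hysteresis."""
    if x >= 0.90:
        return "A++"
    if x >= 0.60:
        return "A+"
    if x > -0.60:
        return "A0"
    if x > -0.90:
        return "A-"
    return "A--"


def band_hysteresis(x: float, prev, h: float = 0.02) -> str:
    """Hypothetical hysteresis: stay in prev while x is within its interval widened by h."""
    if prev in INTERVALS:
        lo, hi = INTERVALS[prev]
        if lo - h <= x <= hi + h:
            return prev
    return band(x)
```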

Step 5 — Record & aggregate.
Keep replay deterministic; compute quality/efficiency/stability vs gold.

# per decision (example fields; ASCII only)
ts, run_id, item_id, U, W, RSI, RSI_env, band, g_t,
lanes(F,D,L,E,V,Q), lens_id, Unit, c, eps_a, eps_w,
weights_policy, division_policy, dtype, knobs_hash,
file_sha256_in, file_sha256_out
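
Here is a minimal Python sketch for writing one stamp row as ASCII CSV. The helper names are hypothetical, and so is the `knobs_hash` convention (SHA-256 of sorted-key, ASCII-only JSON); the document does not say how that hash is computed. Floats are written with `repr()`, which in Python 3 is the shortest decimal string that parses back to the identical float. As a result, a replay reads back exactly the values that were logged.

```python
import csv
import hashlib
import io
import json


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest, e.g. for file_sha256_in / file_sha256_out."""
    return hashlib.sha256(data).hexdigest()


def knobs_hash(knobs: dict) -> str:
    """Hypothetical convention: hash of the knobs as sorted-key, ASCII-only JSON."""
    blob = json.dumps(knobs, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return sha256_hex(blob.encode("ascii"))


def stamp_row(fields: dict) -> str:
    """Serialize one decision stamp as a single ASCII CSV line (field order = dict order)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # repr() gives round-trip-exact decimal text for floats
    writer.writerow([repr(v) if isinstance(v, float) else str(v) for v in fields.values()])
    line = out.getvalue()
    line.encode("ascii")  # raises UnicodeEncodeError if anything non-ASCII slipped in
    return line
```

Sorting keys makes the knobs hash independent of the order in which the knobs were declared.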

# aggregates
- accuracy / F1 (task-level)
- retries, tokens, tool calls per solved task
- band histogram for RSI or RSI_env
- batch vs stream vs shuffled RSI difference (~0 within dtype eps)
- corr(RSI, correctness)

CSV column hints (compact).
For step-level experiments, a minimal long-form CSV works well:

item_id, e_in, e_out, w, g_t, band_prev

For doc-wise RAG pooling:

query_id, doc_id, e_out, e_in, w_doc, g_t, stamp
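
The doc-wise RAG layout can be pooled per query in Python as sketched below. The function name, the `c`/`eps` defaults, one `g_t` per query, the multiply gate, and using `w_doc` for both lanes are all assumptions; the `stamp` column is read but not interpreted. Reading columns by header name matters here because the two layouts above list `e_in` and `e_out` in opposite orders. `skipinitialspace=True` tolerates headers written with a space after each comma.

```python
import csv
import io
import math
from collections import defaultdict


def pool_rag(csv_text, c=1.0, eps_a=1e-6, eps_w=1e-12):
    """Doc-wise pooling: one gated RSI per query_id over its retrieved documents.

    Hypothetical defaults; assumes one g_t per query and a multiply gate.
    """
    docs = defaultdict(list)
    g_t = {}
    for row in csv.DictReader(io.StringIO(csv_text), skipinitialspace=True):
        q = row["query_id"]
        docs[q].append((float(row["w_doc"]), float(row["e_in"]), float(row["e_out"])))
        g_t[q] = float(row["g_t"])
    lo, hi = -1 + eps_a, 1 - eps_a
    pooled = {}
    for q, rows in docs.items():
        W = math.fsum(w for w, _, _ in rows)
        if W == 0:
            pooled[q] = 0.0  # zero-evidence guard
            continue
        # Step 1 per document (map, clamp, atanh), then Step 2 fusion
        U = math.fsum(w * math.atanh(min(hi, max(lo, math.tanh(-c * ei)))) for w, ei, _ in rows)
        V = math.fsum(w * math.atanh(min(hi, max(lo, math.tanh(c * eo)))) for w, _, eo in rows)
        pooled[q] = g_t[q] * math.tanh((V - U) / max(W, eps_w))
    return pooled
```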

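Verifier skeleton (pseudo).

load manifest.json                      # knobs: c, eps_a, eps_w, gate mode
for each decision in decisions.csv:
  for each signal i of the decision:
    a_in_i  := clamp(tanh(-c*e_in_i),  -1+eps_a, +1-eps_a)
    a_out_i := clamp(tanh(+c*e_out_i), -1+eps_a, +1-eps_a)
    u_in_i := atanh(a_in_i); u_out_i := atanh(a_out_i)
  W    := sum w
  RSI  := 0 if W == 0 else tanh((sum w*u_out - sum w*u_in)/max(W, eps_w))
  RSI_env := gate(RSI, g_t, mode)
  band := thresholds(RSI_env)
  write stamp row
aggregate metrics, print tables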
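
An end-to-end version of the skeleton (Steps 1 to 4 plus the zero-evidence reason) might look like the Python below. It is a sketch, not the project's verifier. The manifest keys `c`, `eps_a`, `eps_w`, `gate_mode`, the mode labels, one `g_t` per item, and shared in/out weights are all assumptions. Items are processed in sorted order so that the output is deterministic regardless of row order.

```python
import csv
import io
import json
import math
from collections import defaultdict


def verify(manifest_json: str, decisions_csv: str) -> list[dict]:
    """Replay Steps 1-4 for every item_id in a step-level long-form CSV.

    Hypothetical manifest keys: c, eps_a, eps_w, gate_mode ("multiply" or "curvature").
    """
    m = json.loads(manifest_json)
    c, eps_a, eps_w, mode = m["c"], m["eps_a"], m["eps_w"], m["gate_mode"]
    if mode not in ("multiply", "curvature"):
        raise ValueError(f"unknown gate mode: {mode!r}")
    lo, hi = -1 + eps_a, 1 - eps_a
    items, g = defaultdict(list), {}
    for row in csv.DictReader(io.StringIO(decisions_csv), skipinitialspace=True):
        items[row["item_id"]].append((float(row["w"]), float(row["e_in"]), float(row["e_out"])))
        g[row["item_id"]] = float(row["g_t"])
    records = []
    for item_id in sorted(items):  # fixed order keeps the output deterministic
        sig = items[item_id]
        # Step 1: map, clamp, atanh
        u_in = [math.atanh(min(hi, max(lo, math.tanh(-c * e)))) for _, e, _ in sig]
        u_out = [math.atanh(min(hi, max(lo, math.tanh(c * e)))) for _, _, e in sig]
        w = [s[0] for s in sig]
        # Step 2: fuse, with the zero-evidence guard first
        W = math.fsum(w)
        if W == 0:
            rsi, reason = 0.0, "insufficient_evidence"
        else:
            U = math.fsum(a * b for a, b in zip(w, u_in))
            V = math.fsum(a * b for a, b in zip(w, u_out))
            rsi, reason = math.tanh((V - U) / max(W, eps_w)), ""
        # Step 3: gate the alignment only
        if mode == "multiply":
            env = g[item_id] * rsi
        else:
            env = math.tanh(g[item_id] * math.atanh(min(hi, max(lo, rsi))))
        # Step 4: fixed thresholds
        band = ("A++" if env >= 0.90 else "A+" if env >= 0.60 else "A0" if env > -0.60
                else "A-" if env > -0.90 else "A--")
        records.append({"item_id": item_id, "W": W, "RSI": rsi, "RSI_env": env,
                        "band": band, "reason": reason})
    return records
```

Because the sums use `math.fsum`, shuffling the CSV rows reproduces identical records, not just records equal within a tolerance.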

QA invariants (must pass).

  • Collapse parity: phi((m,a)) = m everywhere.
  • Boundedness: |a| < 1, |RSI| < 1, |RSI_env| < 1.
  • Determinism: same manifest + same CSV ⇒ identical outputs (within dtype tolerance).
  • Zero-evidence: if W_in == 0, then RSI = 0, band = "A0" with reason insufficient_evidence.
  • Invariance: batch == stream == shuffled within numeric epsilon.
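
A small Python harness for these invariants might look like this. The names are hypothetical, and it checks RSI only; the same pattern extends to a and RSI_env. Collapse parity is tested by identity, so that phi must hand back the very same classical object. Determinism is tested here as two in-process runs, which is weaker than rerunning from manifest and CSV.

```python
import math
import random


def phi(pair):
    """Collapse map phi((m, a)) = m: discard the alignment, keep the classical value."""
    m, _a = pair
    return m


def rsi(rows, eps_w=1e-12):
    """Step 2 on (w, u_in, u_out) rows with shared weights (assumption)."""
    W = math.fsum(w for w, _, _ in rows)
    if W == 0:
        return 0.0
    U = math.fsum(w * ui for w, ui, _ in rows)
    V = math.fsum(w * uo for w, _, uo in rows)
    return math.tanh((V - U) / max(W, eps_w))


def check_invariants(decisions, tol=1e-12, seed=0):
    """decisions: list of (m, rows). Returns sorted names of failed invariants ([] = pass)."""
    failed = set()
    rng = random.Random(seed)
    for m, rows in decisions:
        r = rsi(rows)
        if phi((m, r)) is not m:
            failed.add("collapse_parity")
        if not abs(r) < 1:
            failed.add("boundedness")
        if rsi(rows) != r:
            failed.add("determinism")
        if math.fsum(w for w, _, _ in rows) == 0 and r != 0.0:
            failed.add("zero_evidence")
        shuffled = list(rows)
        rng.shuffle(shuffled)
        if abs(rsi(shuffled) - r) > tol:
            failed.add("invariance")
    return sorted(failed)
```

The boundedness check has a practical purpose. u values that skipped the Step 1 clamp (for example u = ±50) push `tanh` to exactly 1.0 in float64, and this harness reports that decision as a boundedness failure.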

One-line takeaway. Replay is five lines of math: clamp → map to u → fuse → gate → band. Publish ASCII stamps and a fixed manifest so anyone can reproduce deltas while classical values remain identical via phi((m,a)) = m.

