SSM-AI – Empirical Validation & Mini Benchmarks — Ablations (small knobs, big clarity) (6.3)

Tiny sweeps that explain the lift

Purpose. Run small, orthogonal sweeps to show which knobs drive quality/efficiency gains while keeping classical outputs intact via phi((m,a)) = m.

Knobs to sweep (independent).
Weights policy (gamma) in w := |m|^gamma → {0, 0.5, 1, 2} (gamma := 0 gives w := 1, for pure comparability).
Lens gain (c) in a := tanh(c*e) → {0.7, 1.0, 1.3}.
Gate mode → {"mul", "u_scale"}, where RSI_env := g*RSI (mul) or RSI_env := tanh(g*atanh(RSI)) (u_scale).
Prior strength (beta) in u-space → {0, 0.1, 0.2} with bounded b ∈ [-1,+1].
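
To make the first three knobs concrete, here is a minimal Python sketch of the formulas quoted above. The function names are illustrative and not part of SSM-AI. It assumes the gate factor g lies in [0, 1], which this page does not state. The prior (beta, b) is left out because its exact u-space update is not given here.

```python
import math

def lens(e: float, c: float) -> float:
    """Lens a := tanh(c*e). Mathematically |a| < 1; in floating point a
    very large |c*e| can round to exactly 1.0, so clamp upstream if needed."""
    return math.tanh(c * e)

def weight(m: float, gamma: float) -> float:
    """Weight w := |m|^gamma. gamma = 0 yields the uniform weight 1."""
    return abs(m) ** gamma

def apply_gate(rsi: float, g: float, mode: str) -> float:
    """Gate RSI into RSI_env. Assumes |rsi| < 1 and (assumption) 0 <= g <= 1."""
    if mode == "mul":
        return g * rsi                            # RSI_env := g*RSI
    if mode == "u_scale":
        return math.tanh(g * math.atanh(rsi))     # RSI_env := tanh(g*atanh(RSI))
    raise ValueError(f"unknown gate mode: {mode!r}")
```

For RSI = 0.9 and g = 0.5, "mul" gives 0.45 while "u_scale" gives about 0.63, which illustrates why u_scale damps high-confidence values less harshly.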

Fixed invariants.
All runs must satisfy boundedness and parity: |a|<1, |RSI|<1, |RSI_env|<1, and phi((m,a)) = m.
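
The invariants can be enforced per run with a small guard. The sketch below is one possible shape, not the project's implementation: the Pair type, the check_invariants name, and the choice to raise ValueError are all assumptions made for illustration.

```python
from typing import NamedTuple, Sequence

class Pair(NamedTuple):
    m: float  # classical value
    a: float  # alignment, expected in (-1, 1)

def phi(x: Pair) -> float:
    """Collapse: phi((m, a)) = m."""
    return x.m

def check_invariants(pairs: Sequence[Pair], classical: Sequence[float],
                     rsi: float, rsi_env: float) -> None:
    """Raise ValueError if boundedness or collapse parity is violated."""
    if len(pairs) != len(classical):
        raise ValueError("pairs and classical values differ in length")
    for x, m_ref in zip(pairs, classical):
        if abs(x.a) >= 1.0:
            raise ValueError(f"unbounded alignment: a={x.a}")
        if phi(x) != m_ref:
            raise ValueError(f"parity broken: phi={phi(x)} vs m={m_ref}")
    for name, value in (("RSI", rsi), ("RSI_env", rsi_env)):
        if abs(value) >= 1.0:
            raise ValueError(f"unbounded {name}: {value}")
```

Calling it once per sweep cell, before any metric is logged, keeps a bad cell from contaminating the table.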

Sweep scaffolding (copy-paste).

# Grid (example)
Gamma := {0, 0.5, 1.0, 2.0}
C     := {0.7, 1.0, 1.3}
Gate  := {"mul","u_scale"}
Beta  := {0.0, 0.1, 0.2}

for gamma in Gamma:          # "gamma", not "g": g is reserved for the gate factor
  for c in C:
    for mode in Gate:
      for beta in Beta:
        run_id := stamp(gamma,c,mode,beta)
        # compute RSI, RSI_env with declared manifest + (gamma,c,mode,beta)
        # log metrics and bands for Δ vs baseline
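
A runnable Python rendering of the same scaffold follows. It is a sketch: stamp is shown here as a plain string format, and run_cell is a hypothetical placeholder for the per-cell computation, since this page specifies neither.

```python
import itertools

GAMMA = (0.0, 0.5, 1.0, 2.0)
C = (0.7, 1.0, 1.3)
GATE = ("mul", "u_scale")
BETA = (0.0, 0.1, 0.2)

def stamp(gamma: float, c: float, mode: str, beta: float) -> str:
    """Hypothetical run identifier; the real stamp format is not defined here."""
    return f"gamma={gamma}|c={c}|gate={mode}|beta={beta}"

def run_cell(gamma: float, c: float, mode: str, beta: float) -> dict:
    """Placeholder: compute RSI and RSI_env for this cell and return the
    metric deltas vs baseline. Returns an empty dict in this sketch."""
    return {}

def sweep() -> dict:
    """Visit every cell of the grid once, keyed by its stamp."""
    results = {}
    for gamma, c, mode, beta in itertools.product(GAMMA, C, GATE, BETA):
        results[stamp(gamma, c, mode, beta)] = run_cell(gamma, c, mode, beta)
    return results
```

The full grid has 4 × 3 × 2 × 3 = 72 cells, small enough to run exhaustively.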

Record these for each cell (Δ = SSM-AI − Baseline).

first_pass_correct_delta
retries_delta
t_first_correct_delta
exposed_risk_delta          # incorrect in A-/A--
tokens_per_solved_delta
calls_per_solved_delta
stability_eps               # expect ~0 across batch/stream/shuffled
pearson_r, spearman_r
sat_rate := mean(|a| > 0.9)     # want < 0.10
dead_zone := mean(|a| < 0.1)    # want < 0.70
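
The last two diagnostics are plain means over the alignments a observed in a cell. A minimal Python sketch (function names mirror the metric names above; the threshold parameters are added for illustration):

```python
from typing import Sequence

def sat_rate(a_values: Sequence[float], threshold: float = 0.9) -> float:
    """Fraction of alignments with |a| > threshold (saturated)."""
    return sum(abs(a) > threshold for a in a_values) / len(a_values)

def dead_zone(a_values: Sequence[float], threshold: float = 0.1) -> float:
    """Fraction of alignments with |a| < threshold (near-zero signal)."""
    return sum(abs(a) < threshold for a in a_values) / len(a_values)
```

Both functions assume a non-empty sequence; an empty cell should be reported as missing, not as 0.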

Interpretation rules (quick).
• If sat_rate > 0.10, reduce c or increase Unit.
• If dead_zone > 0.70, increase c or lower Unit; try gamma := 0 (uniform).
• If high-confidence damping is too harsh, switch gate to "u_scale".
• If vendor comparability is primary, prefer gamma := 0 and publish it.
• If signal strength should matter, use gamma := 1 and report sensitivity.
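
The two threshold rules above are mechanical enough to automate when filling the Notes column. The sketch below encodes only those two. The gate-mode and gamma-policy rules depend on judgment and are left to the reader. The recommend name and the message strings are illustrative.

```python
def recommend(sat_rate: float, dead_zone: float,
              sat_max: float = 0.10, dead_max: float = 0.70) -> list:
    """Return tuning hints for one sweep cell; an empty list means in range."""
    notes = []
    if sat_rate > sat_max:
        notes.append("saturated: reduce c or increase Unit")
    if dead_zone > dead_max:
        notes.append("dead zone: increase c or lower Unit; try gamma := 0")
    return notes
```

For example, a cell with sat_rate 0.14 and dead_zone 0.41 triggers only the saturation hint, while a cell with 0.06 and 0.58 triggers none.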

Mini ablation table (example layout).

Gamma  C    Gate      Beta   ΔFirst%   ΔRetries   ΔTokens   sat_rate  dead_zone  Notes
0.0    1.0  mul       0.0     +3.1%     -18%       -9%       0.06       0.58     good default
1.0    1.0  u_scale   0.1     +4.2%     -22%      -12%       0.08       0.55     gentler at A+/A++
2.0    1.3  mul       0.2     +4.4%     -25%      -10%       0.14*      0.41     *saturated → reduce c

One-line takeaway. Small, orthogonal sweeps over gamma, c, gate mode, and beta reveal a stable operating point with saturation < 10% and dead-zone < 70%, while collapse parity phi((m,a)) = m guarantees classical values never change.

