Carry (U,W), clamp before atanh, keep m untouched.
7.1 Long-Path Guidance (agents, streams, shards)
• Carry only (U, W); never just a_out. Merge shards by summing U and W, then invert once.
• Checkpoint/rollback with additive deltas. Store per-step ΔU := w*u and ΔW := w; subtracting them from (U, W) undoes the step exactly (see earlier checkpoint logic).
• Chunking doesn’t matter. Order/shard invariance holds because composition is additive in u := atanh(a); results agree up to floating-point rounding.
• Precision tip. For very long runs, optional pairwise/Kahan-style summation on U reduces float error; W can use standard summation.
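The precision tip can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names KahanSum and fuse_u_kahan are hypothetical, and the clamp margin uses the float64 default from section 7.2. Only U is compensated; W uses plain summation, as the tip suggests.

```python
import math


class KahanSum:
    """Compensated (Kahan) accumulator, intended here for the running U total."""

    def __init__(self) -> None:
        self.total = 0.0
        self.comp = 0.0  # low-order bits lost by previous additions

    def add(self, x: float) -> None:
        y = x - self.comp              # re-inject the previously lost bits
        t = self.total + y             # big + small: low bits of y may be lost
        self.comp = (t - self.total) - y  # recover what was lost this time
        self.total = t


def fuse_u_kahan(pairs, eps_a=1e-12):
    """Accumulate U = sum(w * atanh(a_c)) with compensation; W is summed plainly.

    `pairs` is an iterable of (a, w); this helper name is an assumption.
    """
    u_acc = KahanSum()
    w_sum = 0.0
    for a, w in pairs:
        a_c = max(-1.0 + eps_a, min(1.0 - eps_a, a))  # clamp before atanh
        u_acc.add(w * math.atanh(a_c))
        w_sum += w
    return u_acc.total, w_sum
```

Pairwise summation is an alternative with similar intent; the compensated form is shown because it fits a streaming loop without buffering.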
# Streaming fuse (order/shard invariant)
U += w*atanh(a) # with a pre-clamped
W += w
a_out := tanh( U / max(W, eps_w) )
# Shard merge
U_total := SUM_shards U_k
W_total := SUM_shards W_k
a_out := tanh( U_total / max(W_total, eps_w) )
# Checkpoint-friendly deltas
ΔU := w*atanh(a)
ΔW := w
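The three pseudocode fragments above (streaming fuse, shard merge, checkpoint deltas) can be combined into one small Python sketch. The Fuse class and its method names are hypothetical, chosen only for illustration; the epsilons are the float64 defaults from section 7.2, since Python floats are float64.

```python
import math
from dataclasses import dataclass

EPS_A = 1e-12  # clamp margin (float64 default, section 7.2)
EPS_W = 1e-12  # denominator guard (float64 default, section 7.2)


def clamp(a, eps_a=EPS_A):
    """Keep a strictly inside (-1, +1) so atanh stays finite."""
    return max(-1.0 + eps_a, min(1.0 - eps_a, a))


@dataclass
class Fuse:
    """Carries only (U, W); a_out is derived on demand."""
    U: float = 0.0
    W: float = 0.0

    def step(self, a, w):
        """Fold in one (a, w) and return (dU, dW) for checkpointing."""
        dU = w * math.atanh(clamp(a))
        dW = w
        self.U += dU
        self.W += dW
        return dU, dW

    def undo(self, delta):
        """Roll back one step by subtracting its stored delta."""
        dU, dW = delta
        self.U -= dU
        self.W -= dW

    def merge(self, other):
        """Shard merge: sum U and W, invert later in a_out()."""
        return Fuse(self.U + other.U, self.W + other.W)

    def a_out(self):
        return math.tanh(self.U / max(self.W, EPS_W))
```

A short usage check: fusing four items in one stream and fusing them as two shards of two items, then merging, should give the same a_out up to rounding.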
7.2 Dtype & Epsilon (recommended defaults)
• Clamp margin for alignment eps_a: float32 → 1e-6, float64 → 1e-12.
• Denominator guard for means eps_w: float32 → ≥ 1e-8, float64 → 1e-12.
• Gate epsilon eps_g (all dtypes): 1e-12.
• Safe atanh input (always clamp). Keep |a| < 1 - eps_a in all lanes and choosers.
• When to prefer float64. Paths longer than 10^3 steps, weights w with a wide dynamic range, cross-vendor bake-offs, or offline CPU analytics.
• Zero-evidence guard. If W_in == 0, set RSI := 0, band := "A0", and reason := "insufficient_evidence".
• Collapse parity. phi((m,a)) = m under all dtype settings.
# Dtype-aware clamps
a_c := clamp(a, -1+eps_a, +1-eps_a) # eps_a: 1e-6 (f32), 1e-12 (f64)
# Streaming fuse recall (calculator-fast)
U += w*atanh(a_c)
W += w
a_out := tanh( U / max(W, eps_w) ) # eps_w: ≥1e-8 (f32), 1e-12 (f64)
# Zero-evidence handling
if W == 0:
    RSI := 0.0
    band := "A0" # insufficient_evidence
# Collapse parity (must always hold)
phi((m,a)) = m
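As a rough illustration of why the clamp margin depends on dtype, the sketch below emulates float32 rounding with the standard struct module. The EPS table layout and the helper names (to_f32, clamp_a, zero_evidence, phi) are assumptions made for this example; the numeric defaults are the ones listed above. How RSI is computed when evidence is present is not shown here.

```python
import math
import struct

# Defaults from the list above; the dict layout itself is an assumption.
EPS = {
    "float32": {"eps_a": 1e-6, "eps_w": 1e-8},
    "float64": {"eps_a": 1e-12, "eps_w": 1e-12},
}


def to_f32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def clamp_a(a, dtype="float64"):
    """Dtype-aware clamp: a_c := clamp(a, -1+eps_a, +1-eps_a)."""
    eps_a = EPS[dtype]["eps_a"]
    a_c = max(-1.0 + eps_a, min(1.0 - eps_a, a))
    return to_f32(a_c) if dtype == "float32" else a_c


def zero_evidence(W_in):
    """Return the guard record when there is no evidence, else None."""
    if W_in == 0:
        return {"RSI": 0.0, "band": "A0", "reason": "insufficient_evidence"}
    return None


def phi(pair):
    """Collapse parity: phi((m, a)) = m, independent of dtype."""
    m, _a = pair
    return m
```

The float64 margin of 1e-12 would not survive in float32: 1 - 1e-12 rounds to exactly 1.0 at that precision, so atanh would receive a raw edge value. With the float32 margin of 1e-6 the clamped value stays strictly below 1.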
7.3 Stability Near Edges (a → ±1)
• Never feed raw ±1. Always clamp first: a_c := clamp(a, -1+eps_a, +1-eps_a).
• Curvature awareness. atanh(a) grows rapidly near edges; keep lenses in the responsive band (typical |c*e| in [0.3, 1.2]).
• Lane mul/div policy (M2). Division policy acts on routing only; never on m.
# Lane M2 (lane-only; magnitudes are classical)
a_mul := tanh( atanh(a1) + atanh(a2) ) # a1, a2 pre-clamped
a_div := tanh( atanh(a1) - atanh(a2) )
# Division policy (control, not value)
division_policy := "strict" # default
# Near-zero denominators: fall back to classical actuation; still stamp the lane.
# High-confidence gating with curvature preservation
# mode "u_scale":
RSI_env := tanh( g_t * atanh(RSI) )
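The lane formulas and the u_scale gate above can be sketched in Python as below. The function names are hypothetical, the clamp margin is the float64 default from section 7.2, and the division policy is deliberately left out because its semantics are defined elsewhere.

```python
import math

EPS_A = 1e-12  # float64 clamp margin (section 7.2)


def clamp(a, eps_a=EPS_A):
    """Never feed raw +/-1 to atanh."""
    return max(-1.0 + eps_a, min(1.0 - eps_a, a))


def lane_mul(a1, a2):
    """a_mul := tanh(atanh(a1) + atanh(a2)), lane only."""
    return math.tanh(math.atanh(clamp(a1)) + math.atanh(clamp(a2)))


def lane_div(a1, a2):
    """a_div := tanh(atanh(a1) - atanh(a2)), lane only."""
    return math.tanh(math.atanh(clamp(a1)) - math.atanh(clamp(a2)))


def gate_u_scale(rsi, g_t):
    """RSI_env := tanh(g_t * atanh(RSI)); the gate scales in u-space."""
    return math.tanh(g_t * math.atanh(clamp(rsi)))
```

Because tanh(x + y) = (tanh x + tanh y) / (1 + tanh x * tanh y), lane_mul(0.3, 0.4) works out to 0.7 / 1.12 = 0.625, and lane_div undoes lane_mul up to rounding. With g_t = 1 the gate leaves RSI unchanged, and with g_t = 0 it returns 0.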
One-line takeaway. Scale confidently: sum in u, carry (U,W), clamp before atanh, and keep epsilons dtype-appropriate — long, sharded paths remain bounded, reproducible, and fast, with phi((m,a)) = m always preserved.