SSM-AI – Appendix G — SSMH Acceleration Parity (G8–G11)

Tiny RTL, manifest knobs, and QA parity checks, deployable on FPGA or ASIC.

G8) Resource sketch (indicative, not a promise)

  • LUTs/BRAM: 1–2 BRAMs per function for coefficient tables (tanh, atanh).
  • DSPs: 4–8 for polynomial MACs, w*u, and optional gate.
  • ALMs/CLBs: adders, clamps, band comparators, stamp joiner (small).
  • Clocks: 100–300 MHz on mid-range FPGAs; higher on ASIC.
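  • Scaling: throughput grows linearly with the number of parallel lanes; merging lanes is just adding their (U, W) pairs.
  • Reset/persist: keep the (U, W) registers and manifest constants across resets; replaying from stamps is calculator-fast (a stamp carries U and W, so recovery needs only a divide and a tanh).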
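The Scaling and Reset/persist bullets rest on one property: a lane's entire state is the pair (U, W), so lanes merge by addition and a persisted pair resumes exactly where it stopped. A minimal float sketch of that property, assuming exact math.atanh and math.tanh in place of the LUT-poly units; lane_accumulate, merge and readout are hypothetical names, not part of any SSM-AI API:

```python
import math

def lane_accumulate(items, U=0.0, W=0.0):
    """Fold (a, w) items into a lane state (U, W); |a| < 1 is assumed."""
    for a, w in items:
        U += w * math.atanh(a)   # u = atanh(a), weighted by w
        W += w
    return U, W

def merge(states):
    """Add-only merge: sum U and W across lanes."""
    return sum(s[0] for s in states), sum(s[1] for s in states)

def readout(U, W, eps_w=1e-12):
    """a_out = tanh(U / max(W, eps_w))."""
    return math.tanh(U / max(W, eps_w))

items = [(0.2, 1.0), (-0.5, 1.0), (0.9, 2.0), (0.35, 0.5), (0.1, 1.0), (-0.7, 1.5)]

# Reference: one lane sees every item.
single = readout(*lane_accumulate(items))

# Round-robin split over 3 lanes, then merge by addition.
lanes = [items[i::3] for i in range(3)]
merged = readout(*merge([lane_accumulate(lane) for lane in lanes]))

# Reset/persist: keep (U, W) after the first half, resume from it later.
saved = lane_accumulate(items[:3])
resumed = readout(*lane_accumulate(items[3:], *saved))

print(f"{single:.12f} {merged:.12f} {resumed:.12f}")
```

In hardware the same merge is two integer adds per lane pair, which are exact as long as neither accumulator saturates; in float the three readouts agree only up to rounding, hence a tolerance when comparing them.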

G9) Manifest fields for hardware builds (copy-paste)

{
  "ssmh": {
    "fx": {"u_fmt": "Q6.26", "a_fmt": "Q2.14", "w_fmt": "Q8.8"},
    "eps_a": 1e-6, "eps_w": 1e-12,
    "gate": {"mode": "mul"},
    "bands": {"A++":0.90,"A+":0.60,"A0":-0.60,"A-":-0.90,"A--":-1.00},
    "luts": {"atanh_seg": 5, "tanh_seg": 4, "poly_degree": 2},
    "acc_width": {"U_bits": 40, "W_bits": 24},
    "weights": {"policy": "uniform"},
    "combine_policy": "M2",
    "division_policy": "strict",
    "stamp": {"profile": "SSMCLOCK1", "fields": ["U","W","RSI","g","RSI_env","band","fx","manifest"]}
  }
}

Alternatives (swap in as needed): "gate": {"mode": "u_scale"}; "weights": {"policy":"abs_m_gamma","gamma":1.0}.
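One detail worth checking before synthesis: an epsilon smaller than one LSB of its Q format quantizes to zero. With a_fmt = Q2.14 the LSB is 2^-14 ≈ 6.1e-5, so eps_a = 1e-6 becomes 0 and the clamp would admit |a| = 1, where atanh diverges. The sketch below parses the Q strings, quantizes the epsilons, and raises them to at least one LSB. The one-LSB floor and the helper names (parse_q, quantize_eps) are assumptions, not manifest semantics; it also assumes the integer part m of Qm.n includes the sign bit (so Q2.14 is a 16-bit word) and that W keeps w_fmt's 8 fraction bits.

```python
import math
import re

def parse_q(fmt):
    """Parse 'Qm.n' into (m, n); m is assumed to include the sign bit."""
    match = re.fullmatch(r"Q(\d+)\.(\d+)", fmt)
    if match is None:
        raise ValueError(f"not a Q-format string: {fmt!r}")
    return int(match.group(1)), int(match.group(2))

def quantize_eps(eps, frac_bits, min_lsb=1):
    """Return (raw, used): eps rounded to 2^-frac_bits steps, and the value
    actually used after a floor of min_lsb LSBs (hypothetical policy)."""
    raw = round(eps * (1 << frac_bits))
    return raw, max(raw, min_lsb)

fx = {"u_fmt": "Q6.26", "a_fmt": "Q2.14", "w_fmt": "Q8.8"}
_, frac_a = parse_q(fx["a_fmt"])
int_u, _ = parse_q(fx["u_fmt"])
_, frac_w = parse_q(fx["w_fmt"])   # assumes W keeps w_fmt's fraction bits

raw_a, eps_a_fx = quantize_eps(1e-6, frac_a)    # eps_a from the manifest
raw_w, eps_w_fx = quantize_eps(1e-12, frac_w)   # eps_w from the manifest
print("eps_a LSBs:", raw_a, "->", eps_a_fx, "| eps_w LSBs:", raw_w, "->", eps_w_fx)

# Largest |u| the clamp can produce, versus the signed range of u_fmt.
u_max = math.atanh(1 - eps_a_fx / (1 << frac_a))
u_limit = 2 ** (int_u - 1)   # Qm.n signed range is [-2^(m-1), 2^(m-1))
print(f"u_max = {u_max:.4f}, u_fmt limit = {u_limit}")
```

With the one-LSB floor, the largest reachable |u| is atanh(1 - 2^-14) = 0.5 * ln(32767) ≈ 5.2, comfortably inside the roughly ±32 range of Q6.26.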


G10) RTL-style pseudocode (concise, lane-only)

# Inputs per item: a_in_fx (Q2.14), w_fx (Q8.8), g_fx (Q1.15 optional)
# State: U_fx (wide), W_fx (wide), constants: eps_a_fx, eps_w_fx, bands, formats

a_c   = clamp_fx(a_in_fx, -1 + eps_a_fx, 1 - eps_a_fx)       # CLAMP (eps_a_fx >= 1 LSB of Q2.14)
u     = atanh_lut_poly(a_c)                                  # ATANH_LUT (monotone)
wu    = mul_fx(w_fx, u)                                      # W_MUL  (widen then round)
U_fx  = sat_add_fx(U_fx, wu)                                 # ACC_UW
W_fx  = sat_add_fx(W_fx, w_fx)

u_bar = div_guard_fx(U_fx, W_fx, eps_w_fx)                   # DIV_SAT (if W==0 → 0)
a_out = tanh_lut_poly(u_bar)                                 # TANH_LUT (monotone)

# Gate (choose one via manifest)
RSI      = a_out
RSI_env  = mul_fx(g_fx, RSI)                                 # mode "mul"
# or
# RSI_env = tanh_lut_poly( mul_fx(g_fx, atanh_lut_poly(RSI)) ) # mode "u_scale"

band  = band_of(RSI_env, thr_pp=0.90, thr_p=0.60, thr_n=-0.60, thr_nn=-0.90)

stamp = make_stamp("SSMCLOCK1", iso_utc(), {
  "U":U_fx, "W":W_fx, "RSI":RSI, "g":g_fx, "RSI_env":RSI_env,
  "band":band, "fx":fmt_string(), "manifest":knobs_hash()
})
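The lane pseudocode can be exercised as a bit-level Python model. This is a sketch under stated assumptions, not the reference RTL: math.atanh and math.tanh followed by rounding stand in for atanh_lut_poly and tanh_lut_poly; U is assumed to keep 26 fraction bits inside 40 bits and W 8 fraction bits inside 24 bits; RSI reuses Q2.14; products round half up; the division truncates toward zero; band thresholds are treated as inclusive lower bounds; eps_a_fx and eps_w_fx are set to one LSB. All helper names are hypothetical. It replays the stamp example (a = tanh(0.7), w = 1, g = 0.81).

```python
import math

# Hypothetical widths: U = 26 fraction bits in 40, W = 8 fraction bits in 24.
A_FRAC, W_FRAC, G_FRAC, U_FRAC = 14, 8, 15, 26
U_BITS, W_BITS, UBAR_BITS = 40, 24, 32
EPS_A_FX = 1   # 1 LSB of Q2.14 (eps_a = 1e-6 would round to 0)
EPS_W_FX = 1   # 1 LSB of W

def to_fx(x, frac):
    return round(x * (1 << frac))

def from_fx(v, frac):
    return v / (1 << frac)

def sat(v, bits):
    """Saturate to a signed bits-wide range (never wrap)."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return min(max(v, lo), hi)

def shr_round(v, s):
    """Arithmetic right shift by s > 0 with round-half-up."""
    return (v + (1 << (s - 1))) >> s

def clamp_a(a_fx):
    one = 1 << A_FRAC
    return min(max(a_fx, -one + EPS_A_FX), one - EPS_A_FX)

# Stand-ins for atanh_lut_poly / tanh_lut_poly: exact math, then rounded.
def atanh_fx(a_fx):
    return to_fx(math.atanh(from_fx(a_fx, A_FRAC)), U_FRAC)

def tanh_fx(u_fx):
    return to_fx(math.tanh(from_fx(u_fx, U_FRAC)), A_FRAC)

def lane_update(U, W, a_in_fx, w_fx):
    u = atanh_fx(clamp_a(a_in_fx))          # CLAMP + ATANH_LUT
    wu = shr_round(w_fx * u, W_FRAC)        # W_MUL: 34 fraction bits -> 26
    return sat(U + wu, U_BITS), sat(W + w_fx, W_BITS)   # ACC_UW

def div_guard(U, W):
    """u_bar = U / max(W, eps_w) in Q6.26, truncating toward zero."""
    num, den = U << W_FRAC, max(W, EPS_W_FX)
    q = abs(num) // den
    return sat(q if num >= 0 else -q, UBAR_BITS)

def gate(rsi, g_fx, mode="mul"):
    if mode == "mul":
        return shr_round(g_fx * rsi, G_FRAC)                # Q1.15 * Q2.14 -> Q2.14
    u = shr_round(g_fx * atanh_fx(clamp_a(rsi)), G_FRAC)   # "u_scale"
    return tanh_fx(u)

BANDS = [(to_fx(t, A_FRAC), name) for t, name in
         ((0.90, "A++"), (0.60, "A+"), (-0.60, "A0"), (-0.90, "A-"))]

def band_of(x_fx):
    for thr, name in BANDS:
        if x_fx >= thr:
            return name
    return "A--"

# Replay the stamp example: one item with a = tanh(0.7), w = 1, g = 0.81.
U, W = lane_update(0, 0, to_fx(math.tanh(0.7), A_FRAC), to_fx(1.0, W_FRAC))
rsi = tanh_fx(div_guard(U, W))
g_fx = to_fx(0.81, G_FRAC)
env_mul = gate(rsi, g_fx, "mul")
env_u = gate(rsi, g_fx, "u_scale")
print(from_fx(U, U_FRAC), from_fx(W, W_FRAC), from_fx(rsi, A_FRAC),
      from_fx(env_mul, A_FRAC), band_of(env_mul))
```

Under these assumptions U dequantizes to about 0.700004, RSI to 0.604370 and RSI_env (mode "mul") to about 0.489563, all within the 16-bit target of the float values in the stamp example, and the band is A0.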


G11) Acceptance and QA (hardware)

  • Bit-parity mode: quantize float golden vectors, run them through HW, dequantize the outputs, and compare against a float64 reference.
    • Targets: |delta_RSI| <= 5e-4 (16-bit) or <= 5e-7 (32-bit); bands identical.
  • Throughput: sustain 1 sample/cycle after pipe fill at target clock.
  • Merge test: split stream across two lanes, sum (U, W), reproduce single-lane result.
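  • Order/shard invariance: permutations and shard merges leave a_out unchanged, provided neither accumulator saturates (integer adds are exact and commutative; saturating adds are not associative).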
  • Reset/restart: persisting (U, W) across resets yields identical a_out.
  • Safety: |a| < 1 via clamp; u_bar := U / max(W, eps_w); saturate, never wrap.
  • Stamp determinism: identical manifest + inputs ⇒ identical stamps, apart from the iso_utc timestamp and any declared Q-format rounding fields.
  • Parity with software: semantics identical (kernel and bands); classical values stay pristine: phi((m,a)) = m.
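The order/shard invariance check holds because integer addition is exact; the caveat is saturation. A saturating add is not associative, so once an accumulator clips, the result depends on the order of the terms. A small sketch with hypothetical quantized w*u terms shows both cases, a two-shard merge, and the tolerance-plus-band comparison from the parity bullet (sat_add, accumulate and parity_ok are hypothetical helpers):

```python
from itertools import permutations

def sat_add(acc, x, bits):
    """Signed saturating add: clip to [-2^(bits-1), 2^(bits-1) - 1]."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return min(max(acc + x, lo), hi)

def accumulate(terms, bits):
    acc = 0
    for t in terms:
        acc = sat_add(acc, t, bits)
    return acc

# Hypothetical quantized w*u terms for one stream.
terms = [90, 60, -100, 40]

# Every ordering through a wide (40-bit) vs a deliberately narrow (8-bit) accumulator.
wide = {accumulate(p, bits=40) for p in permutations(terms)}
narrow = {accumulate(p, bits=8) for p in permutations(terms)}

# Shard merge: two shards accumulated separately, then added.
shards = [terms[:2], terms[2:]]
merged = sum(accumulate(s, bits=40) for s in shards)

def parity_ok(rsi_hw, rsi_ref, band_hw, band_ref, word_bits):
    """Tolerance check against the float64 reference; bands must match exactly."""
    tol = 5e-4 if word_bits == 16 else 5e-7
    return abs(rsi_hw - rsi_ref) <= tol and band_hw == band_ref

print(sorted(wide), sorted(narrow), merged)
```

The wide accumulator gives one result for every ordering, while the 8-bit one gives several; sizing U_bits so that the worst-case stream never saturates is what makes the permutation and shard tests meaningful.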

One-line takeaway. Declare knobs, drop in the lane, and verify parity: tiny LUT-MAC RTL with (U, W) fusing gives deterministic, order/shard-invariant alignment that matches software while phi((m,a)) = m keeps classical numbers untouched.

Stamp example (hardware snapshot).
SSMCLOCK1|iso_utc|svc=lane_hw|U=0.700000|W=1.000000|RSI=0.604368|g=0.81|RSI_env=0.489538|band=A0|fx=Q6.26|manifest=knobs_hash
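The stamp line can be regenerated from (U, W) and g alone, which is a quick way to check the example's arithmetic: RSI = tanh(U/W) = tanh(0.7) ≈ 0.604368 and RSI_env = 0.81 * 0.604368 ≈ 0.489538, which falls in band A0. A minimal float sketch, assuming fields are joined as key=value with | separators in the order shown and that the iso_utc and knobs_hash placeholders are kept literally; make_stamp here is a hypothetical helper that only mirrors the name used in G10:

```python
import math

def band_of(x):
    """Bands as inclusive lower thresholds (assumed), matching the G9 table."""
    for thr, name in ((0.90, "A++"), (0.60, "A+"), (-0.60, "A0"), (-0.90, "A-")):
        if x >= thr:
            return name
    return "A--"

def make_stamp(profile, ts, svc, fields):
    """Join profile, timestamp, service and key=value fields with '|'."""
    return "|".join([profile, ts, f"svc={svc}"] + [f"{k}={v}" for k, v in fields])

U, W, g = 0.7, 1.0, 0.81
rsi = math.tanh(U / W)      # RSI = tanh(u_bar), u_bar = U / W
rsi_env = g * rsi           # gate mode "mul"

stamp = make_stamp("SSMCLOCK1", "iso_utc", "lane_hw", [
    ("U", f"{U:.6f}"), ("W", f"{W:.6f}"), ("RSI", f"{rsi:.6f}"),
    ("g", f"{g:g}"), ("RSI_env", f"{rsi_env:.6f}"), ("band", band_of(rsi_env)),
    ("fx", "Q6.26"), ("manifest", "knobs_hash"),
])
print(stamp)
```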

