Tiny RTL, manifest knobs, QA parity — deployable on FPGA/ASIC.
G8) Resource sketch (indicative, not a promise)
- LUTs/BRAM: 1–2 BRAMs per function for coefficient tables (tanh, atanh).
- DSPs: 4–8 for polynomial MACs, w*u, and the optional gate.
- ALMs/CLBs: adders, clamps, band comparators, stamp joiner (small).
- Clocks: 100–300 MHz on mid-range FPGAs; higher on ASIC.
- Scaling: parallelize lanes linearly; merges are add-only on (U, W).
- Reset/persist: keep the (U, W) registers and manifest constants; replaying from a stamp is trivial (reload (U, W), then one guarded divide and one tanh lookup).
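The "merges are add-only" point can be sketched in float Python. Each lane keeps only (U, W), lanes combine by adding those two sums, and the guarded divide plus tanh runs once at readout. `LaneState`, `run_lanes`, and the round-robin split are hypothetical names chosen for illustration. This models the lane math in float64, not the RTL.

```python
import math
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class LaneState:
    """Hypothetical float model of one lane's fused state (U, W)."""
    U: float = 0.0  # running sum of w * atanh(clamp(a))
    W: float = 0.0  # running sum of w

    def push(self, a: float, w: float, eps_a: float = 1e-6) -> "LaneState":
        a_c = max(-1.0 + eps_a, min(1.0 - eps_a, a))  # keep |a| < 1
        return LaneState(self.U + w * math.atanh(a_c), self.W + w)

    def merge(self, other: "LaneState") -> "LaneState":
        # Add-only merge: no LUT or divider is needed where lanes meet.
        return LaneState(self.U + other.U, self.W + other.W)

    def a_out(self, eps_w: float = 1e-12) -> float:
        # Guarded divide, then tanh: u_bar = U / max(W, eps_w).
        return math.tanh(self.U / max(self.W, eps_w))


def run_lanes(items, n_lanes):
    """Round-robin (a, w) items over n_lanes, then fold the lane states together."""
    lanes = [LaneState() for _ in range(n_lanes)]
    for i, (a, w) in enumerate(items):
        lanes[i % n_lanes] = lanes[i % n_lanes].push(a, w)
    return reduce(LaneState.merge, lanes)


items = [(0.3, 1.0), (-0.5, 2.0), (0.9, 0.5), (0.1, 1.0), (0.99, 1.0)]
single = run_lanes(items, 1)
sharded = run_lanes(items, 4)
print(single.a_out(), sharded.a_out())
```

In float64 the single-lane and sharded results agree up to summation rounding. In a fixed-point lane the (U, W) sums are integer additions, so they agree exactly as long as neither accumulator saturates.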
G9) Manifest fields for hardware builds (copy-paste)
{
"ssmh": {
"fx": {"u_fmt": "Q6.26", "a_fmt": "Q2.14", "w_fmt": "Q8.8"},
"eps_a": 1e-6, "eps_w": 1e-12,
"gate": {"mode": "mul"},
"bands": {"A++":0.90,"A+":0.60,"A0":-0.60,"A-":-0.90,"A--":-1.00},
"luts": {"atanh_seg": 5, "tanh_seg": 4, "poly_degree": 2},
"acc_width": {"U_bits": 40, "W_bits": 24},
"weights": {"policy": "uniform"},
"combine_policy": "M2",
"division_policy": "strict",
"stamp": {"profile": "SSMCLOCK1", "fields": ["U","W","RSI","g","RSI_env","band","fx","manifest"]}
}
}
JSON does not allow comments, so the alternatives are listed here instead: "gate" may use {"mode": "u_scale"}, and "weights" may use {"policy":"abs_m_gamma","gamma":1.0}.
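The manifest knobs have to become integer constants before synthesis. The sketch below does that in Python for the epsilons and band edges. It assumes a "Qm.n" string means m integer bits including sign and n fractional bits, which is consistent with Q6.26 being a 32-bit word. It also assumes the band values are lower edges in the a_fmt format, matching how the G10 pseudocode compares RSI_env. One detail falls out of the arithmetic. eps_a = 1e-6 is smaller than one LSB of Q2.14 (2^-14 ≈ 6.1e-5), and eps_w = 1e-12 is smaller than one LSB of Q8.8. A plain round would therefore turn both into 0. This sketch rounds them up to one LSB instead, which is an assumption about the intended behavior. `parse_q`, `eps_to_fx`, and `hw_constants` are hypothetical names.

```python
import json
import math

# Comment-free copy of the G9 fields this sketch reads (JSON has no comment syntax).
MANIFEST = json.loads("""
{"ssmh": {
  "fx": {"u_fmt": "Q6.26", "a_fmt": "Q2.14", "w_fmt": "Q8.8"},
  "eps_a": 1e-6, "eps_w": 1e-12,
  "bands": {"A++": 0.90, "A+": 0.60, "A0": -0.60, "A-": -0.90, "A--": -1.00}
}}
""")


def parse_q(fmt: str) -> tuple[int, int]:
    """Split 'Qm.n' into (m, n): m integer bits including sign, n fractional bits."""
    if not fmt.startswith("Q"):
        raise ValueError(f"not a Q format: {fmt!r}")
    m, n = fmt[1:].split(".")
    return int(m), int(n)


def eps_to_fx(eps: float, frac_bits: int) -> int:
    """Quantize an epsilon, rounding up so it never collapses to zero LSBs (assumption)."""
    return max(1, math.ceil(eps * (1 << frac_bits)))


def hw_constants(manifest: dict) -> dict:
    ssmh = manifest["ssmh"]
    _, a_frac = parse_q(ssmh["fx"]["a_fmt"])
    _, w_frac = parse_q(ssmh["fx"]["w_fmt"])
    return {
        "eps_a_fx": eps_to_fx(ssmh["eps_a"], a_frac),
        "eps_w_fx": eps_to_fx(ssmh["eps_w"], w_frac),
        # Band lower edges, quantized to the RSI_env format (assumed to be a_fmt).
        "bands_fx": {k: round(v * (1 << a_frac)) for k, v in ssmh["bands"].items()},
    }


CONSTS = hw_constants(MANIFEST)
print(CONSTS)
```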
G10) RTL-style pseudocode (concise, lane-only)
# Inputs per item: a_in_fx (Q2.14), w_fx (Q8.8), g_fx (Q1.15 optional)
# State: U_fx (wide), W_fx (wide), constants: eps_a_fx, eps_w_fx, bands, formats
a_c = clamp_fx(a_in_fx, -1 + eps_a_fx, 1 - eps_a_fx) # CLAMP
u = atanh_lut_poly(a_c) # ATANH_LUT (monotone)
wu = mul_fx(w_fx, u) # W_MUL (widen then round)
U_fx = sat_add_fx(U_fx, wu) # ACC_UW
W_fx = sat_add_fx(W_fx, w_fx)
u_bar = div_guard_fx(U_fx, W_fx, eps_w_fx) # DIV_SAT (if W==0 → 0)
a_out = tanh_lut_poly(u_bar) # TANH_LUT (monotone)
# Gate (choose one via manifest)
RSI = a_out
RSI_env = mul_fx(g_fx, RSI) # mode "mul"
# or
# RSI_env = tanh_lut_poly( mul_fx(g_fx, atanh_lut_poly(RSI)) ) # mode "u_scale"
band = band_of(RSI_env, thr_pp=0.90, thr_p=0.60, thr_n=-0.60, thr_nn=-0.90)
stamp = make_stamp("SSMCLOCK1", iso_utc(), {
"U":U_fx, "W":W_fx, "RSI":RSI, "g":g_fx, "RSI_env":RSI_env,
"band":band, "fx":fmt_string(), "manifest":knobs_hash()
})
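As a software counterpart to the pseudocode above, here is a minimal bit-level Python model of one lane. It could serve as a golden model for the G11 parity runs. Several assumptions are built in, and none of them come from the source:

- The LUT+polynomial blocks are replaced by `math.atanh`/`math.tanh` rounded to the target Q format, so the model captures formats and rounding but not LUT approximation error.
- eps_a and eps_w are rounded up to one LSB.
- Products round half up.
- Only the "mul" gate is modeled.
- `Lane`, `mul_round`, `atanh_fx`, and the other helpers are hypothetical names.

```python
import math

# Fractional bits of each Q format in the G9 manifest.
A_FRAC, U_FRAC, W_FRAC, G_FRAC = 14, 26, 8, 15  # Q2.14, Q6.26, Q8.8, Q1.15
U_FMT_BITS = 32                                 # Q6.26 word width
U_BITS, W_BITS = 40, 24                         # acc_width.U_bits / W_bits
EPS_A_FX = EPS_W_FX = 1                         # assumption: epsilons rounded up to 1 LSB


def to_fx(x: float, frac: int) -> int:
    return round(x * (1 << frac))


def from_fx(v: int, frac: int) -> float:
    return v / (1 << frac)


# Band lower edges (G9 "bands") in the RSI_env format; anything below A- is "A--".
BANDS_FX = [(n, to_fx(t, A_FRAC)) for n, t in
            [("A++", 0.90), ("A+", 0.60), ("A0", -0.60), ("A-", -0.90)]]


def sat(v: int, bits: int) -> int:
    """Clamp to the signed `bits`-bit range: saturate, never wrap."""
    lim = 1 << (bits - 1)
    return max(-lim, min(lim - 1, v))


def mul_round(x: int, y: int, shift: int) -> int:
    """Full-width product (Python ints never overflow), rounded half up by `shift` bits."""
    return (x * y + (1 << (shift - 1))) >> shift


def atanh_fx(a_fx: int) -> int:
    # Stand-in for ATANH_LUT + poly: exact atanh, rounded to Q6.26.
    return to_fx(math.atanh(from_fx(a_fx, A_FRAC)), U_FRAC)


def tanh_fx(u_fx: int) -> int:
    # Stand-in for TANH_LUT + poly: exact tanh, rounded to Q2.14.
    return to_fx(math.tanh(from_fx(u_fx, U_FRAC)), A_FRAC)


class Lane:
    def __init__(self, U: int = 0, W: int = 0) -> None:
        self.U = U  # sum of w*u, 26 fractional bits, U_BITS wide
        self.W = W  # sum of w, 8 fractional bits, W_BITS wide

    def push(self, a_in_fx: int, w_fx: int) -> None:
        one = 1 << A_FRAC
        a_c = max(-one + EPS_A_FX, min(one - EPS_A_FX, a_in_fx))  # CLAMP
        wu = mul_round(w_fx, atanh_fx(a_c), W_FRAC)  # W_MUL: Q8.8 * Q6.26 -> 26 frac bits
        self.U = sat(self.U + wu, U_BITS)            # ACC_UW
        self.W = sat(self.W + w_fx, W_BITS)

    def merge(self, other: "Lane") -> "Lane":
        # Add-only merge of two lanes' (U, W).
        return Lane(sat(self.U + other.U, U_BITS), sat(self.W + other.W, W_BITS))

    def read(self, g_fx: int) -> tuple:
        # DIV_SAT: U has 26 fractional bits and W has 8, so pre-shift U by 8.
        u_bar = sat(round((self.U << W_FRAC) / max(self.W, EPS_W_FX)), U_FMT_BITS)
        rsi = tanh_fx(u_bar)                    # TANH_LUT
        rsi_env = mul_round(g_fx, rsi, G_FRAC)  # gate mode "mul"
        band = next((n for n, lo in BANDS_FX if rsi_env >= lo), "A--")
        return rsi, rsi_env, band


# Usage: one item that reproduces the stamp example (U = 0.7, W = 1.0, g = 0.81).
demo = Lane()
demo.push(to_fx(math.tanh(0.7), A_FRAC), to_fx(1.0, W_FRAC))
print(demo.read(to_fx(0.81, G_FRAC)))
```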
G11) Acceptance and QA (hardware)
- Bit-parity mode: feed float golden vectors, quantize, run HW, dequantize, compare to float64.
- Targets: |delta_RSI| <= 5e-4 (16-bit) or <= 5e-7 (32-bit); bands identical.
- Throughput: sustain 1 sample/cycle after pipeline fill at the target clock.
- Merge test: split a stream across two lanes, sum their (U, W), and reproduce the single-lane result.
- Order/shard invariance: permutations and shard merges leave a_out unchanged. This holds exactly because the (U, W) updates are integer adds, as long as neither accumulator saturates.
- Reset/restart: persisting (U, W) across resets yields an identical a_out.
- Safety: |a| < 1 via the clamp; u_bar := U / max(W, eps_w); saturate, never wrap.
- Stamp determinism: identical manifest + inputs ⇒ identical stamps (except the timestamp and any declared Q-format rounding fields).
- Parity with software: semantics identical (kernel and bands); classical values stay pristine: phi((m,a)) = m.
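The accuracy target and the "bands identical" rule are separate checks, and both are needed. A value can sit inside the |delta_RSI| tolerance and still fall on the other side of a band edge. Below is a minimal Python sketch of the comparison step in bit-parity mode. `check_parity` and `band_of` are hypothetical names. The "hardware" readback is only simulated by rounding the golden values to Q2.14, which stands in for a real dequantized capture.

```python
import math

TOL = {16: 5e-4, 32: 5e-7}  # |delta_RSI| targets from G11, keyed by word width
EDGES = [("A++", 0.90), ("A+", 0.60), ("A0", -0.60), ("A-", -0.90)]  # lower edges


def band_of(x: float) -> str:
    """First band whose lower edge x reaches; anything below -0.90 is "A--"."""
    return next((name for name, lo in EDGES if x >= lo), "A--")


def check_parity(hw: list, ref: list, word_bits: int) -> list:
    """Compare dequantized hardware RSI values with float64 golden values.

    Returns a list of failure messages; an empty list means the vector set passes.
    """
    if len(hw) != len(ref):
        raise ValueError("hardware and golden vectors differ in length")
    tol = TOL[word_bits]
    failures = []
    for i, (h, r) in enumerate(zip(hw, ref)):
        if abs(h - r) > tol:
            failures.append(f"item {i}: |delta_RSI| = {abs(h - r):.3g} exceeds {tol}")
        if band_of(h) != band_of(r):
            failures.append(f"item {i}: band {band_of(h)} != golden {band_of(r)}")
    return failures


# Golden float64 values and a simulated 16-bit (Q2.14) readback of them.
golden = [math.tanh(u) for u in (0.7, -0.2, 1.6)]
readback = [round(r * 2**14) / 2**14 for r in golden]
print(check_parity(readback, golden, 16))
```

Rounding to Q2.14 passes the 16-bit target but fails the 32-bit one, which is why the tolerance is keyed by word width. A pair such as 0.5999 vs 0.6001 passes the 16-bit tolerance yet fails on bands (A0 vs A+).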
One-line takeaway. Declare knobs, drop in the lane, and verify parity: tiny LUT-MAC RTL with (U, W) fusing gives deterministic, order/shard-invariant alignment that matches software while phi((m,a)) = m keeps classical numbers untouched.
Stamp example (hardware snapshot):
SSMCLOCK1|iso_utc|svc=lane_hw|U=0.700000|W=1.000000|RSI=0.604368|g=0.81|RSI_env=0.489538|band=A0|fx=Q6.26|manifest=knobs_hash
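Because a stamp carries (U, W) and g, a reader can recompute RSI and RSI_env from it. The example above is self-consistent: tanh(0.7 / 1.0) ≈ 0.604368 and 0.81 × 0.604368 ≈ 0.489538. Here is a minimal Python sketch of that check. It assumes gate mode "mul" and treats the numbers as dequantized register values. `iso_utc` and `knobs_hash` are kept as the placeholders they are in the example. `parse_stamp` and `verify_mul_gate` are hypothetical helpers, and the default tolerance reuses the G11 16-bit target.

```python
import math

EXAMPLE = ("SSMCLOCK1|iso_utc|svc=lane_hw|U=0.700000|W=1.000000|RSI=0.604368|"
           "g=0.81|RSI_env=0.489538|band=A0|fx=Q6.26|manifest=knobs_hash")


def parse_stamp(line: str) -> dict:
    """Split 'PROFILE|timestamp|key=value|...' into a dict of strings."""
    profile, ts, *pairs = line.split("|")
    fields = dict(pair.split("=", 1) for pair in pairs)
    fields.update(profile=profile, ts=ts)
    return fields


def verify_mul_gate(fields: dict, tol: float = 5e-4, eps_w: float = 1e-12) -> bool:
    """Recompute RSI = tanh(U / max(W, eps_w)) and RSI_env = g * RSI (gate "mul")."""
    U, W, g = (float(fields[k]) for k in ("U", "W", "g"))
    rsi = math.tanh(U / max(W, eps_w))
    return (abs(rsi - float(fields["RSI"])) <= tol
            and abs(g * rsi - float(fields["RSI_env"])) <= tol)


stamp = parse_stamp(EXAMPLE)
print(stamp["band"], verify_mul_gate(stamp))
```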