SSM-AI—Appendix G — SSMH Acceleration Parity (fixed-point, tiny MAC, identical semantics) (G4-G7)

Fixed-point hardware lane: 1/cycle, software-parity, stamp-ready.

Purpose. Map the lane math to a tiny, deterministic hardware substrate without changing semantics. Classical values remain untouched: phi((m,a)) = m. The lane uses the same kernel: a_c := clamp(a, -1+eps_a, +1-eps_a), u := atanh(a_c), streaming fuse U += w*u, W += w, and a_out := tanh( U / max(W, eps_w) ). Optional gate: RSI_env := g * RSI or RSI_env := tanh( g * atanh(RSI) ).
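To check the kernel numerically, here is a minimal float64 sketch of the lane semantics above: clamp, atanh, streaming (U, W) fuse, and tanh at flush. The helper name `lane_fuse` is hypothetical, and weights default to the uniform case w = 1. It is a software reference for parity checks, not the hardware datapath.

```python
import math

EPS_A = 1e-6   # clamp margin from the G7 manifest
EPS_W = 1e-12  # division guard from the G7 manifest


def lane_fuse(alignments, weights=None):
    """Float64 reference of the lane kernel (hypothetical helper name).

    Returns (a_out, U, W). Only alignments a enter; magnitudes m never do.
    """
    if weights is None:
        weights = [1.0] * len(alignments)  # uniform weights, w := 1
    U = 0.0
    W = 0.0
    for a, w in zip(alignments, weights):
        a_c = max(-1.0 + EPS_A, min(1.0 - EPS_A, a))  # a_c := clamp(a, -1+eps_a, +1-eps_a)
        U += w * math.atanh(a_c)                      # U += w*u
        W += w                                        # W += w
    return math.tanh(U / max(W, EPS_W)), U, W         # flush: a_out := tanh(U / max(W, eps_w))


a_out, U, W = lane_fuse([math.tanh(0.2), math.tanh(0.4)])
print(f"a_out={a_out:.6f} U={U:.6f} W={W}")  # a_out=0.291313 U=0.600000 W=2.0
```

Because the clamp runs before atanh, an input of exactly +1 or -1 still produces a finite u.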

G4) Pipeline and latency (one sample per cycle, typical)

Target: 1 result/cycle after pipe fill; deterministic latency across builds.

• Stage counts (indicative):
CLAMP: 1 → ATANH_LUT: 2–3 → W_MUL: 1 → ACC_UW: 1 → DIV_SAT: 6–12 → TANH_LUT: 2–3
Total: ~13–21 cycles latency (sum of the stage ranges); throughput: 1/cycle after fill.

• Shard merge: add-only on (U, W) (no LUTs involved).

• Flush rule: publish only at boundaries:

u_bar := U / max(W, eps_w)
a_out := tanh(u_bar)

• Back-pressure: stall at DIV_SAT only; keep ACC_UW single-cycle.

• Clock targets: 100–300 MHz on mid-range FPGAs; higher on ASICs (lane-local).
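The stage ranges listed above sum to 13 to 21 cycles. The sketch below does that arithmetic and the usual fill-then-stream count. It assumes one item issued per cycle, no stalls, and every item traversing every listed stage. The helper names are hypothetical.

```python
# Indicative (min, max) depth of each stage, copied from the G4 table.
STAGES = [
    ("CLAMP", 1, 1),
    ("ATANH_LUT", 2, 3),
    ("W_MUL", 1, 1),
    ("ACC_UW", 1, 1),
    ("DIV_SAT", 6, 12),
    ("TANH_LUT", 2, 3),
]


def latency_range(stages):
    """Pipeline latency is the sum of the stage depths."""
    return sum(lo for _, lo, _ in stages), sum(hi for _, _, hi in stages)


def cycles_for(n_results, latency):
    """Cycles until n results are out: the first needs `latency` cycles,
    each later one adds a single cycle (1/cycle after fill, no stalls)."""
    return 0 if n_results == 0 else latency + n_results - 1


print(latency_range(STAGES))  # (13, 21)
```

At these depths the fill cost is negligible for long streams: 1,000 results at the 21-cycle worst case take 1,020 cycles.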

G5) Gate and bands in hardware

Goal: environment/policy scaling without touching magnitudes.

• Gate “mul” (cheap):

RSI_env := g * RSI

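• Gate “u_scale” (bounded even at edges: the output stays inside (-1, +1); clamp RSI to ±(1 - eps_a) before atanh, as the kernel does for a):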

RSI_env := tanh( g * atanh(RSI) )

• Bands (defaults, fixed-point domain): thresholds at +0.90, +0.60, -0.60, -0.90.
Hysteresis: use h_up, h_dn offsets to de-chatter labels.
Purity: alignment-only; classical m never enters this datapath (phi((m,a)) = m).
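A minimal sketch of the two gates and the default band labels follows. The band edges come from the A++ ... A-- definitions in the copy-code summary. Two assumptions: the clamp inside `gate_u_scale` mirrors the kernel clamp on a, and hysteresis is left out because the h_up and h_dn values are not specified here. Function names are illustrative.

```python
import math

EPS_A = 1e-6


def gate_mul(rsi, g):
    """Cheap gate: RSI_env := g * RSI."""
    return g * rsi


def gate_u_scale(rsi, g):
    """u-space gate: RSI_env := tanh(g * atanh(RSI)).

    Clamping RSI first (assumption, mirroring the kernel) keeps atanh finite at +/-1.
    """
    r = max(-1.0 + EPS_A, min(1.0 - EPS_A, rsi))
    return math.tanh(g * math.atanh(r))


def band(a):
    """A++ a>=0.90, A+ [0.60,0.90), A0 (-0.60,0.60), A- (-0.90,-0.60], A-- a<=-0.90.
    No hysteresis is applied in this sketch."""
    if a >= 0.90:
        return "A++"
    if a >= 0.60:
        return "A+"
    if a > -0.60:
        return "A0"
    if a > -0.90:
        return "A-"
    return "A--"


rsi = math.tanh(0.7)
env = gate_mul(rsi, 0.81)
print(band(rsi), band(env))  # A+ A0
```

The example shows why bands are evaluated after gating: a g = 0.81 "mul" gate moves RSI ≈ 0.604 (A+) down to ≈ 0.490 (A0).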

G6) Determinism & parity tests (must pass)

Golden vectors (fixed-point must match float within tolerance):

1) a1 := tanh(0.2), a2 := tanh(0.4), w1=w2=1
   Expect a_out := tanh( (0.2+0.4)/2 ) = tanh(0.3) ≈ 0.291313

2) RSI := tanh(0.7) ≈ 0.604368 → U=0.700000, W=1 → replay equals input

3) Lane mul/div (M2):
   a_mul := tanh(0.5 + 0.2) ≈ 0.604368
   a_div := tanh(0.5 - 0.2) ≈ 0.291313

4) Gate “mul” with g=0.81:
   RSI_env := 0.81 * 0.604368 ≈ 0.489538

5) Gate “u_scale” with g=0.81, RSI=0.70:
   RSI_env := tanh( 0.81 * atanh(0.70) ) = tanh( 0.81 * 0.867301 ) ≈ 0.605961

6) Order/shard invariance:
   permute/shard a stream; final a_out identical when (U,W) merged
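The golden vectors can be reproduced with a short float64 script. For vector 6 the sketch accumulates u as Q6.26 integers rather than floats. Float64 addition is not associative, so a reordered float sum may differ in its last bits. Integer addition is exact, so any permutation or sharding gives the same (U, W). The shard size of 128 and the random stream are arbitrary choices for illustration.

```python
import math
import random

FRAC = 26  # Q6.26 fractional bits (the "comfort" lane format)


def to_fx(x, frac=FRAC):
    """Round-to-nearest fixed-point encoding (no saturation; inputs here are small)."""
    return int(round(x * (1 << frac)))


def from_fx(q, frac=FRAC):
    return q / (1 << frac)


def flush(U_fx, W):
    """a_out := tanh(U / W), with the W == 0 -> 0 rule."""
    return 0.0 if W == 0 else math.tanh(from_fx(U_fx) / W)


# Vectors 1-5 as float64 reference values.
vectors = {
    "v1_a_out": math.tanh((math.atanh(math.tanh(0.2)) + math.atanh(math.tanh(0.4))) / 2),
    "v2_replay": math.tanh(math.atanh(math.tanh(0.7)) / 1),
    "v3_a_mul": math.tanh(0.5 + 0.2),
    "v3_a_div": math.tanh(0.5 - 0.2),
    "v4_gate_mul": 0.81 * math.tanh(0.7),
    "v5_gate_u_scale": math.tanh(0.81 * math.atanh(0.70)),
}
EXPECTED = {
    "v1_a_out": 0.291313, "v2_replay": 0.604368,
    "v3_a_mul": 0.604368, "v3_a_div": 0.291313,
    "v4_gate_mul": 0.489538, "v5_gate_u_scale": 0.605961,
}
for key, want in EXPECTED.items():
    assert abs(vectors[key] - want) < 1e-6, key

# Vector 6: integer (U, W) accumulation is exactly order- and shard-invariant.
rng = random.Random(7)
u_fx = [to_fx(math.atanh(rng.uniform(-0.95, 0.95))) for _ in range(1000)]
whole = (sum(u_fx), len(u_fx))  # w = 1 for every item

shuffled = u_fx[:]
rng.shuffle(shuffled)
shards = [shuffled[i:i + 128] for i in range(0, len(shuffled), 128)]
merged = (sum(sum(s) for s in shards), sum(len(s) for s in shards))  # add-only merge

assert merged == whole
assert flush(*merged) == flush(*whole)
```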

Tolerance targets:

16-bit path : |delta_RSI| <= 5e-4, band decisions identical
32-bit path : |delta_RSI| <= 5e-7, bit-exact bands and stamps

Acceptance recipe (portable):

• Quantize inputs → run HW → dequantize → compare to float64 reference
• Assert: (a_out_fp within tolerance) AND (band_fp == band_float)
• Assert: stamps identical (or differ only in declared Q-format rounding fields)
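A portable version of this recipe can be emulated in software before any RTL exists. The sketch below models only the Q4.12 quantization of u, U and u_bar. It assumes `math.atanh` and `math.tanh` stand in for the two LUTs, so it checks the format's rounding budget, not LUT accuracy. In this model, each rounding of u or u_bar costs at most half an LSB (about 1.22e-4), and tanh has slope at most 1. The RSI error therefore stays near 2.44e-4, inside the 5e-4 target. Inputs close to a band edge can still flip a label, so the checks below use vectors away from the thresholds.

```python
import math

FRAC = 12                              # Q4.12 (16-bit "low cost" path)
QMIN, QMAX = -(1 << 15), (1 << 15) - 1
EPS_A, EPS_W = 1e-6, 1e-12
TOL_16 = 5e-4


def to_q412(x):
    """Quantize with round-to-nearest, saturating (never wrapping)."""
    return max(QMIN, min(QMAX, int(round(x * (1 << FRAC)))))


def from_q412(q):
    return q / (1 << FRAC)


def clamp_a(a):
    return max(-1.0 + EPS_A, min(1.0 - EPS_A, a))


def band(a):
    if a >= 0.90:
        return "A++"
    if a >= 0.60:
        return "A+"
    if a > -0.60:
        return "A0"
    return "A-" if a > -0.90 else "A--"


def reference(alignments):
    """float64 reference of the lane with w = 1."""
    U = sum(math.atanh(clamp_a(a)) for a in alignments)
    return math.tanh(U / max(len(alignments), EPS_W))


def emulated_16bit(alignments):
    """Quantize -> integer accumulate -> divide -> dequantize (LUTs modelled by math)."""
    U = sum(to_q412(math.atanh(clamp_a(a))) for a in alignments)  # wide integer U
    W = len(alignments)
    if W == 0:
        return 0.0
    u_bar = to_q412(from_q412(U) / W)  # quotient rounded back to Q4.12
    return math.tanh(from_q412(u_bar))


def accept(alignments):
    ref, hw = reference(alignments), emulated_16bit(alignments)
    return abs(hw - ref) <= TOL_16 and band(hw) == band(ref)


print(accept([math.tanh(0.2), math.tanh(0.4)]))  # True
```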

G7) Fixed-point arithmetic details

Multiply (widen-then-round):
Qp.q * Qp.q -> Q(2p).(2q) then round/saturate back to lane format.
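Widen-then-round can be sketched on plain integers. The exact product holds 2q fractional bits. It is then shifted to the target fraction width and saturated. Python integers do not overflow, so the "wide" intermediate is free here. Round-half-up on right shifts is an assumption, since the text does not fix a rounding mode. `fx_mul` and `saturate` are illustrative names.

```python
def saturate(q, bits):
    """Clamp a signed integer to a bits-wide two's-complement range (no wrap)."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, q))


def fx_mul(a, b, frac_in, frac_out, bits_out):
    """Multiply two Q-format integers that each carry frac_in fractional bits.

    The exact product carries 2*frac_in fractional bits (the Q(2p).(2q) step).
    It is shifted to frac_out bits (round half up on right shifts), then saturated.
    """
    prod = a * b                          # exact, wide intermediate
    shift = 2 * frac_in - frac_out
    if shift > 0:
        prod = (prod + (1 << (shift - 1))) >> shift   # >> floors, so adding half rounds
    elif shift < 0:
        prod <<= -shift
    return saturate(prod, bits_out)


ONE_5 = 6144     # 1.5  in Q4.12
TWO_25 = 9216    # 2.25 in Q4.12
print(fx_mul(ONE_5, TWO_25, 12, 12, 16))   # 13824, i.e. 3.375 in Q4.12
print(fx_mul(16384, 12288, 12, 12, 16))    # 4.0 * 3.0 = 12 saturates to 32767
```

Going to Q6.26 instead needs a left shift of 2 (24 -> 26 fractional bits), so no rounding occurs there, only saturation of the integer part.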

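Accumulators: keep them wider than the lane format to avoid overflow. For 1e6 items with w=1, W_acc needs ≥ 20 integer bits (2^20 = 1,048,576). U_acc needs more: |u| ≤ atanh(1 - eps_a) ≈ 7.25, so |U| can reach about 7.3e6 (23 integer bits plus sign), and the fractional bits of u come on top. Choose 32-bit or 40-bit for (U,W) accordingly.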
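The sizing rule can be computed rather than guessed. This sketch assumes uniform weights (w_max = 1) and the manifest's eps_a = 1e-6, which bounds |u| by atanh(1 - 1e-6) ≈ 7.254. The helper names are hypothetical.

```python
import math


def w_int_bits(n_items, w_max=1):
    """Bits for an unsigned W accumulator holding up to n_items * w_max."""
    return math.ceil(math.log2(n_items * w_max + 1))


def u_int_bits(n_items, eps_a=1e-6, w_max=1):
    """Integer bits plus one sign bit for U when every |u| <= atanh(1 - eps_a)."""
    u_max = math.atanh(1 - eps_a)
    return math.ceil(math.log2(n_items * w_max * u_max + 1)) + 1


n = 1_000_000
print(w_int_bits(n), u_int_bits(n))  # 20 24
print(u_int_bits(n) + 12)            # 36: U width with 12 fractional bits (Q4.12 u)
```

With Q4.12 u values, U lands at about 36 bits for a million items, which fits the 40-bit option.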

Division guard:

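u_bar := U / max(W, eps_w_fx)      # if W=0, emit a_out := 0

Note that eps_w = 1e-12 lies far below one LSB of either lane format (2.44e-4 for Q4.12, 1.49e-8 for Q6.26), so to_fx(eps_w) rounds to 0. The explicit W = 0 rule is what actually prevents a divide-by-zero.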

Saturation: never wrap; saturate to nearest representable endpoint.
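The division guard and the no-wrap rule combine into one small routine. This sketch assumes uniform weights, so W is a plain integer count and the quotient keeps U's fractional bits. The round-half-away-from-zero choice and the name `div_guarded` are illustrative assumptions.

```python
def div_guarded(U_fx, W, bits=32):
    """u_bar := U / W for an integer weight sum W (uniform weights, w = 1).

    W == 0 yields 0, so a_out = tanh(0) = 0. Rounds half away from zero and
    saturates to a bits-wide signed word instead of wrapping.
    """
    if W == 0:
        return 0
    q, r = divmod(abs(U_fx), W)
    if 2 * r >= W:
        q += 1
    q = q if U_fx >= 0 else -q
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, q))


U = round(0.6 * (1 << 26))                 # U = 0.6 in Q6.26
print(div_guarded(U, 2) / (1 << 26))       # about 0.3
```

Rounding on magnitude makes the routine symmetric: -U / W is exactly the negation of U / W, which keeps the sign of u_bar, and hence the band, stable.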

Recommended Q-formats (lane-only):

u, U/W, w*u:
  Q4.12 (16-bit, low cost)  : range [-8, +8),  step ≈ 2.44e-4
  Q6.26 (32-bit, comfort)   : range [-32,+32), step ≈ 1.49e-8

weights w:
  uniform → w := 1 (int)
  strength → w := |m|^gamma, quantize to Q6.10 or Q8.8 per spread
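The ranges and steps quoted above follow from the bit split: a signed Qm.n word spans [-2^(m-1), 2^(m-1)) in steps of 2^-n. The sketch also quantizes a strength weight |m|^gamma into Q8.8 or Q6.10. The gamma value in the example is hypothetical, and saturating at the top of the word is an assumption consistent with the no-wrap rule.

```python
def q_range(int_bits, frac_bits):
    """(low, exclusive high, step) of a signed Q<int_bits>.<frac_bits> format."""
    step = 2.0 ** -frac_bits
    return -(2.0 ** (int_bits - 1)), 2.0 ** (int_bits - 1), step


def quantize_weight(m, gamma, int_bits=8, frac_bits=8):
    """w := |m|^gamma encoded in Q<int_bits>.<frac_bits>; saturates, never wraps."""
    w = abs(m) ** gamma
    q = int(round(w * (1 << frac_bits)))
    return min(q, (1 << (int_bits + frac_bits - 1)) - 1)


print(q_range(4, 12))             # (-8.0, 8.0, 0.000244140625)
print(quantize_weight(4.0, 0.5))  # 512, i.e. w = 2.0 in Q8.8
```

Q8.8 trades resolution for range (up to about 128), while Q6.10 keeps finer steps up to about 32. That is the "per spread" choice above.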

Guard constants (manifest):

eps_a = 1e-6
eps_w = 1e-12
eps_a_fx := to_fx(eps_a)
eps_w_fx := to_fx(eps_w)
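Encoding the guard constants shows how they survive each lane format. The manifest names `to_fx` but does not define it, so the round-to-nearest encoder below is an assumption, and so is the "at least one LSB" variant. The latter is one hypothetical way to keep a positive guard from collapsing to 0.

```python
def to_fx(x, frac_bits):
    """Hypothetical round-to-nearest encoder; the manifest only names to_fx."""
    return int(round(x * (1 << frac_bits)))


def to_fx_min_lsb(x, frac_bits):
    """Hypothetical variant: a positive guard never encodes below 1 LSB."""
    q = to_fx(x, frac_bits)
    return max(1, q) if x > 0 else q


EPS_A, EPS_W = 1e-6, 1e-12
guards = {
    fmt: (to_fx(EPS_A, frac), to_fx(EPS_W, frac))
    for fmt, frac in (("Q4.12", 12), ("Q6.26", 26))
}
print(guards)  # {'Q4.12': (0, 0), 'Q6.26': (67, 0)}
```

At 12 fractional bits both guards vanish. If a were held at that resolution, the clamp would collapse to ±1. At 26 fractional bits eps_a survives as 67 LSB, while eps_w still rounds to 0.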

Monotone LUTs (strict): piecewise LUT + short polynomial with strict monotonicity for both atanh and tanh to preserve ordering.
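The ordering property can be checked mechanically. This sketch substitutes a simpler scheme than the LUT + short polynomial described above: linear interpolation between quantized tanh breakpoints. The table span (8.0), segment count (512) and output LSB (2^-15) are all assumptions. Once tanh flattens below one output LSB, quantized entries tie, so the check is for a non-decreasing table.

```python
import math

X_MAX = 8.0       # tanh input span covered by the table (assumption)
SEGMENTS = 512    # breakpoints every 8/512 = 1/64 (assumption)
FRAC_OUT = 15     # output LSB 2^-15 (assumption)


def build_tanh_lut():
    xs = [X_MAX * i / SEGMENTS for i in range(SEGMENTS + 1)]
    top = (1 << FRAC_OUT) - 1                      # saturate below 1.0
    ys = [min(top, int(round(math.tanh(x) * (1 << FRAC_OUT)))) for x in xs]
    return xs, ys


def is_monotone(ys):
    """Non-decreasing check over the quantized table."""
    return all(b >= a for a, b in zip(ys, ys[1:]))


def lut_tanh(x, xs, ys):
    """Odd-symmetric piecewise-linear lookup, saturating beyond X_MAX."""
    sign = -1.0 if x < 0 else 1.0
    x = abs(x)
    if x >= xs[-1]:
        return sign * ys[-1] / (1 << FRAC_OUT)
    step = xs[1] - xs[0]
    i = min(int(x / step), len(xs) - 2)
    t = (x - xs[i]) / step
    y = ys[i] + t * (ys[i + 1] - ys[i])
    return sign * y / (1 << FRAC_OUT)


xs, ys = build_tanh_lut()
assert is_monotone(ys)
```

Rounding a non-decreasing sequence keeps it non-decreasing, and linear interpolation between non-decreasing points is itself non-decreasing. The lookup therefore preserves input ordering everywhere, including across the odd-symmetric fold at 0.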

Copy-code

# === PIPELINE (indicative stages) ===
# CLAMP(1) -> ATANH_LUT(2-3) -> W_MUL(1) -> ACC_UW(1) -> DIV_SAT(6-12) -> TANH_LUT(2-3)
# Latency ~13–21 cycles (sum of stage depths), throughput 1/cycle after fill.

# === GATE & BANDS ===
# Gate (mul):     RSI_env := g * RSI
# Gate (u_scale): RSI_env := tanh( g * atanh(RSI) )
# Bands:          A++: a>=0.90; A+: [0.60,0.90); A0: (-0.60,0.60); A-: (-0.90,-0.60]; A--: a<=-0.90
# Hysteresis:     use h_up, h_dn (fixed-point)
# Purity:         magnitude m never enters; phi((m,a)) = m

# === GOLDEN VECTORS & TOLERANCE ===
# 1) a_out ≈ tanh(0.3) ≈ 0.291313   # from tanh((0.2+0.4)/2)
# 2) RSI := tanh(0.7) ≈ 0.604368 → U=0.700000; W=1 → replay equals input
# 3) M2: a_mul ≈ 0.604368; a_div ≈ 0.291313
# 4) Gate mul (g=0.81): RSI_env ≈ 0.489538
# 5) Gate u_scale: tanh( 0.81 * atanh(0.70) ) ≈ 0.605961
# 6) Order/shard invariance: identical a_out after merge
# Tolerances: 16-bit |delta_RSI|<=5e-4; 32-bit |delta_RSI|<=5e-7

# === FIXED-POINT DETAILS ===
# Multiply: widen then narrow (e.g., Q4.12 * Q4.12 -> Q8.24, then round back to Q4.12 or shift/saturate into Q6.26)
# Accumulators: U_acc,W_acc wider (e.g., 32 or 40 bits)
# Division guard: u_bar := U / max(W, eps_w_fx); if W==0 → a_out := 0
# Saturation: no wrap; saturate to endpoints.

# Suggested Q formats:
#   u, U/W, w*u:
#     Q4.12 (16b)  range [-8,+8),  step ≈ 2.44e-4
#     Q6.26 (32b)  range [-32,+32), step ≈ 1.49e-8
#   weights w:
#     uniform → 1
#     strength → |m|^gamma, quantize Q6.10 or Q8.8
# Guards:
#   eps_a = 1e-6; eps_w = 1e-12; use eps_a_fx, eps_w_fx in RTL

One-line takeaway. The lane is a tiny streaming MAC with two monotone LUTs (atanh in, tanh out) and a (U, W) accumulator. With a format like Q6.26, you get deterministic, order-invariant, stamp-ready results that match the software semantics within the G6 tolerances, while phi((m,a)) = m keeps all classical numbers pristine.

Stamp example (hardware snapshot).
SSMCLOCK1|iso_utc|svc=lane_hw|U=0.700000|W=1.000000|RSI=0.604368|g=0.81|RSI_env=0.489538|band=A0|fx=Q6.26|manifest=knobs_hash
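The stamp is a pipe-separated record, so producing and comparing it takes only a few lines. In this sketch, `iso_utc` and `knobs_hash` are the placeholders from the example, kept verbatim. Field order follows the example. Treating `iso_utc` as the only run-specific field when comparing stamps is an assumption, and all three function names are hypothetical.

```python
import math


def format_stamp(iso_utc, fields, tag="SSMCLOCK1"):
    """Pipe-joined stamp: tag|timestamp|key=value|... in insertion order."""
    return "|".join([tag, iso_utc] + [f"{k}={v}" for k, v in fields.items()])


def parse_stamp(line):
    tag, iso_utc, *pairs = line.split("|")
    return tag, iso_utc, dict(p.split("=", 1) for p in pairs)


def stamps_match(a, b, ignore=("iso_utc",)):
    """Compare two stamps, skipping fields declared run-specific (assumption)."""
    ta, sa, fa = parse_stamp(a)
    tb, sb, fb = parse_stamp(b)
    same_time = "iso_utc" in ignore or sa == sb
    fa = {k: v for k, v in fa.items() if k not in ignore}
    fb = {k: v for k, v in fb.items() if k not in ignore}
    return ta == tb and same_time and fa == fb


rsi = math.tanh(0.7)
fields = {
    "svc": "lane_hw",
    "U": f"{0.7:.6f}",
    "W": f"{1.0:.6f}",
    "RSI": f"{rsi:.6f}",
    "g": "0.81",
    "RSI_env": f"{0.81 * rsi:.6f}",
    "band": "A0",
    "fx": "Q6.26",
    "manifest": "knobs_hash",
}
stamp = format_stamp("iso_utc", fields)
print(stamp)
```

Formatting values to six decimals before stamping reproduces the example exactly, so a hardware stamp and a software stamp can be compared as strings.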
