7.7 — Nonlinear units (tanh/atanh): LUT + poly spec
• Domain. |a| <= 1 - eps_a, u in [-Umax, +Umax] where Umax := atanh(1 - eps_a).
• Method. Piecewise polynomial with a small LUT of segment coefficients (Chebyshev/minimax recommended), or CORDIC.
• Indicative budgets.
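– MCU soft-IP. 16-bit u in Q2.14 fixed point, target max |delta_a| <= 1e-4 near u ≈ 0, latency 2–3 cycles. Q2.14 spans only [-2, 2), so it suits Umax < 2; a larger Umax needs more integer bits (compare the 16-bit scale example in 7.9).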
– FPGA. Use 18×18 DSP slices for mul/add; LUTRAM for coeffs; 1–2 cycles map, 1 cycle compose.
– ASIC. A 1-cycle map is feasible with a modest gate count; 2–3 cycle pipelines are often favored for power.
• Coeff width. 16–20 bits is ample for band steps of 0.1–0.3.
• Rounding. Deterministic (ties-to-even) at each pipeline stage; declare the mode as ROUND_MODE in the manifest.
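A minimal software reference model of the Method bullet may help when checking an RTL map unit. It is a sketch under stated assumptions, not the SSMH datapath: N_SEG, DEGREE, COEFF_FRAC_BITS, cheb_coeffs, build_lut and tanh_lut are hypothetical names and choices. It uses Chebyshev interpolation (a near-minimax stand-in for a true minimax fit), stores one row of integer coefficient codes per segment (the LUT), quantizes them with ties-to-even, uses the odd symmetry of tanh to cover u < 0, and evaluates each segment with Clenshaw's recurrence in floating point rather than in fixed point.

```python
import math

# Hypothetical parameters for illustration (not values mandated by the spec).
EPS_A = 1e-6
U_MAX = math.atanh(1.0 - EPS_A)   # ~7.2543, the clamp edge from "Domain"
N_SEG = 32                        # uniform segments over [0, U_MAX]
DEGREE = 3                        # polynomial degree per segment
COEFF_FRAC_BITS = 16              # |c| <= 1, so each code fits an 18-bit signed word


def cheb_coeffs(f, lo, hi, degree):
    """Chebyshev interpolation coefficients of f on [lo, hi] (near-minimax)."""
    n = degree + 1
    mid, half = 0.5 * (lo + hi), 0.5 * (hi - lo)
    # Sample f at the Chebyshev nodes of [-1, 1], mapped into the segment.
    fs = [f(mid + half * math.cos(math.pi * (k + 0.5) / n)) for k in range(n)]
    coeffs = []
    for j in range(n):
        s = sum(fs[k] * math.cos(math.pi * j * (k + 0.5) / n) for k in range(n))
        coeffs.append((1.0 if j == 0 else 2.0) * s / n)
    return coeffs


def build_lut():
    """One row of integer coefficient codes per segment."""
    seg_w = U_MAX / N_SEG
    lut = []
    for s in range(N_SEG):
        cs = cheb_coeffs(math.tanh, s * seg_w, (s + 1) * seg_w, DEGREE)
        # Python's round() on floats is round-half-to-even (ties-to-even).
        lut.append([round(c * (1 << COEFF_FRAC_BITS)) for c in cs])
    return lut


LUT = build_lut()


def tanh_lut(u):
    """tanh(u) from the segment LUT; odd symmetry handles u < 0."""
    sign = -1.0 if u < 0 else 1.0
    x = min(abs(u), U_MAX)                     # clamp to the domain
    seg_w = U_MAX / N_SEG
    s = min(int(x / seg_w), N_SEG - 1)         # segment index (LUT address)
    t = 2.0 * (x - s * seg_w) / seg_w - 1.0    # local coordinate in [-1, 1]
    c = [q / (1 << COEFF_FRAC_BITS) for q in LUT[s]]
    # Clenshaw recurrence for sum_j c[j] * T_j(t).
    b1 = b2 = 0.0
    for cj in reversed(c[1:]):
        b1, b2 = 2.0 * t * b1 - b2 + cj, b1
    return sign * (c[0] + t * b1 - b2)
```

Raising DEGREE or N_SEG trades LUT size and multiplier count for accuracy; with the settings above, a dense sweep over [-Umax, Umax] keeps the error against math.tanh under the 1e-4 MCU target.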
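The ties-to-even rule can be applied with integer operations only, which keeps software models and RTL bit-exact at every stage. The sketch below is hypothetical (rshift_rne, q_mul and FRAC_BITS are illustrative names): it rescales a Q2.14 × Q2.14 product, which carries 28 fractional bits, back to Q2.14, and it does not saturate on overflow.

```python
def rshift_rne(x: int, s: int) -> int:
    """Divide x by 2**s, rounding to nearest with ties to even (integer-only)."""
    if s <= 0:
        return x << -s
    q = x >> s                 # floor(x / 2**s); Python's >> floors for negatives too
    r = x - (q << s)           # remainder, always in [0, 2**s)
    half = 1 << (s - 1)
    if r > half or (r == half and q & 1):
        q += 1                 # round up above the midpoint, or on a tie to reach even q
    return q


FRAC_BITS = 14  # Q2.14 operands, as in the MCU budget above


def q_mul(a_fp: int, b_fp: int) -> int:
    """Q2.14 x Q2.14 -> Q2.14 (no saturation): drop 14 of the 28 fractional bits."""
    return rshift_rne(a_fp * b_fp, FRAC_BITS)
```

For example, rshift_rne(5, 1) gives 2 and rshift_rne(7, 1) gives 4: both are exact ties (2.5 and 3.5) and each resolves to the even neighbor.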
7.8 — Timing and area (design targets, adjustable)
• Compose in u. SUADD/SUSUB/SUSCALE = 1 cycle.
• Map / inverse-map. 1–3 cycles each depending on polynomial degree; throughput 1 per cycle with pipeline.
• SSACC. 1 cycle per update; flush latency dominated by divider + tanh.
• Area levers. Polynomial degree, LUT size, whether u is stored or recomputed, and u word length.
• Clock/power levers. Clock-gate Stage 2 on pass-through; gate SSACC when acc_en = 0.
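The latency and throughput targets combine in the usual pipelined way: the first result costs the summed stage latencies, and each further independent operation adds one cycle. The sketch below assumes both operands are mapped in parallel, picks stage latencies from inside the stated ranges, and treats the SSACC flush as divider plus tanh, per "dominated by". The divider latency is not given in this section, so div_cycles is a hypothetical placeholder, as are all names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageCycles:
    """Per-stage latencies in cycles (hypothetical picks within the stated ranges)."""
    map_cycles: int = 2       # a -> u (atanh): 1-3 cycles
    compose_cycles: int = 1   # SUADD / SUSUB / SUSCALE: 1 cycle
    inv_map_cycles: int = 2   # u -> a (tanh): 1-3 cycles
    div_cycles: int = 4       # SSACC flush divider: assumed, not specified here


def stream_cycles(n_ops: int, c: StageCycles) -> int:
    """Total cycles for n_ops independent map -> compose -> inverse-map operations.

    With a fully pipelined datapath (throughput 1 per cycle) the first result
    appears after the summed latency and each further result one cycle later.
    """
    if n_ops <= 0:
        return 0
    latency = c.map_cycles + c.compose_cycles + c.inv_map_cycles
    return latency + (n_ops - 1)


def ssacc_cycles(n_updates: int, c: StageCycles) -> int:
    """SSACC: 1 cycle per update, then a flush approximated as divider + tanh."""
    return n_updates + c.div_cycles + c.inv_map_cycles
```

With the defaults, one operation takes 5 cycles while 1000 back-to-back operations take 1004, which is why per-stage latency matters far less than the 1-per-cycle throughput for streaming workloads.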
7.9 — Quantization guidance (first-order envelope)
• Envelope. If u carries an error delta_u (e.g., one quantization step), then to first order |delta_a| <= sech^2(u) * |delta_u|, since d tanh(u)/du = sech^2(u).
• Worst case near zero. |delta_a| <= |delta_u|; errors shrink toward the clamped edges.
• Choose delta_u. Make |delta_a| <= band_step/4 near u ≈ 0. Since |delta_a| <= |delta_u| there, this requires delta_u <= band_step/4.
• Example mapping. With eps_a = 1e-6, Umax ≈ 7.2543. For 16-bit signed codes, scale := (2^15 - 1) / Umax ≈ 4516.9, so delta_u = 1/scale ≈ 2.21e-4.
Fixed-point codes use u_fp := round(u * scale).
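The example mapping and the envelope can be reproduced with a short script. It is a sketch assuming clamp-then-round quantization with Python's ties-to-even round(); to_fp, from_fp, envelope and delta_u_for_band are hypothetical helper names. delta_u_for_band follows from the band_step/4 target together with |delta_a| <= |delta_u| near zero.

```python
import math

EPS_A = 1e-6
U_MAX = math.atanh(1.0 - EPS_A)                # ~7.2543
WORD_BITS = 16
SCALE = (2 ** (WORD_BITS - 1) - 1) / U_MAX     # ~4516.9 codes per unit of u
DELTA_U = 1.0 / SCALE                          # one code step in u, ~2.21e-4


def to_fp(u: float) -> int:
    """Clamp u to [-U_MAX, U_MAX], then u_fp := round(u * scale) (ties-to-even)."""
    u = max(-U_MAX, min(U_MAX, u))
    return round(u * SCALE)


def from_fp(u_fp: int) -> float:
    """Convert a fixed-point code back to u."""
    return u_fp / SCALE


def envelope(u: float, delta_u: float = DELTA_U) -> float:
    """First-order bound |delta_a| <= sech^2(u) * |delta_u|."""
    return abs(delta_u) / math.cosh(u) ** 2


def delta_u_for_band(band_step: float) -> float:
    """Largest u step keeping |delta_a| <= band_step / 4 near u = 0, where sech^2 = 1."""
    return band_step / 4.0
```

For band_step = 0.1 the rule allows delta_u up to 0.025, so the 16-bit step of about 2.2e-4 leaves a wide margin near u ≈ 0; round-to-nearest keeps the actual error within half that step.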