SSM-AI – Outcomes to Track on Day 1 (1.5)

What to measure immediately — with identical numbers and bounded clarity

Why this page matters.
On day 1 you haven’t retrained anything and your classical values stay identical (phi((m,a)) = m). Yet you can still see measurable wins. Track these five outcomes with stamped evidence; all are driven by the lane a in (-1,+1) and the chooser RSI in (-1,+1).


1) Retries ↓ and Hand-offs ↓

Use bands and a calm gate to avoid low-value loops.

RSI := tanh( (SUM w*atanh(a_out) - SUM w*atanh(a_in)) / max(SUM w, eps_w) )
RSI_env := g_t * RSI
policy_retry := (RSI_env >= 0.60)   # example: A+ threshold

Metric: retry_rate_before - retry_rate_after (absolute and %).
Evidence: stamped step logs with (RSI, RSI_env, band, stamp).
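
The chooser and gate above can be sketched in Python as follows. This is a minimal sketch, not a reference implementation. It assumes that lanes are clamped just inside (-1,+1) before atanh, that "SUM w" in the denominator runs over both the in and out items, and that g_t is supplied by the caller. The names clamp, rsi, retry_decision, EPS_A and EPS_W are illustrative only.

```python
import math

EPS_A = 1e-6   # assumed clamp margin so atanh stays finite
EPS_W = 1e-12  # assumed floor for the weight sum (eps_w in the formula)


def clamp(a, eps=EPS_A):
    """Keep a lane value strictly inside (-1, +1)."""
    return max(-1.0 + eps, min(1.0 - eps, a))


def rsi(items_in, items_out, eps_w=EPS_W):
    """Bounded chooser. items_* are lists of (a, w) pairs."""
    u_in = sum(w * math.atanh(clamp(a)) for a, w in items_in)
    u_out = sum(w * math.atanh(clamp(a)) for a, w in items_out)
    # Assumption: the denominator sums weights over both sides.
    w_sum = sum(w for _, w in items_in) + sum(w for _, w in items_out)
    return math.tanh((u_out - u_in) / max(w_sum, eps_w))


def retry_decision(items_in, items_out, g_t, threshold=0.60):
    """Apply the gate g_t and the example 0.60 threshold from the text."""
    r = rsi(items_in, items_out)
    r_env = g_t * r
    return {"RSI": r, "RSI_env": r_env, "policy_retry": r_env >= threshold}
```

Logging the returned dict next to the band and stamp gives the evidence row named above.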


2) Time-to-First-Correct ↓

Pick earlier by using a bounded chooser instead of ad-hoc thresholds.

T_first_correct := time_of_first_answer_meeting_ground_truth
delta_T := T_baseline - T_with_RSI

Metric: median and 90th-percentile delta_T across tasks (positive delta_T means the lane run reached a correct answer sooner).
Evidence: side-by-side traces; same prompts/models; only lane+bands added.
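
A small sketch of the delta_T roll-up, assuming per-task times are already paired (same task order in both lists) and that the 90th percentile uses the nearest-rank method. The percentile convention is an assumption, since the text does not fix one. Function names are illustrative.

```python
import math
import statistics


def delta_t(t_baseline, t_with_rsi):
    """Per-task delta_T := T_baseline - T_with_RSI (positive = faster)."""
    if len(t_baseline) != len(t_with_rsi):
        raise ValueError("baseline and lane runs must cover the same tasks")
    return [b - r for b, r in zip(t_baseline, t_with_rsi)]


def percentile_nearest_rank(values, p):
    """Nearest-rank percentile for p in (0, 100]; assumed convention."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(t_baseline, t_with_rsi):
    deltas = delta_t(t_baseline, t_with_rsi)
    return {
        "median_delta_T": statistics.median(deltas),
        "p90_delta_T": percentile_nearest_rank(deltas, 90),
    }
```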


3) Over-Confidence Exposure ↑

Reveal fragile selections that look strong in m but sit in low bands.

flag_overconf := (m >= m_thresh) AND (band in {A-, A--})
rate_overconf := COUNT(flag_overconf)/COUNT(all_decisions)

Metric: rate_overconf and its week-over-week change.
Evidence: per-decision (m, a, band) table; reviewers sample flagged cases.
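
The flag and rate above translate directly into a few lines of Python. This sketch assumes each decision is a dict with keys "m" and "band", and that m_thresh is chosen by the operator. The row layout and the function names are illustrative, not part of the source.

```python
LOW_BANDS = {"A-", "A--"}  # the low bands named in the flag definition


def is_overconf(row, m_thresh):
    """flag_overconf := (m >= m_thresh) AND (band in {A-, A--})."""
    return row["m"] >= m_thresh and row["band"] in LOW_BANDS


def overconf_rate(rows, m_thresh):
    """rate_overconf := COUNT(flag_overconf) / COUNT(all_decisions)."""
    rows = list(rows)
    if not rows:
        return 0.0  # assumed convention for an empty day
    flagged = sum(1 for row in rows if is_overconf(row, m_thresh))
    return flagged / len(rows)
```

The flagged rows are the ones reviewers would sample from the per-decision table.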


4) Spend / Token / Latency ↓

Bounded routing reduces waste (fewer retries, fewer tool branches).

cost_saving := spend_baseline - spend_lane
token_saving := tokens_baseline - tokens_lane
latency_saving := p50_ms_baseline - p50_ms_lane

Metric: absolute and % deltas; report per surface (decode, RAG, tools).
Evidence: stamped daily CSVs, with roll-ups computed via U/W (order-invariant).
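
One way to report the absolute and % deltas per surface, sketched under the assumption that baseline and lane totals arrive as plain dicts keyed by surface name. The dict layout and the function names are hypothetical.

```python
def saving(baseline, lane):
    """Return (absolute, percent) saving; percent is NaN if baseline is 0."""
    absolute = baseline - lane
    percent = 100.0 * absolute / baseline if baseline else float("nan")
    return absolute, percent


def savings_by_surface(baseline, lane, metrics=("spend", "tokens", "p50_ms")):
    """baseline/lane: {surface: {metric: value}} with matching keys."""
    report = {}
    for surface, base_row in baseline.items():
        lane_row = lane[surface]
        report[surface] = {
            metric: saving(base_row[metric], lane_row[metric])
            for metric in metrics
        }
    return report
```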


5) Auditability ↑ (Reproducible Decisions)

Same inputs + same manifest ⇒ same outputs; every choice is replayable.

# Order-invariant fuse (carry only U,W)
U += w*atanh(a)
W += w
a_out := tanh( U / max(W, eps_w) )

# One-line stamp (example)
stamp := "SSMCLOCK1|" + iso_utc + "|" + rasi_idx + "|" + theta_deg + "|" + sha256(file) + "|" + chain

Metric: % of decisions with full evidence row (m, a, U, W, RSI, RSI_env, band, stamp).
Evidence: conformance sheet (batch==stream==shuffled; collapse parity checks).
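
The fuse and stamp above can be exercised with a short Python sketch. Assumptions: lanes are clamped just inside (-1,+1) before atanh; rasi_idx, theta_deg and chain are treated as opaque values passed in by the caller, since this page does not define how they are produced; and shuffle parity is checked to a small floating-point tolerance rather than bit-for-bit. Function names are illustrative.

```python
import hashlib
import math


def fuse(pairs, eps_a=1e-6, eps_w=1e-12):
    """Streaming fuse over (a, w) pairs; carries only U and W."""
    U = 0.0
    W = 0.0
    for a, w in pairs:
        a = max(-1.0 + eps_a, min(1.0 - eps_a, a))  # assumed clamp
        U += w * math.atanh(a)
        W += w
    a_out = math.tanh(U / max(W, eps_w))
    return a_out, U, W


def make_stamp(iso_utc, rasi_idx, theta_deg, payload, chain):
    """Build the one-line stamp; payload is the file content as bytes."""
    digest = hashlib.sha256(payload).hexdigest()
    parts = ["SSMCLOCK1", iso_utc, str(rasi_idx), str(theta_deg), digest, chain]
    return "|".join(parts)


def parity_ok(pairs, reordered, tol=1e-12):
    """Conformance check: the same items in another order fuse to the same a_out."""
    return abs(fuse(pairs)[0] - fuse(reordered)[0]) <= tol
```

Because only the sums U and W are carried, a batch run, a stream, and a shuffled replay of the same items should agree up to rounding.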


Starter dashboard (paste-ready fields)

date, surface, decisions, retries, retry_rate, tokens, ms_p50, ms_p90,
m_avg, a_avg, band_counts[A--,A-,A0,A+,A++], RSI_avg, RSI_env_avg,
overconf_count, overconf_rate, spend, knobs_hash, build_id

Read in 30 seconds: fewer retries, earlier first-correct, stable token/latency drop, visible over-confidence, complete stamps.
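
To turn the field list above into a file, a sketch like the following could work. It assumes band_counts is flattened into one column per band (band_A--, band_A-, and so on), which is a layout choice made here and not stated in the source; missing fields are written as empty cells.

```python
import csv
import io

BANDS = ["A--", "A-", "A0", "A+", "A++"]

# Column order follows the starter dashboard field list.
FIELDS = (
    ["date", "surface", "decisions", "retries", "retry_rate",
     "tokens", "ms_p50", "ms_p90", "m_avg", "a_avg"]
    + [f"band_{b}" for b in BANDS]  # assumed flattening of band_counts
    + ["RSI_avg", "RSI_env_avg", "overconf_count", "overconf_rate",
       "spend", "knobs_hash", "build_id"]
)


def dashboard_csv(rows):
    """Render dashboard rows (dicts keyed by FIELDS) as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, restval="")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)  # unknown keys raise ValueError by default
    return buf.getvalue()
```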

One-line takeaway.
Keep m pristine via phi((m,a)) = m. Add a bounded lane and chooser, then measure retries, first-correct time, over-confidence, cost/latency, and audit coverage—with stamped, order-invariant evidence.

