SSM-JTK – Data & Calibration — Train/Test windows & midpoint anchor (2.5)

Train/Test windows (disjoint spans)
Declare two non-overlapping spans and keep the same daily timestamp and sampling mode across both.
train_window = [train_start, train_stop] (e.g., 2020-01-01..2024-12-31)
test_window = [test_start, test_stop] (e.g., 2015-01-01..2019-12-31)

  • No leakage. Do not mix rows across windows.
  • Same clock. Use the same daily timestamp (e.g., 05:30 IST or 00:00 UTC) for all data.
  • Same schedule. If you trained on daily samples, evaluate on daily samples so that comparisons are fair.
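
The window rules above can be sketched in a few lines of Python. This is a minimal illustration, not the project's own tooling: the names TRAIN_WINDOW, TEST_WINDOW, windows_are_disjoint, in_window, and split_rows are hypothetical, the dates are the example spans given above, and rows are assumed to be (date, value) pairs.

```python
from datetime import date

# Hypothetical example spans, copied from the illustrative dates above.
TRAIN_WINDOW = (date(2020, 1, 1), date(2024, 12, 31))
TEST_WINDOW = (date(2015, 1, 1), date(2019, 12, 31))


def windows_are_disjoint(a, b):
    """Return True if the closed date intervals a and b share no day."""
    return a[1] < b[0] or b[1] < a[0]


def in_window(d, window):
    """Return True if date d lies inside the closed interval window."""
    return window[0] <= d <= window[1]


def split_rows(rows, train_window, test_window):
    """Split (date, value) rows into train and test lists.

    Raises ValueError if the windows overlap, so no row can leak
    into both sets. Rows outside both windows are dropped.
    """
    if not windows_are_disjoint(train_window, test_window):
        raise ValueError("train and test windows overlap")
    train = [r for r in rows if in_window(r[0], train_window)]
    test = [r for r in rows if in_window(r[0], test_window)]
    return train, test
```

Checking for overlap before splitting turns the "no leakage" rule into a hard failure rather than a convention.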

Midpoint anchor (numerical stability)
Anchor time at the midpoint of the training span and index all days as offsets from that anchor.
t0 = midpoint(train_start, train_stop)
t = days_since(date, t0)
A simple midpoint is midpoint(a, b) = a + 0.5*(b − a), with the difference b − a measured in days.
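
As a minimal sketch, the two helpers can be written with Python's standard datetime module. The sketch assumes naive datetimes that all carry the same daily timestamp (00:00 here), and uses the example train span from above. The function bodies are one possible reading of midpoint and days_since, not a reference implementation.

```python
from datetime import datetime


def midpoint(a, b):
    """Midpoint of two datetimes, i.e. a + 0.5*(b - a)."""
    return a + (b - a) / 2


def days_since(d, t0):
    """Signed offset of d from the anchor t0, in (possibly fractional) days."""
    return (d - t0).total_seconds() / 86400.0


# Example train span from the text, stamped at 00:00 (an assumption).
train_start = datetime(2020, 1, 1)
train_stop = datetime(2024, 12, 31)

t0 = midpoint(train_start, train_stop)   # 2022-07-02 00:00
t_first = days_since(train_start, t0)    # -913.0
t_last = days_since(train_stop, t0)      # +913.0
```

With this anchor the training offsets run symmetrically from -913 to +913 days.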
Why midpoint?

  • Reduces phase coupling between small harmonics and the linear trend.
  • Mitigates edge bias in OLS when the span is long.
  • Improves conditioning (lower correlation between a0 and n in free-n fits).
  • Keeps t values balanced around zero, which helps both fit and BIC selection.
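
The conditioning point can be checked numerically for the trend part alone. The sketch below is a simplification: it considers only the straight-line model y = a0 + n*t, leaves out the harmonic terms, and assumes equal-variance, uncorrelated errors. Under those assumptions the correlation between the OLS estimates of a0 and n is -sum(t) / sqrt(N * sum(t^2)), which follows from inverting X^T X for a design with columns [1, t]. The function name and the 1827-day span (the example train window, inclusive) are illustrative.

```python
import math


def intercept_slope_correlation(ts):
    """Correlation between the OLS estimates of a0 and n for y = a0 + n*t.

    For design columns [1, t], (X^T X)^-1 is proportional to
    [[sum(t^2), -sum(t)], [-sum(t), N]], so the correlation is
    -sum(t) / sqrt(N * sum(t^2)).
    """
    n_rows = len(ts)
    s1 = sum(ts)
    s2 = sum(t * t for t in ts)
    return -s1 / math.sqrt(n_rows * s2)


span = 1827  # days in the example train window, counted inclusively

# t measured from the first training day.
from_start = [float(i) for i in range(span)]
# t measured from the midpoint of the span.
from_mid = [i - (span - 1) / 2 for i in range(span)]

corr_start = intercept_slope_correlation(from_start)  # roughly -0.87
corr_mid = intercept_slope_correlation(from_mid)      # zero
```

Anchoring at the start of the span leaves a0 and n strongly anti-correlated, while anchoring at the midpoint removes that correlation for evenly spaced days.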

Practical tips

  • Slow outers. Prefer longer benches for training; keep the test window disjoint but representative.
  • Inners & retro loops. Ensure the train window includes at least one clean retro cycle.
  • Freeze t0. When you adjust the train span or sampling, keep the same t0 if you want apples-to-apples comparisons; otherwise re-derive t0 and document it.
  • Reproducibility. Record train_range, test_range, sampling mode, and t0 in your report so anyone can regenerate your metrics.
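
One way to keep the reproducibility items together is to write them out as a small JSON record alongside the report. The sketch below is only an illustration: the function name build_run_record, the key names, and the ISO date strings are hypothetical choices, and the values are the example spans used above, with t0 as the midpoint of that train span at 00:00.

```python
import json


def build_run_record(train_range, test_range, sampling_mode, t0):
    """Serialize the four items needed to regenerate metrics as a JSON string."""
    record = {
        "train_range": list(train_range),
        "test_range": list(test_range),
        "sampling_mode": sampling_mode,
        "t0": t0,
    }
    return json.dumps(record, indent=2, sort_keys=True)


# Illustrative values only; substitute the spans and anchor actually used.
record_json = build_run_record(
    train_range=("2020-01-01", "2024-12-31"),
    test_range=("2015-01-01", "2019-12-31"),
    sampling_mode="daily",
    t0="2022-07-02T00:00:00",
)
```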

Evaluator reminder (for either family)
L_hat_deg(date) = wrap360( a0_deg + n_deg_per_day*t + sum_k[ c_k*sin(w_k*t) + d_k*cos(w_k*t) ] )
with t = days_since(date, t0).
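
A direct transcription of the evaluator into Python might look like the following. It is a sketch under stated assumptions: w_k is taken to be in radians per day and c_k, d_k in degrees (the formula does not state units), harmonics are passed as (w_k, c_k, d_k) tuples, and dates are naive datetimes sharing the daily timestamp of t0. The function names mirror the formula, but the signatures are hypothetical.

```python
import math
from datetime import datetime, timedelta


def wrap360(x):
    """Wrap an angle in degrees into [0, 360), up to floating-point rounding."""
    return x % 360.0


def l_hat_deg(d, t0, a0_deg, n_deg_per_day, harmonics):
    """Evaluate the linear-plus-harmonics model at datetime d.

    harmonics is an iterable of (w_k, c_k, d_k) tuples, with w_k assumed
    to be in radians per day and c_k, d_k in degrees.
    """
    t = (d - t0).total_seconds() / 86400.0  # days since the anchor
    total = a0_deg + n_deg_per_day * t
    for w_k, c_k, d_k in harmonics:
        total += c_k * math.sin(w_k * t) + d_k * math.cos(w_k * t)
    return wrap360(total)


# Illustrative call with made-up coefficients, ten days after the anchor.
anchor = datetime(2022, 7, 2)
example = l_hat_deg(anchor + timedelta(days=10), anchor, 355.0, 1.0, [])
```

The anchor passed here must be the same t0 that was used during fitting; evaluating with a different anchor shifts every term in the sum.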
