Models

Time-series foundation models and statistical baselines evaluated on TS-Arena. Each family entry gives a short description and the specific checkpoints we run.

Foundation Model · Amazon Science · 5 versions
Chronos
The original Chronos scales and quantises time-series values into a fixed vocabulary of tokens and trains transformer language models on them with cross-entropy loss. Two later generations are on the leaderboard: Chronos-Bolt (patch-based encoder-decoder, up to 250× faster than the original Chronos) and Chronos-2 (encoder-only, supports univariate, multivariate, and covariate-informed forecasting in one model).
Chronos-Bolt Tiny, Chronos-Bolt Mini, Chronos-Bolt Small, Chronos-Bolt Base, Chronos-2
Foundation Model · IBM Research · 1 version
FlowState
Tiny TSFM (~2M params) combining a state-space-model encoder with a functional-basis decoder. It operates in a timescale-invariant coefficient space, so a model trained once can be applied at any sampling rate. Reports SOTA on GIFT-ZS and Chronos-ZS at very low compute cost. NeurIPS 2025.
FlowState
Foundation Model · Salesforce AI Research · 4 versions
Moirai
Masked-encoder universal forecasting transformer trained on LOTSA (~27B observations across nine domains). Multiple patch-size projection layers handle frequency diversity; an any-variate attention mechanism handles arbitrary numbers of covariates; a mixture-distribution head models flexible predictive distributions.
Moirai 1.1-R Small, Moirai 1.1-R Base, Moirai 1.1-R Large, Moirai 2.0-R Small
Foundation Model · Auton Lab, CMU · 3 versions
MOMENT
Encoder-only family of open time-series foundation models, pretrained on the Time-Series Pile. Building blocks for forecasting, classification, anomaly detection, and imputation; effective zero-shot and tunable with light task-specific data.
MOMENT-1 Small, MOMENT-1 Base, MOMENT-1 Large
Foundation Model · THUML, Tsinghua University · 1 version
Sundial
Generative TSFM pretrained on TimeBench (~1 trillion time points). Introduces TimeFlow Loss to predict next-patch distributions directly, removing the need for discrete tokenisation and enabling non-deterministic, probabilistic forecasts. ICML 2025 Oral.
Sundial Base 128M
Foundation Model · Prior Labs (Frank Hutter) · 1 version
TabPFN-TS
Treats forecasting as tabular regression: lightweight temporal features (lags, calendar) are fed into the pretrained tabular foundation model TabPFN-v2. It has no time-series-specific pretraining, yet reports SOTA on covariate-informed forecasting in GIFT-Eval at only 11M parameters.
TabPFN-TS
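Recasting a series as a regression table can be sketched as follows. This is an illustration under stated assumptions: `build_table`, the choice of lags 1 and 7, and the two calendar columns are hypothetical, and the actual TabPFN-TS feature set may differ. The resulting rows could be handed to any tabular regressor; no TabPFN API is shown here.

```python
from datetime import date, timedelta


def build_table(timestamps, values, lags=(1, 7)):
    """Turn a univariate series into (features, target) rows for a tabular regressor.

    Each row holds the lagged values followed by two calendar features
    (day of week, month). Rows without a full lag history are skipped.
    """
    max_lag = max(lags)
    features, targets = [], []
    for t in range(max_lag, len(values)):
        row = [values[t - lag] for lag in lags]
        row.append(timestamps[t].weekday())  # 0 = Monday
        row.append(timestamps[t].month)
        features.append(row)
        targets.append(values[t])
    return features, targets


# Ten daily observations starting on Monday 2024-01-01.
start = date(2024, 1, 1)
timestamps = [start + timedelta(days=i) for i in range(10)]
values = [float(i) for i in range(10)]
X, y = build_table(timestamps, values)
```

Once the series is in this shape, forecasting reduces to predicting `y` from `X`, which is why a model with no time-series pretraining can be applied at all.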
Foundation Model · Time-MoE collaboration · 2 versions
Time-MoE
Decoder-only foundation model with a sparse mixture-of-experts FFN. Only a subset of experts is activated per token, enabling billion-scale capacity at modest inference cost. Pretrained on Time-300B (>300B time points across nine domains).
Time-MoE 50M, Time-MoE 200M
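Sparse expert activation can be illustrated with a toy top-k router. This is a generic mixture-of-experts sketch, not Time-MoE's code: the scalar "token", the lambda experts, the gate logits, and the renormalisation over the selected experts are all assumptions made for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_logits, k):
    """Pick the k highest-scoring experts and renormalise their weights to sum to 1."""
    probs = softmax(gate_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}


def sparse_moe(token, gate_logits, experts, k=2):
    """Evaluate only the selected experts and mix their outputs."""
    weights = route_top_k(gate_logits, k)
    output = sum(w * experts[i](token) for i, w in weights.items())
    return output, sorted(weights)


# Four toy experts; only two of them run for this token.
experts = [lambda x: x, lambda x: 2 * x, lambda x: 3 * x, lambda x: 4 * x]
output, active = sparse_moe(1.0, [0.0, 2.0, 2.0, -1.0], experts, k=2)
```

Total capacity grows with the number of experts, while per-token compute grows only with `k`, which is the trade-off the description refers to.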
Foundation Model · Google Research · 2 versions
TimesFM
Decoder-only foundation model pre-trained on ~100B real-world time points. Patched-decoder attention generalises across history lengths, horizons, and frequencies; zero-shot performance closes the gap to fully supervised baselines at a fraction of the parameter count.
TimesFM 2.0 (500M), TimesFM 2.5 (200M)
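Patching, the input step shared by patched-decoder models, can be sketched as below. The helper name `patchify`, the left padding, and the padding mask are illustrative assumptions rather than TimesFM's actual preprocessing, and the patch length of 2 is chosen only to keep the example small.

```python
def patchify(series, patch_len, pad_value=0.0):
    """Split a series into fixed-length patches, left-padding so the newest values stay aligned.

    Returns the patches and a parallel mask (1 = real observation, 0 = padding),
    so a model can ignore padded positions.
    """
    pad = (patch_len - len(series) % patch_len) % patch_len
    padded = [pad_value] * pad + list(series)
    mask = [0] * pad + [1] * len(series)
    patches = [padded[i:i + patch_len] for i in range(0, len(padded), patch_len)]
    masks = [mask[i:i + patch_len] for i in range(0, len(mask), patch_len)]
    return patches, masks


patches, masks = patchify([1.0, 2.0, 3.0, 4.0, 5.0], patch_len=2)
```

Each patch becomes one input token, so a history of any length maps to a sequence of tokens whose count depends only on the history length divided by the patch length.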
Foundation Model · IBM Research / IBM Granite · 4 versions
TinyTimeMixer (TTM)
Compact pretrained model (~1M parameters) built on the TSMixer MLP-mixer architecture. Uses adaptive patching, diverse resolution sampling, and resolution prefix tuning to handle multi-frequency pretraining. Runs on CPU and is reported to beat much larger models in zero/few-shot forecasting. NeurIPS 2024.
TTM R1 (ctx 512 / horizon 96), TTM R1 (ctx 1024 / horizon 96), TTM R2 (ctx 512 / horizon 96), TTM R2 (ctx 1024 / horizon 96)
Foundation Model · NX-AI (Hochreiter group) · 1 version
TiRex
35M-parameter xLSTM-based zero-shot forecaster. Uses Contiguous Patch Masking during training to stabilise long-horizon autoregressive generation. Reports SOTA on GIFT-Eval and Chronos-ZS, outperforming much larger transformer models.
TiRex
Foundation Model · Datadog · 1 version
Toto
Decoder-only multivariate transformer optimised for observability metrics. Pre-trained on a mix of Datadog telemetry, open datasets, and synthetic data, a corpus reported to be 4–10× larger than the pretraining corpora of competing TSFMs. Ships with the BOOM observability benchmark.
Toto Open Base 1.0
Foundation Model · VisionTS authors · 2 versions
VisionTS / VisionTS++
Reframes forecasting as image reconstruction: render the time series as an image and let a visual masked autoencoder (MAE) pre-trained on ImageNet fill in the future. VisionTS++ continues pretraining the vision backbone on large-scale time-series data and adds probabilistic and multi-channel forecasting.
VisionTS++ Base, VisionTS++ Large
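The "series as image" framing can be sketched by folding the history into a 2D grid, one column per period, and appending masked columns for the future. Everything here is a simplified assumption: the helper names are hypothetical, real pipelines also normalise and resize to the vision model's input resolution, and `fill_by_copying_last_column` is only a stand-in for the masked autoencoder.

```python
def series_to_image(context, period, horizon, mask_value=None):
    """Fold a series into a period x columns grid and append masked future columns."""
    usable = (len(context) // period) * period
    trimmed = context[len(context) - usable:]  # keep the most recent whole periods
    n_ctx_cols = usable // period
    n_future_cols = -(-horizon // period)  # ceiling division
    image = []
    for r in range(period):
        row = [trimmed[c * period + r] for c in range(n_ctx_cols)]
        image.append(row + [mask_value] * n_future_cols)
    return image, n_ctx_cols


def fill_by_copying_last_column(image, n_ctx_cols):
    """Stand-in for the MAE: fill each masked cell with the last visible value in its row."""
    filled = [row[:] for row in image]
    for row in filled:
        for c in range(n_ctx_cols, len(row)):
            row[c] = row[n_ctx_cols - 1]
    return filled


def read_forecast(image, n_ctx_cols, horizon):
    """Read the reconstructed future columns back into a 1D forecast."""
    n_cols = len(image[0])
    flat = [image[r][c] for c in range(n_ctx_cols, n_cols) for r in range(len(image))]
    return flat[:horizon]


context = [1, 2, 3, 4, 5, 6, 7, 8]
image, n_ctx_cols = series_to_image(context, period=4, horizon=3)
forecast = read_forecast(fill_by_copying_last_column(image, n_ctx_cols), n_ctx_cols, 3)
```

Folding by period places values from the same seasonal phase on the same row, so seasonal structure appears as horizontal texture that an image model can continue into the masked region.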
Statistical Baseline · Classical forecasting · 4 versions
Statistical Baselines
Reference rule-based baselines that every foundation model should beat. They have no learned parameters; they exist on the leaderboard so that absolute scores have an interpretable floor — if a TSFM cannot outperform Seasonal Naive, something is wrong.
Naive, Seasonal Naive, Simple Moving Average, Seasonal Average
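The four baselines are simple enough to write out in full. The sketch below uses common textbook definitions; the window and season lengths that TS-Arena actually uses are not stated here, so the values in the example are assumptions.

```python
def naive(history, horizon):
    """Repeat the last observed value."""
    return [history[-1]] * horizon


def seasonal_naive(history, horizon, season):
    """Repeat the last full season (requires len(history) >= season)."""
    last_season = history[-season:]
    return [last_season[h % season] for h in range(horizon)]


def simple_moving_average(history, horizon, window):
    """Forecast the mean of the last `window` values at every step."""
    recent = history[-window:]
    return [sum(recent) / len(recent)] * horizon


def seasonal_average(history, horizon, season):
    """Forecast each step as the mean of all past values at the same seasonal phase."""
    n = len(history)
    forecast = []
    for h in range(horizon):
        phase = (n + h) % season
        same_phase = history[phase::season]
        forecast.append(sum(same_phase) / len(same_phase))
    return forecast


# Two seasons of length 4 with an upward shift between them.
history = [1.0, 2.0, 3.0, 4.0, 3.0, 4.0, 5.0, 6.0]
naive_fc = naive(history, 2)
snaive_fc = seasonal_naive(history, 5, season=4)
sma_fc = simple_moving_average(history, 2, window=4)
savg_fc = seasonal_average(history, 4, season=4)
```

None of these functions has anything to fit, which is what makes them a useful floor for reading leaderboard scores.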