TS-Arena

Time-MoE is a family of decoder-only foundation models published as an ICLR 2025 Spotlight. Each transformer block replaces the dense feed-forward layer with a mixture-of-experts router that activates only a few experts per token, so the total parameter count can grow without a matching increase in per-token compute. This is the same idea behind Switch Transformer and Mixtral in language modelling, applied here to autoregressive time-series forecasting with a context length of up to 4096 time steps.
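The decoupling of total parameters from per-token compute can be illustrated with a minimal sketch of generic top-k mixture-of-experts routing. This is not Time-MoE's actual layer: `TopKMoE`, `make_expert`, and the random linear experts are hypothetical, and details of the real model (its exact gating normalisation, any load-balancing loss, any shared expert) are not modelled.

```python
import math
import random
from typing import Callable, List

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(seed: int, dim: int) -> Expert:
    """Hypothetical expert: a fixed random dim x dim linear map followed by ReLU."""
    rng = random.Random(seed)
    w = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(dim)]

    def expert(x: Vector) -> Vector:
        return [max(0.0, sum(wij * xj for wij, xj in zip(row, x))) for row in w]

    return expert


class TopKMoE:
    """Generic top-k softmax-gated mixture-of-experts layer (illustrative only)."""

    def __init__(self, num_experts: int, dim: int, k: int, seed: int = 0) -> None:
        if not 1 <= k <= num_experts:
            raise ValueError("need 1 <= k <= num_experts")
        rng = random.Random(seed)
        self.dim = dim
        self.k = k
        self.experts = [make_expert(seed + i + 1, dim) for i in range(num_experts)]
        # Router: one weight vector per expert, scored by dot product with the token.
        self.router = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_experts)]
        self.calls = 0  # expert evaluations so far: a proxy for compute

    def num_params(self) -> int:
        # Each expert holds dim*dim weights; the router holds dim weights per expert.
        return len(self.experts) * (self.dim * self.dim + self.dim)

    def __call__(self, x: Vector) -> Vector:
        logits = [sum(r * xi for r, xi in zip(row, x)) for row in self.router]
        # Keep only the k highest-scoring experts for this token.
        top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[: self.k]
        gates = softmax([logits[i] for i in top])  # renormalise over the chosen k
        out = [0.0] * self.dim
        for gate, i in zip(gates, top):
            self.calls += 1
            y = self.experts[i](x)
            out = [o + gate * yi for o, yi in zip(out, y)]
        return out


if __name__ == "__main__":
    tokens = [[random.Random(t).uniform(-1.0, 1.0) for _ in range(4)] for t in range(10)]
    for n in (8, 64):
        layer = TopKMoE(num_experts=n, dim=4, k=2)
        for tok in tokens:
            layer(tok)
        # Parameters grow with n; expert calls per token stay at k = 2.
        print(n, layer.num_params(), layer.calls / len(tokens))
```

Running it prints `8 160 2.0` and then `64 1280 2.0`: going from 8 to 64 experts multiplies the parameter count by eight, while each token still runs exactly two experts.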

Pretraining uses Time-300B, a corpus of over 300 billion time points spanning nine domains. The paper scales the model line up to 2.4B parameters; the TS-Arena leaderboard runs the checkpoints with 50M and 200M active parameters.
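Autoregressive forecasting with a bounded context, as described in the overview, can be sketched as a rollout loop in which each prediction is fed back as input and the window keeps only the most recent 4096 points. This is a generic one-step-ahead sketch, not Time-MoE's decoding procedure: `autoregressive_forecast` and the `predict_next` callable are hypothetical stand-ins for a model call, and the released model's own forecasting interface may differ.

```python
from collections import deque
from typing import Callable, List, Sequence

MAX_CONTEXT = 4096  # context length quoted for Time-MoE


def autoregressive_forecast(
    history: Sequence[float],
    horizon: int,
    predict_next: Callable[[Sequence[float]], float],
    max_context: int = MAX_CONTEXT,
) -> List[float]:
    """Roll a one-step-ahead predictor forward for `horizon` steps.

    Each prediction is appended to the context and fed back as input; the
    context keeps only the most recent `max_context` points.
    """
    if horizon < 0 or max_context < 1:
        raise ValueError("need horizon >= 0 and max_context >= 1")
    # deque(maxlen=...) drops the oldest point automatically once full.
    window = deque(history, maxlen=max_context)
    forecasts: List[float] = []
    for _ in range(horizon):
        y = predict_next(window)
        forecasts.append(y)
        window.append(y)
    return forecasts


if __name__ == "__main__":
    # Toy stand-in for a model: extrapolate the last linear step.
    def linear_trend(window: Sequence[float]) -> float:
        return 2 * window[-1] - window[-2]

    print(autoregressive_forecast([1.0, 2.0, 3.0], horizon=3, predict_next=linear_trend))
    # prints [4.0, 5.0, 6.0]
```

Because predictions are fed back, errors can compound over long horizons; the context cap bounds the input the predictor sees no matter how long the history or rollout is.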

Versions on TS-Arena

Each version below corresponds to one registered model id on the leaderboard; its detail page shows per-model rankings, forecasts, and history.

Time-MoE 50M

time-moe-50m

50M params…

Smallest variant by active parameters; pretrained on Time-300B.

Time-MoE 200M

time-moe-200m

200M params…

Larger variant by active parameters, with the same architecture and pretraining data.