TinyTimeMixer (TTM)
Compact pretrained model (~1M parameters) built on the TSMixer MLP-mixer architecture. Uses adaptive patching, diverse resolution sampling, and resolution prefix tuning to handle multi-frequency pretraining. Runs on CPU and is reported to outperform much larger models in zero/few-shot forecasting. NeurIPS 2024.
TinyTimeMixer (TTM) is IBM's answer to the “do we really need a billion parameters?” question. The backbone is the all-MLP TSMixer design — interleaved feature- and patch-mixing blocks with gated attention — and the released checkpoints sit at roughly one million parameters each.
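To make the "all-MLP" idea concrete, here is a minimal pure-Python sketch of one mixer block: an MLP mixes along the patch axis, a second MLP mixes along the feature axis, and each is followed by a softmax gate and a residual connection. It is a toy under stated assumptions, not the released TTM code: the helper names (`gated_mlp`, `mixer_block`, `init_mixer`), the sizes, and the random weights are all hypothetical, and normalization layers are left out for brevity.

```python
import math
import random


def linear(x, weights, bias):
    """Affine map y = W x + b for one vector x; `weights` is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def gelu(x):
    """Tanh approximation of the GELU activation, applied element-wise."""
    c = math.sqrt(2.0 / math.pi)
    return [0.5 * v * (1.0 + math.tanh(c * (v + 0.044715 * v ** 3))) for v in x]


def softmax(x):
    peak = max(x)
    exps = [math.exp(v - peak) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def init_mixer(size, hidden, rng):
    """Random toy parameters (hypothetical): expand, project-back, and gate layers."""
    def layer(n_in, n_out):
        w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        return w, [0.0] * n_out
    return layer(size, hidden), layer(hidden, size), layer(size, size)


def gated_mlp(x, params):
    """Two-layer MLP, softmax gate on its output, then a residual connection."""
    (w1, b1), (w2, b2), (wg, bg) = params
    h = linear(gelu(linear(x, w1, b1)), w2, b2)
    gate = softmax(linear(h, wg, bg))
    return [xi + hi * gi for xi, hi, gi in zip(x, h, gate)]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def mixer_block(patches, patch_params, feature_params):
    """`patches` is a num_patches x d_model matrix stored as a list of rows."""
    # Patch mixing: every feature column is mixed along the patch axis.
    columns = [gated_mlp(col, patch_params) for col in transpose(patches)]
    # Feature mixing: every patch embedding is mixed along the feature axis.
    return [gated_mlp(row, feature_params) for row in transpose(columns)]


rng = random.Random(0)
num_patches, d_model = 8, 4
x = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(num_patches)]
y = mixer_block(x, init_mixer(num_patches, 16, rng), init_mixer(d_model, 16, rng))
print(len(y), len(y[0]))  # prints: 8 4
```

The block keeps the input shape, which is what lets such blocks be stacked; there is no attention over pairs of time steps, only small dense layers applied along one axis at a time.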
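Three pretraining techniques let a single TTM checkpoint handle many frequencies: adaptive patching (different levels of the backbone operate on different patch lengths and patch counts), diverse resolution sampling (training data is augmented with downsampled copies of high-resolution series), and resolution prefix tuning (a learned embedding of the sampling resolution is prepended as an extra patch). The result is a CPU-friendly model whose authors report 4–40% improvements over existing zero/few-shot baselines, many of them much larger.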
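The data-side mechanics behind these three ideas can be sketched in a few lines of plain Python. This is an illustration only: the function names are hypothetical, averaging is assumed as the downsampling rule, the prefix table stands in for a learned embedding, and in the real model the re-patching happens on hidden representations rather than on raw values.

```python
def patchify(series, patch_len):
    """Split a series into non-overlapping patches of equal length."""
    if len(series) % patch_len != 0:
        raise ValueError("series length must be a multiple of patch_len")
    return [series[i:i + patch_len] for i in range(0, len(series), patch_len)]


def adaptive_patch_views(series, patch_lens):
    """Re-patch the same window at several patch lengths, one view per level."""
    return {p: patchify(series, p) for p in patch_lens}


def downsample(series, factor):
    """Average each run of `factor` points to emulate a coarser sampling rate."""
    usable = len(series) - len(series) % factor  # drop an incomplete tail
    return [sum(series[i:i + factor]) / factor for i in range(0, usable, factor)]


def with_resolution_prefix(patches, resolution, prefix_table):
    """Prepend the vector registered for `resolution` as an extra first patch."""
    return [prefix_table[resolution]] + patches


series = [float(t) for t in range(16)]

# Adaptive patching: the same 16 points seen as 8, 4, or 2 patches.
views = adaptive_patch_views(series, patch_lens=(2, 4, 8))
print({p: len(v) for p, v in views.items()})  # {2: 8, 4: 4, 8: 2}

# Diverse resolution sampling: derive a 4x coarser series from the original.
coarse = downsample(series, 4)
print(coarse)  # [1.5, 5.5, 9.5, 13.5]

# Resolution prefix: tag the patch sequence with its sampling resolution.
prefix_table = {"fine": [0.0] * 4, "coarse": [1.0] * 4}  # stand-in for learned embeddings
tagged = with_resolution_prefix(patchify(series, 4), "fine", prefix_table)
print(len(tagged))  # 5
```

The point of the prefix is that two windows with identical shapes but different sampling rates no longer look the same to the model.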
The four versions on the leaderboard differ in pretraining generation (R1 vs R2) and in context length (512 or 1024 input steps). All four produce a fixed 96-step forecast horizon.
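Because context and horizon are fixed per checkpoint, a caller has to shape its data to match. A minimal sketch of that bookkeeping follows; the helper names are hypothetical, and raising on a too-short history is an assumption (padding would be a separate design decision).

```python
def fit_context(history, context_len):
    """Return the most recent `context_len` observations of `history`."""
    if len(history) < context_len:
        raise ValueError(f"need at least {context_len} points, got {len(history)}")
    return history[-context_len:]


def trim_forecast(forecast, steps_needed, horizon=96):
    """Keep the first `steps_needed` steps of a fixed-length forecast."""
    if not 0 < steps_needed <= horizon:
        raise ValueError(f"steps_needed must be between 1 and {horizon}")
    return forecast[:steps_needed]


history = list(range(600))
window = fit_context(history, 512)
print(len(window), window[0], window[-1])  # 512 88 599

dummy_forecast = [0.0] * 96                # placeholder for a 96-step model output
print(len(trim_forecast(dummy_forecast, 24)))  # 24
```

A 1024-step checkpoint would use `fit_context(history, 1024)` instead and would reject the 600-point history above.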
Versions on TS-Arena
Each version below corresponds to one registered model id in the leaderboard. Click through to its detail page for per-model rankings, forecasts, and history.
- TTM R1 (ctx 512 / horizon 96): tinytimemixer-r1-512-96, 1M params…
First-generation checkpoint, 512-step context, 96-step horizon.
- TTM R1 (ctx 1024 / horizon 96): tinytimemixer-r1-1024-96, 1M params…
First-generation checkpoint, longer 1024-step context.
- TTM R2 (ctx 512 / horizon 96): tinytimemixer-r2-512-96, 1M params…
Refreshed pretraining (R2), 512-step context.
- TTM R2 (ctx 1024 / horizon 96): tinytimemixer-r2-1024-96, 1M params…
Refreshed pretraining (R2), 1024-step context.
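The four model ids encode generation, context length, and horizon in a regular pattern, so they can be unpacked programmatically. The sketch below assumes the ids are tinytimemixer-r1-512-96, tinytimemixer-r1-1024-96, tinytimemixer-r2-512-96, and tinytimemixer-r2-1024-96; the `TTMVersion` class and `parse_model_id` helper are hypothetical names, not part of any TS-Arena API.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TTMVersion:
    generation: int    # 1 for R1, 2 for R2
    context_len: int   # input steps
    horizon: int       # output steps


_ID_PATTERN = re.compile(r"^tinytimemixer-r(\d+)-(\d+)-(\d+)$")


def parse_model_id(model_id):
    """Split an id such as 'tinytimemixer-r2-512-96' into its three fields."""
    match = _ID_PATTERN.match(model_id)
    if match is None:
        raise ValueError(f"unrecognized TTM model id: {model_id!r}")
    generation, context_len, horizon = (int(group) for group in match.groups())
    return TTMVersion(generation, context_len, horizon)


model_ids = [
    "tinytimemixer-r1-512-96",
    "tinytimemixer-r1-1024-96",
    "tinytimemixer-r2-512-96",
    "tinytimemixer-r2-1024-96",
]
for model_id in model_ids:
    print(model_id, parse_model_id(model_id))
```

Grouping the parsed versions by `context_len` gives a quick way to compare R1 against R2 at the same input length.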