Foundation Model · IBM Research / IBM Granite

TinyTimeMixer (TTM)

Compact pretrained model (~1M parameters) built on the TSMixer MLP-mixer architecture. Uses adaptive patching, diverse resolution sampling, and resolution prefix tuning to handle multi-frequency pretraining. Runs on CPU and is reported to outperform much larger models in zero- and few-shot forecasting. Published at NeurIPS 2024.

TinyTimeMixer (TTM) is IBM's answer to the “do we really need a billion parameters?” question. The backbone is the all-MLP TSMixer design — interleaved feature- and patch-mixing blocks with gated attention — and the released checkpoints sit at roughly one million parameters each.

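Three techniques let a single TTM checkpoint handle many frequencies and horizons: adaptive patching (different backbone layers operate at different patch lengths), diverse resolution sampling (pretraining series are resampled to several resolutions), and resolution prefix tuning (the input resolution is encoded as a learned prefix). The result is a CPU-friendly model whose authors report 4–40% accuracy improvements over much larger baselines in zero- and few-shot forecasting.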
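As a rough illustration of two of these ideas, the sketch below downsamples a series by block averaging (one plausible way to produce a lower-resolution view for diverse resolution sampling) and then cuts the result into non-overlapping patches. It is not the TTM implementation: the function names `downsample_mean` and `split_into_patches`, the averaging scheme, and the patch length are all assumptions chosen for the example.

```python
from typing import List


def downsample_mean(series: List[float], factor: int) -> List[float]:
    """Average each block of `factor` consecutive points.

    A ragged tail shorter than `factor` is dropped (an assumption of this sketch).
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    usable = len(series) - len(series) % factor
    return [sum(series[i:i + factor]) / factor for i in range(0, usable, factor)]


def split_into_patches(series: List[float], patch_len: int) -> List[List[float]]:
    """Cut a series into non-overlapping patches of length `patch_len`."""
    if patch_len < 1 or len(series) % patch_len != 0:
        raise ValueError("series length must be a positive multiple of patch_len")
    return [series[i:i + patch_len] for i in range(0, len(series), patch_len)]


# Hypothetical hourly series of 48 points, viewed at a 4-hourly resolution.
hourly = [float(i) for i in range(48)]
four_hourly = downsample_mean(hourly, 4)       # 12 points
patches = split_into_patches(four_hourly, 4)   # 3 patches of 4 points each
```

The same series therefore yields several training views, one per resolution, and each view becomes a sequence of patches for the mixer backbone.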

The four versions on the leaderboard differ in pretraining generation (R1 vs R2) and in their fixed context length (512 or 1024 input steps). All four forecast a fixed 96-step horizon.
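Because the context and horizon are fixed, any series has to be fitted to those shapes before a checkpoint can be used. The sketch below shows one generic way to do that. The helpers `make_context` and `rolling_steps` are hypothetical, left-padding with a mask is an assumption, and nothing here describes how TS-Arena or the TTM release actually prepares inputs.

```python
import math
from typing import List, Tuple


def make_context(
    history: List[float],
    context_len: int = 512,
    pad_value: float = 0.0,
) -> Tuple[List[float], List[int]]:
    """Return (window, mask) of exactly `context_len` entries.

    Keeps the most recent points; shorter histories are left-padded.
    mask is 1 for observed points and 0 for padding.
    """
    if context_len < 1:
        raise ValueError("context_len must be >= 1")
    tail = history[-context_len:]
    n_pad = context_len - len(tail)
    window = [pad_value] * n_pad + tail
    mask = [0] * n_pad + [1] * len(tail)
    return window, mask


def rolling_steps(target_horizon: int, horizon: int = 96) -> int:
    """Number of fixed-horizon forecasts needed to cover `target_horizon` steps
    if forecasts are rolled forward one block at a time."""
    if target_horizon < 1 or horizon < 1:
        raise ValueError("horizons must be >= 1")
    return math.ceil(target_horizon / horizon)


window, mask = make_context([1.0, 2.0, 3.0], context_len=5)
```

A history longer than the context is simply truncated to its most recent 512 (or 1024) points, and a 200-step target would need three rolled 96-step forecasts under this scheme.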

Versions on TS-Arena

Each version below corresponds to one registered model id in the leaderboard. Click through to its detail page for per-model rankings, forecasts, and history.

  • TTM R1 (ctx 512 / horizon 96)
    tinytimemixer-r1-512-96
    1M params

    First-generation checkpoint, 512-step context, 96-step horizon.

  • TTM R1 (ctx 1024 / horizon 96)
    tinytimemixer-r1-1024-96
    1M params

    First-generation checkpoint, longer 1024-step context.

  • TTM R2 (ctx 512 / horizon 96)
    tinytimemixer-r2-512-96
    1M params

    Refreshed pretraining (R2), 512-step context.

  • TTM R2 (ctx 1024 / horizon 96)
    tinytimemixer-r2-1024-96
    1M params

    Refreshed pretraining (R2), 1024-step context.
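The four model ids above follow a visible `tinytimemixer-<generation>-<context>-<horizon>` pattern. A small parser like the following can recover those fields when comparing versions programmatically. The `TTMVariant` class and `parse_model_id` function are hypothetical helpers, and the pattern is inferred only from the ids listed here.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TTMVariant:
    generation: str   # e.g. "R1" or "R2"
    context_len: int  # input steps
    horizon: int      # output steps


# Pattern inferred from the ids on this page; other ids may not follow it.
_ID_PATTERN = re.compile(r"^tinytimemixer-(r\d+)-(\d+)-(\d+)$")


def parse_model_id(model_id: str) -> TTMVariant:
    """Split a leaderboard model id into generation, context and horizon."""
    match = _ID_PATTERN.match(model_id)
    if match is None:
        raise ValueError(f"unrecognised model id: {model_id!r}")
    generation, context_len, horizon = match.groups()
    return TTMVariant(generation.upper(), int(context_len), int(horizon))


variants = [
    parse_model_id(model_id)
    for model_id in (
        "tinytimemixer-r1-512-96",
        "tinytimemixer-r1-1024-96",
        "tinytimemixer-r2-512-96",
        "tinytimemixer-r2-1024-96",
    )
]
```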