Foundation Model · VisionTS authors

VisionTS / VisionTS++

Reframes forecasting as image reconstruction: render the time series as an image and let a visual masked autoencoder (MAE) pretrained on ImageNet fill in the future. VisionTS++ continues pretraining the vision backbone on large-scale time-series data and adds probabilistic and multi-channel forecasting.

The original VisionTS is the most architecturally surprising entry on the leaderboard: it doesn't train a time-series model at all. The history is rendered as a 2-D image, the model is a plain visual masked autoencoder (MAE) pretrained on ImageNet, and forecasting is performed by asking the MAE to reconstruct the masked future region of the image. It uses no time-series-specific pretraining and no fine-tuning, yet it delivers competitive zero-shot accuracy.
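The render-mask-reconstruct loop described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' code: the function names (`render_as_image`, `seasonal_naive_fill`, `read_forecast`) are hypothetical, folding the series by a known `period` and z-score normalization are assumptions about how the image is built, and a seasonal-naive copy stands in for the pretrained MAE, which is not loaded here.

```python
import numpy as np

EPS = 1e-8  # guards against division by zero for constant series


def render_as_image(history, period, horizon):
    """Fold a 1-D history into a 2-D grid and append a masked future region.

    Assumption: each image column holds one full period of the series.
    """
    history = np.asarray(history, dtype=float)
    n_hist_cols = len(history) // period
    if n_hist_cols == 0:
        raise ValueError("history must cover at least one full period")
    # Keep only the most recent whole periods.
    trimmed = history[-n_hist_cols * period:]
    mean, std = trimmed.mean(), trimmed.std()
    scaled = (trimmed - mean) / (std + EPS)
    # Rows index position within a period, columns index successive periods.
    hist_grid = scaled.reshape(n_hist_cols, period).T
    # Enough extra columns to cover the forecast horizon (ceiling division).
    n_future_cols = -(-horizon // period)
    future_grid = np.zeros((period, n_future_cols))
    image = np.concatenate([hist_grid, future_grid], axis=1)
    mask = np.zeros(image.shape, dtype=bool)
    mask[:, n_hist_cols:] = True  # True marks pixels the model must fill in
    return image, mask, (mean, std)


def seasonal_naive_fill(image, mask):
    """Stand-in for the MAE: copy the last visible column into masked columns."""
    out = image.copy()
    last_visible = image[:, ~mask[0]][:, -1]
    out[:, mask[0]] = last_visible[:, None]
    return out


def read_forecast(reconstructed, mask, stats, horizon):
    """Unfold the masked columns back into a 1-D forecast on the original scale."""
    mean, std = stats
    future = reconstructed[:, mask[0]].T.reshape(-1)  # column by column = time order
    return future[:horizon] * (std + EPS) + mean


history = np.arange(48.0)  # two periods of toy data
image, mask, stats = render_as_image(history, period=24, horizon=30)
reconstructed = seasonal_naive_fill(image, mask)
forecast = read_forecast(reconstructed, mask, stats, horizon=30)
```

In a real pipeline the grid would also be resized to the input resolution the vision backbone expects, and `seasonal_naive_fill` would be replaced by a forward pass through the masked autoencoder.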

VisionTS++ is the natural follow-up and the version actually evaluated here. It takes the ImageNet-pretrained vision backbone and continues pretraining it on large-scale time-series data to close the distribution gap between natural images and rendered time series, and it adds probabilistic outputs and native multi-channel forecasting. TS-Arena runs the Base (86M) and Large (307M) VisionTS++ checkpoints.

Versions on TS-Arena

Each version below corresponds to one registered model id in the leaderboard. Click through to its detail page for per-model rankings, forecasts, and history.

  • VisionTS++ Base
    visiontspp-base
    86M params

    Continually pretrained ViT-B vision backbone.

  • VisionTS++ Large
    visiontspp-large
    307M params

    Continually pretrained ViT-L vision backbone.