VisionTS / VisionTS++
Reframes forecasting as image reconstruction: render the time series as an image and let a visual masked autoencoder (MAE) pretrained on ImageNet fill in the future. VisionTS++ continues pretraining the vision backbone on large-scale time-series data and adds probabilistic and multi-channel forecasting.
VisionTS is the most architecturally surprising model family on the leaderboard: the original model does not train a time-series model at all. The history is rendered as a 2-D image, the model is a plain visual MAE pretrained on ImageNet, and the forecast is obtained by asking the MAE to reconstruct the masked future region of that image. There is no time-series-specific pretraining and no fine-tuning, yet it still delivers competitive zero-shot accuracy.
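The rendering step can be sketched as follows. This is a minimal NumPy illustration, not VisionTS code. It assumes the history is folded by a known period P so that each image column holds one cycle, uses simple instance normalization, and stops before the resize to the backbone's input resolution (typically 224×224 for ImageNet ViT checkpoints) that a real pipeline would need. The helpers series_to_image and image_to_forecast are hypothetical names.

```python
import numpy as np

# Assumption: period-folding layout and instance normalization are illustrative
# choices; they are not taken from the VisionTS source.

def series_to_image(context, period, horizon):
    """Fold a 1-D history into a (period x columns) grayscale canvas.

    Each column holds one period, so time runs top-to-bottom within a column
    and left-to-right across columns. Extra columns on the right are reserved
    for the forecast horizon and flagged in a boolean mask.
    """
    context = np.asarray(context, dtype=float)
    # Left-pad with the first value so the history fills whole periods.
    pad = (-len(context)) % period
    padded = np.concatenate([np.full(pad, context[0]), context])
    # Instance normalization keeps pixel values in a comparable range.
    mean, std = padded.mean(), padded.std() + 1e-8
    normed = (padded - mean) / std
    n_ctx_cols = len(padded) // period
    n_future_cols = -(-horizon // period)  # ceil(horizon / period)
    # order="F" fills column by column, i.e. one period per column.
    ctx = normed.reshape(period, n_ctx_cols, order="F")
    canvas = np.zeros((period, n_ctx_cols + n_future_cols))
    canvas[:, :n_ctx_cols] = ctx
    mask = np.zeros(canvas.shape, dtype=bool)
    mask[:, n_ctx_cols:] = True  # region the MAE would be asked to reconstruct
    return canvas, mask, (mean, std, n_ctx_cols)


def image_to_forecast(canvas, stats, period, horizon):
    """Unfold the (reconstructed) future columns back into a 1-D forecast."""
    mean, std, n_ctx_cols = stats
    # Column-major flattening undoes the period fold.
    future = canvas[:, n_ctx_cols:].flatten(order="F")
    return future[:horizon] * std + mean


t = np.arange(96)
history = np.sin(2 * np.pi * t / 24)
canvas, mask, stats = series_to_image(history, period=24, horizon=30)
print(canvas.shape, mask.sum())  # (24, 6) 48
```

Because the future columns sit on the right edge, forecasting becomes an inpainting task: the MAE sees the unmasked context columns and predicts pixel values for the masked ones, which image_to_forecast unfolds back into a series.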
VisionTS++ is the natural follow-up and the version actually evaluated here: it takes the ImageNet-pretrained vision backbone, continually pretrains it on large-scale time-series data to close the distribution gap between natural images and rendered time series, and adds probabilistic outputs and native multi-channel forecasting. TS-Arena runs the Base (86M) and Large (307M) VisionTS++ checkpoints.
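Probabilistic forecasts are often expressed as a set of quantiles and trained or scored with the pinball (quantile) loss. The page does not say which objective VisionTS++ optimizes, so the sketch below is a generic illustration of quantile forecasting, not the model's training code; pinball_loss is a hypothetical helper name.

```python
import numpy as np

def pinball_loss(y, q_preds, quantiles):
    """Mean pinball (quantile) loss over all quantile levels and time steps.

    y:         (horizon,) observed values
    q_preds:   (n_quantiles, horizon) predicted quantiles
    quantiles: (n_quantiles,) quantile levels in (0, 1)
    Generic illustration; not claimed to be VisionTS++'s objective.
    """
    y = np.asarray(y, dtype=float)
    q_preds = np.asarray(q_preds, dtype=float)
    qs = np.asarray(quantiles, dtype=float)[:, None]
    diff = y[None, :] - q_preds
    # Under-prediction costs q per unit, over-prediction costs (1 - q) per unit.
    loss = np.maximum(qs * diff, (qs - 1.0) * diff)
    return loss.mean()


y = np.array([1.0, 2.0, 3.0])
q_levels = [0.1, 0.5, 0.9]
preds = np.array([[0.5, 1.5, 2.5],   # 0.1 quantile sits below the truth
                  [1.0, 2.0, 3.0],   # median matches exactly
                  [1.5, 2.5, 3.5]])  # 0.9 quantile sits above the truth
print(round(pinball_loss(y, preds, q_levels), 4))  # 0.0333
```

The asymmetric weights push the 0.9 quantile up and the 0.1 quantile down, so a well-calibrated model's 0.1 to 0.9 band should cover roughly 80% of outcomes.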
Versions on TS-Arena
Each version below corresponds to one registered model id on the leaderboard; its detail page has per-model rankings, forecasts, and history.
- VisionTS++ Base: model id visiontspp-base, 86M params…
Continually pretrained ViT-B vision backbone.
- VisionTS++ Large: model id visiontspp-large, 307M params…
Continually pretrained ViT-L vision backbone.