TimesFM: Google's Zero-Shot Prophet for Numbers

TL;DR

What it solves: Per-dataset forecasting pipelines that take weeks to train, validate, and deploy before you can answer “what happens next quarter?”
Why it matters: Most teams need a decent baseline forecast today, not after a model-selection sprint - bad baselines poison inventory planning, staffing, and budget conversations downstream
Best for: Data analysts, ML engineers, and product teams who have historical numbers and need point forecasts plus uncertainty bands without standing up a custom training job
Main differentiator: Zero-shot inference on unseen series - pretrained on 100 billion real-world time-points, no fine-tuning required for a first pass
Use case example: Feed 100 days of daily sales into model.forecast(horizon=12) and get the next 12 days’ trajectory plus 10th–90th percentile bands in one call

My manager asked for next quarter’s demand forecast on Tuesday. By Wednesday I had a Prophet notebook, an ARIMA baseline, and a growing suspicion that the two models disagreed because I had spent more time tuning than thinking.

Thursday I pip-installed TimesFM, passed it the same CSV, and had numbers before lunch. They were not perfect. They were consistent, fast, and honest about uncertainty. For a Tuesday ask, that was the whole game.

What TimesFM Actually Is

TimesFM is a Python package from Google Research built around a single pretrained decoder-only transformer - the same architectural family as large language models, but the tokens are patches of numbers, not words. You install it with pip install timesfm[torch] (or timesfm[flax] for JAX), load a checkpoint from Hugging Face, and call .forecast() on raw NumPy arrays.

One sentence a junior developer could repeat: TimesFM takes historical time-series values and predicts future values without training on your specific dataset first.

The latest checkpoint is TimesFM 2.5: 200M parameters (down from 500M in 2.0), up to 16k context length, and optional continuous quantile heads that output uncertainty bands up to a 1,000-step horizon. The model was pretrained on roughly 100 billion real-world time-points - Google Trends, Wikipedia pageviews, and other public corpora - so it has seen seasonal retail curves, traffic spikes, and slow decays before your spreadsheet ever reaches it.

The Mental Model

Think of two weather forecasters. The first one needs six months of local radar history before she will say anything about tomorrow. The second one has read weather patterns from every continent for a decade; you hand her last week’s temperatures and she gives you a forecast with error bars.

TimesFM is the second forecaster. Not omniscient. Not magic. But she does not make you wait for a training pipeline before she has an opinion.

Historical values  →  patch into tokens  →  decoder-only transformer  →  future patches
     (your CSV)         (groups of points)      (200M params, pretrained)     (point + quantiles)

The patch trick matters. Instead of predicting one timestep at a time (slow, error-prone over long horizons), TimesFM predicts whole chunks - an input patch of 32 points might forecast the next 128. Fewer generation steps, less error accumulation. Same idea that makes LLMs practical, applied to numbers.
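
To make the step-count argument concrete, here is a minimal sketch of the arithmetic. The patch sizes (32 in, 128 out) are the illustrative figures from the paragraph above. `split_into_patches` and `decode_steps` are hypothetical helpers, not part of the timesfm API, and the left zero-padding is just one simple way to make a history divide evenly into patches.

```python
import math


def split_into_patches(history, patch_len=32):
    """Group a 1D history into fixed-size patches, zero-padding on the left."""
    pad = (-len(history)) % patch_len
    padded = [0.0] * pad + list(history)
    return [padded[i:i + patch_len] for i in range(0, len(padded), patch_len)]


def decode_steps(horizon, output_patch_len):
    """Number of generation steps needed to cover `horizon` future points."""
    return math.ceil(horizon / output_patch_len)


history = [float(i) for i in range(100)]   # 100 observed points
patches = split_into_patches(history)      # padded to 128 points, then grouped

steps_pointwise = decode_steps(256, output_patch_len=1)    # one point per step
steps_patched = decode_steps(256, output_patch_len=128)    # one chunk per step
print(len(patches), steps_pointwise, steps_patched)
```

Under these assumed sizes, a 256-step horizon takes 2 generation steps instead of 256, which is where the reduced error accumulation comes from.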

The Smallest Working Example

import numpy as np
import timesfm

# Load pretrained TimesFM 2.5 and compile with your horizon settings
model = timesfm.TimesFM_2p5_200M_torch.from_pretrained("google/timesfm-2.5-200m-pytorch")
model.compile(timesfm.ForecastConfig(max_context=1024, max_horizon=256))

# Pass any 1D array of historical values; get point forecast + quantile bands
point, quantiles = model.forecast(
    horizon=12,
    inputs=[np.linspace(0, 1, 100)],  # 100 days of history → 12 steps ahead
)
# point.shape == (1, 12)
# quantiles.shape == (1, 12, 10)   -  mean plus 10th through 90th percentiles

Twelve lines. No train/val split. No hyperparameter grid. The quantile head is the part most baselines skip and most planners actually need.
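
The layout noted in the comments (index 0 is the mean, indices 1 through 9 are the 10th through 90th percentiles) can be unpacked into a planner-friendly band roughly as follows. `extract_band` is a hypothetical helper, and `fake_quantiles` is a hand-made stand-in with the same shape, not real model output.

```python
def extract_band(quantiles, series_idx=0, lower_idx=1, upper_idx=9):
    """Return (lower, median, upper) lists for one series.

    Assumes the last axis is [mean, q10, q20, ..., q90], as described in the
    example's comments; index 5 is therefore the 50th percentile.
    """
    steps = quantiles[series_idx]
    lower = [float(step[lower_idx]) for step in steps]
    median = [float(step[5]) for step in steps]
    upper = [float(step[upper_idx]) for step in steps]
    return lower, median, upper


# Stand-in with shape (1 series, 3 steps, 10 values); not real model output.
fake_quantiles = [[
    [100.0] + [100.0 + 5.0 * (q - 5) for q in range(1, 10)]
    for _ in range(3)
]]

lower, median, upper = extract_band(fake_quantiles)
for step, (lo, mid, hi) in enumerate(zip(lower, median, upper), start=1):
    print(f"step {step}: {mid:.1f} (80% band {lo:.1f} to {hi:.1f})")
```

The helper uses plain nested indexing, so the `quantiles` array returned by `model.forecast` can be passed in directly.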

💡 Tip: Call model.compile() once with your max context and horizon caps. Size max_horizon to your planning window so you do not recompile on every .forecast() call.

Where Google Already Ships It

This is not a research toy sitting in a notebook. Google has wired TimesFM into products people already touch:

Surface	What you get
BigQuery ML	SQL-level forecasting at warehouse scale
Google Sheets	Forecast button in connected spreadsheets
Vertex Model Garden	Dockerized endpoint for agentic pipelines

Retail demand planning is the use case Google keeps citing: even small improvements in forecast accuracy reduce inventory cost and lift revenue. If your ops team lives in Sheets and your data team lives in BigQuery, the same model bridges both without anyone retraining from scratch.

The open-source repo on GitHub (21k+ stars, Python) is the escape hatch for everyone else - local notebooks, custom pipelines, fine-tuning with LoRA via Hugging Face Transformers + PEFT if zero-shot is not enough.

Old Way vs. TimesFM

	Classic pipeline (ARIMA / Prophet / custom DL)	TimesFM zero-shot
Setup	Feature engineering, train/val split, hyperparameter search	`pip install`, load checkpoint, call `.forecast()`
Time to first forecast	Hours to weeks	Minutes
Works on new series	Retrain or transfer carefully	Out of the box
Uncertainty bands	Model-dependent, often bolted on	Built into quantile head
When it wins	Domain-specific patterns, rich covariates	Fast baseline, many heterogeneous series, cold-start problems

When Not to Use It

TimesFM is honest about what it is not.

Multivariate series with exotic covariates. Version 2.5 brought back covariate support through XReg, but if your forecast depends on fifty external regressors with complex interactions, a purpose-built model trained on your domain will still win.

Sub-hour granularity on noisy IoT streams. The pretraining corpus skews toward daily/weekly/monthly patterns - retail, traffic, search interest. High-frequency sensor data with microsecond jitter is not its home turf.

When “good enough” is not good enough. Zero-shot gets you close to supervised deep learning on public benchmarks (per the ICML 2024 paper), but “close” is not “best” on every dataset. If forecast error directly moves millions in inventory, you will still want to fine-tune or ensemble.

GPU memory on huge batches. The 200M model is small by LLM standards, but batching thousands of long-context series locally still wants a real accelerator. BigQuery ML exists partly because your laptop is not a warehouse.
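
One way to keep local memory bounded is to trim each series to the compiled context length and forecast in fixed-size batches. The sketch below assumes a `forecast_fn(horizon=..., inputs=...)` callable with the same call shape as `model.forecast` in the example earlier. `forecast_in_batches`, its `batch_size` default, and the naive stand-in forecaster are hypothetical, and a sensible batch size depends on your hardware.

```python
def forecast_in_batches(forecast_fn, series_list, horizon,
                        batch_size=32, max_context=1024):
    """Forecast many series in chunks so only `batch_size` are in flight at once."""
    points = []
    for start in range(0, len(series_list), batch_size):
        chunk = series_list[start:start + batch_size]
        # Keep only the most recent `max_context` values of each series.
        batch = [list(s)[-max_context:] for s in chunk]
        batch_points, _quantiles = forecast_fn(horizon=horizon, inputs=batch)
        points.extend(list(row) for row in batch_points)
    return points


def naive_forecast(horizon, inputs):
    """Stand-in for model.forecast: repeats each series' last value."""
    point = [[s[-1]] * horizon for s in inputs]
    return point, None


series = [[float(i + j) for j in range(50)] for i in range(70)]  # 70 short series
out = forecast_in_batches(naive_forecast, series, horizon=12, batch_size=32)
print(len(out), len(out[0]), out[0][0])
```

Swapping `naive_forecast` for `model.forecast` is the intended use; this version drops the quantile output for brevity, so collect it alongside `points` if you need the bands.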

The README also notes plainly: this open repo is not an officially supported Google product. Production SLAs live in Vertex and BigQuery, not in a GitHub issue thread.

⚠️ Warning: Treat the GitHub repo as a research escape hatch, not a governed endpoint. If forecast error moves inventory dollars, validate zero-shot output in a notebook first, then graduate to BigQuery ML or Vertex when you need audit trails and SLAs.
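
The notebook validation mentioned in the warning can start as a simple holdout check: hide the last few points, forecast them, then measure point error and how often the actuals land inside the 10th to 90th percentile band. This is a minimal sketch, not a full backtest. `holdout_check` is hypothetical, `forecast_fn` is assumed to follow the `model.forecast(horizon=..., inputs=...)` call shape and quantile layout from the example earlier, and the stand-in forecaster exists only so the snippet runs without the model.

```python
def holdout_check(forecast_fn, series, horizon):
    """Score a forecaster on the last `horizon` points of one series.

    Returns (mean absolute error, fraction of actuals inside the q10-q90 band).
    """
    history, actual = list(series[:-horizon]), list(series[-horizon:])
    point, quantiles = forecast_fn(horizon=horizon, inputs=[history])
    preds, bands = point[0], quantiles[0]

    mae = sum(abs(p - a) for p, a in zip(preds, actual)) / horizon
    # Assumed layout per step: [mean, q10, q20, ..., q90].
    inside = sum(1 for step, a in zip(bands, actual) if step[1] <= a <= step[9])
    return mae, inside / horizon


def flat_forecast(horizon, inputs):
    """Stand-in forecaster: last value as point, fixed band of +/-2 around it."""
    point, quantiles = [], []
    for s in inputs:
        last = s[-1]
        point.append([last] * horizon)
        quantiles.append([[last] + [last - 2.0 + 0.5 * i for i in range(9)]
                          for _ in range(horizon)])
    return point, quantiles


series = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]
mae, coverage = holdout_check(flat_forecast, series, horizon=4)
print(f"MAE={mae:.2f}, band coverage={coverage:.0%}")
```

Across many series, a well-calibrated 10th to 90th percentile band should contain roughly 80% of held-out actuals. Coverage far below that suggests the bands are too narrow for your data.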

The Part That Surprised Me

I expected another forecasting library with a clever architecture diagram and a benchmark table. What I did not expect was the agent skill - timesfm-forecasting/SKILL.md in the repo - written so an AI coding agent can call the model correctly without hallucinating API shapes. Google Research repos rarely ship with that. It tells me they expect people to wire this into automated pipelines, not just academic ablations.

Fine-tuning examples landed in April 2026 (LoRA via Hugging Face + PEFT). Flax builds for faster inference. Unit tests in tests/. The project is moving like a product, not a paper artifact.

Closing

My manager still got a forecast on Friday. Two of them, actually - Prophet and TimesFM, side by side. We used TimesFM’s quantile bands for the planning meeting and kept Prophet as a sanity check.

The second forecaster did not replace the first. She just made Tuesday’s question answerable by Thursday, which is most of what forecasting in the real world actually needs.

google-research/timesfm · Apache-2.0 · 22k · blog