TimesFM: Google's Zero-Shot Prophet for Numbers


TL;DR

  • What it solves: Per-dataset forecasting pipelines that take weeks to train, validate, and deploy before you can answer “what happens next quarter?”
  • Why it matters: Most teams need a decent baseline forecast today, not after a model-selection sprint - bad baselines poison inventory planning, staffing, and budget conversations downstream
  • Best for: Data analysts, ML engineers, and product teams who have historical numbers and need point + uncertainty bands without standing up a custom training job
  • Main differentiator: Zero-shot inference on unseen series - pretrained on 100 billion real-world time-points, no fine-tuning required for a first pass
  • Use case example: Pass 100 days of daily sales to model.forecast(horizon=12, inputs=[...]) and get the next 12 days’ trajectory plus 10th–90th percentile bands in one call

My manager asked for next quarter’s demand forecast on Tuesday. By Wednesday I had a Prophet notebook, an ARIMA baseline, and a growing suspicion that the two models disagreed because I had spent more time tuning than thinking.

Thursday I pip-installed TimesFM, passed it the same CSV, and had numbers before lunch. They were not perfect. They were consistent, fast, and honest about uncertainty. For a Tuesday ask, that was the whole game.

What TimesFM Actually Is

TimesFM is a Python package from Google Research built around a single pretrained decoder-only transformer - the same architectural family as large language models, but the tokens are patches of numbers, not words. You install it with pip install timesfm[torch] (or timesfm[flax] for JAX), load a checkpoint from Hugging Face, and call .forecast() on raw NumPy arrays.

One sentence a junior developer could repeat: TimesFM takes historical time-series values and predicts future values without training on your specific dataset first.

The latest checkpoint is TimesFM 2.5: 200M parameters (down from 500M in 2.0), up to 16k context length, and optional continuous quantile heads that output uncertainty bands up to a 1,000-step horizon. The model was pretrained on roughly 100 billion real-world time-points - Google Trends, Wikipedia pageviews, and other public corpora - so it has seen seasonal retail curves, traffic spikes, and slow decays before your spreadsheet ever reaches it.

The Mental Model

Think of two weather forecasters. The first one needs six months of local radar history before she will say anything about tomorrow. The second one has read weather patterns from every continent for a decade; you hand her last week’s temperatures and she gives you a forecast with error bars.

TimesFM is the second forecaster. Not omniscient. Not magic. But she does not make you wait for a training pipeline before she has an opinion.

Historical values  →  patch into tokens  →  decoder-only transformer  →  future patches
     (your CSV)         (groups of points)      (200M params, pretrained)     (point + quantiles)

The patch trick matters. Instead of predicting one timestep at a time (slow, and error-prone over long horizons), TimesFM predicts whole chunks: an input patch of 32 points might forecast the next 128. Fewer generation steps means less error accumulation. It is the token-by-token decoding LLMs use, except each token here is a patch of numbers and each output step covers many timesteps at once.
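To make the arithmetic concrete, here is a minimal sketch of patching in plain NumPy. It is not TimesFM's internal code: the helper names (to_patches, decode_steps) are hypothetical, the zero left-padding is a simplification (a real model would typically mask padded positions), and the 32/128 patch sizes are simply the figures quoted above.

```python
import math

import numpy as np

INPUT_PATCH = 32    # points per input token (figure quoted in the text)
OUTPUT_PATCH = 128  # points emitted per decoding step (figure quoted in the text)


def to_patches(series: np.ndarray, patch_len: int = INPUT_PATCH) -> np.ndarray:
    """Left-pad with zeros so the length divides evenly, then reshape.

    Returns an array of shape (n_patches, patch_len). Zero padding is an
    illustrative simplification, not necessarily what TimesFM does.
    """
    pad = (-len(series)) % patch_len
    padded = np.concatenate([np.zeros(pad), series])
    return padded.reshape(-1, patch_len)


def decode_steps(horizon: int, output_patch: int = OUTPUT_PATCH) -> int:
    """Number of generation steps needed to cover `horizon` points."""
    return math.ceil(horizon / output_patch)


history = np.arange(100, dtype=float)   # 100 historical points
patches = to_patches(history)
print(patches.shape)                    # (4, 32): 100 points become 4 tokens
print(decode_steps(256))                # 2 steps, versus 256 one-point steps
```

A 256-step horizon needs two passes under these assumptions, where a one-point-at-a-time decoder would need 256, each feeding its own error into the next.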

The Smallest Working Example

import numpy as np
import timesfm

# Load pretrained TimesFM 2.5 and compile with your horizon settings
model = timesfm.TimesFM_2p5_200M_torch.from_pretrained("google/timesfm-2.5-200m-pytorch")
model.compile(timesfm.ForecastConfig(max_context=1024, max_horizon=256))

# Pass any 1D array of historical values; get point forecast + quantile bands
point, quantiles = model.forecast(
    horizon=12,
    inputs=[np.linspace(0, 1, 100)],  # 100 days of history → 12 steps ahead
)
# point.shape == (1, 12)
# quantiles.shape == (1, 12, 10)   -  mean plus 10th through 90th percentiles

Twelve lines. No train/val split. No hyperparameter grid. The quantile head is the part most baselines skip and most planners actually need.
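If you want named bands rather than a raw array, a small helper is enough. The sketch below assumes the layout described in the shape comment above (index 0 is the mean, indices 1 through 9 are the 10th through 90th percentiles). The function name split_bands is hypothetical, and a synthetic array stands in for the model output so the snippet runs without downloading a checkpoint.

```python
import numpy as np


def split_bands(quantiles: np.ndarray) -> dict:
    """Split a (n_series, horizon, 10) array into named bands.

    Assumed layout: index 0 is the mean, indices 1..9 are the
    10th..90th percentiles.
    """
    if quantiles.ndim != 3 or quantiles.shape[-1] != 10:
        raise ValueError(f"expected (n_series, horizon, 10), got {quantiles.shape}")
    return {
        "mean": quantiles[..., 0],
        "p10": quantiles[..., 1],
        "p50": quantiles[..., 5],
        "p90": quantiles[..., 9],
    }


# Synthetic stand-in for the model output: every step holds the values 0..9.
fake_quantiles = np.tile(np.arange(10, dtype=float), (1, 12, 1))  # (1, 12, 10)

bands = split_bands(fake_quantiles)
width = bands["p90"] - bands["p10"]   # per-step width of the 80% band
print(width.shape)                    # (1, 12)
```

With real output you would pass the quantiles array returned by model.forecast() instead of fake_quantiles. The p10-to-p90 width is a quick way to see where the model is least sure of itself.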

💡 Tip: Call model.compile() once with your max context and horizon caps. Size max_horizon to your planning window so you do not recompile on every .forecast() call.

Where Google Already Ships It

This is not a research toy sitting in a notebook. Google has wired TimesFM into products people already touch:

Surface              | What you get
BigQuery ML          | SQL-level forecasting at warehouse scale
Google Sheets        | Forecast button in connected spreadsheets
Vertex Model Garden  | Dockerized endpoint for agentic pipelines

Retail demand planning is the use case Google keeps citing: even small improvements in forecast accuracy reduce inventory cost and lift revenue. If your ops team lives in Sheets and your data team lives in BigQuery, the same model bridges both without anyone retraining from scratch.

The open-source repo on GitHub (21k+ stars, Python) is the escape hatch for everyone else - local notebooks, custom pipelines, fine-tuning with LoRA via HuggingFace Transformers + PEFT if zero-shot is not enough.

Old Way vs. TimesFM

                        | Classic pipeline (ARIMA / Prophet / custom DL)               | TimesFM zero-shot
Setup                   | Feature engineering, train/val split, hyperparameter search | pip install, load checkpoint, call .forecast()
Time to first forecast  | Hours to weeks                                              | Minutes
Works on new series     | Retrain or transfer carefully                               | Out of the box
Uncertainty bands       | Model-dependent, often bolted on                            | Built into quantile head
When it wins            | Domain-specific patterns, rich covariates                   | Fast baseline, many heterogeneous series, cold-start problems

When Not to Use It

TimesFM is honest about what it is not.

Multivariate series with exotic covariates. Version 2.5 brought back covariate support through XReg, but if your forecast depends on fifty external regressors with complex interactions, a purpose-built model trained on your domain will still win.

Sub-hour granularity on noisy IoT streams. The pretraining corpus skews toward daily/weekly/monthly patterns - retail, traffic, search interest. High-frequency sensor data with microsecond jitter is not its home turf.

When “good enough” is not good enough. Zero-shot gets you close to supervised deep learning on public benchmarks (per the ICML 2024 paper), but “close” is not “best” on every dataset. If forecast error directly moves millions in inventory, you will still want to fine-tune or ensemble.
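One cheap way to find out whether "close" is close enough on your data is a holdout check against a dumb baseline. The sketch below is model-agnostic and uses only NumPy. The helper names (seasonal_naive, holdout_mae) and the synthetic weekly series are illustrative assumptions, not part of TimesFM.

```python
import numpy as np


def seasonal_naive(history: np.ndarray, horizon: int, season: int = 7) -> np.ndarray:
    """Repeat the last full season of history out to the horizon."""
    last_season = history[-season:]
    reps = -(-horizon // season)  # ceiling division
    return np.tile(last_season, reps)[:horizon]


def mae(actual, predicted) -> float:
    """Mean absolute error between two equal-length sequences."""
    return float(np.mean(np.abs(np.asarray(actual) - np.asarray(predicted))))


def holdout_mae(series: np.ndarray, horizon: int, forecast_fn) -> float:
    """Hide the last `horizon` points, forecast them, and score the result."""
    train, test = series[:-horizon], series[-horizon:]
    return mae(test, forecast_fn(train, horizon))


# Synthetic weekly pattern with a mild upward trend, standing in for real sales.
t = np.arange(140)
series = 100 + 0.1 * t + 10 * np.sin(2 * np.pi * t / 7)

baseline_error = holdout_mae(series, horizon=14, forecast_fn=seasonal_naive)
print(f"seasonal-naive MAE: {baseline_error:.2f}")
```

Any forecaster that maps (history, horizon) to an array of length horizon can be passed as forecast_fn. Going by the shapes in the example earlier, a TimesFM wrapper would look something like lambda train, h: model.forecast(horizon=h, inputs=[train])[0][0]. If zero-shot does not clearly beat the seasonal-naive number, that is your cue to fine-tune or ensemble.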

GPU memory on huge batches. The 200M model is small by LLM standards, but batching thousands of long-context series locally still wants a real accelerator. BigQuery ML exists partly because your laptop is not a warehouse.
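If you do run locally, feeding the model in fixed-size chunks keeps peak memory bounded. This is a rough sketch under stated assumptions: forecast_fn is a hypothetical callable returning an array of shape (len(batch), horizon), the batch size of 32 is arbitrary, and a last-value forecaster stands in for the model so the snippet runs anywhere.

```python
from typing import Callable, Iterator, List, Sequence

import numpy as np


def chunked(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def forecast_in_batches(
    series_list: List[np.ndarray],
    horizon: int,
    forecast_fn: Callable[[List[np.ndarray], int], np.ndarray],
    batch_size: int = 32,
) -> np.ndarray:
    """Run `forecast_fn` one batch at a time and stack the point forecasts."""
    outputs = [
        np.asarray(forecast_fn(list(batch), horizon))
        for batch in chunked(series_list, batch_size)
    ]
    return np.concatenate(outputs, axis=0)


def last_value_forecast(batch: List[np.ndarray], horizon: int) -> np.ndarray:
    """Stand-in forecaster: repeat each series' final value."""
    return np.stack([np.full(horizon, s[-1]) for s in batch])


# 100 series of varying length, processed 32 at a time.
series_list = [np.arange(n, dtype=float) for n in range(50, 150)]
points = forecast_in_batches(series_list, horizon=12, forecast_fn=last_value_forecast)
print(points.shape)  # (100, 12)
```

Swapping in the real model means replacing last_value_forecast with a function that calls model.forecast() and returns the point array. The right batch size depends on your accelerator and context length, so treat 32 as a starting guess.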

The README also notes plainly: this open repo is not an officially supported Google product. Production SLAs live in Vertex and BigQuery, not in a GitHub issue thread.

⚠️ Warning: Treat the GitHub repo as a research escape hatch, not a governed endpoint. If forecast error moves inventory dollars, validate zero-shot output in a notebook first, then graduate to BigQuery ML or Vertex when you need audit trails and SLAs.

The Part That Surprised Me

I expected another forecasting library with a clever architecture diagram and a benchmark table. What I did not expect was the agent skill - timesfm-forecasting/SKILL.md in the repo - written so an AI coding agent can call the model correctly without hallucinating API shapes. Google Research repos rarely ship with that. It tells me they expect people to wire this into automated pipelines, not just academic ablations.

Fine-tuning examples landed in April 2026 (LoRA via HuggingFace + PEFT). Flax builds for faster inference. Unit tests in tests/. The project is moving like a product, not a paper artifact.

Closing

My manager still got a forecast on Friday. Two of them, actually - Prophet and TimesFM, side by side. We used TimesFM’s quantile bands for the planning meeting and kept Prophet as a sanity check.

The second forecaster did not replace the first. She just made Tuesday’s question answerable by Thursday, which is most of what forecasting in the real world actually needs.


google-research/timesfm · Apache-2.0 · 22k · blog

Hoang Yell

A software developer and technical storyteller. I spend my time exploring the most interesting open-source repositories on GitHub and presenting them as accessible stories for everyone.