TOKEN FORWARDS

BUY TOKENS FORWARD.
SECURE YOUR CAPACITY.

Procure inference tokens in advance and tap them over a term of up to six months. Lock unit economics, commit capacity terms, and secure supply ahead of demand across leading open-weight models.

HOW IT WORKS

COMMIT. LOCK. TAP.

01

COMMIT

Specify the open model, token volume (input / cached / output), term up to six months, and whether batch processing is acceptable. We aggregate committed-use quotes across the provider network.
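As a rough illustration, the inputs to a commitment request could be captured in a small record like the one below. The class name, field names, and validation rules are hypothetical and do not describe a Compute Exchange API; only the fields themselves (model, input / cached / output volume, a term of up to six months, batch acceptability) come from the description above.

```python
from dataclasses import dataclass

MAX_TERM_MONTHS = 6  # "term up to six months" per the text above


@dataclass(frozen=True)
class CommitmentRequest:
    """Hypothetical shape of a commitment request (illustrative only)."""
    model: str            # open model name, e.g. "Kimi-K2.6"
    input_tokens: int     # uncached input volume over the term
    cached_tokens: int    # cached (reused-context) input volume
    output_tokens: int    # generated output volume
    term_months: int      # 1..6
    batch_ok: bool = False  # whether non-realtime batch processing is acceptable

    def __post_init__(self):
        if not 1 <= self.term_months <= MAX_TERM_MONTHS:
            raise ValueError(f"term must be 1-{MAX_TERM_MONTHS} months")
        if min(self.input_tokens, self.cached_tokens, self.output_tokens) < 0:
            raise ValueError("token volumes must be non-negative")


request = CommitmentRequest(
    model="Kimi-K2.6",
    input_tokens=2_000_000_000,
    cached_tokens=500_000_000,
    output_tokens=300_000_000,
    term_months=6,
    batch_ok=True,
)
```

The volumes in the usage example are placeholders, not a suggested commitment size.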

02

LOCK

Lock a committed per-token rate for the full term. Settlement (upfront, milestone, or monthly), priority allocation, and rollover terms surface per quote so you can compare bilaterally.

03

TAP

Tap tokens against your committed balance over the term, against your real demand curve. Usage reconciles per the agreed settlement schedule.
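A minimal sketch of how a buyer might track taps against a committed balance on their own side. Everything here is an assumption for illustration: the class and method names are hypothetical, the balance is a plain unit count, and overdraw is simply rejected. Rollover, expiry, and settlement terms are set per quote and are not modeled.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class CommittedBalance:
    """Hypothetical buyer-side ledger for a committed balance."""
    committed: Decimal
    tapped: Decimal = Decimal("0")
    history: list = field(default_factory=list)

    @property
    def remaining(self) -> Decimal:
        return self.committed - self.tapped

    def tap(self, amount, note: str = "") -> Decimal:
        """Draw `amount` units from the balance and return what is left."""
        amount = Decimal(amount)  # pass ints or strings to avoid float rounding
        if amount <= 0:
            raise ValueError("tap amount must be positive")
        if amount > self.remaining:
            raise ValueError("tap exceeds remaining committed balance")
        self.tapped += amount
        self.history.append((note, amount))
        return self.remaining


balance = CommittedBalance(committed=Decimal("1000000"))
balance.tap(250000, note="month 1 usage")
balance.tap(400000, note="month 2 usage")
print(balance.remaining)  # units still available for the rest of the term
```

Recording each tap with a note keeps a simple trail that can be compared with the provider's figures at each settlement point.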

CONTRACT SPECIFICATION

STANDARDIZED TOKEN UNITS

Compute Exchange standardizes inference commitments as Standardized Token Units (STUs). Buyers commit to a chosen STU volume; providers fulfill against the published methodology.

INPUT TOKEN (UNCACHED)          1.00 STU
CACHED INPUT (REUSED CONTEXT)   0.20 STU
OUTPUT TOKEN (GENERATED)        4.21 STU
BATCH MODE (NON-REALTIME)       0.50 × DRAW

Per-STU pricing varies by model family. Providers absorb the basis between the published STU methodology and their underlying per-token economics; buyers hold a single fungible commitment denominated in STUs.
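The published weights can be applied to a workload with a few lines of arithmetic. The sketch below uses the weights stated above (1.00, 0.20, 4.21) and assumes the 0.50 batch multiplier applies to the whole draw of a request, which is how "0.50 × DRAW" reads; the function name and signature are hypothetical.

```python
from decimal import Decimal

# Weights from the published STU methodology.
STU_WEIGHTS = {
    "input": Decimal("1.00"),   # uncached input token
    "cached": Decimal("0.20"),  # cached input (reused context)
    "output": Decimal("4.21"),  # generated output token
}
BATCH_MULTIPLIER = Decimal("0.50")  # assumed to scale the total draw


def stu_draw(input_tokens: int, cached_tokens: int, output_tokens: int,
             batch: bool = False) -> Decimal:
    """Return the STUs drawn for a request or workload (illustrative)."""
    draw = (
        input_tokens * STU_WEIGHTS["input"]
        + cached_tokens * STU_WEIGHTS["cached"]
        + output_tokens * STU_WEIGHTS["output"]
    )
    if batch:
        draw *= BATCH_MULTIPLIER
    return draw


realtime = stu_draw(1000, 500, 200)
batched = stu_draw(1000, 500, 200, batch=True)
print(realtime, batched)
```

For 1,000 uncached input tokens, 500 cached tokens, and 200 output tokens, this works out to 1,942 STUs in real time (1,000 + 100 + 842), or 971 STUs in batch mode.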

PRINCIPLES

FORWARD VS ON-DEMAND

BUDGET CERTAINTY

Lock per-token unit economics for the full term. Forecast inference spend with confidence across the commitment.

PRIORITY ALLOCATION

Quotes surface each provider's priority and reservation terms for committed balances during demand spikes.

SUPPLY SECURITY

Secure token supply ahead of anticipated demand growth or open-model availability constraints.

FLEXIBLE TAP

Tap your committed balance against real usage over the term, with rollover terms surfaced per quote.

COVERAGE

OPEN MODELS ONLY

LARGE OPEN-WEIGHT

Flagship open models — Llama, DeepSeek, Qwen class — served across the provider network at committed volume.

SMALL & EFFICIENT

Distilled and small open models for high-volume, latency-sensitive, or cost-optimized inference.

MULTIMODAL & VISION

Open vision, speech, and multimodal models for document, image, and audio inference workloads.

EMBEDDING & SPECIALIZED

Open embedding, reranking, and classification models priced per standard inference billing units.

CALIBRATION REFERENCE

CONVERSIONS BY MODEL

The published STU methodology (1.0 · 0.2 · 4.21 · 0.5×) is one calibration anchored on the Kimi K2 line. Other open models have different native input:output economics. Below: how 1 input, cached, and output token convert to STUs under each model's native ratio.

HOW STU PRICING WORKS · Conversion to STU is fixed per model; the per-STU price floats by provider based on hardware and underlying economics. Providers absorb the basis between native ratios and the published index at quote time.

IMPORTANT · Per-STU pricing varies materially by model family and provider. Quotes are indicative; final terms are confirmed bilaterally with the matched provider.

MODEL                                 PROVIDER       INPUT  CACHED  OUTPUT
Kimi-K2.6 [VISION]                    Moonshot AI    1.00   0.20    4.21
gpt-oss-120b [TEXT]                   OpenAI         1.00   varies  4.00
Nemotron-3-Nano-Omni [TEXT]           NVIDIA         1.00   varies  4.00
MiniMax-M2.5 [TEXT]                   Minimax        1.00   varies  4.00
GLM-5.2 [TEXT]                        Z.ai           1.00   varies  3.14
GLM-5.1 [TEXT]                        Z.ai           1.00   varies  3.14
Hermes-4-70B [TEXT]                   Nous Research  1.00   varies  3.08
Nemotron-3-Ultra-550b-a55b [TEXT]     NVIDIA         1.00   varies  3.00
Cosmos3-Super-Reasoner [VISION]       NVIDIA         1.00   varies  3.00
Nemotron-3-Super-120b-a12b [TEXT]     NVIDIA         1.00   varies  3.00
Hermes-4-405B [TEXT]                  Nous Research  1.00   varies  3.00
Qwen3-235B-A22B-Instruct-2507 [TEXT]  Qwen           1.00   varies  3.00
DeepSeek-V4-Pro [TEXT]                DeepSeek       1.00   varies  2.00
MiniCPM-V-4.5 [VISION]                OpenBMB        1.00   varies  1.69
Qwen3.5-397B-A17B [TEXT]              Qwen           1.00   varies  6.00

Native ratios are derived from active open-model provider catalogs. The cached-input ratio is shown only where the provider publishes one; most non-Kimi endpoints don't currently expose cached pricing publicly, so the cell reads "varies." Quotes price against the published 4.21× index; providers absorb the basis between native ratios and the index.
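To get a feel for the size of the gap between a model's native ratio and the published index, the same workload can be weighted both ways. This is only one possible reading: the page does not define how the basis is calculated, so the function below (a hypothetical name) simply reports index-weighted units minus native-weighted units. Cached input is left out because most native cached ratios are unpublished. The output ratios are copied from the table above.

```python
from decimal import Decimal

INDEX_OUTPUT_RATIO = Decimal("4.21")  # published index

# Native output ratios copied from the conversion table (subset).
NATIVE_OUTPUT_RATIO = {
    "Kimi-K2.6": Decimal("4.21"),
    "gpt-oss-120b": Decimal("4.00"),
    "GLM-5.2": Decimal("3.14"),
    "DeepSeek-V4-Pro": Decimal("2.00"),
    "Qwen3.5-397B-A17B": Decimal("6.00"),
}


def index_minus_native(model: str, input_tokens: int,
                       output_tokens: int) -> Decimal:
    """Index-weighted units minus native-weighted units (illustrative)."""
    native = input_tokens + output_tokens * NATIVE_OUTPUT_RATIO[model]
    index = input_tokens + output_tokens * INDEX_OUTPUT_RATIO
    return index - native


for name in NATIVE_OUTPUT_RATIO:
    print(name, index_minus_native(name, 1000, 100))
```

For 1,000 input and 100 output tokens, the gap is zero for Kimi-K2.6 (the calibration anchor), positive for models with a native output ratio below 4.21 (221 units for DeepSeek-V4-Pro), and negative for Qwen3.5-397B-A17B (minus 179 units), whose native ratio is above the index.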

Frequently Asked Questions

TOKEN FORWARDS, EXPLAINED

What is a token forward?

What term lengths are available?

What happens if I do not use all my committed tokens?

Which models can I procure tokens for?

How is this different from Reserved GPU Rental?

How are token forwards settled?

SECURE YOUR TOKEN SUPPLY

Submit a commitment request and Compute Exchange returns a token forward quote across the verified open-model provider network.

DISCLAIMER: Token forward quotes are aggregated from verified third-party inference providers serving open-weight models. Compute Exchange facilitates introductions but does not operate inference infrastructure, take ownership of token balances, or guarantee model availability or SLA. All commitment terms — including pricing, tap, rollover, and settlement — are negotiated directly between buyer and provider.

COMPUTE EXCHANGE

The transparent GPU marketplace for AI infrastructure. Built for builders.

ALL SYSTEMS OPERATIONAL

© 2026 COMPUTE EXCHANGE

BUILT FOR THE AI ERA
