MARKETS

PROVIDERS

About

RESOURCES

MODEL FAMILY · GLM

FLAGSHIP OPEN-WEIGHT
BILINGUAL REASONING.

GLM is Z.ai's open-weight family of large language models, led by GLM-5.2 — a 753B-parameter multimodal flagship with 1M-token context, FP8 inference, native tool calling, code generation, explicit reasoning, and vision inputs. Procure GLM inference through Compute Exchange Token Forwards or Reserved GPU Rental.

REQUEST GLM QUOTE

FAMILY LINEUP

GLM-5 AND BEYOND

The GLM-5 line is the current flagship; GLM-4.7-Flash covers latency-sensitive workloads; GLM-OCR specializes in document and vision extraction. All ship under MIT license as open-weight releases from Z.ai.

GLM-5.2

FLAGSHIP · CURRENT

Z.ai's latest flagship multimodal model — strong bilingual (Chinese-English) reasoning, long-context understanding (up to 1M tokens), vision inputs, advanced tool use, code generation, and agent-oriented behavior.

PARAMS

753B

CONTEXT

1M

QUANT

FP8

MODALITY

MULTIMODAL

TOOL CALLING

REASONING

CODE GEN

VISION

RESPONSES API

LICENSE: MIT

GLM-5.1

PREVIOUS FLAGSHIP

Previous-generation flagship in the GLM-5 line — strong bilingual reasoning, long-horizon agentic tasks (sustains thousands of tool calls), and SWE-Bench Pro state-of-the-art code generation.

PARAMS

754B

CONTEXT

200K

QUANT

FP8

MODALITY

TEXT-TO-TEXT

TOOL CALLING

REASONING

CODE GEN

LICENSE: MIT

GLM-4.7-Flash

FAST / LIGHTWEIGHT MOE

30B-parameter MoE with 3B active per token — preserved thinking mode for multi-turn agentic tasks, with speculative decoding and multi-token prediction for low-latency, high-throughput inference.

PARAMS

30B - A3B

CONTEXT

200K

QUANT

BF16

MODALITY

TEXT-TO-TEXT

TOOL CALLING

REASONING

LOW LATENCY

LICENSE: MIT

GLM-OCR

VISION / OCR SPECIALIST

CogViT visual encoder + GLM-0.5B language decoder for OCR, document parsing, formula and table recognition. #1 on OmniDocBench V1.5 (94.62). *PP-DocLayoutV3 sub-component under Apache 2.0.

PARAMS

0.9B

CONTEXT

128K

QUANT

-

MODALITY

IMAGE-TEXT-TO-TEXT

VISION

OCR

DOC PARSING

TABLES

LICENSE: MIT

CAPABILITIES

WHAT GLM DOES WELL

BILINGUAL REASONING

Strong Chinese-English reasoning across long-form text, code, and structured tasks. Competitive with Western flagships on English benchmarks and class-leading on Chinese.

1M LONG CONTEXT

Up to 1M tokens on GLM-5.2 (with sparse-attention IndexShare reducing per-token FLOPs ~2.9× at full context) — absorb full legal filings, codebases, or research corpora without chunking. Persistent agent state for multi-turn loops.

NATIVE TOOL USE

First-class function calling, structured outputs, and Responses API support. Agent-oriented architecture handles multi-step plans and tool composition.

EXPLICIT REASONING

Reasoning mode surfaces chain-of-thought scratchpads for complex math, code, and analytical tasks. Tunable depth at the API boundary.

MULTIMODAL INPUTS

GLM-5.2 accepts image and text inputs natively. Pair with the specialized GLM-OCR (0.9B, CogViT + GLM-0.5B; #1 on OmniDocBench V1.5) for high-volume document extraction pipelines.

FP8 EFFICIENCY

Native FP8 quantization keeps cost-per-token competitive and inference fast on H100-class hardware across the open-weight provider network.

WHERE GLM FITS

REPRESENTATIVE WORKLOADS

BILINGUAL ENTREPRISE ASSISTANTS

Customer-facing or internal assistants serving Chinese-English markets with consistent reasoning quality across both languages.

LONG DOCUMENT ANALYSIS

Legal filings, financial disclosures, technical specifications — 432K context absorbs full documents without retrieval chunking.

AGENTIC WORKFLOWS

Tool-calling backbone for multi-step agents — research loops, code generation pipelines, structured action sequences.

RAG WITH REDUCED RETRIEVAL

Long-context tolerance lets you pack more context per query and reduce the brittleness of retrieval recall.

MULTIMODAL DOCUMENT PIPELINES

GLM-OCR for visual extraction → GLM-5.2 for reasoning over extracted content. End-to-end open-weight document understanding.

OPEN-WEIGHT PRODUCTION INFERENCE

MIT-licensed alternative to closed flagships, with deployable weights for sovereign and on-prem buyers.

PROCUREMENT

HOW TO ACCESS GLM

Two procurement paths through Compute Exchange. Choose by who you want operating the model.

TOKEN FORWARDS

COMMITTED INFERENCE

Lock GLM inference capacity in advance, denominated in Standardized Token Units. Provider operates the model; you tap tokens against a committed balance over terms up to six months.

Provider operates and scales GLM endpoint
Per-STU rate locked at commitment
Realtime, batch, or mixed latency
Quotes against the published STU index

RESERVED GPU RENTAL

RUN YOUR OWN

Reserve H100-class capacity from the neocloud network and deploy GLM weights yourself. Full operational control — sovereign data, custom serving stack, fine-tuned variants.

MIT-licensed weights — deploy anywhere
Custom serving stack (vLLM, SGLang, TensorRT-LLM)
Sovereign / on-prem / air-gapped deployments
Terms from 1 month to 24+ months

Frequently Asked Questions

GLM, EXPLAINED

Who builds GLM?

What is the difference between GLM-5.2 and GLM-5.1?

Is GLM open-weight?

How does GLM compare to Western flagship models?

How do I procure GLM inference through Compute Exchange?

How does GLM map to the STU methodology?

PROCURE GLM CAPACITY.

Submit a commitment request and Compute Exchange returns GLM inference quotes across the verified open-model provider network.

REQUEST GLM QUOTE

DISCLAIMER

GLM is an open-weight model family developed by Z.ai (Zhipu AI), released under MIT license. Compute Exchange facilitates quotes from verified third-party inference providers serving the GLM family and does not operate inference infrastructure or guarantee model availability, performance, or SLA. All commitment terms — including pricing, tap, rollover, and settlement — are negotiated directly between buyer and provider.

COMPUTE

EXCHANGE

The transparent GPU marketplace for AI infrastructure. Built for builders.

ALL SYSTEMS OPERATIONAL

GPUs

RESERVED GPUs

REFURBISHED GPUs

USED GPUs

HARDWARE MARKET

INFORMATION

Providers

ABOUT

BLOG

EVENTS

LEGAL

Marketplace Terms

Compute Service Terms

E-sign Disclosure

Fees

Referal Agreement

Ask AI for a summary of Compute Exchange

TWITTER

BUILT FOR THE AI ERA

COMPUTE

EXCHANGE

The transparent GPU marketplace for AI infrastructure. Built for builders.

ALL SYSTEMS OPERATIONAL

GPUs

RESERVED GPUs

REFURBISHED GPUs

USED GPUs

HARDWARE MARKET

INFORMATION

Providers

ABOUT

BLOG

EVENTS

LEGAL

Marketplace Terms

Compute Service Terms

E-sign Disclosure

Fees

Referal Agreement

Ask AI for a summary of Compute Exchange

TWITTER

BUILT FOR THE AI ERA

COMPUTE

EXCHANGE

The transparent GPU marketplace for AI infrastructure. Built for builders.

ALL SYSTEMS OPERATIONAL

GPUs

RESERVED GPUs

REFURBISHED GPUs

USED GPUs

HARDWARE MARKET

INFORMATION

Providers

ABOUT

BLOG

EVENTS

LEGAL

Marketplace Terms

Compute Service Terms

E-sign Disclosure

Fees

Referal Agreement

Ask AI for a summary of Compute Exchange

TWITTER

BUILT FOR THE AI ERA

FLAGSHIP OPEN-WEIGHTBILINGUAL REASONING.

FLAGSHIP OPEN-WEIGHTBILINGUAL REASONING.

GLM-5 AND BEYOND

GLM-5.2

753B

753B

1M

1M

FP8

FP8

MULTIMODAL

MULTIMODAL

GLM-5.1

754B

754B

200K

200K

FP8

FP8

TEXT-TO-TEXT

TEXT-TO-TEXT

GLM-4.7-Flash

30B - A3B

30B - A3B

200K

200K

BF16

BF16

TEXT-TO-TEXT

TEXT-TO-TEXT

GLM-OCR

0.9B

0.9B

128K

128K

-

-

IMAGE-TEXT-TO-TEXT

IMAGE-TEXT-TO-TEXT

WHAT GLM DOES WELL

REPRESENTATIVE WORKLOADS

BILINGUAL ENTREPRISE ASSISTANTS

LONG DOCUMENT ANALYSIS

AGENTIC WORKFLOWS

RAG WITH REDUCED RETRIEVAL

MULTIMODAL DOCUMENT PIPELINES

OPEN-WEIGHT PRODUCTION INFERENCE

HOW TO ACCESS GLM

COMMITTED INFERENCE

RUN YOUR OWN

GLM, EXPLAINED

Who builds GLM?

What is the difference between GLM-5.2 and GLM-5.1?

Is GLM open-weight?

How does GLM compare to Western flagship models?

How do I procure GLM inference through Compute Exchange?

How does GLM map to the STU methodology?

PROCURE GLM CAPACITY.

PROCURE GLM CAPACITY.

Submit a commitment request and Compute Exchange returns GLM inference quotes across the verified open-model provider network.

DISCLAIMER

DISCLAIMER

FLAGSHIP OPEN-WEIGHT
BILINGUAL REASONING.

FLAGSHIP OPEN-WEIGHT
BILINGUAL REASONING.