VARIANT · Z.ai / GLM

GLM glm-4.5 API

Same API key, same model skill workflow — switch variants by changing one model ID.

Operational · text · Commercial OK

# Base URL
https://runapi.ai

# Endpoints
POST /v1/chat/completions

curl https://runapi.ai/v1/chat/completions \
  -H "Authorization: Bearer $RUNAPI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "glm-4.5",
  "messages": [
    {
      "role": "user",
      "content": "Read this multi-file repository, find the failing integration test, and propose a patch with an explanation of the root cause."
    }
  ]
}'

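# Python, using the openai SDK pointed at RunAPI
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://runapi.ai/v1",
    api_key=os.environ["RUNAPI_KEY"],  # same key the curl example reads from $RUNAPI_KEY
)

response = client.chat.completions.create(
    model="glm-4.5",
    messages=[{"role": "user", "content": "Read this multi-file repository, find the failing integration test, and propose a patch with an explanation of the root cause."}],
)

# Print the assistant's reply
print(response.choices[0].message.content)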

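// JavaScript (Node.js, ES module), using the openai SDK pointed at RunAPI
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://runapi.ai/v1",
  apiKey: process.env.RUNAPI_KEY, // same key the curl example reads from $RUNAPI_KEY
});

const response = await client.chat.completions.create({
  model: "glm-4.5",
  messages: [{ role: "user", content: "Read this multi-file repository, find the failing integration test, and propose a patch with an explanation of the root cause." }],
});

// Print the assistant's reply
console.log(response.choices[0].message.content);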

Switch variant

glm-4.5-air glm-4.6 glm-4.7 glm-5 glm-5-turbo glm-5.1

OVERVIEW

glm-4.5 targets the sweet spot of quality and cost within the GLM family.

Pay-per-call pricing in USD
Failed generations not charged
Streaming when supported by the model
Schema-validated tool calls

PRICING

Pricing

Failed generations are not charged

Chat completion

Input	$0.30 / 1M tokens
Output	$1.10 / 1M tokens
Cache read	$0.06 / 1M tokens
Cache write (5m)	Free
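
To turn these rates into a per-call figure, the arithmetic can be sketched as below. The input and output prices are copied from this page. Two things are assumptions: that the cache-read price uses the same per-1M-token unit, and that cached input tokens are billed at the cache-read price instead of the input price. The function name is hypothetical.

```python
# Prices copied from this page, in USD per 1M tokens.
INPUT_PER_M = 0.30
OUTPUT_PER_M = 1.10
# Assumption: the cache-read price uses the same per-1M-token unit.
CACHE_READ_PER_M = 0.06


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one successful call.

    cached_tokens is the part of input_tokens served from cache; this
    sketch assumes it is billed at the cache-read price instead of the
    input price.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * INPUT_PER_M
        + cached_tokens * CACHE_READ_PER_M
        + output_tokens * OUTPUT_PER_M
    )
    return total / 1_000_000
```

For example, a call with 20,000 input tokens and 2,000 output tokens comes to 20,000 × $0.30 / 1M + 2,000 × $1.10 / 1M = $0.0060 + $0.0022 = $0.0082.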

SPEC SHEET

Technical details

Model ID	glm-4.5
Provider	Z.ai
Modality	text
Task type	synchronous
Billing unit	1K tokens
API endpoint	/v1/chat/completions
Commercial license	Yes — included via API
Status	Operational

SKILLS

Model skill — glm-4.5

Install the skill once, then use the variant ID from this page while building.

Endpoint	Protocol
/v1/chat/completions	OpenAI compatible

HOW IT WORKS

Use glm-4.5 with a model skill

01

Install

Install the model skill for this model line.

02

Configure

Set the model field to the full model ID shown on this page.

03

Call

Use the skill instructions while adding prompt, input, and callback handling to your app.

04

Receive

Read the task response, webhook callback, or cached output URL from RunAPI.
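
Steps 02 to 04 can be sketched with only the Python standard library. This covers the synchronous response path only, not webhook callbacks or cached output URLs, and it assumes the response body follows the OpenAI chat completion shape (`choices[0].message.content`). The function names are hypothetical; the base URL, endpoint, and `RUNAPI_KEY` variable come from the examples on this page.

```python
import json
import os
import urllib.request

BASE_URL = "https://runapi.ai"  # base URL from this page


def build_request(prompt: str, api_key: str, model: str = "glm-4.5") -> urllib.request.Request:
    """Step 02: put the model ID from this page in the model field."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        BASE_URL + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_reply(payload: dict) -> str:
    """Step 04: pull the assistant text out of an OpenAI-style response."""
    return payload["choices"][0]["message"]["content"]


def ask(prompt: str) -> str:
    """Step 03: send the request and return the reply.

    Needs network access and the RUNAPI_KEY environment variable.
    """
    request = build_request(prompt, os.environ["RUNAPI_KEY"])
    with urllib.request.urlopen(request, timeout=60) as response:
        return extract_reply(json.load(response))
```

Keeping request construction and response parsing separate from the network call makes both easy to check without spending tokens.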

DIFFERENCES

What's different about glm-4.5

Comparison	glm-4.5	Other variant
vs glm-4.5-air	355B total / 32B active parameters; 128K context; flagship open-weight MoE baseline	Lighter GLM-4.5 tier for fast, lower-cost everyday work
vs glm-4.6	355B total / 32B active parameters; 128K context; flagship open-weight MoE baseline	200K context; first GLM on Cambricon chips; sharper code generation
vs glm-4.7	355B total / 32B active parameters; 128K context; flagship open-weight MoE baseline	200K context; 73.8% SWE-bench; persistent thinking across turns

USE CASES

Best for

Customer support

Answer customer questions from a private knowledge base, reducing ticket volume.

Document analysis

Draft contract summaries and flag key clauses for attorney review.

Code generation

Auto-generate unit tests, code reviews, and refactoring suggestions in CI.

FAQ

Frequently asked questions about glm-4.5

Is the model ID stable across versions?

RunAPI keeps the model ID stable and handles compatible version refreshes without changing your request shape.

What's the rate limit on this variant?

Per-key rate limits scale with usage tier. See the pricing page for current limits.
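
A client that may hit its per-key limit usually retries with backoff. The sketch below assumes rate-limited requests are rejected with HTTP 429, which this page does not confirm; the function name, attempt count, and delays are illustrative choices.

```python
import random
import time
import urllib.error


def call_with_backoff(send, max_attempts: int = 5, base_delay: float = 1.0, sleep=time.sleep):
    """Call send() and retry with exponential backoff on HTTP 429.

    send is any zero-argument callable that performs the request and
    raises urllib.error.HTTPError on an HTTP error status.
    """
    for attempt in range(max_attempts):
        try:
            return send()
        except urllib.error.HTTPError as err:
            # Retry only on 429, and give up after the final attempt.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            # Wait base_delay * 1, 2, 4, ... plus up to 25% random jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.25))
```

The `sleep` parameter exists so the wait can be replaced in tests; in normal use the default `time.sleep` applies.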

Can I switch variants later?

Yes. The variant is selected by the model parameter, so switching means changing that one value in the request.
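
As a small illustration of that answer, the sketch below builds one request body and retargets it at another variant by changing only the model field. The variant IDs are the ones listed on this page; the helper names are hypothetical.

```python
# Variant IDs listed on this page.
GLM_VARIANTS = (
    "glm-4.5",
    "glm-4.5-air",
    "glm-4.6",
    "glm-4.7",
    "glm-5",
    "glm-5-turbo",
    "glm-5.1",
)


def chat_body(model: str, prompt: str) -> dict:
    """Build a chat completion body for one GLM variant."""
    if model not in GLM_VARIANTS:
        raise ValueError(f"unknown GLM variant: {model}")
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


def switch_variant(body: dict, model: str) -> dict:
    """Return a copy of body aimed at another variant; nothing else changes."""
    if model not in GLM_VARIANTS:
        raise ValueError(f"unknown GLM variant: {model}")
    return {**body, "model": model}
```

Because `switch_variant` returns a copy, the original body can still be sent to glm-4.5 for a side-by-side comparison.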

Does it stream?

Where streaming is available, RunAPI streams end-to-end.
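
On the client side, consuming a stream could look like the sketch below. It assumes RunAPI follows the OpenAI streaming convention: the request sets `"stream": true`, the response is a series of `data: {...}` server-sent event lines whose chunks carry text in `choices[0].delta.content`, and the stream ends with `data: [DONE]`. This page does not document the wire format, so treat all of that as an assumption; the function name is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

The function takes any iterable of decoded lines, so it can be fed from an HTTP response body or from a recorded transcript.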

Where do I report quality issues?

Open an issue on the public GitHub repo or email support.

Other variants of GLM

glm-4.5-air cheapest

$0.010 / 1K tokens

$0.020 / 1K tokens

$0.020 / 1K tokens

$0.020 / 1K tokens

glm-5-turbo fast

$0.020 / 1K tokens

$0.030 / 1K tokens

Alternatives from other models

Claude API access for Anthropic's LLM across complex reasoning, code, analysis, and extended-context tasks.

DeepSeek API access via RunAPI — flash for fast, low-cost work; pro for complex agentic tasks.

OpenAI text embeddings for semantic search, retrieval, clustering, and ranking workflows.
