VARIANT · Z.ai / GLM

GLM glm-5-turbo API

Same API key, same model skill workflow — switch variants by changing one model ID.

Operational · text · Commercial OK
# Base URL
https://runapi.ai

# Endpoints
POST /v1/chat/completions

# curl
curl https://runapi.ai/v1/chat/completions \
  -H "Authorization: Bearer $RUNAPI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "glm-5-turbo",
  "messages": [
    {
      "role": "user",
      "content": "Read this multi-file repository, find the failing integration test, and propose a patch with an explanation of the root cause."
    }
  ]
}'
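
# Python (OpenAI SDK)
import os

from openai import OpenAI

# The endpoint is OpenAI compatible, so the OpenAI SDK only needs a new base URL.
client = OpenAI(
    base_url="https://runapi.ai/v1",
    api_key=os.environ["RUNAPI_KEY"],  # same variable the curl example uses
)

response = client.chat.completions.create(
    model="glm-5-turbo",
    messages=[{"role": "user", "content": "Read this multi-file repository, find the failing integration test, and propose a patch with an explanation of the root cause."}],
)
# Print the assistant's reply text.
print(response.choices[0].message.content)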
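
# JavaScript (OpenAI SDK)
import OpenAI from "openai";

// The endpoint is OpenAI compatible, so the OpenAI SDK only needs a new base URL.
const client = new OpenAI({
  baseURL: "https://runapi.ai/v1",
  apiKey: process.env.RUNAPI_KEY, // same variable the curl example uses
});

const response = await client.chat.completions.create({
  model: "glm-5-turbo",
  messages: [{ role: "user", content: "Read this multi-file repository, find the failing integration test, and propose a patch with an explanation of the root cause." }]
});
// Print the assistant's reply text.
console.log(response.choices[0].message.content);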
OVERVIEW

glm-5-turbo is the speed-optimized tier of the GLM-5 line: it aims for lower latency while balancing quality and cost within the GLM family.

  • Pay-per-call pricing in USD
  • Failed generations not charged
  • Streaming when supported by the model
  • Schema-validated tool calls
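
As a rough sketch of what a schema-carrying tool call could look like on the OpenAI-compatible endpoint: the payload below declares one tool with a JSON Schema, and a small helper checks returned arguments against the schema's required keys. The tool name `get_order_status`, its fields, and the helpers `build_tool_request` and `check_tool_call` are hypothetical. This page does not describe how RunAPI validates tool calls on its side, so the local check is only an illustration.

```python
import json

# Hypothetical tool definition in the OpenAI-compatible "tools" format.
ORDER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_order_status",  # illustrative name, not a RunAPI built-in
        "description": "Look up the status of a customer order.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}


def build_tool_request(prompt: str) -> dict:
    """Build a chat completion payload that offers one tool to the model."""
    return {
        "model": "glm-5-turbo",
        "messages": [{"role": "user", "content": prompt}],
        "tools": [ORDER_TOOL],
    }


def check_tool_call(tool_call: dict, tool: dict = ORDER_TOOL) -> dict:
    """Parse a tool call's JSON arguments and check the name and required keys."""
    fn = tool_call["function"]
    if fn["name"] != tool["function"]["name"]:
        raise ValueError(f"unexpected tool: {fn['name']}")
    args = json.loads(fn["arguments"])  # arguments arrive as a JSON string
    required = tool["function"]["parameters"]["required"]
    missing = [key for key in required if key not in args]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return args
```

In an OpenAI-style response, each entry of `choices[0].message.tool_calls` has the shape `check_tool_call` expects.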
PRICING

Pricing

Failed generations are not charged
Chat completion
Input $0.60 / 1M tokens
Output $2.00 / 1M tokens
Cache read $0.12
Cache write 5m Free
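
To make the rates above concrete, here is a small cost estimator. It is a sketch under two assumptions that this page does not confirm: the cache-read price uses the same per-1M-token unit as the input and output rows, and cached tokens are billed at the cache-read rate instead of the input rate. The function name `estimate_cost` is illustrative.

```python
# Prices copied from the pricing table, in USD per 1M tokens.
INPUT_PER_M = 0.60
OUTPUT_PER_M = 2.00
CACHE_READ_PER_M = 0.12  # assumption: same per-1M unit as the rows above


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one call.

    Assumes cached_tokens is the part of input_tokens served from cache
    and billed at the cache-read rate instead of the input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    total = (
        uncached * INPUT_PER_M
        + cached_tokens * CACHE_READ_PER_M
        + output_tokens * OUTPUT_PER_M
    )
    return total / 1_000_000
```

For example, a call with 12,000 input tokens and 1,500 output tokens and no cache hits comes to 12,000 × $0.60 / 1M + 1,500 × $2.00 / 1M = $0.0072 + $0.0030 = $0.0102.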
SPEC SHEET

Technical details

Model ID glm-5-turbo
Provider Z.ai
Modality text
Task type synchronous
Billing unit 1K tokens
API endpoint /v1/chat/completions
Commercial license Yes — included via API
Status Operational
SKILLS

Model skill — glm-5-turbo

Install the skill once, then use the variant ID from this page while building.

Endpoint Protocol
/v1/chat/completions OpenAI compatible
HOW IT WORKS

Use glm-5-turbo with a model skill

01

Install

Install the model skill for this model line.

02

Configure

Set the model field to the full model ID shown on this page.

03

Call

Use the skill instructions while adding prompt, input, and callback handling to your app.

04

Receive

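Read the result from RunAPI: the task response, a webhook callback, or a cached output URL. glm-5-turbo is a synchronous text model, so the completion arrives in the response to the call itself.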
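
For the Receive step, a minimal sketch of reading a chat completion body. It assumes the body follows the OpenAI chat completion shape (`choices[0].message.content` plus a `usage` object), which is what "OpenAI compatible" suggests but this page does not spell out. The helper `read_completion` and the sample payload are made up for illustration.

```python
def read_completion(body: dict) -> tuple[str, dict]:
    """Return (assistant_text, usage) from an OpenAI-style chat completion body."""
    choices = body.get("choices") or []
    if not choices:
        raise ValueError("response has no choices")
    # content can be None when the model answers with tool calls only
    text = choices[0]["message"].get("content") or ""
    return text, body.get("usage", {})


# Illustrative response body, not real output from RunAPI.
SAMPLE_BODY = {
    "model": "glm-5-turbo",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "The test fails because ..."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 42, "completion_tokens": 7, "total_tokens": 49},
}

text, usage = read_completion(SAMPLE_BODY)
```

The `usage` counts, where the response includes them, are the numbers to compare against the per-token prices.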

DIFFERENCES

What's different about glm-5-turbo

VS GLM-4.5

glm-5-turbo: Speed-optimized GLM-5 tier for lower latency

GLM-4.5: 355B total / 32B active parameters; 128K context; flagship open-weight MoE baseline

VS GLM-4.5-AIR

glm-5-turbo: Speed-optimized GLM-5 tier for lower latency

GLM-4.5-Air: Lighter GLM-4.5 tier for fast, lower-cost everyday work

VS GLM-4.6

glm-5-turbo: Speed-optimized GLM-5 tier for lower latency

GLM-4.6: 200K context; first GLM on Cambricon chips; sharper code generation

USE CASES

Best for

Customer support

Answer customer questions from a private knowledge base, reducing ticket volume.

Document analysis

Draft contract summaries and flag key clauses for attorney review.

Code generation

Auto-generate unit tests, code reviews, and refactoring suggestions in CI.

FAQ

Frequently asked questions about glm-5-turbo

Is the model ID stable across versions?

RunAPI keeps the model ID stable and handles compatible version refreshes without changing your request shape.

What's the rate limit on this variant?

Per-key rate limits scale with your usage tier. See the pricing page for current limits.
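
One common way for a client to cope with per-key limits is to retry with exponential backoff. The sketch below assumes a rate-limited request surfaces as an error the caller can detect (for example HTTP 429, which this page does not confirm). `RateLimited` and `with_backoff` are hypothetical names, not part of any RunAPI SDK.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical marker raised by the caller's code on a rate-limit response."""


def with_backoff(call, max_attempts: int = 5, base_delay: float = 1.0, sleep=time.sleep):
    """Run call(), retrying on RateLimited with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Delay doubles each attempt; jitter spreads out concurrent clients.
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.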

Can I switch variants later?

Yes. The variant is selected by the model parameter, so switching means changing that one value.
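
A small sketch of what that looks like in a request payload. The helper `build_request` is illustrative, and `glm-4.6` is an assumed model ID used only for contrast; check the other variant's page for its exact ID.

```python
def build_request(model: str, prompt: str) -> dict:
    """Build a chat completion payload; only `model` selects the variant."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


prompt = "Summarize the key clauses in this contract."
turbo = build_request("glm-5-turbo", prompt)
other = build_request("glm-4.6", prompt)  # assumed ID, for illustration only

# Collect the keys whose values differ between the two payloads.
changed = {key for key in turbo if turbo[key] != other[key]}
```

Here `changed` contains only `"model"`: the messages, endpoint, and API key stay the same.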

Does it stream?

Where streaming is available, RunAPI streams end-to-end.
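
A sketch of consuming a streamed response, assuming the stream follows the OpenAI convention: the request sets `"stream": true`, and the server sends server-sent event lines of the form `data: {json chunk}`, ending with `data: [DONE]`. The helper `iter_stream_text` and the sample lines are illustrative, not captured RunAPI output.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {}).get("content")
            if delta:
                yield delta


# Illustrative stream, shaped like OpenAI chat completion chunks.
SAMPLE_LINES = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": " world"}}]}',
    "",
    "data: [DONE]",
]

streamed_text = "".join(iter_stream_text(SAMPLE_LINES))
```

With a real HTTP client, `lines` would be the decoded lines of the response body as they arrive.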

Where do I report quality issues?

Open an issue on the public GitHub repo or email support.

START NOW

Start building with GLM.