---
title: "Kimi API — Variants, pricing & model skill | RunAPI"
url: "https://runapi.ai/models/kimi.md"
canonical: "https://runapi.ai/models/kimi.md"
locale: "en"
model: "Kimi"
provider: "Moonshot AI"
modality: "text"
variant_count: 2
price_from_cents: 2
---

# Kimi API

Moonshot AI Kimi API access via RunAPI — 1T-parameter MoE with 256K context, 58.6% SWE-bench Pro.

**Provider:** Moonshot AI
**Modality:** Text
**Catalog:** 2 variants

Kimi is Moonshot AI&#39;s K2 family of Mixture-of-Experts language models — 1 trillion total parameters with 32B active per token, 384 experts per layer. kimi-k2.5 (256K context) added native multimodal input and strong coding benchmarks. kimi-k2.6 refines post-training for long-horizon agent stability, reaching 58.6% on SWE-bench Pro and scaling Agent Swarm orchestration to 300 sub-agents. Both are available through RunAPI with one key and per-token billing.

## Variants

| Version | Variant | Pricing | Billing | URL |
|---|---|---|---|---|
| kimi-k2.5 | `k2.5` | $0.020 | 1K tokens | https://runapi.ai/models/kimi/k2.5.md |
| kimi-k2.6 | `k2.6` | $0.020 | 1K tokens | https://runapi.ai/models/kimi/k2.6.md |


## API endpoints

Base URL: `https://runapi.ai`

- `POST /v1/chat/completions`

Use the OpenAI or Anthropic SDK with your RunAPI API key. No extra SDK required.

## Context

Kimi K2 models from Moonshot AI are 1T-parameter MoE LLMs with 256K context, optimized for autonomous coding and multi-agent orchestration. kimi-k2.6 scores 58.6% on SWE-bench Pro. Through RunAPI they share a single API key with pay-as-you-go token billing, callable from the OpenAI Chat Completions, OpenAI Responses, and Anthropic Messages surfaces.

## FAQ

### Which variant should I start with?

Pick the cheapest variant that meets your quality bar. Most teams start on the fast variant and graduate to pro for production.

### Is there a free tier?

New accounts get free first calls on every model. After that, pay per call.

### Do you stream results?

Where streaming is available, RunAPI streams end-to-end.

### How are failures billed?

Failed generations are not charged.

### Are outputs cached?

Generated outputs are stored and retrievable by task ID. Inputs are not cached.

### Can I use commercially?

Yes — commercial use is included for every variant unless a model license explicitly restricts it, which is called out on the variant page.

### What about rate limits?

Per-key rate limits scale with usage tier. See pricing page for current limits.

### Where can I report issues?

Open an issue on the public GitHub repo or email support.

