Cached input
Repeated prompt prefixes are billed at a reduced input rate. RunAPI passes the discount through at 50% of OpenAI's cached rate.
GPT-5.4 costs $2.50 input and $15 output per million tokens; GPT-5.5 costs $5 and $30. RunAPI mirrors every GPT model at half the official rate — same API, same output, 50% less on your invoice.
OpenAI prices each GPT model per million tokens, with separate input and output rates and a cheaper cached-input rate. All figures below are per million tokens, the billing unit OpenAI uses.
GPT-5.4 at $1.25/M input and $7.50/M output through RunAPI. Official rate is $2.50/$15.
GPT-5.4-mini at a fraction of the flagship rate, billed at 50% off through RunAPI.
GPT-5.5 at $2.50/M input and $15/M output through RunAPI. Official rate is $5/$30.
Cached input tokens cost a fraction of standard input — passed through at 50% on RunAPI.
The table shows official OpenAI pricing alongside RunAPI pricing. RunAPI applies a flat 50% discount across all GPT models. No volume commits, no subscriptions.
| Model | Official input /M | Official output /M | RunAPI input /M | RunAPI output /M | Context window |
|---|---|---|---|---|---|
| GPT-5.5 | $5.00 | $30.00 | $2.50 | $15.00 | 400K |
| GPT-5.4 | $2.50 | $15.00 | $1.25 | $7.50 | 400K |
| GPT-5.4-mini | $0.25 | $2.00 | $0.13 | $1.00 | 400K |
| GPT-5.3-codex | $2.50 | $15.00 | $1.25 | $7.50 | 400K |
OpenAI charges less for cached input tokens and offers a deep discount on batch requests that tolerate delayed turnaround. Both matter for repetitive workloads like coding agents and bulk processing.
Repeated prompt prefixes are billed at a reduced input rate. RunAPI passes the discount through at 50% of OpenAI's cached rate.
Requests submitted to the Batch API run at half the standard rate with up to 24-hour turnaround. RunAPI passes this through on top of its own discount.
GPT-5 models let you set reasoning effort. Lower effort emits fewer reasoning tokens, directly reducing output cost on metered billing.
Cap max output tokens per request to bound cost and avoid runaway generations on long agentic tasks.
Token rates look abstract until attached to real tasks. Below are common developer workloads with estimated monthly costs at two usage levels, billed at RunAPI rates.
| Workload | Model | Light use (~50 tasks/day) | Heavy use (~200 tasks/day) | Monthly saving vs official |
|---|---|---|---|---|
| Coding agent (Codex) | GPT-5.3-codex | $20/mo | $80/mo | $20–$80 |
| Customer-support chatbot | GPT-5.4-mini | $6/mo | $24/mo | $6–$24 |
| RAG knowledge assistant | GPT-5.4 | $18/mo | $72/mo | $18–$72 |
| Content generation pipeline | GPT-5.4 | $25/mo | $100/mo | $25–$100 |
| Multi-agent orchestrator | GPT-5.5 | $90/mo | $360/mo | $90–$360 |
Developers weigh GPT against Claude and Gemini. Here is how the flagship models compare on a per-million-token basis, with RunAPI rates alongside.
| Provider | Flagship model | Input /M | Output /M | RunAPI rate |
|---|---|---|---|---|
| OpenAI | GPT-5.4 | $2.50 | $15.00 | $1.25 / $7.50 |
| Anthropic | Claude Opus 4.7 | $10.00 | $50.00 | $5.00 / $25.00 |
| Gemini 2.5 Pro | $1.25 | $10.00 | $0.63 / $5.00 |
RunAPI applies a 50% discount on all providers listed above. Prices verified June 2026.
Sign up at runapi.ai. No credit card required for the free tier.
Go to Dashboard → API Keys. Create a key and save it — you will use this as your OpenAI API key.
Set the base URL to https://api.runapi.ai/v1 and use your RunAPI key. Any OpenAI-compatible client works.
Use gpt-5.4, gpt-5.5, or any GPT model ID in the model parameter. RunAPI handles routing and billing at 50% of the official rate.
GPT-5.4 costs $2.50 per million input tokens and $15 per million output tokens officially. GPT-5.5 costs $5 and $30. Through RunAPI, every GPT model is billed at half those rates — GPT-5.4 runs $1.25 input and $7.50 output per million tokens.
RunAPI negotiates volume pricing with model providers and passes the savings on to developers. Requests reach the same OpenAI models with identical output, safety filters, and behavior, so the only difference is the lower rate on your invoice. There is no quality trade-off and no separate billing tier — the discount applies automatically to every GPT model.
Yes. OpenAI bills repeated prompt prefixes at a reduced cached-input rate, which lowers cost for agents that resend the same context. RunAPI passes the cache discount through at 50% of OpenAI's cached rate, so caching savings stack with the base discount.
The Batch API runs requests at 50% of the standard rate in exchange for up to 24-hour turnaround. It suits bulk jobs that do not need instant responses. RunAPI passes this discount through, so batch work is billed at half of the already-discounted rate.
On flagship input tokens, GPT-5.4 at $2.50 sits between Gemini 2.5 Pro at $1.25 and Claude Opus at $10. The cheapest choice depends on the model tier and workload. RunAPI halves the rate for all three, so the relative ranking stays the same.
Yes. RunAPI is OpenAI-compatible. Point any OpenAI client at https://api.runapi.ai/v1, use your RunAPI key, and pass a GPT model ID. Existing code that already uses the OpenAI SDK works without any changes beyond the base URL and key, so migrating an established project takes about a minute.
Yes. GPT-5.3-codex is available through RunAPI at 50% of the official rate, which is $1.25 input and $7.50 output per million tokens. It works with Codex and other OpenAI-compatible coding tools by overriding the base URL and key in their settings. Cached input and batch discounts also pass through, lowering the effective cost of repetitive coding sessions further.
Yes. New RunAPI accounts receive free credits to test any GPT model before committing. After that, billing is strictly pay-as-you-go with no minimum spend, no subscription, and no monthly commitment — you fund a balance and each call deducts its token cost. You can top up any amount and watch usage per model in the dashboard.
Create a free RunAPI account, get your API key, and call any OpenAI GPT model at 50% off official pricing.