Qwen3 72B Alibaba Cloud
Input
¥0.90
Output
¥2.00

Latest flagship, switchable thinking mode, 1M context, improved code/math/reasoning

Qwen2.5 72B Alibaba Cloud
Input
¥0.80
Output
¥1.80

Stable version, excellent Chinese, strong coding, open-weight & fine-tunable

DeepSeek V3 DeepSeek
Input
¥0.50
Output
¥1.50

Top open-source; excellent math reasoning; 1/10 the cost of GPT-4o

DeepSeek R1 DeepSeek
Input
¥1.00
Output
¥4.00

Reasoning model; best for complex logic, multi-step reasoning, code debugging

Kimi K2.6 Moonshot
Input
¥0.80
Output
¥2.00

Latest multimodal, text/image/video input, improved long-context code

Kimi K2.5 Moonshot
Input
¥0.70
Output
¥1.80

Stable version, 200K context, great for docs/contracts/papers

MiniMax M3 MiniMax
Input
¥0.30
Output
¥1.00

Latest flagship, optimized long context, 1M tokens, great for document analysis

MiniMax M2.7 MiniMax
Input
¥0.25
Output
¥0.90

Stable version, excellent cost-performance, strong multilingual

Doubao 1.8 ByteDance
Input
¥0.20
Output
¥0.80

Volcengine; ultra-low price, fluent Chinese, perfect for daily light tasks

Yi Lightning 01.AI
Input
¥0.60
Output
¥2.00

Global SOTA MoE model, ultra-fast inference, strong multilingual

International models — may be unstable in China
GPT-4o mini OpenAI
Input
$0.15
Output
$0.60

Fast response, ideal for daily tasks, best cost-performance

GPT-5.5 OpenAI
Input
$2.50
Output
$10.00

Top reasoning, best for complex tasks & long text

Claude Opus 4.6 Anthropic
Input
$3.00
Output
$5.00

Deep reasoning, long context, excellent coding

Gemini 3.5 Flash Google
Input
$0.075
Output
$9.00

Ultra-low price, 1M context, Google best value

Groq Mixtral 8x7B Groq
Input
$0.24
Output
$0.24

Ultra-fast inference, Groq LPU chip, extremely fast response

Prices in CNY (¥) or USD ($) per million tokens. Pay per actual usage.

Selection Guide
Model Pros Cons / Limits Best For
GPT-5.5 Top Reasoning Best for Code High Cost Weaker in Chinese Complex reasoning, code development, long-form writing, research
GPT-4o mini Fast Speed Low Cost Limited Complex Task Ability Daily chat, customer service, batch processing, light tasks
Claude Opus 4.6 Long Context Deep Analysis Unstable Access in China Long doc analysis, code review, creative writing, academic research
Gemini 3.5 Flash Best Cost Performance 100K Context Average Chinese Ability Massive data processing, ultra-long text summarization, multimodal tasks
Qwen2.5 72B Strongest Chinese Open Source, Fine-tunable Weaker than Top Closed-source Chinese content generation, chatbot, knowledge base Q&A
DeepSeek R1 Strong Reasoning Extremely Low Cost No Advantage in Non-reasoning Math proofs, code debugging, multi-step logic, complex problems
DeepSeek V3 Best Value Open Source Newer Ecosystem General chat, code generation, moderate tasks
Kimi 200K Ultra Long Context Great Chinese Experience Average Output Speed Contract review, paper summarization, long docs, multi-file analysis
MiniMax Text-01 1M Ultra Long Memory Low Price Ecosystem Less Mature Ultra-long text analysis, multi-turn memory, large-scale data mining
Doubao Pro Extremely Low Price Fluent Chinese Limited Complex Reasoning Lightweight support, daily chat, content moderation, batch tasks
Yi Large 200K Context Strong Multilingual Low Brand Awareness Multilingual apps, long text tasks, translation, content creation
Why Token API?

Self-hosting vs API

Self-hosting requires: GPU servers (expensive) + model download + environment setup + ongoing maintenance + upgrades. API needs only one key, pay-per-use, zero ops.

  • No need to purchase expensive GPUs, pay per usage
  • No server management needed, zero ops burden
  • Switch models anytime, find the best for your task
  • Latest models available instantly, no self-deployment

How to Choose the Right Model?

Not the most expensive, not the largest. Key is matching task characteristics.

  • Daily chat / Support → GPT-4o mini, Doubao Pro (cheap & fast)
  • Complex reasoning / Code → GPT-5.5, DeepSeek R1 (strong reasoning)
  • Long doc analysis → Kimi 200K, MiniMax (ultra-long context)
  • Chinese content → Qwen2.5, Yi Large (Chinese-optimized)
  • Large-scale data → Gemini 3.5 Flash (high speed + agentic tasks)

API Endpoints

OpenAI-compatible API, switch with one line, no code changes needed

POST
/v1/chat/completions
Call model, OpenAI-compatible format
POST
/auth/register
Register account, get API Key
POST
/user/topup
Top up balance, WeChat / Alipay
GET
/user/balance/{api_key}
Check account balance
GET
/v1/models
List available models

OpenAI-compatible API — zero migration, swap base URL to connect

View Quickstart Guide →
\n \n