chatio benchmark
model evaluation 2025 03 / 04
A leaderboard that evaluates LLMs on real tasks instead of abstract puzzle scores.
The benchmark scores models across advice, instruction following, reading comprehension, empathy, and creative writing. The public leaderboard also shows price, context length, output limits, provider, and response speed so the score has practical context.
- Role: Evaluation design, visual system, full-stack engineering
- Year: 2025
Claude Opus 4.5
claude-opus-4-5-20251101
Helpfulness
Instruction Following
Comprehension
Empathy
Creative Writing
Cost per 1M: $5.00 in, $25.00 out
Context Size: 200,000 Tokens
Max Output Tokens: 64,000 Tokens
Model Speed: 60 Tok/Sec
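
To show how the price and speed figures give a score practical context, here is a minimal sketch that turns them into a per-request cost and a rough generation time. It assumes "per 1M" means per one million tokens and that "Tok/Sec" refers to output tokens; the `ModelCard` class and its field names are hypothetical and not part of the benchmark's own code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCard:
    """Hypothetical container for the figures shown on a leaderboard card."""
    name: str
    input_usd_per_million: float   # assumed: USD per 1M input tokens
    output_usd_per_million: float  # assumed: USD per 1M output tokens
    tokens_per_second: float       # assumed: output tokens generated per second

    def request_cost_usd(self, input_tokens: int, output_tokens: int) -> float:
        """Cost of one request, pricing input and output tokens separately."""
        total = (input_tokens * self.input_usd_per_million
                 + output_tokens * self.output_usd_per_million)
        return total / 1_000_000

    def generation_seconds(self, output_tokens: int) -> float:
        """Rough time to stream the output at the listed speed."""
        return output_tokens / self.tokens_per_second


# Figures taken from the card above.
card = ModelCard(
    name="claude-opus-4-5-20251101",
    input_usd_per_million=5.00,
    output_usd_per_million=25.00,
    tokens_per_second=60,
)

# Example request: 10,000 input tokens and 2,000 output tokens.
cost = card.request_cost_usd(input_tokens=10_000, output_tokens=2_000)
seconds = card.generation_seconds(output_tokens=2_000)
print(f"${cost:.2f} per request, about {seconds:.0f} s to generate")
```

With these numbers, the input and output sides each contribute $0.05, so output pricing dominates as soon as responses exceed a fifth of the prompt length.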
