
chatio benchmark

model evaluation
2025
03 / 04

A leaderboard that evaluates LLMs on real tasks instead of abstract puzzle scores.

The benchmark scores models across advice, instruction following, reading comprehension, empathy, and creative writing. The public leaderboard also shows price, context length, output limits, provider, and response speed so the score has practical context.

Role
Evaluation design, visual system, full-stack engineering
Year
2025
Claude Opus 4.5
claude-opus-4-5-20251101
0.0
Helpfulness
Instruction Following
Comprehension
Empathy
Creative Writing
Cost per 1M
$5.00 in, $25.00 out
Context Size
200,000 Tokens
Max Output Tokens
64,000 Tokens
Model Speed
60 Tok/Sec
Interactive model picker
Chatio Benchmark model ranking and score details
Rankings, evaluation scores, pricing, and model limits
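
The price, limit, and speed figures on a model card can be turned into a rough per-request estimate. The sketch below is illustrative only and is not the leaderboard's own code: the `ModelCard` class and `estimate_request` function are hypothetical names, and it assumes that the context size covers input plus output tokens and that the listed speed applies to output tokens only (time to first token is ignored).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCard:
    """Hypothetical container for the practical figures shown on a model card."""
    name: str
    usd_per_million_in: float   # price per 1M input tokens
    usd_per_million_out: float  # price per 1M output tokens
    context_tokens: int         # context size
    max_output_tokens: int      # output limit
    tokens_per_second: float    # listed model speed


def estimate_request(card: ModelCard, input_tokens: int, output_tokens: int):
    """Return (cost in USD, generation time in seconds) for one request."""
    if output_tokens > card.max_output_tokens:
        raise ValueError("output exceeds the model's max output tokens")
    # Assumption: input and output share the same context window.
    if input_tokens + output_tokens > card.context_tokens:
        raise ValueError("request does not fit in the context window")
    cost = (
        input_tokens * card.usd_per_million_in
        + output_tokens * card.usd_per_million_out
    ) / 1_000_000
    # Assumption: the listed speed describes output generation only.
    seconds = output_tokens / card.tokens_per_second
    return cost, seconds


# Figures taken from the model card above.
opus = ModelCard(
    name="claude-opus-4-5-20251101",
    usd_per_million_in=5.00,
    usd_per_million_out=25.00,
    context_tokens=200_000,
    max_output_tokens=64_000,
    tokens_per_second=60.0,
)

cost, seconds = estimate_request(opus, input_tokens=10_000, output_tokens=1_000)
print(f"${cost:.3f}, about {seconds:.1f} s")  # prints "$0.075, about 16.7 s"
```

With the listed prices, a request with 10,000 input tokens and 1,000 output tokens costs about $0.075 and takes roughly 17 seconds to generate at 60 tokens per second.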