GPT-5.5
OpenAI. Apr 24 release. 1M-token context, native MCP + Skills + computer use + hosted shell.
- Intelligence Index v4.0 (xhigh): 60
- 1M token context
- Native MCP + Skills
- Computer use built-in
- Tool search + web search
- Released 2026-04-24
Eight categories. Twenty-four frontier models. Updated monthly. With AI-friendly citations.
May 28 — Anthropic ships Opus 4.8 and dethrones OpenAI on the Artificial Analysis Intelligence Index v4.0. First time since the index relaunched that Anthropic holds the top spot outright.
Previously: GPT-5.5
Released May 28. Adaptive reasoning + max-effort mode tops the AA Intelligence Index v4.0.
Apr 24 release. 1M-token context, native MCP + Skills + computer use + hosted shell.
Strong agentic frontier with Antigravity 2.0 platform integration.
Released May 28, 2026. Intelligence Index v4.0 score 61 (max effort), edging GPT-5.5 (xhigh, 60).
On LMArena ELO, the battle-tested Opus 4.6/4.7-thinking variants still outrank 4.8, but only because 4.8 has too few votes so far; expect that to flip by the end of June.
OpenAI ships GPT Image 2 with token-based pricing and 50% Batch API discount. Recraft V4.1 holds the Artificial Analysis quality leaderboard, while Adobe Firefly enterprise mode remains the rights-cleared default.
Previously: GPT Image 2
Released April 21. State-of-the-art quality with token pricing and Batch API support.
Leads Artificial Analysis text-to-image arena on raw output quality.
IP-cleared training data; the enterprise-safe choice for commercial use.
Released April 21, 2026. Token-based pricing, Batch API at 50% off.
Recraft V4.1 leads AA's text-to-image arena on raw quality; GPT Image 2 wins on ecosystem and pricing transparency.
Seedance 2.0 (ByteDance) tops the Artificial Analysis text-to-video Arena with audio at ELO 1215 — the first model to make synced audio-visual generation state of the art. Google's Veo 3.5 and OpenAI's Sora 2 lead the silent-cinematic tier; Kling 4 holds the value end.
Previously: Veo 3.5
Tops the Artificial Analysis text-to-video Arena (with audio) at ELO 1215 — ahead of Veo and Sora. Native audio-visual generation: 15-second multi-shot clips with synced sound from text, image, audio and video inputs.
Production-grade film output with strongest temporal coherence.
Best narrative continuity across one-minute shots.
Dominant in APAC short-form ad creative; fastest iteration cycle in the space.
Seedance 2.0 takes #1 on the with-audio text-to-video Arena, ahead of Veo and Sora.
Seedance 2.0 for synced audio-visual and multi-shot narrative; Veo 3.5 for cinematic fidelity; Sora 2 for Western-ecosystem integration; Kling 4 for cost.
Claude Opus 4.7-thinking holds LMArena's WebDev top spot ahead of 4.7 and 4.6-thinking — a full Anthropic sweep. GPT-5.3 Codex remains best-in-class for terminal-bound code agents.
Previously: GPT-5.5 (Agentic)
Tops LMArena WebDev. Best-in-class for multi-file refactoring and code review.
Specialized terminal-code agent. AA Intelligence Index 54.
Ranks #3 on AA Coding Agent Index. IDE-native pair programming with multi-file context.
LMArena WebDev top-3 all Anthropic (1566 / 1558 / 1542 ELO).
Anthropic for refactor + review + multi-file editing; GPT-5.3 Codex for sandboxed terminal agents; Cursor Composer 2.5 for IDE-native pair programming.
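To put the WebDev ratings above in perspective, here is a minimal sketch that converts a rating gap into an expected head-to-head win rate. It assumes the classic Elo logistic formula with a 400-point scale applies to these scores; the page does not say how LMArena computes its ratings, so treat the result as a rough reading rather than the leaderboard's own method. The function name `expected_win_rate` is illustrative.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected share of head-to-head wins for A over B under the classic Elo model.

    Assumption: the standard Elo logistic curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# WebDev top-3 ratings quoted above (1566 / 1558 / 1542).
WEBDEV_TOP3 = [1566, 1558, 1542]

if __name__ == "__main__":
    leader = WEBDEV_TOP3[0]
    for rival in WEBDEV_TOP3[1:]:
        rate = expected_win_rate(leader, rival)
        print(f"{leader} vs {rival}: leader expected to win {rate:.1%} of votes")
```

Under this assumption, a gap of a few dozen points corresponds to a win rate only slightly above 50%, which is why the page can describe a sweep while the three models remain close.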
OpenAI's Realtime 2 (May 7) brings configurable-reasoning speech-to-speech to general availability; AA's text-to-speech crown goes to Fun-Realtime-TTS, and MAI-Transcribe-1.5 wins STT on accuracy-speed.
Previously: ElevenLabs v3
GA on May 7. Configurable-reasoning speech-to-speech with realtime translate + Whisper variants.
Industry default for character voice cloning and audiobook production.
Tops AA text-to-speech leaderboard on quality metrics.
Realtime 2 family shipped May 7 (gpt-realtime-2 / -translate / -whisper).
Realtime 2 for agentic voice; ElevenLabs v3 for character voice cloning; Fun-Realtime-TTS for raw TTS quality; MAI-Transcribe-1.5 for transcription.
Suno v6 widens the gap on full-song coherence and lyric prosody; Udio v3 keeps pushing studio-grade mixing; Lyria (Google) integrates into Gemini Omni for any-to-music workflows.
Previously: Suno v5.5
Rolling release expected late June. Best full-song coherence and lyric prosody.
Studio-grade mixing with stem-level output for producers.
Folded into Gemini Omni for any-to-music + cross-modal generation.
Suno v6 expected late June; v5.5 remains the deployed default.
Suno v6 for full-song generation; Udio v3 for mixed studio-grade stems; Lyria via Gemini Omni for cross-modal generation.
Anthropic sweeps LMArena Vision top-3 with Opus 4.7-thinking, 4.6-thinking, and 4.7. Opus 4.8 is too freshly released to appear on Arena ELO but is expected to consolidate the lead by end of June.
Previously: GPT-4o Vision
Tops LMArena Vision. Best OCR + chart + document understanding.
Strongest image-grounded reasoning chains; native computer-use vision pipeline.
Best video understanding and long-form temporal reasoning.
LMArena Vision top-3 all Anthropic (1309 / 1303 / 1298 ELO).
Anthropic for OCR + document Q&A + chart understanding; GPT-5.5 for image-grounded reasoning; Gemini 3.1 Pro for video understanding.
Kimi K2.6 (Moonshot) leads open weights at AA Intelligence Index 54 — within 7 points of frontier closed models. DeepSeek V4 Pro (MIT, 52) is the #2 open reasoning model, and Google's new Gemma 4 12B (Apache 2.0, 2026-06-03) packs native multimodal into a 16GB-laptop footprint. The closed-vs-open gap is the narrowest it has ever been.
Previously: Llama 4
Open-weights leader on AA Intelligence Index. Narrows the gap to closed-source frontier models to 7 points.
MIT-licensed open weights at AA Intelligence Index 52 — #3 of 89 overall and the #2 open reasoning model behind only Kimi K2.6.
Released 2026-06-03 under Apache 2.0. Encoder-free native multimodal (text/image/audio/video), 256K context, runs on a 16GB laptop — performance nearing last-gen 27B.
Best open-source for Chinese-language self-host deployment.
Kimi K2.6 (54) tops open weights; DeepSeek V4 Pro (52) #2; Gemma 4 12B brings 16GB-laptop multimodal.
Kimi K2.6 for general-purpose open deployment; DeepSeek V4 Pro for cheap frontier-grade reasoning; Gemma 4 12B for on-device multimodal; Qwen3.7 Plus for Chinese self-host.
The 2026 value war is led by China's open-source camp. DeepSeek V4 Flash delivers near-flagship intelligence (Index 47) at roughly a tenth of the price — a blended cost near $0.06 per 1M tokens. The leaderboard makes one warning explicit: 'Flash' and 'mini' branding does not mean cheap. Gemini 3.5 Flash scores a strong 55 but costs over 20× more per full Intelligence-Index run.
The intelligence-per-dollar king. AA Intelligence Index 47 at $0.14/$0.28 per 1M input/output tokens, about a tenth of comparable Flash flagships, with the lowest cache-hit price of any 2026 frontier model.
Best balance in the high-intelligence + low-price quadrant. Intelligence Index 52 (top-3 overall) at $0.435/$0.87 per 1M input/output tokens, a fraction of same-tier flagships like GPT-5.5 and Claude Opus.
Keeps the value race from being a single-vendor story. Intelligence Index 53 at $0.40 input — a higher score than V4 Pro; the trade-off is pricier output and slower generation.
New category. DeepSeek V4 Flash leads intelligence-per-dollar; open-source models sweep the value tier.
DeepSeek V4 Flash for highest-volume low-cost workloads; V4 Pro when you need stronger reasoning but still want to save; Qwen3.7 Plus to diversify away from a single vendor.
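As a rough illustration of the intelligence-per-dollar idea, the sketch below divides each quoted Index score by the quoted input price per 1M tokens. Several things here are assumptions: the metric itself (the leaderboard may weight output tokens, cache hits, or full Index-run cost differently), the reading of the first figure in each price pair as the input price, and the pairing of the $0.40 blurb with Qwen3.7 Plus, which is inferred from the score of 53 quoted elsewhere on the page. The names `ModelQuote`, `points_per_dollar`, and `rank_by_value` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelQuote:
    name: str
    index_score: int         # AA Intelligence Index score quoted on the page
    input_usd_per_m: float   # assumed input price, USD per 1M tokens


def points_per_dollar(m: ModelQuote) -> float:
    """Index points per input dollar (a simplified, assumed value metric)."""
    return m.index_score / m.input_usd_per_m


def rank_by_value(quotes: list[ModelQuote]) -> list[ModelQuote]:
    """Sort models from best to worst value under the metric above."""
    return sorted(quotes, key=points_per_dollar, reverse=True)


# Figures copied from the value-tier blurbs; output prices are ignored
# because the page does not give one for every model.
QUOTES = [
    ModelQuote("DeepSeek V4 Flash", 47, 0.14),
    ModelQuote("DeepSeek V4 Pro", 52, 0.435),
    ModelQuote("Qwen3.7 Plus", 53, 0.40),  # name-to-price pairing is inferred
]

if __name__ == "__main__":
    for m in rank_by_value(QUOTES):
        print(f"{m.name}: {points_per_dollar(m):.1f} index points per input dollar")
```

Because it looks only at input price, this ordering can differ from one based on blended or per-run cost; it is meant to show the shape of the calculation, not to reproduce the leaderboard's ranking.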
What changed across the AI model landscape this month — distilled from the data above.
Opus 4.8 takes AA Intelligence Index #1; Opus 4.7-thinking holds LMArena Vision, WebDev, and Document arenas. First time a single lab has held all four leaderboards simultaneously since GPT-4-era OpenAI.
OpenAI's April 24 GPT-5.5 ships with 1M-token context, native MCP, Skills, hosted shell, computer use, tool search, and web search — turning the API itself into an agent runtime.
May launch of Gemini Omni unifies image / audio / video generation; Antigravity 2.0 platform turns Gemini 3.5 into an agentic substrate. Google's bet: not the smartest single model, but the most integrated stack.
Kimi K2.6 (54) is within 7 points of Claude Opus 4.8 (61) on AA Intelligence Index. Meta's muse-spark cracks LMArena top-5 at 1489. Closed-source's moat is the smallest it has ever been.
Qwen3.7 Max (Alibaba, 57), MiniMax-M3 (55), Kimi K2.6 (Moonshot, 54), MiMo-V2.5-Pro (Xiaomi, 54), and Qwen3.7 Plus (Alibaba, 53) all sit in the AA Intelligence Index top 15 — Chinese labs are no longer 'catching up', they're inside the frontier.
OpenAI Realtime 2 (May 7) + Gemini 3.1 Flash TTS (April) + Fun-Realtime-TTS push real-time voice agents from research to production. Speech-to-speech with reasoning is now a checkbox feature.
DeepSeek V4 Flash delivers Intelligence Index 47 at a blended ~$0.06 per 1M tokens — open-weights models from DeepSeek and Qwen now sweep the intelligence-per-dollar top tier, while 'Flash'/'mini'-branded closed models like Gemini 3.5 Flash cost over 20× more per Index run.