VOL. 2026 · ISSUE 06 · Updated through 2026-06-04

Monthly LLM Leaderboard

June 2026

Nine categories. Twenty-nine leading models. Updated monthly. With AI-friendly citations.

9 categories · 29 models · 9 sources
01
Text Generation & Reasoning


May 28 — Anthropic ships Opus 4.8 and dethrones OpenAI on the Artificial Analysis Intelligence Index v4.0. First time since the index relaunched that Anthropic holds the top spot outright.

Previously: GPT-5.5

Current leader
Claude Opus 4.8
Anthropic

Released May 28. Adaptive reasoning + max-effort mode tops the AA Intelligence Index v4.0.

Score: 61
  • 01 Intelligence Index v4.0: 61
  • 02 Adaptive reasoning mode
  • 03 Coding + agentic upgrade over 4.7
  • 04 Long-running work consistency
  • 05 Released 2026-05-28
Runners-up
#2

GPT-5.5

OpenAI

Apr 24 release. 1M-token context, native MCP + Skills + computer use + hosted shell.

  • Intelligence Index v4.0 (xhigh): 60
  • 1M token context
  • Native MCP + Skills
  • Computer use built-in
  • Tool search + web search
  • Released 2026-04-24
Score: 60
#3

Gemini 3.1 Pro Preview

Google

Strong agentic frontier with Antigravity 2.0 platform integration.

  • Intelligence Index v4.0: 57
  • Agentic platform (Antigravity 2.0) integration
  • Tied with GPT-5.5 medium and Qwen3.7 Max
Score: 57
Change

Released May 28, 2026. Intelligence Index v4.0 score 61 (max effort), edging GPT-5.5 (xhigh, 60).

Market

On LMArena ELO, the more battle-tested Opus 4.6/4.7-thinking variants still outrank 4.8, simply because 4.8 has too few votes so far. Expect that to flip by the end of June.

02
Image Generation


OpenAI ships GPT Image 2 with token-based pricing and 50% Batch API discount. Recraft V4.1 holds the Artificial Analysis quality leaderboard, while Adobe Firefly enterprise mode remains the rights-cleared default.

Previously: GPT Image 2

Current leader
GPT Image 2
OpenAI

Released April 21. State-of-the-art quality with token pricing and Batch API support.

Score: n/a
  • 01 Token-based pricing
  • 02 Batch API at 50% discount
  • 03 Flexible image sizes
  • 04 High-fidelity inputs
  • 05 Released 2026-04-21
Runners-up
#2

Recraft V4.1

Recraft

Leads Artificial Analysis text-to-image arena on raw output quality.

  • Top of AA text-to-image quality
  • Strong control / style transfer
  • Designer-grade output
Score: n/a
#3

Adobe Firefly Image 4

Adobe

IP-cleared training data; the enterprise-safe choice for commercial use.

  • Trained on licensed assets
  • Indemnification for enterprise
  • Native Adobe Creative Cloud integration
Score: n/a
Change

Released April 21, 2026. Token-based pricing, Batch API at 50% off.

Market

Recraft V4.1 leads AA's text-to-image arena on raw quality; GPT Image 2 wins on ecosystem and pricing transparency.

03
Video Generation


Seedance 2.0 (ByteDance) tops the Artificial Analysis text-to-video Arena with audio at ELO 1215 — the first model to make synced audio-visual generation state of the art. Google's Veo 3.5 and OpenAI's Sora 2 lead the silent-cinematic tier; Kling 4 holds the value end.

Previously: Veo 3.5

Current leader
Seedance 2.0
ByteDance

Tops the Artificial Analysis text-to-video Arena (with audio) at ELO 1215, ahead of Veo and Sora. Native audio-visual generation: 15-second multi-shot clips with synced sound from text, image, audio and video inputs.

Score: 1215
  • 01 AA T2V Arena (w/ audio) #1 · ELO 1215
  • 02 15s multi-shot · synced audio
  • 03 Multimodal input (text/image/audio/video)
  • 04 Dual-Branch Diffusion Transformer
Runners-up
#2

Veo 3.5

Google

Production-grade film output with strongest temporal coherence.

  • 1080p output
  • Strong physics simulation
  • Long-shot temporal coherence
  • Native Gemini API integration
Score: n/a
#3

Sora 2

OpenAI

Best narrative continuity across one-minute shots.

  • Up to 60s shots
  • Strong character continuity
  • Cinematic camera language
  • Multi-shot scenes
Score: n/a
#4

Kling 4

Kuaishou 快手

Dominant in APAC short-form ad creative; fastest iteration cycle in the space.

  • 9:16 vertical native
  • Fastest editorial iteration
  • TikTok / Douyin native style
  • Low-latency generation
Score: n/a
Change

Seedance 2.0 takes #1 on the with-audio text-to-video Arena, ahead of Veo and Sora.

Market

Seedance 2.0 for synced audio-visual and multi-shot narrative; Veo 3.5 for cinematic fidelity; Sora 2 for Western-ecosystem integration; Kling 4 for cost.

04
Code Generation & Agentic Coding


Claude Opus 4.7-thinking holds LMArena's WebDev top spot ahead of 4.7 and 4.6-thinking — a full Anthropic sweep. GPT-5.3 Codex remains best-in-class for terminal-bound code agents.

Previously: GPT-5.5 (Agentic)

Current leader
Claude Opus 4.7-thinking
Anthropic

Tops LMArena WebDev. Best-in-class for multi-file refactoring and code review.

Score: 1566
  • 01 LMArena WebDev ELO: 1566
  • 02 Multi-file refactor SOTA
  • 03 Code review top
  • 04 Vision-aware coding
Runners-up
#2

GPT-5.3 Codex (xhigh)

OpenAI

Specialized terminal-code agent. AA Intelligence Index 54.

  • AA Intelligence Index: 54
  • Sandboxed shell built-in
  • Strong agentic loops
  • Terminal-Bench best
Score: 54
#3

Cursor Composer 2.5

Cursor

Ranks #3 on AA Coding Agent Index. IDE-native pair programming with multi-file context.

  • AA Coding Agent Index: #3
  • IDE-native context
  • Multi-file edits
  • Inline diff workflow
Score: n/a
Change

LMArena WebDev top-3 all Anthropic (1566 / 1558 / 1542 ELO).

Market

Anthropic for refactor + review + multi-file editing; GPT-5.3 Codex for sandboxed terminal agents; Cursor Composer 2.5 for IDE-native pair programming.
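
To put the WebDev gaps above (1566 / 1558 / 1542) in perspective, the sketch below converts rating differences into expected head-to-head scores. It assumes the conventional Elo logistic curve with a 400-point scale; the source does not describe how LMArena actually fits its ratings, so the function name `elo_expected_score` and the formula choice are illustrative assumptions, not the leaderboard's own method.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo formula.

    Assumption: ratings behave like classic Elo with a 400-point scale.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# WebDev top-3 ratings as quoted in this section, listed by rank.
ratings = [1566, 1558, 1542]
leader = ratings[0]

for rank, rating in enumerate(ratings[1:], start=2):
    p = elo_expected_score(leader, rating)
    print(f"#1 vs #{rank}: gap {leader - rating} points, expected score {p:.3f}")
```

Under that assumption, an 8-point lead corresponds to an expected score of roughly 51% and a 24-point lead to roughly 53%, so the top three are close to a coin flip against each other in pairwise votes.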

05
Voice / Speech


OpenAI's Realtime 2 (May 7) brings configurable-reasoning speech-to-speech to general availability; AA's text-to-speech crown goes to Fun-Realtime-TTS, and MAI-Transcribe-1.5 wins speech-to-text (STT) on the accuracy-speed trade-off.

Previously: ElevenLabs v3

Current leader
Realtime 2
OpenAI

GA on May 7. Configurable-reasoning speech-to-speech with realtime translate + Whisper variants.

Score: n/a
  • 01 Configurable reasoning
  • 02 Speech-to-speech agents
  • 03 Streaming translate variant
  • 04 Streaming STT variant
  • 05 Released 2026-05-07
Runners-up
#2

ElevenLabs v3

ElevenLabs

Industry default for character voice cloning and audiobook production.

  • Character voice cloning SOTA
  • 100+ languages
  • Long-form audiobook quality
  • Emotion control
Score: n/a
#3

Fun-Realtime-TTS

Fun (Alibaba DAMO)

Tops AA text-to-speech leaderboard on quality metrics.

  • AA TTS leaderboard #1
  • Sub-200ms latency
  • Multi-speaker streaming
  • Strong CJK
Score: n/a
Change

Realtime 2 family shipped May 7 (gpt-realtime-2 / -translate / -whisper).

Market

Realtime 2 for agentic voice; ElevenLabs v3 for character voice cloning; Fun-Realtime-TTS for raw TTS quality; MAI-Transcribe-1.5 for transcription.

06
Music Generation


Suno v6 widens the gap on full-song coherence and lyric prosody; Udio v3 keeps pushing studio-grade mixing; Lyria (Google) integrates into Gemini Omni for any-to-music workflows.

Previously: Suno v5.5

Current leader
Suno v6
Suno

Rolling release expected late June. Best full-song coherence and lyric prosody.

Score: n/a
  • 01 Full-song coherence SOTA
  • 02 Lyric prosody best
  • 03 Multilingual vocal
  • 04 Style transfer
Runners-up
#2

Udio v3

Udio

Studio-grade mixing with stem-level output for producers.

  • Stem-level export
  • Studio-grade mixing
  • Strong electronic genres
  • DAW-friendly workflow
Score: n/a
#3

Lyria (via Gemini Omni)

Google

Folded into Gemini Omni for any-to-music + cross-modal generation.

  • Gemini Omni native
  • Cross-modal generation
  • Image / video → music workflows
Score: n/a
Change

Suno v6 expected late June; v5.5 remains the deployed default.

Market

Suno v6 for full-song generation; Udio v3 for studio-grade mixing and stem-level export; Lyria via Gemini Omni for cross-modal generation.

07
Vision / Multimodal Understanding


Anthropic sweeps LMArena Vision top-3 with Opus 4.7-thinking, 4.6-thinking, and 4.7. Opus 4.8 is too freshly released to appear on Arena ELO but is expected to consolidate the lead by end of June.

Previously: GPT-4o Vision

Current leader
Claude Opus 4.7-thinking
Anthropic

Tops LMArena Vision. Best OCR + chart + document understanding.

Score: 1309
  • 01 LMArena Vision ELO: 1309
  • 02 OCR SOTA
  • 03 Chart understanding
  • 04 Document Q&A
Runners-up
#2

GPT-5.5

OpenAI

Strongest image-grounded reasoning chains; native computer-use vision pipeline.

  • Image-grounded reasoning best
  • Computer use vision
  • 1M token multimodal
  • Released 2026-04-24
Score: n/a
#3

Gemini 3.1 Pro

Google

Best video understanding and long-form temporal reasoning.

  • Video understanding SOTA
  • Long-form temporal
  • Robotics-ER 1.6 integration
  • Multimodal context 2M+
Score: n/a
Change

LMArena Vision top-3 all Anthropic (1309 / 1303 / 1298 ELO).

Market

Anthropic for OCR + document Q&A + chart understanding; GPT-5.5 for image-grounded reasoning; Gemini 3.1 Pro for video understanding.

08
Open-Source / Open-Weights


Kimi K2.6 (Moonshot) leads open weights at AA Intelligence Index 54 — within 7 points of frontier closed models. DeepSeek V4 Pro (MIT, 52) is the #2 open reasoning model, and Google's new Gemma 4 12B (Apache 2.0, 2026-06-03) packs native multimodal into a 16GB-laptop footprint. The closed-vs-open gap is the narrowest it has ever been.

Previously: Llama 4

Current leader
Kimi K2.6
Moonshot AI

Open-weights leader on AA Intelligence Index. Narrows the gap to closed-source models to 7 points.

Score: 54
  • 01 AA Intelligence Index: 54
  • 02 Open weights
  • 03 Strong Chinese + English
  • 04 Long-context retention
Runners-up
#2

DeepSeek V4 Pro

DeepSeek

MIT-licensed open weights at AA Intelligence Index 52 — #3 of 89 overall and the #2 open reasoning model behind only Kimi K2.6.

  • AA Intelligence Index: 52 (#3/89)
  • MIT license · open weights
  • MoE 1.6T total / 49B active
  • 1M-token context
Score: 52
#3

Gemma 4 12B

Google

Released 2026-06-03 under Apache 2.0. Encoder-free native multimodal (text/image/audio/video), 256K context, runs on a 16GB laptop — performance nearing last-gen 27B.

  • Apache 2.0 · open weights
  • 256K context · native multimodal
  • Runs on 16GB VRAM
  • MMLU-Pro 77.2 · GPQA-Diamond 78.8
#4

Qwen3.7 Plus

Alibaba

Best open-source for Chinese-language self-host deployment.

  • AA Intelligence Index: 53
  • Best Chinese open-source
  • Strong tool use
  • Open weights
Score: 53
Change

Kimi K2.6 (54) tops open weights; DeepSeek V4 Pro (52) #2; Gemma 4 12B brings 16GB-laptop multimodal.

Market

Kimi K2.6 for general-purpose open deployment; DeepSeek V4 Pro for cheap frontier-grade reasoning; Gemma 4 12B for on-device multimodal; Qwen3.7 Plus for Chinese self-host.

09
Intelligence per Dollar

Cost-Effectiveness / Value

The 2026 value war is led by China's open-source camp. DeepSeek V4 Flash delivers near-flagship intelligence (Index 47) at roughly a tenth of the price — a blended cost near $0.06 per 1M tokens. The leaderboard makes one warning explicit: 'Flash' and 'mini' branding does not mean cheap. Gemini 3.5 Flash scores a strong 55 but costs over 20× more per full Intelligence-Index run.

Current leader
DeepSeek V4 Flash
DeepSeek

The intelligence-per-dollar king. AA Intelligence Index 47 at $0.14/$0.28 per 1M tokens, about a tenth of comparable Flash flagships, with the lowest cache-hit price of any 2026 frontier model.

Score: 47
  • 01 AA Intelligence Index: 47
  • 02 $0.14 in / $0.28 out per 1M
  • 03 Blended ≈ $0.06 / 1M
  • 04 MIT open weights · 1M context
Runners-up
#2

DeepSeek V4 Pro

DeepSeek

Best balance in the high-intelligence + low-price quadrant. Intelligence Index 52 (top-3 overall) at $0.435/$0.87 — a fraction of same-tier flagships like GPT-5.5 and Claude Opus.

  • AA Intelligence Index: 52 (#3/89)
  • $0.435 in / $0.87 out per 1M
  • Top-3 intelligence overall
  • MIT open weights
Score: 52
#3

Qwen3.7 Plus

Alibaba

Keeps the value race from being a single-vendor story. Intelligence Index 53 at $0.40 input — a higher score than V4 Pro; the trade-off is pricier output and slower generation.

  • AA Intelligence Index: 53
  • $0.40 in / $1.16 out per 1M
  • Highest score in the value tier
  • Strong Chinese + tool use
Score: 53
Change

New category. DeepSeek V4 Flash leads intelligence-per-dollar; open-source models sweep the value tier.

Market

DeepSeek V4 Flash for highest-volume low-cost workloads; V4 Pro when you need stronger reasoning but still want to save; Qwen3.7 Plus to diversify away from a single vendor.
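
The value ranking above can be reproduced from the quoted list prices with a small sketch. It assumes a 3:1 input-to-output token mix (`input_share=0.75`), which is a hypothetical weighting not given in the source, and it ignores cache-hit discounts. Under those assumptions the blended figure for V4 Flash comes out higher than the ≈ $0.06 quoted above, which presumably reflects a different weighting or cache-hit pricing. The class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PricedModel:
    name: str
    index: int      # AA Intelligence Index, as quoted in this section
    usd_in: float   # list price per 1M input tokens
    usd_out: float  # list price per 1M output tokens


def blended_price(m: PricedModel, input_share: float = 0.75) -> float:
    """Weighted price per 1M tokens; the 0.75 default (3:1 mix) is an assumption."""
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    return input_share * m.usd_in + (1.0 - input_share) * m.usd_out


def index_per_dollar(m: PricedModel, input_share: float = 0.75) -> float:
    """Intelligence Index points bought per blended dollar (per 1M tokens)."""
    return m.index / blended_price(m, input_share)


MODELS = [
    PricedModel("DeepSeek V4 Flash", 47, 0.14, 0.28),
    PricedModel("DeepSeek V4 Pro", 52, 0.435, 0.87),
    PricedModel("Qwen3.7 Plus", 53, 0.40, 1.16),
]

# Rank by index points per blended dollar, best value first.
ranked = sorted(MODELS, key=index_per_dollar, reverse=True)
for m in ranked:
    print(f"{m.name}: blended ${blended_price(m):.3f}/1M, "
          f"{index_per_dollar(m):.0f} index points per dollar")
```

With this weighting the order matches the section's ranking: V4 Flash first, then V4 Pro, then Qwen3.7 Plus. Changing `input_share` shifts the absolute numbers, so the ratio should be set to match the actual workload before drawing conclusions.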

Editorial · 07 observations

What Changed This Month

What changed across the AI model landscape this month — distilled from the data above.

01

Anthropic Sweeps Reasoning + Vision + Code

Opus 4.8 takes AA Intelligence Index #1; Opus 4.7-thinking holds LMArena Vision, WebDev, and Document arenas. First time a single lab has held all four leaderboards simultaneously since GPT-4-era OpenAI.

02

GPT-5.5 Brings 1M Context + Native MCP/Skills

OpenAI's April 24 GPT-5.5 ships with 1M-token context, native MCP, Skills, hosted shell, computer use, tool search, and web search — turning the API itself into an agent runtime.

03

Google's Gemini Omni — Any-to-Any Generation

May launch of Gemini Omni unifies image / audio / video generation; Antigravity 2.0 platform turns Gemini 3.5 into an agentic substrate. Google's bet: not the smartest single model, but the most integrated stack.

04

Open-Source Closes to a 7-Point Gap

Kimi K2.6 (54) is within 7 points of Claude Opus 4.8 (61) on AA Intelligence Index. Meta's muse-spark cracks LMArena top-5 at 1489. Closed-source's moat is the smallest it has ever been.

05

Five Chinese Models in AA Top 15

Qwen3.7 Max (Alibaba, 57), MiniMax-M3 (55), Kimi K2.6 (Moonshot, 54), MiMo-V2.5-Pro (Xiaomi, 54), and Qwen3.7 Plus (Alibaba, 53) all sit in the AA Intelligence Index top 15 — Chinese labs are no longer 'catching up', they're inside the frontier.

06

Sub-200ms Voice Agents Are Now Commodity

OpenAI Realtime 2 (May 7) + Gemini 3.1 Flash TTS (April) + Fun-Realtime-TTS push real-time voice agents from research to production. Speech-to-speech with reasoning is now a checkbox feature.

07

The Value War Is a Chinese Open-Source Story

DeepSeek V4 Flash delivers Intelligence Index 47 at a blended ~$0.06 per 1M tokens — open-weights models from DeepSeek and Qwen now sweep the intelligence-per-dollar top tier, while 'Flash'/'mini'-branded closed models like Gemini 3.5 Flash cost over 20× more per Index run.

Sources
  [01]
  [02] LMArena Leaderboard (community leaderboard)
  [03]
  [04] OpenAI Changelog (official changelog)
  [05] Anthropic News (official changelog)
  [06] Google DeepMind Blog (official changelog)
  [07] DeepSeek API Pricing (official changelog)
  [08] Google Gemma 4 Launch (official changelog)
  [09]