VOL. 2026 · ISSUE 04 · Updated as of 2026-04-29 · pulseagent.io / leaderboards

LLM Monthly Leaderboard

Name: LLM Monthly Leaderboard — 2026-04
Creator: PulseAgent
Published: 2026-04-29T10:00:00Z
License: https://creativecommons.org/licenses/by/4.0/

April 2026

Eight categories. Twenty-four leading models. Updated monthly. AI-friendly citations included.

Claude Opus 4.7

Anthropic

Released April 16, strongest at long-context and code review.

SWE-Bench Pro: 64.3%
MCP-Atlas: 79.1%
Most reliable multi-step reasoning
Most thorough code-logic review
1M-token context

№3

Gemini 3.1 Pro

Google

In preview, strongest at math and algorithmic competition.

LiveCodeBench Elo: 2887
1M-token context
Lowest API price ($2/$12)
Leading video understanding
Best price-to-performance

~85

Change

GPT-5.5 released April 23, 2026. SOTA on 14 benchmarks, composite score 89.

Market

GPT-5.5 leads in agentic and terminal coding; Claude Opus 4.7 leads in multi-file refactoring and code review; Gemini 3.1 Pro leads in algorithmic competition.

Tags: 1M-token context, Agentic workflows, Multimodal understanding

Text-to-Image

GPT Image-2 takes the throne with 99.2% text-rendering accuracy, while Nano Banana 2 keeps an edge in real-time generation.

Previously: Nano Banana 2

Current leader

GPT Image-2

OpenAI

Highest text-rendering accuracy.

Score

99.2%

01 Text-rendering accuracy 99.2%
02 Chinese / Arabic support
03 Spatial logic & anatomical correctness
04 Character consistency
05 Thinking-mode reasoning engine

Runners-up

№2

Nano Banana 2

Google

Ultra-fast 4K generation with live web search.

Flash architecture, ultra-fast generation
4K image in 4-15s
Live web-search integration
Fastest on the market
Deep Gemini-ecosystem integration

4-15s

№3

Flux Pro

Black Forest Labs

Strongest open-source ecosystem.

Open-source, commercial use
Rich community ecosystem
Style diversity
Local deployment

Change

Released April 2026, with major leads in text rendering and spatial reasoning.

Market

GPT Image-2 wins on typography and physical correctness; Nano Banana 2 wins on speed and live web grounding — they complement rather than replace each other.

Tags: 4K generation, Multilingual text, Character consistency, Real-time generation

Text-to-Video

Sora 2 has exited; Google Veo 3.1 now leads in overall capability, while Seedance 2.0 and Kling 3.0 lead in specific niches.

Previously: Sora 2

Current leader

Veo 3.1

Google

Native audio + multi-shot, strongest overall.

01 Native audio generation
02 Multi-shot narrative
03 Excellent physics simulation
04 YouTube-ecosystem integration

Runners-up

№2

Seedance 2.0

ByteDance

Strongest multi-shot storyboarding.

Multi-shot storyboarding
Professional cinematic language
Leading domestic Chinese model
Douyin/TikTok ecosystem integration

№3

Kling 3.0 Omni

Kuaishou

Cinematic-grade visuals + most accurate lip-sync.

Cinematic-grade visuals
Most accurate lip-sync
Kuaishou ecosystem integration
Optimized for Chinese scenarios

Change

Sora 2 deprecated. Veo 3.1 takes the overall lead.

Market

Veo 3.1 best overall; Seedance for multi-shot storyboarding; Kling for cinematic visuals and lip-sync; Pika for social creators.

Tags: Native audio, Multi-shot narrative, Cinematic visuals, Lip-sync

Code Generation

GPT-5.5 retakes the lead in terminal-agent coding; Claude Opus 4.7 still owns multi-file refactoring and tool orchestration.

Previously: Claude Opus 4.6

Current leader

GPT-5.5

OpenAI

Terminal-Bench 2.0 #1, strongest agentic coding.

Score

82.7%

01 Terminal-Bench 2.0: 82.7%
02 Expert-SWE: 73.1%
03 Autonomous coding judgment
04 Fewer tokens for the same task

Runners-up

№2

Claude Opus 4.7

Anthropic

SWE-Bench Pro #1, strongest multi-file refactoring.

SWE-Bench Pro: 64.3%
MCP-Atlas: 79.1%
Multi-file logic review
Code-vulnerability detection

64.3%

№3

Gemini 3.1 Pro

Google

LiveCodeBench #1, strongest in algorithmic competition.

LiveCodeBench Elo: 2887
1M-context whole-repo analysis
Lowest price
Best for algorithmic competition

2887 Elo

Change

GPT-5.5 released April 23, leading Terminal-Bench 2.0 by 13 percentage points.
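
A lead stated in percentage points is an absolute gap between two scores, not a relative improvement. The sketch below works through that arithmetic from the two figures in this issue (82.7% and the 13-point lead). The runner-up score it prints is derived from those two numbers, not a figure reported here, and the variable names are illustrative.

```python
# Figures stated in this issue for GPT-5.5 on Terminal-Bench 2.0.
leader_score = 82.7   # percent
lead_points = 13.0    # percentage points ahead of the next model

# Implied runner-up score (derived by subtraction, not reported in this issue).
implied_runner_up = round(leader_score - lead_points, 1)

# The same gap expressed relative to the implied runner-up score.
relative_lead_pct = round(lead_points / implied_runner_up * 100, 1)

print(f"Implied runner-up score: {implied_runner_up}%")
print(f"Relative lead over runner-up: {relative_lead_pct}%")
```

The two results differ because the first is a subtraction of scores and the second is a ratio, which is why leaderboard gaps are best quoted in percentage points.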

Market

Use GPT-5.5 for terminal agentic coding, Claude Opus 4.7 for multi-file refactoring and review, Gemini for whole-repo analysis.

Tags: Agentic coding, Multi-file refactoring, Tool orchestration, Algorithmic competition

Text-to-Speech

ElevenLabs remains the industry benchmark for voice realism and cloning; Hume AI leads in emotional voice.

Previously: ElevenLabs v2

Current leader

ElevenLabs v3

ElevenLabs

Industry-benchmark voice realism.

Score

9.2/10

01 Realism score 9.2/10
02 75ms ultra-low latency
03 29+ languages
04 Professional Clone quality
05 Enterprise-grade API

Runners-up

№2

Hume AI Octave

Hume AI

Top of the emotional-voice leaderboard.

Emotion recognition 9.3/10
Emotional response capability
Empathetic interaction
Precise affect awareness

9.3/10

№3

GPT-4o Voice

OpenAI

Best real-time conversational experience.

Low-latency real-time conversation
Natural voice output
Multilingual real-time translation
Deep ChatGPT integration

Change

Continues to lead. v3 ships at 75ms ultra-low latency.

Market

ElevenLabs v3 for professional voiceover and cloning; Hume Octave for emotional interaction; GPT-4o Voice for real-time conversation.

Tags: Ultra-low latency, Emotional voice, Voice cloning, Multilingual

AI Music Generation

Suno v5.5 remains the most-used platform; tools differentiate on speed, post-production, and enterprise deployment.

Previously: Suno v5

Current leader

Suno v5.5

Suno

Most widely used AI music platform.

01 Largest user base
02 Studio multi-track editing
03 MIDI export
04 Fastest to a finished song

Runners-up

№2

Udio v1.5

Udio

Strongest post-production and stem control.

Stem download
Mix control
Key adjustment
Professional post-production

№3

Lyria 3 Pro

Google DeepMind

Best for enterprise / API deployment.

Vertex AI delivery
Structured generation
Clear copyright posture
Enterprise-grade deployment

Change

Continuous iteration. v5.5 Studio adds multi-track editing and MIDI export.

Market

Suno is fastest to a finished song, Udio is strongest in editing, Lyria is safest for enterprise deployment, ElevenMusic / StableAudio are clearest on commercial rights.

Tags: Multi-track editing, MIDI export, Stem control, Copyright safety

Vision Understanding

GPT-4o Vision keeps the strongest general-purpose lead; Gemini Vision leads on video understanding and long-document parsing.

Current leader

GPT-4o Vision

OpenAI

Strongest general-purpose vision understanding.

01 UI parsing
02 Chart understanding
03 Live visual conversation
04 Multimodal fusion

Runners-up

№2

Gemini Vision

Google

Leader for video and long-document understanding.

1M-token long documents
Leading video understanding
Multi-frame analysis
Search integration

№3

Qwen-VL

Alibaba

Top open-source Chinese-scenario vision model.

Optimized for Chinese scenarios
Open-source, commercial use
Multimodal reasoning
Local deployment

Change

Continues to lead — strongest on UI parsing and live visual conversation.

Market

GPT-4o Vision for general-purpose; Gemini Vision for long video / documents; Qwen-VL for the strongest open-source Chinese option.

Tags: Live vision, Long-document parsing, UI parsing, Multilingual

Open Source

Open-source models are closing the gap with closed-source on several benchmarks. Llama 4, DeepSeek V4, and Qwen3 form the leading tier.

Previously: Llama 3

Current leader

Llama 4

DeepSeek V4

DeepSeek

Strongest open-source reasoning, upgraded architecture.

Superior math and reasoning
Best-in-class coding ability
Efficient MoE architecture
Extremely low API price

№3

Qwen3

Alibaba

Top open-source Chinese model.

Strongest Chinese understanding
Multimodal support
Agent capability
Full size coverage

Change

Released in 2026, with major multimodal improvements.

Market

Llama 4 has the largest ecosystem; DeepSeek is strongest at reasoning and the cheapest; Qwen3 leads in Chinese and agent capability.

Tags: Multimodal, Commercial use, Local deployment, Low cost

Editorial · 06 observations

What changed this month

What changed across the AI model landscape this month — distilled from the data above.

From single dominance to specialist competition

In 2026 AI has shifted from one general-purpose model to a 'pick the model for the task' paradigm. Every niche has its specialist; multi-model routing is now the enterprise standard architecture.
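
As a rough illustration of what "pick the model for the task" routing might look like, here is a minimal sketch that maps coding task types to the models this issue recommends in its Code Generation section. The task-type keys, the `pick_model` helper, and the fallback choice are all hypothetical; a production router would also weigh cost, latency, and availability.

```python
# Hypothetical routing table based on this issue's Code Generation notes.
ROUTES = {
    "terminal_agent": "GPT-5.5",
    "multi_file_refactor": "Claude Opus 4.7",
    "code_review": "Claude Opus 4.7",
    "algorithmic_competition": "Gemini 3.1 Pro",
    "whole_repo_analysis": "Gemini 3.1 Pro",
}

# Assumption: fall back to the category leader when the task type is unknown.
DEFAULT_MODEL = "GPT-5.5"


def pick_model(task_type: str) -> str:
    """Return the model name for a task type, or the default if unmapped."""
    return ROUTES.get(task_type, DEFAULT_MODEL)


if __name__ == "__main__":
    for task in ("code_review", "terminal_agent", "summarize_email"):
        print(task, "->", pick_model(task))
```

A static lookup table keeps the routing decision easy to audit, and it can be regenerated each month as the rankings change.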

GPT-5.5 and Claude Opus 4.7 — the dual frontier

Released April 23 and April 16, 2026 respectively, the two now define the cutting edge. GPT-5.5 wins on agentic coding and terminal use; Claude wins on code review and refactoring.

1M context becomes the new standard

From 128K to 1M tokens — Gemini 3.1 Pro, Claude Opus 4.7, and GPT-5.5 all now support 1M+ context, making whole-repository analysis possible.

Open source catches up fast

Llama 4, DeepSeek V4, and Qwen3 now match closed-source on several benchmarks at 1/10 of the price or less.

Domestic Chinese models break through globally

Seedance 2.0 (video), Qwen3 (open source), Kling 3.0 (video), and Qwen-VL (vision) have all entered the global top three in their respective domains.

API prices keep falling

LLM API prices have dropped roughly 80% in 2025-2026. Gemini 2.0 Flash at $0.10 / 1M tokens has dramatically lowered the barrier to AI applications.
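
To make the pricing figures concrete, the sketch below estimates spend at a flat per-million-token rate and shows what an 80% drop does to a starting price. The monthly token volume and the $10.00 starting price are hypothetical inputs chosen for illustration; only the $0.10 rate and the 80% figure come from this issue. Real billing usually separates input and output tokens, which this simplification ignores.

```python
def cost_usd(tokens: int, usd_per_million: float) -> float:
    """Cost of processing `tokens` at a flat per-million-token price."""
    return tokens / 1_000_000 * usd_per_million


def price_after_drop(old_price: float, drop_fraction: float) -> float:
    """Price remaining after a fractional drop (0.80 means an 80% drop)."""
    return old_price * (1.0 - drop_fraction)


# Hypothetical workload: 50 million tokens per month.
monthly_tokens = 50_000_000

# $0.10 per 1M tokens is the Gemini 2.0 Flash figure quoted in this issue.
flash_cost = cost_usd(monthly_tokens, 0.10)

# Hypothetical starting price of $10.00 per 1M tokens, reduced by 80%.
reduced_price = price_after_drop(10.00, 0.80)

print(f"Monthly cost at $0.10 / 1M tokens: ${flash_cost:.2f}")
print(f"$10.00 / 1M tokens after an 80% drop: ${reduced_price:.2f}")
```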

Sources

[01] Artificial Analysis (benchmark), 2026-04-29
[02] LMArena Leaderboard (community leaderboard), 2026-04-29
[03] Hugging Face Open LLM Leaderboard (community leaderboard), 2026-04-29
[04] OpenAI Changelog (official changelog), 2026-04-29
[05] Anthropic News (official changelog), 2026-04-29
[06] Google DeepMind Blog (official changelog), 2026-04-29