VOL. 2026 · ISSUE 04 · Updated 2026-04-24 · pulseagent.io / leaderboards

April 2026

Name: LLM Monthly Leaderboard — 2026-04
Creator: PulseAgent
Published: 2026-04-24T10:00:00Z
License: https://creativecommons.org/licenses/by/4.0/

Monthly LLM Leaderboard. Eight categories. Twenty-four leading models. Updated monthly. With AI-friendly citations.

Text Generation and Reasoning

2026 enters the era of the three titans: with no single dominant model, the best choice depends on the task at hand.

Previously: GPT-5.4

Current leader

GPT-5.5

OpenAI

Released April 23, the first fully retrained foundation model since GPT-5.

Score

01 Terminal-Bench 2.0: 82.7%
02 OSWorld-Verified: 78.7%
03 GDPval: 84.9%
04 ARC-AGI-2: 85.0%
05 1M-token context

Runners-up

№2

Claude Opus 4.7

Anthropic

Released April 16, strongest at long-context and code review.

SWE-Bench Pro: 64.3%
MCP-Atlas: 79.1%
Most reliable multi-step reasoning
Most thorough code-logic review
1M-token context

№3

Gemini 3.1 Pro

Google

In preview, strongest at math and algorithmic competition.

LiveCodeBench Elo: 2887
1M-token context
Lowest API price ($2/$12)
Leading video understanding
Best price-to-performance

~85

Tags: 1M-token context, Agentic workflows, Multimodal understanding

Text-to-Image

GPT Image-2 takes the throne with 99.2% text-rendering accuracy, while Nano Banana 2 keeps its edge in real-time generation.

Previously: Nano Banana 2

Current leader

GPT Image-2

OpenAI

Highest text-rendering accuracy.

Score

99.2%

01 Text-rendering accuracy 99.2%
02 Chinese / Arabic support
03 Spatial logic & anatomical correctness
04 Character consistency
05 Thinking-mode reasoning engine

Runners-up

№2

Nano Banana 2

Google

Ultra-fast 4K generation with live web search.

Flash architecture, ultra-fast generation
4K image in 4-15s
Live web-search integration
Fastest on the market
Deep Gemini-ecosystem integration

4-15s

№3

Flux Pro

Black Forest Labs

Strongest open-source ecosystem.

Open-source, commercial use
Rich community ecosystem
Style diversity
Local deployment

Tags: 4K generation, Multilingual text, Character consistency, Real-time generation

Text-to-Video

Sora 2 has left the stage; Google Veo 3.1 now leads in overall capability, while Seedance 2.0 and Kling 3.0 lead in specific niches.

Previously: Sora 2

Current leader

Veo 3.1

Google

Native audio + multi-shot, strongest overall.

01 Native audio generation
02 Multi-shot narrative
03 Excellent physics simulation
04 YouTube-ecosystem integration

Runners-up

№2

Seedance 2.0

ByteDance

Strongest multi-shot storyboarding.

Multi-shot storyboarding
Professional cinematic language
Leading domestic Chinese model
Douyin/TikTok ecosystem integration

№3

Kling 3.0 Omni

Kuaishou

Cinematic-grade visuals + most accurate lip-sync.

Cinematic-grade visuals
Most accurate lip-sync
Kuaishou ecosystem integration
Optimized for Chinese scenarios

Tags: Native audio, Multi-shot narrative, Cinematic visuals, Lip-sync

Code Generation

GPT-5.5 retakes the lead in agentic terminal coding; Claude Opus 4.7 still dominates multi-file refactoring and tool orchestration.

Previously: Claude Opus 4.6

Current leader

GPT-5.5

OpenAI

Terminal-Bench 2.0 #1, strongest agentic coding.

Score

82.7%

01 Terminal-Bench 2.0: 82.7%
02 Expert-SWE: 73.1%
03 Autonomous coding judgment
04 Fewer tokens for the same task

Runners-up

№2

Claude Opus 4.7

Anthropic

SWE-Bench Pro #1, strongest multi-file refactoring.

SWE-Bench Pro: 64.3%
MCP-Atlas: 79.1%
Multi-file logic review
Code-vulnerability detection

64.3%

№3

Gemini 3.1 Pro

Google

LiveCodeBench #1, strongest in algorithmic competition.

LiveCodeBench Elo: 2887
1M-context whole-repo analysis
Lowest price
Best for algorithmic competition

2887 Elo

Tags: Agentic coding, Multi-file refactoring, Tool orchestration, Algorithmic competition

Text-to-Speech

ElevenLabs remains the industry benchmark for voice realism and cloning; Hume AI leads in emotional voice.

Previously: ElevenLabs v2

Current leader

ElevenLabs v3

ElevenLabs

Industry-benchmark voice realism.

Score

9.2/10

01 Realism score 9.2/10
02 75ms ultra-low latency
03 29+ languages
04 Professional Clone quality
05 Enterprise-grade API

Runners-up

№2

Hume AI Octave

Hume AI

Top of the emotional-voice leaderboard.

Emotion recognition 9.3/10
Emotional response capability
Empathetic interaction
Precise affect awareness

9.3/10

№3

GPT-4o Voice

OpenAI

Best real-time conversational experience.

Low-latency real-time conversation
Natural voice output
Multilingual real-time translation
Deep ChatGPT integration

Tags: Ultra-low latency, Emotional voice, Voice cloning, Multilingual

AI Music Generation

Suno v5.5 remains the most widely used platform; tools differentiate on speed, post-production, and enterprise deployment.

Previously: Suno v5

Current leader

Suno v5.5

Suno

Most widely used AI music platform.

01 Largest user base
02 Studio multi-track editing
03 MIDI export
04 Fastest to a finished song

Runners-up

№2

Udio v1.5

Udio

Strongest post-production and stem control.

Stem download
Mix control
Key adjustment
Professional post-production

№3

Lyria 3 Pro

Google DeepMind

Best for enterprise / API deployment.

Vertex AI delivery
Structured generation
Clear copyright posture
Enterprise-grade deployment

Tags: Multi-track editing, MIDI export, Stem control, Copyright safety

Vision Understanding

GPT-4o Vision keeps the lead in general-purpose use; Gemini Vision leads in video understanding and long-document analysis.

Current leader

GPT-4o Vision

OpenAI

Strongest general-purpose vision understanding.

01 UI parsing
02 Chart understanding
03 Live visual conversation
04 Multimodal fusion

Runners-up

№2

Gemini Vision

Google

Leader for video and long-document understanding.

1M-token long documents
Leading video understanding
Multi-frame analysis
Search integration

№3

Qwen-VL

Alibaba

Top open-source Chinese-scenario vision model.

Optimized for Chinese scenarios
Open-source, commercial use
Multimodal reasoning
Local deployment

Tags: Live vision, Long-document parsing, UI parsing, Multilingual

Open Source

Open-source models are catching up with closed-source ones on several benchmarks. Llama 4, DeepSeek V3.2, and Qwen3 form the top tier.

Previously: Llama 3

Current leader

Llama 4

Runners-up

№2

DeepSeek V3.2

DeepSeek

Strongest open-source reasoning model.

Excellent math reasoning
Strong coding ability
Efficient MoE architecture
Extremely low API price

№3

Qwen3

Alibaba

Top open-source Chinese model.

Strongest Chinese understanding
Multimodal support
Agent capability
Full size coverage

Tags: Multimodal, Commercial use, Local deployment, Low cost

Editorial · 06 observations

What changed this month

What changed across the AI model landscape this month — distilled from the data above.

From single-model dominance to a competition of specialists

In 2026, AI shifted from a single general-purpose model to a "pick the model for the task" paradigm. Every niche has its specialist; multi-model routing is now the default enterprise architecture.
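
A minimal sketch of what task-based routing can look like, assuming a plain lookup table. The task labels, the `route` function, and the fallback choice are hypothetical and only for illustration; the model names are the leaders and runners-up listed in this issue.

```python
# Hypothetical routing table. Task labels are illustrative; model names
# are taken from the rankings in this issue.
ROUTES = {
    "agentic_coding": "GPT-5.5",
    "code_review": "Claude Opus 4.7",
    "algorithmic_competition": "Gemini 3.1 Pro",
    "text_to_image": "GPT Image-2",
    "text_to_video": "Veo 3.1",
    "text_to_speech": "ElevenLabs v3",
    "music": "Suno v5.5",
    "vision": "GPT-4o Vision",
}

# Assumption: unknown tasks fall back to the text and reasoning leader.
DEFAULT_MODEL = "GPT-5.5"


def route(task: str) -> str:
    """Return the model name to use for a task label."""
    return ROUTES.get(task, DEFAULT_MODEL)


if __name__ == "__main__":
    for task in ("code_review", "text_to_video", "summarization"):
        print(f"{task} -> {route(task)}")
```

A production router would likely also weigh price, latency, and context length, which this table ignores.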

GPT-5.5 and Claude Opus 4.7: the dual frontier

Released on April 23 and April 16, 2026, respectively, the two now define the state of the art. GPT-5.5 wins at agentic coding and terminal use; Claude wins at code review and refactoring.

1M context becomes the new standard

From 128K to 1M tokens: Gemini 3.1 Pro, Claude Opus 4.7, and GPT-5.5 now support 1M+ context, making whole-repository analysis possible.
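
As a rough illustration of the whole-repository claim, the sketch below estimates whether a set of files fits in a 1M-token window. The 4-characters-per-token ratio and the output reserve are assumptions, not figures from this leaderboard; real tokenizers vary by model and by language.

```python
import math

CONTEXT_WINDOW_TOKENS = 1_000_000  # context size cited in this issue
CHARS_PER_TOKEN = 4                # assumption: rough heuristic only
RESERVED_FOR_OUTPUT = 50_000       # assumption: hypothetical headroom for the reply


def estimate_tokens(num_chars: int) -> int:
    """Estimate the token count for a text of num_chars characters."""
    return math.ceil(num_chars / CHARS_PER_TOKEN)


def fits_in_context(file_char_counts, window=CONTEXT_WINDOW_TOKENS,
                    reserve=RESERVED_FOR_OUTPUT) -> bool:
    """Check whether all files together fit in the window minus the reserve."""
    total = sum(estimate_tokens(n) for n in file_char_counts)
    return total <= window - reserve


if __name__ == "__main__":
    # Two hypothetical files of 2.0M and 1.5M characters.
    print(fits_in_context([2_000_000, 1_500_000]))
```

For a real decision, the character counts would come from the repository and the estimate from the target model's own tokenizer.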

Open source is catching up fast

Llama 4, DeepSeek V3.2, and Qwen3 now match closed-source models on several benchmarks at one tenth of the price or less.

Chinese domestic models break through globally

Seedance 2.0 (video), Qwen3 (open source), Kling 3.0 (video), and Qwen-VL (vision) have entered the global top three in their respective domains.

API prices keep falling

LLM API prices fell by roughly 80% over 2025-2026. Gemini 2.0 Flash at $0.10 / 1M tokens has sharply lowered the barrier to building AI applications.
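
To make the arithmetic concrete, here is a small sketch of per-token cost and of an 80% price drop. The function names, the 50M-token workload, and the $10.00 starting price are hypothetical; only the $0.10 / 1M rate and the roughly 80% figure come from the text above.

```python
def cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost of processing `tokens` tokens at a given price per 1M tokens."""
    return tokens / 1_000_000 * price_per_million_usd


def price_after_drop(price_usd: float, drop_fraction: float) -> float:
    """Price remaining after a fractional drop (0.80 means an 80% drop)."""
    return price_usd * (1.0 - drop_fraction)


if __name__ == "__main__":
    # Hypothetical workload: 50M tokens at the quoted $0.10 / 1M rate.
    print(f"{cost_usd(50_000_000, 0.10):.2f}")  # prints 5.00
    # Hypothetical $10.00 price after an 80% drop.
    print(f"{price_after_drop(10.00, 0.80):.2f}")  # prints 2.00
```

The source does not say whether $0.10 covers input tokens, output tokens, or both, so the sketch treats it as a single flat rate.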

Sources

[01] Artificial Analysis (benchmark), 2026-04-24
[02] LMArena Leaderboard (community leaderboard), 2026-04-24
[03] Hugging Face Open LLM Leaderboard (community leaderboard), 2026-04-24
[04] OpenAI Changelog (official changelog), 2026-04-24
[05] Anthropic News (official changelog), 2026-04-24
[06] Google DeepMind Blog (official changelog), 2026-04-24