VOL. 2026 · ISSUE 04 · Updated as of 2026-04-24

April

2026

Monthly LLM Leaderboard. Eight categories. Twenty-four featured models. Updated monthly. With AI-friendly citations.

8 categories · 24 models · 6 sources
01
Text Generation & Reasoning

2026 enters the era of three giants: no single model dominates, and the best choice depends on the task at hand.

Previously: GPT-5.4

Current leader
GPT-5.5
OpenAI

Released April 23, the first fully retrained foundation model since GPT-5.

Score
89
  • 01 Terminal-Bench 2.0: 82.7%
  • 02 OSWorld-Verified: 78.7%
  • 03 GDPval: 84.9%
  • 04 ARC-AGI-2: 85.0%
  • 05 1M-token context
Runners-up
2

Claude Opus 4.7

Anthropic

Released April 16, strongest at long-context and code review.

  • SWE-Bench Pro: 64.3%
  • MCP-Atlas: 79.1%
  • Most reliable multi-step reasoning
  • Most thorough code-logic review
  • 1M-token context
Score: 86
3

Gemini 3.1 Pro

Google

In preview, strongest at math and algorithmic competition.

  • LiveCodeBench Elo: 2887
  • 1M-token context
  • Lowest API price ($2/$12)
  • Leading video understanding
  • Best price-to-performance
Score: ~85
Tags: 1M-token context · Agentic workflows · Multimodal understanding
02
Text-to-Image

GPT Image-2 takes the throne with 99.2% text-rendering accuracy, while Nano Banana 2 keeps its lead in real-time generation.

Previously: Nano Banana 2

Current leader
GPT Image-2
OpenAI

Highest text-rendering accuracy.

Score
99.2%
  • 01 Text-rendering accuracy 99.2%
  • 02 Chinese / Arabic support
  • 03 Spatial logic & anatomical correctness
  • 04 Character consistency
  • 05 Thinking-mode reasoning engine
Runners-up
2

Nano Banana 2

Google

Ultra-fast 4K generation with live web search.

  • Flash architecture, ultra-fast generation
  • 4K image in 4-15s
  • Live web-search integration
  • Fastest on the market
  • Deep Gemini-ecosystem integration
Generation time: 4-15s
3

Flux Pro

Black Forest Labs

Strongest open-source ecosystem.

  • Open-source, commercial use
  • Rich community ecosystem
  • Style diversity
  • Local deployment
Tags: 4K generation · Multilingual text · Character consistency · Real-time generation
03
Text-to-Video

Sora 2 has exited; Google Veo 3.1 now leads on overall capability, while Seedance 2.0 and Kling 3.0 lead in specific niches.

Previously: Sora 2

Current leader
Veo 3.1
Google

Native audio + multi-shot, strongest overall.

  • 01 Native audio generation
  • 02 Multi-shot narrative
  • 03 Excellent physics simulation
  • 04 YouTube-ecosystem integration
Runners-up
2

Seedance 2.0

ByteDance

Strongest multi-shot storyboarding.

  • Multi-shot storyboarding
  • Professional cinematic language
  • Leading domestic Chinese model
  • Douyin/TikTok ecosystem integration
3

Kling 3.0 Omni

Kuaishou

Cinematic-grade visuals + most accurate lip-sync.

  • Cinematic-grade visuals
  • Most accurate lip-sync
  • Kuaishou ecosystem integration
  • Optimized for Chinese scenarios
Tags: Native audio · Multi-shot narrative · Cinematic visuals · Lip-sync
04
Code Generation

GPT-5.5 retakes the lead in terminal-agent coding; Claude Opus 4.7 still dominates multi-file refactoring and tool orchestration.

Previously: Claude Opus 4.6

Current leader
GPT-5.5
OpenAI

Terminal-Bench 2.0 #1, strongest agentic coding.

Score
82.7%
  • 01 Terminal-Bench 2.0: 82.7%
  • 02 Expert-SWE: 73.1%
  • 03 Autonomous coding judgment
  • 04 Fewer tokens for the same task
Runners-up
2

Claude Opus 4.7

Anthropic

SWE-Bench Pro #1, strongest multi-file refactoring.

  • SWE-Bench Pro: 64.3%
  • MCP-Atlas: 79.1%
  • Multi-file logic review
  • Code-vulnerability detection
Score: 64.3%
3

Gemini 3.1 Pro

Google

LiveCodeBench #1, strongest in algorithmic competition.

  • LiveCodeBench Elo: 2887
  • 1M-context whole-repo analysis
  • Lowest price
  • Best for algorithmic competition
Score: 2887 Elo
Tags: Agentic coding · Multi-file refactoring · Tool orchestration · Algorithmic competition
05
Text-to-Speech

ElevenLabs remains the industry benchmark for voice realism and cloning; Hume AI leads in emotional voice.

Previously: ElevenLabs v2

Current leader
ElevenLabs v3
ElevenLabs

Industry-benchmark voice realism.

Score
9.2/10
  • 01 Realism score 9.2/10
  • 02 75ms ultra-low latency
  • 03 29+ languages
  • 04 Professional Clone quality
  • 05 Enterprise-grade API
Runners-up
2

Hume AI Octave

Hume AI

Top of the emotional-voice leaderboard.

  • Emotion recognition 9.3/10
  • Emotional response capability
  • Empathetic interaction
  • Precise affect awareness
Score: 9.3/10
3

GPT-4o Voice

OpenAI

Best real-time conversational experience.

  • Low-latency real-time conversation
  • Natural voice output
  • Multilingual real-time translation
  • Deep ChatGPT integration
Tags: Ultra-low latency · Emotional voice · Voice cloning · Multilingual
06
AI Music Generation

Suno v5.5 remains the most widely used platform; the tools differ in speed, post-production, and enterprise deployment.

Previously: Suno v5

Current leader
Suno v5.5
Suno

Most widely used AI music platform.

  • 01 Largest user base
  • 02 Studio multi-track editing
  • 03 MIDI export
  • 04 Fastest to a finished song
Runners-up
2

Udio v1.5

Udio

Strongest post-production and stem control.

  • Stem download
  • Mix control
  • Key adjustment
  • Professional post-production
3

Lyria 3 Pro

Google DeepMind

Best for enterprise / API deployment.

  • Vertex AI delivery
  • Structured generation
  • Clear copyright posture
  • Enterprise-grade deployment
Tags: Multi-track editing · MIDI export · Stem control · Copyright safety
07
Vision Understanding

GPT-4o Vision retains general-purpose leadership; Gemini Vision leads in video understanding and long-document parsing.

Current leader
GPT-4o Vision
OpenAI

Strongest general-purpose vision understanding.

  • 01 UI parsing
  • 02 Chart understanding
  • 03 Live visual conversation
  • 04 Multimodal fusion
Runners-up
2

Gemini Vision

Google

Leader for video and long-document understanding.

  • 1M-token long documents
  • Leading video understanding
  • Multi-frame analysis
  • Search integration
3

Qwen-VL

Alibaba

Top open-source Chinese-scenario vision model.

  • Optimized for Chinese scenarios
  • Open-source, commercial use
  • Multimodal reasoning
  • Local deployment
Tags: Live vision · Long-document parsing · UI parsing · Multilingual
08
Open Source

Open-source models are quickly catching up with closed-source ones on several benchmarks. Llama 4, DeepSeek V3.2, and Qwen3 form the first tier.

Previously: Llama 3

Current leader
Llama 4
Meta

Most complete open-source ecosystem.

  • 01 Multimodal support
  • 02 Largest community ecosystem
  • 03 Commercial-use license
  • 04 Multiple sizes
Runners-up
2

DeepSeek V3.2

DeepSeek

Strongest open-source reasoning model.

  • Excellent math reasoning
  • Strong coding ability
  • Efficient MoE architecture
  • Extremely low API price
3

Qwen3

Alibaba

Top open-source Chinese model.

  • Strongest Chinese understanding
  • Multimodal support
  • Agent capability
  • Full size coverage
Tags: Multimodal · Commercial use · Local deployment · Low cost
Editorial · 06 observations

What changed this month

What changed across the AI model landscape this month — distilled from the data above.

01

From single-model dominance to specialist competition

In 2026, AI is shifting from one general-purpose model to a 'pick the model for the task' paradigm. Every niche has its specialist; multi-model routing is now the standard enterprise architecture.
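
The 'pick the model for the task' idea can be sketched as a plain lookup table. This is a minimal illustration, not a production router: the task labels (`code_review`, `text_to_video`, and so on) and the `route` function are hypothetical names invented for this sketch, and the model assignments simply mirror this issue's category leaders and the specialties listed for the runners-up.

```python
# Category leaders, copied from this issue's leaderboard.
# The dictionary keys are hypothetical task labels chosen for this sketch.
LEADERS = {
    "text_reasoning": "GPT-5.5",
    "text_to_image": "GPT Image-2",
    "text_to_video": "Veo 3.1",
    "code": "GPT-5.5",
    "text_to_speech": "ElevenLabs v3",
    "music": "Suno v5.5",
    "vision": "GPT-4o Vision",
    "open_source": "Llama 4",
}

# Narrower tasks where the leaderboard names a different specialist.
SPECIALISTS = {
    "code_review": "Claude Opus 4.7",
    "multi_file_refactoring": "Claude Opus 4.7",
    "algorithmic_competition": "Gemini 3.1 Pro",
    "emotional_voice": "Hume AI Octave",
    "video_understanding": "Gemini Vision",
}


def route(task: str, default: str = "GPT-5.5") -> str:
    """Return the model name for a task label.

    Specialist entries win over category leaders; unknown labels
    fall back to `default` (an arbitrary choice for this sketch).
    """
    if task in SPECIALISTS:
        return SPECIALISTS[task]
    return LEADERS.get(task, default)
```

A real deployment would probably also weigh cost, latency, and measured quality on its own workload. The table form mainly keeps a monthly leaderboard change down to a one-line edit.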

02

GPT-5.5 and Claude Opus 4.7: a dual frontier

Released on April 16 (Claude Opus 4.7) and April 23 (GPT-5.5), 2026, the two now define the cutting edge. GPT-5.5 wins at agentic coding and terminal use; Claude wins at code review and refactoring.

03

1M context becomes the new standard

From 128K to 1M tokens: Gemini 3.1 Pro, Claude Opus 4.7, and GPT-5.5 now support 1M+ context, enabling full-repository analysis.
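
Whether a repository actually fits in a 1M-token window can be estimated with a rough back-of-the-envelope check. The sketch below assumes about 4 characters per token, which is a common rule of thumb and not a figure from this leaderboard; real tokenizers vary by model and by language. The function names and the `reserved_tokens` budget for the prompt and the reply are likewise hypothetical.

```python
import math
from typing import Iterable

CONTEXT_TOKENS = 1_000_000  # the 1M-token window discussed in this issue
CHARS_PER_TOKEN = 4.0       # assumption: rough heuristic, varies by tokenizer


def estimate_tokens(num_chars: int, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Estimate the token count of a text from its character count."""
    return math.ceil(num_chars / chars_per_token)


def fits_in_context(
    file_sizes: Iterable[int],
    context_tokens: int = CONTEXT_TOKENS,
    reserved_tokens: int = 50_000,
) -> bool:
    """Check whether files of the given character sizes fit in the window.

    `reserved_tokens` is headroom kept free for instructions and the
    model's answer; 50,000 is an arbitrary illustrative value.
    """
    total = sum(estimate_tokens(size) for size in file_sizes)
    return total + reserved_tokens <= context_tokens
```

Under these assumptions, a 3-million-character codebase (about 750K estimated tokens) fits in a 1M window but is far too large for a 128K one, which is the practical difference the jump in context size makes.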

04

Open source is catching up fast

Llama 4, DeepSeek V3.2, and Qwen3 now match closed-source models on several benchmarks at one-tenth of the price or less.

05

Domestic Chinese models break through globally

Seedance 2.0 (video), Qwen3 (open source), Kling 3.0 (video), and Qwen-VL (vision) all rank in the global top three in their respective domains.

06

API prices keep falling

LLM API prices fell by roughly 80% over 2025-2026. Gemini 2.0 Flash, at $0.10 per 1M tokens, dramatically lowers the barrier to building AI applications.
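
To make the pricing figures concrete, the arithmetic can be written out as two small helpers. This is a sketch under stated assumptions: it treats the $0.10 per 1M tokens figure as a single flat rate (the text does not separate input and output pricing), and the function names and the example workloads are hypothetical.

```python
def cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost of processing `tokens` tokens at a flat per-million-token price."""
    return tokens / 1_000_000 * price_per_million_usd


def cost_after_drop(old_cost_usd: float, drop_fraction: float) -> float:
    """Cost of the same workload after prices fall by `drop_fraction`.

    A drop of 0.8 means an 80% price reduction.
    """
    if not 0.0 <= drop_fraction <= 1.0:
        raise ValueError("drop_fraction must be between 0 and 1")
    return old_cost_usd * (1.0 - drop_fraction)
```

For example, `cost_usd(250_000_000, 0.10)` prices a hypothetical 250M-token monthly workload at about $25, and `cost_after_drop(100.0, 0.80)` shows that a workload that once cost $100 would cost about $20 after an 80% price drop.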

Sources
  1. [01]
  2. [02] LMArena Leaderboard (community leaderboard)
  3. [03]
  4. [04] OpenAI Changelog (official changelog)
  5. [05] Anthropic News (official changelog)
  6. [06] Google DeepMind Blog (official changelog)