ChatGPT vs Gemini

A detailed comparison of AI assistant subscriptions and performance

Plans compared: ChatGPT Free, Go and Plus; Gemini Free, Plus and Pro. Per-plan values are listed in that order; a single value applies across the plans the source does not break down.

Reasoning model limit per day
  • ChatGPT: Free: Unknown [GPT 5 Thinking Mini] · Go: 40 (10 every 5h) [GPT 5 Thinking Mini] · Plus: 428 (3000/week) [GPT 5.4 Thinking]
  • Gemini: Free: 3+9 · Plus: 30+90 (X every X hours) · Pro: 100+300 (X every X hours) [Gemini 3.1 Pro + Gemini 3 Flash Thinking]

Non-reasoning model limit per day
  • ChatGPT: 40 (10 every 5h) / 1280 (160 every 3h) [GPT 5.3 Instant]
  • Gemini: Unlimited [Gemini 3 Flash]

Context window limit
  • ChatGPT: Free: 27k / 256k · Go & Plus: 54k / 256k (non-reasoning / reasoning)
  • Gemini: Free: 32k · Plus: 128k · Pro: 1M

File uploads per day
  • ChatGPT: Free: 3 · Go: 30 · Plus: 640 (80 every 3h)
  • Gemini: Unknown

Memory features
  • ChatGPT: Memories / Memories, chat history, chat search
  • Gemini: Memories, chat history, ecosystem memory

Image generation per day
  • ChatGPT: Unknown [GPT Image 2 / GPT Image 2 Thinking]
  • Gemini: Free: 20 [Nano Banana 2] · Plus: 50+50 · Pro: 100+100 [Nano Banana Pro + Nano Banana 2 / Nano Banana 2 Thinking]

Image resolution (1:1)
  • ChatGPT: 1254×1254 (1.2K) [all image models]
  • Gemini: 2048×2048 (2K) [all image models]

Video generation per day
  • ChatGPT: No / Unknown [Sora 2]
  • Gemini: Free: No · Plus: 2 · Pro: 3 [Veo 3.1 Lite]

Video resolution and length
  • ChatGPT: No / 480p, 720p (up to 10s) [Sora 2]
  • Gemini: No / 720p (up to 8s) [Veo 3.1 Lite]

Music generation per day
  • ChatGPT: No
  • Gemini: Free: 10 (30s per song) [Lyria 3] · Plus: 20+10 · Pro: 50+20 (30s per song + 180s per song) [Lyria 3 + Lyria 3 Pro]

Custom model creation
  • ChatGPT: No / Yes [Custom GPTs]
  • Gemini: Yes [Gems]

Folder organization
  • ChatGPT: Yes [Projects]
  • Gemini: Yes [Notebooks]

Maximum scheduled tasks
  • ChatGPT: 10 [Tasks]
  • Gemini: No / 10 [Scheduled Actions]

Deep research uses per month
  • ChatGPT: Unknown [GPT 5.2 Thinking / GPT o3]
  • Gemini: Free: 5 [Gemini 3 Flash Thinking] · Plus: 360 (12/day) · Pro: 600 (20/day) [Gemini 3.1 Pro]

Agent mode uses per month
  • ChatGPT: No / 40
  • Gemini: No

Advertisements
  • ChatGPT: Yes / No
  • Gemini: No

Video inputs per day
  • ChatGPT: No
  • Gemini: Free: 3 (up to 5 minutes per file) · Plus: 6 (up to 5 minutes per file) · Pro: 20 (up to 60 minutes per file)

Audio inputs per day
  • ChatGPT: No
  • Gemini: Free: 3 (up to 10 minutes per file) · Plus: 6 (up to 10 minutes per file) · Pro: 20 (up to 180 minutes per file)

YouTube summaries
  • ChatGPT: No
  • Gemini: Yes

Cloud storage included
  • ChatGPT: No
  • Gemini: Free: 15GB · Plus: 200GB · Pro: 5TB

API credit included
  • ChatGPT: No
  • Gemini: No / $10/month (unused credit expires monthly)

Plan sharing
  • ChatGPT: No
  • Gemini: No / Up to 6 accounts

Age estimation and verification method
  • ChatGPT: AI-based; verification only with ID, through a third party (Persona Identities)
  • Gemini: Birth-date based (prioritized), AI-based; verification with ID or credit card, sent directly to Google
Table legend: Best option · Sufficient option (subjective) · Includes USA-exclusive features · USA-exclusive features · Subjective recommendations

Footnotes

Model Filters
Claude models
Claude 4.7 Opus Extended
Claude 4.6 Sonnet Extended
Claude 4.5 Haiku Extended
Claude 4.6 Opus Extended
Claude 4.7 Opus
Claude 4.6 Sonnet
Claude 4.5 Haiku
Claude 4.6 Opus
Reasoning models
GPT 5.5 Thinking Heavy
GPT 5.5 Thinking Extended
GPT 5.5 Thinking Standard
GPT 5 Thinking Mini
Gemini 3.1 Pro
Gemini 3 Flash Thinking
Claude 4.7 Opus Extended
Claude 4.6 Sonnet Extended
Claude 4.5 Haiku Extended
Claude 4.6 Opus Extended
Grok 4.20 Expert
GLM 5.1 Deep Think
GLM 5 Turbo Deep Think
GLM 5V Turbo Deep Think
GLM 5 Deep Think
Qwen 3.6 Plus Think
Kimi K2.6 Thinking
Mimo V2.5 Pro
Mimo V2.5
Minimax M2.7 Max
Non-reasoning models
GPT 5.2 Instant
Gemini 3 Flash
Claude 4.7 Opus
Claude 4.6 Sonnet
Claude 4.5 Haiku
Claude 4.6 Opus
Grok 4.20 Fast
GLM 5.1
GLM 5
Qwen 3.5 Omni Plus
Other recommended models
(A selection of reliable models with a low hallucination rate.)
Legacy models
Benchmark Filters
Accuracy benchmarks
AA-Omniscience
Long context benchmarks
MRCR v2 8-Needle
AA-LCR
Max Context Limit
Multimodal benchmarks
MMMU-Pro
Coding benchmarks
Terminal-Bench Hard
τ²-Bench Telecom

Knowledge Accuracy — AA-Omniscience Benchmark

Shows how often the model answers correctly, abstains, or hallucinates. Higher is better for Correct and Abstention; lower is better for Hallucination.

AA-Omniscience Notes

  • Correct answers were calculated as Correct = A.
  • Abstention answers were calculated as Abstention = (1−A)·(1−H).
  • Hallucination answers were calculated as Hallucination = (1−A)·H.
  • (Here "A" is the percentage scored on the "AA-Omniscience Accuracy" benchmark and "H" is the percentage scored on the "AA-Omniscience Hallucination Rate" benchmark.)
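
The three formulas above can be sketched in Python (a minimal illustration; the function name `omniscience_breakdown` is made up for this example, not from any library):

```python
def omniscience_breakdown(A: float, H: float) -> tuple[float, float, float]:
    """Split a model's answers into Correct / Abstention / Hallucination shares.

    A: fraction scored on the AA-Omniscience Accuracy benchmark (0.0-1.0).
    H: fraction scored on the AA-Omniscience Hallucination Rate benchmark (0.0-1.0).
    """
    correct = A                      # Correct = A
    abstention = (1 - A) * (1 - H)   # Abstention = (1-A)*(1-H)
    hallucination = (1 - A) * H      # Hallucination = (1-A)*H
    # Sanity check: the shares always sum to 1, since
    # A + (1-A)(1-H) + (1-A)H = A + (1-A) = 1.
    return correct, abstention, hallucination
```

For example, a model with 50% accuracy whose wrong answers hallucinate half the time splits as 50% correct, 25% abstention, 25% hallucination.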

Long Context Accuracy — MRCR v2 8-Needle Benchmark

Shows how accuracy at locating specific details in context degrades as context length increases. Ranges from 8k to 1048k (1M) tokens. Higher is better.

MRCR v2 8-Needle Notes

  • Differs greatly from the next benchmark. While AA-LCR revolves around comprehending very long texts, MRCR v2 8-Needle evaluates a model's ability to find specific information in them.
  • There are no benchmark results for 200k tokens.
  • Not all models are supported by this benchmark, so some of the selected models may not appear here.
  • Maximum context window of GPT Instant models: 27k / 54k / 128k tokens (Free / Go & Plus / Pro).
  • Maximum context window of GPT Thinking models: 256k / 400k tokens (Free, Go & Plus / Pro).
  • Maximum context window of all Gemini models: 32k / 128k / 1M tokens (Free / Plus / Pro & Ultra).
  • Maximum context window of all Claude models: 200k tokens.
  • Maximum context window of all Grok models: 2M tokens.
  • Maximum context window of all GLM models: 200k tokens.
  • Maximum context window of Qwen Plus models: 256k tokens.
  • Maximum context window of Qwen Omni Plus models: 1M tokens.
  • Maximum context window of all Kimi models: 200k tokens.
  • Maximum context window of all Mimo models: 1M tokens.
  • Maximum context window of all Minimax models: 200k tokens.
  • 1M tokens equals 1,048,576 tokens (≈1048k); 2M tokens equals 2,097,152 tokens (≈2097k).

Long Context Reasoning — AA-LCR

Measures extraction, comprehension, and synthesis of information from texts ranging from 10k to 100k tokens. Higher is better.

AA-LCR Notes

  • Differs greatly from the previous benchmark. While AA-LCR revolves around comprehending very long texts, MRCR v2 8-Needle evaluates a model's ability to find specific information in them.
  • The per-model maximum context windows listed in the MRCR v2 8-Needle notes above apply here as well.

Max Context Limit

Maximum context window size in tokens. Keep in mind that a higher context limit does not always come with good performance at long context lengths.

Visual Reasoning — MMMU-Pro

Shows performance when processing visual and textual information at the same time. Higher is better.

MMMU-Pro Notes

  • GPT models are capable of understanding text and images.
  • Gemini models are capable of understanding text, images, video and audio.
  • Claude models are capable of understanding text and images.
  • Grok models are capable of understanding text and images.
  • GLM models are capable of understanding only text. Image and video inputs are likely routed to GLM 5V Turbo Deep Think and GLM 5V Turbo respectively.
  • Qwen 3.6 Plus Think is capable of understanding text, images and video.
  • Qwen 3.5 Omni Plus is capable of understanding text, images, video and audio.
  • Kimi models are capable of understanding text, images and video.
  • Mimo V2.5 Pro is capable of understanding only text.
  • Mimo V2.5 is capable of understanding text, images, video and audio.
  • Minimax models are capable of understanding only text. Image and video inputs are likely routed to Gemini 3.1 Pro.
  • Models capable of understanding only text are usually able to understand text inside images, but nothing else in them.

Agentic Coding — Terminal-Bench Hard

Shows agentic coding capabilities in terminal environments. Higher is better.

Agentic Tool Use — τ²-Bench Telecom

Shows performance when guiding a user through technical troubleshooting. Higher is better.

Notes

  • For "GPT 5.5 Thinking Heavy", "gpt-5.5:xhigh" was used in the graph.
  • For "GPT 5.5 Thinking Extended", "gpt-5.5:high" was used in the graph.
  • For "GPT 5.5 Thinking Standard", "gpt-5.5:low" was used in the graph.
  • For "GPT 5 Thinking Mini", "gpt-5-mini:medium" was used in the graph.
  • All other models are presented at their respective highest settings.
  • I have not been able to find consistent graphs for the following models: GPT 5.3 Instant, GLM 5 Turbo, GLM 5V Turbo, Qwen 3.6 Plus Fast, Kimi K2.6 Instant and Minimax M2.7 Air. I will add them as soon as I find any.
  • GPT 5.5 Thinking Heavy is available to Plus and Pro ChatGPT users through the Codex app, with limits that reset weekly. It's also available on the web for Pro users.
  • Claude Opus 4.6 Extended and Claude Sonnet 4.6 Extended are available for free to Gemini users through the Antigravity app, with limits that reset depending on demand.

ChatGPT

  • Web search with thinking models runs significantly longer (can easily take 7-10 min vs Gemini's 1-3 min)
  • Easy switching between different personalities
  • Can connect to a limited selection of websites, as well as to a web version of Photoshop, though tool calls can be inconsistent

Gemini

  • Nano Banana Pro offers faster image generation and higher resolution than GPT Image 2. Realism is on par; GPT Image 2 seems better at rendering text.
  • Native Android integration (iOS confirmed to come soon), though it can be over-eager with tool calls
  • Offers use of Claude Opus 4.6 Extended and Claude Sonnet 4.6 Extended through the Antigravity app