ChatGPT vs Gemini

A detailed comparison of AI assistant subscriptions and performance

| Feature | ChatGPT Free | ChatGPT Go | ChatGPT Plus | Gemini Free | Gemini Plus | Gemini Pro |
|---|---|---|---|---|---|---|
| Reasoning model limit per day | Unknown [GPT 5 Thinking Mini] | 40 (10 every 5h) [GPT 5 Thinking Mini] | 428 (3000/week) [GPT 5.2 Thinking] | 3 + 9 [Gemini 3.1 Pro + Gemini 3 Flash Thinking] | 30 + 90 [Gemini 3.1 Pro + Gemini 3 Flash Thinking] | 100 + 300 [Gemini 3.1 Pro + Gemini 3 Flash Thinking] |
| Non-reasoning model limit per day | 40 (10 every 5h) [GPT 5.3 Instant] | Unlimited [GPT 5.3 Instant] | Unlimited [GPT 5.3 Instant] | Unlimited [Gemini 3 Flash] | Unlimited [Gemini 3 Flash] | Unlimited [Gemini 3 Flash] |
| Context window limit | 16k / 256k (non-reasoning / reasoning) | 32k / 256k (non-reasoning / reasoning) | 32k / 256k (non-reasoning / reasoning) | 32k | 128k | 1M |
| File uploads per day | 3 | 30 | 640 (80 every 3h) | Unknown | Unknown | Unknown |
| Memory features | Memories | Memories, chat history, chat search | Memories, chat history, chat search | Memories, chat history | Memories, chat history, ecosystem memory | Memories, chat history, ecosystem memory |
| Image generation per day | 5 [GPT Image 1.5] | >5? [GPT Image 1.5] | 50? [GPT Image 1.5] | 20 [Nano Banana 2] | 50 + 50 [Nano Banana Pro + Nano Banana 2 / Nano Banana 2 Thinking] | 100 + 100 [Nano Banana Pro + Nano Banana 2 / Nano Banana 2 Thinking] |
| Image resolution (1:1) | 1024×1024 (1K) [GPT Image 1.5] | 1024×1024 (1K) [GPT Image 1.5] | 1024×1024 (1K) [GPT Image 1.5] | 2048×2048 (2K) [All image models] | 2048×2048 (2K) [All image models] | 2048×2048 (2K) [All image models] |
| Video generation per day | No | Unknown [Sora 2] | Unknown [Sora 2] | No | 2 [Veo 3.1 Fast] | 3 [Veo 3.1 Fast] |
| Video resolution and length | No | 480p, 720p (up to 10s) [Sora 2] | 480p, 720p (up to 10s) [Sora 2] | No | 720p (up to 8s) [Veo 3.1 Fast] | 720p (up to 8s) [Veo 3.1 Fast] |
| Music generation per day | No | No | No | 10 (30 seconds per piece) [Lyria 3] | 20 (30 seconds per piece) [Lyria 3] | 50 (30 seconds per piece) [Lyria 3] |
| Custom model creation | No [Custom GPTs] | Yes [Custom GPTs] | Yes [Custom GPTs] | Yes [Gems] | Yes [Gems] | Yes [Gems] |
| Folder organization | Yes [Projects] | Yes [Projects] | Yes [Projects] | No | No | No |
| Maximum scheduled tasks | 10 [Tasks] | 10 [Tasks] | 10 [Tasks] | No | 10 [Scheduled Actions] | 10 [Scheduled Actions] |
| Deep research uses per month | 5 [GPT 5.2 Thinking] | 5 [GPT 5.2 Thinking] | 25 [GPT 5.2 Thinking] | 5 [Gemini 3 Flash Thinking] | 360 (12/day) [Gemini 3.1 Pro] | 600 (20/day) [Gemini 3.1 Pro] |
| Agent mode uses per month | No | No | 40 | No | No | No |
| Advertisements | Yes | No | No | No | No | No |
| Video inputs per day | No | No | No | 3 (up to 5 minutes per file) | 6 (up to 5 minutes per file) | 20 (up to 60 minutes per file) |
| Audio inputs per day | No | No | No | 3 (up to 10 minutes per file) | 6 (up to 10 minutes per file) | 20 (up to 180 minutes per file) |
| YouTube summaries | No | No | No | Yes | Yes | Yes |
| Cloud storage included | No | No | No | 15GB | 200GB | 2TB |
| API credit included | No | No | No | No | No | $10/month (unused credit expires monthly) |
| Plan sharing | No | No | No | No | No | Up to 6 accounts |
| Age estimation and verification method | AI-based; verification only with ID, through a third party (Persona Identities) | AI-based; verification only with ID, through a third party (Persona Identities) | AI-based; verification only with ID, through a third party (Persona Identities) | Birth date based (prioritized), AI-based; verification with ID or credit card, sent directly to Google | Birth date based (prioritized), AI-based; verification with ID or credit card, sent directly to Google | Birth date based (prioritized), AI-based; verification with ID or credit card, sent directly to Google |
Legend: Best option · Sufficient option (subjectively) · Includes USA-exclusive features · USA-exclusive features · Subjective recommendations

Footnotes

Model Filters + Extra Benchmarks
Claude models
Claude 4.6 Opus Extended
Claude 4.6 Sonnet Extended
Claude 4.5 Haiku Extended
Claude 4.6 Opus
Claude 4.6 Sonnet
Claude 4.5 Haiku
Claude 4.5 Opus Extended
Claude 4.5 Sonnet Extended
Claude 4.5 Opus
Claude 4.5 Sonnet
Reasoning models
GPT 5.4 Thinking xhigh
GPT 5.4 Thinking
GPT 5.2 Thinking xhigh
GPT 5.2 Thinking
GPT 5 Thinking Mini
Gemini 3.1 Pro
Gemini 3 Flash Thinking
Gemini 3 Pro
Claude 4.6 Opus Extended
Claude 4.6 Sonnet Extended
Claude 4.5 Haiku Extended
Claude 4.5 Opus Extended
Claude 4.5 Sonnet Extended
GLM 5 Deep Think
Non-reasoning models
GPT 5.2 Instant
Gemini 3 Flash
Claude 4.6 Opus
Claude 4.6 Sonnet
Claude 4.5 Haiku
Claude 4.5 Opus
Claude 4.5 Sonnet
GLM 5
Coding benchmarks and models
Coding benchmarks
GPT 5.3 Codex
GPT 5.2 Codex
Other recommended models (a selection of reliable models with a low hallucination rate)
Legacy models

Knowledge Accuracy — AA-Omniscience Benchmark

Claude models
Reasoning models
Non-reasoning models

Shows how often the model is right or wrong when answering. Higher is better for Correct & Abstention. Lower is better for Hallucination.


AA-Omniscience Notes

  • Correct share was calculated as Correct = A.
  • Abstention share was calculated as Abstention = (1 − A)·(1 − H).
  • Hallucination share was calculated as Hallucination = (1 − A)·H.
  • (Where "A" is the percentage scored on the "AA-Omniscience Accuracy" benchmark and "H" is the percentage scored on the "AA-Omniscience Hallucination Rate" benchmark.)
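
The decomposition above can be sketched in a few lines. The function name and the example scores are illustrative, not published benchmark values; only the A and H inputs come from the benchmark:

```python
def omniscience_split(a: float, h: float) -> dict:
    """Split answers into correct / abstention / hallucination shares.

    a: accuracy (fraction correct) from AA-Omniscience Accuracy.
    h: hallucination rate among the non-correct answers.
    """
    correct = a
    abstention = (1 - a) * (1 - h)   # non-correct answers where the model declined
    hallucination = (1 - a) * h      # non-correct answers where the model guessed wrong
    # The three shares always sum to 1 by construction.
    return {"correct": correct, "abstention": abstention, "hallucination": hallucination}

# Hypothetical example: 40% accuracy with a 55% hallucination rate.
shares = omniscience_split(0.40, 0.55)
```

Because abstention and hallucination are complementary splits of the non-correct mass (1 − A), the three shares always partition 100% of the answers.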

Long Context Accuracy — MRCR v2 8-Needle Benchmark

Claude models
Reasoning models
Non-reasoning models

Shows how accuracy at locating specific details in context degrades as context length increases. Ranges from 8k to 1048k (1M) tokens. Higher is better.


MRCR v2 8-Needle Notes

  • Differs greatly from the next benchmark (AA-LCR): while AA-LCR revolves around comprehending very long texts, MRCR v2 8-Needle evaluates a model's ability to find specific information in those texts.
  • There are no benchmarks for 200k tokens.
  • This benchmark currently does not support GPT Codex models, non-reasoning Claude models, or GLM models. The newest GPT 5.4 Thinking and GPT 5.3 Instant have not been added either.
  • This benchmark has not yet published the full results for the Claude 4.6 models. It is only known that Claude Opus 4.6 Extended scores >93% at 128k tokens and Claude Sonnet 4.6 Extended scores >90.3% at 128k tokens. The remaining values were estimated by assuming a roughly linear trend on a log2 context-length scale.
  • Maximum context window of GPT 5.4 Thinking xhigh and GPT 5 Thinking Mini is 256k/400k tokens (Free, Go, Plus / Pro).
  • Maximum context window of GPT 5.3 Instant is 16k/32k/128k tokens (Free / Plus, Go / Pro).
  • Maximum context window of all Gemini models is 32k/128k/1M tokens (Free / Plus / Pro, Ultra).
  • Maximum context window of all Claude models is 200k tokens.
  • Maximum context window of all GLM models is 200k tokens.
  • 1M tokens equals 1048k tokens.
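
The estimation approach used for the missing Claude 4.6 values (a roughly linear trend on a log2 context-length scale) can be sketched as follows. The anchor points below are illustrative, not the published benchmark scores:

```python
import math

def estimate_accuracy(known: dict, target_k: int) -> float:
    """Inter/extrapolate accuracy linearly over log2(context length in k tokens).

    known: {context_length_k: accuracy}, assumed to hold exactly two anchor points,
    e.g. {8: 0.98, 128: 0.93}.
    """
    (x1, y1), (x2, y2) = sorted((math.log2(k), v) for k, v in known.items())
    slope = (y2 - y1) / (x2 - x1)
    return y1 + slope * (math.log2(target_k) - x1)

# Illustrative: 0.98 at 8k and 0.93 at 128k implies roughly 0.955 at 32k,
# since 32k sits halfway between 8k and 128k on a log2 scale.
```

On a log2 axis, 8k, 32k, and 128k are evenly spaced (2 doublings apart each), which is why the midpoint estimate lands halfway between the two anchor accuracies.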

Long Context Reasoning — AA-LCR

Claude models
Reasoning models
Non-reasoning models

Shows extraction, comprehension and synthesizing of information on texts ranging from 10k to 100k tokens. Higher is better.


AA-LCR Notes

  • Differs greatly from the previous benchmark (MRCR v2 8-Needle): while AA-LCR revolves around comprehending very long texts, MRCR v2 8-Needle evaluates a model's ability to find specific information in those texts.
  • Maximum context window of GPT 5.4 Thinking xhigh and GPT 5 Thinking Mini is 256k/400k tokens (Free, Go, Plus / Pro).
  • Maximum context window of GPT 5.3 Instant is 16k/32k/128k tokens (Free / Plus, Go / Pro).
  • Maximum context window of all Gemini models is 32k/128k/1M tokens (Free / Plus / Pro, Ultra).
  • Maximum context window of all Claude models is 200k tokens.
  • Maximum context window of all GLM models is 200k tokens.
  • 1M tokens equals 1048k tokens.

Visual Reasoning — MMMU-Pro

Claude models
Reasoning models
Non-reasoning models

Shows performance when processing visual and textual information at the same time. Higher is better.


MMMU-Pro Notes

  • ChatGPT models are capable of understanding text and images.
  • Gemini models are capable of understanding text, images, video and audio.
  • Claude models are capable of understanding text and images.
  • GLM 5 Deep Think and GLM 5 are not capable of understanding images, video or audio. The older GLM 4.6V is capable of understanding images and video, but it does not perform well.

Agentic Coding — Terminal-Bench Hard

Claude models
Reasoning models
Non-reasoning models

Shows capabilities in terminal environments. Higher is better.


Agentic Tool Use — τ²-Bench Telecom

Claude models
Reasoning models
Non-reasoning models

Shows performance guiding a user through technical troubleshooting. Higher is better.


Notes

  • For "GPT 5.3 Codex", "gpt-5.3-codex:xhigh" was used in the graph. This model is only available on Codex and Visual Studio Code, not the web or the app.
  • For "GPT 5.4 Thinking xhigh", "gpt-5.4:xhigh" was used in the graph.
  • For "GPT 5.4 Thinking", an estimation based on the performance of "gpt-5.4:xhigh", "gpt-5.2:xhigh" and "gpt-5.2:medium" was used in the graph, aiming to get as close as possible to what the performance of "gpt-5.4:medium" might be.
  • For "GPT 5.3 Instant", "gpt-5.3" was used in the graph.
  • For "GPT 5 Thinking Mini", "gpt-5-mini:medium" was used in the graph.
  • For "Gemini 3.1 Pro", "gemini-3.1-pro-preview:high" was used in the graph.
  • For "Gemini 3 Flash Thinking", "gemini-3-flash-preview:thinking:high" was used in the graph.
  • For "Gemini 3 Flash", "gemini-3-flash-preview" was used in the graph.
  • For "Claude Opus 4.6 Extended", "claude-opus-4.6:high" was used in the graph.
  • For "Claude Opus 4.6", "claude-opus-4.6" was used in the graph.
  • For "Claude Opus 4.5 Extended", "claude-opus-4.5:high" was used in the graph.
  • For "Claude Opus 4.5", "claude-opus-4.5" was used in the graph.
  • For "Claude Sonnet 4.5 Extended", "claude-sonnet-4.5:high" was used in the graph.
  • For "Claude Sonnet 4.5", "claude-sonnet-4.5" was used in the graph.
  • For "Claude Haiku 4.5 Extended", "claude-haiku-4.5:high" was used in the graph.
  • For "Claude Haiku 4.5", "claude-haiku-4.5" was used in the graph.
  • For "GLM 5 Deep Think", "glm-5:thinking" was used in the graph.
  • For "GLM 5", "glm-5" was used in the graph.
  • Claude Opus 4.6 Extended and Claude Sonnet 4.6 Extended are available for free to Gemini users through the Antigravity app, with limits that reset depending on demand.

Legacy Notes

  • For "GPT 5.2 Thinking xhigh", "gpt-5.2:xhigh" was used in the graph. A weaker (×0.67 maximum reasoning time) version of this model is available only for ChatGPT Pro accounts, by selecting "heavy" as the thinking effort with 5.2 Thinking or "standard" as the thinking effort with 5.2 Pro. The actual model is available only when selecting "extended" as the thinking effort with 5.2 Pro.
  • For "GPT 5.2 Thinking", "gpt-5.2:medium" was used in the graph. A stronger (×4 maximum reasoning time) version of this model is available for ChatGPT Plus and Pro accounts by selecting "extended" as the thinking effort with 5.2 Thinking; selecting "standard" as the thinking effort actually uses a weaker model (×0.25 maximum reasoning time). There are unfortunately no benchmarks for those two exact models.
  • For "GPT 5.2 Instant", "gpt-5.2" was used in the graph.
  • For "GPT 5.2 Codex", "gpt-5.2-codex:xhigh" was used in the graph. This model is deprecated.
  • For "Gemini 3 Pro", "gemini-3-pro-preview:high" was used in the graph. This model is deprecated.

ChatGPT

  • Thinking models spend significantly longer on web search (can easily take 7-10 min vs Gemini's 1-3 min)
  • Easy switching between different personalities
  • Can connect to a limited selection of websites, as well as to a web version of Photoshop, though tool calls can be inconsistent

Gemini

  • Nano Banana Pro offers faster image generation than GPT Image 1.5. Its images also appear more realistic (subjectively).
  • Native Android integration (iOS confirmed to come soon), though it can be over-eager with tool calls
  • Offers access to Claude Opus 4.6 Extended and Claude Sonnet 4.6 Extended through the Antigravity app