ChatGPT o3-mini Is Here: First Impressions and Comparisons

ChatGPT o3-mini & o3-mini-high first impressions and comparisons: Faster, cost-effective, solid for coding and brainstorming.

Image from openai.com

OpenAI has just rolled out its latest conversational AI variants for ChatGPT: o3-mini and o3-mini-high. In today’s quick first-look article, we share our initial impressions, compare the new models to o1 and other predecessors, and explore what this means for everyday users and developers alike.


What OpenAI says about the new models

According to official sources, the new models are designed with different user needs in mind:

  • ChatGPT o3-mini:
    Intended for everyday queries and casual interactions, it promises a lightweight and fast experience at a lower cost while maintaining the reliability users have come to expect.
  • ChatGPT o3-mini-high:
    Targeting users with more complex needs, this variant boasts improved performance, offering better multi-turn conversation handling, creativity, and problem-solving abilities. OpenAI highlights that o3-mini-high is optimized for tasks such as coding challenges and advanced brainstorming.

o3-mini and o3-mini-high reactions on X

  • Performance in Coding Tasks:
    Users on X have noted that when handling coding challenges, such as generating a p5.js script for 100 bouncing yellow balls, o3-mini produces impressive results. In particular, o3-mini-high variant has been reported to score around 200 Elo points higher than the o1 on Codeforces for Plus users, suggesting a measurable improvement in coding and problem-solving performance.
  • Hot Takes:
    Not all feedback has been positive. Some critics on X have labeled the o3-mini as “a flashy cash grab masquerading as innovation”, arguing that downsizing isn’t synonymous with progress. Others have compared its performance to that of a GPT 4o claiming that the initial experience feels underwhelming compared to expectations.
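To make the bouncing-balls prompt above concrete, here is a minimal sketch of what such a script boils down to. The physics is plain JavaScript; a rough p5.js render loop is shown in comments at the bottom. The canvas size, radius, and speeds are arbitrary choices for illustration, not taken from any model's actual output.

```javascript
// Minimal sketch of the "100 bouncing yellow balls" prompt.
// Arbitrary illustrative constants (not from the original prompt):
const W = 600, H = 400, R = 8;

// Create n balls at random positions with random velocities.
function makeBalls(n) {
  return Array.from({ length: n }, () => ({
    x: Math.random() * (W - 2 * R) + R,
    y: Math.random() * (H - 2 * R) + R,
    vx: Math.random() * 4 - 2,
    vy: Math.random() * 4 - 2,
  }));
}

// Advance one frame: move each ball and reflect its velocity at the walls.
function step(balls) {
  for (const b of balls) {
    b.x += b.vx;
    b.y += b.vy;
    if (b.x < R || b.x > W - R) b.vx = -b.vx;
    if (b.y < R || b.y > H - R) b.vy = -b.vy;
    // Clamp so a ball never escapes the canvas even at high speed.
    b.x = Math.min(Math.max(b.x, R), W - R);
    b.y = Math.min(Math.max(b.y, R), H - R);
  }
}

// In p5.js the render loop would look roughly like:
//   let balls;
//   function setup() { createCanvas(W, H); balls = makeBalls(100); }
//   function draw()  { background(0); fill(255, 255, 0);
//                      step(balls);
//                      for (const b of balls) circle(b.x, b.y, 2 * R); }
```

The simulation logic is deliberately separated from rendering so the bounce behavior can be checked without a browser.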

Our Testing Impressions

We have tested OpenAI’s new o3-mini and o3-mini-high models across diverse scenarios.

  1. Brainstorming (via ChatGPT):
    1. o3‑mini:
      Mirrors the performance of o1 in generating conventional ideas and workflows, but with significantly faster response times.
    2. o3‑mini‑high:
      Delivers noticeably more creative outputs and demonstrates greater accuracy when estimating unavailable information. It ventures slightly beyond convention yet doesn’t quite match the leap seen in DeepSeek R1.
  2. Coding in IDE (Complex Context):
    1. In a refactoring challenge, o3‑mini solved the problem roughly 20% more efficiently than o1. It also required fewer prompts (4 prompts versus 5 for o1 and 7 for the Sonnet 3.5 agent), underscoring its enhanced efficiency in complex coding tasks.
    2. However, when it came to feature prototyping, o3‑mini was slightly outperformed by both o1 and the Sonnet 3.5 agent, needing two additional prompts to approach their one-shot results.
  3. Coding in ChatGPT (Low-Context Environment):
    1. During an “idea-to-app” experiment, o3‑mini‑high matched o1’s performance closely, demonstrating that in low-context scenarios the mini‑high variant holds its own. o3-mini, on the other hand, lagged significantly behind.

From our observations, the o3-mini-high consistently outperforms both the o3-mini and o1, with the extent of its advantage over o1 varying by context. Notably, the o3-mini can also deliver surprisingly strong results in certain scenarios.


Pricing and Speed

Cost-Effective and Fast:
Several developers mention that while they continue to rely on o1 for heavy-duty applications, testing with o3-mini has shown it to be both cheaper and faster for everyday use. Specifically:

  • Pricing:
    • Input Tokens: o3-mini is priced at $0.55 per million cached input tokens.
    • Output Tokens: It costs $4.40 per million output tokens.
    On a per-token basis, this makes o3-mini 63% cheaper than o1-mini and 93% less expensive than the full o1 model.
  • Speed:
    In addition to cost savings, the o3-mini boasts response times that are 24% faster than o1-mini. In our tests, average latency was reduced from 12.8 seconds to approximately 10.32 seconds for processing 100 tokens.
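As a sanity check, a few lines of JavaScript reproduce the quoted figures. The cached-input and output rates come from this article; the $1.10 standard (non-cached) input rate and the o1-mini/o1 input rates are assumptions for illustration and may not reflect current pricing. Note that the "24% faster" claim lines up with the latency numbers when read as a throughput ratio.

```javascript
// Back-of-the-envelope check of the quoted savings, in USD per million tokens.
const o3MiniInput  = 1.10;  // assumed standard (non-cached) input rate
const o3MiniCached = 0.55;  // cached input rate quoted in the article
const o3MiniOutput = 4.40;  // output rate quoted in the article
const o1MiniInput  = 3.00;  // assumed o1-mini input rate
const o1Input      = 15.00; // assumed o1 input rate

// Percentage by which price a undercuts price b, rounded to whole percent.
const pctCheaper = (a, b) => Math.round((1 - a / b) * 100);

console.log(pctCheaper(o3MiniInput, o1MiniInput)); // 63, matching the article
console.log(pctCheaper(o3MiniInput, o1Input));     // 93, matching the article

// Latency: 12.8 s -> 10.32 s per 100 tokens is ~24% faster as a
// throughput ratio (12.8 / 10.32 ≈ 1.24).
const speedup = 12.8 / 10.32;
console.log(speedup.toFixed(2)); // "1.24"
```

If the assumed input rates change, only the two percentages move; the latency ratio depends solely on the measurements reported above.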

This combination of cost-efficiency and improved speed makes the o3-mini particularly attractive for real-time applications and smaller-scale tasks.


Final Thoughts: Are They Strong Enough?

The o3-mini and o3-mini-high models stand as strong competitors in the current market, offering just a glimpse of what the full o3 model may be capable of. They highlight OpenAI’s rapid pace of innovation and its ability to adapt and respond effectively to rivals like DeepSeek.