
How OpenAI's Improved o3 Sets the Stage for GPT-5

  • Writer: Yoshi Soornack
  • Jun 16
  • 3 min read

When OpenAI released GPT-4o in May 2024, it marked a turning point. Here was a model that could hear, see, and speak – all within a single architecture. A year later, its successor line, the o3 series, has not just matured. It has signposted what comes next: a model not only more powerful but more fluid, more contextually aware, and more human in its reasoning. This is the shape of GPT-5 before it arrives.



A Quick Look Back at What Made GPT-4o Matter


GPT-4o, the “omni” model, was OpenAI’s first truly multimodal system. It fused text, image, and audio into one native model—no longer a bolted-together pipeline of specialist components. With this, GPT-4o achieved:


  • Real-time voice conversations with emotional nuance

  • Rapid image analysis and feedback

  • Significantly lower latency compared to GPT-4 Turbo

  • A 128k context window, enabling long-form memory within a single chat



This wasn’t just a speed bump in the road of progress. GPT-4o signalled a shift in AI design philosophy: fewer silos, more integration.
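For developers, that integration is visible in the API itself: a single request can carry text and an image together, with no separate vision pipeline. The sketch below is a minimal illustration using the OpenAI Python SDK; the prompt and image URL are placeholders rather than a prescribed setup.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# One multimodal request: the text prompt and the image travel in the same message.
# The image URL below is a placeholder, not a real asset.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What stands out in this chart?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```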



The o3 Breakthrough and What Changed in 2025



By June 2025, OpenAI had launched o3-pro, a refined, high-reasoning version of o3 available to ChatGPT Pro and API users. While it reportedly uses the same base model as o3, o3-pro thinks for longer and performs better in complex domains like business, science, coding, and structured writing. It was, by expert judgment, OpenAI's most capable model to date.


And yet, performance wasn’t the only leap. June also brought major usability changes:


  • Cost reductions: o3 API pricing dropped by 80%, making large-scale deployment more viable

  • Voice upgrades: Better tone, rhythm, and real-time translation in live conversations

  • Tool invocation: The model now consistently uses internal tools like image generators or code interpreters when the task calls for it (see the sketch below)



These aren’t just incremental upgrades. They reflect a systemic shift in how OpenAI wants its models to think, act, and interact.
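To make the tool-invocation point concrete, the sketch below shows what autonomous tool use looks like from the API side: the caller only declares that a tool exists, and the model decides whether to call it. The weather function, its schema, and the o3 model name are illustrative assumptions, not a documented recipe.

```python
import json

from openai import OpenAI

client = OpenAI()

# A hypothetical tool the model *may* choose to call; the schema is illustrative.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="o3",  # assumed model identifier
    messages=[{"role": "user", "content": "Do I need an umbrella in London today?"}],
    tools=tools,  # no forcing: the model decides if the task calls for the tool
)

message = response.choices[0].message
if message.tool_calls:  # the model judged that the tool was needed
    call = message.tool_calls[0]
    print("Model requested:", call.function.name, json.loads(call.function.arguments))
else:
    print(message.content)
```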



What Makes o3 Different Under the Hood?



Unlike the GPT-3 and GPT-4 families, o3 introduces several architectural advances:


  • Unified multimodal training: Instead of training separate models for speech, text, and vision, o3 is trained end-to-end across all modalities

  • Reasoning with visual input: It doesn't just see an image; it uses it in its thinking process

  • Chain-of-thought native: o3 internally explores reasoning paths before producing an answer, leading to more robust, less brittle outcomes (see the sketch after this list)

  • Extended memory: 128,000-token context windows plus the introduction of user-specific long-term memory make o3 persistently smarter over time

  • Autonomous tool use: o3 can decide to browse, compute, or generate visuals without explicit user commands

  • Deliberative safety: It reasons about policy, risk, and appropriateness before answering sensitive prompts



Together, these create a system that doesn’t just answer questions. It engages, adapts, and solves.
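One practical consequence of the chain-of-thought design is that deliberation becomes a dial rather than a hidden constant. The sketch below assumes the o-series reasoning_effort parameter behaves as publicly documented; the model name and setting are illustrative.

```python
from openai import OpenAI

client = OpenAI()

# o-series models can be told how long to deliberate before answering.
# "reasoning_effort" trades latency and cost for deeper internal reasoning;
# the model name and the "high" setting are assumptions for illustration.
response = client.chat.completions.create(
    model="o3",
    reasoning_effort="high",
    messages=[
        {
            "role": "user",
            "content": "Outline a phased plan for migrating a legacy billing system to the cloud.",
        }
    ],
)

print(response.choices[0].message.content)
```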



How This All Leads to GPT-5



Sam Altman has made it clear: the next major release, GPT-5, will unify OpenAI's fragmented model stack. The distinctions between the GPT series, the o-series, and plug-in tools are set to disappear. Instead, we'll get one intelligence that scales up or down as needed.


Expected features include:


  • Truly unified multimodal interactions (text, image, voice, canvas)

  • Deep memory integration for persistent context across time and use cases

  • Adaptive reasoning that modulates depth and speed depending on task complexity

  • Agentic behaviour, where the model can take initiative and combine tools to achieve goals



From the outside, GPT-5 might feel like magic. But under the hood, it will be o3’s lineage that makes it possible.



Why This Matters for Business and Beyond


For professionals exploring AI integration, especially for complex workflows, the o3-to-GPT-5 journey is more than technical evolution. It signals maturity in three key areas:


  1. Trust in reasoning: These models are finally making fewer critical errors on hard tasks.

  2. Speed without dumbing down: Even sophisticated queries return usable, well-reasoned answers within seconds.

  3. Scalability with specificity: From real-time conversations to large document analysis, a single system can now adapt.


Only time will tell whether GPT-5 will be what the AI community expects, something more, or something entirely different. What we do know is that the models in ChatGPT are already starting to behave differently, which suggests changes are under way in the back end.

 
 
 
