Search...

Index

Inline evaluation / Guardrails: Ensure good system performance at run-time

This is some text inside of a div block.

GPT-5: What should we expect?

Learn more about the expected GPT-5 features on improved reasoning, multimodality and accuracy on math & coding

Author

Aug 30, 2024

If you work with LLMs, you probably wait for each version of OpenAI’s GPT series with excitement. It’s reminiscent of the early iPhone days, where each subsequent model was touted as a significant upgrade from the predecessor. This time, the excitement is shared by both consumers and enterprises, as these innovations set the foundations of many advanced AI systems.

To date, this year has been dominated by GPT-4. More specifically, the derivative GPT-4 Turbo and GPT-4o/Omni model have been at the forefront. Turbo significantly improved GPT-4’s accuracy, and Omni extended GPT-4’s reasoning and interfacing to voice/audio.

Now, we’re expecting OpenAI to debut its next major installment in the GPT series sometime at the end of this year or early 2025. The timing aligns with their 1-2 year cadence of releasing major models.

Knowing OpenAI.. they’ll probably launch in 2025.

But what are developers expecting from this new model?

Let’s cover the latest, the timelines — and the expectations from developers who build with LLMs.

‍

Double Launch: Project Strawberry and Project Orion

According to two insiders, there are two models associated with OpenAI’s next launch—Project Strawberry and Project Orion (yes, we know, it sounds a little silly). The former is a brand-new type of model, tackling the coveted problem of reasoning. The other is the actual successor to GPT-4.

What is Project Strawberry?

Previously known as Q*, Project Strawberry is OpenAI’s most secretive project. The latest info suggests we might see a distilled version of this model as soon as this fall, and it’s expected to:

Solve new math problems it’s never encountered before (but how?)
Take time to “think deeply” when planning its answers
Offer advanced reasoning capabilities that you can toggle on or off depending on how quickly you need a response

But, there is another project that’s being talked about now: Project Orion!

What is Project Orion?

Project Orion is expected to be the next flagship model by OpenAI. What’s novel, however, is that GPT-5 is not just being trained on direct Internet data, but also synthetic data that’s being generated by Project Strawberry.

To visualize this through an oversimplification, Project Strawberry would download and digest a paper on, say, chemical titration, generate an abundance of data that’s ingestible to an LLM, and then train Project Orion on it so that it could tackle chemistry problems. This was harder before, because those reasoning problems were presented in irregular and sparse ways across the existing Internet.

This sounds too good to be true — We’re definitely excited to try this one!

It’s worth saying that although these are just rumors, as these things go, they become realities in a few months in.

But what is OpenAI really doing behind the curtain to enable these features?

‍

The Engine Driving GPT-5’s Capabilities

We put our thinking cap and talked with some experts to understand how OpenAI might be pushing the boundaries for their new models. Here’re three interesting observations:

Improved Reasoning with Built-in Prompting Techniques

These days, there are tons of studies showing what works well when prompting LLMs. One particularly effective method is Chain of Thought prompting, which helps the model reason more effectively. Also, Anthropic introduced the “Thinking” step where the model first lays out its reasoning in an XML tags, before answering the question in another tag.

So, what’s stopping OpenAI from incorporating these techniques into the model, allowing it to perform these steps behind the scenes before delivering an answer?

People are also expecting the model to rank responses internally, evaluating options before selecting the best one to output.

One downside might be slower responses—but they could include an option to toggle this feature on or off, as some rumors suggest.

Knowing when it’s wrong

When it comes to LLM hallucinations — you can’t really take them out completely, because they’ll hinder the models ability to be “creative”. So finding the right balance between the two has been something that a lot of people have been thinking about.

Logprobs was a great feature that helped with this. Logprobs in LLMs indicate the model’s confidence in each generated token, showing how likely it is to be the correct choice based on the preceding context. Higher log probabilities mean the token is more likely in that context, helping users see the model’s confidence in its output or explore other options the model considered.

So, maybe we’ll eventually see logprobs as a built-in feature, allowing the model to be more “confident” in its answers right from the start.

High Multimodality Improvements

Today, using GPT-4o for data extractions from images/pdfs is very constrained — but that can change very fast. Mostly because we’ve been using GPT-4o for a while now, and they can utilize all of our data (pdfs, images..) to improve the mutimodal capability of the next model a.k.a GPT-5 or alike.

The biggest jump from GPT-3.5 to GPT-4 came from widespread adoption, and now OpenAI has doubled its users since last year—so the training is in full swing!

‍

What do customers need from GPT-5?

We scoured the internet and asked our customers — “What do you actually need from GPT-5 to improve your systems?”. We got a really obvious answer:

The most hoped for capabilities are increased context windows, improved reasoning and multimodal functionality, lower hallucinations, and, of course, accuracy bumps across all benchmarks (especially coding & math) at a lower price.

In short, many things that didn’t work before should suddenly start working. We’re expecting a leap similar to the one from GPT-3.5 to GPT-4 — faster, cheaper and more powerful models.

‍

GPT-5 Release Date?

The rumors suggest that we might get an early version of Project Orion (aka GPT-5) this fall, but knowing OpenAI — plan for 2025.

Thus far, OpenAI has been releasing major models every 2 years, with some intermediary models in-between. The major models (e.g. GPT-2, GPT-3, etc.) have featured sizable leaps. The intermediary GPT models (e.g. GPT-3.5, GPT-4 Turbo, GPT-4o) overcame the most immediate hiccups that held back the respective flagship model.

A Quick Timeline of GPT releases

Let’s review a timeline of the previous GPT AI models. While most users have only learned about GPT recently, it’s been going through iterations for over a half-decade.

GPT Version	Launch Date	Details
GPT-1	June 2018	GPT-1 was OpenAI's inaugural flagship model, trained on just 40GB of data. It could rephrase and generate text, and do some translation. It could only respond to fairly short sentences.
GPT-2	February 2019	GPT-2 was trained on significantly more text, with over 1.5B parameters. It could maintain coherence and relevance far better than GPT-1.
GPT-3	June 2020	GPT-3 was trained on 570GB of text data with over 175B parameters. This time, GPT-3 was trained on large corpuses of knowledge such as Wikipedia. GPT-3 was a significant leap over GPT-2, attaining major spotlight. It was also criticized by the general public for biases.
GPT-3.5	December 2022	GPT-3.5 was similar to GPT-3, but it featured some additional techniques such as Reinforcement Learning with Human Feedback (RLHF) to make responses more human-friendly, allowing it to parse intent better.
GPT-4	March 2023	Like previous iterations, GPT-4 was trained on more data. This dramatically expanded its knowledge base. It also was better at cracking down on disallowed content. The difference between GPT-4 and GPT-3.5 is minimal for small tasks, but sizable for big ones. It also integrated images as a valid input.
GPT-4 Turbo	November 2023	GPT-4 Turbo is a faster version of GPT-4 with some improved accuracy by cracking down on hallucinations.
GPT-4o	May 2024	GPT-4o expanded the multimodality of GPT-4, now able to handle not just text and images, but also voice interactions, allowing an end user to speak directly to it.
GPT-5	Rumored late 2024 or early 2025	Release date TBD. Should be the most advanced AI model to date.

‍

Thus far, what has Sam Altman hinted at?

As goes the tune of many Reddit comments, we know to take OpenAI CEO Sam Altman’s comments with a grain of salt. It’s not a matter of dishonesty; hype cycles are just games of exaggeration and headlines, and his opinions are often harvested by online discourse for the spiciest bits. Rumors of previous models have always wavered from existential dread to praising monumental advancements. (Admittedly, both descriptors can coexist, but in our experience, GPT models rarely amount to either extreme.)

Regardless, Sam’s, Open AI’s, and OpenAI’s partners’ comments still matter. So far, there have been two major themes hinted for GPT-5's advantages over previous versions: multimodality and reasoning.

Multimodality

GPT-4o’s hallmark achievement was allowing users to interact with it via speech. It also integrated with DALLE, enabling users to requested generated images related to the conversation.

This trend will only continue with GPT-5, according to Sam Altman. GPT-5’s flagship feature will also be multimodality, with text, images, and videos being valid inputs and available outputs. Unlike GPT-4o, GPT-5 should be able to work with audiovisual data seamlessly, where there is a consistent thread between them, not just one-off generations.

This has been a constant gripe from our customers with the current model, where images aren’t consistent with one another, making the multimodality feel more like an integration than a native feature. GPT-5 should fix that.

Better “Reasoning”

While GPT cannot reason from an anthropomorphic standpoint, it can simulate reasoning through probabilistic inference.

On Bill Gates’s Unconfuse Me, Sam Altman spoke on how GPT-4o featured major reasoning (and accuracy) improvements over GPT-4, and that trend should continue with GPT-5 due to the sizable leaps in training size. Microsoft’s CTO, Kevin Scott, was more forthright with GPT-5’s promise, expecting it could “pass your qualifying exams when you’re a PhD student” and that everybody will be impressed by “reasoning breakthroughs” of the model.

‍

How much will GPT-5 cost?

It is difficult to guess how much GPT-5 will cost. However, OpenAI has had a history of releasing the new flagship model at an expensive price, but then trimming the cost with subsequent models that are more streamlined and limited. We could expect the same pattern for GPT-5, especially if it can tackle niche tasks.

Conclusion

In short, all of this talk around the next GPT model—whether it’s Project Strawberry or Project Orion—is real, and everyone’s feeling it, from developers to businesses.

These new models promise to take things to the next level with smarter reasoning, better handling of different types of media, and overall stronger performance.

But as we look forward to these cool new features, we also need to think about the trade-offs, like how fast it responds and how accurate it is.

Whether OpenAI rolls these out later this year or in 2025, one thing’s clear: the next GPT model is going to shake things up in a big way.

Knowing OpenAI.. they’ll probably launch in 2025.

But what are developers expecting from this new model?

Let’s cover the latest, the timelines — and the expectations from developers who build with LLMs.

‍

Double Launch: Project Strawberry and Project Orion

What is Project Strawberry?

Solve new math problems it’s never encountered before (but how?)
Take time to “think deeply” when planning its answers
Offer advanced reasoning capabilities that you can toggle on or off depending on how quickly you need a response

But, there is another project that’s being talked about now: Project Orion!

What is Project Orion?

This sounds too good to be true — We’re definitely excited to try this one!

It’s worth saying that although these are just rumors, as these things go, they become realities in a few months in.

But what is OpenAI really doing behind the curtain to enable these features?

‍

The Engine Driving GPT-5’s Capabilities

We put our thinking cap and talked with some experts to understand how OpenAI might be pushing the boundaries for their new models. Here’re three interesting observations:

Improved Reasoning with Built-in Prompting Techniques

So, what’s stopping OpenAI from incorporating these techniques into the model, allowing it to perform these steps behind the scenes before delivering an answer?

People are also expecting the model to rank responses internally, evaluating options before selecting the best one to output.

One downside might be slower responses—but they could include an option to toggle this feature on or off, as some rumors suggest.

Knowing when it’s wrong

So, maybe we’ll eventually see logprobs as a built-in feature, allowing the model to be more “confident” in its answers right from the start.

High Multimodality Improvements

The biggest jump from GPT-3.5 to GPT-4 came from widespread adoption, and now OpenAI has doubled its users since last year—so the training is in full swing!

‍

What do customers need from GPT-5?

We scoured the internet and asked our customers — “What do you actually need from GPT-5 to improve your systems?”. We got a really obvious answer:

In short, many things that didn’t work before should suddenly start working. We’re expecting a leap similar to the one from GPT-3.5 to GPT-4 — faster, cheaper and more powerful models.

‍

GPT-5 Release Date?

The rumors suggest that we might get an early version of Project Orion (aka GPT-5) this fall, but knowing OpenAI — plan for 2025.

A Quick Timeline of GPT releases

Let’s review a timeline of the previous GPT AI models. While most users have only learned about GPT recently, it’s been going through iterations for over a half-decade.

GPT Version	Launch Date	Details
GPT-1	June 2018	GPT-1 was OpenAI's inaugural flagship model, trained on just 40GB of data. It could rephrase and generate text, and do some translation. It could only respond to fairly short sentences.
GPT-2	February 2019	GPT-2 was trained on significantly more text, with over 1.5B parameters. It could maintain coherence and relevance far better than GPT-1.
GPT-3	June 2020	GPT-3 was trained on 570GB of text data with over 175B parameters. This time, GPT-3 was trained on large corpuses of knowledge such as Wikipedia. GPT-3 was a significant leap over GPT-2, attaining major spotlight. It was also criticized by the general public for biases.
GPT-3.5	December 2022	GPT-3.5 was similar to GPT-3, but it featured some additional techniques such as Reinforcement Learning with Human Feedback (RLHF) to make responses more human-friendly, allowing it to parse intent better.
GPT-4	March 2023	Like previous iterations, GPT-4 was trained on more data. This dramatically expanded its knowledge base. It also was better at cracking down on disallowed content. The difference between GPT-4 and GPT-3.5 is minimal for small tasks, but sizable for big ones. It also integrated images as a valid input.
GPT-4 Turbo	November 2023	GPT-4 Turbo is a faster version of GPT-4 with some improved accuracy by cracking down on hallucinations.
GPT-4o	May 2024	GPT-4o expanded the multimodality of GPT-4, now able to handle not just text and images, but also voice interactions, allowing an end user to speak directly to it.
GPT-5	Rumored late 2024 or early 2025	Release date TBD. Should be the most advanced AI model to date.

‍

Thus far, what has Sam Altman hinted at?

Multimodality

GPT-4o’s hallmark achievement was allowing users to interact with it via speech. It also integrated with DALLE, enabling users to requested generated images related to the conversation.

Better “Reasoning”

While GPT cannot reason from an anthropomorphic standpoint, it can simulate reasoning through probabilistic inference.

‍

How much will GPT-5 cost?

Conclusion

In short, all of this talk around the next GPT model—whether it’s Project Strawberry or Project Orion—is real, and everyone’s feeling it, from developers to businesses.

These new models promise to take things to the next level with smarter reasoning, better handling of different types of media, and overall stronger performance.

But as we look forward to these cool new features, we also need to think about the trade-offs, like how fast it responds and how accurate it is.

Whether OpenAI rolls these out later this year or in 2025, one thing’s clear: the next GPT model is going to shake things up in a big way.

ABOUT THE AUTHOR

Mathew Pregasen

Technical Contributor

Mathew Pregasen is a technical expert with experience with AI, infrastructure, security, and frontend frameworks. He contributes to multiple technical publications and is an alumnus of Columbia University and YCombinator.

Anita Kirkovska

Founding Growth Lead

An AI expert with a strong ML background, specializing in GenAI and LLM education. A former Fulbright scholar, she leads Growth and Education at Vellum, helping companies build and scale AI products. She conducts LLM evaluations and writes extensively on AI best practices, empowering business leaders to drive effective AI adoption.

talk with an AI Expert

Product Updates

July 1, 2025

•

6 min

Vellum Product Update | May & June

LLM basics

June 8, 2025

•

5 min

Big Ideas from the AI Engineer World’s Fair

LLM basics

June 1, 2025

•

8 min

Build AI Products Faster: Top Development Platforms Compared

Customer Stories

May 30, 2025

•

5 min

How GravityStack Cut Credit Agreement Review Time by 200% with Agentic AI

Guides

May 28, 2025

•

7 min

How the Best Product and Engineering Teams Ship AI Solutions

Model Comparisons

May 23, 2025

•

8 min

Evaluation: Claude 4 Sonnet vs OpenAI o4-mini vs Gemini 2.5 Pro

The Best AI Tips — Direct To Your Inbox

Latest AI news, tips, and techniques

Specific tips for Your AI use cases

No spam

Oops! Something went wrong while submitting the form.

Each issue is packed with valuable resources, tools, and insights that help us stay ahead in AI development. We've discovered strategies and frameworks that boosted our efficiency by 30%, making it a must-read for anyone in the field.

Marina Trajkovska

Head of Engineering