If you work with LLMs, you probably wait for each version of OpenAI’s GPT series with excitement. It’s reminiscent of the early iPhone days, where each subsequent model was touted as a significant upgrade from the predecessor. This time, the excitement is shared by both consumers and enterprises, as these innovations set the foundations of many advanced AI systems.
To date, this year has been dominated by GPT-4. More specifically, the derivative GPT-4 Turbo and GPT-4o (Omni) models have been at the forefront. Turbo significantly improved GPT-4’s accuracy, and Omni extended GPT-4’s reasoning and interfacing to voice and audio.
Now, we’re expecting OpenAI to debut its next major installment in the GPT series sometime at the end of this year or early 2025. The timing aligns with their 1-2 year cadence of releasing major models.
Knowing OpenAI… they’ll probably launch in 2025.
But what are developers expecting from this new model?
Let’s cover the latest, the timelines — and the expectations from developers who build with LLMs.
According to two insiders, there are two models associated with OpenAI’s next launch: Project Strawberry and Project Orion (yes, we know, the names sound a little silly). The former is a brand-new type of model, tackling the coveted problem of reasoning. The latter is the actual successor to GPT-4.
What is Project Strawberry?
Previously known as Q*, Project Strawberry is OpenAI’s most secretive project. The latest info suggests we might see a distilled version of this model as soon as this fall, and it’s expected to:
- Solve new math problems it’s never encountered before (but how?)
- Take time to “think deeply” when planning its answers
- Offer advanced reasoning capabilities that you can toggle on or off depending on how quickly you need a response
But there’s another project being talked about now: Project Orion!
What is Project Orion?
Project Orion is expected to be the next flagship model from OpenAI. What’s novel, however, is that GPT-5 is reportedly being trained not just on raw Internet data, but also on synthetic data generated by Project Strawberry.
To visualize this through an oversimplification: Project Strawberry would download and digest a paper on, say, chemical titration, generate an abundance of LLM-ingestible data from it, and that data would then be used to train Project Orion so it could tackle chemistry problems. This was harder before, because reasoning problems like these appear only sparsely, and in irregular forms, across the existing Internet.
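To make that concrete, here’s a rough sketch of what such a pipeline could look like in practice. Everything here is speculative: the model name, the helper functions, and the fine-tuning format are stand-ins we made up for illustration, not anything OpenAI has described.

```python
# Hypothetical sketch: a "teacher" model turns a source document into synthetic
# training examples that a "student" model could later be fine-tuned on.
import json
from openai import OpenAI

client = OpenAI()

def generate_synthetic_examples(source_text: str, n: int = 5) -> list[dict]:
    """Ask a strong teacher model for reasoning Q&A pairs grounded in source_text."""
    prompt = (
        f"Read the following text and write {n} question-and-answer pairs that "
        "require step-by-step reasoning about it. Respond with JSON: an object "
        'with a single key "pairs" holding a list of question/answer objects.\n\n'
        f"{source_text}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # stand-in for the rumored Strawberry teacher model
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)["pairs"]

def to_finetune_rows(examples: list[dict]) -> list[dict]:
    """Format the pairs as chat-style rows a student model could be trained on."""
    return [
        {"messages": [
            {"role": "user", "content": ex["question"]},
            {"role": "assistant", "content": ex["answer"]},
        ]}
        for ex in examples
    ]
```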
This sounds almost too good to be true, and we’re definitely excited to try it!
It’s worth saying that these are just rumors, but as these things go, rumors like this tend to become reality within a few months.
But what is OpenAI really doing behind the curtain to enable these features?
We put our thinking caps on and talked with some experts to understand how OpenAI might be pushing the boundaries with these new models. Here are three interesting observations:
Improved Reasoning with Built-in Prompting Techniques
These days, there are tons of studies showing what works well when prompting LLMs. One particularly effective method is Chain of Thought prompting, which helps the model reason more effectively. Anthropic also introduced a “thinking” step, where the model first lays out its reasoning inside an XML tag before answering the question in another tag.
So, what’s stopping OpenAI from incorporating these techniques into the model, allowing it to perform these steps behind the scenes before delivering an answer?
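For reference, here’s a minimal sketch of what that looks like today when you do it yourself at the prompt level, using the standard Chat Completions API. The tag names and prompt wording are just one common convention, not anything OpenAI has confirmed it will build in.

```python
# Minimal sketch: ask the model to reason inside <thinking> tags before answering,
# then strip the reasoning so only the final answer is shown to the user.
import re
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "Before answering, think through the problem step by step inside <thinking> tags. "
    "Then give only the final answer inside <answer> tags."
)

def answer_with_hidden_reasoning(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    text = response.choices[0].message.content
    match = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return match.group(1).strip() if match else text
```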
People are also expecting the model to rank responses internally, evaluating options before selecting the best one to output.
One downside might be slower responses—but they could include an option to toggle this feature on or off, as some rumors suggest.
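A crude way to picture that internal ranking (and the toggle) with today’s tools: sample a few candidate answers, have the model score each one, and return only the best. Again, this is purely a sketch of the idea; the `deep_reasoning` flag is our own invention, not a real API parameter.

```python
# Speculative sketch of "internal ranking": generate n candidates, score them with
# a judge prompt, and return the winner. deep_reasoning=False skips the extra work.
from openai import OpenAI

client = OpenAI()

def best_of_n(question: str, deep_reasoning: bool = True, n: int = 3) -> str:
    if not deep_reasoning:
        n = 1  # faster, cheaper path when the toggle is off
    candidates = [
        client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": question}],
            temperature=1.0,
        ).choices[0].message.content
        for _ in range(n)
    ]
    if len(candidates) == 1:
        return candidates[0]

    def score(answer: str) -> float:
        judge = client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "user",
                "content": f"Rate this answer to '{question}' from 1 to 10. "
                           f"Reply with a number only.\n\n{answer}",
            }],
        )
        try:
            return float(judge.choices[0].message.content.strip())
        except ValueError:
            return 0.0

    return max(candidates, key=score)
```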
Knowing when it’s wrong
When it comes to LLM hallucinations, you can’t really eliminate them completely without hindering the model’s ability to be “creative.” So a lot of people have been thinking about how to strike the right balance between factuality and creativity.
Logprobs are a great feature that helps with this. Logprobs in LLMs indicate the model’s confidence in each generated token, showing how likely it is to be the correct choice given the preceding context. Higher log probabilities mean the token is more likely in that context, helping users gauge the model’s confidence in its output or explore other options the model considered.
So, maybe we’ll eventually see this kind of confidence signal baked into the model’s behavior itself, letting it flag or reconsider answers it isn’t sure about right from the start.
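Logprobs are already exposed in the API today, so here’s a small example of inspecting per-token confidence with them. This assumes the current Chat Completions interface; exponentiating a logprob gives a rough 0–1 probability for each token.

```python
# Inspect per-token confidence using the existing logprobs option.
import math
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What year did the first iPhone launch?"}],
    logprobs=True,
    top_logprobs=3,  # also return the runner-up tokens the model considered
)

for token_info in response.choices[0].logprobs.content:
    prob = math.exp(token_info.logprob)  # convert log probability back to 0-1
    print(f"{token_info.token!r}: {prob:.2%}")
```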
Big Multimodality Improvements
Today, using GPT-4o for data extraction from images and PDFs is very constrained, but that can change fast. We’ve all been using GPT-4o for a while now, and OpenAI can use that usage data (PDFs, images, and so on) to improve the multimodal capabilities of the next model, i.e., GPT-5 or whatever it ends up being called.
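For context, here’s roughly what that image-based extraction looks like with GPT-4o today. The invoice URL and the fields we ask for are made up for illustration.

```python
# Sketch of image-based data extraction with GPT-4o: pass an image URL alongside
# a text instruction and ask for structured output.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Extract the vendor name, invoice date, and total as JSON."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/sample-invoice.png"}},
        ],
    }],
)

print(response.choices[0].message.content)
```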
The biggest jump from GPT-3.5 to GPT-4 came from widespread adoption, and now OpenAI has doubled its users since last year—so the training is in full swing!
We scoured the internet and asked our customers: “What do you actually need from GPT-5 to improve your systems?” We got a really obvious answer:
The most hoped-for capabilities are larger context windows, improved reasoning and multimodal functionality, fewer hallucinations, and, of course, accuracy bumps across all benchmarks (especially coding and math) at a lower price.
In short, many things that didn’t work before should suddenly start working. We’re expecting a leap similar to the one from GPT-3.5 to GPT-4 — faster, cheaper and more powerful models.
The rumors suggest that we might get an early version of Project Orion (aka GPT-5) this fall, but knowing OpenAI — plan for 2025.
Thus far, OpenAI has been releasing major models every 2 years, with some intermediary models in-between. The major models (e.g. GPT-2, GPT-3, etc.) have featured sizable leaps. The intermediary GPT models (e.g. GPT-3.5, GPT-4 Turbo, GPT-4o) overcame the most immediate hiccups that held back the respective flagship model.
A Quick Timeline of GPT releases
Let’s review a timeline of the previous GPT AI models. While most users have only learned about GPT recently, it’s been going through iterations for over a half-decade.
As goes the tune of many Reddit comments, we know to take OpenAI CEO Sam Altman’s comments with a grain of salt. It’s not a matter of dishonesty; hype cycles are just games of exaggeration and headlines, and his opinions are often harvested by online discourse for the spiciest bits. Rumors of previous models have always wavered from existential dread to praising monumental advancements. (Admittedly, both descriptors can coexist, but in our experience, GPT models rarely amount to either extreme.)
Regardless, the comments from Sam, OpenAI, and OpenAI’s partners still matter. So far, there have been two major themes hinted at for GPT-5’s advantages over previous versions: multimodality and reasoning.
Multimodality
GPT-4o’s hallmark achievement was allowing users to interact with it via speech. It also integrated with DALL·E, enabling users to request generated images related to the conversation.
This trend will only continue with GPT-5, according to Sam Altman. GPT-5’s flagship feature will also be multimodality, with text, images, and videos being valid inputs and available outputs. Unlike GPT-4o, GPT-5 should be able to work with audiovisual data seamlessly, where there is a consistent thread between them, not just one-off generations.
This has been a constant gripe from our customers with the current model, where images aren’t consistent with one another, making the multimodality feel more like an integration than a native feature. GPT-5 should fix that.
Better “Reasoning”
While GPT cannot reason from an anthropomorphic standpoint, it can simulate reasoning through probabilistic inference.
On Bill Gates’s Unconfuse Me podcast, Sam Altman spoke about how GPT-4o featured major reasoning (and accuracy) improvements over GPT-4, and that trend should continue with GPT-5 thanks to sizable leaps in training scale. Microsoft’s CTO, Kevin Scott, was more forthright about GPT-5’s promise, expecting it could “pass your qualifying exams when you’re a PhD student” and that everybody will be impressed by the model’s “reasoning breakthroughs.”
It is difficult to guess how much GPT-5 will cost. However, OpenAI has a history of releasing a new flagship model at a premium price, then trimming the cost with subsequent models that are more streamlined and limited. We can expect the same pattern for GPT-5, especially if it can tackle niche tasks.
Conclusion
In short, all of this talk around the next GPT model—whether it’s Project Strawberry or Project Orion—is real, and everyone’s feeling it, from developers to businesses.
These new models promise to take things to the next level with smarter reasoning, better handling of different types of media, and overall stronger performance.
But as we look forward to these cool new features, we also need to think about the trade-offs, like how fast it responds and how accurate it is.
Whether OpenAI rolls these out later this year or in 2025, one thing’s clear: the next GPT model is going to shake things up in a big way.