Guides
August 30, 2024

GPT-5: What should we expect?

Guest Post
Co-authors
Mathew Pregasen
Anita Kirkovska

If you work with LLMs, you probably wait for each version of OpenAI’s GPT series with excitement. It’s reminiscent of the early iPhone days, where each subsequent model was touted as a significant upgrade from the predecessor. This time, the excitement is shared by both consumers and enterprises, as these innovations set the foundations of many advanced AI systems.

To date, this year has been dominated by GPT-4. More specifically, the derivative GPT-4 Turbo and GPT-4o/Omni model have been at the forefront. Turbo significantly improved GPT-4’s accuracy, and Omni extended GPT-4’s reasoning and interfacing to voice/audio.

Now, we’re expecting OpenAI to debut its next major installment in the GPT series sometime at the end of this year or early 2025. The timing aligns with their 1-2 year cadence of releasing major models.

Knowing OpenAI.. they’ll probably launch in 2025.

But what are developers expecting from this new model?

Let’s cover the latest, the timelines — and the expectations from developers who build with LLMs.

Learn how successful companies build with AI

Download this practical guide and enable your teams to innovate with AI.
Get Free Copy
If you want to compare more models, check our LLM Leaderboard here or book a demo to start using Vellum Evaluations to run these tests at scale.
If you want to evaluate models for your use-case book a call with one of our experts, and we’ll help you figure it out.
Read the whole analysis in the sections that follow, and sign up for our newsletter if you want to get these analyses in your inbox!
Inspired by this, we've designed Vellum to meet these needs, and now many product and engineering teams use our suite of tools—Workflows, Evaluations, and Deployments—to build agentic workflows.

Build a Production-Ready AI System

Platform for product and engineering teams to build, deploy, and improve AI products.
Learn More

LLM orchestration with Vellum

Platform for product and engineering teams to build, deploy, and improve AI products.
Learn More

Double Launch: Project Strawberry and Project Orion

According to two insiders, there are two models associated with OpenAI’s next launch—Project Strawberry and Project Orion (yes, we know, it sounds a little silly). The former is a brand-new type of model, tackling the coveted problem of reasoning. The other is the actual successor to GPT-4.

What is Project Strawberry?

Previously known as Q*, Project Strawberry is OpenAI’s most secretive project. The latest info suggests we might see a distilled version of this model as soon as this fall, and it’s expected to:

  • Solve new math problems it’s never encountered before (but how?)
  • Take time to “think deeply” when planning its answers
  • Offer advanced reasoning capabilities that you can toggle on or off depending on how quickly you need a response

But, there is another project that’s being talked about now: Project Orion!

What is Project Orion?

Project Orion is expected to be the next flagship model by OpenAI. What’s novel, however, is that GPT-5 is not just being trained on direct Internet data, but also synthetic data that’s being generated by Project Strawberry.

To visualize this through an oversimplification, Project Strawberry would download and digest a paper on, say, chemical titration, generate an abundance of data that’s ingestible to an LLM, and then train Project Orion on it so that it could tackle chemistry problems. This was harder before, because those reasoning problems were presented in irregular and sparse ways across the existing Internet.

This sounds too good to be true — We’re definitely excited to try this one!

It’s worth saying that although these are just rumors, as these things go, they become realities in a few months in.

But what is OpenAI really doing behind the curtain to enable these features?

The Engine Driving GPT-5’s Capabilities

We put our thinking cap and talked with some experts to understand how OpenAI might be pushing the boundaries for their new models. Here’re three interesting observations:

Improved Reasoning with Built-in Prompting Techniques

These days, there are tons of studies showing what works well when prompting LLMs. One particularly effective method is Chain of Thought prompting, which helps the model reason more effectively. Also, Anthropic introduced the “Thinking” step where the model first lays out its reasoning in an XML tags, before answering the question in another tag.

So, what’s stopping OpenAI from incorporating these techniques into the model, allowing it to perform these steps behind the scenes before delivering an answer?

People are also expecting the model to rank responses internally, evaluating options before selecting the best one to output.

One downside might be slower responses—but they could include an option to toggle this feature on or off, as some rumors suggest.

Knowing when it’s wrong

When it comes to LLM hallucinations — you can’t really take them out completely, because they’ll hinder the models ability to be “creative”. So finding the right balance between the two has been something that a lot of people have been thinking about.

Logprobs was a great feature that helped with this. Logprobs in LLMs indicate the model’s confidence in each generated token, showing how likely it is to be the correct choice based on the preceding context. Higher log probabilities mean the token is more likely in that context, helping users see the model’s confidence in its output or explore other options the model considered.

So, maybe we’ll eventually see logprobs as a built-in feature, allowing the model to be more “confident” in its answers right from the start.

High Multimodality Improvements

Today, using GPT-4o for data extractions from images/pdfs is very constrained — but that can change very fast. Mostly because we’ve been using GPT-4o for a while now, and they can utilize all of our data (pdfs, images..) to improve the mutimodal capability of the next model a.k.a GPT-5 or alike.

The biggest jump from GPT-3.5 to GPT-4 came from widespread adoption, and now OpenAI has doubled its users since last year—so the training is in full swing!

What do customers need from GPT-5?

We scoured the internet and asked our customers — “What do you actually need from GPT-5 to improve your systems?”. We got a really obvious answer:

The most hoped for capabilities are increased context windows, improved reasoning and multimodal functionality, lower hallucinations, and, of course, accuracy bumps across all benchmarks (especially coding & math) at a lower price.

In short, many things that didn’t work before should suddenly start working. We’re expecting a leap similar to the one from GPT-3.5 to GPT-4 — faster, cheaper and more powerful models.

GPT-5 Release Date?

The rumors suggest that we might get an early version of Project Orion (aka GPT-5) this fall, but knowing OpenAI — plan for 2025.

Thus far, OpenAI has been releasing major models every 2 years, with some intermediary models in-between. The major models (e.g. GPT-2, GPT-3, etc.) have featured sizable leaps. The intermediary GPT models (e.g. GPT-3.5, GPT-4 Turbo, GPT-4o) overcame the most immediate hiccups that held back the respective flagship model.

A Quick Timeline of GPT releases

Let’s review a timeline of the previous GPT AI models. While most users have only learned about GPT recently, it’s been going through iterations for over a half-decade.

GPT Version Launch Date Details
GPT-1 June 2018 GPT-1 was OpenAI's inaugural flagship model, trained on just 40GB of data. It could rephrase and generate text, and do some translation. It could only respond to fairly short sentences.
GPT-2 February 2019 GPT-2 was trained on significantly more text, with over 1.5B parameters. It could maintain coherence and relevance far better than GPT-1.
GPT-3 June 2020 GPT-3 was trained on 570GB of text data with over 175B parameters. This time, GPT-3 was trained on large corpuses of knowledge such as Wikipedia. GPT-3 was a significant leap over GPT-2, attaining major spotlight. It was also criticized by the general public for biases.
GPT-3.5 December 2022 GPT-3.5 was similar to GPT-3, but it featured some additional techniques such as Reinforcement Learning with Human Feedback (RLHF) to make responses more human-friendly, allowing it to parse intent better.
GPT-4 March 2023 Like previous iterations, GPT-4 was trained on more data. This dramatically expanded its knowledge base. It also was better at cracking down on disallowed content. The difference between GPT-4 and GPT-3.5 is minimal for small tasks, but sizable for big ones. It also integrated images as a valid input.
GPT-4 Turbo November 2023 GPT-4 Turbo is a faster version of GPT-4 with some improved accuracy by cracking down on hallucinations.
GPT-4o May 2024 GPT-4o expanded the multimodality of GPT-4, now able to handle not just text and images, but also voice interactions, allowing an end user to speak directly to it.
GPT-5 Rumored late 2024 or early 2025 Release date TBD. Should be the most advanced AI model to date.

Thus far, what has Sam Altman hinted at?

As goes the tune of many Reddit comments, we know to take OpenAI CEO Sam Altman’s comments with a grain of salt. It’s not a matter of dishonesty; hype cycles are just games of exaggeration and headlines, and his opinions are often harvested by online discourse for the spiciest bits. Rumors of previous models have always wavered from existential dread to praising monumental advancements. (Admittedly, both descriptors can coexist, but in our experience, GPT models rarely amount to either extreme.)

Regardless, Sam’s, Open AI’s, and OpenAI’s partners’ comments still matter. So far, there have been two major themes hinted for GPT-5's advantages over previous versions: multimodality and reasoning.

Multimodality

GPT-4o’s hallmark achievement was allowing users to interact with it via speech. It also integrated with DALLE, enabling users to requested generated images related to the conversation.

This trend will only continue with GPT-5, according to Sam Altman. GPT-5’s flagship feature will also be multimodality, with text, images, and videos being valid inputs and available outputs. Unlike GPT-4o, GPT-5 should be able to work with audiovisual data seamlessly, where there is a consistent thread between them, not just one-off generations.

This has been a constant gripe from our customers with the current model, where images aren’t consistent with one another, making the multimodality feel more like an integration than a native feature. GPT-5 should fix that.

Better “Reasoning”

While GPT cannot reason from an anthropomorphic standpoint, it can simulate reasoning through probabilistic inference.

On Bill Gates’s Unconfuse Me, Sam Altman spoke on how GPT-4o featured major reasoning (and accuracy) improvements over GPT-4, and that trend should continue with GPT-5 due to the sizable leaps in training size. Microsoft’s CTO, Kevin Scott, was more forthright with GPT-5’s promise, expecting it could “pass your qualifying exams when you’re a PhD student” and that everybody will be impressed by “reasoning breakthroughs” of the model.

How much will GPT-5 cost?

It is difficult to guess how much GPT-5 will cost. However, OpenAI has had a history of releasing the new flagship model at an expensive price, but then trimming the cost with subsequent models that are more streamlined and limited. We could expect the same pattern for GPT-5, especially if it can tackle niche tasks.

Conclusion

In short, all of this talk around the next GPT model—whether it’s Project Strawberry or Project Orion—is real, and everyone’s feeling it, from developers to businesses.

These new models promise to take things to the next level with smarter reasoning, better handling of different types of media, and overall stronger performance.

But as we look forward to these cool new features, we also need to think about the trade-offs, like how fast it responds and how accurate it is.

Whether OpenAI rolls these out later this year or in 2025, one thing’s clear: the next GPT model is going to shake things up in a big way.

TABLE OF CONTENTS

Join 10,000+ developers staying up-to-date with the latest AI techniques and methods.
🎉
Thanks for joining our newsletter.
Oops! Something went wrong.

About the authors

Mathew Pregasen

Technical Contributor

Mathew Pregasen is a technical expert with experience with AI, infrastructure, security, and frontend frameworks. He contributes to multiple technical publications and is an alumnus of Columbia University and YCombinator.

Anita Kirkovska

Founding GenAI Growth at Vellum

Anita Kirkovska, is currently leading Growth and Content Marketing at Vellum. She is a technical marketer, with an engineering background and a sharp acumen for scaling startups. She has helped SaaS startups scale and had a successful exit from an ML company. Anita writes a lot of content on generative AI to educate business founders on best practices in the field.

Related posts