How to use the Top P parameter

Supported by
OpenAI, Anthropic, Gemini

What is Top P?

Top P, also known as nucleus sampling, is a setting supported by some LLMs; it determines which tokens should be considered when generating a response.

Top P is sometimes stylized as top_p in the literature.

How does Top P work?

Top P sorts the candidate tokens by probability, then selects the smallest subset whose cumulative probability adds up to at least P. For example, if P is 0.95, the model will consider only the most probable tokens whose combined probability reaches 95%.
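
To make the mechanics concrete, here is a minimal Python sketch of the selection step. The next-token distribution below is invented for illustration; it is not output from any real model.

```python
# A sketch of nucleus (Top P) selection over an invented next-token distribution.
def nucleus(probs: dict[str, float], p: float) -> list[str]:
    """Return the smallest set of top tokens whose cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append(token)
        cumulative += prob
        if cumulative >= p:
            break  # the nucleus now covers at least p of the probability mass
    return kept

probs = {"the": 0.45, "a": 0.30, "my": 0.15, "that": 0.06, "thy": 0.04}
print(nucleus(probs, 0.95))  # ['the', 'a', 'my', 'that'] -> cumulative 0.96
```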

Because LLMs, or Large Language Models, are trained on massive corpora of text, they have huge vocabularies. However, some words are significantly more likely to occur in text (e.g. the, you, jump) than others (e.g. matriculation, dearth, Socratic).

These words are cataloged as tokens, the fundamental units that LLMs operate on. Tokenizing long words involves splitting them into smaller strings (e.g. matriculation might become matri, cula, and tion).
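
As a quick illustration, OpenAI's open-source tiktoken library can show how a word splits into sub-word tokens. The exact split varies by tokenizer, so treat the output as an example only:

```python
# Illustration only: how a tokenizer splits a long word into sub-word pieces.
# Assumes the tiktoken package; the exact split depends on the encoding used.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
pieces = [enc.decode([token_id]) for token_id in enc.encode("matriculation")]
print(pieces)  # a list of sub-word strings, e.g. ['mat', 'ric', 'ulation']
```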

Top P defines the cumulative probability of tokens that should be considered for each subsequent token. Top P’s values span from 0.0 to 1.0. Importantly, Top P defines a probability mass, not a fixed count of tokens. For example, a Top P of 0.5 would include only the most probable tokens whose probabilities sum to 0.5.

For a more concrete scenario, imagine a response that has so far generated the string I took the dog for a. With a Top P of 0.3, the LLM would consider only tokens like walk (0.2) and check-up (0.1) because their cumulative probability reaches 0.3. There’s a long tail of other candidates, such as trip (0.05), bath (0.04), or wolf (0.02), but the Top P cutoff excludes them from consideration.

An alternative to Top P is Top K. While Top P defines a probabilistic sum of a subset of tokens, Top K instead defines the size of the subset. For the aforementioned example, a Top K of 3 would include walk, check-up, and trip.
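
Here is a small sketch contrasting the two cutoffs on the example distribution above (the probabilities are invented):

```python
# Contrasting Top P and Top K on the example next-token distribution.
candidates = [("walk", 0.2), ("check-up", 0.1), ("trip", 0.05),
              ("bath", 0.04), ("wolf", 0.02)]  # already sorted by probability

def top_p(items, p):
    kept, total = [], 0.0
    for token, prob in items:
        kept.append(token)
        total += prob
        if total >= p:
            break  # stop once the cumulative probability reaches p
    return kept

def top_k(items, k):
    return [token for token, _ in items[:k]]  # keep the k most probable tokens

print(top_p(candidates, 0.3))  # ['walk', 'check-up']
print(top_k(candidates, 3))    # ['walk', 'check-up', 'trip']
```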

How do you set the Top P parameter?

OpenAI

To use Top P with the Chat Completions API, set the optional top_p parameter. top_p accepts a number value (or null, which defaults to 1). The value should be between 0.0 and 1.0. For instance, 0.2 means only the tokens comprising the top 20% of probability mass are considered.
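
A minimal sketch with the official openai Python SDK; the model name and prompt are placeholders:

```python
# Sketch using the official openai Python SDK; model and prompt are
# placeholders. Assumes OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Name three dog breeds."}],
    top_p=0.2,  # only the top 20% of probability mass is considered
)
print(response.choices[0].message.content)
```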

Anthropic

You can set Top P on Anthropic’s Messages API with the optional top_p parameter. top_p strictly accepts a number value. The value should be between 0.0 and 1.0.
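
A minimal sketch with the official anthropic Python SDK; the model name is a placeholder, and max_tokens is required by the Messages API:

```python
# Sketch using the official anthropic Python SDK; model is a placeholder.
# Assumes ANTHROPIC_API_KEY is set in the environment.
import anthropic

client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # placeholder model name
    max_tokens=100,  # required by the Messages API
    messages=[{"role": "user", "content": "Name three dog breeds."}],
    top_p=0.9,
)
print(message.content[0].text)
```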

Gemini

You can set Top P on Gemini’s API with the optional topP parameter, part of the request’s generationConfig. topP strictly accepts a number value. The value should be between 0.0 and 1.0.
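
A minimal sketch with the google-generativeai Python SDK; the model name and API key are placeholders:

```python
# Sketch using the google-generativeai Python SDK; model and key are
# placeholders.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")  # placeholder model name
response = model.generate_content(
    "Name three dog breeds.",
    generation_config=genai.types.GenerationConfig(top_p=0.9),
)
print(response.text)
```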

How to experiment with Top P

By increasing or decreasing Top P, you can explore how repetitive or varied responses get, particularly in their vocabulary and phrasing. With a very low Top P, such as 0.05, you’ll get very constrained responses, typically at a low reading level. With a very high Top P, the model can draw on a more complex vocabulary.

Alternatively, you can experiment with Top K, which is similar to Top P but instead defines a fixed number of the most probable tokens to consider. However, Top P is the more popular setting because it adapts to both fast and slow drop-offs in the probability distribution; some providers, such as OpenAI, support Top P but not Top K.
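
One simple experiment, sketched here with the openai SDK from earlier (the model name is a placeholder), is to hold the prompt fixed and sweep several Top P values:

```python
# Sweep several Top P values against the same prompt to compare outputs.
from openai import OpenAI

client = OpenAI()
prompt = "Describe a walk in the park in one sentence."

for p in (0.05, 0.5, 1.0):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        top_p=p,
    )
    print(f"top_p={p}: {response.choices[0].message.content}")
```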

When to use Top P

Top P can be especially useful in certain scenarios:

  • Managing Reading Level: If you need to produce content at a specific reading level and don’t want to clutter your prompt with extra instructions, Top P is an easy way to accomplish that.
  • Generating Variations: If you are looking to generate multiple variations of a response, such as candidate titles for a book, varying the Top P value can help.