Large language models (LLMs) offer a number of parameters that let you fine-tune their behavior and control how they generate responses. If a model isn't producing the desired output, the issue often lies in how these parameters are configured. In this tutorial, we'll explore some of the most commonly used ones — max_completion_tokens, temperature, top_p, presence_penalty, and frequency_penalty — and understand how each influences the model's output.
Installing the dependencies
pip install openai pandas matplotlib
Loading the OpenAI API Key
import os
from getpass import getpass
os.environ['OPENAI_API_KEY'] = getpass('Enter OpenAI API Key: ')
Initializing the Model
from openai import OpenAI
model = "gpt-4.1"
client = OpenAI()
Max Tokens
Max Tokens (max_completion_tokens) is the maximum number of tokens the model can generate during a run. The model will try to stay within this limit across all turns. If it exceeds the specified amount, the run will stop and be marked as incomplete.
A smaller value (like 16) limits the model to very short answers, while a higher value (like 80) allows it to generate more detailed and complete responses. Increasing this parameter gives the model more room to elaborate, explain, or format its output naturally.
prompt = "What is the most popular French cheese?"
for tokens in [16, 30, 80]:
    print(f"\n--- max_completion_tokens = {tokens} ---")
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "developer", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        max_completion_tokens=tokens
    )
    print(response.choices[0].message.content)
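When the cap is hit, the API records it on the choice: finish_reason is "length" rather than "stop". A minimal sketch of a truncation check — the SimpleNamespace stubs below are hypothetical stand-ins for real response objects, so this runs without an API call:

```python
from types import SimpleNamespace

def hit_token_limit(response):
    """True when generation stopped because max_completion_tokens was reached."""
    return response.choices[0].finish_reason == "length"

# Stubbed responses standing in for real API results:
truncated = SimpleNamespace(choices=[SimpleNamespace(finish_reason="length")])
complete = SimpleNamespace(choices=[SimpleNamespace(finish_reason="stop")])
print(hit_token_limit(truncated))  # True
print(hit_token_limit(complete))   # False
```

In practice you would pass the object returned by client.chat.completions.create directly into such a check.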
Temperature
In large language models (LLMs), the temperature parameter controls the diversity and randomness of generated outputs. Lower temperature values make the model more deterministic and focused on the most probable responses — ideal for tasks that require accuracy and consistency. Higher values, on the other hand, introduce creativity and variety by allowing the model to explore less likely options. Technically, temperature scales the token logits before the softmax function: increasing it flattens the probability distribution (more diverse outputs), while decreasing it sharpens the distribution (more predictable outputs).
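As a rough sketch of that mechanism (using toy logits for illustration — nothing the API itself exposes), dividing the logits by the temperature before the softmax flattens or sharpens the resulting distribution:

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Divide logits by temperature, then apply a numerically stable softmax."""
    scaled = np.array(logits, dtype=float) / temperature
    scaled -= scaled.max()  # subtract max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

logits = [4.0, 2.0, 1.0]  # hypothetical logits for three candidate tokens
low = softmax_with_temperature(logits, 0.5)   # sharper: top token dominates
high = softmax_with_temperature(logits, 2.0)  # flatter: probabilities closer together
print(low.round(3))
print(high.round(3))
```

With temperature 0.5 the top token takes nearly all of the probability mass; at 2.0 the mass spreads across all three candidates.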
In this code, we're prompting the LLM to give 10 different responses (n_choices = 10) to the same question — "What is one intriguing place worth visiting?" — across a range of temperature values. By doing this, we can observe how the diversity of answers changes with temperature. Lower temperatures will likely produce similar or repeated responses, while higher temperatures will show a broader and more varied distribution of answers.
prompt = "What is one intriguing place worth visiting? Give a single-word answer and think globally."
temperatures = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.5]
n_choices = 10
results = {}
for temp in temperatures:
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        temperature=temp,
        n=n_choices
    )
    # Collect all n responses in a list
    results[temp] = [response.choices[i].message.content.strip() for i in range(n_choices)]
# Display results
for temp, responses in results.items():
    print(f"\n--- temperature = {temp} ---")
    print(responses)

As we can see, once the temperature rises to 0.6, the responses become more diverse, moving beyond the repeated single answer "Petra." At a higher temperature of 1.5, the distribution shifts further, and we see answers like Kyoto and Machu Picchu as well.
Top P
Top P (also known as nucleus sampling) is a parameter that controls how many tokens the model considers based on a cumulative probability threshold. It helps the model focus on the most likely tokens, often improving coherence and output quality.
In the following example, we first set a temperature value and then apply top_p = 0.5 (50%), meaning only the top 50% of the probability mass is kept. Note that when temperature = 0, the output is deterministic, so Top P has no effect.
The generation process works as follows:
- Apply the temperature to adjust the token probabilities.
- Use Top P to retain only the most probable tokens that together make up 50% of the total probability mass.
- Renormalize the remaining probabilities before sampling.
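The three steps above can be sketched on a toy distribution (illustrative only — the real sampler operates over the model's full vocabulary):

```python
import numpy as np

def nucleus_filter(probs, top_p=0.5):
    """Keep the smallest set of tokens whose cumulative probability reaches
    top_p, then renormalize. A toy sketch, not the API's actual sampler."""
    probs = np.array(probs, dtype=float)
    order = np.argsort(probs)[::-1]                   # most probable first
    cumulative = np.cumsum(probs[order])
    n_kept = np.searchsorted(cumulative, top_p) + 1   # tokens needed to reach top_p
    kept = order[:n_kept]
    filtered = np.zeros_like(probs)
    filtered[kept] = probs[kept]
    return filtered / filtered.sum()                  # renormalize before sampling

# If the top token ("Petra", say) already holds more than 50% of the mass,
# top_p=0.5 keeps only that one token:
print(nucleus_filter([0.55, 0.25, 0.15, 0.05], top_p=0.5))  # [1. 0. 0. 0.]
```

Because the first token alone exceeds the 0.5 threshold, every other candidate is filtered out and the survivor is renormalized to probability 1.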
We'll observe how the distribution of sampled answers changes across different temperature values for the question:
"What is one intriguing place worth visiting?"
prompt = "What is one intriguing place worth visiting? Give a single-word answer and think globally."
temperatures = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.5]
n_choices = 10
results_ = {}
for temp in temperatures:
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        temperature=temp,
        n=n_choices,
        top_p=0.5
    )
    # Collect all n responses in a list
    results_[temp] = [response.choices[i].message.content.strip() for i in range(n_choices)]
# Display results
for temp, responses in results_.items():
    print(f"\n--- temperature = {temp} ---")
    print(responses)


Since Petra consistently accounted for more than 50% of the total response probability, applying top_p = 0.5 filters out all other options. As a result, the model selects "Petra" as the final output in every case.
Frequency Penalty
Frequency Penalty controls how much the model avoids repeating the same words or phrases in its output.
- Range: -2 to 2
- Default: 0
When the frequency penalty is higher, the model gets penalized for using words it has already used before. This encourages it to choose new and different words, making the text more varied and less repetitive.
In simple terms: a higher frequency penalty means less repetition and more variety.
We'll test this using the prompt:
"List 10 possible titles for a fantasy book. Give the titles only and each title on a new line."
prompt = "List 10 possible titles for a fantasy book. Give the titles only and each title on a new line."
frequency_penalties = [-2.0, -1.0, 0.0, 0.5, 1.0, 1.5, 2.0]
results = {}
for fp in frequency_penalties:
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        frequency_penalty=fp,
        temperature=0.2
    )
    text = response.choices[0].message.content
    items = [line.strip("- ").strip() for line in text.split("\n") if line.strip()]
    results[fp] = items
# Display results
for fp, items in results.items():
    print(f"\n--- frequency_penalty = {fp} ---")
    print(items)


- Low frequency penalties (-2 to 0): Titles tend to repeat, with familiar patterns like "The Shadow Weaver's Oath", "Crown of Ember and Ice", and "The Last Dragon's Heir" appearing frequently.
- Moderate penalties (0.5 to 1.5): Some repetition remains, but the model starts producing more varied and creative titles.
- High penalty (2.0): The first three titles are still the same, but after that, the model produces diverse, distinctive, and imaginative book names (e.g., "Whisperwind Chronicles: Rise of the Phoenix Queen", "Ashes Beneath the Willow Tree").
Presence Penalty
Presence Penalty controls how much the model avoids repeating words or phrases that have already appeared in the text.
- Range: -2 to 2
- Default: 0
A higher presence penalty encourages the model to use a wider variety of words, making the output more diverse and creative.
Unlike the frequency penalty, which accumulates with each repetition, the presence penalty is applied once to any word that has already appeared, reducing the chance it will be repeated in the output. This helps the model produce text with more variety and originality.
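OpenAI's documentation describes both penalties as adjustments subtracted from a token's logit before sampling. A rough sketch of that rule (the function name is illustrative, not an API call):

```python
def penalized_logit(logit, count, frequency_penalty=0.0, presence_penalty=0.0):
    """Adjust a token's logit given how many times it has appeared so far.
    The frequency term grows with each repetition; the presence term is a
    one-time deduction for any token that has appeared at all."""
    return (logit
            - count * frequency_penalty
            - (1.0 if count > 0 else 0.0) * presence_penalty)

# A token already used 3 times is penalized 3x by frequency_penalty,
# but only once by presence_penalty:
print(penalized_logit(2.0, count=3, frequency_penalty=0.5))  # 2.0 - 3*0.5 = 0.5
print(penalized_logit(2.0, count=3, presence_penalty=0.5))   # 2.0 - 1*0.5 = 1.5
```

This is why a high frequency penalty suppresses heavy repetition most strongly, while the presence penalty nudges the model toward introducing new words at all.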
prompt = "List 10 possible titles for a fantasy book. Give the titles only and each title on a new line."
presence_penalties = [-2.0, -1.0, 0.0, 0.5, 1.0, 1.5, 2.0]
results = {}
for pp in presence_penalties:
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        presence_penalty=pp,
        temperature=0.2
    )
    text = response.choices[0].message.content
    items = [line.strip("- ").strip() for line in text.split("\n") if line.strip()]
    results[pp] = items
# Display results
for pp, items in results.items():
    print(f"\n--- presence_penalty = {pp} ---")
    print(items)


- Low to moderate penalty (-2.0 to 0.5): Titles are somewhat varied, with some repetition of common fantasy patterns like "The Shadow Weaver's Oath", "The Last Dragon's Heir", "Crown of Ember and Ice".
- Medium penalty (1.0 to 1.5): The first few common titles remain, while later titles show more creativity and unique combinations. Examples: "Ashes of the Fallen Kingdom", "Secrets of the Starbound Forest", "Daughter of Storm and Stone".
- Maximum penalty (2.0): The top three titles stay the same, but the rest become highly diverse and imaginative. Examples: "Moonfire and Thorn", "Veil of Starlit Ashes", "The Midnight Blade".