In this tutorial, we build a cost-aware planning agent that deliberately balances output quality against real-world constraints such as token usage, latency, and tool-call budgets. We design the agent to generate a number of candidate actions, estimate their expected costs and benefits, and then select an execution plan that maximizes value while staying within strict budgets. With this, we show how agentic systems can move beyond "always use the LLM" behavior and instead reason explicitly about trade-offs, efficiency, and resource awareness, which is critical for deploying agents reliably in constrained environments. Check out the FULL CODES here.
import os, time, math, json, random
from dataclasses import dataclass, field
from typing import List, Dict, Optional, Tuple, Any
from getpass import getpass

USE_OPENAI = True
if USE_OPENAI:
    if not os.getenv("OPENAI_API_KEY"):
        os.environ["OPENAI_API_KEY"] = getpass("Enter OPENAI_API_KEY (hidden): ").strip()
    try:
        from openai import OpenAI
        client = OpenAI()
    except Exception as e:
        print("OpenAI SDK import failed. Falling back to offline mode.\nError:", e)
        USE_OPENAI = False
We set up the execution environment and securely load the OpenAI API key at runtime without hardcoding it. We also initialize the client so the agent gracefully falls back to offline mode if the API is unavailable.
def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token
    return max(1, math.ceil(len(text) / 4))

@dataclass
class Budget:
    max_tokens: int
    max_latency_ms: int
    max_tool_calls: int

@dataclass
class Spend:
    tokens: int = 0
    latency_ms: int = 0
    tool_calls: int = 0

    def within(self, b: Budget) -> bool:
        return (self.tokens <= b.max_tokens and
                self.latency_ms <= b.max_latency_ms and
                self.tool_calls <= b.max_tool_calls)

    def add(self, other: "Spend") -> "Spend":
        return Spend(
            tokens=self.tokens + other.tokens,
            latency_ms=self.latency_ms + other.latency_ms,
            tool_calls=self.tool_calls + other.tool_calls
        )
We define the core budgeting abstractions that let the agent reason explicitly about costs. We model token usage, latency, and tool calls as first-class quantities and provide utility methods to accumulate and validate spend. This gives us a clean foundation for enforcing constraints throughout planning and execution.
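To make the accounting concrete, here is a minimal, self-contained sketch (the Budget and Spend classes are reproduced from above, and the numeric limits are illustrative) showing how spend accumulates across steps and how the within check gates a budget:

```python
from dataclasses import dataclass

@dataclass
class Budget:
    max_tokens: int
    max_latency_ms: int
    max_tool_calls: int

@dataclass
class Spend:
    tokens: int = 0
    latency_ms: int = 0
    tool_calls: int = 0

    def within(self, b: Budget) -> bool:
        return (self.tokens <= b.max_tokens and
                self.latency_ms <= b.max_latency_ms and
                self.tool_calls <= b.max_tool_calls)

    def add(self, other: "Spend") -> "Spend":
        return Spend(self.tokens + other.tokens,
                     self.latency_ms + other.latency_ms,
                     self.tool_calls + other.tool_calls)

budget = Budget(max_tokens=1000, max_latency_ms=2000, max_tool_calls=1)
running = Spend()
running = running.add(Spend(tokens=600, latency_ms=1200, tool_calls=1))  # one LLM step
print(running.within(budget))  # True: still inside every limit
running = running.add(Spend(tokens=600, latency_ms=300, tool_calls=0))  # a cheaper second step
print(running.within(budget))  # False: 1200 tokens now exceeds max_tokens
```

Note that a plan is rejected as soon as any single dimension overflows, which is why the planner below checks the combined spend before committing to a step.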
@dataclass
class StepOption:
    name: str
    description: str
    est_spend: Spend
    est_value: float
    executor: str
    payload: Dict[str, Any] = field(default_factory=dict)

@dataclass
class PlanCandidate:
    steps: List[StepOption]
    spend: Spend
    value: float
    rationale: str = ""

def llm_text(prompt: str, *, model: str = "gpt-5", effort: str = "low") -> str:
    if not USE_OPENAI:
        return ""
    t0 = time.time()
    resp = client.responses.create(
        model=model,
        reasoning={"effort": effort},
        input=prompt,
    )
    _ = (time.time() - t0)
    return resp.output_text or ""
We introduce the data structures that represent individual action choices and complete plan candidates. We also define a lightweight LLM wrapper that standardizes how text is generated and measured. This separation lets the planner reason about actions abstractly without being tightly coupled to execution details.
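One thing to notice is that the wrapper above measures latency but immediately discards it. A variant that surfaces the measurement (an illustrative refactor, shown with a stubbed generator named fake_generate so it runs offline) could feed real timings back into the cost model:

```python
import time
from typing import Callable, Tuple

def timed_text(generate: Callable[[str], str], prompt: str) -> Tuple[str, int]:
    """Run a text generator and return (output, latency in milliseconds)."""
    t0 = time.time()
    out = generate(prompt)
    return out, int((time.time() - t0) * 1000)

# Stub standing in for a real client.responses.create call.
def fake_generate(prompt: str) -> str:
    time.sleep(0.01)  # simulate network latency
    return f"[stub completion for: {prompt[:30]}]"

text, latency_ms = timed_text(fake_generate, "Outline a project plan")
print(latency_ms >= 5)  # the measured latency includes the simulated delay
```

Returning the latency alongside the text would let execute_plan record measured rather than estimated latency per step, tightening the feedback loop described at the end of the tutorial.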
def generate_step_options(task: str) -> List[StepOption]:
    base = [
        StepOption(
            name="Clarify deliverables (local)",
            description="Extract deliverable checklist + acceptance criteria from the task.",
            est_spend=Spend(tokens=60, latency_ms=20, tool_calls=0),
            est_value=6.0,
            executor="local",
        ),
        StepOption(
            name="Outline plan (LLM)",
            description="Create a structured outline with sections, constraints, and assumptions.",
            est_spend=Spend(tokens=600, latency_ms=1200, tool_calls=1),
            est_value=10.0,
            executor="llm",
            payload={"prompt_kind": "outline"}
        ),
        StepOption(
            name="Outline plan (local)",
            description="Create a rough outline using templates (no LLM).",
            est_spend=Spend(tokens=120, latency_ms=40, tool_calls=0),
            est_value=5.5,
            executor="local",
        ),
        StepOption(
            name="Risk register (LLM)",
            description="Generate risks, mitigations, owners, and severity.",
            est_spend=Spend(tokens=700, latency_ms=1400, tool_calls=1),
            est_value=9.0,
            executor="llm",
            payload={"prompt_kind": "risks"}
        ),
        StepOption(
            name="Risk register (local)",
            description="Generate a standard risk register from a reusable template.",
            est_spend=Spend(tokens=160, latency_ms=60, tool_calls=0),
            est_value=5.0,
            executor="local",
        ),
        StepOption(
            name="Timeline (LLM)",
            description="Draft a realistic milestone timeline with dependencies.",
            est_spend=Spend(tokens=650, latency_ms=1300, tool_calls=1),
            est_value=8.5,
            executor="llm",
            payload={"prompt_kind": "timeline"}
        ),
        StepOption(
            name="Timeline (local)",
            description="Draft a simple timeline from a generic milestone template.",
            est_spend=Spend(tokens=150, latency_ms=60, tool_calls=0),
            est_value=4.8,
            executor="local",
        ),
        StepOption(
            name="Quality pass (LLM)",
            description="Rewrite for clarity, consistency, and formatting.",
            est_spend=Spend(tokens=900, latency_ms=1600, tool_calls=1),
            est_value=8.0,
            executor="llm",
            payload={"prompt_kind": "polish"}
        ),
        StepOption(
            name="Quality pass (local)",
            description="Light formatting + consistency checks without LLM.",
            est_spend=Spend(tokens=120, latency_ms=50, tool_calls=0),
            est_value=3.5,
            executor="local",
        ),
    ]
    if USE_OPENAI:
        meta_prompt = f"""
You are a planning assistant. For the task below, propose 3-5 OPTIONAL extra steps that improve quality,
like checks, validations, or stakeholder tailoring. Keep each step short.
TASK:
{task}
Return a JSON list with fields: name, description, est_value (1-10).
"""
        txt = llm_text(meta_prompt, model="gpt-5", effort="low")
        try:
            items = json.loads(txt.strip())
            for it in items[:5]:
                base.append(
                    StepOption(
                        name=str(it.get("name", "Extra step (local)"))[:60],
                        description=str(it.get("description", ""))[:200],
                        est_spend=Spend(tokens=120, latency_ms=60, tool_calls=0),
                        est_value=float(it.get("est_value", 5.0)),
                        executor="local",
                    )
                )
        except Exception:
            pass
    return base
We focus on generating a diverse set of candidate steps, including both LLM-based and local alternatives with different cost-quality trade-offs. We optionally use the model itself to suggest additional low-cost improvements while still controlling their impact on the budget. By doing so, we enrich the action space without sacrificing efficiency.
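One practical caveat: models often wrap JSON in markdown fences or surrounding prose, so the bare json.loads call above can quietly fall into the except branch and discard all suggested steps. A slightly more forgiving parser (an illustrative helper named parse_json_list, not part of the original code) might look like:

```python
import json
import re

def parse_json_list(txt: str) -> list:
    """Best-effort extraction of a JSON list from model output.
    Returns [] if nothing parseable is found."""
    txt = txt.strip()
    # Strip a ```json ... ``` style fence if present.
    m = re.search(r"`{3}(?:json)?\s*(.*?)`{3}", txt, re.DOTALL)
    if m:
        txt = m.group(1).strip()
    # Otherwise fall back to the outermost [...] span.
    if not txt.startswith("["):
        start, end = txt.find("["), txt.rfind("]")
        if start == -1 or end <= start:
            return []
        txt = txt[start:end + 1]
    try:
        data = json.loads(txt)
        return data if isinstance(data, list) else []
    except json.JSONDecodeError:
        return []

fence = "`" * 3
wrapped = fence + 'json\n[{"name": "Check units", "est_value": 6}]\n' + fence
print(parse_json_list(wrapped))  # → [{'name': 'Check units', 'est_value': 6}]
```

Swapping this in for json.loads(txt.strip()) would let more of the model's suggestions survive into the option pool without loosening the existing except safety net.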
def plan_under_budget(
    options: List[StepOption],
    budget: Budget,
    *,
    max_steps: int = 6,
    beam_width: int = 12,
    diversity_penalty: float = 0.2
) -> PlanCandidate:
    def redundancy_cost(chosen: List[StepOption], new: StepOption) -> float:
        # Steps share a key if their names match before the "(local)"/"(LLM)" suffix
        key_new = new.name.split("(")[0].strip().lower()
        overlap = 0
        for s in chosen:
            key_s = s.name.split("(")[0].strip().lower()
            if key_s == key_new:
                overlap += 1
        return overlap * diversity_penalty

    beams: List[PlanCandidate] = [PlanCandidate(steps=[], spend=Spend(), value=0.0, rationale="")]
    for _ in range(max_steps):
        expanded: List[PlanCandidate] = []
        for cand in beams:
            for opt in options:
                if opt in cand.steps:
                    continue
                new_spend = cand.spend.add(opt.est_spend)
                if not new_spend.within(budget):
                    continue
                new_value = cand.value + opt.est_value - redundancy_cost(cand.steps, opt)
                expanded.append(
                    PlanCandidate(
                        steps=cand.steps + [opt],
                        spend=new_spend,
                        value=new_value,
                        rationale=cand.rationale
                    )
                )
        if not expanded:
            break
        expanded.sort(key=lambda c: c.value, reverse=True)
        beams = expanded[:beam_width]
    best = max(beams, key=lambda c: c.value)
    return best
We implement the budget-constrained planning logic that searches for the highest-value combination of steps under strict limits. We apply a beam-style search with redundancy penalties to avoid wasteful action overlap. This is where the agent truly becomes cost-aware: it optimizes value subject to constraints.
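The core idea can be reduced to a toy: once costs and values are known, budget-aware planning is a small knapsack-style optimization. This illustrative brute-force version (not the beam search above, and with only a token budget) enumerates every subset and keeps the best one that fits, using numbers echoing the option list defined earlier:

```python
from itertools import combinations

# (name, token_cost, value) — toy entries mirroring the LLM/local option pairs above
options = [("outline_llm", 600, 10.0), ("outline_local", 120, 5.5),
           ("risks_llm", 700, 9.0), ("risks_local", 160, 5.0),
           ("polish_llm", 900, 8.0)]
max_tokens = 1000

best_steps, best_value = (), 0.0
for r in range(1, len(options) + 1):
    for combo in combinations(options, r):
        cost = sum(c for _, c, _ in combo)
        value = sum(v for _, _, v in combo)
        if cost <= max_tokens and value > best_value:
            best_steps, best_value = combo, value

print([n for n, _, _ in best_steps], best_value)
# → ['outline_llm', 'outline_local', 'risks_local'] 20.5
```

Notice that the brute-force optimum happily picks both outline variants, which is exactly the redundancy the diversity_penalty in plan_under_budget is there to discourage; beam search also avoids the exponential subset enumeration, which becomes intractable once the option pool grows.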
def run_local_step(task: str, step: StepOption, working: Dict[str, Any]) -> str:
    name = step.name.lower()
    if "clarify deliverables" in name:
        return (
            "Deliverables checklist:\n"
            "- Executive summary\n- Scope & assumptions\n- Workplan + milestones\n"
            "- Risk register (risk, impact, likelihood, mitigation, owner)\n"
            "- Next steps + data needed\n"
        )
    if "outline plan" in name:
        return (
            "Outline:\n1) Context & objective\n2) Scope\n3) Approach\n4) Timeline\n5) Risks\n6) Next steps\n"
        )
    if "risk register" in name:
        return (
            "Risk register (template):\n"
            "1) Data access delays | High | Mitigation: agree data list + owners\n"
            "2) Stakeholder alignment | Med | Mitigation: weekly review\n"
            "3) Tooling constraints | Med | Mitigation: phased rollout\n"
        )
    if "timeline" in name:
        return (
            "Timeline (template):\n"
            "Week 1: discovery + requirements\nWeek 2: prototype + feedback\n"
            "Week 3: pilot + metrics\nWeek 4: rollout + handover\n"
        )
    if "quality pass" in name:
        draft = working.get("draft", "")
        return "Light quality pass completed (headings normalized, bullets aligned).\n" + draft
    return f"Completed: {step.name}\n"

def run_llm_step(task: str, step: StepOption, working: Dict[str, Any]) -> str:
    kind = step.payload.get("prompt_kind", "generic")
    context = working.get("draft", "")
    prompts = {
        "outline": f"Create a crisp, structured outline for the task below.\nTASK:\n{task}\nReturn a numbered outline.",
        "risks": f"Create a risk register for the task below. Include: Risk | Impact | Likelihood | Mitigation | Owner.\nTASK:\n{task}",
        "timeline": f"Create a realistic milestone timeline with dependencies for the task below.\nTASK:\n{task}",
        "polish": f"Rewrite and polish the following draft for clarity and consistency.\nDRAFT:\n{context}",
        "generic": f"Help with this step: {step.description}\nTASK:\n{task}\nCURRENT:\n{context}",
    }
    return llm_text(prompts.get(kind, prompts["generic"]), model="gpt-5", effort="low")

def execute_plan(task: str, plan: PlanCandidate) -> Tuple[str, Spend]:
    working = {"draft": ""}
    actual = Spend()
    for i, step in enumerate(plan.steps, 1):
        t0 = time.time()
        if step.executor == "llm" and USE_OPENAI:
            out = run_llm_step(task, step, working)
            tool_calls = 1
        else:
            out = run_local_step(task, step, working)
            tool_calls = 0
        dt_ms = int((time.time() - t0) * 1000)
        tok = approx_tokens(out)
        actual = actual.add(Spend(tokens=tok, latency_ms=dt_ms, tool_calls=tool_calls))
        working["draft"] += f"\n\n### Step {i}: {step.name}\n{out}\n"
    return working["draft"].strip(), actual
TASK = "Draft a 1-page project proposal for a logistics dashboard + fleet optimization pilot, including scope, timeline, and risks."

BUDGET = Budget(
    max_tokens=2200,
    max_latency_ms=3500,
    max_tool_calls=2
)

options = generate_step_options(TASK)
best_plan = plan_under_budget(options, BUDGET, max_steps=6, beam_width=14)

print("=== SELECTED PLAN (budget-aware) ===")
for s in best_plan.steps:
    print(f"- {s.name} | est_spend={s.est_spend} | est_value={s.est_value}")
print("\nEstimated spend:", best_plan.spend)
print("Budget:", BUDGET)

print("\n=== EXECUTING PLAN ===")
draft, actual = execute_plan(TASK, best_plan)
print("\n=== OUTPUT DRAFT ===\n")
print(draft[:6000])
print("\n=== ACTUAL SPEND (approx) ===")
print(actual)
print("\nWithin budget?", actual.within(BUDGET))
We execute the selected plan and track actual resource usage step by step. We dynamically choose between local and LLM execution paths and aggregate the final output into a coherent draft. By comparing estimated and actual spend, we demonstrate how planning assumptions can be validated and refined in practice.
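Closing the loop between estimates and reality can be as simple as diffing the two Spend records. A minimal sketch (spend_delta is an illustrative helper with made-up numbers, with Spend reproduced as a plain dataclass) is:

```python
from dataclasses import dataclass, asdict

@dataclass
class Spend:
    tokens: int = 0
    latency_ms: int = 0
    tool_calls: int = 0

def spend_delta(estimated: Spend, actual: Spend) -> dict:
    """Per-dimension gap between planned and observed spend (positive = over-estimated)."""
    est, act = asdict(estimated), asdict(actual)
    return {k: est[k] - act[k] for k in est}

estimated = Spend(tokens=1500, latency_ms=2600, tool_calls=2)  # illustrative values
actual = Spend(tokens=1320, latency_ms=2890, tool_calls=2)
print(spend_delta(estimated, actual))
# → {'tokens': 180, 'latency_ms': -290, 'tool_calls': 0}  (latency ran over plan)
```

Persistent deltas of this kind could be fed back into the per-step est_spend values so the planner's estimates improve over repeated runs.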
In conclusion, we demonstrated how a cost-aware planning agent can reason about its resource consumption and adapt its behavior in real time. We executed only the steps that fit within predefined budgets and tracked actual spend to validate the planning assumptions, closing the loop between estimation and execution. Also, we highlighted how agentic AI systems can become more practical, controllable, and scalable by treating cost, latency, and tool usage as first-class decision variables rather than afterthoughts.
The post How an AI Agent Chooses What to Do Under Token, Latency, and Tool-Call Budget Constraints? appeared first on MarkTechPost.

