Building a Context-Folding LLM Agent for Long-Horizon Reasoning with Memory Compression and Tool Use

In this tutorial, we explore how to build a Context-Folding LLM Agent that solves long, complex tasks by managing a limited context window. We design the agent to break a large task into smaller subtasks, perform reasoning or calculations when needed, and then fold each completed sub-trajectory into a concise summary. This preserves essential knowledge while keeping the active memory small.

import os, re, sys, math, random, json, textwrap, subprocess, shutil, time
from typing import List, Dict, Tuple
# Install the Hugging Face stack on first run (e.g. in a fresh Colab runtime).
try:
   import transformers
except ImportError:
   subprocess.run([sys.executable, "-m", "pip", "install", "-q", "transformers", "accelerate", "sentencepiece"], check=True)
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
# The model can be swapped by setting the CF_MODEL environment variable.
MODEL_NAME = os.environ.get("CF_MODEL", "google/flan-t5-small")
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
llm = pipeline("text2text-generation", model=model, tokenizer=tokenizer, device_map="auto")
def llm_gen(prompt: str, max_new_tokens=160, temperature=0.0) -> str:
   # Greedy decoding when temperature is 0; sampling otherwise.
   out = llm(prompt, max_new_tokens=max_new_tokens, do_sample=temperature>0.0, temperature=temperature)[0]["generated_text"]
   return out.strip()

We begin by setting up our environment and loading a lightweight Hugging Face model. We use this model to generate and process text locally, so the agent runs on Google Colab without any API dependencies.
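The decoding mode of llm_gen is controlled by a single temperature argument. As a rough sketch of that rule, the helper below builds the generation arguments so that temperature is only forwarded when sampling is enabled. The name generation_kwargs is hypothetical and not part of the tutorial code; it only isolates the greedy-versus-sampling decision so it can be checked without loading a model.

```python
def generation_kwargs(max_new_tokens: int = 160, temperature: float = 0.0) -> dict:
    """Build keyword arguments for a text-generation call.

    Hypothetical helper: greedy decoding when temperature is 0,
    sampling with the given temperature otherwise.
    """
    kwargs = {"max_new_tokens": max_new_tokens, "do_sample": temperature > 0.0}
    if kwargs["do_sample"]:
        # Only pass a temperature when it will actually be used.
        kwargs["temperature"] = temperature
    return kwargs

greedy = generation_kwargs()
sampled = generation_kwargs(max_new_tokens=64, temperature=0.7)
print(greedy)
print(sampled)
```

A dictionary built this way could be unpacked into the pipeline call, which keeps the default path deterministic and makes sampling an explicit opt-in.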

import ast, operator as op
# Whitelist of arithmetic operators the calculator tool may evaluate.
OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul, ast.Div: op.truediv, ast.Pow: op.pow, ast.USub: op.neg, ast.FloorDiv: op.floordiv, ast.Mod: op.mod}
def _eval_node(n):
   # ast.Constant replaces the deprecated ast.Num; accept numeric literals only.
   if isinstance(n, ast.Constant) and isinstance(n.value, (int, float)) and not isinstance(n.value, bool): return n.value
   if isinstance(n, ast.UnaryOp) and type(n.op) in OPS: return OPS[type(n.op)](_eval_node(n.operand))
   if isinstance(n, ast.BinOp) and type(n.op) in OPS: return OPS[type(n.op)](_eval_node(n.left), _eval_node(n.right))
   raise ValueError("Unsafe expression")
def calc(expr: str):
   node = ast.parse(expr, mode="eval").body
   return _eval_node(node)
class FoldingMemory:
   def __init__(self, max_chars:int=800):
       self.active=[]; self.folds=[]; self.max_chars=max_chars
   def add(self,text:str):
       self.active.append(text.strip())
       # Fold the oldest entries until the active context fits the budget.
       while len(self.active_text())>self.max_chars and len(self.active)>1:
           popped=self.active.pop(0)
           fold=f"- Folded: {popped[:120]}..."
           self.folds.append(fold)
   def fold_in(self,summary:str): self.folds.append(summary.strip())
   def active_text(self)->str: return "\n".join(self.active)
   def folded_text(self)->str: return "\n".join(self.folds)
   def snapshot(self)->Dict: return {"active_chars":len(self.active_text()),"n_folds":len(self.folds)}

We define a simple calculator tool for basic arithmetic and create a memory system that folds past context into concise summaries. Once the active text exceeds max_chars, the oldest entries are truncated to 120 characters and moved into the fold list. This keeps the active memory manageable while retaining essential information.
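To make both ideas concrete, here is a minimal, self-contained sketch of a whitelisted calculator and a folding buffer. The names safe_calc and TinyFoldingMemory, the reduced operator set, and the small 40-character budget are assumptions chosen for illustration; they are not the tutorial's own definitions.

```python
import ast
import operator as op

# Operators the sketch allows; anything else is rejected.
ALLOWED = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul,
           ast.Div: op.truediv, ast.USub: op.neg}

def safe_calc(expr: str):
    """Evaluate a plain arithmetic expression without calling eval()."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.UnaryOp) and type(node.op) in ALLOWED:
            return ALLOWED[type(node.op)](walk(node.operand))
        if isinstance(node, ast.BinOp) and type(node.op) in ALLOWED:
            return ALLOWED[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("Unsafe expression")
    return walk(ast.parse(expr, mode="eval").body)

class TinyFoldingMemory:
    """Keeps recent entries verbatim and folds the oldest into short stubs."""
    def __init__(self, max_chars: int = 40):
        self.active, self.folds, self.max_chars = [], [], max_chars

    def add(self, text: str) -> None:
        self.active.append(text.strip())
        # Move the oldest entries out until the active text fits the budget.
        while len("\n".join(self.active)) > self.max_chars and len(self.active) > 1:
            oldest = self.active.pop(0)
            self.folds.append(f"- Folded: {oldest[:20]}...")

result = safe_calc("2 + 3 * 4")  # multiplication binds tighter than addition

# A function call is not in the whitelist, so it is refused.
try:
    safe_calc("__import__('os')")
    rejected = False
except ValueError:
    rejected = True

mem = TinyFoldingMemory(max_chars=40)
for note in ["TASK: plan a budget", "SUBTASK: sum the items", "SUBTASK: add tax"]:
    mem.add(note)
print(result, rejected, mem.active, mem.folds)
```

With this budget, the first note is folded as soon as the second one arrives, while the last two notes fit together and stay in the active buffer.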

SUBTASK_DECOMP_PROMPT="""You are an expert planner. Decompose the task below into 2-4 crisp subtasks.
Return each subtask as a bullet starting with '- ' in priority order.
Task: "{task}" """
SUBTASK_SOLVER_PROMPT="""You are a precise problem solver with minimal steps.
If a calculation is needed, write one line 'CALC(expr)'.
Otherwise write 'ANSWER: ...'.
Think briefly; avoid chit-chat.

Task: {task}
Subtask: {subtask}
Notes (folded context):
{notes}

Now respond with either CALC(...) or ANSWER: ..."""
SUBTASK_SUMMARY_PROMPT="""Summarize the subtask outcome in <=3 bullets, total <=50 tokens.
Subtask: {name}
Steps:
{trace}
Final: {final}
Return only bullets starting with '- '."""
FINAL_SYNTH_PROMPT="""You are a senior agent. Synthesize a final, coherent solution using ONLY:
- The original task
- Folded summaries (below)
Avoid repeating steps. Be concise and actionable.

Task: {task}
Folded summaries:
{folds}

Final answer:"""
def parse_bullets(text:str)->List[str]:
   # Strip each line first so indented bullets lose their '- ' prefix too.
   return [ln.strip()[2:].strip() for ln in text.splitlines() if ln.strip().startswith("- ")]

We design prompt templates that guide the agent in decomposing tasks, solving subtasks, and summarizing outcomes. These structured prompts give each reasoning step a predictable output format (bullets, CALC(...), or ANSWER: ...) that the surrounding code can parse.
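The round trip from template to parsed subtasks can be sketched without a model. In the example below, DECOMP_TEMPLATE is a shortened stand-in for the decomposition prompt and reply is a hypothetical model response; both are assumptions used only to show how the bullet parser behaves on mixed output.

```python
from typing import List

# Shortened stand-in for the decomposition template; wording is illustrative.
DECOMP_TEMPLATE = 'Decompose the task below into 2-4 subtasks.\nTask: "{task}"'

def parse_bullets(text: str) -> List[str]:
    """Return the content of every line that starts with '- ' after stripping."""
    return [ln.strip()[2:].strip() for ln in text.splitlines()
            if ln.strip().startswith("- ")]

prompt = DECOMP_TEMPLATE.format(task="Plan a 3-day study schedule")

# Hypothetical model reply: one preamble line and one indented bullet.
reply = "Here is the plan:\n- Pick topics\n  - Block out time\n- Add workouts"
subtasks = parse_bullets(reply)
print(prompt)
print(subtasks)
```

Lines that do not start with a bullet are dropped, so a chatty preamble from a small model does not leak into the subtask list.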

def run_subtask(task:str, subtask:str, memory:FoldingMemory, max_tool_iters:int=3)->Tuple[str,str,List[str]]:
   notes=(memory.folded_text() or "(none)")
   trace=[]; final=""
   for _ in range(max_tool_iters):
       prompt=SUBTASK_SOLVER_PROMPT.format(task=task,subtask=subtask,notes=notes)
       out=llm_gen(prompt,max_new_tokens=96); trace.append(out)
       # Look for a tool request such as CALC(2+3) in the model output.
       m=re.search(r"CALC\((.+?)\)",out)
       if m:
           try:
               val=calc(m.group(1))
               trace.append(f"TOOL:CALC -> {val}")
               out2=llm_gen(prompt+f"\nTool result: {val}\nNow produce 'ANSWER: ...' only.",max_new_tokens=64)
               trace.append(out2)
               if out2.strip().startswith("ANSWER:"):
                   final=out2.split("ANSWER:",1)[1].strip(); break
           except Exception as e:
               trace.append(f"TOOL:CALC ERROR -> {e}")
       if out.strip().startswith("ANSWER:"):
           final=out.split("ANSWER:",1)[1].strip(); break
   if not final:
       final="No definitive answer; partial reasoning:\n"+"\n".join(trace[-2:])
   # Compress the whole trace into at most three bullets for the fold store.
   summ=llm_gen(SUBTASK_SUMMARY_PROMPT.format(name=subtask,trace="\n".join(trace),final=final),max_new_tokens=80)
   summary_bullets="\n".join(parse_bullets(summ)[:3]) or f"- {subtask}: {final[:60]}..."
   return final, summary_bullets, trace
class ContextFoldingAgent:
   def __init__(self,max_active_chars:int=800):
       self.memory=FoldingMemory(max_chars=max_active_chars)
       self.metrics={"subtasks":0,"tool_calls":0,"chars_saved_est":0}
   def decompose(self,task:str)->List[str]:
       plan=llm_gen(SUBTASK_DECOMP_PROMPT.format(task=task),max_new_tokens=96)
       subs=parse_bullets(plan)
       return subs[:4] if subs else ["Main solution"]
   def run(self,task:str)->Dict:
       t0=time.time()
       self.memory.add(f"TASK: {task}")
       subtasks=self.decompose(task)
       self.metrics["subtasks"]=len(subtasks)
       folded=[]
       for st in subtasks:
           self.memory.add(f"SUBTASK: {st}")
           final,fold_summary,trace=run_subtask(task,st,self.memory)
           # Only the short summary is kept; the full trace is discarded.
           self.memory.fold_in(fold_summary)
           folded.append(f"- {st}: {final}")
           self.memory.add(f"SUBTASK_DONE: {st}")
       final=llm_gen(FINAL_SYNTH_PROMPT.format(task=task,folds=self.memory.folded_text()),max_new_tokens=200)
       t1=time.time()
       return {"task":task,"final":final.strip(),"folded_summaries":self.memory.folded_text(),
               "active_context_chars":len(self.memory.active_text()),
               "subtask_finals":folded,"runtime_sec":round(t1-t0,2)}

We implement the agent's core logic, in which each subtask is executed, summarized, and folded back into memory. This step shows how context folding lets the agent reason iteratively without losing track of prior reasoning.
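The CALC-or-ANSWER exchange at the heart of each subtask can be exercised without a real model. The sketch below is a simplified variation of that loop, not the tutorial's exact control flow: solve_with_tool, fake_llm, and fake_tool are hypothetical names, and the scripted replies are assumptions standing in for model output.

```python
import re
from typing import Callable, List, Tuple

CALC_RE = re.compile(r"CALC\((.+?)\)")

def solve_with_tool(llm: Callable[[str], str], tool: Callable[[str], float],
                    prompt: str, max_iters: int = 3) -> Tuple[str, List[str]]:
    """Run a CALC-or-ANSWER loop and return (final answer, trace)."""
    trace: List[str] = []
    for _ in range(max_iters):
        out = llm(prompt).strip()
        trace.append(out)
        match = CALC_RE.search(out)
        if match:
            value = tool(match.group(1))
            trace.append(f"TOOL:CALC -> {value}")
            # Feed the tool result back so the next reply can be an answer.
            prompt = f"{prompt}\nTool result: {value}\nNow produce 'ANSWER: ...' only."
            continue
        if out.startswith("ANSWER:"):
            return out.split("ANSWER:", 1)[1].strip(), trace
    return "", trace

# Scripted stand-in for the model: asks for a calculation, then answers.
def fake_llm(prompt: str) -> str:
    if "Tool result:" in prompt:
        return "ANSWER: the subtotal is 973.24"
    return "CALC(799.99 + 149.5 + 23.75)"

# Stand-in tool that only handles sums; a real agent would call the
# whitelisted calculator here.
def fake_tool(expr: str) -> float:
    return round(sum(float(part) for part in expr.split("+")), 2)

final, trace = solve_with_tool(fake_llm, fake_tool, "Subtask: sum the items")
print(final)
print(trace)
```

The trace records the request, the tool result, and the answer in order, which is the material a summary prompt would later compress into a few bullets. Note that the non-greedy pattern stops at the first closing parenthesis, so nested expressions such as CALC((1+2)*3) would not be captured whole.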

DEMO_TASKS=[
   "Plan a 3-day study schedule for ML with daily workouts and simple meals; include time blocks.",
   "Compute a small project budget with 3 items (laptop 799.99, course 149.5, snacks 23.75), add 8% tax and 5% buffer, and present a one-paragraph recommendation."
]
def pretty(d): return json.dumps(d, indent=2, ensure_ascii=False)
if __name__=="__main__":
   agent=ContextFoldingAgent(max_active_chars=700)
   for i,task in enumerate(DEMO_TASKS,1):
       print("="*70)
       print(f"DEMO #{i}: {task}")
       res=agent.run(task)
       print("\n--- Folded Summaries ---\n"+(res["folded_summaries"] or "(none)"))
       print("\n--- Final Answer ---\n"+res["final"])
       print("\n--- Diagnostics ---")
       diag={k:res[k] for k in ["active_context_chars","runtime_sec"]}
       # Reuse the count recorded during run() instead of decomposing again.
       diag["n_subtasks"]=agent.metrics["subtasks"]
       print(pretty(diag))

We run the agent on sample tasks to observe how it plans, executes, and synthesizes final results. Through these examples, we see the complete context-folding process in action, from decomposition to the final synthesized answer.
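For the budget demo, it helps to have a reference figure to compare against whatever a small model produces. The sketch below works the arithmetic with Decimal under one assumed reading of the task, namely that the 8% tax applies to the subtotal and the 5% buffer applies to the taxed amount. The task wording does not pin this down, so other readings would give a different total.

```python
from decimal import Decimal, ROUND_HALF_UP

items = {
    "laptop": Decimal("799.99"),
    "course": Decimal("149.5"),
    "snacks": Decimal("23.75"),
}
subtotal = sum(items.values(), Decimal("0"))

# Assumed reading: tax on the subtotal, then buffer on the taxed total.
with_tax = subtotal * Decimal("1.08")
with_buffer = with_tax * Decimal("1.05")
total = with_buffer.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(f"subtotal={subtotal} with_tax={with_tax} total={total}")
```

Under that reading the subtotal is 973.24, the taxed amount is 1051.0992, and the rounded total is 1103.65, which gives a fixed target for checking the agent's CALC steps.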

In conclusion, we demonstrate how context folding enables long-horizon reasoning while avoiding memory overload. We see how each subtask is planned, executed, summarized, and distilled into compact knowledge, mimicking how an intelligent agent would handle complex workflows over time. By combining decomposition, tool use, and context compression, we create a lightweight agentic system that keeps its working context bounded as the task grows.


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
