
LangChain Wrap Model Call

`@wrap_model_call` is one of the most powerful hooks in Middleware. Unlike the before/after hooks, which only observe, `@wrap_model_call` can take full control of the model's execution: retry, fall back, cache, or even skip the model entirely and return a preset response.

* * *

## Understanding the handler callback

The core of `@wrap_model_call` is the `handler` callback. Calling `handler(request)` actually executes the model; not calling it skips the model.

## Example

```python
# Basic structure of wrap_model_call
# request: contains all information such as model, messages, tools
# handler: a callable; calling it actually invokes the model
from langchain.agents.middleware import wrap_model_call


@wrap_model_call
def my_middleware(request, handler):
    # Anything can happen before the model call
    print("Model is about to be called...")

    # Calling handler(request) actually executes the model
    response = handler(request)

    # Anything can happen after the model call
    print("Model call completed")
    return response
```

* * *

## Scenario 1: Retry Mechanism

This is the most common scenario. Model calls may fail because of network issues, and automatic retries improve reliability:

## Example

```python
from dotenv import load_dotenv
load_dotenv()

from langchain.agents import create_agent
from langchain.agents.middleware import wrap_model_call
from langchain.chat_models import init_chat_model
from langchain.messages import HumanMessage


@wrap_model_call
def retry_on_error(request, handler):
    """Automatically retry on model call failure, up to 3 times"""
    max_retries = 3
    last_error = None
    for attempt in range(max_retries):
        try:
            result = handler(request)
            if attempt > 0:
                print(f" Attempt {attempt + 1}")
            return result
        except Exception as e:
            last_error = e
            # The source listing is cut off after "if attempt"; the lines
            # below are an assumed completion, not the author's original code.
            if attempt < max_retries - 1:
                print(f" Attempt {attempt + 1} failed: {e}, retrying...")
    # Every attempt failed: surface the last error (assumed completion)
    raise last_error
```

`request.override()` does not mutate the request: it returns a new copy and leaves the original object unmodified. This keeps each call independent and safe.
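
To make the copy-on-override behavior concrete, here is a minimal sketch that does not import LangChain at all. `FakeRequest`, `fake_handler`, and the model names `primary-model` and `backup-model` are hypothetical stand-ins, and `dataclasses.replace` is only assumed to approximate what `request.override()` does; the real request class may differ in detail.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FakeRequest:
    """Hypothetical stand-in for the request object passed to the middleware."""
    model: str
    messages: tuple

    def override(self, **changes):
        # Return a modified copy; the original instance is left untouched.
        return replace(self, **changes)


def fallback_middleware(request, handler):
    """Try the primary model first, then call once more with a backup model."""
    try:
        return handler(request)
    except RuntimeError:
        # Build a new request for the backup model instead of mutating the old one.
        return handler(request.override(model="backup-model"))


def fake_handler(request):
    # Stand-in for the model call: pretend the primary model is unavailable.
    if request.model == "primary-model":
        raise RuntimeError("primary model unavailable")
    return f"answer from {request.model}"


original = FakeRequest(model="primary-model", messages=("What is Python?",))
answer = fallback_middleware(original, fake_handler)
print(answer)
print(original.model)
```

Because the fallback path works on a copy, `original` still names the primary model after the call, so a later invocation starts from the same unmodified request.
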
* * *

## Scenario 3: Caching Model Responses

For repeated queries, you can cache model responses to reduce API call costs:

## Example

```python
from langchain.agents import create_agent
from langchain.agents.middleware import wrap_model_call
from langchain.chat_models import init_chat_model
from langchain.messages import AIMessage

# Simple in-memory cache
cache = {}


@wrap_model_call
def cache_responses(request, handler):
    """Cache model responses; skip the model call for repeated questions"""
    # Use the content of the last message as the cache key
    messages = request.messages
    if not messages:
        return handler(request)

    # Generate cache key
    last_content = str(messages[-1].content) if hasattr(messages[-1], "content") else ""
    cache_key = last_content[:200]  # Truncate overly long content

    # Check cache
    if cache_key in cache:
        print("Returning cached result directly")
        cached = cache[cache_key]
        return AIMessage(content=f"{cached}\n\n*(From cache)*")

    # Cache miss, call model
    result = handler(request)

    # AIMessage content may be in a list, get the first one directly
    if hasattr(result, "content"):
        cache[cache_key] = result.content
        print(f"Stored in cache, currently {len(cache)} items")
    elif hasattr(result, "model_response"):
        # For ExtendedModelResponse
        pass
    return result


model = init_chat_model("deepseek:deepseek-v4-flash", temperature=0)
agent = create_agent(
    model=model,
    middleware=[cache_responses],
    system_prompt="You are Tutorial's assistant. Keep answers concise.",
)

# First query (cache miss)
result = agent.invoke({
    "messages": [{"role": "user", "content": "What is Python?"}]
})
print(f"First time: {result['messages'][-1].content[:80]}...\n")

# Second query with same question (cache hit)
result = agent.invoke({
    "messages": [{"role": "user", "content": "What is Python?"}]
})
print(f"Second time: {result['messages'][-1].content[:80]}...")
```

Running result:

```
Stored in cache, currently 1 items
First time: Python is a high-level programming language known for its concise and readable syntax...

Returning cached result directly
Second time: Python is a high-level programming language known for its concise and readable syntax...*(From cache)*
```

* * *

## Scenario 4: Modifying the Request to Dynamically Inject System Messages

## Example

```python
from datetime import datetime
from langchain.agents.middleware import wrap_model_call
from langchain.messages import SystemMessage
```
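
The listing for this scenario stops after its imports, so what follows is only a sketch of the idea named in the heading, written without LangChain. `SketchRequest`, its `system_prompt` field, `echo_handler`, and the extra `now` parameter are all hypothetical; a real middleware would receive just `request` and `handler`, and the real request object may expose the system message differently.

```python
from dataclasses import dataclass, replace
from datetime import datetime


@dataclass(frozen=True)
class SketchRequest:
    """Hypothetical stand-in for the middleware request; not the LangChain class."""
    system_prompt: str
    messages: tuple

    def override(self, **changes):
        # Copy-on-write, mirroring the behavior described for request.override()
        return replace(self, **changes)


def inject_current_time(request, handler, now=None):
    """Append the current time to the system prompt, then call the model."""
    # `now` exists only so the sketch is deterministic when tested.
    now = now or datetime.now()
    stamped = f"{request.system_prompt}\nCurrent time: {now:%Y-%m-%d %H:%M}"
    return handler(request.override(system_prompt=stamped))


def echo_handler(request):
    # Stand-in for the model call: return the prompt the model would receive.
    return request.system_prompt


base = SketchRequest(system_prompt="You are a helpful assistant.", messages=("Hi",))
seen = inject_current_time(base, echo_handler, now=datetime(2025, 1, 2, 9, 30))
print(seen)
```

The handler receives a request whose prompt carries the timestamp, while `base` keeps its original prompt, so the injected text is rebuilt on every call rather than piling up.
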