Have you ever felt the frustration of getting an LLM response that's almost what you need, but still requires a lot of manual work to transform it into something usable? You're not alone. We've all been there: you ask a language model a question and it spits out a string of text. It looks promising, but the real work is just beginning. You have to manually parse that text, extract the relevant information, and then shape it into a format that your application can understand. It's a tedious and error-prone process, especially when dealing with complex queries or multiple responses.
If you've tried using JSON prompts, you might think you've found a solution. While JSON provides a structured format, it's still a string that needs to be parsed into usable objects. It's better than raw text, but it's far from ideal. Wouldn't it be great if you could get direct, structured responses from your LLM?
Imagine receiving Python objects – dictionaries, lists, or custom classes – straight from your language model, ready to be used in your application. No more string manipulation, no more parsing headaches. With LangChain's `with_structured_output`, this is now possible. In this blog, I'll show you how to use `with_structured_output` to simplify your workflow and eliminate the need for manual parsing. You'll be able to focus on building better applications, faster.
What is `with_structured_output`?
At its core, `with_structured_output` is a feature in LangChain that allows you to receive structured data directly from your LLM as objects. Instead of the typical string-based responses, which often require extra parsing and manipulation, LangChain's `with_structured_output` ensures that the data comes back in a usable, object-based format, without you having to write additional code.
In simple terms, `with_structured_output` transforms LLM responses from raw text into structured objects (such as dictionaries, lists, or custom classes). This means you don't need to write additional code to convert the output into something meaningful. The data is ready for use right out of the box.
For example, let’s say you’re building an application that needs to extract certain fields like a person’s name, age, and address.
Without `with_structured_output`:
You might get a string response that you have to parse manually and convert into something structured, like a dictionary or a class instance.
# The LLM returns a string that needs parsing
llm_response = "Name: John, Age: 25, Address: 123 Main St"
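To make the pain concrete, here is a hypothetical manual parser for that string (the helper name and format assumptions are mine, not from the post). Note that even when it works, every value comes back as a string:

```python
# Hypothetical manual parsing of the raw LLM string
llm_response = "Name: John, Age: 25, Address: 123 Main St"

def parse_person(text: str) -> dict:
    # Split on commas, then split each piece on its first colon
    fields = {}
    for part in text.split(", "):
        key, _, value = part.partition(": ")
        fields[key.lower()] = value
    return fields

person = parse_person(llm_response)
print(person)  # {'name': 'John', 'age': '25', 'address': '123 Main St'}
```

One extra comma, or a reworded label from the model, and this parser silently breaks. That fragility is exactly what the rest of this post addresses.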
With `with_structured_output`:
When using `with_structured_output`, the LLM returns the data directly as an instance of a Person class, like so:
# The LLM will return a Person object directly:
llm_response = Person(name="John", age=25, address="123 Main St")
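The Person class itself isn't shown above; a minimal Pydantic model consistent with that example (an assumption on my part, though the post does define its schemas with Pydantic later) would be:

```python
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int
    address: str

# What with_structured_output would hand back: a typed object, not a string
llm_response = Person(name="John", age=25, address="123 Main St")
print(llm_response.age)  # already an int: 25
```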
Why is it better than a JSON response?
Many developers use JSON prompts to structure LLM responses. For instance, they might ask the model to return output in a specific JSON format by including an example in the prompt. While this approach works, it has some major limitations:
- Parsing hassles: JSON outputs are still strings. You need to parse them into Python objects (e.g., dictionaries or classes) before they can be used, adding extra steps and complexity.
- Error-prone: LLMs may occasionally generate invalid JSON due to their probabilistic nature. Missing commas, unmatched brackets, or malformed structures can break your parsing code.
- Inconsistent keys: Without strict enforcement, the keys in JSON outputs might vary slightly (e.g., first_name vs. firstname), leading to errors in your application.
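The second point is easy to reproduce with nothing but the standard library; a single trailing comma, which LLMs do emit from time to time, makes `json.loads` raise:

```python
import json

# Almost-valid JSON from a model: one trailing comma breaks the parse
raw = '{"name": "John", "age": 25,}'

parse_failed = False
try:
    person = json.loads(raw)
except json.JSONDecodeError:
    parse_failed = True  # the app now needs fallback or retry logic

print(parse_failed)  # True
```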
LangChain's `with_structured_output` takes JSON prompts to the next level. Instead of generating raw strings, it uses Python's native class structure to ensure that the output is returned as valid objects, eliminating the need for manual parsing or validation.
Here's why LangChain's `with_structured_output` is a game-changer:
- Direct object output: The LLM directly returns an object (e.g., an instance of a class like Person), ready to use. No parsing required.
- Error-free: You avoid issues with malformed JSON. The structure is predefined, and the LLM adheres to it.
- Easier debugging: Working with objects is easier to debug and more intuitive compared to string-based JSON.
- Clean code: Your application logic becomes cleaner because you’re directly working with objects instead of processing and converting strings.
The bottom line is that JSON prompts are a good workaround, but `with_structured_output` is a more robust, reliable, and developer-friendly solution for getting structured, object-based data directly from your LLM.
How does `with_structured_output` work?
LangChain's `with_structured_output` feature leverages schema-based validation to ensure that the output from the LLM matches a predefined structure. Instead of returning a free-form string or even loosely formatted JSON, the model adheres to a strict schema and directly returns a Python object.
Example: Workout assistant
Imagine an AI that specializes in creating a workout plan. A user asks for a specific plan, and the model provides the response as a structured Python object, perfect for immediate use in applications like workout assistants or workout websites.
With `with_structured_output`, the process is seamless, ensuring that the response is accurate and follows a predefined structure. Let's break it down into three simple steps:
Step 1 - Define the Schema
The first step is to define a schema for the expected structured output. This is done by creating a Python class using Pydantic, which serves as the blueprint for the data. For a workout plan assistant, the schema might look like this:
from pydantic import BaseModel, Field
from typing import List

class WorkoutPlan(BaseModel):
    name: str = Field(description="Name of the workout plan")
    duration_weeks: int = Field(description="Duration of the plan in weeks")
    workouts: List[str] = Field(description="List of workouts for each day")
    goals: List[str] = Field(description="Fitness goals for the plan")
    equipment_needed: List[str] = Field(description="List of required equipment")
This schema serves two purposes:
- It tells the LLM the structure you want.
- It ensures the data returned is valid and adheres to this structure.
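The validation half can be demonstrated without any LLM call at all. This sketch re-declares the same schema and shows Pydantic rejecting data that doesn't match it (the invalid `"four"` value is a made-up example):

```python
from pydantic import BaseModel, Field, ValidationError
from typing import List

class WorkoutPlan(BaseModel):
    name: str = Field(description="Name of the workout plan")
    duration_weeks: int = Field(description="Duration of the plan in weeks")
    workouts: List[str] = Field(description="List of workouts for each day")
    goals: List[str] = Field(description="Fitness goals for the plan")
    equipment_needed: List[str] = Field(description="List of required equipment")

# Conforming data builds a typed object
plan = WorkoutPlan(
    name="Demo plan", duration_weeks=4,
    workouts=["Day 1: Cardio"], goals=["Lose weight"], equipment_needed=[],
)

# Non-conforming data is rejected up front, not deep inside your app
rejected = False
try:
    WorkoutPlan(name="Demo plan", duration_weeks="four",
                workouts=[], goals=[], equipment_needed=[])
except ValidationError:
    rejected = True
```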
Step 2 - Set up `with_structured_output`
Next, we use LangChain's `with_structured_output` to bind the schema to the LLM. This ensures that the LLM returns a WorkoutPlan object directly, without needing to embed JSON examples in the prompt.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate

# Initialize the LLM
model = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)

# Enable structured output using the WorkoutPlan schema
structured_llm = model.with_structured_output(WorkoutPlan)

# Define the prompt template
prompt = """
You are a fitness expert knowledgeable in creating workout plans. Your task is to:
1. Provide a personalized workout plan depending on the user's needs.
2. Do not answer questions unrelated to workouts.
Question: {question}
"""

# Create a chain by piping the prompt into the structured LLM
prompt_template = PromptTemplate(template=prompt, input_variables=["question"])
chain = prompt_template | structured_llm
This setup ensures the model understands both the task and the required output format.
Step 3 - Run the chain
Now, let’s ask the AI for a specific workout plan.
# User query
question = "Can you create a 4-week workout plan for weight loss?"
# Invoke the chain to get the structured response
workout_plan: WorkoutPlan = chain.invoke({"question": question})
Output
The response from the LLM is directly a WorkoutPlan object:
print(workout_plan)
"""
Output:
WorkoutPlan(
    name="4-Week Weight Loss Plan",
    duration_weeks=4,
    workouts=[
        "Day 1: Full Body Strength Training",
        "Day 2: 30-minute Cardio",
        "Day 3: HIIT Workout",
        "Day 4: Rest",
        "Day 5: Lower Body Focus",
        "Day 6: 45-minute Jogging",
        "Day 7: Rest",
    ],
    goals=[
        "Lose weight",
        "Increase endurance",
        "Build muscle",
    ],
    equipment_needed=[
        "Dumbbells",
        "Exercise mat",
        "Resistance bands",
        "Running shoes",
    ],
)
"""
This example illustrates how to create a structured response for a fitness workout plan, allowing for easy integration into various applications focused on health and fitness.
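Because workout_plan is a plain Python object, downstream code simply reads attributes. The sketch below constructs the same object by hand (no API call) to show what that integration looks like; `model_dump` assumes Pydantic v2:

```python
from typing import List
from pydantic import BaseModel

class WorkoutPlan(BaseModel):
    name: str
    duration_weeks: int
    workouts: List[str]
    goals: List[str]
    equipment_needed: List[str]

# Stand-in for the object returned by chain.invoke above
workout_plan = WorkoutPlan(
    name="4-Week Weight Loss Plan",
    duration_weeks=4,
    workouts=["Day 1: Full Body Strength Training", "Day 2: 30-minute Cardio"],
    goals=["Lose weight"],
    equipment_needed=["Dumbbells"],
)

# No parsing step: render a summary straight from attributes
summary = (f"{workout_plan.name}: {workout_plan.duration_weeks} weeks, "
           f"{len(workout_plan.workouts)} sessions")

# Or serialize for an API response
payload = workout_plan.model_dump()
```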
Streamline your workflows with confidence
LangChain's `with_structured_output` isn't just a tool; it's a game-changer for developers and businesses looking to streamline their workflows. By eliminating the need to parse raw text or format messy JSON, this feature ensures you get clean, structured data directly from your LLM, saving time, reducing errors, and improving the overall efficiency of your applications.
Ready to leverage the power of structured outputs? The engineers at Opcito are experts in LLM technology and can help you implement `with_structured_output` effectively.