The Limitation I Hit with Spring AI Tools
While building an AI product with Spring AI, I ran into a frustrating limitation:
When a tool is configured with returnDirect = false (the default), the model consumes the tool result internally and the frontend receives only the generated text.
That means the model sees structured data, but your UI does not.
I did not want another plain text chatbot response. I wanted renderable UI blocks: weather cards, profile panels, and other structured components that the client can display directly.
If you are building an AI chatbot with Spring AI and SSE, this is one of the most important architecture gaps to solve early.
The Core Problem
The flow looked like this:
- LLM calls tool
- Tool returns structured data
- LLM uses data for reasoning
- Client receives only text
So the useful structured result gets hidden inside the model pipeline.
For real products, this is a blocker. Interactive UX needs structured events, not only natural language.
The Approach I Used
Instead of forcing the model to echo tool results back to the client, I treated tool outputs as system-level data and captured them directly.
I wrapped each tool with a custom callback:
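    @Override
    public String call(String toolInput, ToolContext toolContext) {
        // Run the wrapped tool first so the model flow is unchanged.
        String result = delegate.call(toolInput, toolContext);
        // requestId is supplied by the caller through the tool context; it may be absent.
        String requestId = toolContext == null ? null : (String) toolContext.getContext().get("requestId");
        String toolName = getToolDefinition().name();
        log.debug("Tool input schema for {} is {}", toolName, getToolDefinition().inputSchema());
        if (requestId != null) {
            // Persist the structured result for the UI. Assumes the tool returns a JSON object.
            toolResultStore.put(requestId, toolName, this.gson.fromJson(result, JsonObject.class));
        }
        // Return the raw result so the model still receives it.
        return result;
    }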
The model flow stays untouched. Tools execute normally.
But after each tool execution, I intercept the structured result and persist it in Redis, keyed by requestId.
Streaming Text + UI Events
The response stream still goes to the client via SSE as usual.
After streaming completes, I fetch all tool results for that request, transform them into UI events, and send those events to the frontend.
So the client can render:
- streamed assistant text
- structured widgets generated from tool outputs
This creates a more agentic UI pattern: model reasoning and UI rendering stay decoupled but synchronized.
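To make the post-stream step concrete, here is a minimal sketch of turning captured tool results into SSE frames. Everything named here is an assumption for illustration: the ToolResult record, the "ui" event name, the tool-to-component mapping, and the payload shape are not from the original implementation. Only the frame layout (an event line, one data line per payload line, and a blank line as terminator) follows the standard SSE wire format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical record of one captured tool execution. */
record ToolResult(String callId, String toolName, String json) {}

final class UiEventFormatter {

    // Assumed mapping from tool name to a frontend component name.
    private static final Map<String, String> COMPONENTS = Map.of(
            "getWeather", "weather-card",
            "getProfile", "profile-panel");

    /** Builds one SSE frame: an "event:" line, one "data:" line per payload line, then a blank line. */
    static String toSseFrame(String eventName, String data) {
        StringBuilder sb = new StringBuilder("event: ").append(eventName).append('\n');
        for (String line : data.split("\\R", -1)) {
            sb.append("data: ").append(line).append('\n');
        }
        return sb.append('\n').toString();
    }

    /** Converts stored tool results into "ui" frames, skipping tools that have no widget. */
    static List<String> toUiFrames(List<ToolResult> results) {
        List<String> frames = new ArrayList<>();
        for (ToolResult r : results) {
            String component = COMPONENTS.get(r.toolName());
            if (component == null) {
                continue; // no UI component registered for this tool
            }
            // callId and component are assumed to need no JSON escaping (e.g. UUIDs, fixed names);
            // r.json() is embedded as-is because it is already JSON.
            String payload = "{\"id\":\"" + r.callId() + "\",\"component\":\"" + component
                    + "\",\"props\":" + r.json() + "}";
            frames.add(toSseFrame("ui", payload));
        }
        return frames;
    }
}
```

On the client, a listener for the "ui" event could then look up the component by name and render it with the given props, while the default message events keep carrying the assistant text.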
Important Detail: Multiple Calls to the Same Tool
I store each tool execution with a unique identifier.
That prevents overwriting when the same tool is called multiple times in one request flow.
Without this, you lose intermediate outputs and UI consistency breaks.
With unique tool-call persistence, repeated calls can still produce correct UI rendering.
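As an illustration of the idea, here is a minimal in-memory sketch of a store that never overwrites repeated calls. It is a stand-in for the Redis-backed store, not the original code: the class name, the StoredCall record, and the drain method are hypothetical, and the result is kept as a raw JSON string to avoid external dependencies. With Redis, a comparable layout might be one list per requestId that each call appends to.

```java
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** In-memory stand-in for a Redis-backed tool result store (hypothetical names). */
final class InMemoryToolResultStore {

    /** One tool execution, identified by its own callId. */
    record StoredCall(String callId, String toolName, String json) {}

    private final Map<String, List<StoredCall>> byRequest = new ConcurrentHashMap<>();

    /** Appends a result under a fresh call id, so repeated calls to one tool never collide. */
    String put(String requestId, String toolName, String json) {
        String callId = UUID.randomUUID().toString();
        byRequest.computeIfAbsent(requestId, k -> new CopyOnWriteArrayList<>())
                 .add(new StoredCall(callId, toolName, json));
        return callId;
    }

    /** Removes and returns all results for a request, in the order they were stored. */
    List<StoredCall> drain(String requestId) {
        List<StoredCall> calls = byRequest.remove(requestId);
        return calls == null ? List.of() : List.copyOf(calls);
    }
}
```

Keeping insertion order matters for the UI: two weather lookups in one request then produce two cards in the order the model asked for them, rather than a single card holding the last result.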
Why This Matters
If you are building production AI apps, you will likely hit this limitation.
The fix is usually not “prompt harder” or “ask the model for better formatting.”
The fix is controlling orchestration:
- capture structured tool outputs yourself
- stream conversational text separately
- emit UI events from reliable backend state
That is how you move from chatbot output to real product UX.
Next Improvement
A natural next step is letting the model explicitly signal when to render a UI component during streaming, not only after the stream ends.
That would allow widgets to appear exactly when the assistant references them, creating a more fluid and interactive experience.
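One possible shape for this, purely as a sketch: the model is instructed to emit an inline marker such as [[ui:call-1]], and the backend splits the token stream into text and widget segments as chunks arrive. The marker syntax, the class name, and the segment types are all hypothetical. The main subtlety is that a marker can be split across chunks, so the splitter holds back any trailing text that might be the start of one.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits streamed text into plain text and widget references (hypothetical marker: [[ui:REF]]). */
final class UiMarkerSplitter {

    interface Segment {}
    record Text(String value) implements Segment {}
    record Widget(String ref) implements Segment {}

    private static final String OPEN = "[[ui:";
    private static final String CLOSE = "]]";
    private final StringBuilder buffer = new StringBuilder();

    /** Feeds one streamed chunk and returns the segments that are complete so far. */
    List<Segment> accept(String chunk) {
        buffer.append(chunk);
        List<Segment> out = new ArrayList<>();
        while (true) {
            int open = buffer.indexOf(OPEN);
            if (open < 0) {
                // No full opener: emit text, but keep a tail that could still become an opener.
                flushText(out, buffer.length() - partialOpenLength());
                return out;
            }
            flushText(out, open); // buffer now starts with OPEN
            int close = buffer.indexOf(CLOSE, OPEN.length());
            if (close < 0) {
                return out; // marker is incomplete, wait for the next chunk
            }
            out.add(new Widget(buffer.substring(OPEN.length(), close)));
            buffer.delete(0, close + CLOSE.length());
        }
    }

    /** Call when the stream ends: whatever is still buffered is emitted as plain text. */
    List<Segment> finish() {
        List<Segment> out = new ArrayList<>();
        flushText(out, buffer.length());
        return out;
    }

    private void flushText(List<Segment> out, int end) {
        if (end > 0) {
            out.add(new Text(buffer.substring(0, end)));
            buffer.delete(0, end);
        }
    }

    /** Length of the longest buffer suffix that is a proper prefix of OPEN. */
    private int partialOpenLength() {
        int max = Math.min(buffer.length(), OPEN.length() - 1);
        for (int n = max; n > 0; n--) {
            if (OPEN.startsWith(buffer.substring(buffer.length() - n))) {
                return n;
            }
        }
        return 0;
    }
}
```

Each Widget segment could then be resolved against the stored tool results and sent as a UI event right away, with Text segments forwarded as ordinary streamed text.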
Further Reading and References
External references
Original post inspiration: How I Built Agentic UI on Top of Spring AI Tool Results
