Commit c80d18f

fix: expose AI SDK tool metadata (e.g. toolCallId, abort signal) via … (#754)
* fix: expose AI SDK tool metadata (e.g. toolCallId, abort signal) via ToolExecuteOptions - #746
* feat: simplify tool execution API by merging OperationContext into ToolExecuteOptions
* fix: unit tests
* feat: encapsulate tool-specific metadata in toolContext
* feat: encapsulate tool-specific metadata in toolContext + prevent AI SDK context collision
* feat: add providerOptions support to tools for provider-specific feat… (#760)
* feat: add providerOptions support to tools for provider-specific features
* feat: add multi-modal tool results support with toModelOutput - #722
* fix: createToolExecutionFactory types
1 parent 7d64420 commit c80d18f

14 files changed: +716 -74 lines changed

.changeset/all-zebras-grin.md

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+---
+"@voltagent/core": patch
+---
+
+feat: encapsulate tool-specific metadata in toolContext + prevent AI SDK context collision
+
+## Changes
+
+### 1. Tool Context Encapsulation
+
+Tool-specific metadata now organized under optional `toolContext` field for better separation and future-proofing.
+
+**Migration:**
+
+```typescript
+// Before
+execute: async ({ location }, options) => {
+  // Fields were flat (planned, not released)
+};
+
+// After
+execute: async ({ location }, options) => {
+  const { name, callId, messages, abortSignal } = options?.toolContext || {};
+
+  // Session context remains flat
+  const userId = options?.userId;
+  const logger = options?.logger;
+  const context = options?.context;
+};
+```
+
+### 2. AI SDK Context Field Protection
+
+Explicitly exclude `context` from being spread into AI SDK calls to prevent future naming collisions if AI SDK renames `experimental_context` → `context`.
+
+## Benefits
+
+- ✅ Better organization - tool metadata in one place
+- ✅ Clearer separation - session context vs tool context
+- ✅ Future-proof - easy to add new tool metadata fields
+- ✅ Namespace safety - no collision with OperationContext or AI SDK fields
+- ✅ Backward compatible - `toolContext` is optional for external callers (MCP servers)
+- ✅ Protected from AI SDK breaking changes
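
The migration note above says `toolContext` is optional for external callers, so a tool body has to tolerate its absence. The following is a minimal sketch of that defensive read. The types `ToolContextSketch` and `ToolExecuteOptionsSketch` and the helper `describeCall` are hypothetical stand-ins written for this illustration, not the exported `@voltagent/core` types; only the field names (`name`, `callId`, `messages`, `abortSignal`, `userId`) come from the changeset, and the abort signal is modeled structurally rather than with the real `AbortSignal` class.

```typescript
// Local stand-in types (assumptions for illustration; the real types live in @voltagent/core).
interface ToolContextSketch {
  name: string;
  callId: string;
  messages: unknown[];
  // Structural stand-in for AbortSignal: only the `aborted` flag is used here.
  abortSignal?: { readonly aborted: boolean };
}

interface ToolExecuteOptionsSketch {
  userId?: string; // session context stays flat
  toolContext?: ToolContextSketch; // tool metadata is namespaced and optional
}

// Hypothetical helper: summarizes a call while tolerating a missing toolContext,
// for example when an external caller such as an MCP server invokes the tool.
function describeCall(options?: ToolExecuteOptionsSketch): string {
  const ctx = options?.toolContext;
  const name = ctx?.name ?? "unknown-tool";
  const callId = ctx?.callId ?? "no-call-id";

  // Stop early if the caller has already cancelled the operation.
  if (ctx?.abortSignal?.aborted) {
    return `${name} (${callId}) aborted`;
  }

  const user = options?.userId ?? "anonymous";
  return `${name} (${callId}) running for ${user}`;
}
```

Reading each field through optional chaining keeps the same code path valid whether the agent supplies the metadata or an outside caller omits it.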

.changeset/lemon-shirts-look.md

Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
+---
+"@voltagent/core": patch
+---
+
+feat: add multi-modal tool results support with toModelOutput - #722
+
+Tools can now return images, media, and rich content to AI models using the `toModelOutput` function.
+
+## The Problem
+
+AI agents couldn't receive visual information from tools - everything had to be text or JSON. This limited use cases like:
+
+- Computer use agents that need to see screenshots
+- Image analysis workflows
+- Visual debugging tools
+- Any tool that produces media output
+
+## The Solution
+
+Added `toModelOutput?: (output) => ToolResultOutput` to tool options. This function transforms your tool's output into a format the AI model can understand, including images and media.
+
+```typescript
+import { createTool } from "@voltagent/core";
+import fs from "fs";
+
+const screenshotTool = createTool({
+  name: "take_screenshot",
+  description: "Takes a screenshot of the screen",
+  parameters: z.object({
+    region: z.string().optional().describe("Region to capture"),
+  }),
+  execute: async ({ region }) => {
+    const imageData = fs.readFileSync("./screenshot.png").toString("base64");
+    return {
+      type: "image",
+      data: imageData,
+      timestamp: new Date().toISOString(),
+    };
+  },
+  toModelOutput: (result) => ({
+    type: "content",
+    value: [
+      { type: "text", text: `Screenshot captured at ${result.timestamp}` },
+      { type: "media", data: result.data, mediaType: "image/png" },
+    ],
+  }),
+});
+```
+
+## Return Formats
+
+The `toModelOutput` function can return multiple formats:
+
+**Text output:**
+
+```typescript
+toModelOutput: (result) => ({
+  type: "text",
+  value: result.summary,
+});
+```
+
+**JSON output:**
+
+```typescript
+toModelOutput: (result) => ({
+  type: "json",
+  value: { status: "success", data: result },
+});
+```
+
+**Multi-modal content (text + media):**
+
+```typescript
+toModelOutput: (result) => ({
+  type: "content",
+  value: [
+    { type: "text", text: "Analysis complete" },
+    { type: "media", data: result.imageBase64, mediaType: "image/png" },
+  ],
+});
+```
+
+**Error handling:**
+
+```typescript
+toModelOutput: (result) => ({
+  type: "error-text",
+  value: result.errorMessage,
+});
+```
+
+## Impact
+
+- **Visual AI Workflows**: Build computer use agents that can see and interact with UIs
+- **Image Generation**: Tools can return generated images directly to the model
+- **Debugging**: Return screenshots and visual debugging information
+- **Rich Responses**: Combine text explanations with visual evidence
+
+## Usage with Anthropic
+
+```typescript
+const agent = createAgent({
+  name: "visual-assistant",
+  tools: [screenshotTool],
+  model: anthropic("claude-3-5-sonnet-20241022"),
+});
+
+const result = await agent.generateText({
+  prompt: "Take a screenshot and describe what you see",
+});
+// Agent receives both text and image, can analyze the screenshot
+```
+
+See [AI SDK documentation](https://sdk.vercel.ai/docs/ai-sdk-core/tools-and-tool-calling#multi-modal-tool-results) for more details on multi-modal tool results.
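
The return formats listed in this changeset can also be chosen at runtime by a single `toModelOutput` function. The sketch below illustrates that idea under stated assumptions: `ToolResultOutputSketch` is a local stand-in that models only the four variants shown above (the real `ToolResultOutput` type belongs to the AI SDK), and `ScreenshotResult` and `screenshotToModelOutput` are hypothetical names introduced for this example.

```typescript
// Local stand-in for the AI SDK's ToolResultOutput union (assumption: only the
// variants shown in the changeset are modeled here).
type ToolResultOutputSketch =
  | { type: "text"; value: string }
  | { type: "json"; value: unknown }
  | { type: "error-text"; value: string }
  | {
      type: "content";
      value: Array<
        | { type: "text"; text: string }
        | { type: "media"; data: string; mediaType: string }
      >;
    };

// Hypothetical result shape for a screenshot tool.
interface ScreenshotResult {
  data?: string; // base64-encoded PNG, absent when nothing was captured
  timestamp: string;
  errorMessage?: string;
}

// Picks an output variant based on what the tool actually produced.
function screenshotToModelOutput(result: ScreenshotResult): ToolResultOutputSketch {
  if (result.errorMessage) {
    return { type: "error-text", value: result.errorMessage };
  }
  if (!result.data) {
    return { type: "text", value: `No image captured at ${result.timestamp}` };
  }
  return {
    type: "content",
    value: [
      { type: "text", text: `Screenshot captured at ${result.timestamp}` },
      { type: "media", data: result.data, mediaType: "image/png" },
    ],
  };
}
```

Modeling the output as a discriminated union lets the compiler check that each branch returns a shape with the matching `value` type.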

.changeset/polite-cups-wear.md

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
+---
+"@voltagent/core": patch
+---
+
+feat: add providerOptions support to tools for provider-specific features - #759
+
+Tools can now accept `providerOptions` to enable provider-specific features like Anthropic's cache control. This aligns VoltAgent tools with the AI SDK's tool API.
+
+## The Problem
+
+Users wanted to use provider-specific features like Anthropic's prompt caching to reduce costs and latency, but VoltAgent's `createTool()` didn't support the `providerOptions` field that AI SDK tools have.
+
+## The Solution
+
+**What Changed:**
+
+- Added `providerOptions?: ProviderOptions` field to `ToolOptions` type
+- VoltAgent tools now accept and pass through provider options to the AI SDK
+- Supports all provider-specific features: cache control, reasoning settings, etc.
+
+**What Gets Enabled:**
+
+```typescript
+import { createTool } from "@voltagent/core";
+import { z } from "zod";
+
+const cityAttractionsTool = createTool({
+  name: "get_city_attractions",
+  description: "Get tourist attractions for a city",
+  parameters: z.object({
+    city: z.string().describe("The city name"),
+  }),
+  providerOptions: {
+    anthropic: {
+      cacheControl: { type: "ephemeral" },
+    },
+  },
+  execute: async ({ city }) => {
+    return await fetchAttractions(city);
+  },
+});
+```
+
+## Impact
+
+- **Cost Optimization:** Anthropic cache control reduces API costs for repeated tool calls
+- **Future-Proof:** Any new provider features work automatically
+- **Type-Safe:** Uses official AI SDK `ProviderOptions` type
+- **Zero Breaking Changes:** Optional field, fully backward compatible
+
+## Usage
+
+Use with any provider that supports provider-specific options:
+
+```typescript
+const agent = new Agent({
+  name: "Travel Assistant",
+  model: anthropic("claude-3-5-sonnet"),
+  tools: [cityAttractionsTool], // Tool with cacheControl enabled
+});
+
+await agent.generateText("What are the top attractions in Paris?");
+// Tool definition cached by Anthropic for improved performance
+```
+
+Learn more: [Anthropic Cache Control](https://ai-sdk.dev/providers/ai-sdk-providers/anthropic#cache-control)
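
The changeset says provider options are passed through to the AI SDK, but the conversion code itself is not part of this excerpt. As a rough sketch of what such a pass-through could look like, the example below copies `providerOptions` only when the tool author supplied it. Everything here is an assumption for illustration: `ProviderOptionsSketch`, `ToolDefinitionSketch`, and `toSdkToolShape` are hypothetical names, and the real `ProviderOptions` type comes from the AI SDK.

```typescript
// Local stand-in for the AI SDK ProviderOptions type (assumption: a map from
// provider name to a bag of settings).
type ProviderOptionsSketch = Record<string, Record<string, unknown>>;

interface ToolDefinitionSketch {
  name: string;
  description: string;
  providerOptions?: ProviderOptionsSketch;
}

// Hypothetical adapter: builds the object handed to the model layer and adds
// the providerOptions key only when it was actually provided.
function toSdkToolShape(tool: ToolDefinitionSketch): Record<string, unknown> {
  return {
    description: tool.description,
    ...(tool.providerOptions ? { providerOptions: tool.providerOptions } : {}),
  };
}
```

Omitting the key entirely, rather than setting it to `undefined`, keeps existing tools byte-for-byte unchanged on the wire, which is one way to honor the "zero breaking changes" claim.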

packages/core/src/agent/agent.ts

Lines changed: 45 additions & 12 deletions
@@ -1,4 +1,9 @@
-import type { ModelMessage, ProviderOptions, SystemModelMessage } from "@ai-sdk/provider-utils";
+import type {
+  ModelMessage,
+  ProviderOptions,
+  SystemModelMessage,
+  ToolCallOptions,
+} from "@ai-sdk/provider-utils";
 import type { Span } from "@opentelemetry/api";
 import { SpanKind, SpanStatusCode } from "@opentelemetry/api";
 import type { Logger } from "@voltagent/internal";
@@ -510,6 +515,7 @@ export class Agent {
     const {
       userId,
       conversationId,
+      context, // Explicitly exclude to prevent collision with AI SDK's future 'context' field
       parentAgentId,
       parentOperationContext,
       hooks,
@@ -724,6 +730,7 @@ export class Agent {
     const {
       userId,
       conversationId,
+      context, // Explicitly exclude to prevent collision with AI SDK's future 'context' field
       parentAgentId,
       parentOperationContext,
       hooks,
@@ -1191,6 +1198,7 @@ export class Agent {
     const {
       userId,
       conversationId,
+      context, // Explicitly exclude to prevent collision with AI SDK's future 'context' field
       parentAgentId,
       parentOperationContext,
       hooks,
@@ -1414,6 +1422,7 @@ export class Agent {
     const {
       userId,
       conversationId,
+      context, // Explicitly exclude to prevent collision with AI SDK's future 'context' field
       parentAgentId,
       parentOperationContext,
       hooks,
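
The four hunks above add `context` to a destructuring list. The rest of each statement is outside the visible diff, so the sketch below assumes it ends in a rest element whose contents are forwarded to the AI SDK call. Under that assumption, naming a key in the destructuring pattern is what keeps it out of the forwarded object. `CallOptionsSketch`, `splitOptions`, and the `temperature` and `maxOutputTokens` fields are hypothetical; only `userId`, `conversationId`, and `context` are taken from the diff.

```typescript
// Hypothetical option bag; only `userId`, `conversationId`, and `context`
// are names taken from the diff.
interface CallOptionsSketch {
  userId?: string;
  conversationId?: string;
  context?: Record<string, unknown>;
  temperature?: number;
  maxOutputTokens?: number;
}

// Splits agent-level fields from the settings forwarded to the model call.
function splitOptions(options: CallOptionsSketch) {
  // Every key named here is removed from `forwarded`, so `context` can never
  // leak into the model call even if the SDK later adopts a field of that name.
  const { userId, conversationId, context, ...forwarded } = options;
  return { session: { userId, conversationId, context }, forwarded };
}
```

Because object rest collects only the keys not named in the pattern, the exclusion holds without any explicit `delete` or filtering step.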
@@ -2691,9 +2700,23 @@ export class Agent {
   private createToolExecutionFactory(
     oc: OperationContext,
     hooks: AgentHooks,
-  ): (tool: BaseTool) => (args: any, options?: ToolExecuteOptions) => Promise<any> {
-    return (tool: BaseTool) => async (args: any, options?: ToolExecuteOptions) => {
+  ): (tool: BaseTool) => (args: any, options?: ToolCallOptions) => Promise<any> {
+    return (tool: BaseTool) => async (args: any, options?: ToolCallOptions) => {
+      // AI SDK passes ToolCallOptions with fields: toolCallId, messages, abortSignal
       const toolCallId = options?.toolCallId ?? randomUUID();
+      const messages = options?.messages ?? [];
+      const abortSignal = options?.abortSignal;
+
+      // Convert ToolCallOptions to ToolExecuteOptions by merging with OperationContext
+      const executionOptions: ToolExecuteOptions = {
+        ...oc,
+        toolContext: {
+          name: tool.name,
+          callId: toolCallId,
+          messages: messages,
+          abortSignal: abortSignal,
+        },
+      };
 
       // Event tracking now handled by OpenTelemetry spans
       const toolSpan = oc.traceContext.createChildSpan(`tool.execution:${tool.name}`, "tool", {
@@ -2715,13 +2738,19 @@ export class Agent {
       return await oc.traceContext.withSpan(toolSpan, async () => {
         try {
           // Call tool start hook - can throw ToolDeniedError
-          await hooks.onToolStart?.({ agent: this, tool, context: oc, args, options });
+          await hooks.onToolStart?.({
+            agent: this,
+            tool,
+            context: oc,
+            args,
+            options: executionOptions,
+          });
 
-          // Execute tool with OperationContext directly
+          // Execute tool with merged options
           if (!tool.execute) {
             throw new Error(`Tool ${tool.name} does not have "execute" method`);
           }
-          const result = await tool.execute(args, oc, options);
+          const result = await tool.execute(args, executionOptions);
           const validatedResult = await this.validateToolOutput(result, tool);
 
           // End OTEL span
@@ -2736,7 +2765,7 @@ export class Agent {
             output: validatedResult,
             error: undefined,
             context: oc,
-            options,
+            options: executionOptions,
           });
 
           return result;
@@ -2755,7 +2784,7 @@ export class Agent {
             output: undefined,
             error: errorResult as any,
             context: oc,
-            options,
+            options: executionOptions,
           });
 
           if (isToolDeniedError(e)) {
@@ -3379,16 +3408,20 @@ export class Agent {
       name: toolName,
       description: toolDescription,
       parameters: parametersSchema,
-      execute: async (args, context) => {
+      execute: async (args, options) => {
         // Extract the prompt from args
         const prompt = (args as any).prompt || args;
 
+        // Extract OperationContext from options if available
+        // Since ToolExecuteOptions extends Partial<OperationContext>, we can extract the fields
+        const oc = options as OperationContext | undefined;
+
         // Generate response using this agent
         const result = await this.generateText(prompt, {
           // Pass through the operation context if available
-          parentOperationContext: context,
-          conversationId: context?.conversationId,
-          userId: context?.userId,
+          parentOperationContext: oc,
+          conversationId: options?.conversationId,
+          userId: options?.userId,
         });
 
         // Return the text result
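
The core of the `createToolExecutionFactory` change is a merge: the operation context is spread flat and the per-call metadata from the AI SDK is nested under `toolContext`. The sketch below isolates that merge so it can be run on its own. All type names and the `toExecuteOptions` helper are hypothetical stand-ins; the real `ToolCallOptions` comes from `@ai-sdk/provider-utils`, the real `OperationContext` from `@voltagent/core`, and the injectable `makeId` parameter replaces the `randomUUID()` call used in the diff so the example needs no imports.

```typescript
// Local stand-ins (assumptions for illustration only).
interface ToolCallOptionsSketch {
  toolCallId?: string;
  messages?: unknown[];
  abortSignal?: { readonly aborted: boolean };
}

interface OperationContextSketch {
  userId?: string;
  conversationId?: string;
}

interface ExecuteOptionsSketch extends OperationContextSketch {
  toolContext: {
    name: string;
    callId: string;
    messages: unknown[];
    abortSignal: { readonly aborted: boolean } | undefined;
  };
}

// Merges the flat operation context with namespaced per-call metadata.
// `makeId` stands in for randomUUID() from the diff.
function toExecuteOptions(
  toolName: string,
  oc: OperationContextSketch,
  options?: ToolCallOptionsSketch,
  makeId: () => string = () => `call-${Date.now()}`,
): ExecuteOptionsSketch {
  return {
    ...oc, // session fields such as userId stay at the top level
    toolContext: {
      name: toolName,
      callId: options?.toolCallId ?? makeId(),
      messages: options?.messages ?? [],
      abortSignal: options?.abortSignal,
    },
  };
}
```

Nesting the call metadata means a future operation-context field named `messages` or `name` would not be overwritten by the AI SDK values, which is the namespace-safety benefit the changeset describes.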
