Commit 27355ba

docs: add multimodal (image input) examples to chat generator docs (#10033)

dfokina and anakin87 authored

* multimodal examples
* Apply suggestions from code review
* standardize names and outputs

Co-authored-by: Stefano Fiorucci <[email protected]>

1 parent edd5a99 · commit 27355ba

13 files changed · +272 −0 lines changed

docs-website/docs/pipeline-components/generators/amazonbedrockchatgenerator.mdx

Lines changed: 20 additions & 0 deletions

@@ -96,6 +96,26 @@ response = generator.run(messages)

With multimodal inputs:

```python
from haystack.dataclasses import ChatMessage, ImageContent
from haystack_integrations.components.generators.amazon_bedrock import AmazonBedrockChatGenerator

llm = AmazonBedrockChatGenerator(model="anthropic.claude-3-5-sonnet-20240620-v1:0")

image = ImageContent.from_file_path("apple.jpg")
user_message = ChatMessage.from_user(content_parts=[
    "What does the image show? Max 5 words.",
    image
])

response = llm.run([user_message])["replies"][0].text
print(response)

# Red apple on straw mat.
```
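All the examples in this commit build the image from a local file with `ImageContent.from_file_path`. As a side note (not part of the diff), `ImageContent` can also be constructed from a URL or from raw base64 data; a minimal sketch, assuming the `from_url` helper available in recent Haystack releases and a placeholder URL:

```python
import base64

from haystack.dataclasses import ImageContent

# From a remote URL (fetches and base64-encodes the image); URL is a placeholder
image_from_url = ImageContent.from_url("https://example.com/apple.jpg")

# From raw base64 data, setting the MIME type explicitly
with open("apple.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("utf-8")
image_from_b64 = ImageContent(base64_image=b64, mime_type="image/jpeg")
```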
### In a pipeline

In a RAG pipeline:

docs-website/docs/pipeline-components/generators/anthropicchatgenerator.mdx

Lines changed: 20 additions & 0 deletions

@@ -148,6 +148,26 @@ message = ChatMessage.from_user("What's Natural Language Processing? Be brief.")

With multimodal inputs:

```python
from haystack.dataclasses import ChatMessage, ImageContent
from haystack_integrations.components.generators.anthropic import AnthropicChatGenerator

llm = AnthropicChatGenerator()

image = ImageContent.from_file_path("apple.jpg")
user_message = ChatMessage.from_user(content_parts=[
    "What does the image show? Max 5 words.",
    image
])

response = llm.run([user_message])["replies"][0].text
print(response)

# Red apple on straw.
```
### In a pipeline

You can also use `AnthropicChatGenerator` with Anthropic chat models in your pipeline.
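The pipeline snippet itself is not included in this diff; a minimal sketch of such a pipeline, assuming a plain text-only template (the prompt wording is illustrative):

```python
from haystack import Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.anthropic import AnthropicChatGenerator

# {{topic}} is a Jinja variable filled in at run time
template = [ChatMessage.from_user("Explain {{topic}} in one sentence.")]

pipe = Pipeline()
pipe.add_component("prompt_builder", ChatPromptBuilder(template=template))
pipe.add_component("llm", AnthropicChatGenerator())
pipe.connect("prompt_builder.prompt", "llm.messages")

result = pipe.run({"prompt_builder": {"topic": "Natural Language Processing"}})
print(result["llm"]["replies"][0].text)
```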

docs-website/docs/pipeline-components/generators/azureopenaichatgenerator.mdx

Lines changed: 23 additions & 0 deletions

@@ -149,6 +149,29 @@ response = client.run(

With multimodal inputs:

```python
from haystack.dataclasses import ChatMessage, ImageContent
from haystack.components.generators.chat import AzureOpenAIChatGenerator

llm = AzureOpenAIChatGenerator(
    azure_endpoint="<Your Azure endpoint>",
    azure_deployment="gpt-4o-mini",
)

image = ImageContent.from_file_path("apple.jpg", detail="low")
user_message = ChatMessage.from_user(content_parts=[
    "What does the image show? Max 5 words.",
    image
])

response = llm.run([user_message])["replies"][0].text
print(response)

# Fresh red apple on straw.
```
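Note that this example passes `detail="low"` when loading the image. For OpenAI-style vision APIs, `detail` typically accepts `"low"`, `"high"`, or `"auto"`, trading image fidelity against token usage.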
### In a pipeline

docs-website/docs/pipeline-components/generators/coherechatgenerator.mdx

Lines changed: 21 additions & 0 deletions

@@ -87,6 +87,27 @@ message = ChatMessage.from_user("What's Natural Language Processing? Be brief.")

With multimodal inputs:

```python
from haystack.dataclasses import ChatMessage, ImageContent
from haystack_integrations.components.generators.cohere import CohereChatGenerator

# Use a multimodal model like Command A Vision
llm = CohereChatGenerator(model="command-a-vision-07-2025")

image = ImageContent.from_file_path("apple.jpg")
user_message = ChatMessage.from_user(content_parts=[
    "What does the image show? Max 5 words.",
    image
])

response = llm.run([user_message])["replies"][0].text
print(response)

# Red apple on straw.
```
#### In a Pipeline

You can also use `CohereChatGenerator` to run Cohere chat models in your pipeline.

docs-website/docs/pipeline-components/generators/googlegenaichatgenerator.mdx

Lines changed: 20 additions & 0 deletions

@@ -128,6 +128,26 @@ response = chat_generator.run(messages=messages)

With multimodal inputs:

```python
from haystack.dataclasses import ChatMessage, ImageContent
from haystack_integrations.components.generators.google_genai import GoogleGenAIChatGenerator

llm = GoogleGenAIChatGenerator()

image = ImageContent.from_file_path("apple.jpg")
user_message = ChatMessage.from_user(content_parts=[
    "What does the image show? Max 5 words.",
    image
])

response = llm.run([user_message])["replies"][0].text
print(response)

# Red apple on straw.
```
You can also easily use function calls. First, define the function locally and convert it into a [Tool](https://www.notion.so/docs/tool):
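That snippet is truncated in this diff; a minimal sketch of the pattern the sentence describes, using Haystack's `create_tool_from_function` (the `get_weather` function and its mocked result are invented for illustration):

```python
from haystack.dataclasses import ChatMessage
from haystack.tools import create_tool_from_function
from haystack_integrations.components.generators.google_genai import GoogleGenAIChatGenerator

def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"Sunny, 22 °C in {city}"  # mocked result for illustration

# Build a Tool from the function; the schema is inferred from the signature
weather_tool = create_tool_from_function(get_weather)

llm = GoogleGenAIChatGenerator(tools=[weather_tool])
reply = llm.run([ChatMessage.from_user("What's the weather in Paris?")])["replies"][0]
print(reply.tool_calls)  # the model should ask to invoke get_weather
```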

docs-website/docs/pipeline-components/generators/llamacppchatgenerator.mdx

Lines changed: 27 additions & 0 deletions

@@ -155,6 +155,33 @@ messages = [ChatMessage.from_user("Who is the best American actor?")]

### With multimodal (image + text) inputs

```python
from haystack.dataclasses import ChatMessage, ImageContent
from haystack_integrations.components.generators.llama_cpp import LlamaCppChatGenerator

# Initialize with multimodal support
llm = LlamaCppChatGenerator(
    model="llava-v1.5-7b-q4_0.gguf",
    chat_handler_name="Llava15ChatHandler",  # use the llava-1-5 handler
    model_clip_path="mmproj-model-f16.gguf",  # CLIP model
    n_ctx=4096  # larger context for image processing
)
llm.warm_up()

image = ImageContent.from_file_path("apple.jpg")
user_message = ChatMessage.from_user(content_parts=[
    "What does the image show? Max 5 words.",
    image
])

response = llm.run([user_message])["replies"][0].text
print(response)

# Red apple on straw.
```
The `generation_kwargs` can also be passed to the `run` method of the generator directly:
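That snippet is cut off in this diff; a sketch of what such a call looks like, reusing `generator` and `messages` from the earlier example (the specific kwargs are illustrative):

```python
# Sampling options are forwarded to llama.cpp per call via generation_kwargs
result = generator.run(
    messages,
    generation_kwargs={"max_tokens": 128, "temperature": 0.7},
)
print(result["replies"][0].text)
```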

docs-website/docs/pipeline-components/generators/metallamachatgenerator.mdx

Lines changed: 20 additions & 0 deletions

@@ -117,6 +117,26 @@ response = llm.run(

With multimodal inputs:

```python
from haystack.dataclasses import ChatMessage, ImageContent
from haystack_integrations.components.generators.meta_llama import MetaLlamaChatGenerator

llm = MetaLlamaChatGenerator(model="Llama-4-Scout-17B-16E-Instruct-FP8")

image = ImageContent.from_file_path("apple.jpg")
user_message = ChatMessage.from_user(content_parts=[
    "What does the image show? Max 5 words.",
    image
])

response = llm.run([user_message])["replies"][0].text
print(response)

# Red apple on straw.
```
### In a pipeline

docs-website/docs/pipeline-components/generators/mistralchatgenerator.mdx

Lines changed: 20 additions & 0 deletions

@@ -97,6 +97,26 @@ message = ChatMessage.from_user("What's Natural Language Processing? Be brief.")

With multimodal inputs:

```python
from haystack.dataclasses import ChatMessage, ImageContent
from haystack_integrations.components.generators.mistral import MistralChatGenerator

llm = MistralChatGenerator(model="pixtral-12b-2409")

image = ImageContent.from_file_path("apple.jpg")
user_message = ChatMessage.from_user(content_parts=[
    "What does the image show? Max 5 words.",
    image
])

response = llm.run([user_message])["replies"][0].text
print(response)

# Red apple on straw.
```
#### In a Pipeline

Below is an example RAG pipeline where we answer questions based on the contents of a URL. We add the URL's contents to our `messages` in the `ChatPromptBuilder` and generate an answer with the `MistralChatGenerator`.
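The pipeline code itself is truncated in this diff; a sketch of the shape such a pipeline typically takes (the URL, question, and prompt template are placeholders):

```python
from haystack import Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.components.converters import HTMLToDocument
from haystack.components.fetchers import LinkContentFetcher
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.mistral import MistralChatGenerator

# Fetched documents and the question are injected into the prompt via Jinja
template = [ChatMessage.from_user(
    "Answer the question based on the following documents.\n"
    "Documents:\n{% for doc in documents %}{{ doc.content }}\n{% endfor %}\n"
    "Question: {{ question }}"
)]

pipe = Pipeline()
pipe.add_component("fetcher", LinkContentFetcher())
pipe.add_component("converter", HTMLToDocument())
pipe.add_component("prompt_builder", ChatPromptBuilder(template=template, required_variables=["question"]))
pipe.add_component("llm", MistralChatGenerator())

pipe.connect("fetcher.streams", "converter.sources")
pipe.connect("converter.documents", "prompt_builder.documents")
pipe.connect("prompt_builder.prompt", "llm.messages")

result = pipe.run({
    "fetcher": {"urls": ["https://haystack.deepset.ai"]},  # placeholder URL
    "prompt_builder": {"question": "What is Haystack?"},
})
print(result["llm"]["replies"][0].text)
```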

docs-website/docs/pipeline-components/generators/nvidiachatgenerator.mdx

Lines changed: 20 additions & 0 deletions

@@ -92,6 +92,26 @@ print(result["replies"])

With multimodal inputs:

```python
from haystack.dataclasses import ChatMessage, ImageContent
from haystack_integrations.components.generators.nvidia import NvidiaChatGenerator

llm = NvidiaChatGenerator(model="meta/llama-3.2-11b-vision-instruct")

image = ImageContent.from_file_path("apple.jpg")
user_message = ChatMessage.from_user(content_parts=[
    "What does the image show? Max 5 words.",
    image
])

response = llm.run([user_message])["replies"][0].text
print(response)

# Red apple on straw.
```
### In a Pipeline

docs-website/docs/pipeline-components/generators/ollamachatgenerator.mdx

Lines changed: 20 additions & 0 deletions

@@ -167,6 +167,26 @@ print(generator.run(messages=messages))

With multimodal inputs:

```python
from haystack.dataclasses import ChatMessage, ImageContent
from haystack_integrations.components.generators.ollama import OllamaChatGenerator

llm = OllamaChatGenerator(model="llava", url="http://localhost:11434")

image = ImageContent.from_file_path("apple.jpg")
user_message = ChatMessage.from_user(content_parts=[
    "What does the image show? Max 5 words.",
    image
])

response = llm.run([user_message])["replies"][0].text
print(response)

# Red apple on straw.
```
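Note: this example assumes a local Ollama server at the default address and that the multimodal `llava` model has already been downloaded, for example with `ollama pull llava`.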
### In a Pipeline
