Commit 27355ba

docs: add multimodal (image input) examples to chat generator docs (#10033)

dfokina and anakin87 authored

* multimodal examples
* Apply suggestions from code review
* standardize names and outputs

Co-authored-by: Stefano Fiorucci <[email protected]>

1 parent edd5a99 · commit 27355ba

13 files changed · +272 −0 lines changed

docs-website/docs/pipeline-components/generators/amazonbedrockchatgenerator.mdx

Lines changed: 20 additions & 0 deletions

@@ -96,6 +96,26 @@ response = generator.run(messages)

With multimodal inputs:

```python
from haystack.dataclasses import ChatMessage, ImageContent
from haystack_integrations.components.generators.amazon_bedrock import AmazonBedrockChatGenerator

llm = AmazonBedrockChatGenerator(model="anthropic.claude-3-5-sonnet-20240620-v1:0")

image = ImageContent.from_file_path("apple.jpg")
user_message = ChatMessage.from_user(content_parts=[
    "What does the image show? Max 5 words.",
    image
])

response = llm.run([user_message])["replies"][0].text
print(response)

# Red apple on straw mat.
```
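All the examples in this commit build the image from a local file with `ImageContent.from_file_path`. As a side note (not part of the diff), `ImageContent` can also be constructed from a URL or from raw base64 data; a minimal sketch, assuming the `from_url` helper available in recent Haystack releases and a placeholder URL:

```python
import base64

from haystack.dataclasses import ImageContent

# From a remote URL (fetches and base64-encodes the image); URL is a placeholder
image_from_url = ImageContent.from_url("https://example.com/apple.jpg")

# From raw base64 data, setting the MIME type explicitly
with open("apple.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("utf-8")
image_from_b64 = ImageContent(base64_image=b64, mime_type="image/jpeg")
```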
### In a pipeline

In a RAG pipeline:

docs-website/docs/pipeline-components/generators/anthropicchatgenerator.mdx

Lines changed: 20 additions & 0 deletions

@@ -148,6 +148,26 @@ message = ChatMessage.from_user("What's Natural Language Processing? Be brief.")

With multimodal inputs:

```python
from haystack.dataclasses import ChatMessage, ImageContent
from haystack_integrations.components.generators.anthropic import AnthropicChatGenerator

llm = AnthropicChatGenerator()

image = ImageContent.from_file_path("apple.jpg")
user_message = ChatMessage.from_user(content_parts=[
    "What does the image show? Max 5 words.",
    image
])

response = llm.run([user_message])["replies"][0].text
print(response)

# Red apple on straw.
```
### In a pipeline

You can also use `AnthropicChatGenerator` with Anthropic chat models in your pipeline.
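The pipeline snippet itself is not included in this diff; a minimal sketch of such a pipeline, assuming a plain text-only template (the prompt wording is illustrative):

```python
from haystack import Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.anthropic import AnthropicChatGenerator

# {{topic}} is a Jinja variable filled in at run time
template = [ChatMessage.from_user("Explain {{topic}} in one sentence.")]

pipe = Pipeline()
pipe.add_component("prompt_builder", ChatPromptBuilder(template=template))
pipe.add_component("llm", AnthropicChatGenerator())
pipe.connect("prompt_builder.prompt", "llm.messages")

result = pipe.run({"prompt_builder": {"topic": "Natural Language Processing"}})
print(result["llm"]["replies"][0].text)
```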

docs-website/docs/pipeline-components/generators/azureopenaichatgenerator.mdx

Lines changed: 23 additions & 0 deletions

@@ -149,6 +149,29 @@ response = client.run(

With multimodal inputs:

```python
from haystack.dataclasses import ChatMessage, ImageContent
from haystack.components.generators.chat import AzureOpenAIChatGenerator

llm = AzureOpenAIChatGenerator(
    azure_endpoint="<Your Azure endpoint>",
    azure_deployment="gpt-4o-mini",
)

image = ImageContent.from_file_path("apple.jpg", detail="low")
user_message = ChatMessage.from_user(content_parts=[
    "What does the image show? Max 5 words.",
    image
])

response = llm.run([user_message])["replies"][0].text
print(response)

# Fresh red apple on straw.
```
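Note that this example passes `detail="low"` when loading the image. For OpenAI-style vision APIs, `detail` typically accepts `"low"`, `"high"`, or `"auto"`, trading image fidelity against token usage.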
### In a pipeline

docs-website/docs/pipeline-components/generators/coherechatgenerator.mdx

Lines changed: 21 additions & 0 deletions

@@ -87,6 +87,27 @@ message = ChatMessage.from_user("What's Natural Language Processing? Be brief.")

With multimodal inputs:

```python
from haystack.dataclasses import ChatMessage, ImageContent
from haystack_integrations.components.generators.cohere import CohereChatGenerator

# Use a multimodal model like Command A Vision
llm = CohereChatGenerator(model="command-a-vision-07-2025")

image = ImageContent.from_file_path("apple.jpg")
user_message = ChatMessage.from_user(content_parts=[
    "What does the image show? Max 5 words.",
    image
])

response = llm.run([user_message])["replies"][0].text
print(response)

# Red apple on straw.
```
#### In a Pipeline

You can also use `CohereChatGenerator` to run Cohere chat models in your pipeline.

docs-website/docs/pipeline-components/generators/googlegenaichatgenerator.mdx

Lines changed: 20 additions & 0 deletions

@@ -128,6 +128,26 @@ response = chat_generator.run(messages=messages)

With multimodal inputs:

```python
from haystack.dataclasses import ChatMessage, ImageContent
from haystack_integrations.components.generators.google_genai import GoogleGenAIChatGenerator

llm = GoogleGenAIChatGenerator()

image = ImageContent.from_file_path("apple.jpg")
user_message = ChatMessage.from_user(content_parts=[
    "What does the image show? Max 5 words.",
    image
])

response = llm.run([user_message])["replies"][0].text
print(response)

# Red apple on straw.
```
You can also easily use function calls. First, define the function locally and convert it into a [Tool](https://www.notion.so/docs/tool):
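That snippet is truncated in this diff; a minimal sketch of the pattern the sentence describes, using Haystack's `create_tool_from_function` (the `get_weather` function and its mocked result are invented for illustration):

```python
from haystack.dataclasses import ChatMessage
from haystack.tools import create_tool_from_function
from haystack_integrations.components.generators.google_genai import GoogleGenAIChatGenerator

def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"Sunny, 22 °C in {city}"  # mocked result for illustration

# Build a Tool from the function; the schema is inferred from the signature
weather_tool = create_tool_from_function(get_weather)

llm = GoogleGenAIChatGenerator(tools=[weather_tool])
reply = llm.run([ChatMessage.from_user("What's the weather in Paris?")])["replies"][0]
print(reply.tool_calls)  # the model should ask to invoke get_weather
```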

docs-website/docs/pipeline-components/generators/llamacppchatgenerator.mdx

Lines changed: 27 additions & 0 deletions

@@ -155,6 +155,33 @@ messages = [ChatMessage.from_user("Who is the best American actor?")]

### With multimodal (image + text) inputs

```python
from haystack.dataclasses import ChatMessage, ImageContent
from haystack_integrations.components.generators.llama_cpp import LlamaCppChatGenerator

# Initialize with multimodal support
llm = LlamaCppChatGenerator(
    model="llava-v1.5-7b-q4_0.gguf",
    chat_handler_name="Llava15ChatHandler",  # use the llava-1-5 handler
    model_clip_path="mmproj-model-f16.gguf",  # CLIP model
    n_ctx=4096  # larger context for image processing
)
llm.warm_up()

image = ImageContent.from_file_path("apple.jpg")
user_message = ChatMessage.from_user(content_parts=[
    "What does the image show? Max 5 words.",
    image
])

response = llm.run([user_message])["replies"][0].text
print(response)

# Red apple on straw.
```
The `generation_kwargs` can also be passed to the `run` method of the generator directly:
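That snippet is cut off in this diff; a sketch of what such a call looks like, reusing `generator` and `messages` from the earlier example (the specific kwargs are illustrative):

```python
# Sampling options are forwarded to llama.cpp per call via generation_kwargs
result = generator.run(
    messages,
    generation_kwargs={"max_tokens": 128, "temperature": 0.7},
)
print(result["replies"][0].text)
```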

docs-website/docs/pipeline-components/generators/metallamachatgenerator.mdx

Lines changed: 20 additions & 0 deletions

@@ -117,6 +117,26 @@ response = llm.run(

With multimodal inputs:

```python
from haystack.dataclasses import ChatMessage, ImageContent
from haystack_integrations.components.generators.meta_llama import MetaLlamaChatGenerator

llm = MetaLlamaChatGenerator(model="Llama-4-Scout-17B-16E-Instruct-FP8")

image = ImageContent.from_file_path("apple.jpg")
user_message = ChatMessage.from_user(content_parts=[
    "What does the image show? Max 5 words.",
    image
])

response = llm.run([user_message])["replies"][0].text
print(response)

# Red apple on straw.
```
### In a pipeline

docs-website/docs/pipeline-components/generators/mistralchatgenerator.mdx

Lines changed: 20 additions & 0 deletions

@@ -97,6 +97,26 @@ message = ChatMessage.from_user("What's Natural Language Processing? Be brief.")

With multimodal inputs:

```python
from haystack.dataclasses import ChatMessage, ImageContent
from haystack_integrations.components.generators.mistral import MistralChatGenerator

llm = MistralChatGenerator(model="pixtral-12b-2409")

image = ImageContent.from_file_path("apple.jpg")
user_message = ChatMessage.from_user(content_parts=[
    "What does the image show? Max 5 words.",
    image
])

response = llm.run([user_message])["replies"][0].text
print(response)

# Red apple on straw.
```
#### In a Pipeline

Below is an example RAG pipeline where we answer questions based on the contents of a URL. We add the URL's contents to our `messages` in the `ChatPromptBuilder` and generate an answer with the `MistralChatGenerator`.
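The pipeline code itself is truncated in this diff; a sketch of the shape such a pipeline typically takes (the URL, question, and prompt template are placeholders):

```python
from haystack import Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.components.converters import HTMLToDocument
from haystack.components.fetchers import LinkContentFetcher
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.mistral import MistralChatGenerator

# Fetched documents and the question are injected into the prompt via Jinja
template = [ChatMessage.from_user(
    "Answer the question based on the following documents.\n"
    "Documents:\n{% for doc in documents %}{{ doc.content }}\n{% endfor %}\n"
    "Question: {{ question }}"
)]

pipe = Pipeline()
pipe.add_component("fetcher", LinkContentFetcher())
pipe.add_component("converter", HTMLToDocument())
pipe.add_component("prompt_builder", ChatPromptBuilder(template=template, required_variables=["question"]))
pipe.add_component("llm", MistralChatGenerator())

pipe.connect("fetcher.streams", "converter.sources")
pipe.connect("converter.documents", "prompt_builder.documents")
pipe.connect("prompt_builder.prompt", "llm.messages")

result = pipe.run({
    "fetcher": {"urls": ["https://haystack.deepset.ai"]},  # placeholder URL
    "prompt_builder": {"question": "What is Haystack?"},
})
print(result["llm"]["replies"][0].text)
```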

docs-website/docs/pipeline-components/generators/nvidiachatgenerator.mdx

Lines changed: 20 additions & 0 deletions

@@ -92,6 +92,26 @@ print(result["replies"])

With multimodal inputs:

```python
from haystack.dataclasses import ChatMessage, ImageContent
from haystack_integrations.components.generators.nvidia import NvidiaChatGenerator

llm = NvidiaChatGenerator(model="meta/llama-3.2-11b-vision-instruct")

image = ImageContent.from_file_path("apple.jpg")
user_message = ChatMessage.from_user(content_parts=[
    "What does the image show? Max 5 words.",
    image
])

response = llm.run([user_message])["replies"][0].text
print(response)

# Red apple on straw.
```
### In a Pipeline

docs-website/docs/pipeline-components/generators/ollamachatgenerator.mdx

Lines changed: 20 additions & 0 deletions

@@ -167,6 +167,26 @@ print(generator.run(messages=messages))

With multimodal inputs:

```python
from haystack.dataclasses import ChatMessage, ImageContent
from haystack_integrations.components.generators.ollama import OllamaChatGenerator

llm = OllamaChatGenerator(model="llava", url="http://localhost:11434")

image = ImageContent.from_file_path("apple.jpg")
user_message = ChatMessage.from_user(content_parts=[
    "What does the image show? Max 5 words.",
    image
])

response = llm.run([user_message])["replies"][0].text
print(response)

# Red apple on straw.
```
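Note: this example assumes a local Ollama server at the default address and that the multimodal `llava` model has already been downloaded, for example with `ollama pull llava`.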
### In a Pipeline
