v2.15.0
⭐️ Highlights
Parallel Tool Calling for Faster Agents
`ToolInvoker` now processes all tool calls passed to `run` or `run_async` in parallel using an internal `ThreadPoolExecutor`. This improves performance by reducing the time spent on sequential tool invocations.
- This parallel execution capability enables `ToolInvoker` to batch and process multiple tool calls concurrently, allowing Agents to run complex pipelines efficiently with decreased latency.
- You no longer need to pass an `async_executor`. `ToolInvoker` manages its own executor, configurable via the `max_workers` parameter in `__init__`.
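A minimal sketch of the new behavior; the weather tool and its schema are hypothetical, while `max_workers` is the new parameter from this release:

```python
from haystack.components.tools import ToolInvoker
from haystack.dataclasses import ChatMessage, ToolCall
from haystack.tools import Tool

# Hypothetical example tool; any callable wrapped as a Tool works the same way.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

weather_tool = Tool(
    name="get_weather",
    description="Get the weather for a city.",
    parameters={
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
    function=get_weather,
)

# max_workers configures the internal ThreadPoolExecutor used for parallel calls.
invoker = ToolInvoker(tools=[weather_tool], max_workers=4)

# Both tool calls below are dispatched concurrently rather than one after the other.
message = ChatMessage.from_assistant(
    tool_calls=[
        ToolCall(tool_name="get_weather", arguments={"city": "Berlin"}),
        ToolCall(tool_name="get_weather", arguments={"city": "Paris"}),
    ]
)
result = invoker.run(messages=[message])
print(result["tool_messages"])
```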
Introducing LLMMessagesRouter
The new `LLMMessagesRouter` component classifies incoming `ChatMessage` objects with a generative LLM and routes them to different connections based on the classification. It can be used with general-purpose LLMs as well as with specialized moderation LLMs like Llama Guard.
Usage example:
```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.components.routers.llm_messages_router import LLMMessagesRouter
from haystack.dataclasses import ChatMessage

chat_generator = HuggingFaceAPIChatGenerator(
    api_type="serverless_inference_api",
    api_params={"model": "meta-llama/Llama-Guard-4-12B", "provider": "groq"},
)

router = LLMMessagesRouter(
    chat_generator=chat_generator,
    output_names=["unsafe", "safe"],
    output_patterns=["unsafe", "safe"],
)

print(router.run([ChatMessage.from_user("How to rob a bank?")]))
```
New HuggingFaceTEIRanker Component
HuggingFaceTEIRanker enables end-to-end reranking via the Text Embeddings Inference (TEI) API. It supports both self-hosted TEI services and Hugging Face Inference Endpoints, giving you flexible, high-quality reranking out of the box.
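A minimal usage sketch; the URL is a placeholder for your self-hosted TEI service or Inference Endpoint, and `top_k` is assumed to be the init parameter that limits the number of returned documents:

```python
from haystack import Document
from haystack.components.rankers import HuggingFaceTEIRanker

# Placeholder URL: point it at your TEI reranking service.
ranker = HuggingFaceTEIRanker(url="http://localhost:8080", top_k=2)

docs = [
    Document(content="The capital of France is Paris."),
    Document(content="Berlin is known for its museums."),
]
result = ranker.run(query="What is the capital of France?", documents=docs)
print(result["documents"])
```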
🚀 New Features
- Added a `ComponentInfo` dataclass to Haystack to store information about a component. We pass it to `StreamingChunk` so we can tell which component a stream is coming from.
- Pass the `component_info` to the `StreamingChunk` in `OpenAIChatGenerator`, `AzureOpenAIChatGenerator`, `HuggingFaceAPIChatGenerator`, `HuggingFaceGenerator`, `HuggingFaceLocalGenerator` and `HuggingFaceLocalChatGenerator`.
- Added the `enable_streaming_callback_passthrough` parameter to the `__init__`, `run` and `run_async` methods of `ToolInvoker`. If set to `True`, `ToolInvoker` will try to pass the `streaming_callback` function to a tool's invoke method, but only if the tool's invoke method has `streaming_callback` in its signature.
- Added a dedicated `finish_reason` field to the `StreamingChunk` class to improve type safety and enable sophisticated streaming UI logic. The field uses a `FinishReason` type alias with the standard values "stop", "length", "tool_calls" and "content_filter", plus the Haystack-specific value "tool_call_results" (used by `ToolInvoker` to indicate tool execution completion). A streaming-callback sketch after this list shows the field in use.
- Updated the `ToolInvoker` component to use the new `finish_reason` field when streaming tool results. The component now sets `finish_reason="tool_call_results"` in the final streaming chunk to indicate that tool execution has completed, while maintaining backward compatibility by also setting the value in `meta["finish_reason"]`.
- Added a `raise_on_failure` boolean parameter to `OpenAIDocumentEmbedder` and `AzureOpenAIDocumentEmbedder`. If set to `True`, the component raises an exception when there is an error with the API request. It is set to `False` by default, so the previous behavior of logging an exception and continuing remains the default (see the sketch after this list).
- Added `AsyncHFTokenStreamingHandler` for async streaming support in `HuggingFaceLocalChatGenerator`.
- For `HuggingFaceAPIGenerator` and `HuggingFaceAPIChatGenerator`, all additional key-value pairs passed in `api_params` are now passed to the initialization of the underlying inference clients. This allows passing additional parameters to the clients, like `timeout`, `headers`, `provider`, etc. This means you can now easily specify a different inference provider by passing the `provider` key in `api_params`, as shown in the example after this list.
- Updated `StreamingChunk` to add the fields `tool_calls`, `tool_call_result`, `index` and `start`, to make it easier to format the stream in a streaming callback (see the callback sketch after this list).
  - Added a new dataclass `ToolCallDelta` for the `StreamingChunk.tool_calls` field, to reflect that the arguments can be a string delta.
  - Updated the `print_streaming_chunk` and `_convert_streaming_chunks_to_chat_message` utility methods to use these new fields. This especially improves the formatting when using `print_streaming_chunk` with Agent.
  - Updated `OpenAIGenerator`, `OpenAIChatGenerator`, `HuggingFaceAPIGenerator`, `HuggingFaceAPIChatGenerator`, `HuggingFaceLocalGenerator` and `HuggingFaceLocalChatGenerator` to follow the new dataclasses.
  - Updated `ToolInvoker` to follow the `StreamingChunk` dataclass.
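As referenced above, here is a minimal sketch of a streaming callback that uses the new `StreamingChunk` fields. Only the field names (`component_info`, `content`, `tool_calls`, `finish_reason`) come from this release; the formatting choices are assumptions:

```python
from haystack.dataclasses import StreamingChunk

def formatting_callback(chunk: StreamingChunk) -> None:
    # component_info identifies the component the chunk is streaming from.
    if chunk.component_info is not None:
        print(f"[{chunk.component_info.name or chunk.component_info.type}]", end=" ")

    # Plain text delta, if any.
    if chunk.content:
        print(chunk.content, end="", flush=True)

    # tool_calls holds ToolCallDelta objects; arguments may arrive as string deltas.
    for delta in chunk.tool_calls or []:
        print(delta.arguments or "", end="", flush=True)

    # The typed finish_reason field replaces reading meta["finish_reason"].
    if chunk.finish_reason == "tool_call_results":
        print("\n[tool execution completed]")
    elif chunk.finish_reason is not None:
        print(f"\n[finished: {chunk.finish_reason}]")
```

Pass such a function as `streaming_callback` to a chat generator or to `ToolInvoker`.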
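A sketch of the new `raise_on_failure` flag on the document embedders; the document content is a placeholder, and an `OPENAI_API_KEY` environment variable is assumed:

```python
from haystack import Document
from haystack.components.embedders import OpenAIDocumentEmbedder

# raise_on_failure=True turns API errors into exceptions instead of the default
# behavior (raise_on_failure=False) of logging the error and continuing.
embedder = OpenAIDocumentEmbedder(raise_on_failure=True)
result = embedder.run(documents=[Document(content="Example text to embed")])
print(len(result["documents"]))
```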
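And a sketch of selecting a different inference provider through `api_params`; the model name, provider and timeout values are placeholders:

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage

# Extra api_params keys such as provider, timeout and headers are now forwarded
# to the underlying Hugging Face inference client.
generator = HuggingFaceAPIChatGenerator(
    api_type="serverless_inference_api",
    api_params={
        "model": "meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
        "provider": "together",  # placeholder provider name
        "timeout": 30,
    },
)
result = generator.run(messages=[ChatMessage.from_user("Hello!")])
```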
⚡️ Enhancement Notes
- Added a new `deserialize_component_inplace` function to handle generic component deserialization that works with any component type.
- Made doc-parser a core dependency, since `ComponentTool`, which uses it, is one of the core `Tool` components.
- Made the `PipelineBase().validate_input` method public so users can use it with the confidence that it won't receive breaking changes without warning. This method is useful for checking that all required connections in a pipeline have a connection, and it is automatically called in the `run` method of `Pipeline`. It is exposed as public for users who would like to call it before runtime to validate the pipeline (see the sketch after this list).
- For component-run Datadog tracing, set the span resource name to the component name instead of the operation name.
- Added a `trust_remote_code` parameter to the `SentenceTransformersSimilarityRanker` component. When set to `True`, this enables execution of custom models and scripts hosted on the Hugging Face Hub.
- Added a new parameter `require_tool_call_ids` to `ChatMessage.to_openai_dict_format`. The default is `True`, for compatibility with OpenAI's Chat API: if the `id` field is missing in a tool call, an error is raised. Using `False` is useful for shallow OpenAI-compatible APIs where the `id` field is not required (see the sketch after this list).
- Haystack's core modules are now "type complete", meaning that all function parameters and return types are explicitly annotated. This increases the usefulness of the newly added `py.typed` marker and sidesteps differences in type inference between the various type checker implementations.
- `HuggingFaceAPIChatGenerator` now uses the util method `_convert_streaming_chunks_to_chat_message`. This helps keep the conversion of `StreamingChunks` into a final `ChatMessage` consistent.
- If only system messages are provided as input, a warning is logged to indicate that this is likely not intended and that user messages should probably also be provided.
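As mentioned above, a minimal sketch of validating a pipeline's inputs ahead of `run`; the one-component pipeline and its input data are placeholders, and `validate_input` is assumed to take the same data mapping as `run`:

```python
from haystack import Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.dataclasses import ChatMessage

# Placeholder one-component pipeline.
template = [ChatMessage.from_user("Tell me about {{topic}}.")]
pipeline = Pipeline()
pipeline.add_component("builder", ChatPromptBuilder(template=template))

data = {"builder": {"topic": "climate"}}

# validate_input raises if required inputs or connections are missing;
# Pipeline.run calls it internally, but it can now be called before runtime.
pipeline.validate_input(data)
result = pipeline.run(data)
```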
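And a sketch of `require_tool_call_ids` with a tool call that has no `id`; the tool name and arguments are placeholders:

```python
from haystack.dataclasses import ChatMessage, ToolCall

# A tool call without an id, as some OpenAI-compatible APIs return them.
tool_call = ToolCall(tool_name="get_weather", arguments={"city": "Berlin"})
message = ChatMessage.from_assistant(tool_calls=[tool_call])

# The default (require_tool_call_ids=True) raises because the id is missing;
# passing False allows serialization for shallow OpenAI-compatible APIs.
openai_dict = message.to_openai_dict_format(require_tool_call_ids=False)
print(openai_dict)
```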
⚠️ Deprecation Notes
- The `async_executor` parameter in `ToolInvoker` is deprecated in favor of the `max_workers` parameter and will be removed in Haystack 2.16.0. You can use the `max_workers` parameter to control the number of threads used for parallel tool calling.
🐛 Bug Fixes
- Fixed the `to_dict` and `from_dict` of `ToolInvoker` to properly serialize the `streaming_callback` init parameter.
- Fixed a bug where, if `raise_on_failure=False` and an error occurred mid-batch, the following embeddings would be paired with the wrong documents.
- Fixed the `component_invoker` used by `ComponentTool` to work when a dataclass like `ChatMessage` is directly passed to `component_tool.invoke(...)`. Previously this would either cause an error or silently skip the input.
- Fixed a bug in the `LLMMetadataExtractor` that occurred when processing `Document` objects with `None` or empty string content. The component now gracefully handles these cases by marking such documents as failed and providing an appropriate error message in their metadata, without attempting an LLM call.
- `RecursiveDocumentSplitter` now generates a unique `Document.id` for every chunk. The meta fields (`split_id`, `parent_id`, etc.) are populated before `Document` creation, so the hash used for `id` generation is always unique.
- Fixed the `to_dict` and `from_dict` methods of `ConditionalRouter` to properly handle the case where `output_type` is a `List` of types or a `List` of strings. This occurs when a user specifies a route in `ConditionalRouter` with multiple outputs.
- Fixed serialization of `GeneratedAnswer` when `ChatMessage` objects are nested in `meta`.
- Fixed the serialization of `ComponentTool` and `Tool` when specifying `outputs_to_string`. Previously an error occurred on deserialization right after serializing if `outputs_to_string` was not `None`.
- When calling `set_output_types`, we now also check that the `@component.output_types` decorator is not present on the `run_async` method of a `Component`. Previously we only checked that the `Component.run` method did not have the decorator.
- Fixed type comparison in schema validation by replacing `is not` with `!=` when checking the type `List[ChatMessage]`. This prevents false mismatches due to Python's `is` operator comparing object identity instead of equality.
- Re-export symbols in `__init__.py` files. This ensures that short imports like `from haystack.components.builders import ChatPromptBuilder` work equivalently to `from haystack.components.builders.chat_prompt_builder import ChatPromptBuilder`, without causing errors or warnings in mypy/Pylance.
- The `SuperComponent` class can now correctly serialize and deserialize a `SuperComponent` based on an async pipeline. Previously, the `SuperComponent` class always assumed the underlying pipeline was synchronous.
- Fixed a bug in `OpenAIDocumentEmbedder` and `AzureOpenAIDocumentEmbedder` where, if an OpenAI API error occurred mid-batch, the following embeddings would be paired with the wrong documents.