Introduce SectionChunker #7015
    foreach (var chunk in chunks)
    {
        yield return chunk;
    }
Why is this needed? Won't chunks always be empty here?
It will; I missed it in the initial version.
Pull Request Overview
This PR introduces a new SectionChunker class for the Microsoft.Extensions.DataIngestion library, which treats each IngestionDocumentSection as a separate entity for chunking. The implementation handles nested sections and maintains header context across chunks.
- Adds `SectionChunker` class to support section-based document chunking
- Creates comprehensive test coverage for the new chunker through `SectionChunkerTests`
- Introduces a base test class `DocumentChunkerTests` for shared chunker test scenarios
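To make the overview concrete, here is a minimal sketch of section-based chunking with header context carried down into nested sections. It does not use the actual `Microsoft.Extensions.DataIngestion` types: the `Section` record, the `Chunk` method, and the `" > "` separator are hypothetical stand-ins chosen for illustration, and the sketch ignores the size limits that the real chunker's tests cover.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical, simplified stand-in for a document section.
// This is NOT the IngestionDocumentSection type from the PR.
public sealed record Section(string Header, List<string> Paragraphs, List<Section> Children);

public static class SectionChunkingSketch
{
    // Emits one chunk per section that has text of its own.
    // The context string joins the headers from the root down to the section,
    // so a nested section's chunk still records which parents it belongs to.
    public static IEnumerable<(string Context, string Content)> Chunk(
        Section section, string? parentContext = null)
    {
        string context = parentContext is null
            ? section.Header
            : $"{parentContext} > {section.Header}";

        if (section.Paragraphs.Count > 0)
        {
            yield return (context, string.Join("\n", section.Paragraphs));
        }

        // Recurse into nested sections, passing the accumulated header context.
        foreach (Section child in section.Children)
        {
            foreach (var chunk in Chunk(child, context))
            {
                yield return chunk;
            }
        }
    }
}
```

Each section is treated as its own unit, so text from sibling sections never ends up in the same chunk, and the context string is the only thing shared between a parent and its children.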
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/Libraries/Microsoft.Extensions.DataIngestion/Chunkers/SectionChunker.cs | Implements the new SectionChunker class with support for nested sections and header context preservation |
| test/Libraries/Microsoft.Extensions.DataIngestion.Tests/Chunkers/SectionChunkerTests.cs | Provides comprehensive test coverage including single/multiple sections, nested sections, size limits, and headers |
| test/Libraries/Microsoft.Extensions.DataIngestion.Tests/Chunkers/DocumentChunkerTests.cs | Defines abstract base test class with common test scenarios for all document chunkers |
    private void Process(IngestionDocument document, IngestionDocumentSection section, List<IngestionChunk<string>> chunks, string? parentContext = null)
    {
        List<IngestionDocumentElement> elements = new(section.Elements.Count);
Copilot AI, Nov 5, 2025
The final loop (lines 46-49) is unreachable dead code. The chunks list is cleared after each section iteration (line 43), so it will always be empty when this loop executes. This code should be removed.
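The pattern behind this comment can be reproduced in isolation. The sketch below is an assumption about the shape of the surrounding method, not the PR's code: `ChunkAll`, the `string[]` sections, and the `TrailingLoopIterations` counter are hypothetical names added only to make the behavior observable. It shows that when a shared buffer is yielded and cleared inside the per-section loop, a trailing loop over the same buffer never runs its body.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class DrainAndClearSketch
{
    // Hypothetical instrumentation: how often the trailing loop body executed.
    public static int TrailingLoopIterations { get; private set; }

    public static IEnumerable<string> ChunkAll(IEnumerable<string[]> sections)
    {
        TrailingLoopIterations = 0;
        List<string> chunks = new();

        foreach (string[] section in sections)
        {
            // Stand-in for a call that fills the shared buffer for one section.
            chunks.AddRange(section);

            foreach (string chunk in chunks)
            {
                yield return chunk;
            }

            // The buffer is emptied before the next section is processed.
            chunks.Clear();
        }

        // Mirrors the loop flagged in review: the buffer is always empty here,
        // so the body runs zero times and the loop can be deleted.
        foreach (string chunk in chunks)
        {
            TrailingLoopIterations++;
            yield return chunk;
        }
    }
}
```

Strictly speaking, the trailing loop is reached but iterates over an empty list, so removing it changes nothing about the output.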
test/Libraries/Microsoft.Extensions.DataIngestion.Tests/Chunkers/SectionChunkerTests.cs
Outdated
…rs/SectionChunkerTests.cs Co-authored-by: Copilot <[email protected]>