-
Notifications
You must be signed in to change notification settings - Fork 10
fix: transactional savings in Autopipeline & nnUNetPipeline #415
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
📝 WalkthroughWalkthroughImplements atomic, two-phase save workflows in nnUNetOutput and SampleOutput by staging writes to a temporary directory, then moving to final destinations. Updates nnUNetOutput.call to return AnnotatedPathSequence, adds broader error collection, ensures directory creation, and introduces a temp writer. SampleOutput adopts batch-level atomic error handling. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested labels
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
Status, Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (5)
src/imgtools/io/sample_output.py (2)
195-195: Consider using a more secure temporary directory prefixThe prefix
.tmp_sample_starts with a dot, making these directories hidden on Unix-like systems. While this can reduce clutter, it may make debugging harder if cleanup fails. Consider using a more descriptive prefix without the leading dot, such astmp_sample_atomic_.- temp_dir = Path(tempfile.mkdtemp(prefix=".tmp_sample_", dir=self.writer.root_directory)) + temp_dir = Path(tempfile.mkdtemp(prefix="tmp_sample_atomic_", dir=self.writer.root_directory))
243-248: Consider preserving the original exception typeWhile catching all exceptions ensures cleanup happens, the generic error message loses valuable debugging context. Consider preserving the original exception type and details in the error message.
except Exception as e: - errmsg = f"Failed to save sample atomically: {e}" + errmsg = f"Failed to save sample atomically: {type(e).__name__}: {e}" image_context = data[0] if data else Nonesrc/imgtools/io/nnunet_output.py (3)
345-345: Use consistent temporary directory namingFor consistency with
sample_output.py, consider using a similar prefix pattern without the leading dot for better visibility during debugging.- temp_dir = Path(tempfile.mkdtemp(prefix=".tmp_nnunet_", dir=self.writer.root_directory)) + temp_dir = Path(tempfile.mkdtemp(prefix="tmp_nnunet_atomic_", dir=self.writer.root_directory))
393-394: Type checking could be more specificThe error message mentions "Expected Scan or VectorMask" but the actual logic only processes Scan images here (VectorMask is handled earlier). Consider making the error message more accurate or restructuring the logic.
elif not isinstance(image, VectorMask): - raise TypeError(f"Unsupported image type: {type(image)}. Expected Scan or VectorMask.") + # VectorMask already processed above, only Scan expected here + logger.debug(f"Skipping non-Scan image of type {type(image).__name__}")
402-407: Preserve exception details for better debuggingSimilar to the comment in sample_output.py, consider preserving the exception type in the error message for better debugging context.
except Exception as e: - errmsg = f"Failed to save nnUNet sample atomically: {e}" + errmsg = f"Failed to save nnUNet sample atomically: {type(e).__name__}: {e}" image_context = data[0] if data else None
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/imgtools/io/nnunet_output.py(4 hunks)src/imgtools/io/sample_output.py(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.py
⚙️ CodeRabbit Configuration File
Review the Python code for compliance with PEP 8 and PEP 257 (docstring conventions). Ensure the following: - Variables and functions follow meaningful naming conventions. - Docstrings are present, accurate, and align with the implementation. - Code is efficient and avoids redundancy while adhering to DRY principles. - Consider suggestions to enhance readability and maintainability. - Highlight any potential performance issues, edge cases, or logical errors. - Ensure all imported libraries are used and necessary.
Files:
src/imgtools/io/sample_output.pysrc/imgtools/io/nnunet_output.py
🧠 Learnings (1)
📚 Learning: 2024-11-29T21:18:38.153Z
Learnt from: jjjermiah
PR: bhklab/med-imagetools#145
File: src/imgtools/utils/nnunet.py:0-0
Timestamp: 2024-11-29T21:18:38.153Z
Learning: Suggestions to modify the `save_json` function in `src/imgtools/utils/nnunet.py` to fix type annotations or add error handling are considered out of scope.
Applied to files:
src/imgtools/io/nnunet_output.py
🧬 Code Graph Analysis (2)
src/imgtools/io/sample_output.py (5)
src/imgtools/io/writers/nifti_writer.py (2)
NIFTIWriter(34-224)save(135-224)src/imgtools/io/writers/abstract_base_writer.py (3)
ExistingFileMode(57-79)AbstractBaseWriter(90-601)save(205-216)src/imgtools/coretypes/base_masks.py (2)
VectorMask(42-470)iter_masks(190-203)src/imgtools/coretypes/base_medimage.py (1)
MedImage(21-235)src/imgtools/autopipeline.py (1)
process_one_sample(162-278)
src/imgtools/io/nnunet_output.py (5)
src/imgtools/io/sample_output.py (3)
FailedToSaveSingleImageError(29-34)AnnotatedPathSequence(37-76)writer(165-169)src/imgtools/io/writers/nifti_writer.py (1)
NIFTIWriter(34-224)src/imgtools/io/writers/abstract_base_writer.py (1)
ExistingFileMode(57-79)src/imgtools/coretypes/base_masks.py (4)
to_label_image(285-312)to_sparse_mask(263-283)to_region_mask(314-354)VectorMask(42-470)src/imgtools/coretypes/imagetypes/scan.py (1)
Scan(50-148)
🔇 Additional comments (6)
src/imgtools/io/sample_output.py (3)
201-201: Comment accuracy: ExistingFileMode.FAIL is appropriate for temp directoriesThe inline comment "Should not have conflicts in a new temp dir" is correct. Since each transaction gets a unique temporary directory via
mkdtemp, file conflicts are impossible, makingFAILthe right choice.
238-241: Good atomic commit implementation!The two-phase commit pattern (staging to temp, then moving to final) ensures atomicity. Using
shutil.moveis appropriate as it provides atomic semantics when source and destination are on the same filesystem. The explicit directory creation before moving prevents potential failures.
250-253: Cleanup implementation looks solidThe finally block ensures cleanup even on exceptions, and the existence check prevents errors if temp_dir creation failed. Good defensive programming.
src/imgtools/io/nnunet_output.py (3)
361-365: Clever use of dictionary dispatch for mask strategyThe dictionary-based dispatch pattern elegantly maps strategies to their corresponding methods. This is more maintainable than if-elif chains.
396-400: Excellent atomic commit patternThe staging and commit implementation properly ensures atomicity. Creating parent directories before moving and using
shutil.movefor atomic filesystem operations is the right approach.
410-412: Proper cleanup in finally blockThe cleanup logic correctly ensures the temporary directory is removed even if an exception occurs, preventing accumulation of temporary files.
| if mask is None: | ||
| raise MaskSavingStrategyError(f"Unknown mask saving strategy: {self.mask_saving_strategy}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dead code: mask will never be None
The dictionary lookup on lines 361-365 will always return a callable method (to_label_image, to_sparse_mask, or to_region_mask) because self.mask_saving_strategy is validated as an enum. The .get() call with no default will return None only for invalid enum values, which can't happen. This check is unreachable.
- if mask is None:
- raise MaskSavingStrategyError(f"Unknown mask saving strategy: {self.mask_saving_strategy}")
+ # No need to check for None - enum validation ensures valid strategyCommittable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In src/imgtools/io/nnunet_output.py around lines 367-368, the if-check raising
MaskSavingStrategyError when mask is None is dead code because the dict lookup
for mask saving strategy always returns a callable for validated enum values;
remove this unreachable if-block and its raise, and if you need a defensive
guarantee for static analysis keep a short assert like "assert callable(mask)"
immediately after the lookup (or use typing.cast) instead of the conditional
raise.
Currently, in saving processed image samples, if the pipeline successfully saves the first file (e.g., CT.nii.gz) but then fails while saving the second file (e.g., RTSTRUCT.nii.gz), the process aborts, leaving an incomplete and corrupt sample in the output directory. This leads to inconsistent datasets that are not machine-learning-ready and can silently corrupt analysis downstream.
This PR fixes the lack of transactional behavior when saving image samples. In both the
AutopipelineandnnUNetPipeline, a single sample can consist of multiple files (e.g., a CT scan and its corresponding segmentation mask). The saving process writes these files sequentially.Summary by CodeRabbit
New Features
Bug Fixes