Enhance Docker Compose and Helm detectors for parallel processing #1819
Merged
…ssing and file size handling
…d hash-delimited tokens in image references
Pull request overview
This PR improves Docker Compose and Helm image reference detection by expanding placeholder detection, reducing YAML parsing allocations, adding safety checks for oversized Helm values files, and enabling parallel processing to scale scanning across many manifests.
Changes:
- Expanded `DockerReferenceUtility.HasUnresolvedVariables` to treat additional templating patterns (double-underscore and `#...#` tokens) as unresolved so they’re skipped instead of logged as parse failures.
- Updated Docker Compose and Helm detectors to parse YAML directly from the component stream and enabled parallel processing for better throughput.
- Added Helm values file size guarding (20 MB) and expanded unit tests for the new placeholder patterns / logging behavior.
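As a rough illustration of the first bullet, the check for the two new placeholder styles could look like the sketch below. The class name `PlaceholderSketch`, the method `LooksTemplated`, and both regular expressions are hypothetical; the actual rules inside `DockerReferenceUtility.HasUnresolvedVariables` are not shown on this page and may differ.

```csharp
using System.Text.RegularExpressions;

// Hypothetical sketch, not the PR's implementation.
public static class PlaceholderSketch
{
    // Assumed pattern: a token wrapped in double underscores, e.g. __IMAGE_TAG__.
    private static readonly Regex DoubleUnderscoreToken =
        new Regex(@"__[A-Za-z0-9_]+__", RegexOptions.Compiled);

    // Assumed pattern: a token wrapped in hash characters, e.g. #imageTag#.
    private static readonly Regex HashDelimitedToken =
        new Regex(@"#[^#\s]+#", RegexOptions.Compiled);

    // Returns true when the reference appears to contain an unresolved placeholder.
    public static bool LooksTemplated(string reference) =>
        !string.IsNullOrEmpty(reference) &&
        (DoubleUnderscoreToken.IsMatch(reference) || HashDelimitedToken.IsMatch(reference));
}
```

A caller would skip any reference for which `LooksTemplated` returns true before attempting to parse it, which is what keeps templated values out of the parse-failure logs.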
Show a summary per file
| File | Description |
|---|---|
| test/Microsoft.ComponentDetection.Common.Tests/DockerReferenceUtilityTests.cs | Adds test coverage for new templating placeholder patterns and ensures templated references don’t log warnings. |
| src/Microsoft.ComponentDetection.Detectors/helm/HelmComponentDetector.cs | Enables parallelism, adds a max values-file size guard, and parses YAML from stream to reduce allocations. |
| src/Microsoft.ComponentDetection.Detectors/dockercompose/DockerComposeComponentDetector.cs | Enables parallelism and parses YAML from stream to reduce allocations. |
| src/Microsoft.ComponentDetection.Common/DockerReference/DockerReferenceUtility.cs | Expands placeholder detection logic to skip templated references earlier. |
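The values-file size guard listed for `HelmComponentDetector.cs` might be sketched as follows. The names `SizeGuardSketch`, `MaxValuesFileSizeBytes`, and `ShouldSkip` are hypothetical, and reading "20 MB" as 20 × 1024 × 1024 bytes is an assumption; the PR may use a different constant or a different way of measuring the file.

```csharp
using System.IO;

// Hypothetical sketch, not the PR's implementation.
public static class SizeGuardSketch
{
    // Assumption: "20 MB" is interpreted as 20 * 1024 * 1024 bytes.
    public const long MaxValuesFileSizeBytes = 20L * 1024 * 1024;

    // Returns true when the stream's length is known and exceeds the limit.
    // Non-seekable streams cannot report a length, so they are not skipped here.
    public static bool ShouldSkip(Stream stream) =>
        stream.CanSeek && stream.Length > MaxValuesFileSizeBytes;
}
```

Checking the length before constructing a reader means an oversized file costs almost nothing to reject.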
Copilot's findings
Comments suppressed due to low confidence (2)
src/Microsoft.ComponentDetection.Detectors/dockercompose/DockerComposeComponentDetector.cs:55
`cancellationToken` is no longer used after switching to synchronous `YamlStream.Load(reader)`. This means compose parsing will continue even after a scan is cancelled, which can waste CPU under parallelism. Add an early cancellation check before doing any work.
protected override Task OnFileFoundAsync(ProcessRequest processRequest, IDictionary<string, string> detectorArgs, CancellationToken cancellationToken = default)
{
var singleFileComponentRecorder = processRequest.SingleFileComponentRecorder;
var file = processRequest.ComponentStream;
src/Microsoft.ComponentDetection.Detectors/helm/HelmComponentDetector.cs:88
`cancellationToken` is no longer used after switching to synchronous `YamlStream.Load(reader)`. This can make scan cancellation less responsive, especially with `EnableParallelism = true`. Add an early cancellation check before file-size checks / parsing begins.
protected override Task OnFileFoundAsync(ProcessRequest processRequest, IDictionary<string, string> detectorArgs, CancellationToken cancellationToken = default)
{
var file = processRequest.ComponentStream;
// OnPrepareDetectionAsync has already filtered to values files co-located
- Files reviewed: 4/4 changed files
- Comments generated: 1
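The early cancellation check suggested in both suppressed comments could take roughly this shape. `CancellationSketch.ProcessFileAsync` is a hypothetical stand-in for a detector's `OnFileFoundAsync`, and the `parse` delegate is a placeholder for the synchronous YAML load; neither is taken from the PR.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch, not the PR's implementation.
public static class CancellationSketch
{
    public static Task ProcessFileAsync(
        Stream stream,
        Action<TextReader> parse,
        CancellationToken cancellationToken = default)
    {
        // Bail out before any I/O or parsing if the scan was already cancelled.
        cancellationToken.ThrowIfCancellationRequested();

        using var reader = new StreamReader(stream);
        parse(reader); // synchronous work; the token is not observed past this point
        return Task.CompletedTask;
    }
}
```

Because the parse itself is synchronous, this check only prevents work from starting on files that are picked up after cancellation; it does not interrupt a parse already in progress.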
JamieMagee
approved these changes
Jun 4, 2026
This pull request improves the accuracy and performance of Docker and Helm image reference detection by enhancing placeholder detection, optimizing YAML parsing, and adding safety checks for large files. It also introduces parallel processing for both Docker Compose and Helm detectors to improve scalability, and expands test coverage for these scenarios.
Enhancements to placeholder detection and parsing:
- Expanded the `DockerReferenceUtility.HasUnresolvedVariables` method to detect additional templating patterns, including double-underscore tokens (e.g., `__IMAGE_TAG__`) and hash-delimited tokens (e.g., `#imageTag#`), so that unresolved or templated image references are skipped rather than reported as parse failures.
Performance and scalability improvements:
- Enabled parallel processing in `DockerComposeComponentDetector` and `HelmComponentDetector` by setting `EnableParallelism = true`, allowing multiple files to be parsed concurrently for better performance on large repositories.
YAML parsing optimizations:
- Both detectors now parse YAML directly from the component stream, reducing allocations.
Safety and resource usage:
- Added a size guard to `HelmComponentDetector` that skips parsing of excessively large values files (over 20 MB), preventing timeouts or memory exhaustion from pathological files.
Code quality and maintainability: