Fix audio decoder overflow issue#1078
Open
apsonawane wants to merge 4 commits into
Open
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
This PR hardens the audio decoder’s format detection against short inputs by adding explicit size validation, and adds a regression test to prevent reintroducing the out-of-bounds read.
Changes:
- Updated
AudioDecoder::ReadStreamFormatto acceptdata_sizeand reject inputs shorter than 4 bytes before checking magic bytes. - Updated the decoder call site to pass the input byte length into
ReadStreamFormat. - Added a regression test that exercises 0–3 byte inputs and asserts decoding fails gracefully.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
operators/audio/audio_decoder.h |
Extends ReadStreamFormat signature to include the input size. |
operators/audio/audio_decoder.cc |
Adds a < 4 bytes guard before reading the 4-byte marker; updates call site to pass length. |
test/pp_api_test/test_decode_audio.cc |
Adds a regression test for short buffers in format detection. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } | ||
|
|
||
| if (stream_format == AudioStreamType::kDefault) { | ||
| if (data_size < 4) { |
Contributor
There was a problem hiding this comment.
Why 4? Will this break for audio chunks that have no speech but are passed in or will this still work for those as well?
Contributor
Author
There was a problem hiding this comment.
It will not break valid audio with no speech.
4 comes from format detection logic below, it reads 4-byte magic marker
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This pull request improves the robustness of the audio decoder by adding input validation to prevent buffer overflows when detecting audio formats, and introduces a regression test to ensure this behavior. The main changes are as follows:
Audio Decoder Input Validation:
ReadStreamFormatmethod inAudioDecodernow takes adata_sizeparameter and checks that the input audio data is at least 4 bytes long before attempting to detect the format, returning an error if it is too short. This prevents a potential heap-buffer-overflow. [1] [2] [3] [4]Testing Improvements:
TooShortForFormatDetectionintest_decode_audio.ccto verify that audio data shorter than 4 bytes is rejected gracefully, preventing regressions related to buffer overflows in format detection.