Fix audio decoder overflow issue #1078

Open
apsonawane wants to merge 4 commits into main from asonawane/audio-decoder

@apsonawane

Contributor

This pull request improves the robustness of the audio decoder by adding input validation to prevent buffer overflows when detecting audio formats, and introduces a regression test to ensure this behavior. The main changes are as follows:

Audio Decoder Input Validation:

  • The ReadStreamFormat method in AudioDecoder now takes a data_size parameter and checks that the input audio data is at least 4 bytes long before attempting to detect the format, returning an error if it is too short. This prevents a potential heap-buffer-overflow. [1] [2] [3] [4]
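A minimal sketch of the kind of guard described above, not the actual `AudioDecoder::ReadStreamFormat` code. The names `DetectedFormat`, `kMarkerSize`, `DetectFormat`, and `CheckGuard` are hypothetical, and the two markers shown (`RIFF` for WAV, `fLaC` for FLAC) are only an illustrative subset; the real method may recognize other formats and report errors differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>

// Hypothetical stand-in for the decoder's stream-type enum.
enum class DetectedFormat { kWav, kFlac };

// Every marker comparison below reads exactly this many bytes.
constexpr std::size_t kMarkerSize = 4;

// Returns the detected format, or std::nullopt when the buffer is too short
// or the marker is not recognized.
std::optional<DetectedFormat> DetectFormat(const std::uint8_t* data,
                                           std::size_t data_size) {
  // Guard first: without it, memcmp would read past the end of short buffers.
  if (data == nullptr || data_size < kMarkerSize) {
    return std::nullopt;
  }
  if (std::memcmp(data, "RIFF", kMarkerSize) == 0) return DetectedFormat::kWav;
  if (std::memcmp(data, "fLaC", kMarkerSize) == 0) return DetectedFormat::kFlac;
  return std::nullopt;
}

// Usage sketch: a 3-byte prefix is rejected, a full 4-byte marker is accepted.
void CheckGuard() {
  const std::uint8_t truncated[] = {'R', 'I', 'F'};
  assert(!DetectFormat(truncated, sizeof(truncated)).has_value());

  const std::uint8_t wav[] = {'R', 'I', 'F', 'F'};
  assert(DetectFormat(wav, sizeof(wav)) == DetectedFormat::kWav);
}
```

Placing the size check before any byte is read keeps the rejection path independent of how many markers are compared later.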

Testing Improvements:

  • Added a new test case TooShortForFormatDetection in test_decode_audio.cc to verify that audio data shorter than 4 bytes is rejected gracefully, preventing regressions related to buffer overflows in format detection.
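A rough sketch of what a short-input regression check could look like, written without a test framework so it stands alone. It is not the `TooShortForFormatDetection` test from this PR: `Detector`, `CountAcceptedShortInputs`, `GuardedDetect`, and `CheckShortInputsRejected` are hypothetical names, and the detector is a stand-in for the real decoder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical detector signature: returns true when a format was detected.
using Detector = std::function<bool(const std::uint8_t*, std::size_t)>;

// Feeds every prefix of "RIFF" shorter than 4 bytes (lengths 0 to 3) to
// `detect` and returns how many of them were accepted.
std::size_t CountAcceptedShortInputs(const Detector& detect) {
  const std::uint8_t marker[] = {'R', 'I', 'F', 'F'};
  std::size_t accepted = 0;
  for (std::size_t len = 0; len < 4; ++len) {
    // Exact-size heap buffer, so an over-read is visible to AddressSanitizer.
    std::vector<std::uint8_t> input(marker, marker + len);
    if (detect(input.data(), input.size())) {
      ++accepted;
    }
  }
  return accepted;
}

// Stand-in detector with the size guard in place.
bool GuardedDetect(const std::uint8_t* data, std::size_t size) {
  if (size < 4) return false;
  return data[0] == 'R' && data[1] == 'I' && data[2] == 'F' && data[3] == 'F';
}

// Usage sketch: none of the short inputs may be accepted.
void CheckShortInputsRejected() {
  assert(CountAcceptedShortInputs(GuardedDetect) == 0);
}
```

Allocating each input at its exact length matters here: a stack array padded to 4 bytes would let an unguarded read succeed silently and hide the regression.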

@apsonawane apsonawane requested a review from a team as a code owner June 12, 2026 05:12
Copilot AI review requested due to automatic review settings June 12, 2026 05:12
@apsonawane apsonawane enabled auto-merge (squash) June 12, 2026 05:13

Copilot AI left a comment

Contributor


Pull request overview

This PR hardens the audio decoder’s format detection against short inputs by adding explicit size validation, and adds a regression test to prevent reintroducing the out-of-bounds read.

Changes:

  • Updated AudioDecoder::ReadStreamFormat to accept data_size and reject inputs shorter than 4 bytes before checking magic bytes.
  • Updated the decoder call site to pass the input byte length into ReadStreamFormat.
  • Added a regression test that exercises 0–3 byte inputs and asserts decoding fails gracefully.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| operators/audio/audio_decoder.h | Extends ReadStreamFormat signature to include the input size. |
| operators/audio/audio_decoder.cc | Adds a < 4 bytes guard before reading the 4-byte marker; updates call site to pass length. |
| test/pp_api_test/test_decode_audio.cc | Adds a regression test for short buffers in format detection. |

Comment thread test/pp_api_test/test_decode_audio.cc Outdated
}

if (stream_format == AudioStreamType::kDefault) {
if (data_size < 4) {

Contributor


Why 4? Will this break for audio chunks that have no speech but are passed in or will this still work for those as well?

Contributor Author


It will not break valid audio that contains no speech.
The 4 comes from the format detection logic below, which reads a 4-byte magic marker.
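As an illustration of the point above, here is a small sketch assuming the silent chunk arrives as a 16-bit mono PCM WAV file: the canonical 44-byte header is present even when every sample is zero, so the buffer still starts with the 4-byte `RIFF` marker and clears the size check. `AppendLE`, `AppendTag`, `MakeSilentWav`, and `CheckSilentWav` are hypothetical helpers, not part of this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Appends the low `bytes` bytes of `value` in little-endian order.
void AppendLE(std::vector<std::uint8_t>& out, std::uint32_t value, int bytes) {
  for (int i = 0; i < bytes; ++i) {
    out.push_back(static_cast<std::uint8_t>((value >> (8 * i)) & 0xFF));
  }
}

// Appends a 4-character chunk tag such as "RIFF".
void AppendTag(std::vector<std::uint8_t>& out, const char* tag) {
  out.insert(out.end(), tag, tag + 4);
}

// Builds a 16-bit mono PCM WAV file whose samples are all zero (silence).
std::vector<std::uint8_t> MakeSilentWav(std::uint32_t sample_rate,
                                        std::uint32_t num_samples) {
  const std::uint32_t data_bytes = num_samples * 2;  // 2 bytes per sample
  std::vector<std::uint8_t> wav;
  AppendTag(wav, "RIFF");
  AppendLE(wav, 36 + data_bytes, 4);  // size of everything after this field
  AppendTag(wav, "WAVE");
  AppendTag(wav, "fmt ");
  AppendLE(wav, 16, 4);               // fmt chunk size for PCM
  AppendLE(wav, 1, 2);                // audio format 1 = PCM
  AppendLE(wav, 1, 2);                // channel count
  AppendLE(wav, sample_rate, 4);
  AppendLE(wav, sample_rate * 2, 4);  // byte rate
  AppendLE(wav, 2, 2);                // block align
  AppendLE(wav, 16, 2);               // bits per sample
  AppendTag(wav, "data");
  AppendLE(wav, data_bytes, 4);
  wav.resize(wav.size() + data_bytes, 0);  // silent samples
  return wav;
}

// Usage sketch: 10 ms of silence at 16 kHz still carries the RIFF marker.
void CheckSilentWav() {
  const std::vector<std::uint8_t> wav = MakeSilentWav(16000, 160);
  assert(wav.size() >= 4);
  assert(std::memcmp(wav.data(), "RIFF", 4) == 0);
}
```

Even with zero samples the file is 44 bytes long, so the guard only rejects inputs that could not hold a marker at all.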


3 participants