Fix audio decoder overflow issue by apsonawane · Pull Request #1078 · microsoft/onnxruntime-extensions

apsonawane · 2026-06-12T05:12:19Z

This pull request improves the robustness of the audio decoder by adding input validation to prevent buffer overflows when detecting audio formats, and introduces a regression test to ensure this behavior. The main changes are as follows:

Audio Decoder Input Validation:

The ReadStreamFormat method in AudioDecoder now takes a data_size parameter and checks that the input audio data is at least 4 bytes long before attempting to detect the format, returning an error if it is too short. This prevents a potential heap-buffer-overflow.
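The guard described above can be sketched in C++ roughly as follows. This is a minimal, hypothetical illustration, not the code in `audio_decoder.cc`: the `StreamFormat` enum, the `Status` struct, `DetectStreamFormat`, and the specific markers it checks (`RIFF` for WAV, `fLaC` for FLAC, an `ID3` tag or MPEG frame sync for MP3) are assumptions chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical stand-ins for the decoder's own types; the real AudioDecoder
// uses AudioStreamType and the project's status type instead.
enum class StreamFormat { kUnknown, kWAV, kFLAC, kMP3 };

struct Status {
  bool ok;
  std::string message;
};

// Number of leading bytes inspected to identify the container format.
constexpr std::size_t kMarkerSize = 4;

// Sketch of guarded format detection: the length is validated *before*
// any marker byte is read, so a 0-3 byte buffer is never over-read.
Status DetectStreamFormat(const std::uint8_t* data, std::size_t data_size,
                          StreamFormat& format) {
  format = StreamFormat::kUnknown;
  if (data_size < kMarkerSize) {
    return {false, "audio data too short to detect format"};
  }
  assert(data != nullptr);  // a size of 4 or more implies a valid pointer

  if (std::memcmp(data, "RIFF", 4) == 0) {
    format = StreamFormat::kWAV;
  } else if (std::memcmp(data, "fLaC", 4) == 0) {
    format = StreamFormat::kFLAC;
  } else if (std::memcmp(data, "ID3", 3) == 0 ||
             (data[0] == 0xFF && (data[1] & 0xE0) == 0xE0)) {
    format = StreamFormat::kMP3;  // ID3v2 tag or MPEG frame-sync bits
  } else {
    // Assumption: this sketch treats an unknown marker as an error.
    return {false, "unrecognized audio format"};
  }
  return {true, ""};
}
```

The ordering is the whole fix: because the size check runs first, the marker comparisons can only ever touch `data[0]` through `data[3]`, which are guaranteed to exist.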

Testing Improvements:

Added a new test case TooShortForFormatDetection in test_decode_audio.cc to verify that audio data shorter than 4 bytes is rejected gracefully, preventing regressions related to buffer overflows in format detection.
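A framework-free sketch of what such a regression check exercises is shown below. It does not reproduce the actual `TooShortForFormatDetection` test; `TryDetectFormat` and `RejectsAllTooShortInputs` are hypothetical helpers, and plain `assert`-style checks stand in for the project's test framework. Each input is a prefix of `RIFF` heap-allocated at its exact length, so a detector that skipped the guard would read past the allocation, which AddressSanitizer reports as a heap-buffer-overflow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical minimal detector with the same guard as the PR: it returns
// false instead of reading past the end of a buffer shorter than 4 bytes.
bool TryDetectFormat(const std::uint8_t* data, std::size_t data_size) {
  if (data_size < 4) {
    return false;  // too short to hold the 4-byte magic marker
  }
  return std::memcmp(data, "RIFF", 4) == 0 ||
         std::memcmp(data, "fLaC", 4) == 0;
}

// Regression check: every 0-3 byte input must be rejected gracefully.
// Using prefixes of a real marker means an unguarded over-read could even
// "succeed" by accident, which is exactly the bug the test should catch.
bool RejectsAllTooShortInputs() {
  const char marker[] = "RIFF";
  for (std::size_t n = 0; n < 4; ++n) {
    // Exact-size heap buffer holding the first n bytes of "RIFF".
    std::vector<std::uint8_t> buf(marker, marker + n);
    if (TryDetectFormat(buf.data(), buf.size())) {
      return false;
    }
  }
  return true;
}
```

Checking the boundary from both sides (3 bytes rejected, 4 bytes accepted) pins the threshold so it cannot silently drift in a later refactor.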

Copilot

Pull request overview

This PR hardens the audio decoder’s format detection against short inputs by adding explicit size validation, and adds a regression test to prevent reintroducing the out-of-bounds read.

Changes:

Updated AudioDecoder::ReadStreamFormat to accept data_size and reject inputs shorter than 4 bytes before checking magic bytes.
Updated the decoder call site to pass the input byte length into ReadStreamFormat.
Added a regression test that exercises 0–3 byte inputs and asserts decoding fails gracefully.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `operators/audio/audio_decoder.h` | Extends `ReadStreamFormat` signature to include the input size. |
| `operators/audio/audio_decoder.cc` | Adds a `< 4` bytes guard before reading the 4-byte marker; updates call site to pass length. |
| `test/pp_api_test/test_decode_audio.cc` | Adds a regression test for short buffers in format detection. |


kunal-vaishnavi · 2026-06-12T19:35:20Z

  }

  if (stream_format == AudioStreamType::kDefault) {
+    if (data_size < 4) {


Why 4? Will this break audio chunks that contain no speech when they are passed in, or will those still work?

It will not break valid audio that contains no speech; only buffers shorter than 4 bytes are rejected.
The 4 comes from the format-detection logic below, which reads a 4-byte magic marker.
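To illustrate the reply: silence is stored as ordinary zero-valued samples inside a normal container, so a speech-free file is still far longer than 4 bytes. The sketch below assumes a canonical 44-byte PCM WAV header (16 kHz, mono, 16-bit); `MakeSilentWav`, `PutLE`, and `PassesMarkerCheck` are hypothetical helpers, and `PassesMarkerCheck` only mirrors the size guard plus a `RIFF` comparison, not the decoder's full logic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Appends the low `bytes` bytes of `value` to `out` in little-endian order.
void PutLE(std::vector<std::uint8_t>& out, std::uint32_t value, int bytes) {
  for (int i = 0; i < bytes; ++i) {
    out.push_back(static_cast<std::uint8_t>(value >> (8 * i)));
  }
}

// Builds a PCM WAV file (44-byte header, 16 kHz, mono, 16-bit) whose
// samples are all zero, i.e. audio with no speech at all.
std::vector<std::uint8_t> MakeSilentWav(std::uint32_t num_samples) {
  const std::uint32_t sample_rate = 16000;
  const std::uint16_t channels = 1, bits = 16;
  const std::uint32_t data_bytes = num_samples * channels * bits / 8;
  std::vector<std::uint8_t> wav;
  wav.insert(wav.end(), {'R', 'I', 'F', 'F'});
  PutLE(wav, 36 + data_bytes, 4);                    // RIFF chunk size
  wav.insert(wav.end(), {'W', 'A', 'V', 'E', 'f', 'm', 't', ' '});
  PutLE(wav, 16, 4);                                 // fmt chunk size
  PutLE(wav, 1, 2);                                  // format tag: PCM
  PutLE(wav, channels, 2);
  PutLE(wav, sample_rate, 4);
  PutLE(wav, sample_rate * channels * bits / 8, 4);  // byte rate
  PutLE(wav, channels * bits / 8, 2);                // block align
  PutLE(wav, bits, 2);                               // bits per sample
  wav.insert(wav.end(), {'d', 'a', 't', 'a'});
  PutLE(wav, data_bytes, 4);                         // data chunk size
  wav.resize(wav.size() + data_bytes, 0);            // silent samples
  return wav;
}

// Hypothetical reduction of the guard: size check first, then the marker.
bool PassesMarkerCheck(const std::vector<std::uint8_t>& bytes) {
  return bytes.size() >= 4 && std::memcmp(bytes.data(), "RIFF", 4) == 0;
}
```

Even a WAV with zero samples is 44 bytes long, so the `< 4` threshold only rejects buffers too small to contain any container header at all.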

Fix audio decoder overflow

5b4c04c

apsonawane requested a review from a team as a code owner June 12, 2026 05:12

Copilot AI review requested due to automatic review settings June 12, 2026 05:12

Copilot started reviewing on behalf of apsonawane June 12, 2026 05:12

apsonawane enabled auto-merge (squash) June 12, 2026 05:13

Copilot AI reviewed Jun 12, 2026


Comment thread on test/pp_api_test/test_decode_audio.cc (outdated)

Address comments

ebd336f

kunal-vaishnavi reviewed Jun 12, 2026


apsonawane added 2 commits June 12, 2026 16:39

Merge branch 'main' into asonawane/audio-decoder

efc3146

Merge branch 'main' into asonawane/audio-decoder

90e81ea

apsonawane wants to merge 4 commits into main from asonawane/audio-decoder
3 participants

Conversation

apsonawane commented Jun 12, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

kunal-vaishnavi Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

apsonawane Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants