feat: add assistant response audio playback #595

Draft
Wolfy18 wants to merge 1 commit into Dimillian:main from Wolfy18:feat/agent-response-audio-i23byi

Wolfy18 commented Apr 9, 2026

Issue

Assistant responses in CodexMonitor exposed only quote and copy actions.

Users had no way to listen to long responses, and there was no option to use the current conversation model to produce a shorter audio summary when full playback was too verbose.

That forced users to read every response themselves, even when the content was better consumed as audio.

Root Cause

The message UI had no audio action surface and no shared playback controller.

The frontend also had no way to extract speakable text from rendered markdown, and the backend had no hidden summary-generation command that respected the selected conversation model across both app and daemon runtimes.
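To illustrate what "speakable text" means here, the following is a minimal sketch that strips common markdown syntax with regular expressions. It is not the approach this PR takes (the PR derives playback text from the rendered markdown content); `toSpeakableText` is a hypothetical name, and the regex-based stripping is only an approximation that handles a handful of constructs.

```typescript
// Hypothetical helper (not from this PR): approximates speakable text by
// stripping common markdown syntax. The PR itself reads the rendered content.
function toSpeakableText(markdown: string): string {
  return markdown
    // Replace fenced code blocks with a short spoken placeholder.
    .replace(/`{3}[\s\S]*?`{3}/g, " Code block omitted. ")
    // Images: keep only the alt text.
    .replace(/!\[([^\]]*)\]\([^)]*\)/g, "$1")
    // Links: keep only the visible label.
    .replace(/\[([^\]]+)\]\([^)]*\)/g, "$1")
    // Inline code: keep the content, drop the backticks.
    .replace(/`([^`]*)`/g, "$1")
    // Line-start markers: headings, blockquotes, bullets, numbered items.
    .replace(/^[ \t]{0,3}(#{1,6}[ \t]+|>[ \t]?|[-*+][ \t]+|\d+\.[ \t]+)/gm, "")
    // Asterisk emphasis only; underscores are left alone so identifiers
    // such as snake_case names are not mangled.
    .replace(/(\*\*|\*)(?=\S)(.+?)\1/g, "$2")
    // Collapse all whitespace runs so the speech engine gets one clean line.
    .replace(/\s+/g, " ")
    .trim();
}
```

For example, `toSpeakableText("# Title\n\nUse **bold** text")` returns `"Title Use bold text"`. Reading the rendered output instead, as the PR does, avoids having to keep such patterns in sync with the markdown renderer.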

Fix

This PR adds assistant response audio playback for full responses and model-generated summaries.

  • added a message audio action and popover menu beside the existing assistant bubble actions
  • introduced a shared frontend audio hook that manages browser speech synthesis, cancellation, active state, and summary caching
  • derived full-response playback text from rendered markdown content so spoken output follows the visible response instead of raw markdown syntax
  • threaded the selected model through the message surface so summary playback uses the current conversation model
  • added a new generate_message_audio_summary backend command in shared core, app, daemon, and frontend IPC layers
  • added frontend and service tests covering playback controls, summary generation, cancellation, caching, and the new Tauri wrapper
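The summary caching mentioned in the list above could look roughly like the sketch below. Only the command name `generate_message_audio_summary` comes from this PR; `SummaryRequest`, `SummaryGenerator`, `SummaryCache`, and the cache key layout are hypothetical. The backend call is injected as a function, on the assumption that the real hook would pass a wrapper around the Tauri IPC call.

```typescript
// Hypothetical request shape; the real command arguments are not shown in this PR.
type SummaryRequest = { messageId: string; model: string; text: string };

// Assumed to wrap the `generate_message_audio_summary` backend command.
type SummaryGenerator = (request: SummaryRequest) => Promise<string>;

class SummaryCache {
  private readonly entries = new Map<string, Promise<string>>();
  private readonly generate: SummaryGenerator;

  constructor(generate: SummaryGenerator) {
    this.generate = generate;
  }

  // Returns the cached (or in-flight) summary for this message and model,
  // calling the backend only on the first request for that pair.
  get(request: SummaryRequest): Promise<string> {
    const key = `${request.messageId}::${request.model}`;
    const cached = this.entries.get(key);
    if (cached !== undefined) return cached;

    const pending = this.generate(request);
    this.entries.set(key, pending);
    // Evict failed requests so a later attempt can retry.
    pending.catch(() => {
      if (this.entries.get(key) === pending) this.entries.delete(key);
    });
    return pending;
  }
}
```

Storing the promise rather than the resolved string means two quick clicks on the same message share one backend request. Including the model in the key is a design guess: switching the conversation model would then produce a fresh summary instead of replaying one generated by the previous model.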

User Impact

Assistant responses with text now show an audio control that lets users listen to the full reply or request a shorter spoken summary using the current conversation model.

Playback is limited to one active response at a time, and users can stop audio immediately from the same control.
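The one-active-response rule can be sketched as a small controller. This is an illustration under stated assumptions, not the hook from this PR: `SpeechEngine` and `PlaybackController` are hypothetical names, and the engine is an injected interface so the logic can run outside a browser. In a browser, an implementation of `SpeechEngine` could wrap `window.speechSynthesis.speak(...)` and `window.speechSynthesis.cancel()`.

```typescript
// Hypothetical abstraction over the speech backend (e.g. browser speech synthesis).
interface SpeechEngine {
  speak(text: string, onEnd: () => void): void;
  cancel(): void;
}

class PlaybackController {
  private activeId: string | null = null;
  // Incremented on every play/stop so stale end callbacks can be ignored.
  private generation = 0;
  private readonly engine: SpeechEngine;

  constructor(engine: SpeechEngine) {
    this.engine = engine;
  }

  getActiveId(): string | null {
    return this.activeId;
  }

  // Starts playback for one message, stopping any other active playback first.
  play(messageId: string, text: string): void {
    this.stop();
    const generation = ++this.generation;
    this.activeId = messageId;
    this.engine.speak(text, () => {
      // Only clear state if this playback is still the current one.
      if (this.generation === generation) this.activeId = null;
    });
  }

  // Stops the active playback immediately; does nothing when idle.
  stop(): void {
    if (this.activeId === null) return;
    this.generation++;
    this.activeId = null;
    this.engine.cancel();
  }
}
```

The generation counter guards against a late end event from a cancelled utterance clearing the state of the playback that replaced it.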

Validation

Passed:

  • npm run test -- src/features/messages/hooks/useMessageAudio.test.tsx src/features/messages/components/Messages.test.tsx src/services/tauri.test.ts
  • npm run typecheck

Attempted but blocked in this environment:

  • cd src-tauri && cargo check
    • blocked because whisper-rs-sys requires cmake, and cmake is not installed in this environment
