feat: add multimodal image support to chat input#2305
feat: add multimodal image support to chat input#2305riyo264 wants to merge 55 commits intoarc53:mainfrom
Conversation
|
@riyo264 is attempting to deploy a commit to the Arc53 Team on Vercel. A member of the Team first needs to authorize it. |
221f437 to
97ea125
Compare
Remove unnecessary check for stream_result length.
Change return values on exception to empty string and list.
Change return values on exception handling to None.
Updated mock StreamProcessor to return new values for pre-fetch methods.
Updated mock StreamProcessor to return modified values for pre_fetch_docs and pre_fetch_tools.
Bumps [i18next-browser-languagedetector](https://github.com/i18next/i18next-browser-languageDetector) from 8.2.0 to 8.2.1. - [Changelog](https://github.com/i18next/i18next-browser-languageDetector/blob/master/CHANGELOG.md) - [Commits](i18next/i18next-browser-languageDetector@v8.2.0...v8.2.1) --- updated-dependencies: - dependency-name: i18next-browser-languagedetector dependency-version: 8.2.1 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps [react-i18next](https://github.com/i18next/react-i18next) from 16.2.4 to 16.5.8. - [Changelog](https://github.com/i18next/react-i18next/blob/master/CHANGELOG.md) - [Commits](i18next/react-i18next@v16.2.4...v16.5.8) --- updated-dependencies: - dependency-name: react-i18next dependency-version: 16.5.8 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps [prettier-plugin-tailwindcss](https://github.com/tailwindlabs/prettier-plugin-tailwindcss) from 0.7.1 to 0.7.2. - [Release notes](https://github.com/tailwindlabs/prettier-plugin-tailwindcss/releases) - [Changelog](https://github.com/tailwindlabs/prettier-plugin-tailwindcss/blob/main/CHANGELOG.md) - [Commits](tailwindlabs/prettier-plugin-tailwindcss@v0.7.1...v0.7.2) --- updated-dependencies: - dependency-name: prettier-plugin-tailwindcss dependency-version: 0.7.2 dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps [@typescript-eslint/eslint-plugin](https://github.com/typescript-eslint/typescript-eslint/tree/HEAD/packages/eslint-plugin) from 8.46.3 to 8.57.1. - [Release notes](https://github.com/typescript-eslint/typescript-eslint/releases) - [Changelog](https://github.com/typescript-eslint/typescript-eslint/blob/main/packages/eslint-plugin/CHANGELOG.md) - [Commits](https://github.com/typescript-eslint/typescript-eslint/commits/v8.57.1/packages/eslint-plugin) --- updated-dependencies: - dependency-name: "@typescript-eslint/eslint-plugin" dependency-version: 8.57.1 dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
… config Agent-Logs-Url: https://github.com/arc53/DocsGPT/sessions/c6bfd68d-4dac-46ec-8404-fe5bfda0e8f3 Co-authored-by: dartpain <15183589+dartpain@users.noreply.github.com>
Bumps [langchain-core](https://github.com/langchain-ai/langchain) from 1.2.23 to 1.2.26. - [Release notes](https://github.com/langchain-ai/langchain/releases) - [Commits](langchain-ai/langchain@langchain-core==1.2.23...langchain-core==1.2.26) --- updated-dependencies: - dependency-name: langchain-core dependency-version: 1.2.26 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps [tzdata](https://github.com/python/tzdata) from 2025.3 to 2026.1. - [Release notes](https://github.com/python/tzdata/releases) - [Changelog](https://github.com/python/tzdata/blob/master/NEWS.md) - [Commits](python/tzdata@2025.3...2026.1) --- updated-dependencies: - dependency-name: tzdata dependency-version: '2026.1' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps the pip group with 1 update in the /application directory: [cryptography](https://github.com/pyca/cryptography). Updates `cryptography` from 46.0.6 to 46.0.7 - [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst) - [Commits](pyca/cryptography@46.0.6...46.0.7) --- updated-dependencies: - dependency-name: cryptography dependency-version: 46.0.7 dependency-type: direct:production dependency-group: pip ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps [@babel/core](https://github.com/babel/babel/tree/HEAD/packages/babel-core) from 7.24.6 to 7.29.0. - [Release notes](https://github.com/babel/babel/releases) - [Changelog](https://github.com/babel/babel/blob/main/CHANGELOG.md) - [Commits](https://github.com/babel/babel/commits/v7.29.0/packages/babel-core) --- updated-dependencies: - dependency-name: "@babel/core" dependency-version: 7.29.0 dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps [mermaid](https://github.com/mermaid-js/mermaid) from 11.13.0 to 11.14.0. - [Release notes](https://github.com/mermaid-js/mermaid/releases) - [Commits](https://github.com/mermaid-js/mermaid/compare/mermaid@11.13.0...mermaid@11.14.0) --- updated-dependencies: - dependency-name: mermaid dependency-version: 11.14.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps [tailwindcss](https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/tailwindcss) from 4.2.1 to 4.2.2. - [Release notes](https://github.com/tailwindlabs/tailwindcss/releases) - [Changelog](https://github.com/tailwindlabs/tailwindcss/blob/main/CHANGELOG.md) - [Commits](https://github.com/tailwindlabs/tailwindcss/commits/v4.2.2/packages/tailwindcss) --- updated-dependencies: - dependency-name: tailwindcss dependency-version: 4.2.2 dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps [@tailwindcss/postcss](https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/@tailwindcss-postcss) from 4.1.16 to 4.2.2. - [Release notes](https://github.com/tailwindlabs/tailwindcss/releases) - [Changelog](https://github.com/tailwindlabs/tailwindcss/blob/main/CHANGELOG.md) - [Commits](https://github.com/tailwindlabs/tailwindcss/commits/v4.2.2/packages/@tailwindcss-postcss) --- updated-dependencies: - dependency-name: "@tailwindcss/postcss" dependency-version: 4.2.2 dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps [react-dom](https://github.com/facebook/react/tree/HEAD/packages/react-dom) and [@types/react-dom](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/react-dom). These dependencies needed to be updated together. Updates `react-dom` from 19.2.0 to 19.2.5 - [Release notes](https://github.com/facebook/react/releases) - [Changelog](https://github.com/facebook/react/blob/main/CHANGELOG.md) - [Commits](https://github.com/facebook/react/commits/v19.2.5/packages/react-dom) Updates `@types/react-dom` from 19.2.2 to 19.2.3 - [Release notes](https://github.com/DefinitelyTyped/DefinitelyTyped/releases) - [Commits](https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/react-dom) --- updated-dependencies: - dependency-name: react-dom dependency-version: 19.2.5 dependency-type: direct:production update-type: version-update:semver-patch - dependency-name: "@types/react-dom" dependency-version: 19.2.3 dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps [@babel/preset-env](https://github.com/babel/babel/tree/HEAD/packages/babel-preset-env) from 7.24.6 to 7.29.2. - [Release notes](https://github.com/babel/babel/releases) - [Changelog](https://github.com/babel/babel/blob/main/CHANGELOG.md) - [Commits](https://github.com/babel/babel/commits/v7.29.2/packages/babel-preset-env) --- updated-dependencies: - dependency-name: "@babel/preset-env" dependency-version: 7.29.2 dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps [eslint-plugin-n](https://github.com/eslint-community/eslint-plugin-n) from 17.23.1 to 17.24.0. - [Release notes](https://github.com/eslint-community/eslint-plugin-n/releases) - [Changelog](https://github.com/eslint-community/eslint-plugin-n/blob/master/CHANGELOG.md) - [Commits](eslint-community/eslint-plugin-n@v17.23.1...v17.24.0) --- updated-dependencies: - dependency-name: eslint-plugin-n dependency-version: 17.24.0 dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps [svgo](https://github.com/svg/svgo) from 3.3.3 to 4.0.1. - [Release notes](https://github.com/svg/svgo/releases) - [Commits](svg/svgo@v3.3.3...v4.0.1) --- updated-dependencies: - dependency-name: svgo dependency-version: 4.0.1 dependency-type: direct:development update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps [eslint-plugin-prettier](https://github.com/prettier/eslint-plugin-prettier) from 5.5.4 to 5.5.5. - [Release notes](https://github.com/prettier/eslint-plugin-prettier/releases) - [Changelog](https://github.com/prettier/eslint-plugin-prettier/blob/main/CHANGELOG.md) - [Commits](prettier/eslint-plugin-prettier@v5.5.4...v5.5.5) --- updated-dependencies: - dependency-name: eslint-plugin-prettier dependency-version: 5.5.5 dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps [lucide-react](https://github.com/lucide-icons/lucide/tree/HEAD/packages/lucide-react) from 0.562.0 to 1.8.0. - [Release notes](https://github.com/lucide-icons/lucide/releases) - [Commits](https://github.com/lucide-icons/lucide/commits/1.8.0/packages/lucide-react) --- updated-dependencies: - dependency-name: lucide-react dependency-version: 1.8.0 dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com>
* feat: postgres tests * feat: mongo cutoff * feat: mongo cutoff * feat: adjust docs and compose files * fix: mini code mongo removals * fix: tests and k8s mongo stuff * feat: test fixes * fix: ruff * fix: vale * Potential fix for pull request finding 'CodeQL / Clear-text logging of sensitive information' Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> * fix: mini suggestions * vale lint fix 2 * fix: codeql columns thing * fix: test mongo * fix: tests coverage * feat: better tests 4 * feat: more tests * feat: decent coverage * fix: ruff fixes * fix: remove mongo mock * feat: enhance workflow engine and API routes; add document retrieval and source handling * feat: e2e tests * fix: mcp, mongo and more * fix: mini codeql warning * fix: agent chunk view * fix: mini issues * fix: more pg fixes * feat: postgres prep on start * feat: qa tests * fix: mini improvements * fix: tests --------- Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> Co-authored-by: Siddhant Rai <siddhant.rai.5686@gmail.com>
Bumps [react-router-dom](https://github.com/remix-run/react-router/tree/HEAD/packages/react-router-dom) from 7.13.1 to 7.14.1. - [Release notes](https://github.com/remix-run/react-router/releases) - [Changelog](https://github.com/remix-run/react-router/blob/main/packages/react-router-dom/CHANGELOG.md) - [Commits](https://github.com/remix-run/react-router/commits/react-router-dom@7.14.1/packages/react-router-dom) --- updated-dependencies: - dependency-name: react-router-dom dependency-version: 7.14.1 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>
69354ba to
b063012
Compare
ManishMadan2882
left a comment
There was a problem hiding this comment.
Thanks for the effort here, @riyo264 - sharing concrete feedback on design and scope so a follow-up can land cleanly.
Design critique
1. This duplicates an existing feature instead of extending it.
DocsGPT already has a working multimodal attachment pipeline:
PR #1733
POST /api/store_attachment-> Celeryattachment_worker->attachmentstable- Poll
/api/task_status-> getattachment_id - Pass
attachments: ["<id>"]to/stream(ordocsgpt.attachmentson/v1/chat/completions)
Images already flow through the provider layer (_upload_attachment_to_google), with test coverage and docs in place. Introducing a separate image_base64 path that lives in Redux and bypasses Celery, StreamProcessor, and the provider abstraction creates a parallel system that will drift over time.
2. multimodal_service.run_multimodal_completion bypasses core architecture.
This sidesteps LLMCreator, prompt rendering, tool execution, usage tracking, persistence, and streaming. That leads to:
- Hardcoded system prompt overriding agent configuration
- Ignoring prompt templates (
docs_togetherinjected as raw text) - Non-standard SSE events (missing
conversation_id,sources, etc.) - No persistence despite claims in the PR
- Usage not recorded (billing and limits bypassed)
- No fallback models or API key resolution
The correct integration point here is StreamProcessor, not a parallel execution path.
3. Control-flow regression in routes.
In both answer.py and stream.py, the multimodal branch is added before the existing flow rather than replacing it. For non-image requests this causes:
- Duplicate prefetch and usage checks
- Duplicate stream execution (first result discarded)
- Two agent instances per request, with inconsistent state
This is a silent regression affecting all traffic, not just multimodal. It needs to be fixed before anything ships.
4. Base64 images in Redux are the wrong abstraction.
Storing multi-MB blobs per message in client state does not scale. The current UUID-based attachment model offloads storage to backend or CDN. This change increases memory usage and bloats persisted state.
5. Provider routing is reimplemented (and diverges).
run_multimodal_completion reintroduces provider selection and even overrides model_id implicitly. This logic already exists in LLMCreator. Duplicating it guarantees drift, and silently changing models is not a safe default.
6. Additional issues compounding the above:
ChatOpenAI(thinking_budget=0)-> invalid arg, will errornormalize_question_payloadintroduces a "legacy" format that does not exist- Duplicate
extract_markdown_image_urls, with one unused - Missing dependency (
langchain-google-genai)
Scope creep
This PR goes well beyond multimodal input and introduces unrelated regressions:
application/llm/google_ai.py: breaks tool-calling and streaming behavior on Google providersettings.py: unrelated config churn (likely rebase residue)requirements.txt: unnecessary dependency changes- Frontend files: multiple unrelated edits
ConversationBubble.tsx: massive reformat obscuring the actual changeMessageInput.tsx: breaking API changes and removal of cleanup logicconversationHandlers.ts: breaking function signature change- No tests for new logic
- Lint, build, and test issues in current state
This makes the PR hard to review and risky to merge.
leaning towards closing this PR.
|
Thank you so much for the detailed breakdown @ManishMadan2882. You are completely right, I fundamentally misunderstood the existing attachment pipeline and ended up building a parallel system that bypasses the core architecture. I'm going to close this PR and keep all the things you mentioned in mind and start fresh. I really appreciate the guidance through the architecture. |
🚀 feat: Multimodal Vision Support (Gemini & OpenAI)
📝 Summary
This PR introduces multimodal capabilities to DocsGPT, allowing users to upload images alongside their text queries. The system can now analyze visual data (diagrams, screenshots, etc.) while leveraging the existing RAG pipeline to provide context-aware answers.
🛠️ Key Changes
multimodal_service.py: A new centralized service to handle routing between Google Gemini and OpenAI Vision models.
Docling/RAG Integration: The service explicitly uses the docs_together context retrieved from the Docling pipeline, ensuring visual analysis is grounded in the provided documentation.
API Normalization: Added normalize_question_payload to handle both camelCase and snake_case keys and support legacy JSON-string formats for backward compatibility.
Stable API Handshaking: Targeted the Gemini v1 stable endpoint to ensure production reliability and prevent v1beta handshake errors.
State Persistence: Updated sharedConversationSlice.ts and conversationSlice.ts to store imageBase64 within the Query objects. This ensures that uploaded images persist in the chat history and remain visible when switching between conversations.
API Handlers: Updated conversationHandlers.ts to pass multimodal data through both standard and streaming paths.
Models: Expanded RetrievalPayload and Query interfaces in conversationModels.ts to support image data.
Updated requirements.txt to include langchain-google-genai>=1.0.0.
⚙️ Environment Configuration
To enable this feature, the following variables should be added to the .env file:
GOOGLE_API_KEY: Required for Gemini support.
LLM_PROVIDER: Set to google or openai.
LLM_NAME: Set to a vision-capable model (e.g., gemini-1.5-flash).
🧪 Testing & Verification
Full-Stack Data Flow: Confirmed that images are correctly encoded to Base64 on the frontend and decoded in the service layer.
Isolation: Verified that the multimodal path only triggers when an image is present, ensuring zero impact on standard text-only RAG or OCR/Docling ingestion paths.
State Integrity: Confirmed that images remain associated with their respective queries when navigating the conversation history.
This change was needed to:
Bridge the Visual Gap: Allow users to troubleshoot "real-world" scenarios by uploading screenshots of error messages or configurations.
Enhance Docling Context: Complement the Docling ingestion pipeline by allowing the LLM to perform native visual reasoning on diagrams that are difficult to represent through OCR alone.
Future-Proof the Architecture: Update the project's state management and routing "plumbing" to handle multimodal data (Base64/MIME), setting the foundation for future media-rich interactions.
Note:
During local testing, some 404 NOT_FOUND errors were observed due to local API key/region versioning quirks (v1beta). This has been mitigated in the code by explicitly targeting the stable v1 endpoint, which is the standard for production environments.
Fixes #1451