Agentic wrangling by bobular · Pull Request #42 · VEuPathDB/study-wrangler

bobular · 2025-07-21T16:25:01Z

Currently just an architecture document (started off in regular Claude Opus 4 chat). Refining this with Claude Code (Sonnet 4) before deciding if and when to implement.

- Explored the R package structure and exported functions
- Updated function signatures and workflow patterns with real API
- Added actual Entity/Study class structure and metadata schema
- Corrected entity wrangling workflow with proper function names
- Added study validation step before STF export
- Documented text output patterns for parser implementation

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

- Restructured project layout to keep R package files at root level
- Added apps/, packages/, docker/ directories for web application
- Changed tooling references from pnpm/turbo to yarn workspace + nx
- Ensures remotes::install_github('VEuPathDB/study-wrangler') continues working
- Clear separation between existing R package and new web app components

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

- Modified workflow to use entity_from_file() + inspect() for each uploaded file
- Send full inspect() output to Claude for comprehensive entity planning
- Added new Claude service methods: generateFileAnalysisStep() and planEntityMappings()
- Updated workflow service to handle file-by-file analysis approach
- Added file type restrictions and processing approach documentation
- Specified supported formats: TSV, CSV with auto-detection
- Set file size limits: 10MB per file, 50MB total, 10 files max

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
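The upload limits in the commit above (10MB per file, 50MB total, 10 files max, TSV/CSV only) could be enforced with a small pre-flight check. This is a minimal sketch, not the PR's implementation: `UploadedFile`, `validateUploads`, and the constant names are hypothetical, 1MB is assumed to mean 1024 × 1024 bytes, and the file type is checked by extension only, whereas the commit mentions format auto-detection.

```typescript
// Limits taken from the commit message; all names here are hypothetical.
const MAX_FILE_BYTES = 10 * 1024 * 1024; // assumption: 1MB = 1024 * 1024 bytes
const MAX_TOTAL_BYTES = 50 * 1024 * 1024;
const MAX_FILES = 10;
const ALLOWED_EXTENSIONS = [".tsv", ".csv"];

interface UploadedFile {
  name: string;
  sizeBytes: number;
}

// Returns a list of human-readable problems; an empty list means the upload is acceptable.
function validateUploads(files: UploadedFile[]): string[] {
  const errors: string[] = [];

  if (files.length > MAX_FILES) {
    errors.push(`Too many files: ${files.length} (max ${MAX_FILES})`);
  }

  let totalBytes = 0;
  for (const file of files) {
    const lowerName = file.name.toLowerCase();
    const hasAllowedExtension = ALLOWED_EXTENSIONS.some((ext) => lowerName.endsWith(ext));
    if (!hasAllowedExtension) {
      errors.push(`Unsupported file type: ${file.name}`);
    }
    if (file.sizeBytes > MAX_FILE_BYTES) {
      errors.push(`File too large: ${file.name}`);
    }
    totalBytes += file.sizeBytes;
  }

  if (totalBytes > MAX_TOTAL_BYTES) {
    errors.push(`Total upload too large: ${totalBytes} bytes`);
  }
  return errors;
}
```

Returning every problem at once, rather than throwing on the first one, lets a UI show the user a complete list in a single round trip.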

… R output

- Removed TextParserService from orchestrator components and project structure
- Updated WorkflowService to pass raw R output directly to Claude
- Added Claude.interpretValidationResult() method for validation checking
- Removed text-patterns.ts from shared packages
- Simplified workflow: Claude handles all text interpretation natively
- This reduces complexity and leverages Claude's excellent text understanding

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

- Hardcode file analysis R code instead of using Claude to generate it
- Use cleaner pipe syntax: entity_from_file("filename") %>% inspect()
- Remove unnecessary generateFileAnalysisStep() method from Claude service
- Define EntityMapping interface with name, filename, ID columns, and parent relationships
- Update WorkflowState to include entityMapping and processingOrder from Phase 1
- This saves API calls and makes the file analysis step more reliable

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
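To illustrate the hardcoded analysis step and the `EntityMapping` shape described above, here is a hedged sketch. The field names (`idColumns`, `parentEntity`) and the helper `buildFileAnalysisCode` are assumptions; the commit only says the interface carries a name, a filename, ID columns, and parent relationships. Restricting filenames to a conservative character set is also an assumption, added so that an uploaded name cannot break out of the R string literal.

```typescript
// Hypothetical field names; the commit lists only the kinds of information carried.
interface EntityMapping {
  name: string;
  filename: string;
  idColumns: string[];
  parentEntity?: string; // absent for the root entity
}

// Conservative filename check (an assumption, not from the PR): letters, digits, dot, underscore, hyphen.
const SAFE_FILENAME = /^[A-Za-z0-9._-]+$/;

// Builds the fixed R snippet from the commit: entity_from_file("filename") %>% inspect()
function buildFileAnalysisCode(filename: string): string {
  if (!SAFE_FILENAME.test(filename)) {
    throw new Error(`Refusing to build R code for unsafe filename: ${filename}`);
  }
  return `entity_from_file("${filename}") %>% inspect()`;
}

// Example mapping for a child entity (illustrative data only).
const exampleMapping: EntityMapping = {
  name: "participant",
  filename: "participants.tsv",
  idColumns: ["participant_id"],
  parentEntity: "household",
};

const analysisCode = buildFileAnalysisCode(exampleMapping.filename);
```

Because the snippet is a fixed template, no Claude API call is needed for this step, which matches the commit's stated motivation of saving calls and improving reliability.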

- Added comprehensive context-passing section for Claude API calls
- Every entity wrangling step includes: user instructions, entity mapping, previous R output, current entity context
- Rely on wrangler's built-in guidance rather than full API docs in prompts
- Added security section with prompt injection protection:
  - Input sanitization for user instructions
  - Generated code validation with function whitelisting
  - R environment sandboxing (containerized, no network, resource limits)
  - Error handling for malicious code detection
- Updated generateEntityWranglingStep method with full context structure

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
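The "generated code validation with function whitelisting" item could look roughly like the sketch below. It is a deliberately crude, regex-based approximation and not the PR's implementation: `findDisallowedCalls` and `ALLOWED_FUNCTIONS` are hypothetical names, and the allow-list contains only the two R functions named in this PR. A production check would parse the R code properly. The regex also flags call-like text inside strings and comments, and treats control-flow keywords such as `if (` as calls, so it errs on the side of rejection.

```typescript
// Illustrative allow-list: only the two R functions named in this PR.
const ALLOWED_FUNCTIONS: ReadonlySet<string> = new Set(["entity_from_file", "inspect"]);

// Returns the distinct call names found in rCode that are not on the allow-list.
// Namespace-qualified calls (pkg::fn, pkg:::fn) are captured whole, so they are
// rejected unless the qualified name itself is allowed.
function findDisallowedCalls(rCode: string, allowed: ReadonlySet<string>): string[] {
  const callPattern = /([A-Za-z.][A-Za-z0-9._]*(?::{2,3}[A-Za-z.][A-Za-z0-9._]*)?)\s*\(/g;
  const disallowed: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = callPattern.exec(rCode)) !== null) {
    const name = match[1];
    if (!allowed.has(name) && disallowed.indexOf(name) === -1) {
      disallowed.push(name);
    }
  }
  return disallowed;
}

const safeCode = 'entity_from_file("households.tsv") %>% inspect()';
const unsafeCode = 'entity_from_file("households.tsv") %>% inspect(); base::system("id")';

const safeViolations = findDisallowedCalls(safeCode, ALLOWED_FUNCTIONS);
const unsafeViolations = findDisallowedCalls(unsafeCode, ALLOWED_FUNCTIONS);
```

A check like this is only one layer. The commit pairs it with container sandboxing (no network, resource limits), which is what actually bounds the damage if the whitelist is bypassed.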

- Added isFinalAttempt field to WranglerStep interface
- Moved steps.push() from executeStep to wrangleEntity for better control
- Check for "Entity is valid" directly instead of calling Claude interpretValidationResult
- Store steps after marking final attempt (better style)
- Added assembleFullScript() method that filters for final attempts only
- Removed unused interpretValidationResult() method from Claude service
- Full audit trail preserved while final script contains only working code

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
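The interplay between the audit trail and the final script can be sketched as follows. Only `isFinalAttempt`, `assembleFullScript`, and the literal string "Entity is valid" come from the commit message; the other `WranglerStep` fields and the `isEntityValid` helper are assumptions made for illustration.

```typescript
interface WranglerStep {
  entityName: string; // hypothetical field
  rCode: string; // hypothetical field
  output: string; // hypothetical field: raw R output for this attempt
  isFinalAttempt: boolean; // from the commit message
}

// Direct string check, replacing a Claude call, as described in the commit.
function isEntityValid(rOutput: string): boolean {
  return rOutput.indexOf("Entity is valid") !== -1;
}

// Keeps every attempt in `steps` for auditing, but emits only final attempts.
function assembleFullScript(steps: WranglerStep[]): string {
  return steps
    .filter((step) => step.isFinalAttempt)
    .map((step) => step.rCode)
    .join("\n\n");
}

// Illustrative audit trail: one failed attempt, then a working retry.
const steps: WranglerStep[] = [
  { entityName: "household", rCode: "# attempt 1", output: "Error: ...", isFinalAttempt: false },
  { entityName: "household", rCode: "# attempt 2", output: "Entity is valid", isFinalAttempt: true },
  { entityName: "participant", rCode: "# attempt 1", output: "Entity is valid", isFinalAttempt: true },
];

const fullScript = assembleFullScript(steps);
```

Here `steps` still holds all three attempts, while `fullScript` contains only the two snippets marked as final.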

- Added comprehensive testing strategy with five key bullet points
- Mock Claude API calls with deterministic fixtures to avoid costs/flakiness
- R code execution sandbox testing with known datasets
- End-to-end workflow tests covering full pipeline
- WebSocket integration and multiple client connection tests
- Error handling and retry logic testing for failure scenarios
- Specified key testing infrastructure needs: fixtures directory, R test environment, mocking, containers

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
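Mocking Claude with deterministic fixtures might be sketched like this. `planEntityMappings` is named elsewhere in this PR, but its signature here is a guess, and `ClaudeService`, `FixtureClaudeService`, and `lookup` are hypothetical names. The fake returns canned responses keyed by the exact input and throws on anything unknown, so a test can never fall through to a real, billable API call.

```typescript
// Hypothetical service interface; the real method signature is not given in the PR.
interface ClaudeService {
  planEntityMappings(inspectOutput: string): Promise<string>;
}

class FixtureClaudeService implements ClaudeService {
  // Records every input seen, so tests can assert on what was sent.
  readonly calls: string[] = [];
  private readonly fixtures: Record<string, string>;

  constructor(fixtures: Record<string, string>) {
    this.fixtures = fixtures;
  }

  // Synchronous lookup: same input always yields the same canned response.
  lookup(input: string): string {
    this.calls.push(input);
    if (!Object.prototype.hasOwnProperty.call(this.fixtures, input)) {
      throw new Error(`No fixture recorded for input: ${input}`);
    }
    return this.fixtures[input];
  }

  async planEntityMappings(inspectOutput: string): Promise<string> {
    return this.lookup(inspectOutput);
  }
}

// Illustrative fixture data only; not real inspect() output or a real Claude response.
const fakeClaude = new FixtureClaudeService({
  "inspect output for households.tsv": '{"name":"household"}',
});
const cannedResponse = fakeClaude.lookup("inspect output for households.tsv");
```

In a real test suite the fixture map would presumably be loaded from the fixtures directory the commit mentions, rather than written inline.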

Defines single-user local architecture using:

- TypeScript MCP server with Docker container management
- Layered Docker images for R + study.wrangler + RServe
- Development mode with R source mounting and package reloading
- Conversational interface through Claude Code

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

…ture

- Add bio-wrangler-mcp TypeScript package with full MCP implementation
- Implement 7 MCP tools for complete study.wrangler workflow:
  * start_wrangling_session (container management with dev mode)
  * reload_study_wrangler (fast R package reloading)
  * execute_r_code, inspect_file, validate_entity
  * create_study, export_stf (STF format export)
- Create layered Docker setup with shared base image
- Support development mode with R source mounting and devtools::load_all()
- Use veupathdb/study-wrangler:* naming convention for Docker images
- Update README.org to reference new Docker structure
- Remove root Dockerfile in favor of layered approach

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

bobular and others added 15 commits July 21, 2025 10:48

add initial architecture doc

082b550

removed note

17cf184

websocket specs

3152674

cleaned up ascii art

35fb028

realign overall roadmap to local CC+MCP then web-embedded chat+MCP

ca31af2

bobular wants to merge 15 commits into main from agentic-wrangling
