
Agentic wrangling#42

Draft
bobular wants to merge 15 commits into main from agentic-wrangling


bobular commented Jul 21, 2025

Member

This is currently just an architecture document (started in a regular Claude Opus 4 chat). I'm refining it with Claude Code (Sonnet 4) before deciding whether and when to implement it.

bobular and others added 15 commits July 21, 2025 10:48
- Explored the R package structure and exported functions
- Updated function signatures and workflow patterns with real API
- Added actual Entity/Study class structure and metadata schema
- Corrected entity wrangling workflow with proper function names
- Added study validation step before STF export
- Documented text output patterns for parser implementation

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Restructured project layout to keep R package files at root level
- Added apps/, packages/, docker/ directories for web application
- Changed tooling references from pnpm/turbo to yarn workspace + nx
- Ensures remotes::install_github('VEuPathDB/study-wrangler') continues working
- Clear separation between existing R package and new web app components

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Modified workflow to use entity_from_file() + inspect() for each uploaded file
- Send full inspect() output to Claude for comprehensive entity planning
- Added new Claude service methods: generateFileAnalysisStep() and planEntityMappings()
- Updated workflow service to handle file-by-file analysis approach
- Added file type restrictions and processing approach documentation
- Specified supported formats: TSV, CSV with auto-detection
- Set file size limits: 10MB per file, 50MB total, 10 files max

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
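
The upload limits listed in the commit above could be enforced with a small pre-flight check before any R code runs. This is a minimal sketch, not the PR's implementation: the `UploadedFile` shape, the `validateUploads` name, the extension-based format check, and the use of binary megabytes (1024 × 1024 bytes) are all assumptions. Only the numeric limits and the TSV/CSV restriction come from the commit message.

```typescript
// Limits from the commit message; interpreting "MB" as 1024 * 1024 bytes is an assumption.
const MAX_FILE_BYTES = 10 * 1024 * 1024; // 10MB per file
const MAX_TOTAL_BYTES = 50 * 1024 * 1024; // 50MB total
const MAX_FILES = 10; // 10 files max
const ALLOWED_EXTENSIONS = [".tsv", ".csv"];

// Hypothetical shape for an uploaded file; not defined in the PR text.
interface UploadedFile {
  name: string;
  sizeBytes: number;
}

// Returns a list of human-readable problems; an empty list means the upload is acceptable.
function validateUploads(files: UploadedFile[]): string[] {
  const errors: string[] = [];
  if (files.length > MAX_FILES) {
    errors.push("Too many files: " + files.length + " (max " + MAX_FILES + ")");
  }
  let totalBytes = 0;
  for (const file of files) {
    totalBytes += file.sizeBytes;
    const lowerName = file.name.toLowerCase();
    // Assumption: format is judged by extension; delimiter auto-detection would happen later.
    const allowed = ALLOWED_EXTENSIONS.some((ext) => lowerName.endsWith(ext));
    if (!allowed) {
      errors.push(file.name + ": unsupported format (expected TSV or CSV)");
    }
    if (file.sizeBytes > MAX_FILE_BYTES) {
      errors.push(file.name + ": exceeds the 10MB per-file limit");
    }
  }
  if (totalBytes > MAX_TOTAL_BYTES) {
    errors.push("Upload exceeds the 50MB total limit");
  }
  return errors;
}
```

Returning every problem at once, rather than failing on the first, lets the UI report all issues with a batch of files in a single pass.
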
… R output

- Removed TextParserService from orchestrator components and project structure
- Updated WorkflowService to pass raw R output directly to Claude
- Added Claude.interpretValidationResult() method for validation checking
- Removed text-patterns.ts from shared packages
- Simplified workflow - Claude handles all text interpretation natively
- This reduces complexity and leverages Claude's excellent text understanding

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Hardcode file analysis R code instead of using Claude to generate it
- Use cleaner pipe syntax: entity_from_file("filename") %>% inspect()
- Remove unnecessary generateFileAnalysisStep() method from Claude service
- Define EntityMapping interface with name, filename, ID columns, and parent relationships
- Update WorkflowState to include entityMapping and processingOrder from Phase 1
- This saves API calls and makes the file analysis step more reliable

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
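
The commit above names an `EntityMapping` interface and adds `entityMapping` and `processingOrder` to `WorkflowState`, but the PR page does not show their definitions. The sketch below is one plausible shape under stated assumptions: the field names `idColumns` and `parentName`, the single-parent (tree) model, and the helpers `buildFileAnalysisCode` and `computeProcessingOrder` are hypothetical. Only the R pipe expression and the list of fields (name, filename, ID columns, parent relationships) come from the commit message.

```typescript
// Hypothetical field names; the commit only lists "name, filename, ID columns, and parent relationships".
interface EntityMapping {
  name: string;
  filename: string;
  idColumns: string[];
  parentName?: string; // assumption: each entity has at most one parent
}

interface WorkflowState {
  entityMapping: EntityMapping[];
  processingOrder: string[];
}

// Builds the hardcoded analysis step quoted in the commit message for one file.
function buildFileAnalysisCode(filename: string): string {
  // Escape backslashes and double quotes so the filename is a valid R string literal.
  const escaped = filename.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
  return 'entity_from_file("' + escaped + '") %>% inspect()';
}

// Orders entities so that every parent comes before its children (depth-first).
function computeProcessingOrder(mappings: EntityMapping[]): string[] {
  const byName = new Map<string, EntityMapping>();
  for (const mapping of mappings) byName.set(mapping.name, mapping);

  const status = new Map<string, "visiting" | "done">();
  const order: string[] = [];

  function visit(name: string): void {
    if (status.get(name) === "done") return;
    if (status.get(name) === "visiting") {
      throw new Error("Cycle in parent relationships at " + name);
    }
    const mapping = byName.get(name);
    if (mapping === undefined) throw new Error("Unknown entity: " + name);
    status.set(name, "visiting");
    if (mapping.parentName !== undefined) visit(mapping.parentName);
    status.set(name, "done");
    order.push(name);
  }

  for (const mapping of mappings) visit(mapping.name);
  return order;
}
```

Because the analysis code is a fixed template, no API call is needed to produce it, which is the saving the commit message describes.
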
- Added comprehensive context-passing section for Claude API calls
- Every entity wrangling step includes: user instructions, entity mapping, previous R output, current entity context
- Rely on wrangler's built-in guidance rather than full API docs in prompts
- Added security section with prompt injection protection:
  - Input sanitization for user instructions
  - Generated code validation with function whitelisting
  - R environment sandboxing (containerized, no network, resource limits)
- Error handling for malicious code detection
- Updated generateEntityWranglingStep method with full context structure

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
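
The "generated code validation with function whitelisting" item in the commit above could look roughly like the following. This is a deliberately conservative sketch under assumptions: `findDisallowedCalls` and the contents of `ALLOWED_FUNCTIONS` are hypothetical (only `entity_from_file` and `inspect` appear in this PR's text), and a regex scan is not an R parser. It will flag calls inside strings or comments and treats keywords such as `if (` as calls, and it cannot catch indirect invocation such as `do.call("system", ...)`. It is one layer at most, alongside the sandboxing the commit also lists.

```typescript
// Hypothetical whitelist; a real one would cover the wrangler's exported functions.
const ALLOWED_FUNCTIONS = ["entity_from_file", "inspect"];

// Returns the distinct function-like names followed by "(" that are not whitelisted.
function findDisallowedCalls(rCode: string, allowed: string[]): string[] {
  // An R-style identifier, with ":" included so that pkg::fn is treated as one name.
  const callPattern = /([A-Za-z.][A-Za-z0-9._:]*)\s*\(/g;
  const disallowed: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = callPattern.exec(rCode)) !== null) {
    const name = match[1];
    if (name === undefined) continue;
    if (allowed.indexOf(name) === -1 && disallowed.indexOf(name) === -1) {
      disallowed.push(name);
    }
  }
  return disallowed;
}
```

Namespaced calls such as `base::system(` are reported as a single name and rejected unless that exact qualified name is whitelisted, which errs on the side of refusing code.
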
- Added isFinalAttempt field to WranglerStep interface
- Moved steps.push() from executeStep to wrangleEntity for better control
- Check for "Entity is valid" directly instead of calling Claude interpretValidationResult
- Store steps after marking final attempt (better style)
- Added assembleFullScript() method that filters for final attempts only
- Removed unused interpretValidationResult() method from Claude service
- Full audit trail preserved while final script contains only working code

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
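
The retry bookkeeping described in the commit above (mark the final attempt, then store the step, then assemble the script from final attempts only) can be sketched as follows. The field names other than `isFinalAttempt`, the synchronous callback signatures, and the reading of "final attempt" as "the attempt whose output contains `Entity is valid`" are assumptions. A real implementation would presumably be asynchronous and call Claude and R rather than injected callbacks.

```typescript
// Only isFinalAttempt is named in the commit; the other fields are hypothetical.
interface WranglerStep {
  entityName: string;
  code: string;
  output: string;
  isFinalAttempt: boolean;
}

// Direct string check on the R output, replacing a Claude call for validation.
function isEntityValid(rOutput: string): boolean {
  return rOutput.indexOf("Entity is valid") !== -1;
}

// Retries until the output validates or attempts run out; every attempt is recorded.
function wrangleEntity(
  entityName: string,
  generateCode: (previousOutput: string) => string, // stand-in for the Claude call
  executeStep: (code: string) => string, // stand-in for R execution
  maxAttempts: number,
  steps: WranglerStep[]
): boolean {
  let previousOutput = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const code = generateCode(previousOutput);
    const output = executeStep(code);
    const valid = isEntityValid(output);
    // Mark the attempt first, then store it, so the audit trail is complete.
    steps.push({ entityName, code, output, isFinalAttempt: valid });
    if (valid) return true;
    previousOutput = output;
  }
  return false;
}

// The exported script keeps only the attempts that were marked final.
function assembleFullScript(steps: WranglerStep[]): string {
  return steps
    .filter((step) => step.isFinalAttempt)
    .map((step) => step.code)
    .join("\n\n");
}
```

Keeping failed attempts in `steps` while filtering them out of the script is what gives the full audit trail alongside a final script of working code only.
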
- Added comprehensive testing strategy with five key bullet points
- Mock Claude API calls with deterministic fixtures to avoid costs/flakiness
- R code execution sandbox testing with known datasets
- End-to-end workflow tests covering full pipeline
- WebSocket integration and multiple client connection tests
- Error handling and retry logic testing for failure scenarios
- Specified key testing infrastructure needs: fixtures directory, R test environment, mocking, containers

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
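
For the first testing bullet in the commit above (mocking Claude API calls with deterministic fixtures), one possible shape is a stand-in client that answers from canned responses. Everything named here is hypothetical: the `ClaudeClient` interface, the `FixtureClaudeClient` class, and the substring-matching rule are illustrative assumptions, not the project's actual service API.

```typescript
// Hypothetical minimal interface that the workflow code would depend on.
interface ClaudeClient {
  complete(prompt: string): Promise<string>;
}

// A canned response returned when the prompt contains the given substring.
interface Fixture {
  match: string;
  response: string;
}

class FixtureClaudeClient implements ClaudeClient {
  // Every prompt seen, so tests can assert on what was sent.
  readonly prompts: string[] = [];
  private readonly fixtures: Fixture[];

  constructor(fixtures: Fixture[]) {
    this.fixtures = fixtures;
  }

  // Synchronous lookup: the first fixture whose substring appears in the prompt wins.
  lookup(prompt: string): string {
    this.prompts.push(prompt);
    for (const fixture of this.fixtures) {
      if (prompt.indexOf(fixture.match) !== -1) return fixture.response;
    }
    // Failing loudly keeps tests deterministic when a prompt changes unexpectedly.
    throw new Error("No fixture matches prompt: " + prompt.slice(0, 60));
  }

  // Async wrapper so the mock can stand in wherever a ClaudeClient is expected.
  complete(prompt: string): Promise<string> {
    try {
      return Promise.resolve(this.lookup(prompt));
    } catch (error) {
      return Promise.reject(error);
    }
  }
}
```

Throwing on an unmatched prompt, rather than returning a default, means a test fails as soon as the workflow sends something the fixtures do not cover.
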
Defines single-user local architecture using:
- TypeScript MCP server with Docker container management
- Layered Docker images for R + study.wrangler + RServe
- Development mode with R source mounting and package reloading
- Conversational interface through Claude Code

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
…ture

- Add bio-wrangler-mcp TypeScript package with full MCP implementation
- Implement 7 MCP tools for complete study.wrangler workflow:
  * start_wrangling_session (container management with dev mode)
  * reload_study_wrangler (fast R package reloading)
  * execute_r_code, inspect_file, validate_entity
  * create_study, export_stf (STF format export)
- Create layered Docker setup with shared base image
- Support development mode with R source mounting and devtools::load_all()
- Use veupathdb/study-wrangler:* naming convention for Docker images
- Update README.org to reference new Docker structure
- Remove root Dockerfile in favor of layered approach

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
