Agentic wrangling #42
Draft
bobular wants to merge 15 commits into
- Explored the R package structure and exported functions
- Updated function signatures and workflow patterns with real API
- Added actual Entity/Study class structure and metadata schema
- Corrected entity wrangling workflow with proper function names
- Added study validation step before STF export
- Documented text output patterns for parser implementation
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Restructured project layout to keep R package files at root level
- Added apps/, packages/, docker/ directories for web application
- Changed tooling references from pnpm/turbo to yarn workspace + nx
- Ensures remotes::install_github('VEuPathDB/study-wrangler') continues working
- Clear separation between existing R package and new web app components
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Modified workflow to use entity_from_file() + inspect() for each uploaded file
- Send full inspect() output to Claude for comprehensive entity planning
- Added new Claude service methods: generateFileAnalysisStep() and planEntityMappings()
- Updated workflow service to handle file-by-file analysis approach
- Added file type restrictions and processing approach documentation
- Specified supported formats: TSV, CSV with auto-detection
- Set file size limits: 10MB per file, 50MB total, 10 files max
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
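The upload limits listed in this commit could be enforced with a check along the following lines. This is only a sketch: `UploadedFile`, `validateUploads`, and the constant names are hypothetical, the extension check is an assumed stand-in for the "auto-detection" the commit mentions, and treating 1 MB as 1024 × 1024 bytes is also an assumption.

```typescript
// Hypothetical shape for an uploaded file; not taken from the PR.
interface UploadedFile {
  name: string;
  sizeBytes: number;
}

// Limits from the commit message: 10MB per file, 50MB total, 10 files max.
// Assumption: 1 MB = 1024 * 1024 bytes.
const MB = 1024 * 1024;
const MAX_FILE_BYTES = 10 * MB;
const MAX_TOTAL_BYTES = 50 * MB;
const MAX_FILES = 10;

// Assumption: supported formats (TSV, CSV) are recognised by file extension.
const ALLOWED_EXTENSIONS = [".tsv", ".csv"];

// Returns a list of human-readable problems; an empty list means the upload is acceptable.
function validateUploads(files: UploadedFile[]): string[] {
  const errors: string[] = [];
  if (files.length > MAX_FILES) {
    errors.push(`Too many files: ${files.length} (max ${MAX_FILES})`);
  }
  let totalBytes = 0;
  for (const file of files) {
    totalBytes += file.sizeBytes;
    const lowerName = file.name.toLowerCase();
    if (!ALLOWED_EXTENSIONS.some((ext) => lowerName.endsWith(ext))) {
      errors.push(`Unsupported file type: ${file.name}`);
    }
    if (file.sizeBytes > MAX_FILE_BYTES) {
      errors.push(`File too large: ${file.name}`);
    }
  }
  if (totalBytes > MAX_TOTAL_BYTES) {
    errors.push(`Total upload too large: ${totalBytes} bytes`);
  }
  return errors;
}
```

Returning every problem at once, rather than throwing on the first, would let a UI report all rejected files in a single pass.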
… R output
- Removed TextParserService from orchestrator components and project structure
- Updated WorkflowService to pass raw R output directly to Claude
- Added Claude.interpretValidationResult() method for validation checking
- Removed text-patterns.ts from shared packages
- Simplified workflow - Claude handles all text interpretation natively
- This reduces complexity and leverages Claude's excellent text understanding
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Hardcode file analysis R code instead of using Claude to generate it
- Use cleaner pipe syntax: entity_from_file("filename") %>% inspect()
- Remove unnecessary generateFileAnalysisStep() method from Claude service
- Define EntityMapping interface with name, filename, ID columns, and parent relationships
- Update WorkflowState to include entityMapping and processingOrder from Phase 1
- This saves API calls and makes the file analysis step more reliable
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
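A rough sketch of what the hardcoded analysis step and the `EntityMapping` / `WorkflowState` types from this commit might look like. Only `entity_from_file()`, `inspect()`, `EntityMapping`, `WorkflowState`, `entityMapping`, and `processingOrder` come from the commit message; every other field and function name here (`idColumns`, `parentName`, `buildFileAnalysisCode`) is hypothetical.

```typescript
// Field names beyond "name" and "filename" are guesses at the interface described above.
interface EntityMapping {
  name: string;
  filename: string;
  idColumns: string[];  // assumed name for "ID columns"
  parentName?: string;  // assumed representation of "parent relationships"
}

// Only entityMapping and processingOrder are named in the commit; the types are assumed.
interface WorkflowState {
  entityMapping: EntityMapping[];
  processingOrder: string[]; // entity names in the order planned in Phase 1
}

// Builds the fixed R snippet instead of asking Claude to generate it.
// JSON.stringify gives a double-quoted literal with quotes and backslashes escaped,
// which is assumed to be adequate for ordinary filenames in R.
function buildFileAnalysisCode(filename: string): string {
  return `entity_from_file(${JSON.stringify(filename)}) %>% inspect()`;
}

// Illustrative state only; the entities and columns are invented.
const exampleState: WorkflowState = {
  entityMapping: [
    { name: "household", filename: "households.tsv", idColumns: ["household_id"] },
    {
      name: "participant",
      filename: "participants.tsv",
      idColumns: ["participant_id"],
      parentName: "household",
    },
  ],
  processingOrder: ["household", "participant"],
};

const analysisSteps = exampleState.entityMapping.map((m) => buildFileAnalysisCode(m.filename));
```

Because the snippet is a template rather than model output, the file analysis step needs no API call and always produces the same R code for the same filename.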
- Added comprehensive context-passing section for Claude API calls
- Every entity wrangling step includes: user instructions, entity mapping, previous R output, current entity context
- Rely on wrangler's built-in guidance rather than full API docs in prompts
- Added security section with prompt injection protection:
  - Input sanitization for user instructions
  - Generated code validation with function whitelisting
  - R environment sandboxing (containerized, no network, resource limits)
  - Error handling for malicious code detection
- Updated generateEntityWranglingStep method with full context structure
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
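The "generated code validation with function whitelisting" item could be approximated as below. This is a deliberately crude sketch, not the design in the architecture document: it uses a regular expression rather than an R parser, so it can be bypassed by indirect calls (for example `do.call`) and is no substitute for the sandboxing the commit also lists. The allowlist contents and the name `findDisallowedCalls` are hypothetical; only `entity_from_file` and `inspect` are function names that appear in this PR.

```typescript
// Illustrative allowlist; a real one would cover the wrangler functions the workflow needs.
const ALLOWED_FUNCTIONS: ReadonlySet<string> = new Set(["entity_from_file", "inspect", "c"]);

// Returns the names of called functions that are not on the allowlist.
// Anything that looks like `name(` is treated as a call, including text inside
// string literals, so the check errs on the side of rejecting code.
function findDisallowedCalls(
  rCode: string,
  allowed: ReadonlySet<string> = ALLOWED_FUNCTIONS,
): string[] {
  const callPattern = /([A-Za-z.][A-Za-z0-9._]*)\s*\(/g;
  const disallowed = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = callPattern.exec(rCode)) !== null) {
    const name = match[1];
    if (!allowed.has(name)) {
      disallowed.add(name);
    }
  }
  return Array.from(disallowed);
}
```

A caller would refuse to execute any step for which the returned list is non-empty and report the offending names as a possible malicious-code error.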
- Added isFinalAttempt field to WranglerStep interface
- Moved steps.push() from executeStep to wrangleEntity for better control
- Check for "Entity is valid" directly instead of calling Claude interpretValidationResult
- Store steps after marking final attempt (better style)
- Added assembleFullScript() method that filters for final attempts only
- Removed unused interpretValidationResult() method from Claude service
- Full audit trail preserved while final script contains only working code
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
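One way the `isFinalAttempt` bookkeeping and `assembleFullScript()` described here might fit together is sketched below. `WranglerStep`, `isFinalAttempt`, `assembleFullScript`, and the "Entity is valid" check come from the commit message; the other fields and the `recordAttempt` helper are hypothetical, and marking an attempt as final only when validation passes is an assumption about the intended behaviour.

```typescript
interface WranglerStep {
  entityName: string;      // hypothetical field
  rCode: string;           // hypothetical field
  rOutput: string;         // hypothetical field
  isFinalAttempt: boolean; // named in the commit
}

// Direct string check on the R output, replacing a Claude call.
function isEntityValid(rOutput: string): boolean {
  return rOutput.includes("Entity is valid");
}

// Marks the step first, then stores it, so every attempt stays in the audit trail.
// Assumption: an attempt is "final" when its output reports a valid entity.
function recordAttempt(
  steps: WranglerStep[],
  entityName: string,
  rCode: string,
  rOutput: string,
): WranglerStep {
  const step: WranglerStep = {
    entityName,
    rCode,
    rOutput,
    isFinalAttempt: isEntityValid(rOutput),
  };
  steps.push(step);
  return step;
}

// The exported script keeps only the code from final attempts.
function assembleFullScript(steps: WranglerStep[]): string {
  return steps
    .filter((step) => step.isFinalAttempt)
    .map((step) => step.rCode)
    .join("\n\n");
}
```

With this split, failed retries remain available for debugging in `steps`, while the assembled script contains only the code that produced a valid entity.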
- Added comprehensive testing strategy with five key bullet points
- Mock Claude API calls with deterministic fixtures to avoid costs/flakiness
- R code execution sandbox testing with known datasets
- End-to-end workflow tests covering full pipeline
- WebSocket integration and multiple client connection tests
- Error handling and retry logic testing for failure scenarios
- Specified key testing infrastructure needs: fixtures directory, R test environment, mocking, containers
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Defines single-user local architecture using:
- TypeScript MCP server with Docker container management
- Layered Docker images for R + study.wrangler + RServe
- Development mode with R source mounting and package reloading
- Conversational interface through Claude Code
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
…ture
- Add bio-wrangler-mcp TypeScript package with full MCP implementation
- Implement 7 MCP tools for complete study.wrangler workflow:
  * start_wrangling_session (container management with dev mode)
  * reload_study_wrangler (fast R package reloading)
  * execute_r_code, inspect_file, validate_entity
  * create_study, export_stf (STF format export)
- Create layered Docker setup with shared base image
- Support development mode with R source mounting and devtools::load_all()
- Use veupathdb/study-wrangler:* naming convention for Docker images
- Update README.org to reference new Docker structure
- Remove root Dockerfile in favor of layered approach
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Currently just an architecture document (started off in regular Claude Opus 4 chat). Refining this with Claude Code (Sonnet 4) before deciding if and when to implement.