Introduces the foundational IBrowserContext interface and all supporting types (ElementRef, AriaRole, AccessibilityTree, options, results, etc.) that all browser backend implementations will target. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a module-level BrowserSessionRegistry for managing IBrowserContext sessions by UUID, a MockBrowserContext test helper with MOCK_SNAPSHOT, and a full vitest test suite covering all registry operations. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
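A minimal sketch of what a session registry with these operations could look like. The class name `SessionRegistrySketch`, the `SessionContext` interface, and the choice to throw on an unknown id are assumptions for illustration, not the actual `BrowserSessionRegistry` code; the caller is assumed to supply the UUID.

```typescript
// Hypothetical stand-in for IBrowserContext; only what the registry needs.
interface SessionContext {
  disconnect(): Promise<void>;
}

class SessionRegistrySketch {
  // Live sessions keyed by a caller-supplied UUID string.
  private readonly sessions = new Map<string, SessionContext>();

  register(sessionId: string, ctx: SessionContext): void {
    if (this.sessions.has(sessionId)) {
      throw new Error(`Session already registered: ${sessionId}`);
    }
    this.sessions.set(sessionId, ctx);
  }

  // Assumption: an unknown id is an error rather than an undefined result.
  get(sessionId: string): SessionContext {
    const ctx = this.sessions.get(sessionId);
    if (!ctx) {
      throw new Error(`Unknown browser session: ${sessionId}`);
    }
    return ctx;
  }

  // Returns true if a session was actually removed.
  unregister(sessionId: string): boolean {
    return this.sessions.delete(sessionId);
  }

  clear(): void {
    this.sessions.clear();
  }

  // Disconnects every session; one failing disconnect does not block the rest.
  async disconnectAll(): Promise<void> {
    const contexts = Array.from(this.sessions.values());
    this.sessions.clear();
    await Promise.all(contexts.map((ctx) => ctx.disconnect().catch(() => undefined)));
  }
}
```

Keeping the map behind a small API means tasks only ever pass a `sessionId` string between each other, which is what lets a session outlive a single task execution.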
Implements IBrowserContext using Playwright with lazy dynamic import (optional dependency). Supports local chromium launch and CDP cloud connection modes, ARIA snapshot parsing with ref assignment, full element interaction, tabs, dialogs, file upload/download, and waits. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
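The lazy dynamic import pattern mentioned here can be sketched roughly as follows. `createLazyModule` and the injectable `importer` parameter are hypothetical names introduced for illustration; the actual `PlaywrightBackend` may structure this differently.

```typescript
type Importer = (specifier: string) => Promise<unknown>;

// Returns a loader that imports the module on first use and caches the result.
// The importer is injectable so the behavior can be exercised without the
// optional dependency being installed (an assumption made for this sketch).
function createLazyModule<T>(
  specifier: string,
  importer: Importer = (s) => import(s)
): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (!cached) {
      cached = importer(specifier).then(
        (mod) => mod as T,
        (cause) => {
          // Drop the cached rejection so a later call can retry.
          cached = undefined;
          throw new Error(
            `Optional dependency "${specifier}" could not be loaded: ${String(cause)}`
          );
        }
      );
    }
    return cached;
  };
}
```

A backend would call something like `createLazyModule("playwright")` once at module scope and await the loader inside `connect()`, so merely importing the backend file never requires the optional package.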
Implements two workflow task nodes for managing browser sessions: BrowserSessionTask creates a session via configured backend and registers it; BrowserCloseTask disconnects and unregisters by sessionId. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…on tasks Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements Tasks 10-13: BrowserPressKeyTask, BrowserTypeTask, BrowserScrollTask, BrowserUploadTask, BrowserWaitTask, BrowserNewTabTask, BrowserSwitchTabTask, BrowserCloseTabTask, and BrowserLoginTask (with dynamic entitlements for credential mode). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Create index files for clean re-exports (tasks/index.ts, browser/index.ts), a register.ts that registers all 26 browser tasks with TaskRegistry, and wire up registerBrowserDeps with PlaywrightBackend + file-system profile storage in bun.ts and node.ts. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
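As a rough illustration of the wiring described above, the sketch below shows a register/getter pair for platform-specific dependencies. The real `registerBrowserDeps` uses a DI token and its signature is not shown in this thread, so `registerDeps`, `getDeps`, `BrowserDepsSketch`, and `ProfileStorageSketch` are all hypothetical and simplified to a module-level variable.

```typescript
// Hypothetical profile storage contract (the real interface is not shown here).
interface ProfileStorageSketch {
  save(key: string, data: string): Promise<void>;
  load(key: string): Promise<string | undefined>;
}

// Hypothetical dependency bundle supplied by a platform entry point.
interface BrowserDepsSketch {
  createContext(): unknown; // would return an IBrowserContext implementation
  profileStorage: ProfileStorageSketch;
}

let registered: BrowserDepsSketch | undefined;

// Called once by an entry point such as bun.ts or node.ts.
function registerDeps(deps: BrowserDepsSketch): void {
  registered = deps;
}

// Called by tasks; fails loudly if the platform never registered anything.
function getDeps(): BrowserDepsSketch {
  if (!registered) {
    throw new Error("Browser dependencies not registered for this platform");
  }
  return registered;
}
```

The point of the indirection is that task code stays platform-neutral: only the entry point knows whether the backend is Playwright or Electron.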
Implements IBrowserContext for Electron's native webContents via Chrome DevTools Protocol, with session isolation via session.fromPartition(). Uses lazy dynamic import pattern (same as PlaywrightBackend) since electron types are not installed in the libs repo. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@workglow/cli
@workglow/ai
@workglow/ai-provider
@workglow/job-queue
@workglow/knowledge-base
@workglow/storage
@workglow/task-graph
@workglow/tasks
@workglow/util
workglow
commit: |
Pull request overview
Adds a unified browser automation surface to @workglow/tasks by introducing an IBrowserContext abstraction, platform-registered dependencies, two concrete backends (Playwright/Electron), and a suite of browser tasks with accompanying unit tests.
Changes:
- Introduces `IBrowserContext`, `BrowserSessionRegistry`, and DI wiring via `BrowserTaskDeps` + `registerBrowserDeps()`.
- Adds Playwright and Electron backends implementing `IBrowserContext` operations (navigation, a11y snapshotting, interaction, etc.).
- Adds a set of browser tasks (session/navigation/interaction/extraction/tabs/wait/login) and unit tests using a mock browser context.
Reviewed changes
Copilot reviewed 43 out of 43 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| packages/test/src/test/browser/MockBrowserContext.ts | Adds a test double implementing IBrowserContext and recording calls. |
| packages/test/src/test/browser/BrowserSessionTask.test.ts | Tests session creation and closing behavior. |
| packages/test/src/test/browser/BrowserSessionRegistry.test.ts | Tests registry lifecycle operations (register/get/unregister/clear/disconnectAll). |
| packages/test/src/test/browser/BrowserNavigateTask.test.ts | Tests navigation/back/forward/reload tasks against the mock context. |
| packages/test/src/test/browser/BrowserInteractionTask.test.ts | Tests click/fill tasks and input validation paths. |
| packages/tasks/src/util/BrowserTaskDeps.ts | Adds DI token + registration/getter for browser dependencies and profile storage interface. |
| packages/tasks/src/task/browser/tasks/index.ts | Re-exports browser task classes from a single barrel. |
| packages/tasks/src/task/browser/tasks/BrowserWaitTask.ts | Adds wait task (navigation/selector/idle) with config-driven behavior. |
| packages/tasks/src/task/browser/tasks/BrowserUploadTask.ts | Adds file upload task (ref + file paths). |
| packages/tasks/src/task/browser/tasks/BrowserTypeTask.ts | Adds typing task for the currently focused element. |
| packages/tasks/src/task/browser/tasks/BrowserSwitchTabTask.ts | Adds tab switching task. |
| packages/tasks/src/task/browser/tasks/BrowserSnapshotTask.ts | Adds accessibility snapshot task returning AccessibilityTree. |
| packages/tasks/src/task/browser/tasks/BrowserSessionTask.ts | Adds session creation task with backend selection + entitlements. |
| packages/tasks/src/task/browser/tasks/BrowserSelectTask.ts | Adds select-option task (ref-based). |
| packages/tasks/src/task/browser/tasks/BrowserScrollTask.ts | Adds scroll task (page or element via optional ref). |
| packages/tasks/src/task/browser/tasks/BrowserScreenshotTask.ts | Adds screenshot task returning binary image data. |
| packages/tasks/src/task/browser/tasks/BrowserReloadTask.ts | Adds reload task. |
| packages/tasks/src/task/browser/tasks/BrowserQuerySelectorTask.ts | Adds CSS query task returning element refs. |
| packages/tasks/src/task/browser/tasks/BrowserPressKeyTask.ts | Adds key press task with modifiers config. |
| packages/tasks/src/task/browser/tasks/BrowserNewTabTask.ts | Adds new tab task returning tabId. |
| packages/tasks/src/task/browser/tasks/BrowserNavigateTask.ts | Adds navigation task (URL + waitUntil) + entitlements. |
| packages/tasks/src/task/browser/tasks/BrowserLoginTask.ts | Adds login task scaffold with mode selection + dynamic entitlements. |
| packages/tasks/src/task/browser/tasks/BrowserHoverTask.ts | Adds hover task (ref-based). |
| packages/tasks/src/task/browser/tasks/BrowserForwardTask.ts | Adds forward-history task returning URL. |
| packages/tasks/src/task/browser/tasks/BrowserFillTask.ts | Adds fill task (ref or label). |
| packages/tasks/src/task/browser/tasks/BrowserExtractTextTask.ts | Adds extract-text task (currently config-driven ref). |
| packages/tasks/src/task/browser/tasks/BrowserExtractHtmlTask.ts | Adds extract-HTML task (config ref or selector). |
| packages/tasks/src/task/browser/tasks/BrowserEvaluateTask.ts | Adds JS evaluation task + entitlements. |
| packages/tasks/src/task/browser/tasks/BrowserCloseTask.ts | Adds session close task (disconnect + unregister). |
| packages/tasks/src/task/browser/tasks/BrowserCloseTabTask.ts | Adds close-tab task (config-driven tabId). |
| packages/tasks/src/task/browser/tasks/BrowserClickTask.ts | Adds click task (ref or role+name) with optional modifiers. |
| packages/tasks/src/task/browser/tasks/BrowserBackTask.ts | Adds back-history task returning URL. |
| packages/tasks/src/task/browser/tasks/BrowserAttributeTask.ts | Adds attribute retrieval task (ref + attribute name). |
| packages/tasks/src/task/browser/register.ts | Registers all browser tasks into TaskRegistry. |
| packages/tasks/src/task/browser/PlaywrightBackend.ts | Implements IBrowserContext using Playwright (lazy optional dependency). |
| packages/tasks/src/task/browser/index.ts | Exposes browser types/registry/tasks via package exports. |
| packages/tasks/src/task/browser/IBrowserContext.ts | Defines the unified browser context interface and shared types. |
| packages/tasks/src/task/browser/ElectronBackend.ts | Implements IBrowserContext using Electron webContents.debugger + CDP. |
| packages/tasks/src/task/browser/BrowserSessionRegistry.ts | Adds global session map keyed by UUID. |
| packages/tasks/src/node.ts | Wires browser deps for Node entry point (Playwright + filesystem profile storage). |
| packages/tasks/src/common.ts | Exports browser APIs and registers browser tasks alongside common tasks. |
| packages/tasks/src/bun.ts | Wires browser deps for Bun entry point (Playwright + filesystem profile storage). |
| packages/task-graph/src/task/TaskEntitlements.ts | Adds new entitlements for browser automation actions. |
| import { registerBrowserDeps } from "./util/BrowserTaskDeps"; | ||
| import { PlaywrightBackend } from "./task/browser/PlaywrightBackend"; |
These import statements appear after executable code (registerMcpTaskDeps(...)). In ESM/TypeScript, static imports must be top-level; this will fail to parse/compile. Move the browser imports to the top of the file with the other imports, or switch to dynamic await import() inside the registration block.
| ): Promise<BrowserCloseTabTaskOutput> { | ||
| const ctx = BrowserSessionRegistry.get(input.sessionId); | ||
| await ctx.closeTab(this.config.tabId as string); | ||
| return { sessionId: input.sessionId }; |
this.config.tabId is optional in the schema/typing, but it’s cast to string and passed to ctx.closeTab(...). If tabId is omitted this will pass undefined at runtime and likely throw. Either make tabId required (schema + type) or implement the “close current tab” behavior explicitly.
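The first option in this comment (treating `tabId` as required) could be enforced at runtime with a small guard like the one below. `requireTabId` is a hypothetical helper name, and the error wording is an assumption.

```typescript
// Narrows an optional config value to a usable tab id, or fails with a clear message
// instead of letting `undefined` reach the backend through an `as string` cast.
function requireTabId(tabId: string | undefined): string {
  if (typeof tabId !== "string" || tabId.length === 0) {
    throw new Error("BrowserCloseTabTask requires config.tabId");
  }
  return tabId;
}
```

Inside `execute`, the call would then read `await ctx.closeTab(requireTabId(this.config.tabId))`, which removes the cast and surfaces a misconfiguration before any backend call is made.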
| if (!ref && this.config.selector) { | ||
| const found = await ctx.querySelector(this.config.selector); | ||
| if (found) { | ||
| ref = found; | ||
| } | ||
| } | ||
| const html = await ctx.innerHTML(ref as string); |
If neither config.ref nor config.selector is provided, ref remains undefined but is cast to string and passed into ctx.innerHTML(...), which will throw at runtime. Validate inputs and throw a clear error (or support a full-page fallback) instead of using as string.
| if (!ref && this.config.selector) { | |
| const found = await ctx.querySelector(this.config.selector); | |
| if (found) { | |
| ref = found; | |
| } | |
| } | |
| const html = await ctx.innerHTML(ref as string); | |
| if (!ref) { | |
| if (!this.config.selector) { | |
| throw new Error("BrowserExtractHtmlTask requires either config.ref or config.selector"); | |
| } | |
| const found = await ctx.querySelector(this.config.selector); | |
| if (!found) { | |
| throw new Error( | |
| `BrowserExtractHtmlTask could not find an element matching selector: ${this.config.selector}` | |
| ); | |
| } | |
| ref = found; | |
| } | |
| const html = await ctx.innerHTML(ref); |
| _executeContext: IExecuteContext | ||
| ): Promise<BrowserExtractTextTaskOutput> { | ||
| const ctx = BrowserSessionRegistry.get(input.sessionId); | ||
| const text = await ctx.textContent(this.config.ref as string); |
config.ref is optional but is cast to string and passed into ctx.textContent(...). This will throw when ref isn’t set, and it also contradicts the schema description that implies a full-page fallback. Add a real fallback (e.g. extract from page content/body) or make ref required and remove the “full page” claim.
| const text = await ctx.textContent(this.config.ref as string); | |
| const selector = this.config.ref ?? "body"; | |
| const text = await ctx.textContent(selector); |
| override async execute( | ||
| input: BrowserPressKeyTaskInput, | ||
| _executeContext: IExecuteContext | ||
| ): Promise<BrowserPressKeyTaskOutput> { | ||
| const ctx = BrowserSessionRegistry.get(input.sessionId); | ||
| await ctx.pressKey(input.key, this.config.modifiers as any); |
pressKey() accepts WaitOptions, but this passes modifiers (an array) via as any, so modifiers are ignored/mis-typed and callers won’t get the behavior implied by the config schema. Either encode modifiers into the key string (e.g. "Control+Enter") and keep the IBrowserContext signature, or extend IBrowserContext.pressKey to accept modifiers explicitly and update both backends accordingly.
| override async execute( | |
| input: BrowserPressKeyTaskInput, | |
| _executeContext: IExecuteContext | |
| ): Promise<BrowserPressKeyTaskOutput> { | |
| const ctx = BrowserSessionRegistry.get(input.sessionId); | |
| await ctx.pressKey(input.key, this.config.modifiers as any); | |
| private buildKeyChord(key: string): string { | |
| const modifiers = this.config.modifiers?.filter(Boolean) ?? []; | |
| return modifiers.length > 0 ? `${modifiers.join("+")}+${key}` : key; | |
| } | |
| override async execute( | |
| input: BrowserPressKeyTaskInput, | |
| _executeContext: IExecuteContext | |
| ): Promise<BrowserPressKeyTaskOutput> { | |
| const ctx = BrowserSessionRegistry.get(input.sessionId); | |
| await ctx.pressKey(this.buildKeyChord(input.key)); |
| if (name) { | ||
| return `getByRole:${role}:${name}`; | ||
| } | ||
| // Roles that are typically text nodes | ||
| if (role === "text" || role === "StaticText") { | ||
| return `getByText:${name}`; | ||
| } |
The text-node handling is unreachable: when name is non-empty (typical for "text"/"StaticText" nodes), the function returns getByRole before checking the role. This can produce locators like getByRole:text:... that Playwright may not support. Check for text roles before the if (name) early return, and generate a getByText locator when appropriate.
| if (name) { | |
| return `getByRole:${role}:${name}`; | |
| } | |
| // Roles that are typically text nodes | |
| if (role === "text" || role === "StaticText") { | |
| return `getByText:${name}`; | |
| } | |
| // Roles that are typically text nodes | |
| if (role === "text" || role === "StaticText") { | |
| return `getByText:${name}`; | |
| } | |
| if (name) { | |
| return `getByRole:${role}:${name}`; | |
| } |
| executeContext.updateProgress(20, "Navigated to login page"); | ||
| switch (mode) { | ||
| case "manual": | ||
| executeContext.updateProgress(50, "Waiting for manual login..."); |
executeContext.updateProgress(...) returns a Promise and is awaited throughout the codebase; here it’s called without await, which can lead to unhandled rejections or progress updates arriving out of order. Await this call.
| executeContext.updateProgress(20, "Navigated to login page"); | |
| switch (mode) { | |
| case "manual": | |
| executeContext.updateProgress(50, "Waiting for manual login..."); | |
| await executeContext.updateProgress(20, "Navigated to login page"); | |
| switch (mode) { | |
| case "manual": | |
| await executeContext.updateProgress(50, "Waiting for manual login..."); |
| executeContext.updateProgress(20, "Navigated to login page"); | ||
| switch (mode) { | ||
| case "manual": | ||
| executeContext.updateProgress(50, "Waiting for manual login..."); | ||
| // Placeholder for future HumanInputTask integration | ||
| console.log("BrowserLoginTask: manual login mode — waiting for user to complete login"); |
Within the manual login branch, executeContext.updateProgress(...) should be awaited (it returns a Promise). Also, using console.log inside a task introduces uncontrolled stdout noise in library code; prefer the project’s logging facilities (or rely on progress updates) instead.
| executeContext.updateProgress(20, "Navigated to login page"); | |
| switch (mode) { | |
| case "manual": | |
| executeContext.updateProgress(50, "Waiting for manual login..."); | |
| // Placeholder for future HumanInputTask integration | |
| console.log("BrowserLoginTask: manual login mode — waiting for user to complete login"); | |
| await executeContext.updateProgress(20, "Navigated to login page"); | |
| switch (mode) { | |
| case "manual": | |
| await executeContext.updateProgress(50, "Waiting for manual login..."); | |
| // Placeholder for future HumanInputTask integration |
@copilot apply changes based on the comments in this thread
…zation' Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
…caping or encoding' Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
…me correctness Agent-Logs-Url: https://github.com/workglow-dev/workglow/sessions/8edcaa84-2963-4012-b73d-404ed1bb396b Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>
All review comments applied in c79b3bd:
CodeQL scan now shows 0 alerts.
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
- Updated the task registration process by replacing the previous implementation with streamlined server functions: `registerMcpTaskDepsServer` and `registerBrowserDepsServer`.
- Adjusted the TypeScript configuration to include all source files under the `src` directory for better coverage.
…nd entitlements
- Fix path traversal in profile storage by validating projectId/profileName
- Fix ElectronBackend snapshot() to keep monotonic ref counter across snapshots
- Fix querySelector/querySelectorAll to use DOM.getDocument for root nodeId
- Fix download() with os.tmpdir(), timeout support, correct will-download signature
- Fix onDialog() to use CDP Page.javascriptDialogOpening instead of non-existent event
- Add FILESYSTEM_READ entitlement to BrowserUploadTask
- Rewrite fillByLabel() with CDP-first approach and proper JS fallback

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
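One common way to close a path-traversal hole of the kind named in the first fix is to allow only a conservative character set in each path segment. The sketch below assumes that approach; `assertSafeSegment`, `profilePath`, the allowed character set, and the `.json` suffix are illustrative choices, not the committed implementation.

```typescript
// Letters, digits, underscore, and hyphen only: this rejects "", ".", "..",
// and anything containing "/" or "\".
const SAFE_SEGMENT = /^[A-Za-z0-9_-]+$/;

function assertSafeSegment(label: string, value: string): string {
  if (!SAFE_SEGMENT.test(value)) {
    throw new Error(`Invalid ${label}: only letters, digits, "_" and "-" are allowed`);
  }
  return value;
}

// Builds the storage path only after both user-controlled segments are validated.
function profilePath(baseDir: string, projectId: string, profileName: string): string {
  const project = assertSafeSegment("projectId", projectId);
  const profile = assertSafeSegment("profileName", profileName);
  return `${baseDir}/${project}/${profile}.json`;
}
```

An allow-list is stricter than stripping `..` sequences, but it avoids having to reason about encodings and platform-specific separators.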
6829ea4 to ed740c9
- Added Playwright as an optional dependency in package.json and bun.lock.
- Exported PlaywrightBackend from bun.ts and node.ts to facilitate browser automation tasks.
- Updated index.ts to ensure proper export of IBrowserContext for better integration.

This integration enhances the capabilities of the browser tasks by leveraging Playwright's features.
When BrowserSessionTask creates a browser session, it now registers a cleanup function on the executeContext's ResourceScope. This ensures the browser context is properly disconnected and unregistered from the BrowserSessionRegistry when the scope is disposed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
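The cleanup-on-dispose idea can be sketched as below. The actual `ResourceScope` API is not shown in this thread, so `ScopeSketch`, its `register`/`disposeAll` methods, and the reverse-order disposal are assumptions made for illustration.

```typescript
type Disposer = () => Promise<void> | void;

// Hypothetical stand-in for a resource scope: collects cleanup callbacks by key
// and runs them once when the scope is disposed.
class ScopeSketch {
  private readonly disposers = new Map<string, Disposer>();

  // The first registration for a key wins, so re-running a task does not stack cleanups.
  register(key: string, disposer: Disposer): void {
    if (!this.disposers.has(key)) {
      this.disposers.set(key, disposer);
    }
  }

  // Runs disposers in reverse registration order; a failing disposer is skipped
  // so the remaining resources still get released.
  async disposeAll(): Promise<void> {
    const pending = Array.from(this.disposers.values()).reverse();
    this.disposers.clear();
    for (const dispose of pending) {
      try {
        await dispose();
      } catch (error) {
        // Intentionally ignored in this sketch; real code would likely log it.
      }
    }
  }
}
```

A session task would register something like `scope.register("browser:" + sessionId, async () => { await ctx.disconnect(); registry.unregister(sessionId); })`, so the browser is released even if no explicit close task runs.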
After AI task execution, register a disposer on the ResourceScope that calls the provider's UnloadModelTask run function for the specific model. This lets callers unload models when they are no longer needed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…1 and undici to 8.1.0
- Bump version of @typescript/native-preview to 7.0.0-dev.20260413.1 in bun.lock, package.json, and tasks package.json.
- Update undici dependency from 8.0.2 to 8.1.0 in tasks package.json.
Design spec for eliminating duplicated tool-call parsing code across HFT and LlamaCpp providers, removing FunctionGemma support, and consolidating shared utilities into ToolCallParsers.ts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
9-task plan covering: delete dead HFT_ToolParser, remove FunctionGemma from all files and tests, move shared utilities to ToolCallParsers.ts, update HFT and LlamaCpp consumers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove 918-line stale duplicate parser — zero imports across the codebase. HFT_ToolCalling.ts already uses the shared ToolCallParsers.ts instead. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove all FunctionGemma detection, prompt-building helpers, parsers, buildRawCompletionPrompt, supportsNativeFunctions, and truncateAtTurnBoundary from LlamaCpp_ToolParser.ts as part of the FunctionGemma support removal refactor. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…LlamaCpp_ToolCalling Fix broken imports from LlamaCpp_ToolParser (buildRawCompletionPrompt, supportsNativeFunctions, truncateAtTurnBoundary were deleted in Task 3) and remove all dead code that depended on them. Both the non-streaming and streaming functions now use only the LlamaChat code path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rsers Add canonical shared versions of extractMessageText, toolChoiceForcesToolCall, forcedToolSelection, resolveParsedToolName, and adaptParserResult to ToolCallParsers.ts so HFT and LlamaCpp consumers can import from one place. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove local duplicate definitions of adaptParserResult, forcedToolSelection, and extractMessageText from HFT_ToolCalling.ts; import them from ToolCallParsers.ts. Also fix pre-existing build errors in LlamaCpp_ToolCalling.ts and LlamaCpp_ToolParser.ts where toolChoiceForcesToolCall was incorrectly imported from LlamaCpp_ToolParser. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a new module `registerProvider.ts` containing shared functions for registering AI providers in both worker and inline contexts. Update existing provider registration functions for Anthropic, Chrome, Gemini, Hugging Face, LlamaCpp, and Ollama to utilize these new shared utilities, improving code consistency and reducing duplication.
Adds KbToDocumentsTask to the AI task registry to list documents from a knowledge base, optionally filtering to only those without chunks (stale). Also relaxes dimension validation in upsertChunk/upsertChunksBulk to skip the check when dimensions are 0 (unset). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Modified KbToDocumentsTask to utilize DocumentNode for the documentTree structure, enhancing type safety. Adjusted the output schema to reflect this change and streamlined the export function for better readability.
… primary key support Updated KnowledgeBaseRepository to support generic schema and primary key types, improving type safety and flexibility. Adjusted method signatures for add, remove, get, and enumerate functions to utilize the new generic types, ensuring better integration with various schema definitions.
…hema and primary key support" This reverts commit f4b8137.
…il in HFT and Ollama ModelInfoTasks
…torage classes Updated various storage classes to replace the generic vector constructor type with a direct reference to TypedArrayConstructor. This change enhances type consistency and simplifies the constructor signatures for better clarity and maintainability.
| executeContext: IExecuteContext | ||
| ): Promise<BrowserLoginTaskOutput> { | ||
| const parsed = new URL(input.url, "https://placeholder"); | ||
| if (parsed.protocol === "javascript:") { |
| _executeContext: IExecuteContext | ||
| ): Promise<BrowserNavigateTaskOutput> { | ||
| const parsed = new URL(input.url, "https://placeholder"); | ||
| if (parsed.protocol === "javascript:") { |
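The two snippets above are cut off after the `if`, so what happens inside the branch is not visible here. The sketch below assumes the branch rejects the URL by throwing; `assertNavigableUrl` and the error message are hypothetical.

```typescript
// Parses a possibly relative URL against a placeholder base and rejects
// javascript: URLs before they reach the browser.
function assertNavigableUrl(raw: string): URL {
  // The base only matters for relative inputs; an absolute URL keeps its own scheme.
  const parsed = new URL(raw, "https://placeholder");
  // The URL parser lowercases the scheme, so "JavaScript:" is caught as well.
  if (parsed.protocol === "javascript:") {
    throw new Error("Refusing to navigate to a javascript: URL");
  }
  return parsed;
}
```

Parsing first and comparing `protocol` is more reliable than a string prefix check on the raw input, since the parser normalizes case and leading whitespace.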
Summary
- `IBrowserContext` interface with pluggable backends for browser automation across all four Workglow environments (CLI, server, Electron, web browser)
- `PlaywrightBackend` (CLI/server/cloud) and `ElectronBackend` (desktop) with accessibility-first element interaction
- `registerBrowserDeps()` dependency injection in bun.ts/node.ts entry points

Architecture
- `IBrowserContext`: thin interface all backends implement (accessibility snapshots, element refs, CSS selector escape hatch, JS evaluation)
- `BrowserSessionRegistry`: global session map keyed by UUID, shared across task executions in a workflow run
- `BrowserTaskDeps`: dependency injection following the existing `McpTaskDeps` pattern
- `PlaywrightBackend`: lazy-imports Playwright (optional peer dep), supports local launch and cloud CDP connection
- `ElectronBackend`: uses native `webContents.debugger` for CDP, `session.fromPartition()` for per-project cookie isolation
- … `IBrowserContext` operation with proper schemas, entitlements, and sessionId passthrough

Test Plan
- … `npx tsc --noEmit`)

🤖 Generated with Claude Code