This Kernel app runs a lightweight computer-use agent powered by Moondream vision models and fast Groq LLM orchestration.
Get your API keys:
- Moondream: moondream.ai
- Groq: console.groq.com
Deploy the app:
kernel login
cp .env.example .env # Add your MOONDREAM_API_KEY and GROQ_API_KEY
kernel deploy main.py --env-file .env

Natural-language query (Groq LLM orchestrates Moondream + Kernel):
kernel invoke python-moondream-cua cua-task --payload '{"query": "Navigate to https://example.com and describe the page"}'

Structured steps (optional fallback for deterministic automation):
kernel invoke python-moondream-cua cua-task --payload '{
"steps": [
{"action": "navigate", "url": "https://example.com"},
{"action": "caption"},
{"action": "click", "target": "More information link", "retries": 4},
{"action": "type", "target": "Search input", "text": "kernel", "press_enter": true}
]
}'

Each step is a JSON object with an action field. Supported actions:
- navigate: { "url": "https://..." }
- click: { "target": "Button label or description" }
- type: { "target": "Input field description", "text": "...", "press_enter": false }
- scroll: { "direction": "down" } or { "x": 0.5, "y": 0.5, "direction": "down" }
- query: { "question": "Is there a login button?" }
- caption: { "length": "short" | "normal" | "long" }
- wait: { "seconds": 2.5 }
- key: { "keys": "ctrl+l" }
- go_back, go_forward, search, open_web_browser
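
A malformed step is easier to catch before invoking the app than after. The sketch below checks a steps list against the actions listed above. The `REQUIRED_FIELDS` table and `validate_steps` helper are hypothetical and not part of the app; which fields are mandatory is an assumption read off the examples (for instance, `length` on caption is treated as optional because the sample payload omits it, and `target` is waived when `x` and `y` are given).

```python
# Hypothetical pre-flight check; field requirements are inferred from the
# examples in this README, not taken from the app's source.
REQUIRED_FIELDS = {
    "navigate": ("url",),
    "click": ("target",),
    "type": ("target", "text"),
    "scroll": ("direction",),
    "query": ("question",),
    "caption": (),
    "wait": ("seconds",),
    "key": ("keys",),
    "go_back": (),
    "go_forward": (),
    "search": (),
    "open_web_browser": (),
}


def validate_steps(steps):
    """Return a list of problems; an empty list means the steps look well-formed."""
    problems = []
    for i, step in enumerate(steps):
        action = step.get("action")
        if action not in REQUIRED_FIELDS:
            problems.append(f"step {i}: unknown action {action!r}")
            continue
        # Assumption: explicit x/y coordinates make "target" unnecessary.
        has_coords = "x" in step and "y" in step
        for field in REQUIRED_FIELDS[action]:
            if field == "target" and has_coords:
                continue
            if field not in step:
                problems.append(f"step {i}: {action} is missing {field!r}")
    return problems
```

Calling `validate_steps` on the structured example from this README should return an empty list.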
Optional step fields:
- retries: override retry attempts for point/click/type
- retry_delay_ms: wait between retries
- x, y: normalized (0-1) or pixel coordinates to bypass Moondream pointing (pixel coords use detected screenshot size)
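
To illustrate how x and y might be interpreted, here is a minimal sketch that maps either form to pixel coordinates. The `resolve_point` helper is hypothetical, and the rule it uses (values in the 0-1 range are normalized, anything larger is pixels) is an assumption; the app may distinguish the two cases differently.

```python
def resolve_point(x, y, width, height):
    """Map step coordinates to integer pixels, clamped to the screenshot bounds."""
    # Assumption: both values within [0, 1] means normalized coordinates.
    if 0 <= x <= 1 and 0 <= y <= 1:
        px, py = x * width, y * height
    else:
        px, py = x, y
    # Clamp so a normalized 1.0 still lands on the last valid pixel.
    px = min(max(int(round(px)), 0), width - 1)
    py = min(max(int(round(py)), 0), height - 1)
    return px, py
```

Under this rule a pixel coordinate of (1, 1) is indistinguishable from normalized (1.0, 1.0), which is one reason to prefer normalized values when targeting corners.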
Add "record_replay": true to the payload to capture a video replay (paid Kernel plans only).
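
Hand-quoting JSON inside a shell command is error-prone, so it can help to build the payload in Python. The sketch below assembles the same `kernel invoke` command shown earlier; `build_invoke_command` is a hypothetical helper, and only the payload keys documented in this README (`query`, `steps`, `record_replay`) are used.

```python
import json
import shlex


def build_invoke_command(query=None, steps=None, record_replay=False):
    """Return the argv list for invoking the cua-task action with a JSON payload."""
    payload = {}
    if query is not None:
        payload["query"] = query
    if steps is not None:
        payload["steps"] = steps
    if record_replay:
        payload["record_replay"] = True
    return [
        "kernel", "invoke", "python-moondream-cua", "cua-task",
        "--payload", json.dumps(payload),
    ]


argv = build_invoke_command(
    query="Navigate to https://example.com and describe the page",
    record_replay=True,
)
# shlex.join quotes the payload safely for pasting into a POSIX shell.
print(shlex.join(argv))
```

The returned list can also be passed directly to `subprocess.run`, which avoids shell quoting altogether.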
- The agent uses Moondream for visual reasoning and pointing.
- Kernel screenshots are PNG; Moondream queries are sent as base64 data URLs.
- The Groq LLM must output JSON actions; the agent repairs and parses JSON with json-repair.
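
The last two notes can be sketched in a few lines. The helpers below are illustrative and are not the app's actual code: `png_to_data_url` wraps PNG bytes in a base64 data URL, and `parse_llm_actions` tries strict JSON parsing first and falls back to `repair_json` from the json-repair package only when that fails. The function names and the strict-then-repair order are assumptions.

```python
import base64
import json

try:
    from json_repair import repair_json  # third-party: pip install json-repair
except ImportError:  # keep the sketch importable without the package
    repair_json = None


def png_to_data_url(png_bytes):
    """Encode raw PNG bytes as a data URL suitable for an image query."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f"data:image/png;base64,{encoded}"


def parse_llm_actions(text):
    """Parse LLM output as JSON, repairing it only if strict parsing fails."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        if repair_json is None:
            raise
        # return_objects=True yields Python objects instead of a repaired string.
        return repair_json(text, return_objects=True)
```

Trying `json.loads` first keeps well-formed responses on the strict path, so repair heuristics only touch output that is actually broken.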