UFO² provides a flexible framework that lets application developers and users enhance the AppAgent for specific applications. Enhancement augments the existing AppAgent with:
- Knowledge (help documents, demonstrations) to guide decision-making
- Native API tools (via MCP servers) for efficient automation
- Application-specific context for better understanding
The AppAgent can be enhanced through three complementary approaches:
| Component | Description | Tutorial | Implementation Guide |
|---|---|---|---|
| Help Documents | Provide application-specific guidance and instructions to help the agent understand tasks and workflows | Provision Guide | Learning from Help Documents |
| User Demonstrations | Supply recorded user interactions to teach the agent how to perform specific tasks through examples | Provision Guide | Learning from Demonstrations |
| Native API Tools | Create custom MCP action servers that wrap application COM APIs or other native interfaces for efficient automation | Wrapping Guide | Creating MCP Servers |
```mermaid
graph TB
    Enhancement[AppAgent Enhancement Workflow]
    Enhancement --> KnowledgeLayer[Knowledge Layer<br/>RAG-based]
    Enhancement --> ToolLayer[Tool Layer<br/>MCP Servers]
    KnowledgeLayer --> HelpDocs[Help<br/>Documents]
    KnowledgeLayer --> DemoTraj[User<br/>Demonstrations]
    ToolLayer --> UITools[UI Automation<br/>Tools]
    ToolLayer --> APITools[Native API<br/>Tools]
    HelpDocs --> EnhancedAgent[Enhanced AppAgent]
    DemoTraj --> EnhancedAgent
    UITools --> EnhancedAgent
    APITools --> EnhancedAgent
    style Enhancement fill:#e1f5ff,stroke:#01579b,stroke-width:3px
    style KnowledgeLayer fill:#fff3e0,stroke:#e65100,stroke-width:2px
    style ToolLayer fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
    style HelpDocs fill:#fffde7,stroke:#f57f17,stroke-width:2px
    style DemoTraj fill:#fffde7,stroke:#f57f17,stroke-width:2px
    style UITools fill:#fce4ec,stroke:#880e4f,stroke-width:2px
    style APITools fill:#fce4ec,stroke:#880e4f,stroke-width:2px
    style EnhancedAgent fill:#e8f5e9,stroke:#1b5e20,stroke-width:3px
```
Use **help documents** when:
- You have official documentation, tutorials, or guides for your application
- Tasks require domain-specific knowledge or procedures
- You want the agent to understand application concepts and terminology
Example: Providing Excel formula documentation to help the agent use advanced Excel functions correctly.
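The knowledge layer in the diagram above is RAG-based: relevant help-document passages are retrieved and placed into the agent's context. As a rough illustration only, the sketch below ranks help snippets by word overlap with the task. The `HelpDoc` type, `retrieve_help` function and stopword list are hypothetical, and this simple overlap scoring is a swapped-in stand-in; a real retriever would typically use embeddings or a search index.

```python
import re
from dataclasses import dataclass

# Common words ignored when scoring (a deliberately tiny list for the sketch).
STOPWORDS = {"a", "an", "the", "to", "of", "in", "with", "from"}

@dataclass
class HelpDoc:
    title: str
    text: str

def _tokens(s: str) -> set[str]:
    """Lowercased content words; a crude stand-in for embeddings or a search index."""
    return set(re.findall(r"[a-z0-9]+", s.lower())) - STOPWORDS

def retrieve_help(task: str, docs: list[HelpDoc], top_k: int = 2) -> list[HelpDoc]:
    """Return up to top_k docs that share the most words with the task (hypothetical)."""
    query = _tokens(task)
    scored = [(len(query & _tokens(d.title + " " + d.text)), d) for d in docs]
    scored = [pair for pair in scored if pair[0] > 0]  # drop docs with no overlap
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:top_k]]

docs = [
    HelpDoc("VLOOKUP", "Use VLOOKUP to look up a value in the first column of a table."),
    HelpDoc("SUMIF", "SUMIF adds the cells that meet a single condition."),
    HelpDoc("Charts", "Insert a chart from the Insert tab."),
]

print([d.title for d in retrieve_help("look up a price with VLOOKUP", docs)])  # ['VLOOKUP']
```

The retrieved passages would then be inserted into the agent's prompt alongside the task.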
Use **user demonstrations** when:
- You can demonstrate the task yourself
- The task involves a specific sequence of UI interactions
- Visual/procedural knowledge is easier to show than describe
Example: Recording how to create a pivot table in Excel to teach the agent the exact steps.
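A demonstration is essentially a task description paired with an ordered trajectory of UI actions. The sketch below shows one hypothetical way to represent such a trajectory and render it as plain text for the agent's prompt; the `DemoStep` and `Demonstration` classes, their fields and the output format are illustrative assumptions, not UFO²'s recording format.

```python
from dataclasses import dataclass, field

@dataclass
class DemoStep:
    action: str      # e.g. "click" or "type" (hypothetical action names)
    target: str      # UI element the action applies to
    value: str = ""  # text to type, if any

@dataclass
class Demonstration:
    task: str
    steps: list[DemoStep] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Render the trajectory as numbered lines for inclusion in a prompt."""
        lines = [f"Task: {self.task}"]
        for i, step in enumerate(self.steps, start=1):
            detail = f' "{step.value}"' if step.value else ""
            lines.append(f"{i}. {step.action} {step.target}{detail}")
        return "\n".join(lines)

demo = Demonstration(
    task="Create a pivot table from the selected range",
    steps=[
        DemoStep("click", "Insert tab"),
        DemoStep("click", "PivotTable button"),
        DemoStep("type", "Table/Range box", "A1:D20"),
        DemoStep("click", "OK button in the Create PivotTable dialog"),
    ],
)
print(demo.to_prompt())
```

Keeping steps as structured records rather than free text makes it easy to filter, reorder or re-render them for different prompt formats.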
Use **native API tools** when:
- Your application exposes COM APIs, REST APIs, or other programmable interfaces
- GUI automation is slow or unreliable for certain operations
- You need deterministic, high-performance automation
Example: Creating an MCP server that wraps Excel's COM API for inserting tables, formatting cells, etc.
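An MCP action server exposes named tools that call the application's native API directly. The sketch below mimics that shape using only the standard library: a hypothetical `tool` decorator registers functions in a dictionary, and a `FakeWorkbook` class stands in for the Excel COM object so the example runs anywhere. In a real server the functions would be registered through an MCP SDK and would call Excel's COM interface (for example via `pywin32`); none of the names below are UFO² APIs.

```python
from typing import Callable

# Hypothetical tool registry: tool name -> callable.
TOOLS: dict[str, Callable[..., object]] = {}

def tool(func: Callable[..., object]) -> Callable[..., object]:
    """Register a function as a named tool (stand-in for an MCP SDK decorator)."""
    TOOLS[func.__name__] = func
    return func

class FakeWorkbook:
    """In-memory stand-in for an Excel worksheet reached through COM."""
    def __init__(self) -> None:
        self.cells: dict[str, object] = {}
        self.bold: set[str] = set()

BOOK = FakeWorkbook()

@tool
def set_cell(address: str, value: object) -> str:
    # A real server would do something like sheet.Range(address).Value = value via COM.
    BOOK.cells[address] = value
    return f"{address} = {value!r}"

@tool
def bold_cell(address: str) -> str:
    # A real server would set sheet.Range(address).Font.Bold = True via COM.
    BOOK.bold.add(address)
    return f"{address} set to bold"

def call_tool(name: str, **kwargs: object) -> object:
    """Dispatch a tool call by name, as an agent's action request would."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)

print(call_tool("set_cell", address="A1", value="Revenue"))
print(call_tool("bold_cell", address="A1"))
print(sorted(TOOLS))
```

Each tool performs one deterministic operation in a single call, instead of the several clicks and keystrokes the same change would take through the GUI.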
!!! tip "Hybrid Approach for Best Results"
    Combine all three components for maximum effectiveness:

    1. **Knowledge Foundation**: Provide help documents for conceptual understanding
    2. **Procedural Learning**: Add demonstrations for complex workflows
    3. **Efficient Execution**: Implement native API tools for performance-critical operations

    With all three in place, the AppAgent will:

    - Use knowledge to **understand** what to do
    - Reference demonstrations to **learn** how to do it
    - Use API tools, when available, for **efficient** execution
    - Fall back to UI automation when no suitable API tool exists
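
The API-first, UI-fallback preference described in the tip can be illustrated with a small dispatcher: if a native API tool is registered for the requested operation it is used, otherwise the action is routed to a UI-automation routine. This is a hypothetical sketch; `make_dispatcher`, `insert_table_api`, `ui_automation` and the returned strings are illustrative and do not reflect how UFO² actually selects between GUI and API actions.

```python
from typing import Callable

def make_dispatcher(
    api_tools: dict[str, Callable[..., str]],
    ui_fallback: Callable[[str, dict], str],
) -> Callable[..., str]:
    """Build a dispatcher that prefers API tools and falls back to UI automation."""
    def dispatch(operation: str, **args: object) -> str:
        api = api_tools.get(operation)
        if api is not None:
            return api(**args)  # native API path: fast and deterministic
        return ui_fallback(operation, args)  # generic GUI path
    return dispatch

def insert_table_api(rows: int, cols: int) -> str:
    return f"api: inserted {rows}x{cols} table"

def ui_automation(operation: str, args: dict) -> str:
    # A real fallback would locate controls on screen and click or type.
    return f"ui: performed {operation} with {args}"

dispatch = make_dispatcher({"insert_table": insert_table_api}, ui_automation)
print(dispatch("insert_table", rows=3, cols=2))
print(dispatch("apply_theme", name="Dark"))
```

Because the fallback accepts any operation name, adding a new API tool only changes which path an operation takes; callers do not need to change.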
Follow these tutorials in sequence to enhance your AppAgent:
- Provide Help Documents - Start with knowledge
- Add User Demonstrations - Teach by example
- Wrap Native APIs - Enable efficient automation

Related documentation:

- AppAgent Overview - Understanding AppAgent architecture
- Knowledge Substrate - How knowledge enhancement works
- Creating MCP Servers - Building custom automation tools
- MCP Configuration - Registering MCP servers with AppAgent
- Hybrid GUI–API Actions - Understanding dual-mode automation