11 changes: 11 additions & 0 deletions edgeai/ondevice-eval-agent/.dockerignore
@@ -0,0 +1,11 @@
**/__pycache__
**/*.pyc
**/*.pyo
.git
.gitignore
.pytest_cache
.venv
venv
tests/
frontend/node_modules
frontend/dist
19 changes: 19 additions & 0 deletions edgeai/ondevice-eval-agent/.gitignore
@@ -0,0 +1,19 @@
# Python
__pycache__/
*.py[cod]
*.egg-info/
.pytest_cache/
.venv/
venv/

# Node / frontend
frontend/node_modules/
frontend/dist/
frontend/tsconfig.tsbuildinfo

# Environment
.env
.env.local

# OS
.DS_Store
62 changes: 62 additions & 0 deletions edgeai/ondevice-eval-agent/Dockerfile
@@ -0,0 +1,62 @@
# Build context: ondevice-eval-agent/
#
# Multi-stage build:
# stage 1 (node) — compile the React SPA (frontend/ → dist/)
# stage 2 (python) — install deps, copy backend, drop SPA dist into webapp/spa/
#
# The Python runtime serves both the API and the built SPA on :8080,
# so the whole app is a single image and a single port.

# ---------- Stage 1: build the React SPA ----------
# Pinned to BUILDPLATFORM so multi-arch builds compile the SPA natively
# (output is static JS/HTML/CSS — arch-neutral) instead of via qemu, which
# can crash esbuild.
FROM --platform=$BUILDPLATFORM node:20-alpine AS spa-builder

# Pin pnpm to a version that still supports Node 20. Without this, corepack
# auto-fetches the newest pnpm (11+) which requires Node 22's built-in
# `node:sqlite` and crashes on cold install.
RUN corepack enable && corepack prepare pnpm@9.15.0 --activate

WORKDIR /spa

COPY frontend/package.json ./
RUN pnpm install --no-frozen-lockfile

COPY frontend/ ./
# Same-origin API — the Flask server serves the SPA and the /agent/* routes.
ENV VITE_API_BASE=""
RUN pnpm build


# ---------- Stage 2: Python runtime ----------
FROM python:3.11-slim

WORKDIR /app

RUN apt-get update && apt-get install -y --no-install-recommends \
    curl \
    libgl1 \
    libglib2.0-0 \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY client/ client/
COPY webapp/ webapp/

# Drop the built SPA where core.py (SPA_DIST) expects it.
COPY --from=spa-builder /spa/dist/ webapp/spa/

ENV MAX_STARTUP_WAIT=300 \
    HEALTH_CHECK_INTERVAL=10 \
    SPA_DIST=/app/webapp/spa \
    PYTHONUNBUFFERED=1

EXPOSE 8080

HEALTHCHECK --interval=30s --timeout=5s --start-period=20s --retries=3 \
    CMD curl -fsS http://localhost:8080/agent/status || exit 1

CMD ["python", "webapp/app.py"]
296 changes: 296 additions & 0 deletions edgeai/ondevice-eval-agent/README.md
@@ -0,0 +1,296 @@
# ZEDEDA On-Device AI Agent - Client Container

Flask-based web application for ML model inference with an AI-powered assistant for model exploration and integration guidance.

## Features

- Web interface for image upload and classification
- Multi-model support with dynamic discovery
- Real-time processing logs
- API endpoints for programmatic access
- Customizable preprocessing
- **AI Agent for model exploration and integration guidance**

## AI Agent (Agentic Demo POC)

The business logic includes an intelligent AI assistant that helps developers understand and integrate with deployed ML models.

### Agent Capabilities

| Capability | Description |
|------------|-------------|
| **Model Discovery** | Identifies available models on Triton/OpenVINO inference servers |
| **Input Requirements** | Explains image formats, preprocessing, and camera feed recommendations |
| **Output Interpretation** | Describes model outputs (bounding boxes, labels, masks) and post-processing |
| **Integration Guidance** | Provides code examples for JavaScript, Python, React, and cURL |

### Example Questions

- "What model is currently running on the server?"
- "How should I structure the frontend/client logic to call this model?"
- "What images or camera feed characteristics will this model respond to reliably?"
- "How do I interpret the bounding box outputs from this detection model?"
- "Show me how to preprocess images for this model"

### Agent API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/agent/chat` | POST | Send a message to the AI agent |
| `/agent/status` | GET | Check if agent is enabled |

#### Chat Request Example

```bash
curl -X POST http://localhost:8080/agent/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What model is running?", "session_id": "my-session"}'
```

#### Response Format

```json
{
  "success": true,
  "response": "Agent's response text...",
  "session_id": "my-session",
  "enabled": true,
  "tool_calls": [...],
  "tokens": {"input": 150, "output": 200}
}
```
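
As a rough sketch, a Python client could build the same request with only the standard library. The helper names `build_chat_request`, `extract_agent_text`, and `send_chat` are illustrative and not part of this repository; the field names follow the request and response examples above, and the base URL is assumed to be the container's `:8080` port.

```python
import json
import urllib.request


def build_chat_request(base_url: str, message: str, session_id: str) -> urllib.request.Request:
    """Build a POST request for /agent/chat (hypothetical helper)."""
    payload = json.dumps({"message": message, "session_id": session_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/agent/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_agent_text(body: dict) -> str:
    """Return the agent's reply, or raise if the call did not succeed."""
    if not body.get("success"):
        raise RuntimeError(f"agent call failed: {body}")
    return body["response"]


def send_chat(base_url: str, message: str, session_id: str, timeout: float = 30.0) -> str:
    """Send the message and return the reply text (requires a running server)."""
    request = build_chat_request(base_url, message, session_id)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_agent_text(json.load(resp))
```

With the container running, `send_chat("http://localhost:8080", "What model is running?", "my-session")` would return the `response` field as a string.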

### Agent Tools

The agent has access to these tools for real-time model exploration:

| Tool | Purpose |
|------|---------|
| `list_available_models` | Discover all models on the inference server |
| `get_model_metadata` | Get detailed model specifications |
| `get_model_input_requirements` | Get preprocessing and input format guidance |
| `get_model_output_interpretation` | Understand model outputs and post-processing |
| `analyze_model_type` | Infer model type from tensor shapes |
| `get_server_status` | Check inference server health |
| `get_api_examples` | Get cURL commands for API testing |
| `get_frontend_integration_guide` | Get full integration code examples |

### Enabling the Agent

The agent supports multiple LLM backends. Configure one by setting its environment variables:

#### Option 1: Anthropic Claude (Recommended)

Best for reliable tool calling and high-quality responses.

```bash
export ANTHROPIC_API_KEY=sk-ant-your-key-here
export ANTHROPIC_MODEL=claude-sonnet-4-20250514 # optional
```

#### Option 2: OpenAI

Use GPT-4o or other OpenAI models.

```bash
export OPENAI_API_KEY=sk-your-key-here
export OPENAI_MODEL=gpt-4o # optional, defaults to gpt-4o
```

#### Option 3: Google Gemini

Use Gemini 1.5 Pro or other Google models.

```bash
export GOOGLE_API_KEY=your-key-here
export GOOGLE_MODEL=gemini-1.5-pro # optional
```

#### Option 4: Local LLM Server (OpenAI-Compatible)

Use Ollama, LM Studio, vLLM, or any OpenAI-compatible API.

```bash
export LLM_SERVER_URL=http://your-llm-server:1234
export LLM_MODEL_NAME=your-model-name # optional
export LLM_API_KEY=your-api-key # optional
```

**Server-specific examples:**
```bash
# Ollama
export LLM_SERVER_URL=http://localhost:11434
export LLM_MODEL_NAME=llama3.1

# LM Studio
export LLM_SERVER_URL=http://localhost:1234

# vLLM
export LLM_SERVER_URL=http://localhost:8000
```

> **Priority:** If multiple backends are configured, they are used in this order:
> Anthropic → OpenAI → Google → Local LLM Server
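
The priority rule can be pictured with a small sketch. It assumes that "used in this order" means the first configured backend wins, and that a backend counts as configured when its key variable (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `GOOGLE_API_KEY`, or `LLM_SERVER_URL`) is non-empty. The function `select_backend` and the backend labels are hypothetical; the router's real logic may differ.

```python
import os
from typing import Mapping, Optional

# (backend label, environment variable that enables it), in documented priority order.
BACKEND_PRIORITY = (
    ("anthropic", "ANTHROPIC_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
    ("google", "GOOGLE_API_KEY"),
    ("local", "LLM_SERVER_URL"),
)


def select_backend(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the label of the first configured backend, or None if none is set."""
    for label, variable in BACKEND_PRIORITY:
        if env.get(variable, "").strip():
            return label
    return None
```

Under these assumptions, setting both `OPENAI_API_KEY` and `LLM_SERVER_URL` would select the OpenAI backend.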

## Configuration

Environment variables:
- `MODEL_SERVER_URL`: URL of the inference server (Triton or OpenVINO)
- `ANTHROPIC_API_KEY`: Anthropic API key (for Claude backend)
- `ANTHROPIC_MODEL`: Claude model to use (default: `claude-sonnet-4-20250514`)
- `OPENAI_API_KEY`: OpenAI API key (for GPT backend)
- `OPENAI_MODEL`: OpenAI model to use (default: `gpt-4o`)
- `GOOGLE_API_KEY`: Google API key (for Gemini backend)
- `GOOGLE_MODEL`: Google model to use (default: `gemini-1.5-pro`)
- `LLM_SERVER_URL`: URL of OpenAI-compatible LLM server
- `LLM_MODEL_NAME`: Model name for OpenAI-compatible server (default: `local-model`)
- `LLM_API_KEY`: API key for OpenAI-compatible server (default: `not-needed`)
- `APP_TITLE`: Application title
- `APP_DESCRIPTION`: Application description
- `LOGO_URL`: URL for logo image
- `PRIMARY_COLOR`: Primary theme color (CSS)
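
To illustrate how the documented defaults apply, here is a minimal sketch that resolves the optional variables from an environment mapping. The default values are the ones listed above; the function `resolve_settings` is hypothetical and is not the application's actual configuration loader.

```python
import os
from typing import Dict, Mapping

# Defaults as documented in the list above.
DOCUMENTED_DEFAULTS = {
    "ANTHROPIC_MODEL": "claude-sonnet-4-20250514",
    "OPENAI_MODEL": "gpt-4o",
    "GOOGLE_MODEL": "gemini-1.5-pro",
    "LLM_MODEL_NAME": "local-model",
    "LLM_API_KEY": "not-needed",
}


def resolve_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return each optional setting, falling back to its documented default."""
    return {name: env.get(name) or default for name, default in DOCUMENTED_DEFAULTS.items()}
```

An unset or empty variable falls back to its default, so only the values that differ need to be exported.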

## API Endpoints

- `GET /` - Web interface
- `GET /health` - Health check
- `GET /models` - List available models
- `POST /predict` - Run inference
- `GET /models/<name>/metadata` - Get model metadata
- `POST /agent/chat` - AI agent chat
- `GET /agent/status` - Agent status

## Customization

### Class Names

Edit `class_names.json` with your model's class labels:

```json
[
  "cat",
  "dog",
  "bird",
  ...
]
```
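
As a sketch of how such a file might be consumed, the snippet below loads the list and maps a predicted class index to its label. It assumes the file is a flat JSON array of strings whose position matches the model's class index; `load_class_names` and `label_for` are illustrative names, not functions from this repository.

```python
import json
from pathlib import Path
from typing import List


def load_class_names(path: str = "class_names.json") -> List[str]:
    """Load class labels from a JSON array of strings."""
    names = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(names, list) or not all(isinstance(n, str) for n in names):
        raise ValueError(f"{path} must contain a JSON array of strings")
    return names


def label_for(index: int, class_names: List[str]) -> str:
    """Map a class index to its label, with a generic fallback for unknown indices."""
    if 0 <= index < len(class_names):
        return class_names[index]
    return f"class_{index}"
```

The fallback label keeps predictions readable when the file has fewer entries than the model has classes.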

### Preprocessing

Modify `client.py` to adjust preprocessing for your model:

```python
def preprocess_image(self, image_path, ...):
    # Customize resize, normalization, etc.
```

## Local Development

```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Option 1: Use Anthropic Claude
export ANTHROPIC_API_KEY=sk-ant-your-key-here

# Option 2: Use local LLM server
export LLM_SERVER_URL=http://localhost:11434 # e.g., Ollama
export LLM_MODEL_NAME=llama3.1

python webapp/app.py
```

## Architecture

```
business-logic/
├── client.py              # Model server client (Triton/OpenVINO)
├── requirements.txt       # Python dependencies
├── Dockerfile             # Container build
└── webapp/
    ├── app.py             # Flask application
    ├── static/            # CSS, JS assets
    ├── templates/         # HTML templates
    ├── agent/             # Agent package (backward compatibility)
    │   ├── tools.py       # Re-exports from mcp package
    │   └── prompts.py     # LLM chat processing
    ├── router/            # LLM Router package
    │   ├── config.py      # Provider configuration
    │   ├── llm_router.py  # Multi-provider routing
    │   └── adapters/      # Provider-specific adapters
    ├── inference/         # Inference client wrapper
    └── mcp/               # MCP (Model Context Protocol) tools
        ├── base.py        # Base utilities (ToolResult, ok, error_response)
        ├── session.py     # Session storage management
        ├── registry.py    # Tool registration and execution
        └── tools/         # Individual tool modules
            ├── list_models.py
            ├── model_metadata.py
            ├── model_inputs.py
            ├── model_outputs.py
            ├── model_type.py
            ├── server_status.py
            ├── api_examples.py
            ├── integration_guide.py
            └── recommendations.py
```

## MCP Package

The MCP (Model Context Protocol) package provides a modular tool framework for AI agent interactions with ML inference servers. Each tool is in its own file for easy maintenance and extension.

### Usage

```python
# Direct imports from mcp package
from mcp import execute_tool, TOOL_SCHEMAS, TOOL_FUNCTIONS
from mcp.tools import list_available_models, get_model_metadata

# Or use backward-compatible imports
from agent.tools import TOOL_SCHEMAS, execute_tool
```

### Adding New Tools

To add a new tool, create a file in `webapp/mcp/tools/`:

```python
# mcp/tools/my_new_tool.py
from typing import Any, Dict

from ..base import ok, error_response, get_client
from ..registry import register_tool


def my_new_tool(param: str) -> Dict[str, Any]:
    """Tool implementation."""
    try:
        client = get_client()
        # Your tool logic here
        return ok(result="success", data={...})
    except Exception as e:
        return error_response(e, operation="my_new_tool")


# Auto-register the tool at import time
register_tool(
    name="my_new_tool",
    func=my_new_tool,
    description="Description for AI agent to understand when to use this tool",
    input_schema={
        "type": "object",
        "properties": {
            "param": {
                "type": "string",
                "description": "Parameter description"
            }
        },
        "required": ["param"]
    }
)
```

Then add the import to `mcp/tools/__init__.py`:

```python
from .my_new_tool import my_new_tool
```

The tool will be automatically available to the AI agent.
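
The registration pattern can be illustrated with a self-contained sketch. This is not the code in `registry.py`, which is not shown here. It assumes the registry keeps two module-level dictionaries keyed by tool name and that `execute_tool` takes a tool name plus a dictionary of arguments. The names `register_tool`, `execute_tool`, `TOOL_SCHEMAS`, and `TOOL_FUNCTIONS` come from the usage above, but their signatures, the schema layout, and the error dictionaries here are assumptions.

```python
from typing import Any, Callable, Dict

# Assumed storage layout: one dict for callables, one for schemas shown to the LLM.
TOOL_FUNCTIONS: Dict[str, Callable[..., Dict[str, Any]]] = {}
TOOL_SCHEMAS: Dict[str, Dict[str, Any]] = {}


def register_tool(name: str, func: Callable[..., Dict[str, Any]],
                  description: str, input_schema: Dict[str, Any]) -> None:
    """Record the callable and its schema under the tool name."""
    TOOL_FUNCTIONS[name] = func
    TOOL_SCHEMAS[name] = {
        "name": name,
        "description": description,
        "input_schema": input_schema,
    }


def execute_tool(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Look up a tool by name, check required arguments, and call it."""
    func = TOOL_FUNCTIONS.get(name)
    if func is None:
        return {"success": False, "error": f"unknown tool: {name}"}
    required = TOOL_SCHEMAS[name]["input_schema"].get("required", [])
    missing = [key for key in required if key not in arguments]
    if missing:
        return {"success": False, "error": f"missing arguments: {missing}"}
    return func(**arguments)
```

Because registration runs when the module is imported, a design like this would explain why the import in `mcp/tools/__init__.py` is enough to make a tool available.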