aleimu/qcli

qcli — AI CLI

A minimal, extensible AI coding assistant for the terminal. Built in Go with a clean ReAct / Agent Loop, pluggable LLM backends, and a sandboxed skill system. Inspired by Claude Code and the OpenAI Assistants function-calling protocol.

Status: v0.7 — Tool calling, ESC cancel, skill sandbox, Plan C hybrid streaming, non-interactive -chat mode, and a 27-case test suite are stable. See doc/PROGRESS.md for the full history.


Highlights

  • Agent Loop — full ReAct: User → LLM → tool_use → Skill → result → LLM → … until the model returns text.
  • Pluggable LLM adapters — OpenAI-compatible HTTP (/v1/chat/completions) and a Mock for tests. Anthropic is a stub.
  • Streaming + NonStreaming paths; the agent loop uses NonStreaming to avoid SSE race conditions when arguments are still in flight.
  • Sandboxed skills: file (read / write / edit) and shell honor a path policy loaded from config.yaml (built-in default denylist for SSH keys, AWS creds, Windows system dirs, /etc, …).
  • Diff-style file edit: the file skill's edit operation sends only search_text + new_text instead of the whole file, saving tokens on large-file edits.
  • ESC truly cancels the in-flight request — the goroutine is told to stop and the user is returned to the prompt within 2 s.
  • Config layering: config.yaml → environment variables → CLI flags, with each later layer overriding the one before it. Predictable, debuggable.
  • Prompts loaded from .md files — change qcli's behaviour (output style, workflow, project conventions) without recompiling; see prompts/README.md.
  • 79 unit tests, including a wire-format regression test that pins the nested tool_calls[].function.* shape required by OpenAI / MiniMax.
  • Non-interactive -chat mode — run a single turn end-to-end, write to stdout, exit. Useful for scripting and smoke tests. Honors Plan C streaming via -streaming.

Quick Start

cd qcli
go build -o qcli.exe ./cmd/qcli      # build
./qcli.exe --provider=mock           # run with the offline mock LLM
./qcli.exe                           # run with config.yaml defaults

The first run reads config.yaml from the working directory. Edit it to point at your preferred OpenAI-compatible endpoint (MiniMax, DeepSeek, OpenRouter, local llama.cpp, …).

A typical session

> Run a shell command to list files in the current directory. Use the tool.
<model reasons, then calls>
[TOOL] calling skill "shell" with input {"command":"ls -la"}
[TOOL] shell result: total 12 ...
<model produces a summary>
> write a fun fact to tmp/fun_facts.txt
<model calls the file skill>
[TOOL] calling skill "file" with input {"operation":"write","path":"tmp/fun_facts.txt",...}
[TOOL] file result: File written successfully
> exit
Goodbye!

Press ESC to interrupt a long-running turn.

Non-interactive (-chat) mode

For scripts, smoke tests, and one-shot queries, -chat runs a single turn end-to-end, writes the model's reply to stdout, and exits — no TUI, no log noise on stdout. Tool calls are printed as [tool: name] [args: ...] [result: ...].

# Simple question (NonStreaming by default):
./qcli.exe -provider mock -chat "What is 2+2?"

# Force streaming so text appears incrementally:
./qcli.exe -provider mock -chat "Tell me a story" -streaming

# Tool call path — the shell tool runs and the result is fed back:
./qcli.exe -provider mock -chat "Please use a tool"

CLI Flags

| Flag | Default | Description |
| --- | --- | --- |
| --provider | mock (if no provider: in config.yaml) | LLM backend: openai, mock. (Anthropic currently falls back to mock.) |
| --model | from config / OPENAI_MODEL | Model name. E.g. gpt-4o, MiniMax-M2.7-highspeed. |
| --base-url | from config / OPENAI_BASE_URL | API base URL. |
| --api-key | from config / OPENAI_API_KEY | API key (overrides env). |
| --debug | false | Set log level to DEBUG. |
| --log-file | from config / app.log | Where to write the structured log. |
| --config | config.yaml | Path to the config file. |
| --chat "<msg>" | (TUI mode) | Non-interactive: run a single turn with <msg> and exit. Skips the TUI. Supports streaming. Useful for scripting and smoke tests. |
| --streaming | false | In -chat mode, force-enable Plan C streaming (overrides llm.streaming in config). |

--debug and the config's log_level: debug are equivalent.

Environment variables

| Variable | Effect |
| --- | --- |
| OPENAI_API_KEY | OpenAI API key. |
| OPENAI_MODEL | OpenAI model name. |
| OPENAI_BASE_URL | OpenAI base URL (for any OpenAI-compatible service). |
| ANTHROPIC_API_KEY | Reserved for future Anthropic adapter. |
| PROVIDER, MODEL, BASE_URL, LOG_FILE, DEBUG | Mirror the corresponding flags. |

Config priority

CLI flag > environment variable > config.yaml (highest first).
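
The priority rule can be sketched as a "first non-empty value wins" lookup. This is a minimal illustration, not qcli's actual loader: the resolve function and its parameter names are hypothetical, and treating an empty flag as "not set" is a simplifying assumption (a real loader might track explicitly set flags with flag.Visit instead).

```go
package main

import (
	"fmt"
	"os"
)

// resolve returns the first non-empty value. Arguments are passed from
// highest priority to lowest: CLI flag, environment variable, config.yaml.
// (Hypothetical helper for illustration; not part of qcli.)
func resolve(flagValue, envValue, configValue string) string {
	for _, v := range []string{flagValue, envValue, configValue} {
		if v != "" {
			return v
		}
	}
	return ""
}

func main() {
	// Pretend --model was not passed and config.yaml says "gpt-4o".
	flagModel := ""
	configModel := "gpt-4o"

	// OPENAI_MODEL, if set in the environment, overrides the config value.
	model := resolve(flagModel, os.Getenv("OPENAI_MODEL"), configModel)
	fmt.Println("effective model:", model)
}
```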


config.yaml

# LLM provider
provider: "openai"
debug: true
log_file: "app.log"
log_level: debug
log_append: true            # true=append, false=truncate on startup

# OpenAI-compatible backend
openai:
  api_key: "sk-..."         # or set OPENAI_API_KEY env
  model: "gpt-4o"
  base_url: ""              # leave empty for api.openai.com

# Anthropic (stub — falls back to mock until the real adapter lands)
anthropic:
  api_key: ""
  model: "claude-3-5-sonnet-20241022"

# Skill sandbox — gates file writes and locks shell cwd.
# If this block is absent, the built-in default denylist is used.
# Set deny_write: [] to disable the sandbox entirely.
#
# sandbox:
#   base_dir: ""            # empty = process cwd
#   deny_write: []          # replace defaults with your own list
#   allow_write: []         # patterns that override deny_write

Built-in default denylist

When no sandbox: block is present, the following patterns are denied for writes (reads are unrestricted):

~/.ssh/**        ~/.aws/**       ~/.gnupg/**     ~/.kube/**
~/.docker/config.json   ~/.npmrc    ~/.pypirc       ~/.netrc
C:\Windows\**    C:\Program Files\**    C:\Program Files (x86)\**    C:\ProgramData\**
/etc/**          /var/log/**    /boot/**        /private/etc/**

Patterns support ~ (home directory) and ** (any-depth match). The allow_write list can punch holes back open.
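
To make the pattern semantics concrete, here is a small sketch of how ~ expansion, ** matching, and allow-over-deny precedence could fit together. It is not the code in internal/skill/policy: the function names are hypothetical, the home directory is passed in explicitly, and details such as case-insensitive matching on Windows or symlink resolution are deliberately left out.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// expandHome replaces a leading "~" with the given home directory.
func expandHome(pattern, home string) string {
	if pattern == "~" || strings.HasPrefix(pattern, "~/") {
		return home + pattern[1:]
	}
	return pattern
}

// matchSegments compares path segments against pattern segments.
// "**" matches zero or more whole segments; any other pattern segment
// is matched against exactly one segment with path.Match.
func matchSegments(pat, segs []string) bool {
	if len(pat) == 0 {
		return len(segs) == 0
	}
	if pat[0] == "**" {
		for i := 0; i <= len(segs); i++ {
			if matchSegments(pat[1:], segs[i:]) {
				return true
			}
		}
		return false
	}
	if len(segs) == 0 {
		return false
	}
	ok, err := path.Match(pat[0], segs[0])
	if err != nil || !ok {
		return false
	}
	return matchSegments(pat[1:], segs[1:])
}

// matches normalizes separators to "/" and applies the segment matcher.
func matches(pattern, target, home string) bool {
	p := strings.ReplaceAll(expandHome(pattern, home), `\`, "/")
	t := strings.ReplaceAll(target, `\`, "/")
	return matchSegments(strings.Split(p, "/"), strings.Split(t, "/"))
}

// canWrite lets allow patterns override deny patterns; a path that
// matches neither list is writable.
func canWrite(target, home string, deny, allow []string) bool {
	for _, a := range allow {
		if matches(a, target, home) {
			return true
		}
	}
	for _, d := range deny {
		if matches(d, target, home) {
			return false
		}
	}
	return true
}

func main() {
	home := "/home/alice" // assumed home directory for the demo
	deny := []string{"~/.ssh/**", "/etc/**"}
	allow := []string{"/etc/myapp/**"}

	for _, p := range []string{"/home/alice/.ssh/id_rsa", "/etc/passwd", "/etc/myapp/app.conf", "/tmp/notes.txt"} {
		fmt.Printf("%-28s writable=%v\n", p, canWrite(p, home, deny, allow))
	}
}
```

Checking allow_write first is one way to let it "punch holes" in the denylist; the real matcher may order or normalize things differently.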

Customizing the prompt

All system-prompt text is loaded from .md files at startup, so you can change qcli's behaviour without recompiling. Set prompts.dir to the directory that holds those files:

prompts:
  dir: "./prompts"

If prompts.dir is empty, qcli uses the built-in default fragments embedded in the binary (v0.5-era terse style). When set, qcli reads system.md, style.md, and workflow.md from the directory; each file becomes one system role message. Missing files fall back to the embedded default silently.

The shipped prompts/ directory is editable — change the 6 style principles, add project-specific workflow steps, etc. Restart qcli to pick up changes. See prompts/README.md for the full lookup order and customization recipe.

Available template variables in .md files: {{.OS}}, {{.ARCH}}, {{.CWD}}, {{.HOME}}, {{.GOVERSION}}, {{.SKILLS}}.
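
The {{.NAME}} syntax matches Go's text/template package, so a prompt fragment can be previewed with a few lines of Go. This sketch assumes text/template is the engine and that every variable is a plain string (for example, SKILLS as a comma-separated list); the promptVars struct and renderPrompt function are hypothetical names, not qcli identifiers.

```go
package main

import (
	"os"
	"runtime"
	"strings"
	"text/template"
)

// promptVars carries the documented variables. Field types are an
// assumption; qcli may represent SKILLS differently.
type promptVars struct {
	OS, ARCH, CWD, HOME, GOVERSION, SKILLS string
}

// renderPrompt expands template variables in one .md fragment.
func renderPrompt(src string, vars promptVars) (string, error) {
	tmpl, err := template.New("prompt").Parse(src)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, vars); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	cwd, _ := os.Getwd()
	home, _ := os.UserHomeDir()

	fragment := "You are running on {{.OS}}/{{.ARCH}} in {{.CWD}}.\nAvailable skills: {{.SKILLS}}.\n"
	text, err := renderPrompt(fragment, promptVars{
		OS:        runtime.GOOS,
		ARCH:      runtime.GOARCH,
		CWD:       cwd,
		HOME:      home,
		GOVERSION: runtime.Version(),
		SKILLS:    "shell, file",
	})
	if err != nil {
		panic(err)
	}
	os.Stdout.WriteString(text)
}
```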


Project Structure

qcli/
├── cmd/
│   ├── qcli/main.go              # entry point, flag parsing, DI
│   └── dump-req/main.go          # wire-format diagnostic tool
├── internal/
│   ├── agent/loop.go             # core ReAct loop (+ loop_test.go)
│   ├── config/config.go          # YAML + env loader
│   ├── llm/
│   │   ├── adapter.go            # Adapter interface + types
│   │   ├── openai.go             # OpenAI HTTP / SSE adapter
│   │   ├── mock.go               # offline mock
│   │   ├── openai_test.go        # 5 cases incl. wire-format regression
│   │   └── mock_test.go          # 4 cases
│   ├── skill/
│   │   ├── skill.go              # Skill interface
│   │   ├── registry.go           # global registry (+ Unregister for tests)
│   │   ├── shell/shell.go        # shell command execution
│   │   ├── file/file.go          # file read / write / edit
│   │   └── policy/
│   │       ├── policy.go         # PathPolicy + glob matcher
│   │       └── policy_test.go    # 8 cases
│   ├── logging/logger.go         # structured logger, file-backed
│   └── ui/
│       ├── tui.go                # console TUI + per-turn ctx cancel
│       ├── esc_windows.go        # GetAsyncKeyState polling
│       └── esc_unix.go           # no-op stub
├── config.yaml
├── PROGRESS.md                   # detailed dev history
└── README.md                     # you are here

Common Commands

Build

go build -o qcli.exe ./cmd/qcli      # main binary
go build -o dump-req.exe ./cmd/dump-req  # wire-format diagnostic
go build ./...                       # all packages

Test

go test ./...                        # all tests
go test -v ./internal/agent/...      # one package, verbose
go test -run TestOpenAI_NonStreaming_NestedToolCall_WireFormat ./internal/llm/...

The wire-format test is the one to re-run if you change anything in internal/llm/types.go or the adapter's serialization path. It's the regression test for the bug that previously caused MiniMax to return HTTP 500 on every second turn.

Vet

go vet ./...

Run

./qcli.exe                            # use config.yaml
./qcli.exe --provider=mock            # offline test
./qcli.exe --debug                    # enable DEBUG logging to app.log
./qcli.exe --config=/path/to/other.yaml
echo "list files" | ./qcli.exe        # one-shot via stdin

Run the diagnostic

cmd/dump-req is a standalone tool that bypasses the TUI and Agent Loop, hard-codes a 2-turn tool_call sequence, and dumps the full request and response bodies to stdout. Use it whenever an OpenAI-compatible API misbehaves — it's the fastest way to localize wire-format issues.

go build -o dump-req.exe ./cmd/dump-req
./dump-req.exe

Development

Adding a new skill

  1. Create internal/skill/<name>/<name>.go.
  2. Implement the Skill interface from internal/skill/skill.go:
    type Skill interface {
        Name() string
        Description() string
        Execute(ctx context.Context, input string) (string, error)
        ToolSchema() llm.ToolDefinition
    }
  3. Register it in cmd/qcli/main.go after the existing shell.New() and file.New() calls:
    skill.Register(myskill.New())
  4. (Optional) Add policy-aware enforcement if your skill touches the filesystem: call policy.Global().CanWrite(absPath) from internal/skill/policy before writing.

Adding a new LLM adapter

  1. Create a struct in internal/llm/ implementing the Adapter interface:
    type Adapter interface {
        Stream(ctx context.Context, messages []Message, tools []ToolDefinition) (<-chan Chunk, context.CancelFunc, error)
        NonStreaming(ctx context.Context, messages []Message, tools []ToolDefinition) ([]ToolCall, string, error)
        Name() string
    }
  2. The Message, ToolCall, Chunk, and ToolDefinition types in internal/llm/types.go are provider-agnostic; serialize them to your backend's wire format.
  3. Wire the new adapter into cmd/qcli/main.go (in the switch effectiveProvider block).

Code style

  • No external HTTP clients beyond the standard library + gopkg.in/yaml.v3 for config parsing. SSE is parsed by hand using bufio.Scanner.
  • Skills must not panic on bad input — return an error string and let the LLM decide what to do next.
  • exec.CommandContext (not exec.Command) so ESC and context cancel propagate to child processes.

Debugging tips

  • Set log_level: debug and log_file: app.log in config.yaml (or pass --debug). Every HTTP request body and response is logged.
  • The first 800 bytes of each request body are dumped to the log. For multi-turn tool calls the body can exceed this; use ./dump-req.exe for full-body inspection.
  • If the model emits valid tool calls but the next turn fails, the TestOpenAI_NonStreaming_NestedToolCall_WireFormat test should still pass. If it doesn't, you've regressed the wire format.

Packaging

Cross-platform

The standard Go toolchain builds for any target:

GOOS=linux   GOARCH=amd64 go build -o dist/qcli-linux-amd64    ./cmd/qcli
GOOS=darwin  GOARCH=arm64 go build -o dist/qcli-darwin-arm64   ./cmd/qcli
GOOS=windows GOARCH=amd64 go build -o dist/qcli-windows-amd64.exe ./cmd/qcli

There are no cgo dependencies and no system calls beyond exec and user32.dll (Windows-only, used for ESC key polling). The binary is a single static executable.

Embedding the config

config.yaml is read at runtime, not embedded. To distribute qcli, ship config.yaml next to the executable; a true single-file binary would need go:embed, which is not currently used. For development the cwd-relative path is the most convenient.

Vendoring (optional)

go mod vendor        # populate ./vendor (already done in this repo)
go build -mod=vendor ./cmd/qcli

This repo's vendor/ directory is checked in to make building in air-gapped sandboxes possible.

Versioning

git describe --tags is the canonical source. There's no VERSION file or build-time ldflags injection yet. If you ship binaries, tag with git tag v0.5 and git describe will yield something like v0.5-3-g82b933e.


Logging

Each turn produces timestamped, source-tagged output. With log_level: debug, the file shows the full request / response exchange:

[2026-06-11 18:00:00.000] [INFO ] [USER] list files
[2026-06-11 18:00:00.001] [DEBUG] [AGENT] messages to LLM (1), tools (2)
[2026-06-11 18:00:00.001] [DEBUG] [OPENAI] request body: {"messages":[...]}
[2026-06-11 18:00:01.234] [INFO ] [LLM ] ...model output...
[2026-06-11 18:00:01.235] [INFO ] [TOOL] calling skill "shell" with input {"command":"ls"}
[2026-06-11 18:00:01.500] [RESULT] [TOOL] shell result: ...
[2026-06-11 18:00:02.000] [INFO ] [LLM ] final text answer

The user's StreamWrite output (the model's text) is interleaved on stdout without timestamps so the terminal stays clean.


Architecture Notes

Why NonStreaming for tool calls?

Stream returns chunks as soon as they arrive. When the LLM emits a tool-call finish reason (finish_reason: "tool_calls" in the OpenAI format), the model might still be streaming function.arguments in the next delta, so cancelling on finish_reason would truncate the arguments. The current implementation uses NonStreaming for tool calls and a separate (currently unused) Stream path for future use. See internal/llm/openai.go and the TestOpenAI_Stream_SSEBasicText test for the SSE parser.

Why a global skill registry?

A CLI is a single process with a single config. There's no use case for per-request skill injection. The global registry is a deliberate simplicity choice. If you need plugin loading, see the Unregister method added in v0.5 for test isolation — it would also support dynamic (un)load.
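
A global registry of this kind is typically a mutex-guarded map. The sketch below shows one plausible shape, including an Unregister for test isolation. The function names follow the README where it mentions them (Register, Unregister), but the signatures, the Lookup and Names helpers, and the trimmed-down Skill interface are assumptions rather than the contents of internal/skill/registry.go.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Skill is reduced to the one method the registry needs here.
type Skill interface{ Name() string }

var (
	mu     sync.RWMutex
	skills = make(map[string]Skill)
)

// Register adds or replaces a skill under its own name.
func Register(s Skill) {
	mu.Lock()
	defer mu.Unlock()
	skills[s.Name()] = s
}

// Unregister removes a skill; removing an unknown name is a no-op.
func Unregister(name string) {
	mu.Lock()
	defer mu.Unlock()
	delete(skills, name)
}

// Lookup returns the skill registered under name, if any.
func Lookup(name string) (Skill, bool) {
	mu.RLock()
	defer mu.RUnlock()
	s, ok := skills[name]
	return s, ok
}

// Names lists registered skill names in sorted order.
func Names() []string {
	mu.RLock()
	defer mu.RUnlock()
	names := make([]string, 0, len(skills))
	for n := range skills {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// named is a trivial Skill used only for the demo.
type named string

func (n named) Name() string { return string(n) }

func main() {
	Register(named("shell"))
	Register(named("file"))
	fmt.Println(Names())

	Unregister("file") // e.g. cleanup at the end of a test
	fmt.Println(Names())
}
```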

Why a policy.Global() singleton?

Same reason. The CLI loads its sandbox policy once at startup. A singleton is the simplest way to make the policy available to skills without a dependency-injection ceremony.


Testing

The table below lists 22 unit tests, all fast (no real network calls):

| Package | File | Cases |
| --- | --- | --- |
| internal/agent | loop_test.go | 5 (multi-turn ReAct, cancel mid-flight, tool error, unknown skill, stripThinkTags) |
| internal/llm | openai_test.go | 5 (happy text, tool call, HTTP 500, wire format, SSE basic) |
| internal/llm | mock_test.go | 4 (no-tool, with-tool, after-tool, stream cancel) |
| internal/skill/policy | policy_test.go | 8 (SSH denial, Windows system, /tmp allowed, allow overrides, tilde, abs path, doublestar, SetGlobal nil) |

The wire-format test (TestOpenAI_NonStreaming_NestedToolCall_WireFormat) spins up an httptest.Server, sends a synthetic 2-turn message sequence, and asserts that tool_calls[0].function.name (not tool_calls[0].name) exists in the outgoing body. This is the test that would have caught the MiniMax HTTP 500 bug that took half a day to diagnose manually.


License

MIT.
