qcli — AI CLI

A minimal, extensible AI coding assistant for the terminal. Built in Go with a clean ReAct / Agent Loop, pluggable LLM backends, and a sandboxed skill system. Inspired by Claude Code and the OpenAI Assistants function-calling protocol.

Status: v0.7 — Tool calling, ESC cancel, skill sandbox, Plan C hybrid streaming, non-interactive -chat mode, and a 27-case test suite are stable. See doc/PROGRESS.md for the full history.

Highlights

Agent Loop — full ReAct: User → LLM → tool_use → Skill → result → LLM → … until the model returns text.
Pluggable LLM adapters — OpenAI-compatible HTTP (/v1/chat/completions) and a Mock for tests. Anthropic is a stub.
Streaming + NonStreaming paths; the agent loop uses NonStreaming to avoid SSE race conditions when arguments are still in flight.
Sandboxed skills — file (read / write / edit) and shell honor a path policy loaded from config.yaml (built-in default denylist for SSH keys, AWS creds, Windows system dirs, /etc, …).
Diff-style file edit — file skill's edit operation sends only search_text + new_text instead of the whole file, saving tokens on large-file edits.
ESC truly cancels the in-flight request — the goroutine is told to stop and the user is returned to the prompt within 2 s.
Config layering — config.yaml → environment variables → CLI flags. Predictable, debuggable.
Prompts loaded from .md files — change qcli's behaviour (output style, workflow, project conventions) without recompiling; see prompts/README.md.
79 unit tests, including a wire-format regression test that pins the nested tool_calls[].function.* shape required by OpenAI / MiniMax.
Non-interactive -chat mode — run a single turn end-to-end, write to stdout, exit. Useful for scripting and smoke tests. Honors Plan C streaming via -streaming.

Quick Start

cd qcli
go build -o qcli.exe ./cmd/qcli      # build
./qcli.exe --provider=mock           # run with the offline mock LLM
./qcli.exe                           # run with config.yaml defaults

The first run reads config.yaml from the working directory. Edit it to point at your preferred OpenAI-compatible endpoint (MiniMax, DeepSeek, OpenRouter, local llama.cpp, …).

A typical session

> Run a shell command to list files in the current directory. Use the tool.
<model reasons, then calls>
[TOOL] calling skill "shell" with input {"command":"ls -la"}
[TOOL] shell result: total 12 ...
<model produces a summary>
> write a fun fact to tmp/fun_facts.txt
<model calls the file skill>
[TOOL] calling skill "file" with input {"operation":"write","path":"tmp/fun_facts.txt",...}
[TOOL] file result: File written successfully
> exit
Goodbye!

Press ESC to interrupt a long-running turn.

Non-interactive (`-chat`) mode

For scripts, smoke tests, and one-shot queries, -chat runs a single turn end-to-end, writes the model's reply to stdout, and exits — no TUI, no log noise on stdout. Tool calls are printed as [tool: name] [args: ...] [result: ...].

# Simple question (NonStreaming by default):
./qcli.exe -provider mock -chat "What is 2+2?"

# Force streaming so text appears incrementally:
./qcli.exe -provider mock -chat "Tell me a story" -streaming

# Tool call path — the shell tool runs and the result is fed back:
./qcli.exe -provider mock -chat "Please use a tool"

CLI Flags

Flag	Default	Description
`--provider`	`mock` (if no `provider:` in `config.yaml`)	LLM backend: `openai`, `mock`. (Anthropic currently falls back to mock.)
`--model`	from config / `OPENAI_MODEL`	Model name. E.g. `gpt-4o`, `MiniMax-M2.7-highspeed`.
`--base-url`	from config / `OPENAI_BASE_URL`	API base URL.
`--api-key`	from config / `OPENAI_API_KEY`	API key (overrides env).
`--debug`	`false`	Set log level to DEBUG.
`--log-file`	from config / `app.log`	Where to write the structured log.
`--config`	`config.yaml`	Path to the config file.
`--chat "<msg>"`	(TUI mode)	Non-interactive: run a single turn with `<msg>` and exit. Skips the TUI. Supports streaming. Useful for scripting and smoke tests.
`--streaming`	`false`	In `-chat` mode, force-enable Plan C streaming (overrides `llm.streaming` in config).

--debug and the config's log_level: debug are equivalent.

Environment variables

Variable	Effect
`OPENAI_API_KEY`	OpenAI API key.
`OPENAI_MODEL`	OpenAI model name.
`OPENAI_BASE_URL`	OpenAI base URL (for any OpenAI-compatible service).
`ANTHROPIC_API_KEY`	Reserved for future Anthropic adapter.
`PROVIDER`, `MODEL`, `BASE_URL`, `LOG_FILE`, `DEBUG`	Mirror the corresponding flags.

Config priority

CLI flag > environment variable > config.yaml (highest first).

`config.yaml`

# LLM provider
provider: "openai"
debug: true
log_file: "app.log"
log_level: debug
log_append: true            # true=append, false=truncate on startup

# OpenAI-compatible backend
openai:
  api_key: "sk-..."         # or set OPENAI_API_KEY env
  model: "gpt-4o"
  base_url: ""              # leave empty for api.openai.com

# Anthropic (stub — falls back to mock until the real adapter lands)
anthropic:
  api_key: ""
  model: "claude-3-5-sonnet-20241022"

# Skill sandbox — gates file writes and locks shell cwd.
# If this block is absent, the built-in default denylist is used.
# Set deny_write: [] to disable the sandbox entirely.
#
# sandbox:
#   base_dir: ""            # empty = process cwd
#   deny_write: []          # replace defaults with your own list
#   allow_write: []         # patterns that override deny_write

Built-in default denylist

When no sandbox: block is present, the following patterns are denied for writes (reads are unrestricted):

~/.ssh/**        ~/.aws/**       ~/.gnupg/**     ~/.kube/**
~/.docker/config.json   ~/.npmrc    ~/.pypirc       ~/.netrc
C:\Windows\**    C:\Program Files\**    C:\Program Files (x86)\**    C:\ProgramData\**
/etc/**          /var/log/**    /boot/**        /private/etc/**

Patterns support ~ (home directory) and ** (any-depth match). The allow_write list can punch holes back open.

Customizing the prompt

All system-prompt text is loaded from .md files at startup, so you can change qcli's behaviour without recompiling. The configuration is a directory pointed at by prompts.dir:

prompts:
  dir: "./prompts"

If prompts.dir is empty, qcli uses the built-in default fragments embedded in the binary (v0.5-era terse style). When set, qcli reads system.md, style.md, and workflow.md from the directory; each file becomes one system role message. Missing files fall back to the embedded default silently.

The shipped prompts/ directory is editable — change the 6 style principles, add project-specific workflow steps, etc. Restart qcli to pick up changes. See prompts/README.md for the full lookup order and customization recipe.

Available template variables in .md files: {{.OS}}, {{.ARCH}}, {{.CWD}}, {{.HOME}}, {{.GOVERSION}}, {{.SKILLS}}.

Project Structure

qcli/
├── cmd/
│   ├── qcli/main.go              # entry point, flag parsing, DI
│   └── dump-req/main.go          # wire-format diagnostic tool
├── internal/
│   ├── agent/loop.go             # core ReAct loop (+ loop_test.go)
│   ├── config/config.go          # YAML + env loader
│   ├── llm/
│   │   ├── adapter.go            # Adapter interface + types
│   │   ├── openai.go             # OpenAI HTTP / SSE adapter
│   │   ├── mock.go               # offline mock
│   │   ├── openai_test.go        # 5 cases incl. wire-format regression
│   │   └── mock_test.go          # 4 cases
│   ├── skill/
│   │   ├── skill.go              # Skill interface
│   │   ├── registry.go           # global registry (+ Unregister for tests)
│   │   ├── shell/shell.go        # shell command execution
│   │   ├── file/file.go          # file read / write
│   │   └── policy/
│   │       ├── policy.go         # PathPolicy + glob matcher
│   │       └── policy_test.go    # 8 cases
│   ├── logging/logger.go         # structured logger, file-backed
│   └── ui/
│       ├── tui.go                # console TUI + per-turn ctx cancel
│       ├── esc_windows.go        # GetAsyncKeyState polling
│       └── esc_unix.go           # no-op stub
├── config.yaml
├── PROGRESS.md                   # detailed dev history
└── README.md                     # you are here

Common Commands

Build

go build -o qcli.exe ./cmd/qcli      # main binary
go build -o dump-req.exe ./cmd/dump-req  # wire-format diagnostic
go build ./...                       # all packages

Test

go test ./...                        # all tests
go test -v ./internal/agent/...      # one package, verbose
go test -run TestOpenAI_NonStreaming_NestedToolCall_WireFormat ./internal/llm/...

The wire-format test is the one to re-run if you change anything in internal/llm/types.go or the adapter's serialization path. It's the regression test for the bug that previously caused MiniMax to return HTTP 500 on every second turn.

Vet

go vet ./...

Run

./qcli.exe                            # use config.yaml
./qcli.exe --provider=mock            # offline test
./qcli.exe --debug                    # enable DEBUG logging to app.log
./qcli.exe --config=/path/to/other.yaml
echo "list files" | ./qcli.exe        # one-shot via stdin

Run the diagnostic

cmd/dump-req is a standalone tool that bypasses the TUI and Agent Loop, hard-codes a 2-turn tool_call sequence, and dumps the full request and response bodies to stdout. Use it whenever an OpenAI-compatible API misbehaves — it's the fastest way to localize wire-format issues.

go build -o dump-req.exe ./cmd/dump-req
./dump-req.exe

Development

Adding a new skill

Create internal/skill/<name>/<name>.go.

Implement the Skill interface from internal/skill/skill.go:

type Skill interface {
    Name() string
    Description() string
    Execute(ctx context.Context, input string) (string, error)
    ToolSchema() llm.ToolDefinition
}

Register it in cmd/qcli/main.go after the existing shell.New() and file.New() calls:
```
skill.Register(myskill.New())
```
(Optional) Implement internal/skill/policy-aware enforcement if your skill touches the filesystem. Use policy.Global().CanWrite(absPath).

Adding a new LLM adapter

Create a struct in internal/llm/ implementing the Adapter interface:

type Adapter interface {
    Stream(ctx context.Context, messages []Message, tools []ToolDefinition) (<-chan Chunk, context.CancelFunc, error)
    NonStreaming(ctx context.Context, messages []Message, tools []ToolDefinition) ([]ToolCall, string, error)
    Name() string
}

The Message, ToolCall, Chunk, and ToolDefinition types in internal/llm/types.go are provider-agnostic; serialize them to your backend's wire format.
Wire the new adapter into cmd/qcli/main.go (in the switch effectiveProvider block).

Code style

No external HTTP clients beyond the standard library + gopkg.in/yaml.v3 for config parsing. SSE is parsed by hand using bufio.Scanner.
Skills must not panic on bad input — return an error string and let the LLM decide what to do next.
exec.CommandContext (not exec.Command) so ESC and context cancel propagate to child processes.

Debugging tips

Set log_level: debug and log_file: app.log in config.yaml (or pass --debug). Every HTTP request body and response is logged.
The first 800 bytes of each request body are dumped to the log. For multi-turn tool calls the body can exceed this; use ./dump-req.exe for full-body inspection.
If the model emits valid tool calls but the next turn fails, the TestOpenAI_NonStreaming_NestedToolCall_WireFormat test should still pass. If it doesn't, you've regressed the wire format.

Packaging

Cross-platform

The standard Go toolchain builds for any target:

GOOS=linux   GOARCH=amd64 go build -o dist/qcli-linux-amd64    ./cmd/qcli
GOOS=darwin  GOARCH=arm64 go build -o dist/qcli-darwin-arm64   ./cmd/qcli
GOOS=windows GOARCH=amd64 go build -o dist/qcli-windows-amd64.exe ./cmd/qcli

There are no cgo dependencies and no system calls beyond exec and user32.dll (Windows-only, used for ESC key polling). The binary is a single static executable.

Embedding the config

config.yaml is read at runtime, not embedded. To ship a single-file binary, copy config.yaml next to the executable, or use go:embed (not currently used). For development the cwd-relative path is the most convenient.

Vendoring (optional)

go mod vendor        # populate ./vendor (already done in this repo)
go build -mod=vendor ./cmd/qcli

This repo's vendor/ directory is checked in to make building in air-gapped sandboxes possible.

Versioning

git describe --tags is the canonical source. There's no VERSION file or build-time ldflags injection yet. If you ship binaries, tag with git tag v0.5 and git describe will yield something like v0.5-3-g82b933e.

Logging

Each turn produces timestamped, source-tagged output. With log_level: debug, the file shows the full request / response exchange:

[2026-06-11 18:00:00.000] [INFO ] [USER] list files
[2026-06-11 18:00:00.001] [DEBUG] [AGENT] messages to LLM (1), tools (2)
[2026-06-11 18:00:00.001] [DEBUG] [OPENAI] request body: {"messages":[...]}
[2026-06-11 18:00:01.234] [INFO ] [LLM ] ...model output...
[2026-06-11 18:00:01.235] [INFO ] [TOOL] calling skill "shell" with input {"command":"ls"}
[2026-06-11 18:00:01.500] [RESULT] [TOOL] shell result: ...
[2026-06-11 18:00:02.000] [INFO ] [LLM ] final text answer

The user's StreamWrite output (the model's text) is interleaved on stdout without timestamps so the terminal stays clean.

Architecture Notes

Why NonStreaming for tool calls?

Stream returns chunks as soon as they arrive. When the LLM emits a tool_call finish reason, the model might still be streaming function.arguments in the next delta. Cancelling on finish_reason would truncate the arguments. The current implementation uses NonStreaming for tool calls and a separate (currently unused) Stream path for future use. See internal/llm/openai.go and the TestOpenAI_Stream_SSEBasicText test for the SSE parser.

Why a global skill registry?

A CLI is a single process with a single config. There's no use case for per-request skill injection. The global registry is a deliberate simplicity choice. If you need plugin loading, see the Unregister method added in v0.5 for test isolation — it would also support dynamic (un)load.

Why a `policy.Global()` singleton?

Same reason. The CLI loads its sandbox policy once at startup. A singleton is the simplest way to make the policy available to skills without a dependency-injection ceremony.

Testing

22 unit tests, all fast (no real network calls):

Package	File	Cases
`internal/agent`	`loop_test.go`	5 (multi-turn ReAct, cancel mid-flight, tool error, unknown skill, stripThinkTags)
`internal/llm`	`openai_test.go`	5 (happy text, tool call, HTTP 500, wire format, SSE basic)
`internal/llm`	`mock_test.go`	4 (no-tool, with-tool, after-tool, stream cancel)
`internal/skill/policy`	`policy_test.go`	8 (SSH denial, Windows system, /tmp allowed, allow overrides, tilde, abs path, doublestar, SetGlobal nil)

The wire-format test (TestOpenAI_NonStreaming_NestedToolCall_WireFormat) spins up an httptest.Server, sends a synthetic 2-turn message sequence, and asserts that tool_calls[0].function.name (not tool_calls[0].name) exists in the outgoing body. This is the test that would have caught the MiniMax HTTP 500 bug that took half a day to diagnose manually.

License

MIT.

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
cmd		cmd
doc		doc
internal		internal
prompts		prompts
tmp		tmp
.gitignore		.gitignore
README.md		README.md
config.example.yaml		config.example.yaml
go.mod		go.mod
go.sum		go.sum
readme_cn.md		readme_cn.md

Folders and files

Latest commit

History

Repository files navigation

qcli — AI CLI

Highlights

Quick Start

A typical session

Non-interactive (-chat) mode

CLI Flags

Environment variables

Config priority

config.yaml

Built-in default denylist

Customizing the prompt

Project Structure

Common Commands

Build

Test

Vet

Run

Run the diagnostic

Development

Adding a new skill

Adding a new LLM adapter

Code style

Debugging tips

Packaging

Cross-platform

Embedding the config

Vendoring (optional)

Versioning

Logging

Architecture Notes

Why NonStreaming for tool calls?

Why a global skill registry?

Why a policy.Global() singleton?

Testing

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Non-interactive (`-chat`) mode

`config.yaml`

Why a `policy.Global()` singleton?

Packages