A minimal, extensible AI coding assistant for the terminal. Built in Go with a clean ReAct / Agent Loop, pluggable LLM backends, and a sandboxed skill system. Inspired by Claude Code and the OpenAI Assistants function-calling protocol.
Status: v0.7 — Tool calling, ESC cancel, skill sandbox, Plan C hybrid streaming, non-interactive
-chatmode, and a 27-case test suite are stable. Seedoc/PROGRESS.mdfor the full history.
- Agent Loop — full ReAct:
User → LLM → tool_use → Skill → result → LLM → …until the model returns text. - Pluggable LLM adapters — OpenAI-compatible HTTP (
/v1/chat/completions) and a Mock for tests. Anthropic is a stub. - Streaming + NonStreaming paths; the agent loop uses NonStreaming to avoid SSE race conditions when arguments are still in flight.
- Sandboxed skills —
file(read / write / edit) andshellhonor a path policy loaded fromconfig.yaml(built-in default denylist for SSH keys, AWS creds, Windows system dirs,/etc, …). - Diff-style file edit —
fileskill'seditoperation sends onlysearch_text+new_textinstead of the whole file, saving tokens on large-file edits. - ESC truly cancels the in-flight request — the goroutine is told to stop and the user is returned to the prompt within 2 s.
- Config layering —
config.yaml→ environment variables → CLI flags. Predictable, debuggable. - Prompts loaded from
.mdfiles — change qcli's behaviour (output style, workflow, project conventions) without recompiling; seeprompts/README.md. - 79 unit tests, including a wire-format regression test that pins the nested
tool_calls[].function.*shape required by OpenAI / MiniMax. - Non-interactive
-chatmode — run a single turn end-to-end, write to stdout, exit. Useful for scripting and smoke tests. Honors Plan C streaming via-streaming.
cd qcli
go build -o qcli.exe ./cmd/qcli # build
./qcli.exe --provider=mock # run with the offline mock LLM
./qcli.exe # run with config.yaml defaultsThe first run reads config.yaml from the working directory. Edit it to
point at your preferred OpenAI-compatible endpoint (MiniMax, DeepSeek,
OpenRouter, local llama.cpp, …).
> Run a shell command to list files in the current directory. Use the tool.
<model reasons, then calls>
[TOOL] calling skill "shell" with input {"command":"ls -la"}
[TOOL] shell result: total 12 ...
<model produces a summary>
> write a fun fact to tmp/fun_facts.txt
<model calls the file skill>
[TOOL] calling skill "file" with input {"operation":"write","path":"tmp/fun_facts.txt",...}
[TOOL] file result: File written successfully
> exit
Goodbye!
Press ESC to interrupt a long-running turn.
For scripts, smoke tests, and one-shot queries, -chat runs a single
turn end-to-end, writes the model's reply to stdout, and exits — no
TUI, no log noise on stdout. Tool calls are printed as
[tool: name] [args: ...] [result: ...].
# Simple question (NonStreaming by default):
./qcli.exe -provider mock -chat "What is 2+2?"
# Force streaming so text appears incrementally:
./qcli.exe -provider mock -chat "Tell me a story" -streaming
# Tool call path — the shell tool runs and the result is fed back:
./qcli.exe -provider mock -chat "Please use a tool"| Flag | Default | Description |
|---|---|---|
--provider |
mock (if no provider: in config.yaml) |
LLM backend: openai, mock. (Anthropic currently falls back to mock.) |
--model |
from config / OPENAI_MODEL |
Model name. E.g. gpt-4o, MiniMax-M2.7-highspeed. |
--base-url |
from config / OPENAI_BASE_URL |
API base URL. |
--api-key |
from config / OPENAI_API_KEY |
API key (overrides env). |
--debug |
false |
Set log level to DEBUG. |
--log-file |
from config / app.log |
Where to write the structured log. |
--config |
config.yaml |
Path to the config file. |
--chat "<msg>" |
(TUI mode) | Non-interactive: run a single turn with <msg> and exit. Skips the TUI. Supports streaming. Useful for scripting and smoke tests. |
--streaming |
false |
In -chat mode, force-enable Plan C streaming (overrides llm.streaming in config). |
--debug and the config's log_level: debug are equivalent.
| Variable | Effect |
|---|---|
OPENAI_API_KEY |
OpenAI API key. |
OPENAI_MODEL |
OpenAI model name. |
OPENAI_BASE_URL |
OpenAI base URL (for any OpenAI-compatible service). |
ANTHROPIC_API_KEY |
Reserved for future Anthropic adapter. |
PROVIDER, MODEL, BASE_URL, LOG_FILE, DEBUG |
Mirror the corresponding flags. |
CLI flag > environment variable > config.yaml (highest first).
# LLM provider
provider: "openai"
debug: true
log_file: "app.log"
log_level: debug
log_append: true # true=append, false=truncate on startup
# OpenAI-compatible backend
openai:
api_key: "sk-..." # or set OPENAI_API_KEY env
model: "gpt-4o"
base_url: "" # leave empty for api.openai.com
# Anthropic (stub — falls back to mock until the real adapter lands)
anthropic:
api_key: ""
model: "claude-3-5-sonnet-20241022"
# Skill sandbox — gates file writes and locks shell cwd.
# If this block is absent, the built-in default denylist is used.
# Set deny_write: [] to disable the sandbox entirely.
#
# sandbox:
# base_dir: "" # empty = process cwd
# deny_write: [] # replace defaults with your own list
# allow_write: [] # patterns that override deny_writeWhen no sandbox: block is present, the following patterns are denied for
writes (reads are unrestricted):
~/.ssh/** ~/.aws/** ~/.gnupg/** ~/.kube/**
~/.docker/config.json ~/.npmrc ~/.pypirc ~/.netrc
C:\Windows\** C:\Program Files\** C:\Program Files (x86)\** C:\ProgramData\**
/etc/** /var/log/** /boot/** /private/etc/**
Patterns support ~ (home directory) and ** (any-depth match). The
allow_write list can punch holes back open.
All system-prompt text is loaded from .md files at startup, so
you can change qcli's behaviour without recompiling. The
configuration is a directory pointed at by prompts.dir:
prompts:
dir: "./prompts"If prompts.dir is empty, qcli uses the built-in default fragments
embedded in the binary (v0.5-era terse style). When set, qcli reads
system.md, style.md, and workflow.md from the directory;
each file becomes one system role message. Missing files fall
back to the embedded default silently.
The shipped prompts/ directory is editable — change the 6 style
principles, add project-specific workflow steps, etc. Restart
qcli to pick up changes. See prompts/README.md for the full
lookup order and customization recipe.
Available template variables in .md files: {{.OS}}, {{.ARCH}},
{{.CWD}}, {{.HOME}}, {{.GOVERSION}}, {{.SKILLS}}.
qcli/
├── cmd/
│ ├── qcli/main.go # entry point, flag parsing, DI
│ └── dump-req/main.go # wire-format diagnostic tool
├── internal/
│ ├── agent/loop.go # core ReAct loop (+ loop_test.go)
│ ├── config/config.go # YAML + env loader
│ ├── llm/
│ │ ├── adapter.go # Adapter interface + types
│ │ ├── openai.go # OpenAI HTTP / SSE adapter
│ │ ├── mock.go # offline mock
│ │ ├── openai_test.go # 5 cases incl. wire-format regression
│ │ └── mock_test.go # 4 cases
│ ├── skill/
│ │ ├── skill.go # Skill interface
│ │ ├── registry.go # global registry (+ Unregister for tests)
│ │ ├── shell/shell.go # shell command execution
│ │ ├── file/file.go # file read / write
│ │ └── policy/
│ │ ├── policy.go # PathPolicy + glob matcher
│ │ └── policy_test.go # 8 cases
│ ├── logging/logger.go # structured logger, file-backed
│ └── ui/
│ ├── tui.go # console TUI + per-turn ctx cancel
│ ├── esc_windows.go # GetAsyncKeyState polling
│ └── esc_unix.go # no-op stub
├── config.yaml
├── PROGRESS.md # detailed dev history
└── README.md # you are here
go build -o qcli.exe ./cmd/qcli # main binary
go build -o dump-req.exe ./cmd/dump-req # wire-format diagnostic
go build ./... # all packagesgo test ./... # all tests
go test -v ./internal/agent/... # one package, verbose
go test -run TestOpenAI_NonStreaming_NestedToolCall_WireFormat ./internal/llm/...The wire-format test is the one to re-run if you change anything in
internal/llm/types.go or the adapter's serialization path. It's the
regression test for the bug that previously caused MiniMax to return HTTP
500 on every second turn.
go vet ./..../qcli.exe # use config.yaml
./qcli.exe --provider=mock # offline test
./qcli.exe --debug # enable DEBUG logging to app.log
./qcli.exe --config=/path/to/other.yaml
echo "list files" | ./qcli.exe # one-shot via stdincmd/dump-req is a standalone tool that bypasses the TUI and Agent
Loop, hard-codes a 2-turn tool_call sequence, and dumps the full request
and response bodies to stdout. Use it whenever an OpenAI-compatible API
misbehaves — it's the fastest way to localize wire-format issues.
go build -o dump-req.exe ./cmd/dump-req
./dump-req.exe- Create
internal/skill/<name>/<name>.go. - Implement the
Skillinterface frominternal/skill/skill.go:type Skill interface { Name() string Description() string Execute(ctx context.Context, input string) (string, error) ToolSchema() llm.ToolDefinition }
- Register it in
cmd/qcli/main.goafter the existingshell.New()andfile.New()calls:skill.Register(myskill.New())
- (Optional) Implement
internal/skill/policy-aware enforcement if your skill touches the filesystem. Usepolicy.Global().CanWrite(absPath).
- Create a struct in
internal/llm/implementing theAdapterinterface:type Adapter interface { Stream(ctx context.Context, messages []Message, tools []ToolDefinition) (<-chan Chunk, context.CancelFunc, error) NonStreaming(ctx context.Context, messages []Message, tools []ToolDefinition) ([]ToolCall, string, error) Name() string }
- The
Message,ToolCall,Chunk, andToolDefinitiontypes ininternal/llm/types.goare provider-agnostic; serialize them to your backend's wire format. - Wire the new adapter into
cmd/qcli/main.go(in theswitch effectiveProviderblock).
- No external HTTP clients beyond the standard library +
gopkg.in/yaml.v3for config parsing. SSE is parsed by hand usingbufio.Scanner. - Skills must not panic on bad input — return an error string and let the LLM decide what to do next.
exec.CommandContext(notexec.Command) so ESC and context cancel propagate to child processes.
- Set
log_level: debugandlog_file: app.loginconfig.yaml(or pass--debug). Every HTTP request body and response is logged. - The first 800 bytes of each request body are dumped to the log. For
multi-turn tool calls the body can exceed this; use
./dump-req.exefor full-body inspection. - If the model emits valid tool calls but the next turn fails, the
TestOpenAI_NonStreaming_NestedToolCall_WireFormattest should still pass. If it doesn't, you've regressed the wire format.
The standard Go toolchain builds for any target:
GOOS=linux GOARCH=amd64 go build -o dist/qcli-linux-amd64 ./cmd/qcli
GOOS=darwin GOARCH=arm64 go build -o dist/qcli-darwin-arm64 ./cmd/qcli
GOOS=windows GOARCH=amd64 go build -o dist/qcli-windows-amd64.exe ./cmd/qcliThere are no cgo dependencies and no system calls beyond exec and
user32.dll (Windows-only, used for ESC key polling). The binary is a
single static executable.
config.yaml is read at runtime, not embedded. To ship a single-file
binary, copy config.yaml next to the executable, or use
go:embed (not currently used). For development the cwd-relative path
is the most convenient.
go mod vendor # populate ./vendor (already done in this repo)
go build -mod=vendor ./cmd/qcliThis repo's vendor/ directory is checked in to make building in
air-gapped sandboxes possible.
git describe --tags is the canonical source. There's no VERSION file
or build-time ldflags injection yet. If you ship binaries, tag with
git tag v0.5 and git describe will yield something like
v0.5-3-g82b933e.
Each turn produces timestamped, source-tagged output. With
log_level: debug, the file shows the full request / response exchange:
[2026-06-11 18:00:00.000] [INFO ] [USER] list files
[2026-06-11 18:00:00.001] [DEBUG] [AGENT] messages to LLM (1), tools (2)
[2026-06-11 18:00:00.001] [DEBUG] [OPENAI] request body: {"messages":[...]}
[2026-06-11 18:00:01.234] [INFO ] [LLM ] ...model output...
[2026-06-11 18:00:01.235] [INFO ] [TOOL] calling skill "shell" with input {"command":"ls"}
[2026-06-11 18:00:01.500] [RESULT] [TOOL] shell result: ...
[2026-06-11 18:00:02.000] [INFO ] [LLM ] final text answer
The user's StreamWrite output (the model's text) is interleaved on
stdout without timestamps so the terminal stays clean.
Stream returns chunks as soon as they arrive. When the LLM emits a
tool_call finish reason, the model might still be streaming
function.arguments in the next delta. Cancelling on finish_reason
would truncate the arguments. The current implementation uses
NonStreaming for tool calls and a separate (currently unused) Stream
path for future use. See internal/llm/openai.go and the
TestOpenAI_Stream_SSEBasicText test for the SSE parser.
A CLI is a single process with a single config. There's no use case for
per-request skill injection. The global registry is a deliberate
simplicity choice. If you need plugin loading, see the
Unregister method added in v0.5 for test isolation — it would also
support dynamic (un)load.
Same reason. The CLI loads its sandbox policy once at startup. A singleton is the simplest way to make the policy available to skills without a dependency-injection ceremony.
22 unit tests, all fast (no real network calls):
| Package | File | Cases |
|---|---|---|
internal/agent |
loop_test.go |
5 (multi-turn ReAct, cancel mid-flight, tool error, unknown skill, stripThinkTags) |
internal/llm |
openai_test.go |
5 (happy text, tool call, HTTP 500, wire format, SSE basic) |
internal/llm |
mock_test.go |
4 (no-tool, with-tool, after-tool, stream cancel) |
internal/skill/policy |
policy_test.go |
8 (SSH denial, Windows system, /tmp allowed, allow overrides, tilde, abs path, doublestar, SetGlobal nil) |
The wire-format test (TestOpenAI_NonStreaming_NestedToolCall_WireFormat)
spins up an httptest.Server, sends a synthetic 2-turn message sequence,
and asserts that tool_calls[0].function.name (not tool_calls[0].name)
exists in the outgoing body. This is the test that would have caught the
MiniMax HTTP 500 bug that took half a day to diagnose manually.
MIT.