
Commit c963041

feat(core): implement model overrides and multi-model routing research
1 parent cf70dd1 commit c963041

6 files changed

Lines changed: 543 additions & 9 deletions


.gemini/settings.json

Lines changed: 54 additions & 9 deletions
@@ -1,16 +1,58 @@
 {
+  "modelConfigs": {
+    "overrides": [
+      {
+        "match": {
+          "overrideScope": "planner"
+        },
+        "modelConfig": {
+          "model": "gemini-3.1-pro-preview"
+        }
+      },
+      {
+        "match": {
+          "overrideScope": "reviewer"
+        },
+        "modelConfig": {
+          "model": "gemini-3.1-pro-preview"
+        }
+      },
+      {
+        "match": {
+          "overrideScope": "coder"
+        },
+        "modelConfig": {
+          "model": "gemini-2.5-flash-lite"
+        }
+      },
+      {
+        "match": {
+          "overrideScope": "writer"
+        },
+        "modelConfig": {
+          "model": "gemini-2.5-flash-lite"
+        }
+      }
+    ]
+  },
   "experimental": {
     "enableAgents": true
   },
   "hooks": {
     "BeforeAgent": [
       {
         "matcher": "*",
+        "sequential": true,
         "hooks": [
           {
             "name": "log-before-agent",
             "type": "command",
-            "command": "python3 .gemini/hooks/log.py"
+            "command": "python3 /home/apiad/Projects/personal/starter/.gemini/hooks/log.py"
+          },
+          {
+            "name": "tier-router",
+            "type": "command",
+            "command": "python3 /home/apiad/Projects/personal/starter/.gemini/hooks/tier_router.py"
           }
         ]
       }
@@ -22,7 +64,7 @@
           {
             "name": "log-after-tool",
             "type": "command",
-            "command": "python3 .gemini/hooks/log.py"
+            "command": "python3 /home/apiad/Projects/personal/starter/.gemini/hooks/log.py"
           }
         ]
       }
@@ -34,12 +76,12 @@
           {
             "name": "startup",
             "type": "command",
-            "command": "python3 .gemini/hooks/session.py"
+            "command": "python3 /home/apiad/Projects/personal/starter/.gemini/hooks/session.py"
           },
           {
             "name": "welcome-message",
             "type": "command",
-            "command": "python3 .gemini/hooks/welcome.py"
+            "command": "python3 /home/apiad/Projects/personal/starter/.gemini/hooks/welcome.py"
           }
         ]
       }
@@ -51,7 +93,7 @@
           {
             "name": "log-model-output",
             "type": "command",
-            "command": "python3 .gemini/hooks/log.py"
+            "command": "python3 /home/apiad/Projects/personal/starter/.gemini/hooks/log.py"
           }
         ]
       }
@@ -64,18 +106,21 @@
           {
             "name": "notify-user",
             "type": "command",
-            "command": "python3 .gemini/hooks/notify.py"
+            "command": "python3 /home/apiad/Projects/personal/starter/.gemini/hooks/notify.py"
           },
           {
             "name": "log-after-agent",
             "type": "command",
-            "command": "python3 .gemini/hooks/log.py"
+            "command": "python3 /home/apiad/Projects/personal/starter/.gemini/hooks/log.py"
           }
         ]
       }
-    ]
+    ],
+    "BeforeModel": []
   },
   "hooksConfig": {
-    "disabled": []
+    "disabled": [
+      "tier-router"
+    ]
   }
 }

research/multi-model-routing.md

Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
# Best Practices for Multi-Model Configurations and Automatic Model Routing in Gemini CLI

## Executive Summary
This report defines best practices for implementing multi-model configurations and dynamic model routing in the Gemini CLI (v0.34.0). Through native configuration (`settings.json`) and lifecycle hooks (`BeforeModel`, `BeforeAgent`), developers can build a routing system that automatically directs tasks to specialized models (e.g., `Thinker` for deep reasoning, `Executioner` for rapid coding).

The key finding is that the CLI natively supports static model selection via flags and configuration aliases, whereas true dynamic routing requires careful hook implementation. Successful routing demands strict adherence to the expected JSON response schemas, safe parsing of the Gemini API's nested `parts` structures, and sequential hook execution to prevent race conditions. Furthermore, because the CLI does not hot-reload its configuration, developers need out-of-band logging and isolated test scripts to iterate without constant session restarts.

## Research Questions

### 1. Native Multi-Model Configuration in Gemini CLI
*For detailed findings, see [RQ1 Research Asset](multi-model-routing/rq1-native-config.md).*

**High-Level Overview:**
- **Native Configuration:** Gemini CLI uses `modelConfigs.customAliases` and `modelConfigs.overrides` in `.gemini/settings.json` to define model aliases, inherit parameters (via `extends`), and apply context-specific overrides (e.g., for specific agents).
- **Dynamic Switching:** Models can be overridden dynamically using the `--model` (`-m`) CLI flag or the `GEMINI_MODEL` environment variable. The order of precedence is CLI Flags > Environment Variables > Project Settings > User Settings.
- **Fallback Behavior:** Governed by `ModelAvailabilityService`. Interactive sessions prompt the user when a model is unavailable, while background utilities use a strict fallback sequence (`flash-lite` -> `flash` -> `pro`). Fallback can be disabled via the `--no-model-fallback` flag or the `disableModelFallback` setting.
- **Parameter Handling:** Configuration attributes like `temperature` and `maxOutputTokens` reside in `generateContentConfig`. When switching models or inheriting via aliases, parameters are deeply merged, and the most specific context takes priority.

### 2. Hook-Based Dynamic Model Routing
*For detailed findings, see [RQ2 Research Asset](multi-model-routing/rq2-hook-routing.md).*

**High-Level Overview:**
- **Reliable Hook Events:** In Gemini CLI v0.34.0, `BeforeModel` is the definitive hook for dynamic model routing. It executes after all prompt hydration and context assembly, making it the safest place to override the model before the API call. `BeforeAgent` also exists, but overrides made there are prone to being replaced downstream. (Note: the root-cause analysis (RCA) highlighted cases of `BeforeModel` failing silently. These often stem from schema mismatches rather than from the hook being unsupported. `BeforeAgent` can serve as a fallback when state must be modified earlier.)
- **Override Schema:** To switch models, a hook must return a precise JSON structure. The expected schema typically involves returning an object with `hookSpecificOutput` containing an `llm_request` block that overrides the `model` property (e.g., `{"hookSpecificOutput": {"llm_request": {"model": "gemini-3.1-pro-preview"}}}`).
- **Safe History Parsing:** When analyzing previous turns for semantic signals (like `[Tier: Thinker]`), hooks must safely parse the `llm_request.messages` array. The Gemini API uses a `parts` array (e.g., `[{"text": "..."}]`) for structured content, especially during tool calls, rather than a simple `content` string. Fallback logic should check both.
- **Execution Order:** When chaining multiple hooks (e.g., a logger and a router), setting `sequential: true` in `settings.json` is critical. It guarantees that the hooks run one after another, preventing race conditions in which one hook's override is lost or conflicts with another hook's read/write operations.
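
The parsing and override steps above can be sketched as a small router. This is a minimal sketch, not the `tier_router.py` from this commit (its source is not shown here). It assumes the payload shape described above (`llm_request.messages`, with either `parts` or `content`) and the `hookSpecificOutput` response schema; the tier-to-model mapping and the choice of an empty object as the "no override" response are illustrative assumptions.

```python
import json
import re
import sys

# Hypothetical mapping: the model names appear in this commit's settings.json,
# but pairing them with these tier names is an assumption for illustration.
TIER_MODELS = {
    "Thinker": "gemini-3.1-pro-preview",
    "Executioner": "gemini-2.5-flash-lite",
}
TIER_TAG = re.compile(r"\[Tier:\s*(\w+)\]")


def message_text(message):
    """Return a message's text, preferring `parts` and falling back to `content`."""
    parts = message.get("parts")
    if isinstance(parts, list):
        text = "".join(
            p["text"] for p in parts
            if isinstance(p, dict) and isinstance(p.get("text"), str)
        )
        if text:
            return text
    content = message.get("content")
    return content if isinstance(content, str) else ""


def latest_tier(messages):
    """Scan the history newest-first and return the most recent tier tag, if any."""
    for message in reversed(messages):
        if isinstance(message, dict):
            match = TIER_TAG.search(message_text(message))
            if match:
                return match.group(1)
    return None


def route(payload):
    """Build the hook response; an empty dict is assumed to mean 'no override'."""
    request = payload.get("llm_request") or {}
    model = TIER_MODELS.get(latest_tier(request.get("messages") or []))
    if model is None:
        return {}
    return {"hookSpecificOutput": {"llm_request": {"model": model}}}


def main():
    """Hook entry point: read the payload from stdin, write only JSON to stdout."""
    json.dump(route(json.load(sys.stdin)), sys.stdout)
```

A hook script would call `main()` under the usual `if __name__ == "__main__":` guard. Keeping `route` as a pure function means the routing decision can be exercised with hand-written payloads, without starting a CLI session.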

### 3. Command and Subagent Routing Integration
*For detailed findings, see [RQ3 Research Asset](multi-model-routing/rq3-agent-routing.md).*

**High-Level Overview:**
- **Native TOML Commands:** Custom `.toml` commands (e.g., in `.gemini/commands/`) can natively define preferred models by including a `tier` or `model` attribute (e.g., `tier = "Thinker"`) in the file's frontmatter or configuration block. The CLI parses this and switches the session context prior to executing the command's prompts.
- **Subagent Specifications:** Similarly, subagents defined in `.gemini/agents/*.md` can statically declare their required model using YAML frontmatter (e.g., `tier: Executioner`). Subagents can also use dynamic signaling during multi-step executions if they need to shift cognitive modes.
- **Semantic Signaling & Context Management:** The best practice for dynamic routing is "Semantic Signaling" (e.g., appending `[Tier: Thinker]` to a response). To prevent context pollution and token waste, a `BeforeAgent` or `AfterModel` hook should detect the tag with a regex, update the routing state, and then strip the tag from the final message before it is saved to the persistent history or displayed to the user.
- **Robust User Feedback:** Model switching, particularly to heavier reasoning models, introduces latency. Hooks should return a `systemMessage` (e.g., "Switching to Thinker Tier...") which the CLI renders natively. For out-of-band awareness, scripts can execute system-level notifications (e.g., `notify-send`) via subprocesses.
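
The detect-and-strip step and the user notice described above could look roughly like the following. This is a sketch under assumptions: the tag format is the `[Tier: ...]` example from the text, `systemMessage` is assumed to be a top-level field of the hook response, and how the cleaned text is handed back to the CLI is left out because the report does not specify it.

```python
import re

# Matches the tag plus any whitespace directly before it.
TIER_TAG = re.compile(r"\s*\[Tier:\s*(\w+)\]")


def strip_tier_tag(text):
    """Return (clean_text, tier); tier is None when the text carries no tag."""
    tiers = TIER_TAG.findall(text)
    if not tiers:
        return text, None
    # If several tags are present, the last one is treated as authoritative.
    return TIER_TAG.sub("", text).strip(), tiers[-1]


def switch_notice(tier):
    """Build a response fragment carrying the notice shown to the user."""
    return {"systemMessage": f"Switching to {tier} Tier..."}
```

For example, `strip_tier_tag("Plan done. [Tier: Thinker]")` yields the cleaned text `"Plan done."` together with the tier `"Thinker"`, so the routing state can be updated while the history stays free of the tag.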

### 4. Testing and Debugging Workflows for CLI Extensions
*For detailed findings, see [RQ4 Research Asset](multi-model-routing/rq4-testing-workflows.md).*

**High-Level Overview:**
- **Hot-Reloading Workarounds:** Because Gemini CLI (`v0.34.0`) caches `settings.json` on startup, hooks cannot be hot-reloaded during an active session. The best testing strategy is to create standalone mock scripts that simulate the CLI's `stdin` JSON payload, so that hook logic (such as JSON parsing and routing decisions) can be tested independently without restarting the CLI.
- **Standalone Stateful Logging:** Standard output (`stdout`) is consumed by the CLI to parse hook responses. Debugging with plain `print()` statements therefore corrupts the JSON response and causes the hook to fail. Developers must implement a standalone logger (e.g., writing directly to `/tmp/gemini_hooks.log` or a dedicated `gemini_hooks.log` file) using Python's native file I/O or `logging` module to trace execution safely.
- **"Hello World" Validation:** The recommended pattern for new hook development is the "Probe Pattern". Before implementing complex routing, deploy a simple hook that catches the target event (e.g., `BeforeAgent`) and writes the received `stdin` payload to a file. This confirms that the event fires in the current environment and reveals the exact JSON schema to parse.
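
A probe hook with file-based logging, as described above, might be sketched as follows. The log file name follows the report's suggestion; the logger name, the record format, and the assumption that an empty JSON object is an acceptable no-op response are illustrative and should be checked against the CLI.

```python
import json
import logging
import sys
import tempfile
from pathlib import Path

# Assumed location: the report suggests /tmp/gemini_hooks.log or a dedicated file.
LOG_PATH = Path(tempfile.gettempdir()) / "gemini_hooks.log"


def get_logger(path=LOG_PATH):
    """Return a logger that writes to a file and never touches stdout."""
    logger = logging.getLogger("gemini_hooks")
    if not logger.handlers:
        handler = logging.FileHandler(path, encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.DEBUG)
        logger.propagate = False  # keep records away from any root stream handler
    return logger


def probe(raw, logger):
    """Record the raw stdin payload and return an empty (assumed no-op) response."""
    try:
        payload = json.loads(raw) if raw.strip() else {}
        logger.debug("payload: %s", json.dumps(payload, sort_keys=True))
    except json.JSONDecodeError as exc:
        logger.error("invalid JSON on stdin: %s; raw=%r", exc, raw)
    return "{}"


def main():
    """Hook entry point: stdout receives the JSON response and nothing else."""
    sys.stdout.write(probe(sys.stdin.read(), get_logger()))
```

Because `probe` takes the payload as a string, a mock script can call it with a hand-written JSON string to simulate the CLI's `stdin`, which sidesteps the hot-reload limitation during development.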

## Conclusions
Dynamic model routing in the Gemini CLI is achievable but requires precision. The primary pitfalls in multi-model implementations stem from incorrect assumptions about the environment (e.g., expecting `content` strings instead of `parts` arrays) and from CLI lifecycle constraints (e.g., cached configuration preventing hot-reloading). By using `BeforeModel` or `BeforeAgent` hooks, parsing the message history defensively, and returning structured override JSON payloads, the CLI can transition between cognitive tiers.

## Recommendations

### Immediate Actions
1. **Standardize Routing Hooks:** Implement a universal `tier_router.py` hook bound to the `BeforeAgent` or `BeforeModel` lifecycle event. Ensure it has `sequential: true` enabled in `settings.json`.
2. **Implement Defensive Parsing:** Update all historical message parsing logic to handle both `message.get("content")` and `message.get("parts")` to ensure compatibility across different API states (standard text vs. tool calling).
3. **Establish Out-of-Band Logging:** Create a dedicated logging utility for all custom hooks that writes directly to a standalone file (e.g., `gemini_hooks.log`) to prevent stdout corruption and facilitate debugging.

### Future Research
- Investigate the potential for a native "hot-reload" command within the Gemini CLI to refresh `settings.json` without dropping the active session.
- Explore creating a standardized Subagent framework that natively incorporates tier requirements, reducing the reliance on regex-based semantic signaling.
Lines changed: 121 additions & 0 deletions
@@ -0,0 +1,121 @@
# Native Multi-Model Configuration in Gemini CLI (v0.34.0)

This report investigates the native multi-model configuration capabilities of Gemini CLI version 0.34.0, focusing on configuration structures, dynamic switching mechanisms, fallback behaviors, and parameter handling.

## 1. Native Model Definition and Configuration

In Gemini CLI v0.34.0, multiple models are defined and managed within the `.gemini/settings.json` file (at the project or user level) using two primary structures: `customAliases` and `overrides`.

### Custom Model Aliases (`modelConfigs.customAliases`)
The `customAliases` object allows users to define named presets that can inherit from base models or other aliases. This is useful for creating specialized configurations for different tasks.

- **Inheritance (`extends`):** An alias can extend another alias or a base model.
- **Parameter Overrides:** Specific generation parameters (like temperature) can be set for each alias.

**Example Configuration:**
```json
{
  "modelConfigs": {
    "customAliases": {
      "research-pro": {
        "extends": "gemini-2.0-pro",
        "modelConfig": {
          "generateContentConfig": {
            "temperature": 0.2,
            "maxOutputTokens": 2048
          }
        }
      },
      "fast-exec": {
        "extends": "gemini-2.0-flash",
        "modelConfig": {
          "generateContentConfig": {
            "temperature": 0.1
          }
        }
      }
    }
  }
}
```

### Model Overrides (`modelConfigs.overrides`)
Overrides allow for context-aware model selection based on the "scope" or "agent" currently executing. This is part of the CLI's advanced routing system.

**Example Configuration:**
```json
{
  "modelConfigs": {
    "overrides": [
      {
        "match": { "overrideScope": "codebase_investigator" },
        "modelConfig": {
          "model": "gemini-2.0-pro",
          "generateContentConfig": { "temperature": 0 }
        }
      }
    ]
  }
}
```
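
To make the matching idea concrete, here is a minimal sketch of how an `overrides` list like the one above could be evaluated for a given scope. It is not the CLI's implementation: the first-match-wins rule and the `resolve_override` helper are assumptions for illustration.

```python
def resolve_override(settings, scope, default_model):
    """Return the modelConfig whose match.overrideScope equals `scope`.

    Assumption: the first matching entry wins. When nothing matches, a
    config holding only the default model is returned.
    """
    overrides = settings.get("modelConfigs", {}).get("overrides", [])
    for entry in overrides:
        if entry.get("match", {}).get("overrideScope") == scope:
            return entry.get("modelConfig", {})
    return {"model": default_model}


settings = {
    "modelConfigs": {
        "overrides": [
            {
                "match": {"overrideScope": "codebase_investigator"},
                "modelConfig": {
                    "model": "gemini-2.0-pro",
                    "generateContentConfig": {"temperature": 0},
                },
            }
        ]
    }
}
investigator = resolve_override(settings, "codebase_investigator", "gemini-2.5-flash")
other = resolve_override(settings, "writer", "gemini-2.5-flash")
```

With these inputs, `investigator` carries the override's model and temperature, while `other` falls back to the default model passed in.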

### Intelligent Model Routing
The CLI includes a built-in "Plan Mode Routing" feature (enabled by `general.plan.modelRouting: true`) that automatically selects high-reasoning models (Pro) for planning and high-speed models (Flash) for implementation tasks.

## 2. Dynamic Model Switching: Flags and Environment Variables

Gemini CLI provides several mechanisms to override the default model or switch models dynamically during execution.

### Precedence Order (Highest to Lowest)
1. **Command-line flag:** `--model <model-name>` (or `-m`)
2. **Environment variable:** `GEMINI_MODEL`
3. **Project Settings:** `.gemini/settings.json` in the current directory.
4. **User Settings:** `~/.gemini/settings.json`.
5. **System Defaults:** Global configuration files or hardcoded defaults.
6. **Intelligent Router:** If the model is set to `auto` (e.g., `auto-gemini-3`), the system chooses the model dynamically.
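
The precedence order above amounts to taking the first configured value from an ordered list of sources. A minimal sketch, assuming each source has already been read into a plain string or `None`; the `resolve_model` helper and the source labels are hypothetical and not part of the CLI:

```python
def resolve_model(cli_flag, env_value, project_model, user_model, default_model):
    """Return (source, model) for the first non-empty candidate, highest precedence first."""
    candidates = [
        ("cli-flag", cli_flag),
        ("env", env_value),
        ("project-settings", project_model),
        ("user-settings", user_model),
        ("default", default_model),
    ]
    for source, value in candidates:
        if value:
            return source, value
    raise ValueError("no model configured")
```

In practice `env_value` would come from `os.environ.get("GEMINI_MODEL")`. Returning the source alongside the model makes it easy to log why a particular model was chosen.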

### CLI Flags
- `--model <name>`: Sets the model for the current command.
- `--no-model-fallback`: Disables the automatic fallback mechanism.

### Interactive Commands
Inside the CLI interactive mode, users can switch models using:
- `/model`: Opens a menu to select between **Auto**, **Pro**, **Flash**, or a manual entry.
- `/settings`: Allows toggling the "Model Router" and other configuration options.

## 3. Default Fallback Behavior

The CLI employs a `ModelAvailabilityService` to handle scenarios where a requested model is unavailable (e.g., due to quota limits or API errors).

### Interactive Fallback
In an interactive session, if the primary model fails, the CLI typically prompts the user to select an alternative model to continue the session.

### Silent Fallback Chain (Utility Calls)
For background tasks or internal utility calls (such as prompt completion or classification), the CLI uses a hardcoded silent fallback sequence:
1. `gemini-2.5-flash-lite` (Primary)
2. `gemini-2.5-flash` (Secondary)
3. `gemini-2.5-pro` (Final attempt)
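
The silent chain above can be pictured as a loop that tries each model in order and stops at the first success. This sketch only illustrates the control flow: `ModelUnavailable`, `call_with_fallback`, and the `call` callback are hypothetical names, not the API of `ModelAvailabilityService`.

```python
FALLBACK_CHAIN = ["gemini-2.5-flash-lite", "gemini-2.5-flash", "gemini-2.5-pro"]


class ModelUnavailable(Exception):
    """Hypothetical error raised when a model cannot serve the request."""


def call_with_fallback(call, chain=FALLBACK_CHAIN, allow_fallback=True):
    """Try each model in order and return (model, result) for the first success.

    With allow_fallback=False only the first model is tried, mirroring the
    effect of disabling fallback.
    """
    models = list(chain) if allow_fallback else list(chain[:1])
    last_error = None
    for model in models:
        try:
            return model, call(model)
        except ModelUnavailable as exc:
            last_error = exc
    raise ModelUnavailable(f"all models failed: {models}") from last_error
```

The `call` argument is any function that takes a model name and either returns a result or raises `ModelUnavailable`, which keeps the fallback logic testable without network access.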

### Disabling Fallback
Users can force the CLI to error out rather than falling back by setting:
- **Settings:** `"disableModelFallback": true`
- **CLI Flag:** `--no-model-fallback`

## 4. Handling of Model-Specific Parameters

When switching between models (either manually or via the router), model-specific parameters are managed through the `modelConfig` object.

### Parameter Merging
- **Inheritance Logic:** When an alias `extends` another, the child's `generateContentConfig` is merged with the parent's. The child's values overwrite the parent's for the same keys.
- **Switching Consistency:** When the CLI switches models (e.g., from Pro to Flash in Plan Mode), it applies the configuration associated with the target model or alias defined in the `modelConfigs`.
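
The inheritance rule above (child values overwrite parent values for the same keys, nested objects merged recursively) can be sketched as follows. This is an illustration of the merge semantics as described, not the CLI's code: `deep_merge` and `resolve_alias` are hypothetical helpers, and treating an unknown `extends` target as a base model name is an assumption.

```python
import copy


def deep_merge(parent, child):
    """Return a new dict where child values override parent values, recursively."""
    merged = copy.deepcopy(parent)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve_alias(name, aliases, _seen=()):
    """Follow the `extends` chain and return the fully merged modelConfig."""
    if name in _seen:
        raise ValueError("circular extends: " + " -> ".join(_seen + (name,)))
    alias = aliases.get(name)
    if alias is None:
        # Assumption: a name that is not an alias is a base model.
        return {"model": name}
    parent = {}
    if "extends" in alias:
        parent = resolve_alias(alias["extends"], aliases, _seen + (name,))
    return deep_merge(parent, alias.get("modelConfig", {}))
```

For instance, an alias that extends `research-pro` and sets only `temperature` would keep the parent's `maxOutputTokens` while replacing the temperature, which is the behavior the inheritance bullet describes.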

### Key Parameters Supported
The following parameters are typically handled within the `generateContentConfig` block during a switch:
- `temperature`
- `maxOutputTokens`
- `topP`
- `topK`
- `stopSequences`
- `responseMimeType`

This structured approach ensures that specialized models (like a "creative" alias) maintain their specific temperature settings even when the underlying base model is updated or switched.
