`Keeper` below means either ClickHouse Keeper or ZooKeeper.

TL;DR:

- By default, SQL RBAC changes (`CREATE USER`, `GRANT`, etc.) are local to each server.
- Replicated access storage keeps RBAC entities in ZooKeeper/ClickHouse Keeper, so changes automatically appear on all nodes.
- This guide shows how to configure replicated RBAC, validate it, and migrate existing users safely.

Before the details, the core concepts are:

- ClickHouse stores access entities in access storages configured by `user_directories`.
- By default, following the shared-nothing concept, SQL RBAC objects are local (`local_directory`), so changes made on one node do not automatically appear on another node unless you run `... ON CLUSTER ...`.
- With `user_directories.replicated`, ClickHouse stores the RBAC model in Keeper under a configured path (for example `/clickhouse/access`) and every node watches that path; a minimal configuration sketch follows this list.
- Each node keeps a local in-memory mirror of the replicated access entities and updates it from Keeper watch callbacks. This is why normal access checks are local-memory fast, while RBAC writes depend on Keeper availability.
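A minimal sketch of such a configuration (the Keeper path and the surrounding layout are illustrative; the authoritative steps are in section 2 below):

```xml
<clickhouse>
    <!-- replace="replace" avoids leaving a writable local SQL storage
         in front of the replicated one -->
    <user_directories replace="replace">
        <!-- config-based users (users.xml) remain available read-only,
             e.g. for a break-glass admin -->
        <users_xml>
            <path>users.xml</path>
        </users_xml>
        <!-- RBAC entities live in Keeper under this path and are
             mirrored by every node -->
        <replicated>
            <zookeeper_path>/clickhouse/access</zookeeper_path>
        </replicated>
    </user_directories>
</clickhouse>
```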
Flow of this KB:

1. Why this model helps.
...
4. How to migrate existing RBAC safely.
5. Advanced troubleshooting and internals.

## 1. ON CLUSTER vs Keeper-backed RBAC: when to use which
`ON CLUSTER` executes DDL on the hosts that exist at execution time.
In practice, it fans out the query through the distributed DDL queue (which is also Keeper/ZooKeeper-dependent) to the currently known cluster nodes.
It does not automatically replay old RBAC DDL for replicas/shards added later.

Keeper-backed RBAC solves that:

- one shared RBAC state for the cluster;
- new servers read the same RBAC state when they join;
- no need to remember `ON CLUSTER` for every RBAC statement.

Mental model: Keeper-backed RBAC replicates access state, while `ON CLUSTER` fans out DDL statements to the currently known nodes.
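A minimal sketch of the difference (the user name `analyst` and the cluster name `my_cluster` are illustrative):

```sql
-- Keeper-backed RBAC: run on any single node; the entity is written to
-- Keeper and every other node picks it up through its watch.
CREATE USER analyst IDENTIFIED WITH sha256_password BY 'S3cureExample!';

-- Local RBAC storage only: the DDL must be fanned out explicitly,
-- and replicas added later never receive it automatically.
CREATE USER analyst ON CLUSTER my_cluster IDENTIFIED WITH sha256_password BY 'S3cureExample!';
```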
### 1.1 Pros and Cons

Pros:

- ...
- Integrates with access-entity backup/restore.

Cons:

- Writes depend on Keeper availability. `CREATE/ALTER/DROP USER` and `CREATE/ALTER/DROP ROLE`, plus `GRANT/REVOKE`, fail if Keeper is unavailable, while existing authentication/authorization may continue from the already-loaded cache until restart.
- Operational complexity increases (Keeper health directly affects RBAC operations).
- Can conflict with `ON CLUSTER` if both mechanisms are used without guard settings (see the sketch after this list).
- Invalid/corrupted payload in Keeper can be skipped or be startup-fatal, depending on `throw_on_invalid_replicated_access_entities`.
- Very large RBAC sets (thousands of users/roles or very complex grants) can increase Keeper/watch pressure.
- If Keeper is unavailable during server startup and replicated RBAC storage is configured, startup can fail, so you may be unable to log in until startup succeeds.
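The guard setting for the `ON CLUSTER` conflict can be sketched as follows (server-config placement assumed; verify the setting against your ClickHouse version):

```xml
<clickhouse>
    <!-- RBAC DDL keeps working even if clients still send ON CLUSTER:
         the clause is ignored and the entity is written once to the
         replicated storage instead of being fanned out and colliding -->
    <ignore_on_cluster_for_replicated_access_entities_queries>true</ignore_on_cluster_for_replicated_access_entities_queries>
</clickhouse>
```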
## 2. Configure Keeper-backed RBAC on a new cluster
...

- this applies to SQL/RBAC users (created with `CREATE USER ...`, `CREATE ROLE ...`, etc.);
- if your users are in `users.xml`, those are config-based (`--configs`), and this is not an automatic local->replicated RBAC conversion;
- run restore on one node only; entities will be replicated through Keeper.

### 6.3 Migration with embedded SQL `BACKUP/RESTORE`
...

About `clickhouse-backup --rbac/--rbac-only`:

- It is an external tool, not ClickHouse embedded backup by itself.
- If `clickhouse-backup` is configured with `use_embedded_backup_restore: true`, it delegates to SQL `BACKUP/RESTORE` and follows the embedded rules.
- Otherwise it uses its own workflow; do not assume full equivalence with the embedded `allow_backup` semantics.
- Run the restore on one node only; entities will be replicated through Keeper (see the sketch after this list).
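A hedged sketch of that flow (the backup name is illustrative; the flags are the `--rbac`/`--rbac-only` options referenced above):

```bash
# Create a backup that includes RBAC objects (any node).
clickhouse-backup create --rbac rbac_2024_06_01

# Restore only the RBAC objects, on ONE node; replicated access storage
# propagates the entities to the rest of the cluster through Keeper.
clickhouse-backup restore --rbac-only rbac_2024_06_01
```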
## 7. Troubleshooting: common support issues

| Symptom | Likely cause | What to do |
|---|---|---|
| RBAC objects “disappeared” after config change/restart | `zookeeper_path` or storage source changed | Restore from backup or recreate RBAC in the new storage; keep the path stable |
| New replica has no historical users/roles | Team used only `... ON CLUSTER ...` before scaling | Enable Keeper-backed RBAC so new nodes load the shared state |
| `CREATE USER ... ON CLUSTER` throws "already exists in replicated" | Query fan-out + replicated storage both applied | Remove `ON CLUSTER` for RBAC or enable `ignore_on_cluster_for_replicated_access_entities_queries` |
| `CREATE USER`/`GRANT` fails with a Keeper/ZooKeeper error | Keeper unavailable or connection lost | Check `system.zookeeper_connection`, `system.zookeeper_connection_log`, and server logs |
| RBAC writes still go local although `replicated` exists | `local_directory` remains the first writable storage | Use `user_directories replace="replace"` and avoid a writable local SQL storage in front of replicated (see the query after this table) |
| Server does not start when Keeper is down; no one can log in | Replicated access storage needs Keeper during initialization | Restore Keeper first, then restart; if needed, use a temporary fallback config and keep a break-glass `users.xml` admin |
| Startup fails (or users are skipped) because of invalid RBAC payload in Keeper | Corrupted/invalid replicated entity and strict validation mode | Use `throw_on_invalid_replicated_access_entities` deliberately: `true` fail-fast, `false` skip+log; fix the bad Keeper payload before re-enabling strict mode |
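For the "RBAC writes still go local" row, a quick check via `system.users` (the `storage` column names the access storage that owns each user):

```sql
-- Expect 'replicated' for users that should live in Keeper;
-- 'users_xml' and 'local_directory' entities are node-local.
SELECT name, storage
FROM system.users
ORDER BY storage, name;
```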
...

## 9. Observability and debugging signals

### 9.1 Check Keeper connectivity

```sql
SELECT * FROM system.zookeeper_connection;
SELECT * FROM system.zookeeper_connection_log ORDER BY event_time DESC LIMIT 100;
```

### 9.2 Inspect RBAC activity in Keeper

```sql
SELECT event_time, type, op_num, path, error
FROM system.zookeeper_log
WHERE path LIKE '/clickhouse/access%'
ORDER BY event_time DESC
LIMIT 200;
```
### 9.3 Relevant server log patterns

Note: `system.zookeeper_log` is often disabled in production.
If it is unavailable, use the server logs (usually `clickhouse-server.log`) with these patterns:

- `Can't have Replicated access without ZooKeeper`
- ...
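A hedged starting point for scanning the log (the path and the `ReplicatedAccessStorage` logger name are common defaults; adjust them to your installation):

```bash
# Recent replicated-access and Keeper-related RBAC messages.
grep -E "ReplicatedAccessStorage|Replicated access" /var/log/clickhouse-server/clickhouse-server.log | tail -n 50
```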