This KB explains how to make SQL RBAC changes (`CREATE USER`, `CREATE ROLE`, `GRANT`, and similar) apply consistently across all nodes of a ClickHouse cluster.
`Keeper` below means either ClickHouse Keeper or ZooKeeper.
Before the details, here is the core concept:

- ClickHouse stores access entities in access storages configured by `user_directories`.
- By default, following the shared-nothing concept, SQL RBAC objects are local (`local_directory`), so changes made on one node do not automatically appear on another node unless you run `... ON CLUSTER ...`.
- With `user_directories.replicated`, ClickHouse stores the RBAC model in Keeper under a configured path (for example `/clickhouse/access`), and every node watches that path.
- Each node keeps a local in-memory mirror of replicated access entities and updates it from Keeper watch notifications. This is why normal access checks are local-memory fast, while RBAC writes depend on Keeper availability.
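You can see which storage holds each entity directly in the access-entity system tables (the exact `storage` values you see depend on your `user_directories` configuration):

```sql
-- The storage column shows which access storage holds each entity
-- (config-based users.xml, local SQL directory, or replicated/Keeper-backed).
SELECT name, storage FROM system.users ORDER BY name;
SELECT name, storage FROM system.roles ORDER BY name;
```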
Important mental model:

- this feature replicates RBAC state (users, roles, grants, policies, profiles, quotas, masking policies);
- it is not the same mechanism as the distributed DDL queue execution used by `ON CLUSTER`.
Flow of this KB:

1. Why this model helps.
2. How to configure it on a new cluster.
3. How to validate and operate it.
4. How to migrate existing RBAC safely.
5. Advanced troubleshooting and internals.
## 1. Choose the RBAC replication model (`ON CLUSTER` vs Keeper)
`ON CLUSTER` executes DDL on hosts that exist at execution time.
In practice, it fans out the query through the distributed DDL queue to currently known cluster nodes.
It does not automatically replay old RBAC DDL for replicas/shards added later.
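For example (cluster, user, and database names here are illustrative):

```sql
-- Fans out through the distributed DDL queue to nodes known right now:
CREATE USER analyst ON CLUSTER my_cluster IDENTIFIED WITH sha256_password BY 'secret';
GRANT ON CLUSTER my_cluster SELECT ON default.* TO analyst;

-- A replica added to my_cluster later starts WITHOUT this user
-- unless the DDL is replayed on it.
```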
Keeper-backed RBAC solves that:
Cons:
- Very large RBAC sets (thousands of users/roles or very complex grants) can increase Keeper/watch pressure.
- If Keeper is unavailable during server startup and replicated RBAC storage is configured, startup can fail, so DBA login is unavailable until startup succeeds.
## 2. Configure Keeper-backed RBAC on a new cluster
`user_directories` is the ClickHouse server configuration section that defines:
- where access entities are read from (`users.xml`, local SQL access files, Keeper, LDAP, etc.),
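A minimal sketch of such a configuration (the Keeper path is illustrative; keep your config-based bootstrap users in `users.xml` so a DBA can always log in):

```xml
<clickhouse>
    <user_directories>
        <!-- config-defined base users stay available even if Keeper is down -->
        <users_xml>
            <path>users.xml</path>
        </users_xml>
        <!-- SQL RBAC objects replicated through Keeper;
             all nodes of the cluster must use the same path -->
        <replicated>
            <zookeeper_path>/clickhouse/access</zookeeper_path>
        </replicated>
    </user_directories>
</clickhouse>
```

Note that no `local_directory` is configured here: a leftover writable local storage can silently capture new SQL users on one node instead of replicating them.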
- this applies to SQL/RBAC users (created with `CREATE USER ...`, `CREATE ROLE ...`, etc.);
- if your users are in `users.xml`, those are config-based (`--configs`) and this is not an automatic local->replicated RBAC conversion.
### 6.3 Migration with embedded SQL `BACKUP/RESTORE`
```sql
BACKUP
    TABLE system.users,
    TABLE system.roles,
    TABLE system.row_policies,
    TABLE system.quotas,
    TABLE system.settings_profiles,
    TABLE system.masking_policies
TO <backup_destination>;

-- after switching config
RESTORE
    TABLE system.users,
    TABLE system.roles,
    TABLE system.row_policies,
    TABLE system.quotas,
    TABLE system.settings_profiles,
    TABLE system.masking_policies
FROM <backup_destination>;
```
`allow_backup` behavior for embedded SQL backup/restore:

- Storage-level flag in `user_directories` (`<replicated>`, `<local_directory>`, `<users_xml>`) controls whether that storage participates in backup/restore.
- Entity-level setting `allow_backup` (for users/roles/settings profiles) can exclude specific RBAC objects from backup.

Defaults in ClickHouse code:

- `users_xml`: `allow_backup = false` by default.
- `local_directory`: `allow_backup = true` by default.
- `replicated`: `allow_backup = true` by default.
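As a sketch of the storage-level flag (element placement is assumed from the description above; verify against your ClickHouse version):

```xml
<user_directories>
    <replicated>
        <zookeeper_path>/clickhouse/access</zookeeper_path>
        <!-- exclude this storage from embedded BACKUP/RESTORE -->
        <allow_backup>false</allow_backup>
    </replicated>
</user_directories>
```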
Operational implication:
- If you disable `allow_backup` for replicated storage, embedded `BACKUP TABLE system.users ...` may skip those entities (or fail if no backup-allowed access storage remains).
About `clickhouse-backup --rbac/--rbac-only`:
- It is an external tool, not ClickHouse embedded backup by itself.
- If `clickhouse-backup` is configured with `use_embedded_backup_restore: true`, it delegates to SQL `BACKUP/RESTORE` and follows embedded rules.
- Otherwise it uses its own workflow; do not assume full equivalence with embedded `allow_backup` semantics.
## 7. Troubleshooting: common support issues
| Symptom | Typical root cause | What to do |
|---|---|---|
| Short window where user seems present/absent via load balancer | Propagation + node routing timing | Validate directly on each node; avoid assuming the LB view is instantly consistent |
| Server fails after aggressive `user_directories` replacement | Required base users/profiles missing in config | Keep `users_xml` (or equivalent base definitions) intact |
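When triaging these symptoms, a quick per-node check (run on each node directly, not through the load balancer) is usually enough; `system.user_directories` and the `storage` column are the standard places to look:

```sql
-- Which access storages this node has configured, and in what order:
SELECT * FROM system.user_directories;

-- Where each entity actually lives on this node:
SELECT name, storage FROM system.users ORDER BY name;
SELECT name, storage FROM system.roles ORDER BY name;
```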
## 8. Operational guardrails for production
- Keep the same `user_directories` config on all nodes.
- Keep `zookeeper_path` unique per cluster/tenant.
- Treat Keeper health as part of access-management SLO.
- Plan RBAC backup/restore before changing storage path or cluster topology.
- higher-level caches in `AccessControl` (`RoleCache`, `RowPolicyCache`, `QuotaCache`, `SettingsProfilesCache`) are updated/invalidated via access change notifications.
## 11. Low-level internals behind real incidents
- Read path is memory-backed (`MemoryAccessStorage` mirror), not direct Keeper reads per query.
- Write path requires Keeper availability; if Keeper is down, RBAC writes fail while some reads can continue from already-loaded state.
- Insert target is selected by storage order and writeability in `MultipleAccessStorage`; this is why a leftover `local_directory` can hijack SQL user creation.
- `ignore_on_cluster_for_replicated_access_entities_queries` is implemented as an AST rewrite that removes `ON CLUSTER` from access queries when replicated access storage is enabled.
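To illustrate the last point (user and cluster names are illustrative): with replicated access storage active and this setting enabled, the two statements below end up equivalent, because the `ON CLUSTER` clause is stripped during query rewrite and the change reaches all nodes through Keeper instead of the DDL queue:

```sql
GRANT ON CLUSTER my_cluster SELECT ON default.* TO analyst;
-- is rewritten to:
GRANT SELECT ON default.* TO analyst;
```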