Commit 69b86fc

Merge pull request #175 from Altinity/multidisk-config
Add multidisk-jbod-balancing.md
2 parents fb8ee79 + d77fa13 commit 69b86fc

1 file changed

Lines changed: 113 additions & 0 deletions

@@ -0,0 +1,113 @@
---
title: "MultiDisk (JBOD) Balancing"
linkTitle: "MultiDisk (JBOD) Balancing"
---

ClickHouse provides two options for balancing inserts across the disks of a volume that has more than one disk: `round_robin` and `least_used`.

## **Round Robin (Default):**

ClickHouse selects the next disk in round-robin order each time it writes a part.

This is the default setting and is most effective when the parts created on insert are roughly the same size.

Drawbacks: may lead to disk skew, because parts of different sizes fill the disks unevenly.

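To see whether round-robin placement has skewed a volume, it can help to compare how full each disk is. The query below is a minimal sketch against the `system.disks` table; the `used_pct` alias is only an illustrative name, and what counts as too much skew is left to your judgment.

```sql
-- Per-disk usage. A wide spread in used_pct between disks of one volume suggests skew.
SELECT
    name,
    path,
    formatReadableSize(free_space) AS free,
    formatReadableSize(total_space) AS total,
    round(100 * (1 - free_space / total_space), 2) AS used_pct
FROM system.disks
ORDER BY used_pct DESC;
```
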
## **Least Used:**

ClickHouse selects the disk with the most available space and writes the part to that disk.

Change to `least_used` when even disk space consumption is desirable, or when you have a JBOD volume with disks of differing sizes. To prevent hot-spots, it is best to set this policy on a fresh volume or on a volume that has already been (re)balanced.

Drawbacks: may lead to hot-spots, because new parts keep going to whichever disk currently has the most free space.

## Configurations

Settings that can affect which disk is selected:

- Storage policy volume setting: `least_used_ttl_ms`. It applies only to the `least_used` policy, controls how long the free-space information used to pick a disk is cached before it is refreshed, and defaults to 60 s (60000 ms).
- Disk settings: `keep_free_space_bytes`, `keep_free_space_ratio`.

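The settings above can be inspected on a running server. As a rough sketch, the first query lists each volume's balancing policy and the second shows how much space each disk keeps free. This assumes a ClickHouse version recent enough to expose the `load_balancing` column in `system.storage_policies`; on older versions that column may be missing.

```sql
-- Balancing policy per volume (load_balancing column assumed to exist in your version).
SELECT policy_name, volume_name, disks, volume_type, load_balancing
FROM system.storage_policies
ORDER BY policy_name, volume_priority;

-- Space each disk keeps free, and the space not currently taken by reservations.
SELECT
    name,
    formatReadableSize(keep_free_space) AS reserved_free,
    formatReadableSize(unreserved_space) AS unreserved
FROM system.disks;
```
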
Configuration to assist rebalancing:

- The MergeTree setting `min_bytes_to_rebalance_partition_over_jbod` does not control where data is written during inserts. Instead, it governs how parts are redistributed across disks within the same volume during merge operations.

> Note: setting `min_bytes_to_rebalance_partition_over_jbod` does not guarantee balanced partitions and balanced disk usage.

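As an illustration of how this MergeTree setting might be inspected and changed for a single table, here is a hedged sketch. The table `example_db.events` and the 1 GiB threshold are placeholders, not recommendations; pick a value that suits your part sizes.

```sql
-- Current server-wide value of the setting.
SELECT name, value, changed
FROM system.merge_tree_settings
WHERE name = 'min_bytes_to_rebalance_partition_over_jbod';

-- Set it for one table (hypothetical table name and threshold).
ALTER TABLE example_db.events
    MODIFY SETTING min_bytes_to_rebalance_partition_over_jbod = 1073741824;
```
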
Example of a `least_used` policy:

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <!-- The default disk takes its path from the server-level <path> setting, so none is set here. -->
            <default>
                <keep_free_space_bytes>10737418240</keep_free_space_bytes> <!-- 10 GiB -->
            </default>
            <disk1>
                <path>/mnt/disk1/</path>
                <keep_free_space_bytes>10737418240</keep_free_space_bytes>
            </disk1>
            <disk2>
                <path>/mnt/disk2/</path>
                <keep_free_space_bytes>10737418240</keep_free_space_bytes>
            </disk2>
        </disks>
        <policies>
            <hot>
                <volumes>
                    <default>
                        <disk>disk1</disk>
                        <disk>disk2</disk>
                        <load_balancing>least_used</load_balancing>
                        <least_used_ttl_ms>60000</least_used_ttl_ms> <!-- 60s -->
                    </default>
                </volumes>
            </hot>
        </policies>
    </storage_configuration>
</clickhouse>
```

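A storage policy only takes effect for tables that reference it. The sketch below shows one way a table could be attached to the `hot` policy from the example; the database, table, and column names are hypothetical.

```sql
CREATE DATABASE IF NOT EXISTS example_db;

-- Hypothetical table that stores its parts on the volumes of the 'hot' policy.
CREATE TABLE example_db.events
(
    event_time DateTime,
    user_id UInt64,
    payload String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)
ORDER BY (user_id, event_time)
SETTINGS storage_policy = 'hot';

-- Confirm which policy the table ended up with.
SELECT database, name, storage_policy
FROM system.tables
WHERE database = 'example_db' AND name = 'events';
```
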
## Manual Rebalancing Parts over JBOD Disks

The following query selects large parts in `target_tables` and `target_databases` that are candidates for moving to another disk, and generates an `ALTER TABLE ... MOVE PART` statement for each one. The target disk is chosen according to these requirements:
- Only moves that are valid for the storage policy used by the table are selected.
- The storage policy volume must be of JBOD type.
- The part moves to another disk in the same volume.
- The target disk differs from the disk the part is currently on.
- Among the candidate disks, the one with the most free space is preferred.

Set `target_tables` and `target_databases` based on your requirements.

```sql
WITH
    '%' AS target_tables,
    '%' AS target_databases
SELECT sub.q FROM
(
    SELECT
        -- Statement to run for each candidate move
        'ALTER TABLE ' || parts.database || '.' || parts.`table` || ' MOVE PART \'' || parts.name || '\' TO DISK \'' || other_disk_candidate || '\';' AS q,
        parts.database AS db,
        parts.`table` AS t,
        parts.name AS part_name,
        parts.disk_name AS part_disk_name,
        parts.bytes_on_disk AS part_bytes_on_disk,
        sp.storage_policy AS part_storage_policy,
        -- One row per disk of the volume other than the part's current disk
        arrayJoin(arrayRemove(v.disks, parts.disk_name)) AS other_disk_candidate,
        candidate_disks.free_space AS candidate_disk_free_space
    FROM system.parts AS parts
    INNER JOIN ( SELECT database, `table`, storage_policy FROM system.tables WHERE (name LIKE target_tables) AND (database LIKE target_databases) GROUP BY 1, 2, 3 ) AS sp ON sp.`table` = parts.`table` AND sp.database = parts.database
    INNER JOIN ( SELECT policy_name, volume_name, disks AS disks FROM system.storage_policies WHERE volume_type = 0 ) AS v ON sp.storage_policy = v.policy_name
    INNER JOIN ( SELECT name, free_space FROM system.disks ORDER BY free_space DESC ) AS candidate_disks ON candidate_disks.name = other_disk_candidate
    WHERE parts.active = 1
        AND (parts.bytes_on_disk >= 10737418240) -- 10 GiB: prioritize larger parts
        AND (parts.`table` LIKE target_tables)
        AND (parts.database LIKE target_databases)
        AND has(v.disks, parts.disk_name) -- only consider the volume the part is already in
        AND candidate_disks.free_space > parts.bytes_on_disk * 2 -- 2x buffer
    ORDER BY parts.bytes_on_disk DESC, candidate_disk_free_space DESC
    LIMIT 1 BY db, t, part_name -- keep the candidate disk with the most free space
) AS sub
FORMAT TSVRaw
```
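
After running some of the generated `MOVE PART` statements, it may be worth checking how the active parts are now spread across disks. The query below is a simple sketch of such a check; it reads only `system.parts` and is not part of the original procedure.

```sql
-- Active parts and their total size per table and disk.
SELECT
    database,
    `table`,
    disk_name,
    count() AS parts,
    formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE active
GROUP BY database, `table`, disk_name
ORDER BY database, `table`, sum(bytes_on_disk) DESC;
```

Re-running the candidate query afterwards should produce fewer statements as the disks even out, since a move is only proposed when the target disk has more than twice the part's size in free space.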