How to use multiple disks in ClickHouse

ClickHouse can store data on different storage backends, including local disks and remote ones such as Amazon S3. It’s quite common to have multiple storage devices when dealing with a lot of data.

Multiple tables on different disks in ClickHouse

ClickHouse can work with multiple disks on the same server, which makes it easy to scale beyond a single local storage device.

Configure multiple disks

First, we need to list all of our local disks in the configuration, so that ClickHouse knows what it can work with. The preferred way is to create a new XML file in the /etc/clickhouse-server/config.d directory (disks.xml in our case). Let’s say we have 3 local storage devices (disks): one default (already used by the system) and two that we want to use in ClickHouse.

Before adding the disks to ClickHouse, we should create a data folder on each of them (named clickhouse in our case) and make the clickhouse user its owner:

mkdir /mnt/disk2/clickhouse
chown clickhouse:clickhouse /mnt/disk2/clickhouse
mkdir /mnt/disk3/clickhouse
chown clickhouse:clickhouse /mnt/disk3/clickhouse

Now we can register our disks using the following configuration (in /etc/clickhouse-server/config.d/disks.xml file):

<clickhouse>
  <storage_configuration>
    <disks>
      <d2><type>local</type><path>/mnt/disk2/clickhouse/</path></d2>
      <d3><type>local</type><path>/mnt/disk3/clickhouse/</path></d3>
    </disks>
    <policies>
      <d2_main><volumes><main><disk>d2</disk></main></volumes></d2_main>
      <d3_main><volumes><main><disk>d3</disk></main></volumes></d3_main>
    </policies>
  </storage_configuration>
</clickhouse>

There is no need to restart the ClickHouse server, since it picks up configuration updates and loads them automatically in the background. To make sure the disks are available to ClickHouse, we can look at the system.disks table:

SELECT * FROM system.disks\G
Row 1:
──────
name:             d2
path:             /mnt/disk2/clickhouse/
...

Row 2:
──────
name:             d3
path:             /mnt/disk3/clickhouse/
...
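
The storage policies can be checked in a similar way. As a minimal sketch, the query below lists the policies ClickHouse has loaded, together with their volumes and disks. It relies only on the system.storage_policies table and on the policy names from the configuration above (d2_main and d3_main):

```sql
-- List the storage policies known to the server with their volumes and disks.
-- d2_main and d3_main should be listed alongside the built-in default policy.
SELECT policy_name, volume_name, disks
FROM system.storage_policies
```

If a policy is missing here, the configuration file was most likely not picked up or contains an error.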

Using different disks for different tables

Now we can choose which disk a table is stored on by setting a storage policy when creating it:

CREATE TABLE some_table ( `some_column` String )
ENGINE = MergeTree ORDER BY some_column
SETTINGS storage_policy = 'd3_main'
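
To confirm that the table really lands on the intended disk, one option is to insert a row and look at where its data part was written. This is only a sketch: the inserted value is made up for illustration, and the query assumes the table lives in the current database.

```sql
-- Write one sample row so that at least one data part exists.
INSERT INTO some_table (some_column) VALUES ('hello');

-- Show which disk and path each active part of the table is stored on.
-- With storage_policy = 'd3_main', disk_name is expected to be d3.
SELECT name, disk_name, path
FROM system.parts
WHERE database = currentDatabase() AND table = 'some_table' AND active
```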

That’s it. Now ClickHouse will automatically use the d3 disk to write and read some_table data. Note that ClickHouse can also move data between disks automatically, according to “hot/cold” storage policies.
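
As a rough illustration of the “hot/cold” idea, the sketch below moves rows to a slower volume once they are older than 30 days. Everything named here is an assumption and not part of the setup above: it presumes a hypothetical storage policy called hot_cold with a volume called cold has been defined in the storage configuration, and the table name, columns and 30-day threshold are made up for the example.

```sql
-- Hypothetical table that uses an assumed 'hot_cold' storage policy.
-- New parts are written to the policy's first volume; the TTL rule
-- moves data older than 30 days to the assumed 'cold' volume.
CREATE TABLE events_tiered
(
    event_time DateTime,
    message String
)
ENGINE = MergeTree
ORDER BY event_time
TTL event_time + INTERVAL 30 DAY TO VOLUME 'cold'
SETTINGS storage_policy = 'hot_cold'
```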

Published a year ago in #data about #clickhouse by Denys Golotiuk

Denys Golotiuk in 2024

I'm a database architect, focused on OLAP databases for storing & processing large amounts of data in an efficient manner. Let me know if you have questions or need help.

Contact me via golotyuk@gmail.com or mrcrypster @ github