How Small Time?
Clickhouse powers the back-end for Old School Snitch, a plugin for the game Old School Runescape which tracks individual XP gains, item drops, and in-game character locations for roughly 100 users. Old School Snitch is a hobby project of mine, so I live with minor downtime during updates and there is no HA strategy, and even over a year after my previous blog post discussing Old School Snitch and Timescale, the database size is still under a single gigabyte. This post is a collection of my thoughts on using Clickhouse under those conditions.
The Data Set
Old School Snitch collects small, event-driven bursts of data, inserted into Clickhouse as they are received.
There are 4 types of data that Old School Snitch tracks:
- XP Gains
- NPC Kills
- Item Drops
- Player Locations
Each of these gets its own table, and the structure follows a pattern of timestamp, player_id, then 3-4 columns of the relevant data.
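For a concrete picture, a raw event table looks roughly like this. This is a sketch only: the item_drops columns are the ones that appear later in this post, but the types and ordering key here are illustrative.

```sql
-- A sketch of the shape of a raw event table (not the exact DDL):
-- a timestamp, the player, then a few event-specific columns.
CREATE TABLE item_drops
(
    timestamp            DateTime64(6, 'UTC'),
    player_id            UUID,
    item_id              UInt32,
    amount               UInt64,
    grand_exchange_value UInt64,
    high_alch_value      UInt64,
    npc_id               Nullable(UInt32)
) ENGINE = MergeTree
ORDER BY (player_id, timestamp);
```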
You can see how my player's data is displayed here. Importantly, the most granular display of the data (outside of locations) is shown at an hourly level.
The Hardware
All of the services for Old School Snitch including the Clickhouse server are running on an 8-core dedicated server with 64 Gigs of RAM and 2x450 GB SSDs for storage. This is vastly over-provisioned for what I actually use. CPU usage never goes above 5%.
How do I use Clickhouse?
I generally use the same techniques discussed in my previous post using Timescale.
I use async_insert to handle automatic batching of inserts (batching is practically required in one form or another with Clickhouse due to the I/O overhead of any individual insert). This means there is a lag of several seconds from when an insert occurs to when it ends up in the database. Since I use Clickhouse for storing historical data, the delay is not noticeable in normal usage. On the service side, async_insert lets me treat inserts as individual; I don't need to think about batching at all.
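Turning it on is just a setting on the insert (or on the connection/user profile). A minimal sketch against my item_drops table, with made-up values:

```sql
-- async_insert buffers rows server-side and flushes them as a batch.
-- wait_for_async_insert = 0 returns as soon as the row is buffered,
-- rather than waiting for the batch to be written to a part.
INSERT INTO item_drops (timestamp, player_id, item_id, amount)
SETTINGS async_insert = 1, wait_for_async_insert = 0
VALUES (now64(6), generateUUIDv4(), 4151, 1);
```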
I utilize compression to great effect. Clickhouse allows for far greater control over how the data is compressed, down to per-column codec choices. I am quite willing to believe that I made a poor choice of codec somewhere, resulting in slightly worse performance than I got previously in Timescale.
| Table Name | Before Compression | After Compression | Compression Ratio |
|---|---|---|---|
| xp_drops | 872.44 MiB | 168.20 MiB | 81% |
| item_drops | 197.25 MiB | 29.20 MiB | 85% |
| npc_kills | 84.57 MiB | 5.48 MiB | 94% |
| locations | 4.28 GiB | 592.31 MiB | 87% |
Migrating to Clickhouse got me at least a 2x improvement in storage usage in my largest tables: 210MB -> 52 MiB for XP Drops, 550MB -> 230MiB for character locations (I know I'm comparing MB to MiB, but those are the units Timescale and Clickhouse reported to me. If Hard Drive companies can pretend they are equivalent, it's good enough for me).
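The per-table numbers above come straight from Clickhouse's system tables. A query along these lines against system.parts reproduces them (my phrasing of it, not necessarily the exact query I ran):

```sql
-- Compressed vs. uncompressed bytes per table, counting only active parts.
SELECT
    table,
    formatReadableSize(sum(data_uncompressed_bytes)) AS before_compression,
    formatReadableSize(sum(data_compressed_bytes))   AS after_compression,
    round(100 - 100 * sum(data_compressed_bytes) / sum(data_uncompressed_bytes)) AS percent_saved
FROM system.parts
WHERE active AND database = currentDatabase()
GROUP BY table
ORDER BY sum(data_compressed_bytes) DESC;
```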
Alongside compression, I also have Clickhouse automatically roll up the data into an hourly form to improve query performance. To accomplish this in Clickhouse I use the AggregatingMergeTree table engine and Incremental Materialized Views. My Clickhouse migrations had to be more verbose than their Timescale equivalents, but they allow for a lot more customization should your use case require it.
As an example, here are the migrations to create the hourly table for item drops, first in Timescale and then in Clickhouse.

The Timescale migration:
execute("""
CREATE MATERIALIZED VIEW item_drops_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', timestamp) as hourly, player_id, item_id, sum(amount) as amount
FROM item_drops
GROUP BY hourly, player_id, item_id;
""")Timescale Migration
# Create item_drops_hourly table
execute("""
CREATE TABLE item_drops_hourly
(
hourly DateTime64(6, 'UTC') CODEC (Delta, ZSTD),
player_id UUID CODEC (ZSTD),
item_id UInt32 CODEC (ZSTD),
amount SimpleAggregateFunction(sum, UInt64) CODEC (ZSTD),
grand_exchange_value SimpleAggregateFunction(sum, UInt64) CODEC (ZSTD),
high_alch_value SimpleAggregateFunction(sum, UInt64) CODEC (ZSTD),
npc_id Nullable(UInt32)
) ENGINE = AggregatingMergeTree
PRIMARY KEY (player_id, hourly, item_id, npc_id)
SETTINGS allow_nullable_key=1;
""")
# Create item_drops_hourly_mv materialized view
execute("""
CREATE MATERIALIZED VIEW item_drops_hourly_mv TO item_drops_hourly
AS
SELECT toStartOfHour(timestamp) as hourly,
player_id,
item_id,
amount as amount,
sum(grand_exchange_value) as grand_exchange_value,
sum(high_alch_value) as high_alch_value,
npc_id
FROM item_drops
GROUP BY hourly, player_id, item_id, amount, npc_id;
""")Clickhouse Migration
With Clickhouse I had to define the _hourly table that stores the results of the materialized view myself, while Timescale infers the table's structure from the query.
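One consequence of this design worth noting: parts are only collapsed in the background, so queries against the rollup table still have to re-aggregate rather than reading rows as-is. A read ends up looking something like this (a sketch, using Clickhouse's query parameter syntax):

```sql
-- Rows for the same (player, hour, item) may still sit in unmerged parts,
-- so the query sums them at read time.
SELECT hourly,
       item_id,
       sum(amount) AS amount
FROM item_drops_hourly
WHERE player_id = {player_id:UUID}
GROUP BY hourly, item_id
ORDER BY hourly;
```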
The last of the Clickhouse features I leveraged was Dictionaries, which hold all of the static information that I previously JOINed against in Timescale: data such as item IDs and names, or NPC information like combat level. Alongside having to create a table in Clickhouse to act as the Dictionary's data source (which means duplicating data), I also have to remember to refresh the dictionary whenever a game update changes the underlying data.
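The setup looks roughly like this (a sketch with illustrative table, dictionary, and column names, not my exact schema):

```sql
-- Source table holding the static item data the dictionary reads from.
CREATE TABLE items
(
    item_id UInt64,
    name    String
) ENGINE = MergeTree
ORDER BY item_id;

-- The dictionary itself. LIFETIME(0) disables automatic refreshes,
-- which is why it has to be reloaded by hand after a game update.
CREATE DICTIONARY items_dict
(
    item_id UInt64,
    name    String
)
PRIMARY KEY item_id
SOURCE(CLICKHOUSE(TABLE 'items'))
LAYOUT(HASHED())
LIFETIME(0);

-- Look the name up at query time instead of JOINing:
SELECT dictGet('items_dict', 'name', toUInt64(item_id)) AS item_name,
       sum(amount) AS total_amount
FROM item_drops_hourly
GROUP BY item_name;

-- After the source table changes with a game update:
SYSTEM RELOAD DICTIONARY items_dict;
```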
The Development Experience
Overall, Clickhouse has a great developer experience. Their documentation is quite thorough. I had some minor stumbling blocks using aggregate functions for the hourly views, but that is going to happen with any new system.
In theory, the extra work required to accomplish common tasks compared to Timescale would be annoying, but the extra customization it makes possible is clearly a major upside.
I have had no issues composing my queries or migrations with Clickhouse's additions to SQL. Just be aware you will be using different functions for more complex queries compared to Postgres. Plausible's ecto_ch library works great and lets me use Ecto's syntax for most of my queries. The closest I have to a complaint is that streaming query results isn't as simple as it is with other Ecto adapters.
Why the Switch from Timescale?
There was no pressing technical reason to switch from Timescale to Clickhouse. Performance was great, and I never ran into any issue with Timescale itself that forced me to migrate. Truthfully, my primary reason is that I wanted to give Clickhouse a try. My secondary reason was a smattering of annoyances with the administration of Timescale and its documentation.
Administration-wise, my complaints are the same as in my previous article (and covered in more detail there). Updates are annoying since they require restarting Postgres and running the update queries before any other query runs (which means stopping the Old School Snitch service). With Clickhouse I update it through my package manager and restart the service, with no downtime beyond some queries requiring a retry. The switch to Clickhouse also means I have a much easier time handling major Postgres upgrades, since Postgres now only contains the non-game data and is much faster to backup/restore for an upgrade (and I could even use logical replication to handle a clean swap-over).
My other complaint continues to be Timescale's documentation. In my experience, Clickhouse has much more detailed documentation that is also organized a bit better. My primary issue with Timescale's is that much of the administration documentation is written for their managed service. The self-hosted documentation is off hiding in its own section and not colocated with the main documentation. Compare this to Clickhouse, which has a whole section dedicated to local server administration that is clearly differentiated from their Cloud service documentation.
Conclusion
I switched from Timescale to Clickhouse mostly as an excuse to try Clickhouse out. The migration to Clickhouse has worked out wonderfully for Old School Snitch. I gained improved performance and more capability at the minor cost of some extra work.
Clickhouse and Timescale both update pretty frequently with significant improvements and new features. With Clickhouse it is a lot easier to upgrade and take advantage of those new features. Plus, with the majority of my data no longer in Postgres, I can be confident that major Postgres upgrades will be quicker and easier to deal with.
Timescale worked well, but Clickhouse has provided better performance and made running the service easier.
140k Mining XP was gained during the composition of this post.