ClickHouse 🏠

Real-time analytics on billions of rows. Sub-second queries. No indexes needed.

Setup

On first use, read setup.md for connection configuration.

When to Use

User needs OLAP analytics, log analysis, time-series data, or real-time dashboards. Agent handles schema design, query optimization, data ingestion, and cluster administration.

Architecture

Memory lives in ~/clickhouse/. See memory-template.md for structure.

CODEBLOCK0

Quick Reference

Topic	File
Setup & connection	INLINECODE3
Memory template

Core Rules

1. Always Specify Engine

Every table needs an explicit engine. Default to MergeTree family:

CODEBLOCK1

2. ORDER BY is Your Index

ClickHouse has no traditional indexes. The ORDER BY clause determines data layout:

- Put high-cardinality filter columns first
Put range columns (dates, timestamps) early
Match your most common WHERE patterns

CODEBLOCK2

3. Use Appropriate Data Types

Use Case	Type	Why
Timestamps	INLINECODE9 or INLINECODE10	Native time functions
Low-cardinality strings

4. Batch Inserts

Never insert row-by-row. ClickHouse is optimized for batch writes:

CODEBLOCK3

Minimum batch: 1,000 rows. Optimal: 10,000-100,000 rows.

5. Prewarm Queries with FINAL

Queries on ReplacingMergeTree/CollapsingMergeTree need FINAL for accuracy:

CODEBLOCK4

INLINECODE18 has performance cost. For dashboards, consider materialized views.

6. Materialized Views for Speed

Pre-aggregate expensive computations:

CODEBLOCK5

7. Check System Tables First

Before debugging, check system tables:

CODEBLOCK6

Common Traps

- String instead of LowCardinality → 10x larger storage for status/type columns
Wrong ORDER BY → Full table scans instead of index lookups
Row-by-row inserts → Massive part fragmentation, slow writes
Missing TTL → Unbounded table growth, disk full
SELECT * → Reads all columns, kills columnar advantage
Nullable everywhere → Overhead + NULL handling complexity
Forgetting FINAL → Stale/duplicate data in merge tables

Performance Checklist

Before running expensive queries:

1. Check EXPLAIN: EXPLAIN SELECT ... shows execution plan
Sample first: SELECT ... FROM table SAMPLE 0.01 for 1% sample
Limit columns: Only SELECT what you need
Use PREWHERE: Filters before reading all columns
Check parts: INLINECODE21

CODEBLOCK7

Cluster Administration

Adding TTL for Data Retention

CODEBLOCK8

Monitoring Disk Usage

CODEBLOCK9

External Endpoints

Endpoint	Data Sent	Purpose
localhost:8123	SQL queries	HTTP interface
localhost:9000

SQL queries | Native TCP interface |

No external services contacted. All queries run against user-specified ClickHouse instances.

Security & Privacy

Data saved locally (with user consent):

- Connection profiles (host, port, database) in ~/clickhouse/memory.md
Query patterns and schema documentation
Authentication method preferences (password vs certificate)

Important: If you provide database passwords, they are stored in plain text in ~/clickhouse/. Consider using environment variables or connection profiles managed by clickhouse-client instead.

This skill does NOT:

- Connect to any ClickHouse without explicit user configuration
Send data to external services
Automatically collect or store credentials without asking

Related Skills

Install with clawhub install <slug> if user confirms:

- sql — SQL query patterns
INLINECODE24 — data analysis workflows
INLINECODE25 — structured data exploration

Feedback

- If useful: INLINECODE26
Stay updated: INLINECODE27

ClickHouse 🏠

对数十亿行数据进行实时分析。亚秒级查询。无需索引。

设置

首次使用时，请阅读 setup.md 了解连接配置。

使用场景

用户需要OLAP分析、日志分析、时序数据或实时仪表盘。智能体负责处理模式设计、查询优化、数据摄入和集群管理。

架构

内存文件位于 ~/clickhouse/。结构请参考 memory-template.md。

~/clickhouse/
├── memory.md # 连接配置 + 查询模式
├── schemas/ # 各数据库的表定义
└── queries/ # 保存的分析查询

快速参考

主题	文件
设置与连接	setup.md
内存模板

核心规则

1. 始终指定引擎

每个表都需要显式指定引擎。默认使用MergeTree系列：

sql
-- 时序数据/日志
CREATE TABLE events (
timestamp DateTime,
event_type String,
data String
) ENGINE = MergeTree()
ORDER BY (timestamp, event_type);

-- 聚合指标
CREATE TABLE daily_stats (
date Date,
metric String,
value AggregateFunction(sum, UInt64)
) ENGINE = AggregatingMergeTree()
ORDER BY (date, metric);

2. ORDER BY 就是你的索引

ClickHouse没有传统索引。ORDER BY 子句决定了数据布局：

- 将高基数的过滤列放在前面
将范围列（日期、时间戳）放在前面
匹配最常见的WHERE模式

sql
-- 好：先按user_id过滤，再按日期范围
ORDER BY (userid, date, eventtype)

-- 差：按user_id过滤时却把date放在前面
ORDER BY (date, userid, eventtype)

3. 使用合适的数据类型

使用场景	类型	原因
时间戳	DateTime 或 DateTime64	原生时间函数
低基数字符串

4. 批量插入

永远不要逐行插入。ClickHouse针对批量写入进行了优化：

bash

好：批量插入

clickhouse-client --query=INSERT INTO events FORMAT JSONEachRow < batch.json

差：循环逐行插入

for row in data: INSERT INTO events VALUES (...)

最小批次：1,000行。最佳批次：10,000-100,000行。

5. 使用FINAL预热查询

对ReplacingMergeTree/CollapsingMergeTree的查询需要使用 FINAL 确保准确性：

sql
-- 可能返回重复/旧版本
SELECT * FROM users WHERE id = 123;

-- 保证返回最新版本
SELECT * FROM users FINAL WHERE id = 123;

FINAL 有性能开销。对于仪表盘，考虑使用物化视图。

6. 使用物化视图提升速度

预聚合昂贵的计算：

sql
CREATE MATERIALIZED VIEW hourly_events
ENGINE = SummingMergeTree()
ORDER BY (hour, event_type)
AS SELECT
toStartOfHour(timestamp) AS hour,
event_type,
count() AS events
FROM events
GROUP BY hour, event_type;

7. 优先检查系统表

在调试之前，先检查系统表：

sql
-- 正在运行的查询
SELECT * FROM system.processes;

-- 最近的查询性能
SELECT query, elapsed, readrows, memoryusage
FROM system.query_log
WHERE type = QueryFinish
ORDER BY event_time DESC
LIMIT 10;

-- 表大小
SELECT database, table, formatReadableSize(total_bytes) as size
FROM system.tables
ORDER BY total_bytes DESC;

常见陷阱

- 使用String而非LowCardinality → 状态/类型列存储空间增大10倍
错误的ORDER BY → 全表扫描而非索引查找
逐行插入 → 大量分区碎片，写入缓慢
缺少TTL → 表无限增长，磁盘满
SELECT * → 读取所有列，丧失列式优势
到处使用Nullable → 开销 + NULL处理复杂性
忘记FINAL → 合并表中出现过期/重复数据

性能检查清单

在执行昂贵查询之前：

1. 检查EXPLAIN：EXPLAIN SELECT ... 显示执行计划
先采样：SELECT ... FROM table SAMPLE 0.01 获取1%样本
限制列数：只SELECT需要的列
使用PREWHERE：在读取所有列之前进行过滤
检查分区：SELECT count() FROM system.parts WHERE table=X

sql
-- PREWHERE优化
SELECT userid, eventtype, data
FROM events
PREWHERE date = today()
WHERE event_type = click;

集群管理

添加TTL进行数据保留

sql
-- 删除旧数据
ALTER TABLE events
MODIFY TTL timestamp + INTERVAL 90 DAY;

-- 移动到冷存储
ALTER TABLE events
MODIFY TTL timestamp + INTERVAL 30 DAY TO VOLUME cold;

监控磁盘使用

sql
SELECT
database,
table,
formatReadableSize(sum(bytesondisk)) as disk_size,
sum(rows) as total_rows,
count() as parts
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY sum(bytesondisk) DESC;

外部端点

端点	发送数据	用途
localhost:8123	SQL查询	HTTP接口
localhost:9000

SQL查询 | 原生TCP接口 |

不联系任何外部服务。所有查询都在用户指定的ClickHouse实例上运行。

安全与隐私

本地保存的数据（经用户同意）：

- 连接配置（主机、端口、数据库）保存在 ~/clickhouse/memory.md
查询模式和模式文档
认证方式偏好（密码 vs 证书）

重要提示： 如果您提供数据库密码，它们将以明文形式存储在 ~/clickhouse/ 中。建议使用环境变量或由clickhouse-client管理的连接配置。

此技能不会：

- 在没有用户明确配置的情况下连接到任何ClickHouse
将数据发送到外部服务
未经询问自动收集或存储凭据

反馈

- 如果觉得有用：clawhub star clickhouse
保持更新：clawhub sync

ClickHouseClickHouse管理

ClickHouse

ClickHouse 🏠

Setup

When to Use

Architecture

Quick Reference

Core Rules

1. Always Specify Engine

2. ORDER BY is Your Index

3. Use Appropriate Data Types

4. Batch Inserts

5. Prewarm Queries with FINAL

6. Materialized Views for Speed

7. Check System Tables First

Common Traps

Performance Checklist

Cluster Administration

Adding TTL for Data Retention

Monitoring Disk Usage

External Endpoints

Security & Privacy

Related Skills

Feedback

ClickHouse 🏠

设置

使用场景

架构

快速参考

核心规则

1. 始终指定引擎

2. ORDER BY 就是你的索引

3. 使用合适的数据类型

4. 批量插入

好：批量插入

差：循环逐行插入

5. 使用FINAL预热查询

6. 使用物化视图提升速度

7. 优先检查系统表

常见陷阱

性能检查清单

集群管理

添加TTL进行数据保留

监控磁盘使用

外部端点

安全与隐私

相关技能

反馈

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载

相关推荐

self-improvement

self-improvement

self-improvement

self-improvement