clickhouse-best-practices

28 reglas de mejores prácticas de ClickHouse organizadas por diseño de esquema, optimización de consultas y estrategia de ingesta de datos. Cubre tres áreas críticas: selección de clave primaria y tipo de datos (decisiones de diseño inmutables), optimización de JOIN y consultas, y agrupación de inserciones con evitación de mutaciones. Incluye 28 reglas priorizadas por impacto, con reglas de diseño de esquema y optimización de consultas marcadas como CRÍTICAS debido al almacenamiento columnar y los mecanismos de índices dispersos de ClickHouse. Proporciona procedimientos de revisión estructurados para...

npx skills add https://github.com/clickhouse/agent-skills --skill clickhouse-best-practices

ClickHouse Best Practices

Comprehensive guidance for ClickHouse covering schema design, query optimization, data ingestion, and AI agent connectivity. Contains 31 rules across 4 main categories (schema, query, insert, agent), prioritized by impact.

Official docs: ClickHouse Best Practices

IMPORTANT: How to Apply This Skill

Before answering ClickHouse questions, follow this priority order:

  1. Check for applicable rules in the rules/ directory
  2. If rules exist: Apply them and cite them in your response using "Per rule-name..."
  3. If no rule exists: Use the LLM's ClickHouse knowledge or search documentation
  4. If uncertain: Use web search for current best practices
  5. Always cite your source: rule name, "general ClickHouse guidance", or URL

Why rules take priority: ClickHouse has specific behaviors (columnar storage, sparse indexes, merge tree mechanics) where general database intuition can be misleading. The rules encode validated, ClickHouse-specific guidance.


Agent Connectivity & Query Workflow

Before querying ClickHouse, agents must establish a connection and follow the discovery workflow:

  1. rules/agent-connect-mcp.md - Connection setup (MCP + CLI), credential discovery, output format selection
  2. rules/agent-discovery-schema.md - CRITICAL: 7-step schema discovery workflow
  3. rules/agent-query-safety.md - CRITICAL: LIMIT, timeouts, progressive exploration

Every agent session should follow this sequence:

  1. Connect — establish connection via MCP or CLI (see agent-connect-mcp)
  2. Discover — databases → tables → columns + comments → sort keys → skip indexes → sample → EXPLAIN
  3. Plan — use sort key and skip index knowledge to write efficient WHERE clauses
  4. Execute — run queries with LIMIT and timeouts
  5. Recover — on timeout/memory errors, narrow filters and retry (see agent-query-safety)

Subagent architecture notes

If your system dispatches ClickHouse tasks to specialized subagents:

  • Schema discovery + query execution: any model — the steps are procedural
  • EXPLAIN analysis + query optimization: benefits from mid-tier reasoning
  • Schema design review against all 28 rules: benefits from mid-tier reasoning

Review Procedures

For Schema Reviews (CREATE TABLE, ALTER TABLE)

Read these rule files in order:

  1. rules/schema-pk-plan-before-creation.md - ORDER BY is immutable
  2. rules/schema-pk-cardinality-order.md - Column ordering in keys
  3. rules/schema-pk-prioritize-filters.md - Filter column inclusion
  4. rules/schema-types-native-types.md - Proper type selection
  5. rules/schema-types-minimize-bitwidth.md - Numeric type sizing
  6. rules/schema-types-lowcardinality.md - LowCardinality usage
  7. rules/schema-types-avoid-nullable.md - Nullable vs DEFAULT
  8. rules/schema-partition-low-cardinality.md - Partition count limits
  9. rules/schema-partition-lifecycle.md - Partitioning purpose

Check for:

  • PRIMARY KEY / ORDER BY column order (low-to-high cardinality)
  • Data types match actual data ranges
  • LowCardinality applied to appropriate string columns
  • Partition key cardinality bounded (100-1,000 values)
  • ReplacingMergeTree has version column if used

For Query Reviews (SELECT, JOIN, aggregations)

Read these rule files:

  1. rules/query-join-choose-algorithm.md - Algorithm selection
  2. rules/query-join-filter-before.md - Pre-join filtering
  3. rules/query-join-use-any.md - ANY vs regular JOIN
  4. rules/query-index-skipping-indices.md - Secondary index usage
  5. rules/schema-pk-filter-on-orderby.md - Filter alignment with ORDER BY

Check for:

  • Filters use ORDER BY prefix columns
  • JOINs filter tables before joining (not after)
  • Correct JOIN algorithm for table sizes
  • Skipping indices for non-ORDER BY filter columns

For Insert Strategy Reviews (data ingestion, updates, deletes)

Read these rule files:

  1. rules/insert-batch-size.md - Batch sizing requirements
  2. rules/insert-mutation-avoid-update.md - UPDATE alternatives
  3. rules/insert-mutation-avoid-delete.md - DELETE alternatives
  4. rules/insert-async-small-batches.md - Async insert usage
  5. rules/insert-optimize-avoid-final.md - OPTIMIZE TABLE risks

Check for:

  • Batch size 10K-100K rows per INSERT
  • No ALTER TABLE UPDATE for frequent changes
  • ReplacingMergeTree or CollapsingMergeTree for update patterns
  • Async inserts enabled for high-frequency small batches

Output Format

Structure your response as follows:

## Rules Checked
- `rule-name-1` - Compliant / Violation found
- `rule-name-2` - Compliant / Violation found
...

## Findings

### Violations
- **`rule-name`**: Description of the issue
  - Current: [what the code does]
  - Required: [what it should do]
  - Fix: [specific correction]

### Compliant
- `rule-name`: Brief note on why it's correct

## Recommendations
[Prioritized list of changes, citing rules]

Rule Categories by Priority

PriorityCategoryImpactPrefixRule Count
1Primary Key SelectionCRITICALschema-pk-4
2Data Type SelectionCRITICALschema-types-5
3JOIN OptimizationCRITICALquery-join-5
4Insert BatchingCRITICALinsert-batch-1
5Mutation AvoidanceCRITICALinsert-mutation-2
6Partitioning StrategyHIGHschema-partition-4
7Skipping IndicesHIGHquery-index-1
8Materialized ViewsHIGHquery-mv-2
9Async InsertsHIGHinsert-async-2
10OPTIMIZE AvoidanceHIGHinsert-optimize-1
11JSON UsageMEDIUMschema-json-1
12Agent Schema DiscoveryCRITICALagent-discovery-1
13Agent Query SafetyCRITICALagent-query-1
14Agent Connectivity + FormatsHIGHagent-connect-1

Quick Reference

Schema Design - Primary Key (CRITICAL)

  • schema-pk-plan-before-creation - Plan ORDER BY before table creation (immutable)
  • schema-pk-cardinality-order - Order columns low-to-high cardinality
  • schema-pk-prioritize-filters - Include frequently filtered columns
  • schema-pk-filter-on-orderby - Query filters must use ORDER BY prefix

Schema Design - Data Types (CRITICAL)

  • schema-types-native-types - Use native types, not String for everything
  • schema-types-minimize-bitwidth - Use smallest numeric type that fits
  • schema-types-lowcardinality - LowCardinality for <10K unique strings
  • schema-types-enum - Enum for finite value sets with validation
  • schema-types-avoid-nullable - Avoid Nullable; use DEFAULT instead

Schema Design - Partitioning (HIGH)

  • schema-partition-low-cardinality - Keep partition count 100-1,000
  • schema-partition-lifecycle - Use partitioning for data lifecycle, not queries
  • schema-partition-query-tradeoffs - Understand partition pruning trade-offs
  • schema-partition-start-without - Consider starting without partitioning

Schema Design - JSON (MEDIUM)

  • schema-json-when-to-use - JSON for dynamic schemas; typed columns for known

Query Optimization - JOINs (CRITICAL)

  • query-join-choose-algorithm - Select algorithm based on table sizes
  • query-join-use-any - ANY JOIN when only one match needed
  • query-join-filter-before - Filter tables before joining
  • query-join-consider-alternatives - Dictionaries/denormalization vs JOIN
  • query-join-null-handling - join_use_nulls=0 for default values

Query Optimization - Indices (HIGH)

  • query-index-skipping-indices - Skipping indices for non-ORDER BY filters

Query Optimization - Materialized Views (HIGH)

  • query-mv-incremental - Incremental MVs for real-time aggregations
  • query-mv-refreshable - Refreshable MVs for complex joins

Insert Strategy - Batching (CRITICAL)

  • insert-batch-size - Batch 10K-100K rows per INSERT

Insert Strategy - Async (HIGH)

  • insert-async-small-batches - Async inserts for high-frequency small batches
  • insert-format-native - Native format for best performance

Insert Strategy - Mutations (CRITICAL)

  • insert-mutation-avoid-update - ReplacingMergeTree instead of ALTER UPDATE
  • insert-mutation-avoid-delete - Lightweight DELETE or DROP PARTITION

Insert Strategy - Optimization (HIGH)

  • insert-optimize-avoid-final - Let background merges work

Agent Integration - Discovery (CRITICAL)

  • agent-discovery-schema - Always discover schema before querying

Agent Integration - Safety (CRITICAL)

  • agent-query-safety - LIMIT, timeouts, progressive exploration

Agent Integration - Connectivity + Formats (HIGH)

  • agent-connect-mcp - MCP + CLI setup, credential discovery, output format selection

When to Apply

This skill activates when you encounter:

  • AI agent connecting to ClickHouse (MCP, CLI, HTTP)

  • Agent workflow design for ClickHouse

  • Schema discovery or exploration requests

  • CREATE TABLE statements

  • ALTER TABLE modifications

  • ORDER BY or PRIMARY KEY discussions

  • Data type selection questions

  • Slow query troubleshooting

  • JOIN optimization requests

  • Data ingestion pipeline design

  • Update/delete strategy questions

  • ReplacingMergeTree or other specialized engine usage

  • Partitioning strategy decisions


Rule File Structure

Each rule file in rules/ contains:

  • YAML frontmatter: title, impact level, tags
  • Brief explanation: Why this rule matters
  • Incorrect example: Anti-pattern with explanation
  • Correct example: Best practice with explanation
  • Additional context: Trade-offs, when to apply, references

Full Compiled Document

For the complete guide with all rules expanded inline: AGENTS.md

Use AGENTS.md when you need to check multiple rules quickly without reading individual files.

Más skills de clickhouse

chdb-sql
clickhouse
Ejecuta ClickHouse SQL directamente en Python, sin necesidad de servidor. Consulta archivos locales, bases de datos remotas y almacenamiento en la nube con todo el poder de ClickHouse SQL.
official
chdb-datastore
clickhouse
DataStore es un reemplazo perezoso de pandas respaldado por ClickHouse. Tu código existente de pandas funciona sin cambios, pero las operaciones se compilan a SQL optimizado y se ejecutan solo cuando se necesitan los resultados (por ejemplo, print(), len(), iteración).
official
clickhouse-architecture-advisor
clickhouse
DEBE USARSE al diseñar arquitecturas de ClickHouse, al seleccionar entre patrones de ingesta o modelado, o al traducir mejores prácticas en sistemas específicos de cargas de trabajo…
official
clickhousectl-cloud-deploy
clickhouse
Usar cuando un usuario quiera implementar ClickHouse en la nube, pasar a producción, usar ClickHouse Cloud, alojar un servicio administrado de ClickHouse o migrar desde un entorno local…
official
clickhousectl-local-dev
clickhouse
Úsese cuando un usuario quiera construir una aplicación con ClickHouse, configurar un entorno de desarrollo local de ClickHouse, instalar ClickHouse, crear un servidor local,…
official
setup
clickhouse
Guía a los usuarios en la configuración de la conexión del servidor ClickHouse MCP incluido con este plugin. Úsalo cuando el usuario instale el plugin por primera vez o tenga problemas…
official
clickhouse-js-node-coding
clickhouse
Referencia: https://clickhouse.com/docs/integrations/javascript
official
clickhouse-js-node-troubleshooting
clickhouse
Referencia: https://clickhouse.com/docs/integrations/javascript
official