CodeClone
Structural code quality analysis for Python with baseline-aware CI governance, canonical reports, and a triage-first MCP control surface for agents and IDEs.
CodeClone provides deterministic structural code quality analysis for Python. It detects architectural duplication, computes quality metrics, and enforces CI gates — all with baseline-aware governance that separates known technical debt from new regressions. A triage-first MCP control surface exposes the same canonical pipeline to AI agents and IDEs.
Docs: orenlab.github.io/codeclone · Live sample report: orenlab.github.io/codeclone/examples/report/
[!NOTE] This README and docs site document the CodeClone 2.0 release line. For the previous 1.4.x line, see the v1.4.4 README and the v1.4.4 docs tree.
Features
- Clone detection — function (CFG fingerprint), block (statement windows), and segment (report-only) clones
- Structural findings — duplicated branch families, clone guard/exit divergence, and clone-cohort drift
- Quality metrics — cyclomatic complexity, coupling (CBO), cohesion (LCOM4), dependency cycles, adaptive depth profile, dead code, health score, and overloaded-module profiling
- Adoption & API — type/docstring annotation coverage, public API surface inventory and baseline diff
- Coverage Join — fuse external Cobertura XML into the current run to surface coverage hotspots and scope gaps
- Baseline governance — separates accepted legacy debt from new regressions; CI fails only on what changed
- Reports — interactive HTML, JSON, Markdown, SARIF, and text from one canonical report
- MCP control surface — triage-first agent and IDE interface over the same canonical pipeline; read-only by contract
- IDE & agent clients — VS Code extension, Claude Desktop bundle, and Codex plugin over the same MCP contract
- CI-first — deterministic output, stable ordering, exit code contract, pre-commit support
- Fast — incremental caching, parallel processing, warm-run optimization
Quick Start
uv tool install codeclone
codeclone . # analyze
codeclone . --html # HTML report
codeclone . --html --open-html-report # open in browser
codeclone . --json --md --sarif --text # all formats
codeclone . --ci # CI mode
More examples
# timestamped report snapshots
codeclone . --html --json --timestamped-report-paths
# changed-scope gating against git diff
codeclone . --changed-only --diff-against main
# shorthand: diff source for changed-scope review
codeclone . --paths-from-git-diff HEAD~1
Run without install
uvx codeclone@latest .
CI Integration
# 1. Generate baseline (commit to repo)
codeclone . --update-baseline
# 2. Add to CI pipeline
codeclone . --ci
What --ci enables
The --ci preset equals --fail-on-new --no-color --quiet.
When a trusted metrics baseline is loaded, CI mode also enables
--fail-on-new-metrics.
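The preset expansion described above can be sketched in a few lines of Python. This is only an illustration of the documented rule; the helper name `expand_ci_preset` and its boolean parameter are hypothetical and not part of CodeClone's API.

```python
def expand_ci_preset(trusted_metrics_baseline: bool) -> list[str]:
    """Return the flags that --ci stands for, per the description above."""
    # Documented equivalence: --ci == --fail-on-new --no-color --quiet
    flags = ["--fail-on-new", "--no-color", "--quiet"]
    # Documented extra: metrics regression gating only with a trusted metrics baseline
    if trusted_metrics_baseline:
        flags.append("--fail-on-new-metrics")
    return flags


print(expand_ci_preset(trusted_metrics_baseline=True))
```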
[!TIP] Run codeclone . --update-baseline once after install to establish your CI reference point. Commit the baseline file; it becomes the contract CI enforces on every push.
GitHub Action
CodeClone also ships a composite GitHub Action for PR and CI workflows:
- uses: orenlab/codeclone/.github/actions/codeclone@v2
  with:
    fail-on-new: "true"
    sarif: "true"
    pr-comment: "true"
It can:
- run baseline-aware gating
- generate JSON and SARIF reports
- upload SARIF to GitHub Code Scanning
- post or update a PR summary comment
Action docs: .github/actions/codeclone/README.md
Quality Gates
# Metrics thresholds
codeclone . --fail-complexity 20 --fail-coupling 10 --fail-cohesion 4 --fail-health 60
# Structural policies
codeclone . --fail-cycles --fail-dead-code
# Regression detection vs baseline
codeclone . --fail-on-new-metrics
# Adoption and API governance
codeclone . --min-typing-coverage 80 --min-docstring-coverage 60
codeclone . --fail-on-typing-regression --fail-on-docstring-regression
codeclone . --api-surface --update-metrics-baseline
codeclone . --fail-on-api-break
# Coverage Join — fuse external Cobertura XML into the review
codeclone . --coverage coverage.xml --fail-on-untested-hotspots --coverage-min 50
Gate details: Metrics and quality gates
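For readers unfamiliar with the Cobertura input that Coverage Join consumes, the sketch below reads per-file line rates from such a file using only the standard library. It does not reproduce how CodeClone defines hotspots or applies --coverage-min; the function name `low_coverage_files`, the percent interpretation of the threshold, and the sample XML are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def low_coverage_files(cobertura_xml: str, minimum_percent: float) -> list[tuple[str, float]]:
    """List (filename, line coverage %) for classes below the given threshold."""
    root = ET.fromstring(cobertura_xml)
    results = []
    # Cobertura reports carry one <class> element per source unit,
    # with a fractional "line-rate" attribute between 0 and 1.
    for cls in root.iter("class"):
        percent = float(cls.get("line-rate", "0")) * 100
        if percent < minimum_percent:
            results.append((cls.get("filename", ""), percent))
    return sorted(results)


# Hypothetical two-file report used only to exercise the function.
SAMPLE_XML = """
<coverage>
  <packages>
    <package name="app">
      <classes>
        <class name="a" filename="app/a.py" line-rate="0.25"/>
        <class name="b" filename="app/b.py" line-rate="1.0"/>
      </classes>
    </package>
  </packages>
</coverage>
"""

print(low_coverage_files(SAMPLE_XML, minimum_percent=50))
```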
Pre-commit
repos:
  - repo: local
    hooks:
      - id: codeclone
        name: CodeClone
        entry: codeclone
        language: system
        pass_filenames: false
        args: [ ".", "--ci" ]
        types: [ python ]
MCP Control Surface
Triage-first MCP server for AI agents and IDE clients, built on the same canonical pipeline as the CLI. Read-only by contract: never mutates source, baselines, or repo state.
uv tool install "codeclone[mcp]"
# or
uv pip install "codeclone[mcp]"
# local stdio clients
codeclone-mcp --transport stdio
# remote / HTTP-only clients
codeclone-mcp --transport streamable-http
[!WARNING] Analysis tools require an absolute repository root. Relative roots such as "." are rejected. Keep stdio as the default transport for local IDE and agent clients; HTTP exposure beyond loopback requires explicit --allow-remote.
MCP usage guide · MCP interface contract
Native Client Surfaces
| Surface | Location | Purpose |
|---|---|---|
| VS Code extension | VS Code Marketplace | Triage-first structural review in the editor |
| Claude Desktop bundle | extensions/claude-desktop-codeclone/ | Local .mcpb install with pre-loaded instructions |
| Codex plugin | plugins/codeclone/ | Native discovery, two skills, and MCP definition |
All three are native clients over the same codeclone-mcp contract — no second analysis engine.
VS Code extension docs · Claude Desktop docs · Codex plugin docs
Configuration
CodeClone can load project-level configuration from pyproject.toml:
[tool.codeclone]
min_loc = 10
min_stmt = 6
baseline = "codeclone.baseline.json"
golden_fixture_paths = ["tests/fixtures/golden_*"]
skip_metrics = false
quiet = false
html_out = ".cache/codeclone/report.html"
json_out = ".cache/codeclone/report.json"
md_out = ".cache/codeclone/report.md"
sarif_out = ".cache/codeclone/report.sarif"
text_out = ".cache/codeclone/report.txt"
block_min_loc = 20
block_min_stmt = 8
segment_min_loc = 20
segment_min_stmt = 10
Precedence: CLI flags > pyproject.toml > built-in defaults.
Config reference: Config and defaults
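The precedence rule above (CLI flags, then pyproject.toml, then built-in defaults) can be pictured as a layered lookup. The sketch below is not CodeClone's loader: `resolve_settings` is a hypothetical helper, and the default values are placeholders borrowed from the example config rather than the tool's real defaults.

```python
from collections import ChainMap

# Placeholder defaults for illustration; not CodeClone's actual built-in values.
BUILTIN_DEFAULTS = {"min_loc": 10, "min_stmt": 6, "quiet": False}


def resolve_settings(cli_flags: dict, pyproject: dict) -> dict:
    """Merge the three layers so that earlier layers win over later ones."""
    # CLI options that were not passed (None) must not mask lower layers.
    explicit_cli = {key: value for key, value in cli_flags.items() if value is not None}
    return dict(ChainMap(explicit_cli, pyproject, BUILTIN_DEFAULTS))


settings = resolve_settings(
    cli_flags={"min_loc": 15, "quiet": None},   # only min_loc given on the command line
    pyproject={"min_loc": 12, "quiet": True},   # values from [tool.codeclone]
)
print(settings)
```

Here `min_loc` comes from the CLI layer, `quiet` from pyproject.toml, and `min_stmt` falls through to the defaults.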
Baseline Workflow
Baselines capture the current duplication state. Once committed, they become the CI reference point.
- Clones are classified as NEW (not in baseline) or KNOWN (accepted debt)
- --update-baseline writes both clone and metrics snapshots
- Trust is verified via generator, fingerprint_version, and payload_sha256
- In --ci mode, an untrusted baseline is a contract error (exit 2)
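The NEW versus KNOWN classification above amounts to a set comparison between the current run and the committed baseline. The following is a minimal sketch under that reading; `classify_clones` and the fingerprint strings are hypothetical, and the real baseline format is defined by the baseline contract, not by this example.

```python
def classify_clones(current: set[str], baseline: set[str]) -> dict[str, list[str]]:
    """Split the current run's clone identifiers into NEW and KNOWN."""
    return {
        # Present now but absent from the baseline: a regression candidate.
        "NEW": sorted(current - baseline),
        # Present in both: accepted legacy debt.
        "KNOWN": sorted(current & baseline),
    }


# Made-up identifiers standing in for clone fingerprints.
baseline_ids = {"fp-aaa", "fp-bbb"}
current_ids = {"fp-bbb", "fp-ccc"}
result = classify_clones(current_ids, baseline_ids)
print(result)
```

With baseline-aware gating, only the entries under "NEW" would be grounds for a failure; "fp-aaa" no longer appears in the run and is simply absent from both lists.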
Full contract: Baseline contract
Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 2 | Contract error: untrusted baseline, invalid config, unreadable sources in CI |
| 3 | Gating failure: new clones or metric threshold exceeded |
| 5 | Internal error |
Contract errors (2) take precedence over gating failures (3).
Full policy: Exit codes and failure policy
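The precedence rule for exit codes can be expressed as a short decision function. This is an illustrative sketch only: `resolve_exit_code` and its list parameters are hypothetical names, while the numeric codes are the ones from the table above.

```python
EXIT_SUCCESS = 0
EXIT_CONTRACT_ERROR = 2
EXIT_GATING_FAILURE = 3


def resolve_exit_code(contract_errors: list[str], gating_failures: list[str]) -> int:
    """Pick one exit code; contract errors win over gating failures."""
    if contract_errors:
        return EXIT_CONTRACT_ERROR
    if gating_failures:
        return EXIT_GATING_FAILURE
    return EXIT_SUCCESS


# Both kinds of problem at once: the contract error is what gets reported.
print(resolve_exit_code(["untrusted baseline"], ["new clones"]))
```

A CI wrapper can rely on this ordering: an exit code of 3 implies that no contract error was detected in the same run.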
Reports
| Format | Flag | Default path |
|---|---|---|
| HTML | --html | .cache/codeclone/report.html |
| JSON | --json | .cache/codeclone/report.json |
| Markdown | --md | .cache/codeclone/report.md |
| SARIF | --sarif | .cache/codeclone/report.sarif |
| Text | --text | .cache/codeclone/report.txt |
All formats are rendered from one canonical JSON report.
--open-html-report opens the HTML in the default browser.
--timestamped-report-paths appends a UTC timestamp to default filenames.
Report contract: Report contract · HTML render
Canonical JSON report shape (v2.10)
{
"report_schema_version": "2.10",
"meta": {
"codeclone_version": "2.0.0",
"project_name": "...",
"scan_root": ".",
"report_mode": "full",
"analysis_profile": {
"min_loc": 10,
"min_stmt": 6,
"block_min_loc": 20,
"block_min_stmt": 8,
"segment_min_loc": 20,
"segment_min_stmt": 10
},
"analysis_thresholds": {
"design_findings": {
"...": "..."
}
},
"baseline": {
"...": "..."
},
"cache": {
"...": "..."
},
"metrics_baseline": {
"...": "..."
},
"runtime": {
"analysis_started_at_utc": "...",
"report_generated_at_utc": "..."
}
},
"inventory": {
"files": {
"...": "..."
},
"code": {
"...": "..."
},
"file_registry": {
"encoding": "relative_path",
"items": []
}
},
"findings": {
"summary": {
"...": "..."
},
"groups": {
"clones": {
"functions": [],
"blocks": [],
"segments": []
},
"structural": {
"groups": []
},
"dead_code": {
"groups": []
},
"design": {
"groups": []
}
}
},
"metrics": {
"summary": {
"...": "...",
"coverage_adoption": {
"...": "..."
},
"coverage_join": {
"...": "..."
},
"api_surface": {
"...": "..."
}
},
"families": {
"...": "...",
"coverage_adoption": {
"...": "..."
},
"coverage_join": {
"...": "..."
},
"api_surface": {
"...": "..."
}
}
},
"derived": {
"suggestions": [],
"overview": {
"families": {},
"top_risks": [],
"source_scope_breakdown": {},
"health_snapshot": {},
"directory_hotspots": {}
},
"hotlists": {
"most_actionable_ids": [],
"highest_spread_ids": [],
"production_hotspot_ids": [],
"test_fixture_hotspot_ids": []
}
},
"integrity": {
"canonicalization": {
"version": "1",
"scope": "canonical_only"
},
"digest": {
"algorithm": "sha256",
"verified": true,
"value": "..."
}
}
}
Full contract: Report contract
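Because every format is rendered from the canonical JSON, downstream tooling can read that file directly. The sketch below counts clone groups per kind using only keys that appear in the shape above; the helper names are hypothetical, and the entries inside each group list are left as empty placeholders because their fields are not shown here.

```python
import json
from pathlib import Path


def load_report(path: str) -> dict:
    """Read a canonical JSON report from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def count_clone_groups(report: dict) -> dict[str, int]:
    """Count clone groups per kind under findings.groups.clones."""
    clones = report["findings"]["groups"]["clones"]
    return {kind: len(clones.get(kind, [])) for kind in ("functions", "blocks", "segments")}


# Stand-in report with placeholder entries; real group entries carry more fields.
sample_report = {
    "report_schema_version": "2.10",
    "findings": {
        "groups": {
            "clones": {"functions": [{}, {}], "blocks": [{}], "segments": []}
        }
    },
}
print(count_clone_groups(sample_report))
```

Against a real run, `load_report(".cache/codeclone/report.json")` would supply the dictionary, using the default JSON path from the Reports table.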
Inline Suppressions
When a symbol is invoked through runtime dynamics (framework callbacks, plugin loading, reflection), suppress the known false positive at the declaration site:
# codeclone: ignore[dead-code]
def handle_exception(exc: Exception) -> None:
    ...

class Middleware:  # codeclone: ignore[dead-code]
    ...
Suppression contract: Inline suppressions · Dead-code contract
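To see which lines in a file carry a suppression comment, a small scanner built on the standard `tokenize` module is enough. This is a sketch, not CodeClone's parser: the regular expression is an assumption about the comment grammar based on the two examples above, and `find_suppressions` is a hypothetical helper.

```python
import io
import re
import tokenize

# Assumed pattern, inferred from the examples: "# codeclone: ignore[<rule>]"
_SUPPRESSION = re.compile(r"#\s*codeclone:\s*ignore\[([^\]]+)\]")


def find_suppressions(source: str) -> list[tuple[int, str]]:
    """Return (line number, rule id) for each suppression comment in the source."""
    found = []
    for token in tokenize.generate_tokens(io.StringIO(source).readline):
        # Only real comments are inspected, so matching text inside strings is ignored.
        if token.type == tokenize.COMMENT:
            match = _SUPPRESSION.search(token.string)
            if match:
                found.append((token.start[0], match.group(1).strip()))
    return found


example_source = (
    "# codeclone: ignore[dead-code]\n"
    "def handle_exception(exc):\n"
    "    ...\n"
    "\n"
    "class Middleware:  # codeclone: ignore[dead-code]\n"
    "    ...\n"
)
print(find_suppressions(example_source))
```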
How It Works
Pipeline overview
Python source
│
▼
Parse ──────── AST per file
│
▼
Normalize ───── canonical structure (rename/format-resistant)
│
▼
CFG ─────────── per-function control flow graph
│
▼
Fingerprint ──── stable hash per function / block / segment
│
▼
Group ────────── clone groups + structural findings
│
▼
Metrics ─────── complexity · coupling · cohesion · dependencies
dead code · adoption · security surfaces · health
│
▼
Gate ────────── baseline diff · threshold checks · CI exit codes
│
▼
Report ─────── HTML · JSON · Markdown · SARIF · text
Architecture: Architecture narrative · CFG semantics: CFG semantics
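The Normalize and Fingerprint stages can be illustrated with a deliberately simplified sketch. It is not CodeClone's algorithm: it skips the CFG stage entirely and hashes a normalized AST instead, and every name in it (`_Normalizer`, `fingerprint`) is hypothetical. It only shows why a fingerprint computed over a canonical structure survives renames and reformatting.

```python
import ast
import hashlib


class _Normalizer(ast.NodeTransformer):
    """Replace identifiers with positional placeholders so renames do not matter."""

    def __init__(self) -> None:
        self._aliases: dict[str, str] = {}

    def _alias(self, name: str) -> str:
        # The first distinct name becomes v0, the next v1, and so on.
        return self._aliases.setdefault(name, f"v{len(self._aliases)}")

    def visit_Name(self, node: ast.Name) -> ast.AST:
        node.id = self._alias(node.id)
        return node

    def visit_arg(self, node: ast.arg) -> ast.AST:
        node.arg = self._alias(node.arg)
        node.annotation = None  # ignore type hints in this simplified sketch
        return node

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.AST:
        node.name = "f"  # the function's own name is not part of its structure
        self.generic_visit(node)
        return node


def fingerprint(source: str) -> str:
    """Hash the normalized AST; ast.dump omits line and column data by default."""
    tree = _Normalizer().visit(ast.parse(source))
    return hashlib.sha256(ast.dump(tree).encode("utf-8")).hexdigest()


original = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
renamed = "def add_all(items):\n    acc = 0\n    for item in items:\n        acc += item\n    return acc\n"
different = "def total(xs):\n    s = 1\n    for x in xs:\n        s *= x\n    return s\n"

print(fingerprint(original) == fingerprint(renamed))
print(fingerprint(original) == fingerprint(different))
```

The first two functions differ only in naming and therefore share a fingerprint, while the third changes a constant and an operator and does not.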
Documentation
Full docs and contract book: orenlab.github.io/codeclone
Quick links: Baseline · Report · Metrics & gates · MCP · CLI
Benchmarking Notes
Reproducible Docker Benchmark
./benchmarks/run_docker_benchmark.sh
The wrapper builds benchmarks/Dockerfile, runs isolated container benchmarks, and writes results to
.cache/benchmarks/codeclone-benchmark.json.
Use environment overrides to pin the benchmark envelope:
CPUSET=0 CPUS=1.0 MEMORY=2g RUNS=16 WARMUPS=4 \
./benchmarks/run_docker_benchmark.sh
Performance claims are backed by the reproducible benchmark workflow documented in the Benchmarking contract.
License
- Code: MPL-2.0 (LICENSE)
- Documentation and docs-site content: MIT (LICENSE-MIT)
Versions released before this change remain under their original license terms.
Links
- Docs: https://orenlab.github.io/codeclone/
- Issues: https://github.com/orenlab/codeclone/issues
- Discussions: https://github.com/orenlab/codeclone/discussions
- PyPI: https://pypi.org/project/codeclone/
- Licenses: MPL-2.0 · MIT docs · Scope map