CodeClone
Structural code quality analysis for Python with baseline-aware CI governance, canonical reports, and a triage-first MCP control surface for agents and IDEs.
CodeClone provides deterministic structural code quality analysis for Python. It detects architectural duplication, computes quality metrics, and enforces CI gates — all with baseline-aware governance that separates known technical debt from new regressions. A triage-first MCP control surface exposes the same canonical pipeline to AI agents and IDEs.
Docs: orenlab.github.io/codeclone · Live sample report: orenlab.github.io/codeclone/examples/report/
[!NOTE] This README and docs site document the CodeClone 2.0 release line. For the previous 1.4.x line, see the v1.4.4 README and the v1.4.4 docs tree.
Features
- Clone detection — function (CFG fingerprint), block (statement windows), and segment (report-only) clones
- Structural findings — duplicated branch families, clone guard/exit divergence, and clone-cohort drift
- Quality metrics — cyclomatic complexity, coupling (CBO), cohesion (LCOM4), dependency cycles, adaptive depth profile, dead code, health score, and overloaded-module profiling
- Adoption & API — type/docstring annotation coverage, public API surface inventory and baseline diff
- Coverage Join — fuse external Cobertura XML into the current run to surface coverage hotspots and scope gaps
- Baseline governance — separates accepted legacy debt from new regressions; CI fails only on what changed
- Reports — interactive HTML, JSON, Markdown, SARIF, and text from one canonical report
- MCP control surface — triage-first agent and IDE interface over the same canonical pipeline; read-only by contract
- IDE & agent clients — VS Code extension, Claude Desktop bundle, and Codex plugin over the same MCP contract
- CI-first — deterministic output, stable ordering, exit code contract, pre-commit support
- Fast — incremental caching, parallel processing, warm-run optimization
Quick Start
uv tool install codeclone
codeclone . # analyze
codeclone . --html # HTML report
codeclone . --html --open-html-report # open in browser
codeclone . --json --md --sarif --text # all formats
codeclone . --ci # CI mode
More examples
# timestamped report snapshots
codeclone . --html --json --timestamped-report-paths
# changed-scope gating against git diff
codeclone . --changed-only --diff-against main
# shorthand: diff source for changed-scope review
codeclone . --paths-from-git-diff HEAD~1
Run without install
uvx codeclone@latest .
CI Integration
# 1. Generate baseline (commit to repo)
codeclone . --update-baseline
# 2. Add to CI pipeline
codeclone . --ci
What --ci enables
The --ci preset equals --fail-on-new --no-color --quiet.
When a trusted metrics baseline is loaded, CI mode also enables
--fail-on-new-metrics.
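The preset expansion described above can be sketched in a few lines of Python. This is an illustrative model only: `expand_ci_preset` and its `trusted_metrics_baseline` parameter are hypothetical names, not part of CodeClone's API, and the real CLI may resolve the preset differently.

```python
def expand_ci_preset(argv: list[str], trusted_metrics_baseline: bool) -> list[str]:
    """Expand --ci into the flags the README says it implies (hypothetical helper)."""
    if "--ci" not in argv:
        return list(argv)
    expanded = [arg for arg in argv if arg != "--ci"]
    implied = ["--fail-on-new", "--no-color", "--quiet"]
    # The metrics gate is only added when a trusted metrics baseline is loaded.
    if trusted_metrics_baseline:
        implied.append("--fail-on-new-metrics")
    for flag in implied:
        if flag not in expanded:
            expanded.append(flag)
    return expanded


print(expand_ci_preset([".", "--ci"], trusted_metrics_baseline=True))
```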
[!TIP] Run codeclone . --update-baseline once after install to establish your CI reference point. Commit the baseline file; it becomes the contract CI enforces on every push.
GitHub Action
CodeClone also ships a composite GitHub Action for PR and CI workflows:
- uses: orenlab/codeclone/.github/actions/codeclone@v2
  with:
    fail-on-new: "true"
    sarif: "true"
    pr-comment: "true"
It can:
- run baseline-aware gating
- generate JSON and SARIF reports
- upload SARIF to GitHub Code Scanning
- post or update a PR summary comment
Action docs: .github/actions/codeclone/README.md
Quality Gates
# Metrics thresholds
codeclone . --fail-complexity 20 --fail-coupling 10 --fail-cohesion 4 --fail-health 60
# Structural policies
codeclone . --fail-cycles --fail-dead-code
# Regression detection vs baseline
codeclone . --fail-on-new-metrics
# Adoption and API governance
codeclone . --min-typing-coverage 80 --min-docstring-coverage 60
codeclone . --fail-on-typing-regression --fail-on-docstring-regression
codeclone . --api-surface --update-metrics-baseline
codeclone . --fail-on-api-break
# Coverage Join — fuse external Cobertura XML into the review
codeclone . --coverage coverage.xml --fail-on-untested-hotspots --coverage-min 50
Gate details: Metrics and quality gates
Pre-commit
repos:
  - repo: local
    hooks:
      - id: codeclone
        name: CodeClone
        entry: codeclone
        language: system
        pass_filenames: false
        args: [ ".", "--ci" ]
        types: [ python ]
MCP Control Surface
Triage-first MCP server for AI agents and IDE clients, built on the same canonical pipeline as the CLI. Read-only by contract: never mutates source, baselines, or repo state.
uv tool install "codeclone[mcp]"
# or
uv pip install "codeclone[mcp]"
# local stdio clients
codeclone-mcp --transport stdio
# remote / HTTP-only clients
codeclone-mcp --transport streamable-http
[!WARNING] Analysis tools require an absolute repository root. Relative roots such as . are rejected. Keep stdio as the default transport for local IDE and agent clients; HTTP exposure beyond loopback requires explicit --allow-remote.
MCP usage guide · MCP interface contract
Native Client Surfaces
| Surface | Location | Purpose |
|---|---|---|
| VS Code extension | VS Code Marketplace | Triage-first structural review in the editor |
| Claude Desktop bundle | extensions/claude-desktop-codeclone/ | Local .mcpb install with pre-loaded instructions |
| Codex plugin | plugins/codeclone/ | Native discovery, two skills, and MCP definition |
All three are native clients over the same codeclone-mcp contract — no second analysis engine.
VS Code extension docs · Claude Desktop docs · Codex plugin docs
Configuration
CodeClone can load project-level configuration from pyproject.toml:
[tool.codeclone]
min_loc = 10
min_stmt = 6
baseline = "codeclone.baseline.json"
golden_fixture_paths = ["tests/fixtures/golden_*"]
skip_metrics = false
quiet = false
html_out = ".cache/codeclone/report.html"
json_out = ".cache/codeclone/report.json"
md_out = ".cache/codeclone/report.md"
sarif_out = ".cache/codeclone/report.sarif"
text_out = ".cache/codeclone/report.txt"
block_min_loc = 20
block_min_stmt = 8
segment_min_loc = 20
segment_min_stmt = 10
Precedence: CLI flags > pyproject.toml > built-in defaults.
Config reference: Config and defaults
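The precedence rule can be pictured as a layered lookup in which the first layer that defines a key wins. The sketch below is an assumption about the general shape, not CodeClone's loader: `resolve_config` and `BUILTIN_DEFAULTS` are hypothetical names, and the default values shown are placeholders.

```python
from collections import ChainMap

# Hypothetical built-in defaults; the real values live in the config reference.
BUILTIN_DEFAULTS = {"min_loc": 10, "min_stmt": 6, "quiet": False}


def resolve_config(cli_flags: dict, pyproject: dict) -> dict:
    """Merge layers so that CLI flags > pyproject.toml > built-in defaults."""
    # Drop CLI options the user did not pass so they cannot mask lower layers.
    explicit_cli = {key: value for key, value in cli_flags.items() if value is not None}
    # ChainMap returns the value from the first mapping that contains the key.
    return dict(ChainMap(explicit_cli, pyproject, BUILTIN_DEFAULTS))


resolved = resolve_config(
    cli_flags={"min_loc": 15, "quiet": None},   # only min_loc was passed on the CLI
    pyproject={"min_loc": 12, "quiet": True},   # values from [tool.codeclone]
)
print(resolved)
```

Here `min_loc` comes from the CLI, `quiet` from pyproject.toml, and `min_stmt` falls through to the default.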
Baseline Workflow
Baselines capture the current duplication state. Once committed, they become the CI reference point.
- Clones are classified as NEW (not in baseline) or KNOWN (accepted debt)
- --update-baseline writes both clone and metrics snapshots
- Trust is verified via generator, fingerprint_version, and payload_sha256
- In --ci mode, an untrusted baseline is a contract error (exit 2)
Full contract: Baseline contract
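Conceptually, the NEW/KNOWN split is a set comparison between the clone identifiers found in the current run and those stored in the baseline. The following is a minimal sketch under that assumption; `classify_clones` and the identifier strings are made up for illustration and do not reflect the actual baseline schema.

```python
def classify_clones(current: set[str], baseline: set[str]) -> dict[str, list[str]]:
    """Split current clone identifiers into NEW and KNOWN (illustrative only)."""
    return {
        "NEW": sorted(current - baseline),     # not in baseline: would fail CI
        "KNOWN": sorted(current & baseline),   # accepted debt: reported, not gated
    }


baseline_ids = {"fn:a1b2", "fn:c3d4"}   # hypothetical fingerprints
current_ids = {"fn:c3d4", "fn:e5f6"}
result = classify_clones(current_ids, baseline_ids)
print(result)
```

Sorting the output keeps the result stable across runs, in line with the deterministic-ordering goal stated under Features.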
Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 2 | Contract error: untrusted baseline, invalid config, unreadable sources in CI |
| 3 | Gating failure: new clones or metric threshold exceeded |
| 5 | Internal error |
Contract errors (2) take precedence over gating failures (3).
Full policy: Exit codes and failure policy
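The precedence rule can be expressed as an ordered check. This sketch uses the codes from the table; the `ExitCode` enum and `resolve_exit_code` function are hypothetical names introduced here, not CodeClone internals.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    SUCCESS = 0
    CONTRACT_ERROR = 2
    GATING_FAILURE = 3
    INTERNAL_ERROR = 5


def resolve_exit_code(contract_errors: list[str], gating_failures: list[str]) -> ExitCode:
    """Pick one exit code; contract errors are checked before gating failures."""
    if contract_errors:
        return ExitCode.CONTRACT_ERROR
    if gating_failures:
        return ExitCode.GATING_FAILURE
    return ExitCode.SUCCESS


# Both kinds of problem are present, so the contract error (2) wins.
code = resolve_exit_code(["untrusted baseline"], ["new clone group"])
print(int(code))
```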
Reports
| Format | Flag | Default path |
|---|---|---|
| HTML | --html | .cache/codeclone/report.html |
| JSON | --json | .cache/codeclone/report.json |
| Markdown | --md | .cache/codeclone/report.md |
| SARIF | --sarif | .cache/codeclone/report.sarif |
| Text | --text | .cache/codeclone/report.txt |
All formats are rendered from one canonical JSON report.
--open-html-report opens the HTML in the default browser.
--timestamped-report-paths appends a UTC timestamp to default filenames.
Report contract: Report contract · HTML render
Canonical JSON report shape (v2.10)
{
"report_schema_version": "2.10",
"meta": {
"codeclone_version": "2.0.0",
"project_name": "...",
"scan_root": ".",
"report_mode": "full",
"analysis_profile": {
"min_loc": 10,
"min_stmt": 6,
"block_min_loc": 20,
"block_min_stmt": 8,
"segment_min_loc": 20,
"segment_min_stmt": 10
},
"analysis_thresholds": {
"design_findings": {
"...": "..."
}
},
"baseline": {
"...": "..."
},
"cache": {
"...": "..."
},
"metrics_baseline": {
"...": "..."
},
"runtime": {
"analysis_started_at_utc": "...",
"report_generated_at_utc": "..."
}
},
"inventory": {
"files": {
"...": "..."
},
"code": {
"...": "..."
},
"file_registry": {
"encoding": "relative_path",
"items": []
}
},
"findings": {
"summary": {
"...": "..."
},
"groups": {
"clones": {
"functions": [],
"blocks": [],
"segments": []
},
"structural": {
"groups": []
},
"dead_code": {
"groups": []
},
"design": {
"groups": []
}
}
},
"metrics": {
"summary": {
"...": "...",
"coverage_adoption": {
"...": "..."
},
"coverage_join": {
"...": "..."
},
"api_surface": {
"...": "..."
}
},
"families": {
"...": "...",
"coverage_adoption": {
"...": "..."
},
"coverage_join": {
"...": "..."
},
"api_surface": {
"...": "..."
}
}
},
"derived": {
"suggestions": [],
"overview": {
"families": {},
"top_risks": [],
"source_scope_breakdown": {},
"health_snapshot": {},
"directory_hotspots": {}
},
"hotlists": {
"most_actionable_ids": [],
"highest_spread_ids": [],
"production_hotspot_ids": [],
"test_fixture_hotspot_ids": []
}
},
"integrity": {
"canonicalization": {
"version": "1",
"scope": "canonical_only"
},
"digest": {
"algorithm": "sha256",
"verified": true,
"value": "..."
}
}
}
Full contract: Report contract
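Because every format derives from the canonical JSON, downstream tooling can read it directly. The sketch below counts finding groups using only keys that appear in the shape above; `summarize_groups` is a hypothetical helper, and the empty objects in the sample stand in for real group entries whose fields are not shown here.

```python
import json

# Trimmed sample following the documented shape; {} entries are placeholders.
SAMPLE_REPORT = json.loads("""
{
  "report_schema_version": "2.10",
  "findings": {
    "groups": {
      "clones": {"functions": [{}, {}], "blocks": [{}], "segments": []},
      "structural": {"groups": []},
      "dead_code": {"groups": [{}]},
      "design": {"groups": []}
    }
  }
}
""")


def summarize_groups(report: dict) -> dict[str, int]:
    """Count finding groups per family in a canonical report."""
    groups = report["findings"]["groups"]
    counts = {f"clones.{kind}": len(items) for kind, items in groups["clones"].items()}
    for family in ("structural", "dead_code", "design"):
        counts[family] = len(groups[family]["groups"])
    return counts


print(summarize_groups(SAMPLE_REPORT))
```

In practice the dictionary would come from `json.load` on the file written by `--json`, for example the default `.cache/codeclone/report.json`.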
Inline Suppressions
When a symbol is invoked through runtime dynamics (framework callbacks, plugin loading, reflection), suppress the known false positive at the declaration site:
# codeclone: ignore[dead-code]
def handle_exception(exc: Exception) -> None:
    ...

class Middleware:  # codeclone: ignore[dead-code]
    ...
Suppression contract: Inline suppressions · Dead-code contract
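To make the two placements concrete (a comment on the line above the declaration, or a trailing comment on the declaration line), here is a rough sketch of how such markers could be located. It is not CodeClone's parser: the regex, `suppressed_declarations`, and the single-rule assumption are all illustrative, and the suppression contract remains the authority on accepted syntax.

```python
import re

# Matches the marker form shown above; multi-rule syntax is not assumed here.
SUPPRESSION = re.compile(r"#\s*codeclone:\s*ignore\[(?P<rule>[\w-]+)\]")


def suppressed_declarations(source: str) -> dict[int, str]:
    """Map 1-based declaration line numbers to the suppressed rule name."""
    found: dict[int, str] = {}
    for index, line in enumerate(source.splitlines()):
        match = SUPPRESSION.search(line)
        if match is None:
            continue
        if line.lstrip().startswith("#"):
            target = index + 2   # standalone comment applies to the next line
        else:
            target = index + 1   # trailing comment applies to its own line
        found[target] = match.group("rule")
    return found


sample = "\n".join([
    "# codeclone: ignore[dead-code]",
    "def handle_exception(exc): ...",
    "class Middleware:  # codeclone: ignore[dead-code]",
    "    ...",
])
print(suppressed_declarations(sample))
```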
How It Works
Pipeline overview
Python source
│
▼
Parse ──────── AST per file
│
▼
Normalize ───── canonical structure (rename/format-resistant)
│
▼
CFG ─────────── per-function control flow graph
│
▼
Fingerprint ──── stable hash per function / block / segment
│
▼
Group ────────── clone groups + structural findings
│
▼
Metrics ─────── complexity · coupling · cohesion · dependencies
dead code · adoption · security surfaces · health
│
▼
Gate ────────── baseline diff · threshold checks · CI exit codes
│
▼
Report ─────── HTML · JSON · Markdown · SARIF · text
Architecture: Architecture narrative · CFG semantics: CFG semantics
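The Parse, Normalize, and Fingerprint stages can be illustrated with the standard library alone. The sketch below is a deliberately simplified stand-in, not CodeClone's algorithm: it hashes a normalized AST dump and skips the CFG stage entirely, and `_Normalizer` and `fingerprint` are hypothetical names. It only demonstrates why normalization makes a hash resistant to renaming and formatting.

```python
import ast
import hashlib


class _Normalizer(ast.NodeTransformer):
    """Replace identifiers with positional placeholders (v0, v1, ...)."""

    def __init__(self) -> None:
        self._names: dict[str, str] = {}

    def _placeholder(self, name: str) -> str:
        # The first occurrence of a name fixes its placeholder for the whole tree.
        return self._names.setdefault(name, f"v{len(self._names)}")

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        node.name = "f"              # the function's own name is irrelevant
        self.generic_visit(node)
        return node

    def visit_arg(self, node: ast.arg) -> ast.arg:
        node.arg = self._placeholder(node.arg)
        node.annotation = None       # ignore type annotations in this sketch
        return node

    def visit_Name(self, node: ast.Name) -> ast.Name:
        node.id = self._placeholder(node.id)
        return node


def fingerprint(source: str) -> str:
    """Hash the normalized AST; whitespace and identifier names do not matter."""
    tree = _Normalizer().visit(ast.parse(source))
    return hashlib.sha256(ast.dump(tree).encode("utf-8")).hexdigest()


a = "def total(xs):\n    acc = 0\n    for x in xs:\n        acc += x\n    return acc\n"
b = "def sum_all(items):\n\n    result = 0\n    for item in items:\n        result += item\n    return result\n"
c = "def product(xs):\n    acc = 1\n    for x in xs:\n        acc *= x\n    return acc\n"

print(fingerprint(a) == fingerprint(b))  # same structure, different names
print(fingerprint(a) == fingerprint(c))  # different constant and operator
```

Note that this toy normalizer also renames builtins and treats any difference in literals as a structural difference; a production tool would make more careful choices on both counts.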
Documentation
Full docs and contract book: orenlab.github.io/codeclone
Quick links: Baseline · Report · Metrics & gates · MCP · CLI
Benchmarking Notes
Reproducible Docker Benchmark
./benchmarks/run_docker_benchmark.sh
The wrapper builds benchmarks/Dockerfile, runs isolated container benchmarks, and writes results to
.cache/benchmarks/codeclone-benchmark.json.
Use environment overrides to pin the benchmark envelope:
CPUSET=0 CPUS=1.0 MEMORY=2g RUNS=16 WARMUPS=4 \
./benchmarks/run_docker_benchmark.sh
Performance claims are backed by the reproducible benchmark workflow documented in the Benchmarking contract.
License
- Code: MPL-2.0 (LICENSE)
- Documentation and docs-site content: MIT (LICENSE-MIT)
Versions released before this change remain under their original license terms.
Links
- Docs: https://orenlab.github.io/codeclone/
- Issues: https://github.com/orenlab/codeclone/issues
- Discussions: https://github.com/orenlab/codeclone/discussions
- PyPI: https://pypi.org/project/codeclone/
- Licenses: MPL-2.0 · MIT docs · Scope map