improve-codebase-architecture

Author: mattpocock

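Find deepening opportunities in a codebase, informed by the domain language in CONTEXT.md and the decisions in docs/adr/. Use when the user wants to improve architecture, find refactoring opportunities, consolidate tightly-coupled modules, or make a codebase more testable and AI-navigable.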

npx skills add https://github.com/mattpocock/skills --skill improve-codebase-architecture

Improve Codebase Architecture

Surface architectural friction and propose deepening opportunities — refactors that turn shallow modules into deep ones. The aim is testability and AI-navigability.

Glossary

Use these terms exactly in every suggestion. Consistent language is the point — don't drift into "component," "service," "API," or "boundary." Full definitions in LANGUAGE.md.

  • Module — anything with an interface and an implementation (function, class, package, slice).
  • Interface — everything a caller must know to use the module: types, invariants, error modes, ordering, config. Not just the type signature.
  • Implementation — the code inside.
  • Depth — leverage at the interface: a lot of behaviour behind a small interface. Deep = high leverage. Shallow = interface nearly as complex as the implementation.
  • Seam — where an interface lives; a place behaviour can be altered without editing in place. (Use this, not "boundary.")
  • Adapter — a concrete thing satisfying an interface at a seam.
  • Leverage — what callers get from depth.
  • Locality — what maintainers get from depth: change, bugs, knowledge concentrated in one place.
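
To make "depth" concrete, here is a minimal TypeScript sketch contrasting a shallow module with a deep one. Both functions (`getField`, `parseDuration`) and the unit table are hypothetical illustrations, not taken from LANGUAGE.md or any real codebase.

```typescript
// Shallow (hypothetical): the caller must already know the key names and
// handle the missing case, so the interface is about as complex as the
// one-line implementation behind it.
function getField(record: Record<string, string>, key: string): string | undefined {
  return record[key];
}

// Deep (hypothetical): one string in, one number out. Tokenising, the unit
// table, summing, and validation all sit behind that small interface.
const UNIT_MS: Record<string, number> = { ms: 1, s: 1000, m: 60000, h: 3600000 };

function parseDuration(input: string): number {
  const text = input.trim().toLowerCase();
  let totalMs = 0;
  // Consume every "<number><unit>" token; whatever is left over is invalid.
  const leftover = text.replace(
    /(\d+(?:\.\d+)?)\s*(ms|s|m|h)/g,
    (_match: string, amount: string, unit: string) => {
      totalMs += Number(amount) * (UNIT_MS[unit] ?? 0);
      return "";
    },
  );
  if (text === "" || leftover.trim() !== "") {
    throw new Error(`Unrecognised duration: "${input}"`);
  }
  return totalMs;
}
```

A caller of `parseDuration` gets leverage (many input shapes handled for one call), and a maintainer gets locality (a new unit is a one-line change in `UNIT_MS`). `getField` offers neither.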

Key principles (see LANGUAGE.md for the full list):

  • Deletion test: imagine deleting the module. If complexity vanishes, it was a pass-through. If complexity reappears across N callers, it was earning its keep.
  • The interface is the test surface.
  • One adapter = hypothetical seam. Two adapters = real seam.
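
The "two adapters" principle can be sketched as follows. This is an illustration under assumptions: `Clock`, `systemClock`, `fixedClock`, `Session`, and `isExpired` are hypothetical names, and treating a test adapter as the second adapter is one reading of the rule rather than something this document spells out.

```typescript
// The seam: an interface where behaviour can be swapped without editing
// the module that depends on it.
interface Clock {
  now(): Date;
}

// Adapter 1 (hypothetical): the real system clock.
const systemClock: Clock = { now: () => new Date() };

// Adapter 2 (hypothetical): a clock frozen at a chosen instant, for tests.
function fixedClock(at: Date): Clock {
  return { now: () => at };
}

interface Session {
  expiresAt: Date;
}

// The module under test only sees the Clock interface, never a concrete adapter.
function isExpired(session: Session, clock: Clock): boolean {
  return clock.now().getTime() >= session.expiresAt.getTime();
}

const session: Session = { expiresAt: new Date("2024-01-01T00:00:00Z") };
const beforeExpiry = isExpired(session, fixedClock(new Date("2023-12-31T23:59:59Z")));
const afterExpiry = isExpired(session, fixedClock(new Date("2024-01-01T00:00:01Z")));
```

With only `systemClock`, the `Clock` interface would be a hypothetical seam; the second adapter is what shows the seam actually being used.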

This skill is informed by the project's domain model. The domain language gives names to good seams; ADRs record decisions the skill should not re-litigate.

Process

1. Explore

Read the project's domain glossary and any ADRs in the area you're touching first.

Then use the Agent tool with subagent_type=Explore to walk the codebase. Don't follow rigid heuristics — explore organically and note where you experience friction:

  • Where does understanding one concept require bouncing between many small modules?
  • Where are modules shallow — interface nearly as complex as the implementation?
  • Where have pure functions been extracted just for testability, but the real bugs hide in how they're called (no locality)?
  • Where do tightly-coupled modules leak across their seams?
  • Which parts of the codebase are untested, or hard to test through their current interface?

Apply the deletion test to anything you suspect is shallow: would deleting it concentrate complexity, or just move it? A "yes, concentrates" is the signal you want.

2. Present candidates as an HTML report

Write a self-contained HTML file to the OS temp directory so nothing lands in the repo. Resolve the temp dir from $TMPDIR, falling back to /tmp (or %TEMP% on Windows), and write to <tmpdir>/architecture-review-<timestamp>.html so each run gets a fresh file. Open it for the user — xdg-open <path> on Linux, open <path> on macOS, start <path> on Windows — and tell them the absolute path.
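
The path resolution above could look roughly like this in TypeScript. It is a sketch, not the skill's own code: the function names are hypothetical, the environment and platform are passed in as plain values so the logic stays testable, and the timestamp format (an ISO string with `:` and `.` replaced) is an assumption chosen because colons are not valid in Windows filenames.

```typescript
type Platform = "linux" | "darwin" | "win32";
type Env = Record<string, string | undefined>;

// $TMPDIR first, then %TEMP% on Windows or /tmp elsewhere.
function resolveTmpDir(env: Env, platform: Platform): string {
  const dir = env["TMPDIR"] || (platform === "win32" ? env["TEMP"] : "/tmp");
  if (!dir) throw new Error("No temp directory found in the environment");
  // Drop trailing separators (macOS often sets TMPDIR with one).
  return dir.replace(/[\\/]+$/, "") || dir;
}

// <tmpdir>/architecture-review-<timestamp>.html, one fresh file per run.
function reportPath(tmpDir: string, platform: Platform, now: Date): string {
  const separator = platform === "win32" ? "\\" : "/";
  const timestamp = now.toISOString().replace(/[:.]/g, "-");
  return `${tmpDir}${separator}architecture-review-${timestamp}.html`;
}

// The opener named in the text for each OS. `start` is a cmd.exe builtin,
// so on Windows it has to be run through a shell.
function openCommand(platform: Platform, path: string): string[] {
  if (platform === "darwin") return ["open", path];
  if (platform === "win32") return ["start", path];
  return ["xdg-open", path];
}
```

In a Node script the inputs would typically be `process.env` and `process.platform`, with the resulting command handed to a child process.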

The report uses Tailwind via CDN for layout and styling, and Mermaid via CDN for diagrams where a graph/flow/sequence reliably communicates the structure. Mix Mermaid with hand-crafted CSS/SVG visuals — use Mermaid when relationships are graph-shaped (call graphs, dependencies, sequences), and hand-built divs/SVG when you want something more editorial (mass diagrams, cross-sections, collapse animations). Each candidate gets a before/after visualisation. Be visual.
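
As a rough illustration of the page shell, the sketch below builds a self-contained HTML string that loads Tailwind and Mermaid from CDNs. It is not the scaffold from HTML-REPORT.md: `renderReport`, the chosen CDN URLs (the Tailwind Play CDN and the jsDelivr Mermaid ESM build), the utility classes, and the example diagram are all assumptions made for illustration.

```typescript
// Wraps already-rendered card HTML in a page that loads Tailwind and Mermaid.
function renderReport(cardsHtml: string): string {
  return `<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Architecture review</title>
  <script src="https://cdn.tailwindcss.com"></script>
  <script type="module">
    import mermaid from "https://cdn.jsdelivr.net/npm/mermaid@11/dist/mermaid.esm.min.mjs";
    mermaid.initialize({ startOnLoad: true });
  </script>
</head>
<body class="bg-slate-50 text-slate-900">
  <main class="mx-auto max-w-5xl space-y-8 p-8">
    <h1 class="text-3xl font-bold">Architecture review</h1>
    ${cardsHtml}
  </main>
</body>
</html>`;
}

// Mermaid renders any element with class "mermaid" on load. The node names
// here are hypothetical stand-ins for modules in a graph-shaped relationship.
const exampleDiagram = `<pre class="mermaid">
graph LR
  Checkout --> OrderIntake
  Admin --> OrderIntake
</pre>`;

const reportHtml = renderReport(exampleDiagram);
```

Because everything is loaded from CDNs, the file works when opened directly from the temp directory, though rendering the diagrams requires network access.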

For each candidate, render a card with the following fields:

  • Files — which files/modules are involved
  • Problem — why the current architecture is causing friction
  • Solution — plain English description of what would change
  • Benefits — explained in terms of locality and leverage, and how tests would improve
  • Before / After diagram — side-by-side, custom-drawn, illustrating the shallowness and the deepening
  • Recommendation strength — one of Strong, Worth exploring, Speculative, rendered as a badge
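
One way to model a card is sketched below. The field names mirror the list above, but the `Candidate` type, the `title` field, the badge colours, and `renderCard` are hypothetical; HTML-REPORT.md remains the reference for the real markup.

```typescript
type Strength = "Strong" | "Worth exploring" | "Speculative";

interface Candidate {
  title: string; // hypothetical: a short name for the candidate
  files: string[];
  problem: string;
  solution: string;
  benefits: string;
  beforeAfterHtml: string; // pre-rendered diagram markup, inserted as-is
  strength: Strength;
}

// Assumed Tailwind utility classes for the badge, one set per strength.
const BADGE_CLASSES: Record<Strength, string> = {
  "Strong": "bg-green-100 text-green-800",
  "Worth exploring": "bg-amber-100 text-amber-800",
  "Speculative": "bg-slate-200 text-slate-700",
};

function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function renderCard(c: Candidate): string {
  const files = c.files.map((f) => `<code>${escapeHtml(f)}</code>`).join(", ");
  return `<section class="rounded-xl bg-white p-6 shadow">
  <h2 class="text-xl font-semibold">${escapeHtml(c.title)}
    <span class="ml-2 rounded px-2 py-1 text-sm ${BADGE_CLASSES[c.strength]}">${c.strength}</span>
  </h2>
  <p><strong>Files:</strong> ${files}</p>
  <p><strong>Problem:</strong> ${escapeHtml(c.problem)}</p>
  <p><strong>Solution:</strong> ${escapeHtml(c.solution)}</p>
  <p><strong>Benefits:</strong> ${escapeHtml(c.benefits)}</p>
  <div class="grid grid-cols-2 gap-4">${c.beforeAfterHtml}</div>
</section>`;
}
```

Keeping the strength as a closed union means an unknown badge value fails at compile time rather than rendering an unstyled badge.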

End the report with a Top recommendation section: which candidate you'd tackle first and why.

Use CONTEXT.md vocabulary for the domain, and LANGUAGE.md vocabulary for the architecture. If CONTEXT.md defines "Order," talk about "the Order intake module" — not "the FooBarHandler," and not "the Order service."

ADR conflicts: if a candidate contradicts an existing ADR, only surface it when the friction is real enough to warrant revisiting the ADR. Mark it clearly in the card (e.g. a warning callout: "contradicts ADR-0007 — but worth reopening because…"). Don't list every theoretical refactor an ADR forbids.

See HTML-REPORT.md for the full HTML scaffold, diagram patterns, and styling guidance.

Do NOT propose interfaces yet. After the file is written, ask the user: "Which of these would you like to explore?"

3. Grilling loop

Once the user picks a candidate, drop into a grilling conversation. Walk the design tree with them — constraints, dependencies, the shape of the deepened module, what sits behind the seam, what tests survive.

Side effects happen inline as decisions crystallize:

  • Naming a deepened module after a concept not in CONTEXT.md? Add the term to CONTEXT.md — same discipline as /grill-with-docs (see CONTEXT-FORMAT.md). Create the file lazily if it doesn't exist.
  • Sharpening a fuzzy term during the conversation? Update CONTEXT.md right there.
  • User rejects the candidate with a load-bearing reason? Offer an ADR, framed as: "Want me to record this as an ADR so future architecture reviews don't re-suggest it?" Only offer when the reason would actually be needed by a future explorer to avoid re-suggesting the same thing — skip ephemeral reasons ("not worth it right now") and self-evident ones. See ADR-FORMAT.md.
  • Want to explore alternative interfaces for the deepened module? See INTERFACE-DESIGN.md.

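More skills from mattpocock

grill-me
mattpocock
Question the user relentlessly about a plan or design until a shared understanding is reached, resolving every branch of the decision tree. Use when the user wants to stress-test a plan, have a design grilled, or mentions "grill me".
research, communication, project-management
grill-with-docs
mattpocock
A grilling session that challenges your plan against the existing domain model, sharpens terminology, and updates the documentation (CONTEXT.md, ADRs) as decisions take shape. Use when the user wants to stress-test a plan against the project's language and existing decisions.
development, document, research
teach
mattpocock
Teach the user a new skill or concept within this workspace.
communication, productivity
tdd
mattpocock
Test-driven development with a red-green-refactor loop. Use when the user wants to build a feature or fix a bug with TDD, mentions "red-green-refactor", needs integration tests, or asks for test-first development.
development, testing
to-prd
mattpocock
Turn the current conversation context into a PRD and publish it to the project issue tracker. Use when the user wants to create a PRD from the current context.
development, document, project-management
handoff
mattpocock
Compress the current conversation into a handoff document for another agent to pick up.
communication, project-management, document
diagnose
mattpocock
Disciplined diagnosis loop for hard bugs and performance regressions. Reproduce → minimise → hypothesise → instrument → fix → regression-test. Use when user says "diagnose this" / "debug this", reports a bug, says something is broken/throwing/failing, or describes a performance regression.
development, testing, code-review
to-issues
mattpocock
Break a plan, spec, or PRD into independently grabbable issues in the project issue tracker, using tracer-bullet vertical slices. Use when the user wants to turn a plan into issues, create implementation tickets, or break work down into issues.
development, project-management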