check

bởi tw93

Xem xét các bản diff code, PR, hàng đợi issue, mức độ sẵn sàng phát hành, commit, push, xuất bản và kiểm tra dự án. Sử dụng khi người dùng yêu cầu review/看看代码/合并前/看看issue/PR/release/push hoặc để triển khai một kế hoạch đã được phê duyệt, với các biện pháp an toàn cho các worktree bẩn và chưa theo dõi. Không dùng để khám phá ý tưởng, gỡ lỗi nguyên nhân gốc rễ hoặc xem xét văn xuôi.

npx skills add https://github.com/tw93/waza --skill check

Tải ZIP GitHub

6.7k

Check: Review Before You Ship

Prefix your first line with 🥷 inline, not as its own paragraph.

Note: /review is a built-in Anthropic plugin command for PR review. Waza uses /check (or the alias code-review) instead. Do not re-trigger /review from within this skill.

Read the diff and find the problems. Review, audit, triage, and readiness requests are report-only; apply fixes only when the current turn explicitly asks to fix, change, implement, or optimize. Done means the requested review surface is covered and every verification claim comes from this session.

Outcome Contract

Outcome: a review, release decision, or maintainer action grounded in the current diff, project context, and live evidence.
Done when: findings, fixes, shipped state, or blockers are stated with the commands, artifacts, or remote state that prove them.
Evidence: worktree status, diff, public project docs, manifests, CI, package contents, release or registry state, and current command output.
Output: concise findings first, then verification and shipped-state summary when applicable. Multi-step or ship-action runs close with a completion ledger (done / not applicable / remaining), never a narrative that leaves the user asking "is everything done".
Authorization: read-only intent may inspect the worktree and remote state but may not edit files, apply autofixes, commit, push, publish, comment, close, merge, or change branches. Each write or public action needs current-turn authorization, except when the user explicitly authorizes a named batch that contains it.

Worktree Safety Preflight

Before any review, triage, ship, release, or PR operation, read the current worktree with:

git status --short --branch -uall

Treat modified, staged, and untracked files as user work. You may read them and include them in the review surface, but you must not move, hide, overwrite, clean, or discard them without explicit user approval in the current turn.

Do not run these commands as default review or PR setup: git switch, git checkout, git reset --hard, git clean, git stash -u, git stash --include-untracked, git stash -a, git stash --all, or gh pr checkout. If a branch change or cleanup is genuinely required, stop and ask for that exact operation.

Do not "protect" user work by moving untracked files, generated files, screenshots, or local scratch files into /tmp or another holding directory. Moving someone else's WIP out of the checkout is the same class of interference as stashing it. If a clean tree is required for generation, packaging, or verification, use a separate worktree from a known commit and copy only the artifact or patch you own back into the current checkout.

For commit or push follow-through in a dirty or multi-agent checkout, record git rev-parse HEAD before staging. Re-read git status --short --branch -uall and git rev-parse HEAD immediately before commit and again before push. If HEAD moved, unknown commits appeared, or the worktree changed outside your intended files, stop and report the mismatch instead of rebasing, recommitting, or pushing.

For PR inspection, prefer commands that do not switch the current working tree: gh pr view, gh pr diff, git fetch origin pull/<n>/head:refs/tmp/pr-<n>, and git merge-tree.

Mode Picker

Pick the mode that matches the user's intent, then read it in full before acting. Modes layer on top of the shared review surface (Scope, Hard Stops, Autofix, Specialist Review, Verification, Sign-off) further down, which applies in every mode. Load a mode file only when its row matches; the default review path needs none of them.

User intent	Mode
"implement this plan", `/think` output handed off	Plan Execution
Diff or PR ready, "review", "看看代码", "合并前"	Default review (start at Get the Diff)
"look at issues", "review PRs", "triage", "批量处理"	load `references/mode-triage.md`
"is this worth a release", "值不值得发版"	load `references/mode-ship.md` (Release Worthiness Analysis)
"commit", "push", "publish", "release", "close issue", "发布表情"	load `references/mode-ship.md` (Ship / Release Follow-through)
"audit", "项目体检", "项目评分", "给项目打分", "深入分析项目代码", "scorecard", "linus review"	load `references/mode-audit.md`
Document, PDF, prose review	Delegate to `/write` (see Document Review)

Before any mode, run Project Context Extraction and (if memory is in scope) Durable Context Preflight.

Project Context Extraction

This is Waza's public, standalone code-review capability. It should not depend on private machine paths or unpublished project instructions.

Before reviewing, extract project constraints from repository context:

Read the diff and identify changed languages, frameworks, manifests, generated outputs, release files, and CI workflows.
Inspect public project files only as needed: README, AGENTS/CLAUDE instructions when present, package manifests, lockfiles, build configs, test configs, workflow files, and release notes.
Compress the findings into review context: verification commands, protected or generated files, release artifacts, domain risks, and public reply rules.
Apply the stricter rule when project context and this skill overlap.
If project docs or CI name a verification command, prefer that over auto-detection.

For the context shape, see references/project-context.md.

For release or maintainer work, also fill the Release Gate 2.0 matrix from references/project-context.md. It covers review base, dirty/staged/untracked state, latest tag, origin sync, version fields, generated artifacts, package/archive contents, release assets, registry/appcast/CI, and public issue/PR state. Missing matrix evidence is a blocker for a "ready to release" claim.

Durable Context Preflight

See references/durable-context.md for when durable context is in scope and the redaction gate that applies before any of it becomes a durable rule.

For /check: the current diff, CI, and remote state override memory. Durable memory can explain user intent and preferred follow-through, but public project rules still come from README files, manifests, CI workflows, release docs, and explicit instructions in the current thread. Never cite private memory as a public project requirement.

Plan Execution Mode

Activate when the user's message starts with "Implement the following plan", "按计划实施", "按照计划", "整", "可以干", "直接改" followed by a plan body, or links to a /think output.

In this mode, do not run a code review. Instead:

State which plan is being executed (first heading or summary line).
Check for obvious repo drift: run git status --short --branch -uall and skim any changed files that contradict the plan. If drift makes the plan unsafe, name the specific conflict and stop.
After all items are done, run the project's verification command.
Transition automatically into references/mode-ship.md if the project context or current thread indicates review-then-ship.

Default Continuation (review-then-ship)

When the project's AGENTS.md or the current thread explicitly asks to "commit after review", "ship if green", or equivalent, load references/mode-ship.md and transition directly from review to the ship flow after a clean review. Do not ask again. State "proceeding to ship" before acting.

Get the Diff

Derive the review baseline from the user's words and current repository state. Do not ask for commits when the scope is already inferable:

All local or uncommitted changes: inventory staged, unstaged, and untracked files, plus local commits ahead of the configured upstream. Being on the base branch does not make this scope ambiguous.
PR or branch review: use the merge base through the reviewed head, then add any dirty files in that checkout as a separate surface.
Since the last release: use the latest published stable tag through HEAD, not the local version field, then add dirty files.
Recent N days or an explicit ref: resolve that time/ref boundary through HEAD, then add dirty files.
Known-good or previous working version: compare that ref through HEAD; route to /hunt Bisect Mode only when the regression point itself is unknown.
Whole-project audit: use Audit Mode rather than pretending one diff is the repository.

Freeze the resolved base, HEAD, worktree inventory, generated/distribution surfaces, and delegated scopes before review. Ask one narrow question only when two plausible baselines would materially change the verdict. If review fixes are applied or repository state moves, the old verdict expires: re-read HEAD, status, and the full resolved diff before signing off.

Scope

Measure the diff and classify depth:

Depth	Criteria	Reviewers
Quick	Under 100 lines, 1-5 files	Base review only
Standard	100-500 lines, or 6-10 files	Base + conditional specialists
Deep	500+ lines, 10+ files, or touches auth/payments/data mutation	Base + all specialists + adversarial pass

State the depth before proceeding.

Explicit depth language overrides the size thresholds. "All", "全部", "deep", "深入", or "仔细" means whole-scope coverage of the resolved inventory, even when the textual diff is small; it does not permit skipping untracked files, generated mirrors, required artifacts, or pending reviewers.

Static content diffs can stay quick even when they touch several generated files: version strings, dates, release-copy mirrors, sitemap dates, or one-for-one localization copy changes usually need line-by-line readback plus grep consistency, not a specialist fleet. Escalate only when the diff changes logic, generation rules, public distribution behavior, or user-facing semantics beyond the literal text replacement.

Did We Build What Was Asked?

Before reading code, check scope drift: do the diff and the stated goal match? Label: on target / drift / incomplete.

Also check surgical traceability: every changed file and every new public surface must trace back to the user's stated goal. If a file, dependency, config knob, abstraction, generated artifact, workflow permission, or release behavior cannot be explained in one sentence from the request, label it drift until proven necessary.

For every new public setting, flag, environment variable, command, or service, ask who will change it and why one correct default cannot serve them. If there is no evidenced user split, treat the knob as scope drift and fix the default path instead.

Drift signals (examples, not exhaustive -- any one is enough to label drift):

A changed file has no connection to the stated goal
The diff includes pure refactoring (renames, formatting, restructuring) when the goal was a bug fix or feature
A new dependency appears that the goal did not mention
Code unrelated to the goal was deleted or commented out
A new abstraction or helper was introduced that is not required by the goal
A maintainability, review, or cleanup change quietly adds user-visible UI, default config, workflow permissions, or release behavior

Pattern-Fix Completeness

When the diff fixes one instance of a class-of-bug (a missing validation, a wrong selector, an off-by-one, a missing lock), the same shape often lives elsewhere. Extract the pattern signature, grep -rn it across the repo (exclude generated dirs), and confirm sibling instances were also handled. List any unswept sibling: flag it as a hard stop when it carries the same risk, advisory when lower-risk. For a deeper sweep playbook, see hunt's Scope Blast Mode.

When the diff contains a recurring or hard-to-observe bug, output-string branching, guessed waits, consolidation or dead-code deletion, history-sensitive restoration, broad destructive matchers, duplicated derivations, test-only seams, never-shipped migrations, or unknown identifiers, load the matching section of references/review-patterns.md. Do not load that catalog for unrelated diffs.

CLI Command Surface

When a diff touches a CLI entrypoint, installer, completion, config/env handling, package wrapper, or a mutating command such as cleanup, update, uninstall, migration, or cache removal, load references/release-surfaces.md (CLI Command Surface) and work its checklist, then fill the CLI Command Surface template from references/project-context.md before sign-off. The core stance: verify command contract and installed-runtime behavior, not just library tests, and treat every mutating command as a safety sink.

Terminal output is a rendered surface. After changing CLI-facing text, spacing, or layout, re-run the command and read the real output before claiming done; editing the string is not seeing the screen.

Skill, Plugin, And Packaged Install Surface

When a diff touches a skill, plugin, marketplace entry, installer, package allowlist, package manifest, generated mirror, or published archive, load references/release-surfaces.md (Packaged Install Surface) and verify the installed runtime contract through its five steps: real user install path, rebuilt package contents, isolated install smoke, noise filtering, and explicit gaps when the smoke cannot run. Manifest JSON, source tests, or a successful local import never substitute for installed-runtime proof.

Hard Stops (fix before merging)

Examples, not exhaustive -- flag any diff that could cause irreversible harm if merged unreviewed.

No unverified claims. Do not write "I verified X", "I ran Y", "tests pass", or "this fixes Z" unless the shell output is in this turn's transcript. If you reason about behavior without running, say "based on reading the code" instead of "I verified". Every verification claim in the sign-off must point to a command that actually ran in this session.
Re-read source-of-truth facts. Refresh line numbers, worktree state, fallback behavior, locale coverage, and artifact state in the current turn before citing them. Earlier context and reviewer notes are leads, not evidence.
Destructive auto-execution: any task marked "safe" or "auto-run" that modifies user-visible state (history files, config, preferences, installed software) must require explicit confirmation.
Source and distribution out of sync: everything the source change implies downstream must be regenerated, tracked, uploaded, and version-consistent before declaring done: generated or bundled outputs rebuilt and included, every artifact named in release notes or workflows actually uploaded, every new helper module, reference file, or script present in the built archive, and version fields synchronized across manifests, package metadata, changelogs, tags, and lockfiles.
Verifier failure layer unclear: if a verifier fails before assertions or due to missing optional dependencies, bootstrap noise, transient build-service crashes, unavailable simulators, or tool setup, classify setup versus product failure. Retry only with new evidence or a narrower environment. Do not call the repo broken until the intended test body or artifact check actually ran. The inverse is the same stop: a verifier that passes without running the real path -- a skipped optional-dependency job that still prints OK, a function that early-returns leaving output empty so a true-on-empty assertion passes, a render reported fixed but never opened -- is a hollow green. A pass counts only when at least one non-skipped, non-empty case exercised the path and the assertions fail on emptiness.
Injection and validation: SQL, command, path injection at system entry points. Credentials hardcoded, logged, committed, or copied into public docs.
Dependency changes: unexpected additions or version bumps in package.json, Cargo.toml, go.mod, requirements.txt. Flag any new dependency not obviously required by the diff. The inverse is a finding too: a declared dependency or linked SDK with zero imports across the repo gets flagged to the maintainer, not silently removed (it may be staged for an upcoming feature, and unused analytics/telemetry SDKs still drag app review and privacy manifests). Removal needs the maintainer's go-ahead in the current turn, a grep proving zero references first, and a full build after.
Safety sinks: destructive file operations, shell or AppleScript construction, cwd/path/symlink traversal, approval or sandbox boundary changes, signing/appcast flows, and auth prompts need explicit review of validation, rollback, and user-confirmation behavior.

Finding Quality Gate

Before writing any finding into the report, run this gate:

Pre-report self-check (four questions, every finding must pass):

Can I cite the exact file:line?
Can I describe the specific input or state that triggers the bad outcome?
Have I read the upstream callers / downstream consumers, not just the function in isolation?
Is the severity defensible? Would a senior reviewer raise this at this level in a real PR?

If any answer is "no", drop the finding or downgrade it to advisory. Vague findings train the reader to ignore real ones.

A clean review is a valid review. Do not manufacture findings to justify the invocation. Zero findings with a stated review surface is a complete output. Padding the report with low-confidence noise is a worse outcome than reporting nothing.

HIGH and CRITICAL require three pieces of evidence:

The exact file:line where the bug lives.
The specific trigger: what input, state, or sequence produces the bad outcome.
Why existing guards (validation, type system, upstream catch, framework default) do not already prevent it.

Cannot supply all three? Downgrade to MEDIUM, or drop. "This might break under some condition" is not a HIGH.

Knowledge Sync

After reviewing the diff, check whether it introduces invariants not yet captured in project docs:

New safety gate or path-guard rule → AGENTS.md
New UI constraint (layout rule, animation, overlay registration) → .claude/rules/*.md
New deploy/release step or artifact → AGENTS.md or docs/
New cross-file sync requirement (enum ↔ HTML anchors, Swift keys ↔ xcstrings) → AGENTS.md
One-off review reports or diagnostic snapshots should not become durable docs as-is; extract the stable rule into AGENTS/CLAUDE/rules/references and drop the stale report from the commit.

Snapshot Report Routing

Treat review reports, scorecards, and diagnostic snapshots as evidence, not as source-of-truth docs. Before approving one:

Re-read the current diff or repo surface named by the report. If the claim is stale, exclude the report from the commit or rewrite it into a stable rule.
Keep project-specific commands, paths, protected areas, release rituals, and safety constraints in that project's public context. Do not promote them into Waza.
Promote only transferable review behavior into Waza: e.g. "check untracked files before readiness", "inspect generated package contents", or "turn one-off reports into invariants."

If found, either apply the doc update as safe_auto (when the invariant is clear from the diff) or flag it in the sign-off as doc debt. When no new invariants exist, sign-off says doc debt: none.

Specialist Review (Standard and Deep only)

Load references/persona-catalog.md to determine which specialists activate. When the environment has an agent or sub-agent facility, launch all activated specialists in parallel, each with the full diff and its own persona brief. If no parallel reviewer facility exists, run the specialist passes sequentially in the same session.

Merge findings: when two specialists flag the same code location, keep the higher severity and note cross-reviewer agreement. Findings on different code locations are never duplicates even if they share a theme.

Every specialist finding is a claim to verify, not a fact to act on. For HIGH and CRITICAL claims, when the agent facility allows it, spawn one independent skeptic per finding whose only brief is to refute it against the actual code; a finding the skeptic refutes on direct read is dropped or downgraded regardless of which persona raised it. Without the facility, run the skeptic pass yourself: re-read the cited code this turn and confirm the claim is real and live, not already handled elsewhere, not consistent-by-design, not a latent-only risk labeled as a live bug. Parallel reviewers over-report from name-based inference and partial context; drop what dissolves on direct read, and cite the verification path before routing anything to Autofix or sign-off.

Before a whole-scope verdict, reconcile a completion ledger for every delegated review: assigned scope, returned status, and uncovered remainder. Wait for every active reviewer, or name its scope as unreviewed. Never say "all read", "full audit complete", or "no issues" while any reviewer or required verification is still pending.

Autofix Routing

Class	Definition	Action
`safe_auto`	Unambiguous, risk-free: typos, missing imports, style inconsistencies	Apply only after explicit write authorization; otherwise report it
`gated_auto`	Likely correct but changes behavior: null checks, error handling additions	Batch into one user confirmation block
`manual`	Requires judgment: architecture, behavior changes, security tradeoffs	Present in sign-off
`advisory`	Informational only	Note in sign-off

After explicit write authorization, apply safe_auto fixes before surfacing the gated_auto confirmation block. In report-only mode, do not modify the worktree.

Any fix made during review invalidates the pre-fix verdict. Re-freeze the baseline, re-run the check that exposed the finding, refresh the sibling sweep, and complete the final adversarial pass required by the review depth before declaring ready.

Adversarial Pass (Deep only)

"If I were trying to break this system through this specific diff, what would I exploit?" Four angles (see references/persona-catalog.md): assumption violation, composition failures, cascade construction, abuse cases. When the agent facility exists, run the four angles as parallel agents, each blind to the others' findings: convergence from independent angles raises confidence, and singleton findings face the same per-finding skeptic verification as specialist claims. Suppress findings below 0.60 confidence.

Platform Operations

Use the platform tool that matches the project. For GitHub projects, prefer gh or the available GitHub integration and confirm CI passes before merging. For non-GitHub projects, derive the CLI/API from public project docs or the user's explicit platform context; do not force GitHub commands onto other hosts.

Poll CI as structured state, not streamed text: gh run view <id> --json status,conclusion (or the host's equivalent). Piping gh run watch, test output, or build output through tail/head swallows the real exit code and can report a failed or still-running run as green.

Verification

Run bash <skill-base-dir>/scripts/run-tests.sh from the target project root (<skill-base-dir> is this skill's base directory; the script auto-detects the project's test command from the current working directory), or the project's known verification command. Paste the full output.

If the script exits non-zero or prints (no test command detected): halt. Do not claim done. Ask the user for the verification command before proceeding. If the user also cannot provide one, document this explicitly in the sign-off as verification: none -- no command available and flag it as a structural gap, not a pass.

For bug fixes: a regression test that fails on the old code must exist before the fix is done.

In a dirty or multi-agent checkout, a passing local build or test run is not proof your change is sound: unrelated WIP already in the tree can supply missing symbols, mask a break, or fail for reasons unrelated to you. Verify in isolation -- git worktree add --detach <known-good-commit>, git apply only the diff of the files you own, then build/test there. The clean isolated pass is the real signal; the contaminated local pass is not.

Gotchas

What happened	Rule
Posted a public reply to the wrong issue or PR thread	Re-read the target with `gh issue view N` or `gh pr view N` and confirm title, author, and current state before acting
PR comment sounded like a report	1-2 sentences, natural, like a colleague. Not structured, not AI-sounding.
PR comment used bullet points	Write as short paragraphs, one thought per paragraph; thank the contributor first
New file name duplicated a locale, platform, or suffix convention	Check the target directory's existing naming convention before creating or renaming files
Deployed without provider runtime or env checks	Follow the project's public deployment docs and compare provider config with local required env and runtime settings
Push failed from auth mismatch	Check `git remote -v`, current branch, and auth identity before the first push in a new project

Document Review

For document, PDF, white paper, or prose review, route to /write (Document Review Mode). /check handles code diffs and release artifacts only.

Sign-off

Open the final message with the status line as plain prose before any table or detail: exactly where the work stands now, with the hash, tag, or blocker. A verdict buried under verification tables reads as unfinished and makes the user re-ask "都提交了吗"; the tables support the verdict, they do not replace it.

status:           [committed and pushed as <hash> / staged, not committed / released vX.Y.Z / blocked on <what>]
files changed:    N (+X -Y)
scope:            on target / drift: [what]
user-visible delta: none / [entry, UI, copy, behavior added, removed, or changed]
review depth:     quick / standard / deep
hard stops:       N found, N fixed, N deferred
sibling sweep:    N same-shape sites checked, N fixed / none found / not applicable
specialists:      [security, architecture] or none
new tests:        N
public actions:   replied #N, closed #N, reactions done / none pending
doc debt:         none / AGENTS.md needs X / rules need Y
verification:     [command] -> pass / fail

public actions lists every outward-facing step the task implied (issue replies, closures, release reactions) with its done or pending state; an external action the user has to ask about was not finished.

For a whole-scope or post-fix verdict, scope is backed by the frozen baseline and current inventory, not by the last patch viewed. For a ship action, the status line is incomplete until every currently authorized ledger item is done, not applicable, or blocked with evidence.

Thêm skills từ tw93

hunt

tw93

Tìm nguyên nhân gốc rễ trước khi áp dụng bản sửa lỗi, sự cố, hồi quy, kiểm thử thất bại, hành vi hỏng và lỗi được báo cáo qua ảnh chụp màn hình. Sử dụng khi người dùng yêu cầu 排查/报错/崩溃/不工作/回归/判断为什么报错, hoặc nói rằng điều gì đó từng hoạt động nhưng giờ không còn. Không dùng cho đánh giá mã hoặc tính năng mới.

developmenttestingcode-review

think

tw93

Biến những ý tưởng thô sơ thành các kế hoạch đã được phê duyệt, hoàn chỉnh về quyết định với cấu trúc đã được xác thực trước khi viết mã. Sử dụng khi người dùng yêu cầu 出方案/给方案/深入分析/怎么设计/有没有必要/值不值得/plan this/how should I/should we keep this cho các tính năng, kiến trúc hoặc đánh giá giá trị. Không dùng cho sửa lỗi hoặc chỉnh sửa nhỏ.

developmentresearchproject-management

health

tw93

Thực hiện kiểm tra sức khỏe kỹ thuật có hỗ trợ agent, có nhận biết ngân sách, cho các vấn đề trôi dạt hướng dẫn/cấu hình, hooks/MCP, bề mặt xác minh và khả năng bảo trì AI. Sử dụng khi người dùng yêu cầu kiểm tra claude/kiểm tra codex/kiểm tra pi/kiểm tra cấu hình/sức khỏe hoặc báo cáo agent bỏ qua hướng dẫn, thiếu xác thực hoặc mã trở nên khó bảo trì. Không dùng để gỡ lỗi mã hoặc xem xét PR.

developmentresearchcode-review