embedded-captions

作者: heygen-com

為說話人頭影片添加字幕。一份包含32種視覺風格的目錄(CATALOG.md),基於兩種引擎:列式流(字幕合成至場景中——遮罩遮擋+混合模式;奶油/墨水/編輯/主題演講/紀錄片/鮮明/霓虹/故障/鉻/速度)與主題構成(錨點/軍械/終端/霓虹招牌/星塵/跺腳/記分板/轉運/VHS/街機/檔案/雷射/雷鳴/全息/生物光/極光/光譜/剪紙/彈窗/黑板/塗鴉/畫筆/水墨/勒索/末頁/夜城...

npx skills add https://github.com/heygen-com/hyperframes --skill embedded-captions

Embedded Captions

One catalog, picked up front (CATALOG.md — 17 identities; the three engines behind it are backend detail). Standard (default) builds a clean verbatim rail (lower-third subtitle carrying most text) + an embed climax composited into the scene behind the subject at the peak. Cinematic is pure embed — no rail, every caption composited behind the subject (hero typography, accumulation, occlusion as the effect). Theme is a complete themed constitution — body paradigm × hero setpiece × front fx × plate reaction, composed from registries (themes/README.md): ordnance terminal neonsign stardust stomp. Most explainer / voiceover is Standard; embed is the scarce, earned peak — embedding every word is the common mistake; Theme is for VFX-grade asks ("炸", "特效", "像 AE 做的").


Operational flow (TL;DR)

The craft prose below is long; the pipeline itself is short — and everything deterministic is computed or compiled, never hand-written:

  1. Decision gate (refuse bad clips) → pick ONE identity from CATALOG.md (17 identities; engine/compiler derived by lookup — never surface a mode/category question)
  2. hyperframes init (skip it if the project dir already exists with the video inside — matte.cjs/transcribe.cjs adopt any video in the dir as source.mp4) → bash scripts/prepare.sh <project> (matte ∥ transcribe ∥ audio-envelope in parallel, then safe-zones v2 with scene palette/optics/lighting — one command, nothing forgotten)
  3. author a small JSON of creative choices (read safe-zones.json first): Cinematic → plan.jsonfill-timings.cjsfit-fonts.cjsmake-composition.cjs; Theme → theme.jsonmake-theme.cjs (rail/panel/poem/takeover paradigms; anchor is the quiet rail default)
  4. Visual QA: node scripts/preview-frames.cjs <project> → faithful composite previews in ~2s/frame (no render). Check § Visual QA before paying for a render.
  5. render-and-composite.sh → gates (timing / occlusion+hero / overflow / hand-off) → final.mp4

Load-bearing rules people miss:

  • rail (default) + embed (promotion). drop (filler, not shown) / rail (verbatim lower-third subtitle, in front, carries most text) / embed (a peak word composited behind the subject). Standard mode does both, embedding only the peak(s). See § Caption model.
  • The video is delivered UNTOUCHED (Standard/Cinematic; Theme mode's PLATE budget is the one sanctioned exception — register-gated reaction beats (charge-dim, punch, shake, grain) defined per theme DNA and applied AFTER the matte composite so subject+text+plate move as one frame) — captions are the only thing added; the matte just lets the subject occlude the embed track. Never grade/recolor/scanline the footage.
  • Two rulebooks: rail → references/rail.md (thin), embed craft → references/composition-craft.md (rich, embed-only). Skim by need.

Caption model — rail + embed

Every spoken phrase is one of three things:

WhatHow it's shown
dropfiller — um/uh, stutters, self-correctionsnot shown
railthe default — ordinary spoken content (verbatim)clean lower-third subtitle, in front, readable. A punch word can get an inline emphasis highlight (accent colour / active-word pop) — it stays on the rail.
embeda promoted peak — the headline beatone big word composited behind the subject (matte occlusion), designed entrance + exit

The rail carries most of the text; embed is the scarce, earned peak. Scarcity is per beat/block, not per clip: ≤1 hero per block (thought), never two co-visible, ≥ a beat of air between hero windows (the compiler warns under 0.6s). A short clip → usually 1–2; a long explainer → ~one per section. Among multiple heroes, the largest authored one is the APEX (it alone gets the full lockup embed + width-fit raise); smaller ones are MINOR peaks that ride their column as oversized emphasis lines (fg, damped motion) — not every beat needs the matte showcase, which is exactly what keeps the apex an event. Embedding every word is still the common mistake.

Rail-surface identities build exactly this (rail = rail.html, embed = the climax in index.html). Column-flow identities drop the rail and make everything embed-style — recommend them only for mood-over-verbatim asks, never for explainer / voiceover where the words must read (CATALOG.md encodes this per identity).


Step 0 — pick ONE identity from the CATALOG

One front-end, three engines behind. The user picks an IDENTITY from CATALOG.md (17 entries: 12 classic + 5 themed); the engine, compiler and authoring file are derived by lookup from the catalog row. Never surface "Standard vs Cinematic vs Theme" as a question — those are backend names (a product has one UX even with several engines). The catalog encodes everything routing needs: reading surface, voice, recommend-for, scene needs, adjacency notes for the genuinely-close pairs (loud↔ordnance, neon↔neonsign, cream↔stardust).

Procedure: probe the clip → shortlist 2–3 identities from the catalog → recommend ONE with a one-line why → the user picks → author that identity's file. Identities are engine-locked (no cross combos; opening one is a validation event — see dna/README.md).

Always present your recommendation and let the user pick before you author. Don't silently default.

(The full identity table lives in CATALOG.md — single source of truth for routing. The engine docs below describe each backend's authoring contract.)

Recommendation heuristic: use the "Shortlisting heuristics" in CATALOG.md — they are identity-level (e.g. "炸" shortlists ordnance/stomp/terminal/loud and picks by WHAT should explode), never category-level. Unsure → anchor.

  • Cinematic → write plan.json for a locked template, compiled by make-composition.cjs.
  • Theme → read themes/README.md, author theme.json, run scripts/render-theme.sh (compiles + renders + plate reaction → final_fx.mp4).

Decision gate — RUN FIRST

Probe the video and classify the scene before either mode.

ffprobe <video.mp4>                    # specs
ffmpeg -ss <t> -i <video.mp4> -vframes 1 sample.png   # at 20/50/80%

Read the samples. Refuse if:

  • Multiple speakers / hard cuts (split & render each shot, or refuse)
  • No human subject (this skill is for talking-head)
  • Under 3 seconds, no speech, or face never clearly visible — transcribe.cjs warns when audio is near-silent (Whisper hallucinates words like "Thank you." over silence); heed it and refuse rather than caption fabricated words
  • Source already has burned-in captions / subtitles / heavy text graphics — adding a second caption system conflicts and the footage ships untouched (no covering/inpainting). Burned text often appears only mid-clip: sample a 1fps contact sheet (ffmpeg -i in.mp4 -vf "fps=1,scale=160:-1,tile=10x5" sheet.png), don't trust 3 spot frames.
  • Transcript is garbage — non-native/heavy-accent speech can transcribe into confident gibberish. Sanity-read transcript.json before authoring; if it doesn't parse as language, try WHISPER_MODEL=medium once, else refuse (a verbatim rail of fabricated words is worse than no captions).
  • Busy handheld with fast motion (matte flickers)

Pre-flight probes (cost nothing, prevent the worst failures)

  1. Shot-cut probe. Sample frames at 20%, 50%, 80%. If a different subject/scene appears, trim the clip before the cut.
  2. Letterbox / pillarbox probe. Black bars on the first frame? Compute safe content rect and constrain caption placement inside it.
  3. Luminance probe. Sample the caption region's average luminance — under 60 → light text reads as-is, 60-180 → add the glyph scrim, 180+ → opaque text + scrim (never bare light text). Cinematic templates are cream+screen and LOCKED — use this probe to pick a fitting identity (bright scenes → ink, or the opaque-rail anchor theme), never to recolour one.
  4. Identity recommendation by tone (you recommend; the user picks — see Step 0 + CATALOG.md). explainer / interview / must-read words → rail/panel-surface identities; poetic / social / "cinematic" → column-flow identities by register; "炸 / 特效 / VFX" / named worlds → themed identities. When unsure → anchor (words read, scene safe) — but present a shortlist and let the user choose.

Pipeline — 5 steps

1. hyperframes init <project> --non-interactive --video <video.mp4> --skip-skills
2. bash scripts/prepare.sh <project>       # matte ∥ transcribe (parallel) → safe-zones. One command.
                                           #   → frames_fg/ transcript.json safe-zones.json
3. [AGENT STEP — the only creative step] author a small JSON; see below by mode
   Cinematic: author plan.json → node scripts/fill-timings.cjs → fit-fonts.cjs → make-composition.cjs
   Theme:     author theme.json → bash scripts/render-theme.sh <project>   (compiles + renders + plate fx)
4. node scripts/preview-frames.cjs <project>   # ~2s/frame composite previews → § Visual QA (BEFORE the render)
5. bash scripts/render-and-composite.sh <project>  # gates → final.mp4 + history/ snapshot
   (Theme mode: SKIP steps 3b/5 — render-theme.sh already runs compile + render-and-composite
    + _postfx.sh; the deliverable is final_fx.mp4, final.mp4 is pre-plate-reaction)

Step 3 differs by mode:

Step 3 — Cinematic mode (pure embed)

  1. Read safe-zones.json first. Narration planes go in zones.hugLeft/hugRight — clean strips ABUTTING the silhouette (text far from the body reads as floating, not embedded; far corners are the fallback, not the default). The hero defaults to heroAnchor/heroBands.best (centered ON the subject, ~30–55% occluded). recommendation:"fg" moves NARRATION in front for legibility; the hero stays embedded whenever heroBands.feasible — hero-fg is the last resort.
  2. The DNA is the identity you picked in Step 0 (CATALOG.md) — do not re-open the choice here. Sanity-check it against the scene (bright hero band luma > 150 wants ink; full pick guidance lives in the catalog, covering all ten incl. neon / glitch / chrome / velocity). State your pick + why; the user decides. The DNA locks type/palette/blend/motion + hero three-act; safe-zones v2 (palette/optics/lighting) parameterizes it to THIS scene automatically.
  3. Author <project>/cinematic.json"dna": "<name>" + thought-BLOCKS, not raw groups: each block = lines of words (grouped 2–5 at clause boundaries) + the plane it stacks in + per-line css (size/weight/style only — no positions) + at most ONE line marked "hero": true (the promoted word; "text" for display form). Schema: scripts/make-cinematic.cjs header.
  4. Compile: node scripts/make-cinematic.cjs <project> — lowers blocks → plan.json → index.html. Generated for you: transcript-sequenced timings, accumulate-within-block, page-flip-between-blocks, the hero LOCKUP (a hero block's pre-context, HERO and post-context stack as ONE bonded composition centered on the subject — reading order top→bottom = spoken order by construction; context floats in FRONT while the hero embeds BEHIND = the depth sandwich; a mass rule keeps the hero dominating its context), apex/minor hero split, reading order by construction, fg fallback per safe-zones. Then the gates run as usual. (Hand-authoring plan.json directly remains possible for designs blocks can't express — then run fill-timings.cjs + fit-fonts.cjs + make-composition.cjs yourself.)

Step 3 — Theme mode (themed constitution)

Read themes/README.md FIRST — paradigm/setpiece registries, linkages, hard rules, and the exact theme.json schema.

  1. Pick a theme DNA by content register (each themes/<name>.json has voice + when). State your pick + why; the user decides.
  2. Author <project>/theme.jsondna, lines (verbatim, transcript order; 1–5 words each — for takeover each line is one CARD), minors (emphasis words), hero:{match} (the climax word/phrase; leave it OUT of lines for embed setpieces, keep it IN for inline setpieces and panel+redact).
  3. Render: bash scripts/render-theme.sh <project> — compiles (verbatim-completeness gate at compile time), renders both layers, composites, applies the plate reaction → final_fx.mp4. Use preview-frames.cjs between compile and render for Visual QA.

Visual QA — preview BEFORE you render

node scripts/preview-frames.cjs <project> [t…] composites faithful preview frames in ~2s each (caption layers screenshotted at seek-time + real video frame + matte occlusion + rail overlay = what the final composite will look like at that moment). Default samples = each group/climax window. A full render costs minutes — never use it to discover layout problems.

Check the previews (<project>/preview/sheet.png) against this list — these are the failures the geometric gates cannot catch:

  1. Washout — light text over a bright region (window/sign/sky): unreadable → move the plane or change DNA/mode (bright scene → ink).
  2. Text-on-text — captions over the scene's own text/graphics, or two caption groups colliding.
  3. Reading order — on-screen vertical order must match spoken order; the hero must not sit below later words.
  4. Hero presence — the climax should be BIG and visibly behind the subject (~30–55% occluded), not a floating label in a margin.
  5. Balance — one coherent column/band, not scattered fragments; margins breathing; nothing clipped.

Then the 5 positive checks in references/reference-bar.md (poster test · timid test · one-glance hierarchy · scene handshake · dead-air audit) — the failure list keeps a render from being broken; the positive list is what makes it designed. Ship when both pass.

Fresh-eyes review (recommended for anything user-facing): you have confirmation bias about your own layout. If you can spawn a subagent, give it ONLY the preview sheet + this checklist and ask for PASS/FIX verdicts per frame ("review these caption previews against the 5-point checklist; answer PASS or the specific fix per frame"). Apply fixes in plan.json / theme.json, recompile, re-preview — each loop costs seconds. Render once, when the previews pass.


The DNA registry — ten visual languages (replaces the template catalog)

Both modes draw from dna/ — six art-directed visual languages that parameterize per scene (accent sampled from the footage, contact shadow along the measured light direction, depth-match blur, RMS-coupled hero amplitude):

DNARegisterScene fitVoice
creampremium-warmdark/mid warm scenesInter + warm cream + screen; glowing emergence hero (successor of cinematic-cream)
inkpremiumbright scenes (luma > 150)near-black multiply — type printed ON the wall; the bright-scene answer
editorialeditorial-luxeintrospective / fashion / poeticBodoni Moda, lowercase-italic hero — magazine elegance
keynotetech-premiumproduct / launchopaque white Inter 800, dead-center stillness
documentaryformalinterview / seriousburn-in reveals, no hero — gravitas IS the style
loudloudhype / sport / socialAnton + scene-sampled accent, single-unit slam + ripple; body ANNOUNCES in front (bodyLayer: fg)
neonloud-cybercyberpunk / nightlife / tech-noir (dark scenes)electric-cyan signage, ignition flicker, the hero powers ON like a sign
glitchloud-cyberdigital / hacker / AIRGB-split echoes snap together on landing; machine-percussive timing
chromeloud-luxeY2K / fashion-tech / musicliquid-metal gradient hero + one sheen sweep during the hold
velocityloud-sportsport / auto / fitnessevery word arrives along its motion vector (streak+skew), hero passes with speed trails

Pick by safe-zones.json (heroAnchor.bandLuma, palette.temperature) × content register — dna/README.md has the decision rule. Authoring: cinematic.json takes "dna": "<name>".

The engine generates the hero three-act from the DNA (no authoring needed): co-visible captions dim (setup) → per-letter entrance with amplitude ∝ spoken loudness (impact) → breathe + glow until exit (afterglow).

(Legacy: plan.template:"cinematic-cream" maps to dna:"cream" automatically. The retired 54-template library lives outside the skill at ~/Downloads/embedded-captions-archive/standard-templates-54/; _motion.md remains in-skill as the motion-verb reference catalog.)


Aesthetic decision — tone × shot × platform (input to the catalog shortlist, NOT a second router)

Classify the clip on 3 axes and feed the result into CATALOG.md's shortlisting — this section never picks a mode/engine by itself:

Tone (what feel does the content have?)

  • documentary | conversational | energetic | poetic | keynote | investigative | music-video

Shot (what's the framing?)

  • close-up (head + shoulders) | mid-shot (torso+) | wide (full body+) | cut-montage (mixed shots)

Platform (where will it play?)

  • 9:16 portrait (TikTok/IG/Shorts) | 16:9 landscape (YouTube/web) | 1:1 square | broadcast export

Cross-reference in references/direction-catalog.md § Classification matrix for direction language — then return to CATALOG.md to shortlist identities (this matrix informs the shortlist; the catalog is the only routing surface).

Composition craft (embed track) — read before embedding

The full embed-track playbook lives in references/composition-craft.md: transcript role-annotation, phrase grouping, planes & clean-zone anchoring, zone coherence, climax pop & readability, edge-breathing, the occlusion 3-step judgement, and accumulation/persistence. It governs how a promoted phrase sits INTO the scene — read it before authoring any embed (Cinematic plan.json or Standard index.html). The default rail track has its own, much simpler spec → references/rail.md.


Shared knowledge

DocWhat
references/rail.mdThe rail track — standard lower-third subtitle spec (the default; carries most text).
references/composition-craft.mdThe embed-track playbook — grouping, planes, climax pop, occlusion judgement, accumulation/persistence. Read before embedding.
dna/README.mdThe DNA registry — six scene-parameterized visual languages; how to pick.
references/reference-bar.mdThe taste bar — per-register world-class references + the 5 positive checks.
references/aesthetic-principles.mdThe 18 rules. Beat Veed AI on taste. Read first.
references/motion-vocabulary.md10 named motion primitives + tone→timing lookup
references/direction-catalog.md10 ship-ready aesthetics + tone×shot×platform matrix
references/anti-patterns.mdBugs already locked out (CoreML, letter-spacing reflow, etc.)
references/scene-types.mdWhen a wall surface is usable (4 conditions)
references/layout-heuristics.mdPlane positioning, clean-zone selection, crown 3 conditions, pillarbox math
references/typography-presets.mdFont-size × column-width matrix (starting points)
references/caption-grouping.mdWord → group rules (pauses, sentence boundaries)
references/failure-modes.mdLong tail of dev gotchas
references/bespoke-vs-presets.mdWhy presets fail sometimes; clone-and-tweak pattern

Read the aesthetic principles and direction catalog FIRST. Everything else is implementation detail.


Non-negotiables

  • Face must never be 100%-covered continuously — every 0.3s window, face bbox ≥30% uncovered.
  • WCAG contrast — final render lints; fix palette if it fails.
  • Deterministic — no Math.random(), no Date.now(), no repeat:-1.
  • Never grade/recolor the video. The footage ships untouched — captions are the only addition. No full-frame scanlines / duotone / darken / vignette over the a-roll. Cyberpunk/CRT texture belongs inside a caption element, not over the whole frame.
  • Rail-first for talking-head / explainer. Don't embed the whole transcript — most text is the rail; embed only peaks. Embedding everything is the default mistake.
  • Embed is scarce + spaced. ≤1 embed per sentence/beat, never two adjacent or co-visible, ≥ a beat apart, at most one apex. climax = per-beat peak, not "the single payoff of the entire clip."
  • Matte = the PERSON (hyperframes remove-background, u2net_human_seg, Apache-2.0). Human segmentation by intent, but not surgically: thin offset furniture (mic boom arms) is usually excluded — captions render over it, behind the person — while large salient objects NEAR the subject (a telescope, a desk rig) can still leak into the matte and occlude captions. Objects HELD by the subject (products, phones) may drop out intermittently, letting captions pass in front. NEVER assume: sample frames_fg/ at 2-3 timestamps before placing the hero, and prefer hero positions clear of any leaked furniture (heroAnchor can be skewed by leaks — cross-check against frames_bg).
  • safe-zones is PROP-BLIND — eyeball every band you use. Zones/heroBands score subject occlusion + luma only: a mic, telescope, or screen sitting inside a "clean" zone is invisible to them (and a prop leaking INTO the matte skews heroAnchor.centerXPct off the person). Before authoring, extract ONE frame of each band you intend to use; if a prop lives there, measure its bbox and move/shrink the plane. Two real cases shipped clean only because the agent did exactly this. (Auto prop-saliency is a known gap; zones' peakLuma only catches moving bright objects.)
  • Captions stay on-frame. Cinematic mode hard-gates frame-overflow; Standard mode runs check-overflow.cjs as a WARNING (intentional bleed is the only exception — read the warning).
  • Each caption ≥ 0.5s on screen — shorter = unreadable.
  • Word timings must match transcript.json within 80ms — a caption firing 500ms off-beat destroys the scene illusion. Cinematic runs check-timing.cjs --strict before rendering (via render-and-composite.sh); THEME mode enforces the same timings at compile time instead (make-theme's sequential transcript matcher + verbatim completeness gate — drift is a compile error). Never pack multiple transcript words into one entry (e.g. "FUTURE OF" or an IT + line-break + ALL stack with one start/end) — the second word inherits the first's timestamp and fires early. Split them into separate word entries with their own timings, even if you want them on the same visual line (use CSS white-space / natural wrap instead of <br>). Creative substitutions where caption text ≠ transcript (e.g. "15%" replacing "fifteen percent") are supported — register them in CREATIVE_SUBS inside check-timing.cjs.
  • Group windows must envelop their wordsgroup.in ≤ min(word.start) and group.out ≥ max(word.end) for every group. If group.in is later than a word's start, the word is silently delayed until the container mounts (we've shipped 800ms lag bugs from this). The validator enforces this.
  • No two caption groups may overlap in both time AND screen region — overlapping-in-time captions create text-on-text pileups. Options: (a) spatial separation — place each group in a non-overlapping vertical band so they can coexist (memory-wall cascade style); (b) handoff — set the earlier group's out ≤ the next group's in so only one is on screen; (c) deliberate layered typography — add "allow_overlap": true on one of the groups to silence the validator. The validator estimates each group's vertical bbox from its CSS and flags collisions. Pick (a) by default — it's what makes cinematic-cream feel like a poem accumulating, not a subtitle track replacing itself.
  • Screen-blend fails on bright backgrounds (>180 luminance). Cinematic templates are cream + screen and that DNA is locked (the plan can't recolour them) → on a bright backdrop they wash out, so pick ink (letterpress built FOR bright surfaces) or the anchor theme (opaque rail surface) rather than overriding a look.
  • Don't animate letter-spacing or filter:blur on word entrance — inline-block reflow causes line-jumps.
  • CoreML banned for matting — the onnxruntime CoreML EP's mixed-precision partitioning corrupted face alpha (observed with the previous RVM engine; don't re-try it). Matting is CPU-only (~2 fps @1080p ≈ 2-3 min per 10s clip; budget for it on long clips).

Dependencies

  • hyperframes, built (packages/cli/dist/cli.js). Scripts auto-resolve the checkout: HYPERFRAMES_ROOT env → repo root if this skill ships inside hyperframes → ~/Downloads/hyperframes. Build with bun install && bun run build.
  • Node-first; two Python touchpoints via uvx (no manual installs): transcription runs WhisperX through uvx (word-level timings; falls back per SKILL §transcription), and Theme's drawon setpiece shells python3 scripts/gen-stroke-path.py at compile time. Everything else runs on the toolchain hyperframes already ships: matting via the hyperframes CLI's remove-background (u2net_human_seg; weights auto-download once, ~168 MB, to ~/.cache/hyperframes/), image/alpha math via sharp, layout/occlusion/overflow via puppeteer, plus ffmpeg. The scripts auto-resolve these from the hyperframes checkout — nothing extra to install.
  • Transcription = WhisperX via uvx (word-level timings + alignment; no manual install — transcribe.cjs drives uvx whisperx). Falls back to an existing word-level transcript.json if present.
  • Source videomatte.cjs / transcribe.cjs auto-resolve source.mp4 (or glob the clip / read hyperframes.json), so hyperframes init --video X.mp4 needs no manual rename.
  • fpsmatte.cjs extracts at the source's native rate and records matte.fps; render-and-composite.sh uses that so the matte stays frame-aligned.
  • Matting weights are NOT bundled: matte.cjs shells the hyperframes CLI's remove-background, which downloads u2net_human_seg (~168 MB, Apache-2.0) once to ~/.cache/hyperframes/background-removal/models/. First prepare on a fresh machine needs network for that one download.

If a hard dependency is missing, STOP and ask the user — don't silently skip steps.

來自 heygen-com 的更多技能

hyperframes-cli
heygen-com
HyperFrames CLI dev loop — `npx hyperframes` for scaffolding (init), validation (lint, inspect), preview, render, and environment troubleshooting (doctor, browser, info, upgrade). Use when running any of these commands or troubleshooting the HyperFrames build/render environment. For asset preprocessing commands (`tts`, `transcribe`, `remove-background`), invoke the `hyperframes-media` skill instead.
developmenttestingapi
hyperframes-animation
heygen-com
HyperFrames 的所有動畫知識——原子運動規則、多階段場景藍圖、場景轉場、更廣泛的動態設計技術,以及七個運行時適配器(預設 GSAP,加上 Lottie、Three.js、Anime.js、CSS 關鍵影格、Web Animations API、TypeGPU)。適用於任何動態或動畫任務:選取 2-4 條規則並組合,或載入藍圖,或查詢特定運行時 API(例如 GSAP 緩動函數 / Lottie 播放器 / Three.js 混合器)。HyperFrames 原生功能:單一暫停時間軸、安全搜尋、...
creativedevelopmentdesign
hyperframes-core
heygen-com
HyperFrames HTML 組合合約。用於組合結構、資料屬性、片段、軌道、子組合、變數、媒體播放、確定性渲染規則,以及最小可渲染專案的驗證。
developmentmediacreative
hyperframes-media
heygen-com
HyperFrames 組合的資產預處理 — 多供應商 TTS(HeyGen / ElevenLabs / Kokoro 本地)、多供應商 BGM(Google Lyria / 本地 MusicGen)、Whisper 轉錄、背景移除及字幕編寫。用於 npx hyperframes tts、bgm、transcribe、remove-background、語音/供應商選擇、音樂情緒提示、字幕/副標題/歌詞/卡拉OK/逐字樣式設定。
mediaaudiovideo
hyperframes-registry
heygen-com
安裝並將註冊區塊與元件接入 HyperFrames 組合中。適用於執行 hyperframes add、安裝區塊或元件、將已安裝項目接入 index.html,或處理 hyperframes.json 時使用。涵蓋 add 指令、安裝位置、區塊子組合接入、元件片段合併、註冊表探索,以及編寫新區塊或元件以向上游貢獻(構想 → 骨架 → 驗證 → PR)。
developmentapicode-review
general-video
heygen-com
作為自訂 HyperFrames HTML 影片合成編寫的備用方案,適用於無專門工作流程可匹配的情況。涵蓋較長或多場景作品、品牌/宣傳短片、蒙太奇、標題卡、長篇動態海報、靜態循環,以及任意長度或格式的自由形式合成。不適用於行銷產品宣傳(product-launch-video)、一般網站轉影片擷取(website-to-video)、主題解說(faceless-explainer)、GitHub PR 影片(pr-to-video)、為現有素材加字幕等用途。
videocreativemedia
motion-graphics
heygen-com
當使用者需要一段簡短、以設計為主的動態圖像,且動態本身即為訊息時使用:動態字體排版、數字或統計數據累計、圖表/數據視覺化呈現、標誌片頭、品牌組合、下方標題、提示框、社群疊加層、動態標題/推文/新聞項目、動態海報,或快速捕捉的頁面重點。通常長度在10秒以內,最長約30秒,無敘事弧線、旁白或真人拍攝主體。可輸出為MP4或透明疊加層。不適用於較長、多場景、有旁白或品牌展示影片等類型。
creativevideodesign
hyperframes-read-first
heygen-com
任何製作、創作、生成、編輯、動畫化或渲染影片、動畫、動態圖像、解說影片、標題卡、疊加層、字幕影片、產品宣傳片、網站影片、公關或更新日誌影片、數據蒙太奇、動態海報或HyperFrames HTML組合的請求,請從此處開始。當用戶希望HyperFrames創作或渲染完成的MP4/網路影片、選擇工作流程,或在產品發布影片、無臉解說影片、網站轉影片等之間進行路由時,請在其他影片或動畫技能之前使用此功能。
creativevideomedia