hyperframes-media

Asset preprocessing for HyperFrames compositions — multi-provider TTS (HeyGen / ElevenLabs / Kokoro local), multi-provider BGM (Google Lyria / local MusicGen), Whisper transcription, background removal, and caption authoring. Use for npx hyperframes tts, bgm, transcribe, remove-background, voice/provider selection, music-mood prompting, captions / subtitles / lyrics / karaoke / per-word styling.

npx skills add https://github.com/heygen-com/hyperframes --skill hyperframes-media

HyperFrames Media

CLI commands that create assets (tts, bgm, transcribe, remove-background), plus everything needed to consume and animate transcript data in HTML. For placing assets into compositions, see hyperframes-core.

Provider chains (auto-detected from env)

TTS: npx hyperframes tts "..." picks the first available provider:

| Order | Provider | Detected when | Word timestamps |
| --- | --- | --- | --- |
| 1 | HeyGen (Starfish) | $HEYGEN_API_KEY / hyperframes auth login | Yes, native; pass --words narration.words.json to capture |
| 2 | ElevenLabs | $ELEVENLABS_API_KEY set | No; chain transcribe after |
| 3 | Kokoro-82M (local, 54 voices) | always (no key required) | No; chain transcribe after |

If the installed hyperframes tts is the local-only build (its --help says "Kokoro-82M" and has no --provider/--words flags), it silently falls back to Kokoro even with $HEYGEN_API_KEY set. To force HeyGen regardless of CLI version, use the self-contained scripts/heygen-tts.mjs (see references/tts.md).
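The local-only check described above can be sketched as a small text heuristic. This is an illustration only: `isLocalOnlyTtsBuild` is a hypothetical helper, not part of the CLI, and the sample help strings in the usage note are invented. It assumes you have already captured the output of `npx hyperframes tts --help` as a string.

```typescript
// Hypothetical helper: applies the heuristic from the text above.
// A build is treated as local-only when its --help output mentions
// "Kokoro-82M" and contains neither the --provider nor the --words flag.
function isLocalOnlyTtsBuild(helpText: string): boolean {
  const mentionsKokoro = helpText.includes("Kokoro-82M");
  const hasProviderFlag = helpText.includes("--provider");
  const hasWordsFlag = helpText.includes("--words");
  return mentionsKokoro && !hasProviderFlag && !hasWordsFlag;
}
```

For example, `isLocalOnlyTtsBuild("Kokoro-82M local TTS\n  --voice <id>")` returns `true`, which is the case where scripts/heygen-tts.mjs is the safer route to HeyGen.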

BGM: npx hyperframes bgm --duration N:

| Order | Provider | Detected when |
| --- | --- | --- |
| 1 | Google Lyria (RealTime) | $GEMINI_API_KEY or $GOOGLE_API_KEY set |
| 2 | MusicGen (facebook/musicgen-small, local) | Python transformers + torch + soundfile installed |

Override either with --provider <name>.
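To make the detection order concrete, here is a minimal sketch of both chains as pure functions. It is not the CLI's implementation. The provider strings (`"heygen"`, `"lyria"`, and so on) are assumed names for illustration and may not match the values `--provider` actually accepts. The `loggedIn` and `musicgenDepsInstalled` flags stand in for checks (auth login state, installed Python packages) that cannot be read from environment variables.

```typescript
type Env = Record<string, string | undefined>;
type TtsProvider = "heygen" | "elevenlabs" | "kokoro"; // assumed names
type BgmProvider = "lyria" | "musicgen";               // assumed names

// Treat empty or whitespace-only variables as unset.
const isSet = (v: string | undefined): boolean =>
  v !== undefined && v.trim() !== "";

// TTS chain: explicit override, then HeyGen, then ElevenLabs, then Kokoro.
function pickTtsProvider(
  env: Env,
  opts: { loggedIn?: boolean; override?: TtsProvider } = {},
): TtsProvider {
  if (opts.override) return opts.override;
  if (isSet(env.HEYGEN_API_KEY) || opts.loggedIn) return "heygen";
  if (isSet(env.ELEVENLABS_API_KEY)) return "elevenlabs";
  return "kokoro"; // always available, no key required
}

// BGM chain: explicit override, then Lyria, then local MusicGen.
// Returns null when neither provider is usable.
function pickBgmProvider(
  env: Env,
  opts: { musicgenDepsInstalled?: boolean; override?: BgmProvider } = {},
): BgmProvider | null {
  if (opts.override) return opts.override;
  if (isSet(env.GEMINI_API_KEY) || isSet(env.GOOGLE_API_KEY)) return "lyria";
  if (opts.musicgenDepsInstalled) return "musicgen";
  return null;
}
```

Passing the environment in as an argument rather than reading it globally keeps the sketch testable and shows why a changed environment can silently change the provider.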

Routing

| Task | Read |
| --- | --- |
| npx hyperframes tts: provider chain, voice IDs, words.json | references/tts.md |
| HeyGen without the CLI: self-contained REST script (wav + words) | scripts/heygen-tts.mjs (see references/tts.md) |
| npx hyperframes bgm: Lyria vs MusicGen, mood prompts, tuning | references/bgm.md |
| npx hyperframes transcribe: Whisper, model rules, output shape | references/transcribe.md |
| npx hyperframes remove-background: transparent cutouts | references/remove-background.md |
| TTS → transcription → captions (no recorded voiceover) | references/tts-to-captions.md |
| Caption authoring: style detection, layout, word grouping, exit | references/captions/authoring.md |
| Transcript handling: input formats, quality gates, cleanup, APIs | references/captions/transcript-handling.md |
| Caption motion: karaoke, marker effects, audio-reactive | references/captions/motion.md |
| Model caches, system dependencies, troubleshooting | references/requirements.md |

Non-negotiable rules

  • Voice IDs are provider-specific. am_michael is Kokoro-only; HeyGen UUIDs don't work on Kokoro. If you pass --voice, also pin --provider to avoid silent provider drift when the user's env changes.
  • Always pass --model to transcribe. The CLI default small.en silently translates non-English audio. See references/transcribe.md → "Language Rule".
  • HeyGen returns word timestamps; ElevenLabs / Kokoro do not. When you want captions, either pass --words to HeyGen and use that JSON directly, or run transcribe against the audio file. Don't assume word data is always there.
  • Captions consume the flat word-array format with { id, text, start, end }. See references/transcribe.md → "Output Shape".
  • remove-background --background-output is hole-cut, not inpainted. For "scene without the person", a different tool is needed. See references/remove-background.md → "When NOT the right tool".
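Since the rules above warn against assuming word data is always present, a caption pipeline might validate the flat word array before using it. The sketch below is a hedged illustration, not the documented format: the field types (`id` as string or number, `start` and `end` as numbers with `end >= start`) are assumptions, and the grouping thresholds are arbitrary placeholders. The authoritative shape and grouping rules live in references/transcribe.md and references/captions/authoring.md.

```typescript
// Assumed field types for the flat word format { id, text, start, end }.
interface Word {
  id: string | number;
  text: string;
  start: number;
  end: number;
}

// Type guard: checks that a parsed JSON value looks like a Word.
function isWord(v: unknown): v is Word {
  if (typeof v !== "object" || v === null) return false;
  const { id, text, start, end } = v as Record<string, unknown>;
  return (
    (typeof id === "string" || typeof id === "number") &&
    typeof text === "string" &&
    typeof start === "number" &&
    typeof end === "number" &&
    end >= start
  );
}

// Groups words into caption lines. A new line starts when the current
// line is full or when the pause before the next word exceeds maxGap.
// maxWords and maxGap are illustrative defaults, not project settings.
function groupWords(words: Word[], maxWords = 4, maxGap = 0.6): Word[][] {
  const groups: Word[][] = [];
  let current: Word[] = [];
  for (const w of words) {
    const prev = current[current.length - 1];
    const gapTooLong = prev !== undefined && w.start - prev.end > maxGap;
    if (current.length > 0 && (current.length >= maxWords || gapTooLong)) {
      groups.push(current);
      current = [];
    }
    current.push(w);
  }
  if (current.length > 0) groups.push(current);
  return groups;
}
```

If the parsed JSON is missing, empty, or fails `isWord`, that is the signal to run transcribe against the audio file rather than continue with incomplete data.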
