Music

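Author: ElevenLabs

Generates music with the ElevenLabs Music API. Use it to create instrumental tracks, songs with lyrics, background music, jingles, or AI-generated compositions. Supports prompt-based generation, composition plans for fine-grained control, and detailed output with metadata.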

npx skills add https://github.com/elevenlabs/skills --skill music

ElevenLabs Music Generation

Generate music from text prompts - supports instrumental tracks, songs with lyrics, and fine-grained control via composition plans.

Setup: See Installation Guide. For JavaScript, use @elevenlabs/* packages only.

All examples below default to music_v2, the current generation model. Pass model_id="music_v1" only when explicitly requested.

Quick Start

Python

from elevenlabs import ElevenLabs

client = ElevenLabs()

audio = client.music.compose(
    prompt="A chill lo-fi hip hop beat with jazzy piano chords",
    music_length_ms=30000,
    model_id="music_v2",
)

with open("output.mp3", "wb") as f:
    for chunk in audio:
        f.write(chunk)

TypeScript

import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
import { createWriteStream } from "fs";

const client = new ElevenLabsClient();
const audio = await client.music.compose({
  prompt: "A chill lo-fi hip hop beat with jazzy piano chords",
  musicLengthMs: 30000,
  modelId: "music_v2",
});
audio.pipe(createWriteStream("output.mp3"));

cURL

curl -X POST "https://api.elevenlabs.io/v1/music" \
  -H "xi-api-key: $ELEVENLABS_API_KEY" -H "Content-Type: application/json" \
  -d '{"prompt": "A chill lo-fi beat", "music_length_ms": 30000, "model_id": "music_v2"}' \
  --output output.mp3

Methods

| Method | Description |
| --- | --- |
| music.compose | Generate audio from a prompt or composition plan |
| music.stream | Stream audio chunks as they are generated (paid plans) |
| music.composition_plan.create | Generate a structured plan for fine-grained control |
| music.compose_detailed | Generate audio + composition plan + metadata; pass store_for_inpainting=True to enable inpainting |
| music.video_to_music | Generate background music from one or more uploaded video files |
| music.upload | Upload an audio file for later inpainting workflows, optionally extracting its composition plan or word-level timestamps |

See API Reference for full parameter details.

music.upload is available to enterprise clients with access to the inpainting feature.

Video to Music

Generate background music from uploaded video clips via POST /v1/music/video-to-music (client.music.video_to_music). This is separate from prompt-based music.compose (POST /v1/music).

The API combines videos in order, accepts an optional natural-language description, and lets you steer style with up to 10 tags such as upbeat or cinematic. This endpoint still defaults to music_v1; pass model_id="music_v2" to use the newer model.

Python

from elevenlabs import ElevenLabs

client = ElevenLabs()

audio = client.music.video_to_music(
    videos=["trailer.mp4"],
    description="Build suspense, then resolve with a warm cinematic finish.",
    tags=["cinematic", "suspenseful", "uplifting"],
    model_id="music_v2",
)

with open("video-score.mp3", "wb") as f:
    for chunk in audio:
        f.write(chunk)

TypeScript

import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
import { createReadStream, createWriteStream } from "fs";

const client = new ElevenLabsClient();

const audio = await client.music.videoToMusic({
  videos: [createReadStream("trailer.mp4")],
  description: "Build suspense, then resolve with a warm cinematic finish.",
  tags: ["cinematic", "suspenseful", "uplifting"],
  modelId: "music_v2",
});

audio.pipe(createWriteStream("video-score.mp3"));

cURL

curl -X POST "https://api.elevenlabs.io/v1/music/video-to-music" \
  -H "xi-api-key: $ELEVENLABS_API_KEY" \
  -F "videos=@trailer.mp4" \
  -F "description=Build suspense, then resolve with a warm cinematic finish." \
  -F "tags=cinematic" \
  -F "tags=suspenseful" \
  -F "tags=uplifting" \
  -F "model_id=music_v2" \
  --output video-score.mp3

Constraints from the current API schema:

  • Upload 1-10 video files per request
  • Keep total combined upload size at or below 200 MB
  • Keep total combined video duration at or below 600 seconds
  • Use description for high-level musical direction and tags for concise style cues
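
These limits can be checked locally before uploading. The following is a minimal sketch, not part of the SDK: check_video_request and its parameters are hypothetical names, "200 MB" is assumed to mean 200 x 1024 x 1024 bytes, and clip durations must be supplied by the caller because the helper does not read them from the files. The tag limit of 10 comes from the endpoint description above.

```python
from pathlib import Path

MAX_FILES = 10
MAX_TOTAL_BYTES = 200 * 1024 * 1024  # assumption: "200 MB" counted as MiB
MAX_TOTAL_SECONDS = 600
MAX_TAGS = 10


def check_video_request(paths, tags=(), durations_s=None):
    """Return a list of problems; an empty list means the request looks valid."""
    problems = []
    if not 1 <= len(paths) <= MAX_FILES:
        problems.append(f"expected 1-{MAX_FILES} video files, got {len(paths)}")
    # Sum the on-disk size of every file in the request.
    total_bytes = sum(Path(p).stat().st_size for p in paths)
    if total_bytes > MAX_TOTAL_BYTES:
        problems.append(f"combined size {total_bytes} bytes exceeds 200 MB")
    if len(tags) > MAX_TAGS:
        problems.append(f"expected at most {MAX_TAGS} tags, got {len(tags)}")
    # Durations are only checked when the caller provides them (in seconds).
    if durations_s is not None and sum(durations_s) > MAX_TOTAL_SECONDS:
        problems.append(f"combined duration exceeds {MAX_TOTAL_SECONDS} seconds")
    return problems
```

Call it with the same paths and tags you intend to send, and only make the request when the returned list is empty.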

Composition Plans

music_v2 composition plans are an ordered list of chunks. Each chunk specifies its own text (section label, lyrics, inline cues), duration_ms, positive_styles, negative_styles, and context_adherence (low, medium, or high, default high). Up to 30 chunks per plan, each 3,000–120,000 ms, total length 3 s to 10 minutes.
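
The limits above are easy to check before sending a plan. The following is a rough sketch under stated assumptions: validate_plan is a hypothetical helper (not an SDK function), the plan is a plain dict containing only generation chunks, and a minimum of one chunk is assumed since the text only states the upper bound.

```python
MAX_CHUNKS = 30
MIN_CHUNK_MS, MAX_CHUNK_MS = 3_000, 120_000
MIN_TOTAL_MS, MAX_TOTAL_MS = 3_000, 600_000  # 3 s to 10 minutes
ADHERENCE_LEVELS = {"low", "medium", "high"}


def validate_plan(plan):
    """Return a list of limit violations; an empty list means none were found."""
    errors = []
    chunks = plan.get("chunks", [])
    if not 1 <= len(chunks) <= MAX_CHUNKS:
        errors.append(f"plan has {len(chunks)} chunks; expected 1-{MAX_CHUNKS}")
    total_ms = 0
    for i, chunk in enumerate(chunks):
        duration = chunk.get("duration_ms")
        if isinstance(duration, int):
            total_ms += duration
        if not isinstance(duration, int) or not MIN_CHUNK_MS <= duration <= MAX_CHUNK_MS:
            errors.append(f"chunk {i}: duration_ms must be {MIN_CHUNK_MS}-{MAX_CHUNK_MS}, got {duration!r}")
        # context_adherence is optional and defaults to "high".
        adherence = chunk.get("context_adherence", "high")
        if adherence not in ADHERENCE_LEVELS:
            errors.append(f"chunk {i}: unknown context_adherence {adherence!r}")
    if chunks and not MIN_TOTAL_MS <= total_ms <= MAX_TOTAL_MS:
        errors.append(f"total length {total_ms} ms is outside {MIN_TOTAL_MS}-{MAX_TOTAL_MS} ms")
    return errors
```

The API remains the source of truth; a local check like this only catches the documented limits early.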

Generate a plan first, edit it, then compose:

Python

plan = client.music.composition_plan.create(
    prompt="An epic orchestral piece building to a climax",
    music_length_ms=60000,
    model_id="music_v2",
)

# Edit chunks in place
plan["chunks"][0]["text"] = "[Intro]\nQuiet strings rising"

audio = client.music.compose(
    composition_plan=plan,
    model_id="music_v2",
)

TypeScript

const plan = await client.music.compositionPlan.create({
  prompt: "An epic orchestral piece building to a climax",
  musicLengthMs: 60000,
  modelId: "music_v2",
});

plan.chunks[0].text = "[Intro]\nQuiet strings rising";

const audio = await client.music.compose({
  compositionPlan: plan,
  modelId: "music_v2",
});

Or hand-build a plan to control lyrics and style per section:

Python

composition_plan = {
    "chunks": [
        {
            "text": "[Verse]\nWalking down an empty street",
            "duration_ms": 15000,
            "positive_styles": ["pop", "upbeat", "female vocals", "acoustic guitar"],
            "negative_styles": ["dark", "slow"],
            "context_adherence": "high",
        },
        {
            "text": "[Chorus]\nThis is my moment",
            "duration_ms": 15000,
            "positive_styles": ["powerful vocals", "full band"],
            "negative_styles": [],
            "context_adherence": "high",
        },
    ]
}

audio = client.music.compose(composition_plan=composition_plan, model_id="music_v2")

TypeScript

const compositionPlan = {
  chunks: [
    {
      text: "[Verse]\nWalking down an empty street",
      durationMs: 15000,
      positiveStyles: ["pop", "upbeat", "female vocals", "acoustic guitar"],
      negativeStyles: ["dark", "slow"],
      contextAdherence: "high",
    },
    {
      text: "[Chorus]\nThis is my moment",
      durationMs: 15000,
      positiveStyles: ["powerful vocals", "full band"],
      negativeStyles: [],
      contextAdherence: "high",
    },
  ],
};

const audio = await client.music.compose({
  compositionPlan,
  modelId: "music_v2",
});

Put broader characteristics (genre, instrumentation, vocal style) in positive_styles, not in text. The first chunk's styles set the overall tone — include 6–7 styles there.
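
To keep hand-built plans consistent, the chunk shape and the first-chunk guidance can be wrapped in small helpers. This is a sketch only: make_chunk and first_chunk_style_hint are hypothetical names, and the threshold of 6 simply mirrors the 6-7 styles suggested above.

```python
def make_chunk(text, duration_ms, positive_styles, negative_styles=(), context_adherence="high"):
    """Build one generation chunk in the dict shape used by the hand-built plan."""
    return {
        "text": text,
        "duration_ms": duration_ms,
        "positive_styles": list(positive_styles),
        "negative_styles": list(negative_styles),
        "context_adherence": context_adherence,
    }


def first_chunk_style_hint(plan, recommended_min=6):
    """Return a hint if the first chunk carries few styles, otherwise None."""
    chunks = plan.get("chunks", [])
    if not chunks:
        return None
    count = len(chunks[0].get("positive_styles", []))
    if count < recommended_min:
        return f"first chunk has {count} positive_styles; consider {recommended_min}-7 to set the overall tone"
    return None


plan = {
    "chunks": [
        make_chunk(
            "[Verse]\nWalking down an empty street",
            15000,
            ["pop", "upbeat", "female vocals", "acoustic guitar"],
            ["dark", "slow"],
        ),
        make_chunk("[Chorus]\nThis is my moment", 15000, ["powerful vocals", "full band"]),
    ]
}
hint = first_chunk_style_hint(plan)  # a string here, because the verse lists only 4 styles
```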

Streaming

For paid plans, stream audio chunks as they are generated instead of waiting for the full file:

Python

from io import BytesIO

stream = client.music.stream(
    prompt="A driving synthwave track with arpeggiated leads",
    music_length_ms=30000,
    model_id="music_v2",
)

buffer = BytesIO()
for chunk in stream:
    if chunk:
        buffer.write(chunk)

TypeScript

const stream = await client.music.stream({
  prompt: "A driving synthwave track with arpeggiated leads",
  musicLengthMs: 30000,
  modelId: "music_v2",
});

const chunks: Buffer[] = [];
for await (const chunk of stream) {
  chunks.push(chunk);
}
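
If the audio should go to disk or another sink rather than stay in memory, the same loop can write each chunk as it arrives. A minimal sketch: copy_stream is a hypothetical helper, and a plain list of bytes stands in for the iterator returned by client.music.stream so the example runs without an API call.

```python
import io


def copy_stream(chunks, sink):
    """Write each non-empty chunk to sink as it arrives; return the bytes written."""
    written = 0
    for chunk in chunks:
        if chunk:  # skip empty chunks, as in the streaming loop above
            sink.write(chunk)
            written += len(chunk)
    return written


# Stand-in for client.music.stream(...): any iterable of bytes works.
fake_stream = [b"ID3", b"", b"\x00\x01"]
buffer = io.BytesIO()
total = copy_stream(fake_stream, buffer)
```

To write straight to a file, pass an object opened with open("stream.mp3", "wb") as the sink.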

Inpainting

Inpainting edits or extends a stored song by mixing audio reference chunks (unchanged slices of a stored song) with new generation chunks in a single composition plan.

Step 1: get a song_id, either by storing a fresh generation or uploading existing audio:

Python

# Option A: keep a generation for later editing
result = client.music.compose_detailed(
    prompt="An upbeat pop song with verse and chorus",
    music_length_ms=60000,
    model_id="music_v2",
    store_for_inpainting=True,
)
song_id = result.song_id

# Option B: upload an existing track and extract its plan
uploaded = client.music.upload(
    file=open("my-song.mp3", "rb"),
    extract_composition_plan="music_v2",
)
song_id = uploaded.song_id
composition_plan = uploaded.composition_plan

TypeScript

import { createReadStream } from "fs";

// Option A: keep a generation for later editing
const result = await client.music.composeDetailed({
  prompt: "An upbeat pop song with verse and chorus",
  musicLengthMs: 60000,
  modelId: "music_v2",
  storeForInpainting: true,
});
let songId = result.songId;

// Option B: upload an existing track and extract its plan
const uploaded = await client.music.upload({
  file: createReadStream("my-song.mp3"),
  extractCompositionPlan: "music_v2",
});
songId = uploaded.songId;
const compositionPlan = uploaded.compositionPlan;

Step 2: compose a plan that references the stored audio and regenerates the part you want to change:

Python

plan = {
    "chunks": [
        {"song_id": song_id, "range": {"start_ms": 0, "end_ms": 30000}},
        {
            "text": "[Chorus]\nWe're rising up tonight",
            "duration_ms": 30000,
            "positive_styles": ["bigger drums", "layered vocals", "anthemic"],
            "negative_styles": ["sparse"],
            "context_adherence": "high",
        },
    ]
}

audio = client.music.compose(composition_plan=plan, model_id="music_v2")

TypeScript

const plan = {
  chunks: [
    { songId, range: { startMs: 0, endMs: 30000 } },
    {
      text: "[Chorus]\nWe're rising up tonight",
      durationMs: 30000,
      positiveStyles: ["bigger drums", "layered vocals", "anthemic"],
      negativeStyles: ["sparse"],
      contextAdherence: "high",
    },
  ],
};

const audio = await client.music.compose({
  compositionPlan: plan,
  modelId: "music_v2",
});

To match the feel of a stored slice without copying it, attach a conditioning_ref (up to 30,000 ms) plus a condition_strength of low, medium, high, or xhigh to a generation chunk. Conditioning placed on the first chunk influences every later chunk.
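
The Step 2 pattern above can be generalized to replace a span in the middle of a song. The sketch below is illustrative only: replace_section_plan is a hypothetical helper, the caller is assumed to know the stored song's length, and it assumes reference chunks may appear both before and after a generation chunk, which the description suggests but does not spell out. It does not check whether short reference slices meet any minimum chunk length.

```python
def replace_section_plan(song_id, song_length_ms, start_ms, end_ms, new_chunk):
    """Keep the audio outside [start_ms, end_ms) and put new_chunk in its place."""
    if not 0 <= start_ms < end_ms <= song_length_ms:
        raise ValueError("expected 0 <= start_ms < end_ms <= song_length_ms")
    chunks = []
    if start_ms > 0:
        # Unchanged audio before the edited span.
        chunks.append({"song_id": song_id, "range": {"start_ms": 0, "end_ms": start_ms}})
    chunks.append(new_chunk)
    if end_ms < song_length_ms:
        # Unchanged audio after the edited span.
        chunks.append({"song_id": song_id, "range": {"start_ms": end_ms, "end_ms": song_length_ms}})
    return {"chunks": chunks}


bridge = {
    "text": "[Bridge]\nHold on a little longer",
    "duration_ms": 15000,
    "positive_styles": ["stripped back", "piano"],
    "negative_styles": [],
    "context_adherence": "high",
}
plan = replace_section_plan("example-song-id", 60000, 30000, 45000, bridge)
```

The resulting dict has the same shape as the Step 2 plan and would be passed to client.music.compose in the same way.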

See API Reference for the full inpainting parameter list.

Content Restrictions

  • Cannot reference specific artists, bands, or copyrighted lyrics
  • bad_prompt errors include a prompt_suggestion with alternative phrasing
  • bad_composition_plan errors include a composition_plan_suggestion
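
The suggestions carried by these errors can be pulled out and used for a retry. The sketch below works on a plain dict and assumes the error body has the shape {"detail": {"status": ..., "data": {...}}}; that shape is an assumption and should be verified against the API Reference. suggestion_from_error_body is a hypothetical helper name.

```python
def suggestion_from_error_body(body):
    """Return (status, suggestion); either may be None if the body has another shape."""
    detail = body.get("detail") if isinstance(body, dict) else None
    if not isinstance(detail, dict):
        return None, None
    status = detail.get("status")
    data = detail.get("data")
    if not isinstance(data, dict):
        data = {}
    if status == "bad_prompt":
        return status, data.get("prompt_suggestion")
    if status == "bad_composition_plan":
        return status, data.get("composition_plan_suggestion")
    return status, None


# Example with a hand-written body in the assumed shape.
example_body = {
    "detail": {
        "status": "bad_prompt",
        "data": {"prompt_suggestion": "An upbeat pop song with bright synths"},
    }
}
status, suggestion = suggestion_from_error_body(example_body)
```

When a suggestion comes back, review it before resubmitting rather than retrying blindly, since it may change the intent of the original request.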

Error Handling

Python

try:
    audio = client.music.compose(prompt="...", music_length_ms=30000)
except Exception as e:
    print(f"API error: {e}")

TypeScript

try {
  const audio = await client.music.compose({
    prompt: "...",
    musicLengthMs: 30000,
  });
} catch (err) {
  console.error("API error:", err);
}

Common errors: 401 (invalid key), 422 (invalid params), 429 (rate limit).
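
Of these, only 429 is worth retrying automatically; 401 and 422 will fail again until the key or parameters are fixed. A minimal backoff sketch follows. call_with_retries is a hypothetical helper, and it assumes the raised exception exposes a status_code attribute, which should be checked against the SDK version in use.

```python
import time

RETRYABLE_STATUS = {429}


def call_with_retries(fn, max_attempts=4, base_delay_s=1.0, sleep=time.sleep):
    """Call fn(); on a retryable status wait base_delay_s * 2**attempt, then retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            # Assumption: SDK errors carry the HTTP status as exc.status_code.
            status = getattr(exc, "status_code", None)
            is_last_attempt = attempt == max_attempts - 1
            if status not in RETRYABLE_STATUS or is_last_attempt:
                raise
            sleep(base_delay_s * 2 ** attempt)
```

Usage would look like call_with_retries(lambda: client.music.compose(prompt="...", music_length_ms=30000)). The sleep parameter exists so the delay can be replaced in tests.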

References
