llamaindex

작성자: firecrawl

RAG를 사용하여 LLM 애플리케이션을 구축하기 위한 데이터 프레임워크입니다. 문서 수집(300개 이상의 커넥터), 인덱싱 및 쿼리에 특화되어 있습니다. 벡터 인덱스 기능을 제공합니다.

npx skills add https://github.com/firecrawl/ai-research-skills --skill llamaindex

LlamaIndex - Data Framework for LLM Applications

The leading framework for connecting LLMs with your data.

When to use LlamaIndex

Use LlamaIndex when:

  • Building RAG (retrieval-augmented generation) applications
  • Need document question-answering over private data
  • Ingesting data from multiple sources (300+ connectors)
  • Creating knowledge bases for LLMs
  • Building chatbots with enterprise data
  • Need structured data extraction from documents

Metrics:

  • 45,100+ GitHub stars
  • 23,000+ repositories use LlamaIndex
  • 300+ data connectors (LlamaHub)
  • 1,715+ contributors
  • v0.14.7 (stable)

Use alternatives instead:

  • LangChain: More general-purpose, better for agents
  • Haystack: Production search pipelines
  • txtai: Lightweight semantic search
  • Chroma: Just need vector storage

Quick start

Installation

# Starter package (recommended)
pip install llama-index

# Or minimal core + specific integrations
pip install llama-index-core
pip install llama-index-llms-openai
pip install llama-index-embeddings-openai

5-line RAG example

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load documents
documents = SimpleDirectoryReader("data").load_data()

# Create index
index = VectorStoreIndex.from_documents(documents)

# Query
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)

Core concepts

1. Data connectors - Load documents

from llama_index.core import SimpleDirectoryReader, Document
from llama_index.readers.web import SimpleWebPageReader
from llama_index.readers.github import GithubRepositoryReader

# Directory of files
documents = SimpleDirectoryReader("./data").load_data()

# Web pages
reader = SimpleWebPageReader()
documents = reader.load_data(["https://example.com"])

# GitHub repository
reader = GithubRepositoryReader(owner="user", repo="repo")
documents = reader.load_data(branch="main")

# Manual document creation
doc = Document(
    text="This is the document content",
    metadata={"source": "manual", "date": "2025-01-01"}
)

2. Indices - Structure data

from llama_index.core import VectorStoreIndex, ListIndex, TreeIndex

# Vector index (most common - semantic search)
vector_index = VectorStoreIndex.from_documents(documents)

# List index (sequential scan)
list_index = ListIndex.from_documents(documents)

# Tree index (hierarchical summary)
tree_index = TreeIndex.from_documents(documents)

# Save index
index.storage_context.persist(persist_dir="./storage")

# Load index
from llama_index.core import load_index_from_storage, StorageContext
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)

3. Query engines - Ask questions

# Basic query
query_engine = index.as_query_engine()
response = query_engine.query("What is the main topic?")
print(response)

# Streaming response
query_engine = index.as_query_engine(streaming=True)
response = query_engine.query("Explain quantum computing")
for text in response.response_gen:
    print(text, end="", flush=True)

# Custom configuration
query_engine = index.as_query_engine(
    similarity_top_k=3,          # Return top 3 chunks
    response_mode="compact",     # Or "tree_summarize", "simple_summarize"
    verbose=True
)

4. Retrievers - Find relevant chunks

# Vector retriever
retriever = index.as_retriever(similarity_top_k=5)
nodes = retriever.retrieve("machine learning")

# With filtering
retriever = index.as_retriever(
    similarity_top_k=3,
    filters={"metadata.category": "tutorial"}
)

# Custom retriever
from llama_index.core.retrievers import BaseRetriever

class CustomRetriever(BaseRetriever):
    def _retrieve(self, query_bundle):
        # Your custom retrieval logic
        return nodes

Agents with tools

Basic agent

from llama_index.core.agent import FunctionAgent
from llama_index.llms.openai import OpenAI

# Define tools
def multiply(a: int, b: int) -> int:
    """Multiply two numbers."""
    return a * b

def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

# Create agent
llm = OpenAI(model="gpt-4o")
agent = FunctionAgent.from_tools(
    tools=[multiply, add],
    llm=llm,
    verbose=True
)

# Use agent
response = agent.chat("What is 25 * 17 + 142?")
print(response)

RAG agent (document search + tools)

from llama_index.core.tools import QueryEngineTool

# Create index as before
index = VectorStoreIndex.from_documents(documents)

# Wrap query engine as tool
query_tool = QueryEngineTool.from_defaults(
    query_engine=index.as_query_engine(),
    name="python_docs",
    description="Useful for answering questions about Python programming"
)

# Agent with document search + calculator
agent = FunctionAgent.from_tools(
    tools=[query_tool, multiply, add],
    llm=llm
)

# Agent decides when to search docs vs calculate
response = agent.chat("According to the docs, what is Python used for?")

Advanced RAG patterns

Chat engine (conversational)

from llama_index.core.chat_engine import CondensePlusContextChatEngine

# Chat with memory
chat_engine = index.as_chat_engine(
    chat_mode="condense_plus_context",  # Or "context", "react"
    verbose=True
)

# Multi-turn conversation
response1 = chat_engine.chat("What is Python?")
response2 = chat_engine.chat("Can you give examples?")  # Remembers context
response3 = chat_engine.chat("What about web frameworks?")

Metadata filtering

from llama_index.core.vector_stores import MetadataFilters, ExactMatchFilter

# Filter by metadata
filters = MetadataFilters(
    filters=[
        ExactMatchFilter(key="category", value="tutorial"),
        ExactMatchFilter(key="difficulty", value="beginner")
    ]
)

retriever = index.as_retriever(
    similarity_top_k=3,
    filters=filters
)

query_engine = index.as_query_engine(filters=filters)

Structured output

from pydantic import BaseModel
from llama_index.core.output_parsers import PydanticOutputParser

class Summary(BaseModel):
    title: str
    main_points: list[str]
    conclusion: str

# Get structured response
output_parser = PydanticOutputParser(output_cls=Summary)
query_engine = index.as_query_engine(output_parser=output_parser)

response = query_engine.query("Summarize the document")
summary = response  # Pydantic model
print(summary.title, summary.main_points)

Data ingestion patterns

Multiple file types

# Load all supported formats
documents = SimpleDirectoryReader(
    "./data",
    recursive=True,
    required_exts=[".pdf", ".docx", ".txt", ".md"]
).load_data()

Web scraping

from llama_index.readers.web import BeautifulSoupWebReader

reader = BeautifulSoupWebReader()
documents = reader.load_data(urls=[
    "https://docs.python.org/3/tutorial/",
    "https://docs.python.org/3/library/"
])

Database

from llama_index.readers.database import DatabaseReader

reader = DatabaseReader(
    sql_database_uri="postgresql://user:pass@localhost/db"
)
documents = reader.load_data(query="SELECT * FROM articles")

API endpoints

from llama_index.readers.json import JSONReader

reader = JSONReader()
documents = reader.load_data("https://api.example.com/data.json")

Vector store integrations

Chroma (local)

from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb

# Initialize Chroma
db = chromadb.PersistentClient(path="./chroma_db")
collection = db.get_or_create_collection("my_collection")

# Create vector store
vector_store = ChromaVectorStore(chroma_collection=collection)

# Use in index
from llama_index.core import StorageContext
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

Pinecone (cloud)

from llama_index.vector_stores.pinecone import PineconeVectorStore
import pinecone

# Initialize Pinecone
pinecone.init(api_key="your-key", environment="us-west1-gcp")
pinecone_index = pinecone.Index("my-index")

# Create vector store
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

FAISS (fast)

from llama_index.vector_stores.faiss import FaissVectorStore
import faiss

# Create FAISS index
d = 1536  # Dimension of embeddings
faiss_index = faiss.IndexFlatL2(d)

vector_store = FaissVectorStore(faiss_index=faiss_index)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

Customization

Custom LLM

from llama_index.llms.anthropic import Anthropic
from llama_index.core import Settings

# Set global LLM
Settings.llm = Anthropic(model="claude-sonnet-4-5-20250929")

# Now all queries use Anthropic
query_engine = index.as_query_engine()

Custom embeddings

from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Use HuggingFace embeddings
Settings.embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-mpnet-base-v2"
)

index = VectorStoreIndex.from_documents(documents)

Custom prompt templates

from llama_index.core import PromptTemplate

qa_prompt = PromptTemplate(
    "Context: {context_str}\n"
    "Question: {query_str}\n"
    "Answer the question based only on the context. "
    "If the answer is not in the context, say 'I don't know'.\n"
    "Answer: "
)

query_engine = index.as_query_engine(text_qa_template=qa_prompt)

Multi-modal RAG

Image + text

from llama_index.core import SimpleDirectoryReader
from llama_index.multi_modal_llms.openai import OpenAIMultiModal

# Load images and documents
documents = SimpleDirectoryReader(
    "./data",
    required_exts=[".jpg", ".png", ".pdf"]
).load_data()

# Multi-modal index
index = VectorStoreIndex.from_documents(documents)

# Query with multi-modal LLM
multi_modal_llm = OpenAIMultiModal(model="gpt-4o")
query_engine = index.as_query_engine(llm=multi_modal_llm)

response = query_engine.query("What is in the diagram on page 3?")

Evaluation

Response quality

from llama_index.core.evaluation import RelevancyEvaluator, FaithfulnessEvaluator

# Evaluate relevance
relevancy = RelevancyEvaluator()
result = relevancy.evaluate_response(
    query="What is Python?",
    response=response
)
print(f"Relevancy: {result.passing}")

# Evaluate faithfulness (no hallucination)
faithfulness = FaithfulnessEvaluator()
result = faithfulness.evaluate_response(
    query="What is Python?",
    response=response
)
print(f"Faithfulness: {result.passing}")

Best practices

  1. Use vector indices for most cases - Best performance
  2. Save indices to disk - Avoid re-indexing
  3. Chunk documents properly - 512-1024 tokens optimal
  4. Add metadata - Enables filtering and tracking
  5. Use streaming - Better UX for long responses
  6. Enable verbose during dev - See retrieval process
  7. Evaluate responses - Check relevance and faithfulness
  8. Use chat engine for conversations - Built-in memory
  9. Persist storage - Don't lose your index
  10. Monitor costs - Track embedding and LLM usage

Common patterns

Document Q&A system

# Complete RAG pipeline
documents = SimpleDirectoryReader("docs").load_data()
index = VectorStoreIndex.from_documents(documents)
index.storage_context.persist(persist_dir="./storage")

# Query
query_engine = index.as_query_engine(
    similarity_top_k=3,
    response_mode="compact",
    verbose=True
)
response = query_engine.query("What is the main topic?")
print(response)
print(f"Sources: {[node.metadata['file_name'] for node in response.source_nodes]}")

Chatbot with memory

# Conversational interface
chat_engine = index.as_chat_engine(
    chat_mode="condense_plus_context",
    verbose=True
)

# Multi-turn chat
while True:
    user_input = input("You: ")
    if user_input.lower() == "quit":
        break
    response = chat_engine.chat(user_input)
    print(f"Bot: {response}")

Performance benchmarks

OperationLatencyNotes
Index 100 docs~10-30sOne-time, can persist
Query (vector)~0.5-2sRetrieval + LLM
Streaming query~0.5s first tokenBetter UX
Agent with tools~3-8sMultiple tool calls

LlamaIndex vs LangChain

FeatureLlamaIndexLangChain
Best forRAG, document Q&AAgents, general LLM apps
Data connectors300+ (LlamaHub)100+
RAG focusCore featureOne of many
Learning curveEasier for RAGSteeper
CustomizationHighVery high
DocumentationExcellentGood

Use LlamaIndex when:

  • Your primary use case is RAG
  • Need many data connectors
  • Want simpler API for document Q&A
  • Building knowledge retrieval system

Use LangChain when:

  • Building complex agents
  • Need more general-purpose tools
  • Want more flexibility
  • Complex multi-step workflows

References

Resources

firecrawl의 다른 스킬

oracle
firecrawl
oracle CLI 사용 모범 사례 (프롬프트 + 파일 번들링, 엔진, 세션 및 파일 첨부 패턴)
official
firecrawl-monitor
firecrawl
웹사이트 콘텐츠 변경을 감지하고 웹훅이나 이메일로 알림을 받습니다 — 크론 작업, 스크래퍼, diff 스크립트가 필요하지 않습니다. 사용자가 페이지 변경 사항을 추적하거나, 경쟁사 가격을 모니터링하거나, 새 채용 공고나 블로그 게시물에 대한 알림을 받거나, 문서/변경 로그/상태 페이지를 모니터링하거나, "모니터링", "감시", "추적", "변경 시 알림", "X가 변경되면 알림", "변경되면 알려줘", "변경 시 이메일 보내줘", "웹훅 보내줘"라고 말할 때 이 스킬을 사용하세요. 내장된 AI 판별기가 포맷, 타임스탬프 등을 필터링합니다...
officialweb-scrapingresearch
firecrawl-deep-research
firecrawl
Firecrawl을 사용하여 다중 소스 심층 연구를 실행합니다. 사용자가 주제를 조사하거나, 관점을 비교하거나, 출처가 포함된 브리핑을 작성하거나, 기술적 또는 시장 관련 질문을 조사하거나, 여러 소스의 웹 증거를 종합하도록 요청할 때 사용하세요.
officialresearchweb-scraping
firecrawl-research-papers
firecrawl
Firecrawl을 사용하여 연구 논문, 백서, PDF, 기술 보고서 및 학술 자료를 찾고 종합합니다. 사용자가 문헌 검토, 논문 요약, 연구 동향, 또는 PDF 및 학술/산업 간행물에서 출처가 포함된 종합 정보를 원할 때 사용하세요.
officialresearchweb-scraping
firecrawl-market-research
firecrawl
Firecrawl을 사용하여 시장, 재무, 실적, 산업 및 기업 지표를 추출합니다. 사용자가 시장 조사, 산업 동향, 상장 기업 데이터, 재무 비교, 실적 조사 또는 구조화된 시장 보고서를 요청할 때 사용하세요.
officialresearchweb-scraping
firecrawl-website-design-clone
firecrawl
Firecrawl 스크레이프 증거를 사용하여 모든 웹사이트의 디자인 시스템을 에이전트가 사용할 수 있는 DESIGN.md로 추출합니다. 사용자가 웹사이트의 색상, 글꼴, 간격, 구성 요소, 레이아웃 패턴 또는 브랜드/UI 가이드를 원할 때 사용하여 AI 에이전트가 새 웹사이트를 만들거나, 디자인을 복제하거나, 해당 디자인에서 영감을 받은 페이지를 구축할 수 있도록 합니다.
officialdesignweb-scraping
firecrawl-knowledge-base
firecrawl
Firecrawl을 사용하여 웹 콘텐츠로 지식 베이스를 구축하세요. 로컬 참조 문서, RAG 준비 청크, 파인튜닝 데이터셋, 문서 미러, 주제 코퍼스 또는 웹 소스에서 정리된 LLM 준비 마크다운에 사용할 수 있습니다.
officialweb-scrapingresearch
firecrawl-lead-research
firecrawl
Firecrawl을 사용하여 회의 전 리드 인텔리전스 브리핑을 생성합니다. 사용자가 영업 통화, 파트너십 회의, 투자자 대화 또는 고객 인터뷰 전에 회사 조사, 인물 조사, 최신 뉴스, 대화 포인트, 문제점 또는 아웃리치 준비가 필요할 때 사용합니다.
officialresearchweb-scraping