langsmith-code-eval

Creates code-based evaluators for LangSmith-traced agents. Use this when building custom evaluation logic, testing tool-usage patterns, or scoring agent outputs…

npx skills add https://github.com/langchain-ai/lca-skills --skill langsmith-code-eval

LangSmith Code Evaluator Creation

Creates code-based evaluators for LangSmith experiments by first inspecting the traces, dataset, and agent code, then implementing the evaluator and an experiment runner.

Prerequisites

  • langsmith Python package installed
  • LANGSMITH_API_KEY environment variable set (check the project's .env file)
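The skill does not say how to verify the API-key prerequisite. Below is a minimal sketch. It assumes the key lives either in the process environment or as a plain KEY=VALUE line in the project's .env file. `read_dotenv` and `find_api_key` are hypothetical helpers, and the parser is deliberately much simpler than python-dotenv.

```python
import os
from pathlib import Path
from typing import Dict, Optional, Union


def read_dotenv(path: Union[str, Path] = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file, skipping blanks and comments.

    Hypothetical minimal parser: no multi-line values, interpolation, or escapes.
    """
    values: Dict[str, str] = {}
    env_file = Path(path)
    if not env_file.is_file():
        return values
    for raw in env_file.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        # Drop surrounding quotes such as KEY="value" or KEY='value'.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def find_api_key(dotenv_path: Union[str, Path] = ".env") -> Optional[str]:
    """Return LANGSMITH_API_KEY from the environment, falling back to the .env file."""
    key = os.environ.get("LANGSMITH_API_KEY")
    if key:
        return key
    return read_dotenv(dotenv_path).get("LANGSMITH_API_KEY") or None
```

If the key is found only in .env, the LangSmith client will not see it until it is exported or loaded into os.environ (for example with python-dotenv's load_dotenv()).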

Workflow

Copy this checklist and track progress:

Evaluator Creation Progress:
- [ ] Step 1: Gather info from user
- [ ] Step 2: Inspect trace and dataset structure
- [ ] Step 3: Read agent code
- [ ] Step 4: Write evaluator
- [ ] Step 5: Write experiment runner
- [ ] Step 6: Run and iterate

Step 1: Gather Info from User

IMPORTANT: Do NOT search or explore the codebase. Ask the user all of these questions upfront using AskUserQuestion before doing anything else.

Ask the user the following in a single AskUserQuestion call:

  1. Python command: How do you run Python in this project? (e.g., python, python3, uv run python, poetry run python)
  2. Agent file path: What is the path to your agent file?
  3. LangSmith project name: What is your LangSmith project name (where traces are logged)?
  4. LangSmith dataset name: What is the name of the dataset to evaluate against?
  5. Evaluation goal: What behavior should pass vs fail? Common types:
    • Tool usage: Did the agent call the correct tool?
    • Output correctness: Does output match expected format/content?
    • Policy compliance: Did it follow specific rules?
    • Classification: Did it categorize correctly?

Step 2: Inspect Trace and Dataset Structure

Using the info from Step 1, run the inspection scripts located in this skill's directory:

{python_cmd} {skill_dir}/scripts/inspect_trace.py PROJECT_NAME [RUN_ID]
{python_cmd} {skill_dir}/scripts/inspect_dataset.py DATASET_NAME

Replace {python_cmd} with the command from Step 1, and {skill_dir} with this skill's directory path.
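{python_cmd} can be several words (uv run python, poetry run python), so pasting it into a single executable name breaks when the commands are launched from code. This sketch assumes the scripts take exactly the positional arguments shown above. `inspection_commands` and `run_inspections` are hypothetical names, not part of the skill.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List, Optional, Union


def inspection_commands(
    python_cmd: str,
    skill_dir: Union[str, Path],
    project: str,
    dataset: str,
    run_id: Optional[str] = None,
) -> List[List[str]]:
    """Build argv lists for inspect_trace.py and inspect_dataset.py.

    python_cmd is split with shlex so multi-word commands like "uv run python" work.
    """
    scripts = Path(skill_dir) / "scripts"
    trace_cmd = [*shlex.split(python_cmd), str(scripts / "inspect_trace.py"), project]
    if run_id:
        trace_cmd.append(run_id)  # RUN_ID is optional in the trace script's usage
    dataset_cmd = [*shlex.split(python_cmd), str(scripts / "inspect_dataset.py"), dataset]
    return [trace_cmd, dataset_cmd]


def run_inspections(commands: List[List[str]]) -> None:
    """Echo and run each command, stopping at the first non-zero exit."""
    for argv in commands:
        print("$", shlex.join(argv))
        subprocess.run(argv, check=True)
```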

Verify the trace matches the agent:

  • Does the trace type match? (e.g., OpenAI trace for OpenAI agent)
  • Does it contain the data needed for evaluation?
  • If they don't match, clarify with the user before proceeding.
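When the inspection output is long, you can also check in code that the fields the evaluator will read actually exist. This minimal sketch assumes you already know which dotted paths the evaluator needs, such as `messages.0.tool_calls`. The helper names and example paths are hypothetical. `Client.list_runs` with `project_name`, `is_root`, and `limit` is part of the langsmith SDK, but the shape of a run's `outputs` depends entirely on how your agent is traced.

```python
from typing import Any, List


def get_path(data: Any, dotted: str) -> Any:
    """Follow a dotted path like "messages.0.content" through dicts and lists.

    Raises KeyError if any segment is missing.
    """
    current = data
    for part in dotted.split("."):
        if isinstance(current, dict) and part in current:
            current = current[part]
        elif isinstance(current, list) and part.isdigit() and int(part) < len(current):
            current = current[int(part)]
        else:
            raise KeyError(dotted)
    return current


def missing_paths(data: dict, required: List[str]) -> List[str]:
    """Return the required dotted paths that are absent from a run's outputs (or inputs)."""
    missing = []
    for path in required:
        try:
            get_path(data, path)
        except KeyError:
            missing.append(path)
    return missing


def sample_root_run_outputs(project_name: str) -> dict:
    """Fetch the outputs of one root run from the project (needs LANGSMITH_API_KEY)."""
    from langsmith import Client  # lazy import keeps the helpers above usable offline

    runs = list(Client().list_runs(project_name=project_name, is_root=True, limit=1))
    if not runs:
        raise ValueError(f"No root runs found in project {project_name!r}")
    return runs[0].outputs or {}
```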

From the dataset inspection, note:

  • Input schema (what gets passed to the agent)
  • Output schema (reference/expected outputs)
  • Metadata fields (e.g., expected_tool, difficulty, labels)

The dataset metadata often contains ground truth for evaluation (e.g., which tool should be called, expected classification).
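A small tally over the examples shows which input, output, and metadata fields are populated across the dataset, and how ground-truth labels such as expected_tool are distributed. This sketch assumes the examples come from langsmith's `Client.list_examples(dataset_name=...)`, which yields objects with `inputs`, `outputs`, and `metadata` attributes. The helper names are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def summarize_examples(examples: Iterable[Any]) -> Dict[str, Counter]:
    """Count how often each top-level key appears in inputs, outputs, and metadata."""
    summary = {"inputs": Counter(), "outputs": Counter(), "metadata": Counter()}
    for example in examples:
        for field, counter in summary.items():
            # Missing or None fields count as empty dicts.
            counter.update((getattr(example, field, None) or {}).keys())
    return summary


def metadata_values(examples: Iterable[Any], key: str) -> Counter:
    """Tally the values of one metadata field, e.g. "expected_tool" (None if absent)."""
    return Counter((getattr(ex, "metadata", None) or {}).get(key) for ex in examples)
```

Materialize the examples first, for example `examples = list(Client().list_examples(dataset_name="my-dataset"))`. Otherwise the generator is exhausted by the first helper.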

Step 3: Read Agent Code

Read the agent file provided in Step 1 to identify:

  • Entry point function (look for the @traceable decorator)
  • Available tools
  • Output format (what the function returns)

Step 4: Write the Evaluator

Create evaluator functions based on the trace and dataset structure. See EVALUATOR_REFERENCE.md for function signatures and return formats.
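The two evaluators below illustrate two common goals from Step 1, tool usage and classification. They are a sketch, not a reference implementation. They assume the target returns a `tools_called` list and a `category` field, and that the dataset metadata holds `expected_tool`. All of these are hypothetical names; replace them with what Steps 2 and 3 revealed. The `(run, example)` signature with a `{"key": ..., "score": ...}` return is a long-standing langsmith convention. The `(outputs, reference_outputs)` form with a bare bool return is accepted by recent langsmith versions. Confirm both against EVALUATOR_REFERENCE.md.

```python
from typing import Any


def correct_tool(run: Any, example: Any) -> dict:
    """Score 1 if the expected tool (from example metadata) is among the tools called.

    Assumes a hypothetical "tools_called" list in the run outputs; adapt to your trace.
    """
    expected = (example.metadata or {}).get("expected_tool")
    called = (run.outputs or {}).get("tools_called", [])
    return {"key": "correct_tool", "score": int(expected is not None and expected in called)}


def correct_category(outputs: dict, reference_outputs: dict) -> bool:
    """Classification check: exact match on a hypothetical "category" field."""
    return outputs.get("category") == reference_outputs.get("category")
```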

Step 5: Write Experiment Runner

Create a script that:

  1. Imports the agent's entry function
  2. Wraps it as a target function
  3. Runs evaluate() or aevaluate() against the dataset

See EVALUATOR_REFERENCE.md for evaluate() usage.
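The three steps above can be sketched as follows. The sketch assumes a synchronous entry function that takes a single string, stored in the dataset under a hypothetical `question` input key. `make_target` and `run_experiment` are hypothetical names. `evaluate(target, data=..., evaluators=..., experiment_prefix=...)` is the langsmith SDK entry point.

```python
from typing import Any, Callable, List


def make_target(agent_fn: Callable[[str], Any], input_key: str = "question") -> Callable[[dict], dict]:
    """Wrap the agent's entry function as a target: dataset inputs -> outputs dict.

    input_key is a hypothetical dataset field; use the input schema seen in Step 2.
    """

    def target(inputs: dict) -> dict:
        result = agent_fn(inputs[input_key])
        # Return a dict so evaluators can read named fields; bare values are wrapped.
        return result if isinstance(result, dict) else {"output": result}

    return target


def run_experiment(agent_fn: Callable[[str], Any], dataset_name: str,
                   evaluators: List[Callable], prefix: str = "code-eval"):
    """Run the experiment against the dataset (needs LANGSMITH_API_KEY and network access)."""
    from langsmith import evaluate  # lazy import keeps make_target usable offline

    return evaluate(
        make_target(agent_fn),
        data=dataset_name,
        evaluators=evaluators,
        experiment_prefix=prefix,
    )
```

For example, with a hypothetical `from my_agent import run_agent` and a hypothetical evaluator `my_evaluator`, the call `run_experiment(run_agent, "my-dataset", [my_evaluator])` would create an experiment whose name starts with code-eval. For an async entry function, the same shape applies with aevaluate() and an async target.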

Step 6: Run and Iterate

Execute the experiment, review the results in LangSmith, and refine the evaluators as needed.

More skills from langchain-ai

arxiv-search
langchain-ai
Search arXiv for preprints and scientific papers on a topic and retrieve their abstracts. Query-based search across physics, mathematics, computer science, biology, statistics, and related fields. Configurable number of results (default 10 papers), sorted by relevance. Returns the title and abstract of each matching paper. Requires the arxiv Python package; install it via pip if it is not already present.
official
blog-post
langchain-ai
Long-form blog post writing with research delegation, structured templates, and AI-generated cover images…
official
code-review
langchain-ai
Perform a structured code review of the changes, checking for correctness, style, tests, and potential issues.
official
coding-prefs
langchain-ai
Read the user's coding preferences from /memory/coding-prefs.md before making non-trivial style decisions, and add new preferences when the user states them…
official
competitor-analysis
langchain-ai
When asked to analyze competitors:
official
cudf-analytics
langchain-ai
Use for GPU-accelerated data analysis on datasets, CSVs, or tabular data with NVIDIA cuDF. Triggered when tasks involve groupby aggregations, statistical…
official
cuml-machine-learning
langchain-ai
Use for GPU-accelerated machine learning on tabular data with NVIDIA cuML. Triggered when tasks involve classification, regression, clustering, dimensionality reduction…
official
data-visualization
langchain-ai
Use to create publication-ready charts and multi-part analysis summaries. Triggered when tasks involve visualizing data, plotting results, creating …
official