langsmith-code-eval by langchain-ai

Creates code-based evaluators for LangSmith-traced agents. Use when building custom evaluation logic, testing tool usage patterns, or scoring agent outputs…

npx skills add https://github.com/langchain-ai/lca-skills --skill langsmith-code-eval

LangSmith Code Evaluator Creation

Creates evaluators for LangSmith experiments through structured inspection and implementation.

Prerequisites

  • langsmith Python package installed
  • LANGSMITH_API_KEY environment variable set (check project's .env file)

Workflow

Copy this checklist and track progress:

Evaluator Creation Progress:
- [ ] Step 1: Gather info from user
- [ ] Step 2: Inspect trace and dataset structure
- [ ] Step 3: Read agent code
- [ ] Step 4: Write evaluator
- [ ] Step 5: Write experiment runner
- [ ] Step 6: Run and iterate

Step 1: Gather Info from User

IMPORTANT: Do NOT search or explore the codebase. Ask the user all of these questions upfront using AskUserQuestion before doing anything else.

Ask the user the following in a single AskUserQuestion call:

  1. Python command: How do you run Python in this project? (e.g., python, python3, uv run python, poetry run python)
  2. Agent file path: What is the path to your agent file?
  3. LangSmith project name: What is your LangSmith project name (where traces are logged)?
  4. LangSmith dataset name: What is the name of the dataset to evaluate against?
  5. Evaluation goal: What behavior should pass vs fail? Common types:
    • Tool usage: Did the agent call the correct tool?
    • Output correctness: Does output match expected format/content?
    • Policy compliance: Did it follow specific rules?
    • Classification: Did it categorize correctly?

Step 2: Inspect Trace and Dataset Structure

Using the info from Step 1, run the inspection scripts located in this skill's directory:

{python_cmd} {skill_dir}/scripts/inspect_trace.py PROJECT_NAME [RUN_ID]
{python_cmd} {skill_dir}/scripts/inspect_dataset.py DATASET_NAME

Replace {python_cmd} with the command from Step 1, and {skill_dir} with this skill's directory path.

Verify the trace matches the agent:

  • Does the trace type match? (e.g., OpenAI trace for OpenAI agent)
  • Does it contain the data needed for evaluation?
  • If they don't match, clarify with the user before proceeding.

From the dataset inspection, note:

  • Input schema (what gets passed to the agent)
  • Output schema (reference/expected outputs)
  • Metadata fields (e.g., expected_tool, difficulty, labels)

The dataset metadata often contains ground truth for evaluation (e.g., which tool should be called, expected classification).
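The same inspection can be done ad hoc from a Python shell. A minimal sketch, assuming the langsmith package and LANGSMITH_API_KEY are available; DATASET_NAME is a placeholder:

```python
def summarize_example(inputs, outputs, metadata):
    """Collapse one dataset example into the three things Step 2 records:
    input schema, output schema, and metadata fields."""
    return {
        "input_keys": sorted(inputs or {}),
        "output_keys": sorted(outputs or {}),
        "metadata_keys": sorted(metadata or {}),
    }

def inspect_dataset():
    # Requires the langsmith package and LANGSMITH_API_KEY.
    from langsmith import Client

    client = Client()
    for ex in client.list_examples(dataset_name="DATASET_NAME", limit=3):
        print(summarize_example(ex.inputs, ex.outputs, ex.metadata))

# inspect_dataset()  # uncomment to run against your LangSmith workspace
```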

Step 3: Read Agent Code

Read the agent file provided in Step 1 to identify:

  • Entry point function (look for @traceable decorator)
  • Available tools
  • Output format (what the function returns)
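For orientation, here is the minimal shape such an agent file often takes. All names below are hypothetical, and the import fallback only exists so the sketch runs without langsmith installed:

```python
try:
    from langsmith import traceable  # the decorator Step 3 looks for
except ImportError:                  # fallback so the sketch runs anywhere
    def traceable(fn):
        return fn

def search_tool(query: str) -> str:
    """Stand-in tool; a real agent would call an API or database here."""
    return f"results for {query}"

TOOLS = {"search": search_tool}

@traceable  # entry point: this is the function the experiment runner imports
def run_agent(question: str) -> dict:
    # Toy routing; a real agent would let the model choose the tool.
    tool_name = "search"
    return {"answer": TOOLS[tool_name](question), "tool_calls": [tool_name]}
```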

Step 4: Write the Evaluator

Create evaluator functions based on trace and dataset structure. See EVALUATOR_REFERENCE.md for function signatures and return formats.
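As a sketch, a tool-usage evaluator in the keyword-argument style (`outputs` / `reference_outputs`) might look like this. The `tool_calls` and `expected_tool` field names are assumptions; use whatever field names Step 2 revealed:

```python
def correct_tool(outputs: dict, reference_outputs: dict) -> dict:
    """Score 1 if the agent called the tool named in the dataset metadata.

    Assumes outputs carry a `tool_calls` list and the reference carries
    `expected_tool`; adjust both to the schemas found in Step 2.
    """
    called = outputs.get("tool_calls", [])
    expected = reference_outputs.get("expected_tool")
    return {"key": "correct_tool", "score": int(expected in called)}
```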

Step 5: Write Experiment Runner

Create a script that:

  1. Imports the agent's entry function
  2. Wraps it as a target function
  3. Runs evaluate() or aevaluate() against the dataset

See EVALUATOR_REFERENCE.md for evaluate() usage.
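A sketch of the runner, assuming a run_agent entry point and a question input field (both hypothetical; match them to your agent file and dataset schema):

```python
def make_target(agent_fn):
    """Adapt the agent entry point to evaluate()'s target signature:
    a callable that takes the example's inputs dict and returns an outputs dict."""
    def target(inputs: dict) -> dict:
        return agent_fn(inputs["question"])  # field name from the dataset schema
    return target

def run_experiment():
    # Requires langsmith, LANGSMITH_API_KEY, and the agent file from Step 1.
    from langsmith import evaluate
    from my_agent import run_agent          # hypothetical module / entry point
    from my_evaluators import correct_tool  # evaluator written in Step 4

    evaluate(
        make_target(run_agent),
        data="DATASET_NAME",                # dataset from Step 1
        evaluators=[correct_tool],
        experiment_prefix="tool-usage",
    )

# run_experiment()  # uncomment to launch the experiment
```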

Step 6: Run and Iterate

Execute the experiment, review the results in LangSmith, and refine the evaluators as needed.

