langsmith-code-eval

by langchain-ai

Creates code-based evaluators for agents traced by LangSmith. Use when building custom evaluation logic, checking tool-usage patterns, or scoring agent outputs…

npx skills add https://github.com/langchain-ai/lca-skills --skill langsmith-code-eval


LangSmith Code Evaluator Creation

Creates evaluators for LangSmith experiments through structured inspection and implementation.

Prerequisites

langsmith Python package installed
LANGSMITH_API_KEY environment variable set (check project's .env file)
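Both prerequisites can be checked before starting the workflow. The following is a minimal sketch, not part of the skill itself: the helper name check_prerequisites is hypothetical, and it assumes any .env file has already been loaded into the environment.

```python
import importlib.util
import os


def check_prerequisites(env=None):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    # The langsmith package must be importable by the project's interpreter.
    if importlib.util.find_spec("langsmith") is None:
        problems.append("langsmith package is not installed")
    # The API key is read from the environment (load the .env file first).
    if not env.get("LANGSMITH_API_KEY"):
        problems.append("LANGSMITH_API_KEY is not set")
    return problems


print(check_prerequisites() or "prerequisites look fine")
```

Run it with the same Python command the project uses, so the check sees the same installed packages as the agent.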

Workflow

Copy this checklist and track progress:

Evaluator Creation Progress:
- [ ] Step 1: Gather info from user
- [ ] Step 2: Inspect trace and dataset structure
- [ ] Step 3: Read agent code
- [ ] Step 4: Write evaluator
- [ ] Step 5: Write experiment runner
- [ ] Step 6: Run and iterate

Step 1: Gather Info from User

IMPORTANT: Do NOT search or explore the codebase. Ask the user all of these questions upfront using AskUserQuestion before doing anything else.

Ask the user the following in a single AskUserQuestion call:

Python command: How do you run Python in this project? (e.g., python, python3, uv run python, poetry run python)
Agent file path: What is the path to your agent file?
LangSmith project name: What is your LangSmith project name (where traces are logged)?
LangSmith dataset name: What is the name of the dataset to evaluate against?
Evaluation goal: What behavior should pass vs fail? Common types:
- Tool usage: Did the agent call the correct tool?
- Output correctness: Does output match expected format/content?
- Policy compliance: Did it follow specific rules?
- Classification: Did it categorize correctly?

Step 2: Inspect Trace and Dataset Structure

Using the info from Step 1, run the inspection scripts located in this skill's directory:

{python_cmd} {skill_dir}/scripts/inspect_trace.py PROJECT_NAME [RUN_ID]
{python_cmd} {skill_dir}/scripts/inspect_dataset.py DATASET_NAME

Replace {python_cmd} with the command from Step 1, and {skill_dir} with this skill's directory path.
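The placeholder substitution can be sketched as a small helper that turns the Step 1 answers into argument lists. The function name build_inspect_commands and the sample values are hypothetical; only the script names and argument order come from the commands above.

```python
import shlex
from pathlib import Path


def build_inspect_commands(python_cmd, skill_dir, project_name, dataset_name, run_id=None):
    """Return (trace_cmd, dataset_cmd) as argument lists ready for subprocess."""
    # A multi-word command such as "uv run python" becomes ["uv", "run", "python"].
    base = shlex.split(python_cmd)
    scripts = Path(skill_dir) / "scripts"
    trace_cmd = base + [str(scripts / "inspect_trace.py"), project_name]
    # RUN_ID is optional, matching the [RUN_ID] notation in the command.
    if run_id is not None:
        trace_cmd.append(run_id)
    dataset_cmd = base + [str(scripts / "inspect_dataset.py"), dataset_name]
    return trace_cmd, dataset_cmd


# Sample values only; substitute the answers gathered in Step 1.
trace_cmd, dataset_cmd = build_inspect_commands(
    "uv run python", "path/to/skill", "my-project", "my-dataset"
)
print(shlex.join(trace_cmd))
print(shlex.join(dataset_cmd))
```

Each list can be passed to subprocess.run(cmd, check=True); keeping the command as a list avoids quoting problems when a project or dataset name contains spaces.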

Verify the trace matches the agent:

Does the trace type match? (e.g., OpenAI trace for OpenAI agent)
Does it contain the data needed for evaluation?
If mismatched, clarify before proceeding.

From the dataset inspection, note:

Input schema (what gets passed to the agent)
Output schema (reference/expected outputs)
Metadata fields (e.g., expected_tool, difficulty, labels)

The dataset metadata often contains ground truth for evaluation (e.g., which tool should be called, expected classification).
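To make the three things to note concrete, here is a rough sketch that summarizes the field names across a handful of examples. It assumes each example is available as a plain dict with inputs, outputs, and metadata keys; the sample records and the summarize_examples helper are invented for illustration and are not the output format of inspect_dataset.py.

```python
def summarize_examples(examples):
    """Collect the field names seen in inputs, outputs, and metadata."""
    summary = {"inputs": set(), "outputs": set(), "metadata": set()}
    for example in examples:
        for section in summary:
            # A section may be missing or None on some examples.
            summary[section].update((example.get(section) or {}).keys())
    return {section: sorted(keys) for section, keys in summary.items()}


# Hypothetical examples; real field names come from the dataset inspection.
sample = [
    {"inputs": {"question": "Refund order 42"},
     "outputs": {"answer": "Refund issued"},
     "metadata": {"expected_tool": "refund_order", "difficulty": "easy"}},
    {"inputs": {"question": "Where is my parcel?"},
     "outputs": None,
     "metadata": {"expected_tool": "track_shipment"}},
]
print(summarize_examples(sample))
```

Fields that appear only in metadata, such as expected_tool here, are the likely ground truth for the evaluator written in Step 4.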

Step 3: Read Agent Code

Read the agent file provided in Step 1 to identify:

Entry point function (look for @traceable decorator)
Available tools
Output format (what the function returns)
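One way to locate candidate entry points without importing the agent is to scan its source for functions decorated with traceable. This is only a sketch: find_traceable_functions is a hypothetical helper, the sample source is made up, and an agent traced by other means (for example a wrapped client) would not be detected.

```python
import ast


def find_traceable_functions(source):
    """Return names of functions decorated with @traceable in the given source."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for decorator in node.decorator_list:
            # Unwrap calls such as @traceable(name="agent").
            target = decorator.func if isinstance(decorator, ast.Call) else decorator
            # Accept both @traceable and @langsmith.traceable.
            if isinstance(target, ast.Attribute):
                name = target.attr
            else:
                name = getattr(target, "id", None)
            if name == "traceable":
                names.append(node.name)
                break
    return names


# Made-up agent source; in practice read the file path given in Step 1.
agent_source = '''
@traceable
def run_agent(question):
    return {"answer": question}

def helper():
    pass

@traceable(name="async-agent")
async def arun_agent(question):
    return {"answer": question}
'''
print(find_traceable_functions(agent_source))
```

The source is parsed, never executed, so the scan is safe to run even when the agent's dependencies are not installed.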

Step 4: Write the Evaluator

Create evaluator functions based on trace and dataset structure. See EVALUATOR_REFERENCE.md for function signatures and return formats.
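As an illustration of the tool usage case, the sketch below uses the (run, example) evaluator signature and a dict result with key, score, and comment. The field names tool_calls and expected_tool are assumptions; take the real ones from the Step 2 inspection, and defer to EVALUATOR_REFERENCE.md for the signatures this skill actually expects.

```python
from types import SimpleNamespace


def correct_tool_called(run, example) -> dict:
    """Score 1 if the expected tool appears among the tools the agent called."""
    # ASSUMPTION: ground truth is stored in example.metadata["expected_tool"].
    expected = (example.metadata or {}).get("expected_tool")
    # ASSUMPTION: the agent returns {"tool_calls": [{"name": ...}, ...]}.
    tool_calls = (run.outputs or {}).get("tool_calls", [])
    called = [call.get("name") for call in tool_calls]
    score = int(expected is not None and expected in called)
    return {
        "key": "correct_tool",
        "score": score,
        "comment": f"expected={expected!r}, called={called!r}",
    }


# Stand-in objects for a local check; LangSmith supplies real run/example objects.
fake_run = SimpleNamespace(outputs={"tool_calls": [{"name": "refund_order"}]})
fake_example = SimpleNamespace(metadata={"expected_tool": "refund_order"})
print(correct_tool_called(fake_run, fake_example))
```

Checking the evaluator against stand-in objects like these catches schema mismatches before a full experiment is run.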

Step 5: Write Experiment Runner

Create a script that:

Imports the agent's entry function
Wraps it as a target function
Runs evaluate() or aevaluate() against the dataset

See EVALUATOR_REFERENCE.md for evaluate() usage.
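The three parts of the runner might look roughly like this. The names make_target, run_experiment, the input key "question", and the experiment prefix are placeholders, and the evaluate() arguments shown should be checked against EVALUATOR_REFERENCE.md and the installed langsmith version.

```python
def make_target(agent_fn, input_key="question"):
    """Wrap the agent's entry function as a target taking the dataset inputs dict."""
    def target(inputs: dict) -> dict:
        # ASSUMPTION: the dataset input schema has a single field named input_key.
        result = agent_fn(inputs[input_key])
        # Normalize to a dict so evaluators can always index the outputs.
        return result if isinstance(result, dict) else {"output": result}
    return target


def run_experiment(agent_fn, dataset_name, evaluators):
    """Run the evaluators against the named dataset (requires LANGSMITH_API_KEY)."""
    # Imported lazily so make_target can be tested without the package installed.
    from langsmith import evaluate

    return evaluate(
        make_target(agent_fn),
        data=dataset_name,
        evaluators=evaluators,
        experiment_prefix="code-eval",  # placeholder prefix
    )


# Local check of the wrapper with a stand-in agent; no network access involved.
demo_target = make_target(lambda question: question.upper())
print(demo_target({"question": "hello"}))
```

For an async entry function, aevaluate() with an async target would be the corresponding choice.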

Step 6: Run and Iterate

Execute the experiment, review results in LangSmith, refine evaluators as needed.
