toml-config
by huggingface

How to write and use TOML configs in prime-rl. Use when creating config files, running commands with configs, or overriding config values via CLI.

npx skills add https://github.com/huggingface/prime-rl --skill toml-config

TOML Config

All prime-rl commands use pydantic_config (tyro-backed) with TOML configs and CLI overrides.

Running with configs

# Load a config file with @ syntax
uv run inference @ configs/debug/infer.toml
uv run sft @ configs/debug/sft/train.toml
uv run rl @ configs/debug/rl/train.toml

# CLI overrides (take precedence over TOML)
uv run inference @ config.toml --model.name Qwen/Qwen3-0.6B --server.port 8001

# Boolean flags: no value needed
uv run inference --model.enforce-eager          # sets to true
uv run inference --no-model.enforce-eager       # sets to false

# CLI-only (no TOML file)
uv run inference --model.name Qwen/Qwen3-0.6B --model.max-model-len 2048

# Compose multiple config files (later files override earlier ones)
uv run rl @ examples/reverse_text/rl.toml @ examples/reverse_text/slurm_rl.toml

# Nested config files: load a config for a specific section
uv run rl --model @ model.toml --data @ data.toml

TOML structure

Top-level fields must come before any [section] header. In TOML, every key that follows a table header belongs to that table until the next header.

# Top-level fields first
gpu_memory_utilization = 0.5
seed = 42

# Then sections
[model]
name = "Qwen/Qwen3-0.6B"
max_model_len = 4096

[server]
port = 8000

Putting a top-level field after a section header nests it inside that section, which causes validation errors.

Setting None

Use the string "None" in TOML to set a field to None:

max_model_len = "None"

SLURM mode

Both rl and sft commands support SLURM execution via an optional [slurm] section. When present, the run is submitted as a SLURM job instead of running locally.

SLURM configs are composed with the base config via CLI:

uv run rl @ examples/reverse_text/rl.toml @ examples/reverse_text/slurm_rl.toml

RL SLURM

output_dir = "/shared/experiments/my-run"

[deployment]
type = "multi_node"
num_train_nodes = 2
num_infer_nodes = 1
gpus_per_node = 8
# nodes_per_fsdp_group = 1

[slurm]
job_name = "my-rl-job"
# dry_run = true          # generate script without submitting
# template_path = "path/to/custom.sh.j2"
# project_dir = "/path/to/project"

When [slurm] is set for RL:

  • output_dir must be set explicitly (the default value, outputs, is rejected)
  • Teacher inference is not supported in multi-node deployment

SFT SLURM

output_dir = "/shared/experiments/my-sft-run"

[deployment]
type = "multi_node"
num_nodes = 2
gpus_per_node = 8
# nodes_per_fsdp_group = 1

[slurm]
job_name = "my-sft-job"
# dry_run = true
# template_path = "path/to/custom.sh.j2"
# project_dir = "/path/to/project"

SFT deployment follows the same pattern as RL:

  • [deployment] configures node/GPU allocation (single_node, the default, or multi_node)
  • [slurm] configures SLURM submission (job name, partition, template)
  • output_dir must be explicitly set when using SLURM
  • Multi-node deployment requires [slurm] to be set

Available commands

All accept @ config.toml and CLI overrides:

| Command | Config class | Description |
|---|---|---|
| uv run rl | full RL pipeline | Orchestrator + inference + trainer (local or SLURM) |
| uv run inference | InferenceConfig | vLLM inference server |
| uv run trainer | trainer config | RL trainer |
| uv run orchestrator | orchestrator config | Rollout orchestrator |
| uv run env-server | env server config | Environment server |
| uv run sft | SFT config | Supervised fine-tuning (local or SLURM) |

Key files

  • src/prime_rl/utils/config.py: BaseConfig, cli, get_all_fields
  • src/prime_rl/entrypoints/rl.py: unified RL entrypoint (local + SLURM)
  • src/prime_rl/configs/rl.py: RLConfig, SlurmConfig, DeploymentConfig
  • src/prime_rl/entrypoints/sft.py: unified SFT entrypoint (local + SLURM)
  • src/prime_rl/configs/sft.py: SFTConfig
  • configs/: all config files, organized by task
