prolog-reasoner
SWI-Prolog as a "logic calculator" for LLMs, with CLP(FD) and recursion — available as an MCP server and a Python library. Eliminates the black box from LLM logical reasoning, and boosts logic/constraint accuracy from 73% to 90% on a 30-problem benchmark.
LLMs excel at natural language but struggle with formal logic. Prolog excels at logical reasoning but can't process natural language. prolog-reasoner bridges this gap by exposing SWI-Prolog execution to LLMs.
Does it help?
On the built-in 30-problem logic benchmark:
| Pipeline | Accuracy |
|---|---|
| LLM-only (claude-sonnet-4-6) | 22/30 (73.3%) |
| LLM + prolog-reasoner | 27/30 (90.0%) |
The gap concentrates in constraint satisfaction and multi-step reasoning — the combinatorial territory LLMs are weak on and Prolog is strong on. Full breakdown below.
Why it works
LLMs pattern-match; Prolog actually searches and solves. When the LLM writes its problem down as Prolog, two things happen at once:
- Prolog handles the combinatorial work LLMs are weak on — constraint satisfaction, multi-step inference, exhaustive search.
- The reasoning exists as code you can read, re-run, and debug. When it goes wrong, you see the exact Prolog that failed and why.
Two ways to use it
- MCP server — Claude (or any MCP client) calls it as a logic solver during conversation. Rule bases let the LLM save stable domain rules once and reference them by name per call.
- Python library — full NL→Prolog pipeline with self-correction. Requires an OpenAI or Anthropic API key.
Features
- MCP tools: `execute_prolog` for arbitrary SWI-Prolog execution, plus `list_rule_bases` / `get_rule_base` / `save_rule_base` / `delete_rule_base` for reusable named rule bases (v14)
- Rule bases: save stable Prolog rules once (e.g. chess move rules, legal axioms) and reference them by name from `execute_prolog` so the LLM only writes the situation-specific facts per call
- Transparent intermediate representation: the Prolog code is the audit trail — inspect, modify, or verify before execution
- CLP(FD) support: constraint logic programming for scheduling and optimization
- Negation-as-failure, recursion, all standard SWI-Prolog features
- Library mode: NL→Prolog translation with self-correction loop (OpenAI / Anthropic)
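To make the CLP(FD) feature concrete, here is a minimal sketch of what a constraint-solving call to `execute_prolog` might look like from the client side. The Prolog snippet uses SWI-Prolog's real `library(clpfd)` predicates (`ins`, `#=`, `label/1`); the Python payload shape simply mirrors the tool parameters documented in the tool reference below, and how it reaches the server is the MCP client's concern.

```python
import json

# A CLP(FD) call as the LLM might issue it. The payload keys mirror
# execute_prolog's parameters (prolog_code, query, max_results).
clpfd_code = """
:- use_module(library(clpfd)).
solve(X, Y) :-
    [X, Y] ins 1..10,   % both variables range over 1..10
    X + Y #= 10,        % constraint: they sum to 10
    X #< Y,             % constraint: X strictly below Y
    label([X, Y]).      % search for concrete assignments
"""

payload = {
    "prolog_code": clpfd_code,
    "query": "solve(X, Y)",
    "max_results": 100,
}

# The payload is plain JSON-serializable data.
print(json.dumps(payload)[:60])
```

The LLM only has to state the constraints; the exhaustive search over assignments happens inside SWI-Prolog.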
Requirements
- Python ≥ 3.10
- SWI-Prolog installed and on PATH (≥ 9.0)
- API key for OpenAI or Anthropic — only for library mode, not for the MCP server
Installation
# MCP server only (no LLM dependencies)
pip install prolog-reasoner
# Library with OpenAI
pip install prolog-reasoner[openai]
# Library with Anthropic
pip install prolog-reasoner[anthropic]
# Both providers
pip install prolog-reasoner[all]
MCP Server Setup
The MCP server exposes five tools — execute_prolog runs Prolog code written by the connected LLM, and four rule-base tools manage named, reusable Prolog modules. It does not call any external LLM API, so no API key is required.
Claude Desktop / Claude Code
{
"mcpServers": {
"prolog-reasoner": {
"command": "uvx",
"args": ["prolog-reasoner"]
}
}
}
Or, if prolog-reasoner is installed directly:
{
"mcpServers": {
"prolog-reasoner": {
"command": "prolog-reasoner"
}
}
}
Docker (SWI-Prolog bundled)
Use Docker if you don't want to install SWI-Prolog locally:
docker build -f docker/Dockerfile -t prolog-reasoner .
{
"mcpServers": {
"prolog-reasoner": {
"command": "docker",
"args": ["run", "-i", "--rm", "prolog-reasoner"]
}
}
}
Tool reference
execute_prolog(prolog_code, query, rule_bases=None, max_results=100, trace=False)
- `prolog_code` — Prolog facts and rules (string)
- `query` — Prolog query to run, e.g. `"mortal(X)"` (string)
- `rule_bases` — optional list of saved rule base names to prepend to `prolog_code` (in order). Use this to reuse stable domain rules across calls without re-sending them
- `max_results` — cap the number of solutions returned (default 100)
- `trace` — when `True`, attach a structured proof tree per solution to `metadata.proof_trace`. Opt-in sub-feature; has performance overhead and does not support CLP(FD), higher-order predicates, or assert/retract.
Returns a JSON object with `success`, `output`, `query`, `error`, and `metadata`.
On success, `metadata` includes `execution_time_ms`, `result_count`, `truncated`, and `rule_bases_used`. When rule bases were requested, `rule_base_load_ms` is also attached (disk I/O timing). On failure, `metadata` also includes `error_category` (one of `syntax_error`, `undefined_predicate`, `unbound_variable`, `type_error`, `domain_error`, `evaluation_error`, `permission_error`, `timeout`, `trace_mechanism_error`, `unknown`) and `error_explanation` — a natural-language hint for the connected LLM (or human) to decide how to fix the Prolog code.
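A client can branch on these failure fields. The sketch below triages an `execute_prolog` result using the documented `error_category` values; which categories are worth an automatic retry is this sketch's assumption, not part of the tool contract.

```python
# Hypothetical client-side triage of an execute_prolog result. The dict
# shape follows the fields documented above; the RETRYABLE set is an
# assumption of this sketch.
RETRYABLE = {"syntax_error", "undefined_predicate", "unbound_variable"}

def triage(result: dict) -> str:
    if result["success"]:
        return "ok"
    category = result["metadata"]["error_category"]
    if category in RETRYABLE:
        # Feed error_explanation back to the LLM for a corrected attempt.
        return "retry: " + result["metadata"]["error_explanation"]
    return "give up: " + category

failure = {
    "success": False,
    "error": "Unknown procedure: mortal/1",
    "metadata": {
        "error_category": "undefined_predicate",
        "error_explanation": "The query calls mortal/1 but no clause defines it.",
    },
}
print(triage(failure))  # retry: The query calls mortal/1 but no clause defines it.
```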
Rule base tools — manage named, reusable Prolog modules under PROLOG_REASONER_RULES_DIR (defaults to ~/.prolog-reasoner/rules/). Names are restricted to [a-z0-9_-], length 1–64.
- `save_rule_base(name, content)` — write or overwrite a rule base. Content is syntax-validated (parse-only) before the write; failures surface as `RULEBASE_003`. Returns `{"success": true, "name": ..., "created": bool}` where `created` is `true` on first write, `false` on overwrite. Files over `max_rule_size` are rejected with `RULEBASE_005`.
- `list_rule_bases()` — return all saved rule bases with `name`, `description`, and `tags`. Metadata is extracted from leading `% description:` / `% tags:` comments in each file.
- `get_rule_base(name)` — return the raw Prolog source of a saved rule base.
- `delete_rule_base(name)` — remove a saved rule base.
For name/size/existence errors, the tools return {"success": false, "error": "...", "error_code": "RULEBASE_001"|"RULEBASE_002"|"RULEBASE_003"|"RULEBASE_005"} rather than raising. I/O failures (RULEBASE_004) are propagated as infrastructure errors.
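The name constraint above (`[a-z0-9_-]`, length 1–64) is easy to express as a regex. The function name below and the mapping of invalid names to `RULEBASE_001` are illustrative assumptions; only the constraint itself comes from the docs.

```python
import re

# The documented constraint: names match [a-z0-9_-] and are 1-64 chars long.
NAME_RE = re.compile(r"^[a-z0-9_-]{1,64}$")

def validate_name(name: str) -> dict:
    """Return a success/error dict in the same shape the rule base tools use.
    Mapping invalid names to RULEBASE_001 is this sketch's assumption."""
    if NAME_RE.fullmatch(name):
        return {"success": True}
    return {"success": False,
            "error": f"invalid rule base name: {name!r}",
            "error_code": "RULEBASE_001"}

print(validate_name("chess_moves"))   # {'success': True}
print(validate_name("Chess Moves!"))  # rejected: uppercase and spaces
```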
Rule base conventions — start each rule base file with leading comments that double as list_rule_bases metadata:
% description: Chess piece movement rules
% tags: chess, games
piece_move(knight, (X1,Y1), (X2,Y2)) :- ...
Then reference from execute_prolog:
{
"rule_bases": ["chess_moves"],
"prolog_code": "position(knight, (4,4)).",
"query": "piece_move(knight, (4,4), Target)"
}
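Under the hood, the documented behaviour is that the named rule bases are prepended, in order, to the per-call `prolog_code`. A minimal sketch of that assembly step (the `.pl` file extension and the function name are assumptions; the layout follows `RULES_DIR`):

```python
from pathlib import Path
import tempfile

def assemble_program(rules_dir: Path, rule_bases: list[str], prolog_code: str) -> str:
    """Prepend each named rule base (in order) to the per-call code, as
    execute_prolog is documented to do. The .pl extension is an assumption."""
    parts = [(rules_dir / f"{name}.pl").read_text() for name in rule_bases]
    parts.append(prolog_code)
    return "\n".join(parts)

# Demonstrate with a throwaway rules directory.
with tempfile.TemporaryDirectory() as d:
    rules = Path(d)
    (rules / "chess_moves.pl").write_text("piece_move(knight, a, b).\n")
    program = assemble_program(rules, ["chess_moves"], "position(knight, a).")
    print(program)
```

This is why reproducibility holds: the full program SWI-Prolog sees is a pure function of the call's inputs plus the named files.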
Rule bases also serve as the foundation for domain-specialized forks: ship a curated set (legal axioms, game rules, tax scenarios, etc.) bundled via BUNDLED_RULES_DIR as a ready-to-use reasoning package.
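The leading-comment convention shown above doubles as the `list_rule_bases` metadata source. A sketch of how that extraction could work (the function name and parsing details are illustrative guesses, not the library's actual parser):

```python
def parse_rule_metadata(source: str) -> dict:
    """Extract description/tags from leading % comments, as list_rule_bases
    is documented to do. Parsing details here are an illustrative guess."""
    meta = {"description": "", "tags": []}
    for line in source.splitlines():
        line = line.strip()
        if not line.startswith("%"):
            break  # metadata comments must lead the file
        body = line.lstrip("% ").strip()
        if body.startswith("description:"):
            meta["description"] = body[len("description:"):].strip()
        elif body.startswith("tags:"):
            meta["tags"] = [t.strip() for t in body[len("tags:"):].split(",")]
    return meta

source = """% description: Chess piece movement rules
% tags: chess, games
piece_move(knight, (X1,Y1), (X2,Y2)) :- true.
"""
print(parse_rule_metadata(source))
# {'description': 'Chess piece movement rules', 'tags': ['chess', 'games']}
```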
Library Usage
The library exposes PrologExecutor (Prolog-only, no LLM) and PrologReasoner (NL→Prolog pipeline, needs an LLM API key).
Execute Prolog directly (no LLM)
import asyncio
from prolog_reasoner.config import Settings
from prolog_reasoner.executor import PrologExecutor
async def main():
settings = Settings() # no API key needed
executor = PrologExecutor(settings)
result = await executor.execute(
prolog_code="human(socrates). mortal(X) :- human(X).",
query="mortal(X)",
)
print(result.output) # mortal(socrates)
asyncio.run(main())
Full NL→Prolog pipeline (requires LLM API key)
import asyncio
from prolog_reasoner import PrologReasoner, TranslationRequest, ExecutionRequest
from prolog_reasoner.config import Settings
from prolog_reasoner.executor import PrologExecutor
from prolog_reasoner.translator import PrologTranslator
from prolog_reasoner.llm_client import LLMClient
async def main():
settings = Settings(llm_api_key="sk-...") # from env or explicit
llm = LLMClient(
provider=settings.llm_provider,
api_key=settings.llm_api_key,
model=settings.llm_model,
timeout_seconds=settings.llm_timeout_seconds,
)
reasoner = PrologReasoner(
translator=PrologTranslator(llm, settings),
executor=PrologExecutor(settings),
)
translation = await reasoner.translate(
TranslationRequest(query="Socrates is human. All humans are mortal. Is Socrates mortal?")
)
print(translation.prolog_code)
result = await reasoner.execute(
ExecutionRequest(prolog_code=translation.prolog_code, query=translation.suggested_query)
)
print(result.output)
asyncio.run(main())
Configuration
All settings via environment variables (prefix PROLOG_REASONER_):
| Variable | Default | Required for |
|---|---|---|
| `LLM_PROVIDER` | `openai` | library (`openai` or `anthropic`) |
| `LLM_API_KEY` | `""` | library only — leave unset for MCP |
| `LLM_MODEL` | `gpt-5.4-mini` | library |
| `LLM_TEMPERATURE` | `0.0` | library |
| `LLM_TIMEOUT_SECONDS` | `30.0` | library |
| `SWIPL_PATH` | `swipl` | both |
| `EXECUTION_TIMEOUT_SECONDS` | `10.0` | both |
| `RULES_DIR` | `~/.prolog-reasoner/rules` | both (where user-saved rule bases live) |
| `BUNDLED_RULES_DIR` | unset | both (optional — synced into `RULES_DIR` on first startup for shipping default rules with a fork) |
| `MAX_RULE_SIZE` | `1048576` (1 MiB) | both (per-file save cap; `save_rule_base` rejects larger content with `RULEBASE_005`) |
| `MAX_RULE_PROMPT_BYTES` | `65536` (64 KiB) | library only (total budget for the "Available rule bases" prompt section; truncated with a marker when exceeded) |
| `LOG_LEVEL` | `INFO` | both |
Benchmark
benchmarks/ contains 30 logic problems across 5 categories (deduction, transitive, constraint, contradiction, multi-step) to compare LLM-only reasoning vs LLM+Prolog reasoning. The benchmark exercises the library path (translator + executor), since it requires the NL→Prolog step.
Results
Measured on anthropic/claude-sonnet-4-6, single run over 30 problems:
| Pipeline | Accuracy | Avg latency |
|---|---|---|
| LLM-only | 22/30 (73.3%) | 1.7s |
| LLM + Prolog | 27/30 (90.0%) | 3.8s |
Per-category breakdown:
| Category | LLM-only | LLM + Prolog |
|---|---|---|
| deduction | 6/6 | 6/6 |
| transitive | 6/6 | 5/6 |
| constraint | 3/7 | 6/7 |
| contradiction | 4/4 | 3/4 |
| multi-step | 3/7 | 7/7 |
The gap is concentrated in constraint (SEND+MORE, 6-queens, knapsack, K4 coloring, Einstein-lite) and multi-step (Nim game theory, 3-person knights-and-knaves, TSP-4, zebra puzzle) — exactly the combinatorial/search-heavy territory where symbolic solvers outperform pattern completion. On purely deductive or transitive questions the LLM is already strong and Prolog adds latency without accuracy gains.
All 3 LLM+Prolog failures were Prolog execution errors from malformed LLM-generated code (missing predicate definitions, unbound CLP(FD) variables) rather than reasoning errors — addressable via prompt tuning. Notably, every failure is inspectable: you can see the exact Prolog that failed and why, rather than a wrong natural-language answer with no explanation.
Running it yourself
docker run --rm -e PROLOG_REASONER_LLM_API_KEY=sk-... \
prolog-reasoner-dev python benchmarks/run_benchmark.py
Results are saved to benchmarks/results.json.
Comparison with other Prolog MCPs
Several Prolog MCP servers exist, each with different design choices. prolog-reasoner is intentionally stateless and spot-use — Prolog is a calculator you call when logic matters, not the backbone of your agent's memory.
| prolog-reasoner | Stateful Prolog MCPs | |
|---|---|---|
| Prolog's role | Per-call reasoning tool | Project-wide knowledge base |
| State | Stateless execution (each call independent); optional named rule bases for reusable static rules, no inter-call session memory | Persistent sessions / layered KBs |
| Reproducibility | Same input (incl. same rule bases) → same output, always | Depends on accumulated state |
| Integration effort | Use where logic matters, skip where it doesn't | Architectural commitment |
| A/B testable vs LLM-only | Yes (each call is a controlled experiment) | Structurally not comparable |
This is also why accuracy benchmarks are published here and not elsewhere: statelessness is what makes a side-by-side comparison possible.
If you need persistent agent memory, hallucination-safeguarded fact storage, or a full neuro-symbolic substrate, other projects may fit better:
- adamrybinski/prolog-mcp — Trealla WASM with save/load sessions
- umuro/prolog-mcp — layered KB with file-backed persistence
- vpursuit/model-context-lab — SWI-Prolog with security sandboxing
- dr3d/prolog-reasoning — neuro-symbolic memory with write-path safety
We're the spot-use option.
Development
# Build dev image
docker build -f docker/Dockerfile -t prolog-reasoner-dev .
# Run tests (no API key needed — LLM calls are mocked)
docker run --rm prolog-reasoner-dev
# With coverage
docker run --rm prolog-reasoner-dev pytest tests/ -v --cov=prolog_reasoner
# Or via docker compose
docker compose -f docker/docker-compose.yml run --rm test
License
MIT