SACL MCP Server
A framework for bias-aware code retrieval using semantic-augmented reranking and localization.
Semantic-Augmented Reranking and Localization for Code Retrieval
A Model Context Protocol (MCP) server that implements the SACL research framework to provide bias-aware code retrieval for AI coding assistants such as Claude Code, Cursor, and other MCP-enabled tools.
Overview
SACL addresses the critical problem of textual bias in code retrieval systems. Traditional systems over-rely on surface-level features like docstrings, comments, and variable names, leading to biased results that favor well-documented code regardless of functional relevance.
Key Features
- Bias Detection: Identifies over-reliance on textual features
- Semantic Augmentation: Enriches code understanding beyond surface text
- Intelligent Reranking: Prioritizes functional relevance over documentation
- Code Localization: Pinpoints functionally relevant code segments
- Relationship Analysis: Maps code dependencies and relationships
- Context-Aware Retrieval: Returns results with related components
- Agent-Controlled Updates: Explicit file updates for Docker compatibility
- Knowledge Graph: Persistent semantic storage with Graphiti/Neo4j
- MCP Integration: Works with Claude Code, Cursor, and other AI tools
Architecture
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  AI Assistant   │◄──►│ SACL MCP Server │◄──►│ Graphiti/Neo4j  │
│ (Claude, Cursor)│    │                 │    │ Knowledge Graph │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │
                       ┌─────────────────┐
                       │ SACL Framework  │
                       │                 │
                       │ • Bias Detection│
                       │ • Semantic Aug. │
                       │ • Reranking     │
                       │ • Localization  │
                       │ • Relationships │
                       │ • Context-Aware │
                       └─────────────────┘
Quick Start
Prerequisites
- Node.js 18+
- Neo4j database
- OpenAI API key
Installation
# Clone the repository
git clone <repository-url>
cd sacl
# Install dependencies
npm install
# Copy environment configuration
cp .env.example .env
# Edit .env with your settings
OPENAI_API_KEY=your_key_here
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your_password
Using Docker (Recommended)
# Start Neo4j and SACL server
docker-compose up -d
# Check logs
docker-compose logs -f sacl-mcp-server
Manual Setup
# Build the project
npm run build
# Start the server
npm start
Configuration
Environment Variables
| Variable | Description | Default |
|---|---|---|
| OPENAI_API_KEY | OpenAI API key (required) | - |
| SACL_REPO_PATH | Repository to analyze | Current directory |
| SACL_NAMESPACE | Unique namespace | Auto-generated |
| SACL_LLM_MODEL | LLM model for analysis | gpt-4 |
| SACL_EMBEDDING_MODEL | Embedding model | text-embedding-3-small |
| SACL_BIAS_THRESHOLD | Bias detection sensitivity (0-1) | 0.5 |
| SACL_MAX_RESULTS | Maximum search results | 10 |
| SACL_CACHE_ENABLED | Enable embedding cache | true |
| NEO4J_URI | Neo4j connection URI | bolt://localhost:7687 |
| NEO4J_USER | Neo4j username | neo4j |
| NEO4J_PASSWORD | Neo4j password | password |
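The variables above could be read into a typed configuration object roughly as follows. This is a minimal sketch, not the server's actual loader: `SaclConfig`, `EnvMap`, and `loadSaclConfig` are hypothetical names, and only the variable names and defaults come from the table. `SACL_REPO_PATH` and `SACL_NAMESPACE` are left out because their defaults depend on the runtime environment.

```typescript
// Hypothetical config shape; field names are illustrative, not the server's API.
interface SaclConfig {
  openaiApiKey: string;
  llmModel: string;
  embeddingModel: string;
  biasThreshold: number;
  maxResults: number;
  cacheEnabled: boolean;
  neo4jUri: string;
  neo4jUser: string;
  neo4jPassword: string;
}

type EnvMap = Record<string, string | undefined>;

// Reads settings from an env-like map (for example process.env)
// and applies the defaults listed in the table.
function loadSaclConfig(env: EnvMap): SaclConfig {
  const apiKey = env["OPENAI_API_KEY"];
  if (!apiKey) {
    throw new Error("OPENAI_API_KEY is required");
  }

  const biasThreshold = Number(env["SACL_BIAS_THRESHOLD"] ?? "0.5");
  if (!(biasThreshold >= 0 && biasThreshold <= 1)) {
    throw new Error("SACL_BIAS_THRESHOLD must be between 0 and 1");
  }

  const maxResults = Number(env["SACL_MAX_RESULTS"] ?? "10");
  if (!Number.isInteger(maxResults) || maxResults < 1) {
    throw new Error("SACL_MAX_RESULTS must be a positive integer");
  }

  return {
    openaiApiKey: apiKey,
    llmModel: env["SACL_LLM_MODEL"] ?? "gpt-4",
    embeddingModel: env["SACL_EMBEDDING_MODEL"] ?? "text-embedding-3-small",
    biasThreshold,
    maxResults,
    cacheEnabled: (env["SACL_CACHE_ENABLED"] ?? "true").toLowerCase() === "true",
    neo4jUri: env["NEO4J_URI"] ?? "bolt://localhost:7687",
    neo4jUser: env["NEO4J_USER"] ?? "neo4j",
    neo4jPassword: env["NEO4J_PASSWORD"] ?? "password",
  };
}
```

Taking the environment as a parameter rather than reading `process.env` directly keeps the function easy to test; in a Node.js entry point it would be called as `loadSaclConfig(process.env)`.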
Usage
MCP Tools
The SACL server provides nine MCP tools for bias-aware code analysis:
1. analyze_repository
Performs full SACL analysis of a repository:
{
"repositoryPath": "/path/to/repo",
"incremental": false
}
2. query_code
Bias-aware code search with optional context:
{
"query": "function that sorts arrays efficiently",
"repositoryPath": "/path/to/repo",
"maxResults": 10,
"includeContext": false // Set true for relationship context
}
3. query_code_with_context
Enhanced search with relationship context and related components:
{
"query": "authentication middleware",
"repositoryPath": "/path/to/repo",
"maxResults": 10,
"includeRelated": true
}
4. update_file
Explicitly updates the analysis for a single file after it has changed:
{
"filePath": "src/services/auth.js",
"changeType": "modified" // "created", "modified", or "deleted"
}
5. update_files
Batch update multiple files:
{
"files": [
{ "filePath": "src/index.js", "changeType": "modified" },
{ "filePath": "src/utils/new.js", "changeType": "created" }
]
}
6. get_relationships
Analyze code relationships and dependencies:
{
"filePath": "src/controllers/UserController.js",
"maxDepth": 3,
"relationshipTypes": ["imports", "calls", "extends"] // Optional filter
}
7. get_file_context
Get comprehensive context for a file:
{
"filePath": "src/models/User.js",
"includeSnippets": true // Include code previews
}
8. get_bias_analysis
Detailed bias metrics and debugging:
{
"filePath": "src/utils/sort.js" // Optional
}
9. get_system_stats
System performance and statistics:
{}
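MCP clients normally send these argument objects for you, wrapped in a JSON-RPC 2.0 `tools/call` request. The sketch below shows what such a request could look like for `query_code`; the `ToolCallRequest` type and `buildToolCall` helper are hypothetical names introduced for illustration, while the argument fields are taken from the examples above.

```typescript
// Shape of an MCP tool invocation carried over JSON-RPC 2.0.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;                        // tool name, e.g. "query_code"
    arguments: Record<string, unknown>;  // tool-specific arguments
  };
}

// Hypothetical helper that wraps tool arguments in a tools/call request.
function buildToolCall(
  id: number,
  toolName: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// Request for the query_code tool, using the arguments shown above.
const queryRequest = buildToolCall(1, "query_code", {
  query: "function that sorts arrays efficiently",
  repositoryPath: "/path/to/repo",
  maxResults: 10,
  includeContext: false,
});
```

Serializing `queryRequest` with `JSON.stringify` gives the message body a client would send over the MCP transport.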
MCP Client Configuration
Claude Desktop
Add to your claude_desktop_config.json:
{
"mcpServers": {
"sacl": {
"command": "node",
"args": ["/path/to/sacl/dist/index.js"],
"env": {
"OPENAI_API_KEY": "your-key",
"NEO4J_URI": "bolt://localhost:7687",
"NEO4J_USER": "neo4j",
"NEO4J_PASSWORD": "password"
}
}
}
}
Cursor IDE
Configure in your Cursor settings to connect to the SACL MCP server.
SACL Framework
Stage 1: Bias Detection
Identifies three types of textual bias:
- Docstring Dependency: Over-reliance on documentation
- Identifier Name Bias: Focusing on variable/function names
- Comment Over-reliance: Prioritizing commented code
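As a rough illustration of one such indicator, the sketch below measures how much of a JavaScript-style file consists of comment lines. It is an assumed heuristic, not the project's `BiasDetector` logic: the function names and the threshold parameter are hypothetical, and it covers only comment over-reliance, not docstring or identifier bias.

```typescript
// Fraction of non-blank lines that are comment lines (// or /* ... */ blocks).
function commentLineRatio(source: string): number {
  let inBlock = false;
  let commentLines = 0;
  let codeLines = 0;

  for (const rawLine of source.split("\n")) {
    const line = rawLine.trim();
    if (line === "") continue;

    if (inBlock) {
      commentLines++;
      if (line.includes("*/")) inBlock = false;
    } else if (line.startsWith("//")) {
      commentLines++;
    } else if (line.startsWith("/*")) {
      commentLines++;
      // Stay in block mode unless the comment also closes on this line.
      if (line.indexOf("*/", 2) === -1) inBlock = true;
    } else {
      codeLines++;
    }
  }

  const total = commentLines + codeLines;
  return total === 0 ? 0 : commentLines / total;
}

// Flags a file whose comment share exceeds a caller-chosen threshold (assumed parameter).
function hasCommentOverReliance(source: string, threshold: number): boolean {
  return commentLineRatio(source) > threshold;
}
```

Lines that mix code with a trailing comment count as code here, which keeps the heuristic simple at the cost of slightly under-counting comments.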
Stage 2: Semantic Augmentation
Enriches code representations with:
- Functional Signatures: What the code actually does
- Behavior Patterns: Computational patterns (iteration, recursion, etc.)
- Structural Features: Complexity metrics, AST analysis
- Augmented Embeddings: Bias-adjusted semantic vectors
Stage 3: Reranking & Localization
- Bias-Aware Ranking: Reduces textual weight based on bias score
- Code Localization: Identifies functionally relevant segments
- Semantic Similarity: Uses augmented embeddings
- Functional Relevance: Considers computational patterns
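One way to picture "reducing textual weight based on bias score" is a linear blend whose textual share shrinks as bias grows. The sketch below is an assumption made for illustration, not the formula used by `SACLReranker`: the `ScoredCandidate` shape, the base weight of 0.5, and the function names are all hypothetical.

```typescript
interface ScoredCandidate {
  id: string;
  textualScore: number;  // similarity from surface text, in [0, 1]
  semanticScore: number; // similarity from augmented embeddings, in [0, 1]
  biasScore: number;     // detected textual bias, in [0, 1]
}

// Weight given to textual similarity when no bias is detected (assumed value).
const BASE_TEXTUAL_WEIGHT = 0.5;

// Blends the two similarities; higher bias shifts weight from textual to semantic.
function biasAdjustedScore(c: ScoredCandidate): number {
  const textualWeight = BASE_TEXTUAL_WEIGHT * (1 - c.biasScore);
  return textualWeight * c.textualScore + (1 - textualWeight) * c.semanticScore;
}

// Returns a new array sorted by bias-adjusted score, best first.
function rerank(candidates: ScoredCandidate[]): ScoredCandidate[] {
  return [...candidates].sort(
    (a, b) => biasAdjustedScore(b) - biasAdjustedScore(a),
  );
}
```

With this blend, a heavily documented candidate with textual score 0.9, semantic score 0.3, and bias 0.8 scores 0.36, while a sparsely documented one with 0.4, 0.7, and 0.1 scores 0.565, so the functionally closer candidate ranks first even though its textual match is weaker.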
Stage 4: Relationship Analysis
Maps code relationships and dependencies:
- Import/Export Analysis: Module dependencies and exports
- Function Call Mapping: Call graphs and method invocations
- Class Inheritance: Extends/implements relationships
- Dependency Tracking: External and internal dependencies
- Context-Aware Results: Related components with each query result
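Once relationships are extracted, finding the components related to a file amounts to a bounded graph traversal. The sketch below assumes relationships are available as a flat edge list and walks them breadth-first up to a maximum depth, optionally filtered by type, mirroring the `maxDepth` and `relationshipTypes` arguments of `get_relationships`. The `Relationship` type and `relatedFiles` function are hypothetical, and the real server stores this data in the knowledge graph rather than in memory.

```typescript
type RelationshipType = "imports" | "calls" | "extends";

interface Relationship {
  from: string;
  to: string;
  type: RelationshipType;
}

// Breadth-first walk over outgoing edges.
// Returns each reachable file mapped to the depth at which it was first seen.
function relatedFiles(
  edges: Relationship[],
  start: string,
  maxDepth: number,
  types?: RelationshipType[],
): Map<string, number> {
  const allowed = types ? new Set<RelationshipType>(types) : undefined;
  const depths = new Map<string, number>();
  let frontier: string[] = [start];

  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const file of frontier) {
      for (const edge of edges) {
        if (edge.from !== file) continue;
        if (allowed && !allowed.has(edge.type)) continue;
        // Skip the start node and anything already reached at a smaller depth.
        if (edge.to === start || depths.has(edge.to)) continue;
        depths.set(edge.to, depth);
        next.push(edge.to);
      }
    }
    frontier = next;
  }
  return depths;
}
```

Recording the depth alongside each file lets a caller rank direct dependencies ahead of transitive ones when assembling context.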
Example Workflow
1. Repository Analysis:
   AI Assistant → analyze_repository → SACL processes all files → Knowledge graph populated
2. Code Query with Context:
   AI Assistant → query_code_with_context("authentication") → SACL retrieval → Context-aware results
3. File Updates:
   AI modifies code → update_file("src/auth.js", "modified") → SACL re-analyzes → Relationships updated
4. Relationship Exploration:
   AI Assistant → get_relationships("UserController.js") → Dependency graph → Related components
5. Results Include:
   - Original textual similarity score
   - Semantic similarity score
   - Bias-adjusted final score
   - Localized code regions
   - Related components and dependencies
   - Context explanation with relationship importance
   - Explanation of ranking decisions
Performance
Based on SACL research benchmarks:
- 12.8% improvement in Recall@1 on HumanEval
- 9.4% improvement on MBPP
- 7.0% improvement on SWE-Bench-Lite
- P95 latency: <300ms for retrieval operations
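For readers unfamiliar with the metric, Recall@k is the share of relevant items that appear in the top k results, averaged over queries; with one relevant item per query, Recall@1 is simply the fraction of queries whose correct answer is ranked first. The sketch below computes it for a list of query results. The `QueryResult` shape and `recallAtK` function are illustrative names, and the code does not reproduce the evaluation behind the figures above.

```typescript
interface QueryResult {
  ranked: string[];   // retrieved ids, best first
  relevant: string[]; // ground-truth ids for the query
}

// Mean over queries of (relevant ids found in the top k) / (relevant ids).
function recallAtK(results: QueryResult[], k: number): number {
  let total = 0;
  let evaluated = 0;

  for (const r of results) {
    // Recall is undefined for a query with no relevant items; skip it.
    if (r.relevant.length === 0) continue;
    const topK = r.ranked.slice(0, k);
    const found = r.relevant.filter((id) => topK.includes(id)).length;
    total += found / r.relevant.length;
    evaluated++;
  }
  return evaluated === 0 ? 0 : total / evaluated;
}
```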
Bias Analysis Example
SACL Bias Analysis
File: src/algorithms/quicksort.js
Bias Metrics:
• Overall Bias Score: 73.2% 🔴
• Semantic Pattern: Recursive divide-and-conquer sorting
• Functional Signature: Array input → sorted array output
Bias Indicators:
• docstring_dependency: High docstring dependency (15.3% of code)
• identifier_name_bias: High reliance on descriptive names
• comment_over_reliance: Excessive comments (18.7% of code)
Improvement Suggestions:
• Reduce reliance on variable naming for semantic understanding
• Focus on structural patterns over comments
• Improve functional signature extraction
Development
Project Structure
src/
├── core/                     # SACL framework implementation
│   ├── BiasDetector.ts       # Textual bias detection
│   ├── SemanticAugmenter.ts  # Semantic enhancement
│   ├── SACLReranker.ts       # Reranking and localization with context
│   └── SACLProcessor.ts      # Main orchestrator with relationship support
├── mcp/                      # MCP server implementation
│   └── SACLMCPServer.ts      # MCP protocol handlers (9 tools)
├── graphiti/                 # Knowledge graph integration
│   └── GraphitiClient.ts     # Graphiti/Neo4j interface with relationships
├── utils/                    # Utility modules
│   └── CodeAnalyzer.ts       # AST analysis and relationship extraction
├── types/                    # TypeScript type definitions
│   ├── index.ts              # Core types and interfaces
│   └── relationships.ts      # Relationship type definitions
└── index.ts                  # Application entry point
Building
npm run build # Build TypeScript
npm run dev # Development with auto-reload
npm run lint # Code linting
npm run format # Code formatting
npm test # Run tests
Contributing
- Fork the repository
- Create a feature branch
- Implement changes following SACL methodology
- Add tests for new functionality
- Submit a pull request
Research Background
This implementation is based on the research paper:
"SACL: Understanding and Combating Textual Bias in Code Retrieval with Semantic-Augmented Reranking and Localization"
- Authors: Dhruv Gupta, Gayathri Ganesh Lakshmy, Yiqing Xie
- arXiv: 2506.20081v2
Key Research Contributions
- Systematic Bias Detection: Identifies textual bias through feature masking
- Semantic Augmentation: Enhances code understanding beyond text
- Bias-Aware Ranking: Reduces surface-level feature dependency
- Localization: Pinpoints functionally relevant code regions
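The feature-masking idea can be illustrated with a toy experiment: score a snippet against a query, mask one textual feature (here, comments), score it again, and treat the drop as evidence of bias toward that feature. The sketch below is a simplified stand-in under stated assumptions, not the paper's procedure: the token-overlap scorer replaces a real retriever, the regex-based masking is naive (it would also strip `//` inside string literals), and all function names are hypothetical.

```typescript
// Lowercased alphanumeric tokens of a string.
function tokenize(text: string): Set<string> {
  return new Set<string>(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

// Toy textual scorer: fraction of query tokens that appear in the code text.
function textualScore(query: string, code: string): number {
  const queryTokens = tokenize(query);
  if (queryTokens.size === 0) return 0;
  const codeTokens = tokenize(code);
  let hits = 0;
  queryTokens.forEach((token) => {
    if (codeTokens.has(token)) hits++;
  });
  return hits / queryTokens.size;
}

// Masks one textual feature: removes /* */ block comments and // line comments.
function maskComments(code: string): string {
  return code.replace(/\/\*[\s\S]*?\*\//g, "").replace(/\/\/.*$/gm, "");
}

// Score drop caused by masking comments.
// A large drop suggests the match came from comments rather than from the code itself.
function commentBias(query: string, code: string): number {
  return textualScore(query, code) - textualScore(query, maskComments(code));
}
```

The same pattern extends to other features, for example replacing identifiers with placeholders to probe identifier name bias.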
Integration
Supported AI Tools
- Claude Code: Direct MCP integration
- Cursor: MCP server connection
- VS Code Extensions: Via MCP protocol
- Custom Tools: Any MCP-compatible client
Language Support
- JavaScript/TypeScript: Full AST analysis with relationship extraction
  - Import/export tracking
  - Function call analysis
  - Class inheritance detection
  - Dynamic imports support
- Python: Regex-based analysis
  - Import statement parsing
  - Class inheritance detection
  - Function call patterns
- Other Languages (Java, C++, C#, Go, Rust): Basic analysis
  - Import/include statements
  - Class declarations
  - Function definitions
- Extensible: Easy to add new language analyzers
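To give a sense of what regex-based analysis involves, the sketch below extracts imported module names from Python source. It is an illustration only, not the code in `CodeAnalyzer.ts`: the function name is hypothetical, and the patterns handle single-line `import` and `from ... import` statements but not multi-line parenthesized imports or imports inside strings spanning lines.

```typescript
// Returns the module names imported by a Python source string, in order of appearance.
function extractPythonImports(source: string): string[] {
  const modules: string[] = [];

  for (const rawLine of source.split("\n")) {
    const line = rawLine.trim();

    // "from package.module import name" -> "package.module"
    const fromMatch = /^from\s+([.\w]+)\s+import\s+/.exec(line);
    if (fromMatch) {
      modules.push(fromMatch[1]);
      continue;
    }

    // "import a, b as c  # comment" -> "a", "b"
    const importMatch = /^import\s+(.+)$/.exec(line);
    if (importMatch) {
      const withoutComment = importMatch[1].split("#")[0];
      for (const part of withoutComment.split(",")) {
        const moduleName = part.trim().split(/\s+as\s+/)[0].trim();
        if (moduleName) modules.push(moduleName);
      }
    }
  }
  return modules;
}
```

Anchoring both patterns at the start of the trimmed line avoids matching the word `import` inside assignments or string literals on the same line, which is the main source of false positives for this kind of analysis.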
License
MIT License - see LICENSE file for details.
Support
- Issues: GitHub Issues
- Documentation: See the /docs directory
- Research Paper: arXiv:2506.20081v2
Future Enhancements
- Multi-language AST parsing for all supported languages
- Real-time Graphiti integration (currently uses mock methods)
- Semantic relationship detection beyond syntactic analysis
- Visual relationship graphs in MCP responses
- Custom bias threshold configuration per project
- Integration with Language Server Protocol (LSP)
- Advanced localization algorithms with machine learning
- Performance optimizations for large codebases (>10k files)
- Real-time bias notifications during code writing
- Custom relationship type definitions
SACL MCP Server - Bringing research-backed bias-aware code retrieval to AI coding assistants.