whisper-windows-mcp
Local GPU-accelerated audio/video transcription for Claude Desktop on Windows, using whisper.cpp with AMD Vulkan support, background batch processing, and subtitle generation.
A Windows-native MCP (Model Context Protocol) server that lets Claude Desktop transcribe audio and video files locally using whisper.cpp — with GPU acceleration, multilingual support, and batch processing. No internet connection required. No audio ever leaves your machine.
Why does this exist? The popular `whisper-mcp` package was built for macOS and assumes a Unix environment. It does not work on Windows. This package was written specifically for Windows users who want local AI transcription integrated with Claude Desktop.
What you can do with it
Once installed, you can say things like this directly in Claude Desktop:
- "Transcribe C:\Users\Me\Downloads\meeting.mp3"
- "Transcribe this folder of recordings and save each as a text file"
- "Generate Japanese and English subtitles for this video"
- "Start a batch transcription of everything in this folder"
- "How long will it take to transcribe these files?"
- "Check if GPU acceleration is working"
Requirements
- Node.js 18 or later — nodejs.org
- whisper.cpp binaries with Vulkan GPU support — see Step 1
- A Whisper model file — see Step 2
- FFmpeg — required for video files and non-WAV/MP3 audio
Step 1 — Install whisper.cpp binaries
Option A — Pre-built Vulkan release (recommended)
Download whisper-vulkan-win-x64.zip from the releases page.
This is a custom-compiled build with Vulkan GPU acceleration enabled. Works with AMD, NVIDIA, and Intel GPUs — no vendor-specific SDK required.
Extract to C:\whisper\Release\. You should end up with:
C:\whisper\Release\whisper-cli.exe
C:\whisper\Release\ggml-vulkan.dll
C:\whisper\Release\ggml.dll
C:\whisper\Release\ggml-base.dll
C:\whisper\Release\ggml-cpu.dll
C:\whisper\Release\whisper.dll
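One way to confirm the extraction worked is to check for each file in the listing above. This is a minimal sketch and not part of this package; `missing_binaries` is a hypothetical helper name, and the file names are copied from the listing.

```python
from pathlib import Path

# File names copied from the listing above.
EXPECTED_FILES = [
    "whisper-cli.exe",
    "ggml-vulkan.dll",
    "ggml.dll",
    "ggml-base.dll",
    "ggml-cpu.dll",
    "whisper.dll",
]


def missing_binaries(release_dir: str) -> list[str]:
    """Return the expected file names that are not present in release_dir."""
    root = Path(release_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_binaries(r"C:\whisper\Release")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected files are present.")
```

An empty result means every expected file was found. The `check_config` tool described later covers the executable, model, and FFmpeg from inside Claude Desktop.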
GPU acceleration is automatic — no additional configuration needed.
Option B — Build from source
Requires: Git, CMake, Visual Studio Build Tools 2022+ with "Desktop development with C++", Vulkan SDK from lunarg.com.
git clone https://github.com/ggml-org/whisper.cpp
cd whisper.cpp
cmake -B build -DGGML_VULKAN=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release --target whisper-cli
Copy the binaries from build\bin\Release\ to C:\whisper\Release\.
Note: The official whisper.cpp Windows releases on GitHub do not include a Vulkan build. You must use the pre-built release above or compile from source with `-DGGML_VULKAN=ON`.
Step 2 — Download a Whisper model
| Model | Size | Speed | Accuracy | Best for |
|---|---|---|---|---|
| `ggml-tiny.en.bin` | 75 MB | Very fast | Basic | Quick tests |
| `ggml-base.en.bin` | 142 MB | Fast | Good | Everyday English |
| `ggml-small.en.bin` | 466 MB | Moderate | Better | Important recordings |
| `ggml-medium.en.bin` | 1.5 GB | Fast on GPU | Very good | Best quality English |
| `ggml-large-v3.bin` | 2.9 GB | Fast on GPU | Excellent | Multilingual, best accuracy |
For English-only use: base.en or medium.en are the best starting points.
For multilingual use (auto-detect, foreign language, translation): large-v3 is required. English-only models (*.en.bin) output [FOREIGN] on non-English audio and cannot be used for other languages.
Download from Hugging Face:
https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-medium.en.bin
Save to C:\whisper\models\.
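The download can also be scripted. The sketch below assumes that the other models in the table follow the same URL pattern as the `ggml-medium.en.bin` link above, with only the file name changing. `model_url` and `download_model` are hypothetical helper names.

```python
import urllib.request
from pathlib import Path

# Base path taken from the download link above.
BASE_URL = "https://huggingface.co/ggerganov/whisper.cpp/resolve/main"


def model_url(model: str) -> str:
    """Build the download URL for a model name such as 'medium.en'."""
    return f"{BASE_URL}/ggml-{model}.bin"


def download_model(model: str, dest_dir: str = r"C:\whisper\models") -> Path:
    """Download the model into dest_dir and return the saved path."""
    target = Path(dest_dir) / f"ggml-{model}.bin"
    target.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(model_url(model), str(target))
    return target


if __name__ == "__main__":
    # Only prints the URL; call download_model("medium.en") to fetch the file.
    print(model_url("medium.en"))
```

The larger models are multi-gigabyte downloads, so check the printed URL first to make sure the model name is right.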
Step 3 — Install FFmpeg
FFmpeg is required for video files and non-native audio formats.
Install via winget:
winget install ffmpeg
Or download from ffmpeg.org and add to your PATH.
Verify:
ffmpeg -version
Step 4 — Install this MCP server
npm install -g whisper-windows-mcp
Step 5 — Configure Claude Desktop
Open Claude Desktop → Settings → Developer → Edit Config.
Add the whisper entry:
{
  "mcpServers": {
    "whisper": {
      "command": "npx",
      "args": ["-y", "whisper-windows-mcp"],
      "env": {
        "WHISPER_CLI_PATH": "C:\\whisper\\Release\\whisper-cli.exe",
        "WHISPER_MODEL": "C:\\whisper\\models\\ggml-medium.en.bin"
      }
    }
  }
}
Config file location: C:\Users\YourName\AppData\Roaming\Claude\claude_desktop_config.json
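Use double backslashes in all paths: JSON treats a single backslash as an escape character, so `C:\whisper` must be written as `C:\\whisper`.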
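If hand-escaping paths is error-prone, the entry can be generated instead. This sketch uses Python's standard `json` module, which doubles the backslashes when it serializes. `build_config` is a hypothetical helper, and the paths are the ones used in this guide.

```python
import json


def build_config(cli_path: str, model_path: str) -> str:
    """Return the Claude Desktop config entry as a JSON string."""
    config = {
        "mcpServers": {
            "whisper": {
                "command": "npx",
                "args": ["-y", "whisper-windows-mcp"],
                "env": {
                    "WHISPER_CLI_PATH": cli_path,
                    "WHISPER_MODEL": model_path,
                },
            }
        }
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # Raw strings hold single backslashes; json.dumps doubles them on output.
    print(build_config(r"C:\whisper\Release\whisper-cli.exe",
                       r"C:\whisper\models\ggml-medium.en.bin"))
```

If your config file already lists other servers, merge the `whisper` entry into the existing `mcpServers` object rather than overwriting the file.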
Save and fully restart Claude Desktop. You should see whisper listed with a green running badge in Settings → Developer.
Step 6 — Verify your setup
In Claude Desktop, ask:
"Check your whisper config"
Then:
"Check your system hardware"
This confirms your GPU is detected and Vulkan acceleration is active.
Available tools
transcribe_audio
Transcribe a single file. Supports blocking (default) or background mode for long files.
| Parameter | Description |
|---|---|
| `file_path` | Absolute path to the file (required) |
| `language` | Language code (en, ja, es, etc.) or auto to detect. Default: en |
| `output_format` | text (default), timestamps, json, or srt |
| `save_to_file` | Save transcript as .txt next to the source file |
| `background` | Run as a detached job that returns a job ID immediately. Use check_progress to monitor. Recommended for files over 10 minutes. |
| `threads` | CPU thread override |
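Claude Desktop builds the tool call for you, but it can help to see roughly what goes over the wire. Under the Model Context Protocol, a tool invocation is a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such a request for `transcribe_audio`. The argument values are illustrative, and treating `background` as a boolean is an assumption based on the table above.

```python
import json


def tool_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }, indent=2)


if __name__ == "__main__":
    print(tool_call_request(1, "transcribe_audio", {
        "file_path": r"C:\Users\Me\Downloads\meeting.mp3",
        "language": "en",
        "output_format": "srt",
        "background": True,  # assumed to be a boolean flag
    }))
```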
check_progress
Monitor a background transcription job started with transcribe_audio (background=true).
Returns elapsed time, last processed timestamp, percentage, and the full transcript when complete.
| Parameter | Description |
|---|---|
| `job_id` | Job ID returned by transcribe_audio |
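The elapsed time and percentage that `check_progress` reports are enough for a rough estimate of time remaining. The sketch below assumes progress is roughly linear, which may not hold for every file. `estimate_remaining` is a hypothetical helper, not something the server provides.

```python
from typing import Optional


def estimate_remaining(elapsed_seconds: float, percent_done: float) -> Optional[float]:
    """Estimate seconds remaining, assuming progress is roughly linear.

    Returns None when no progress has been reported yet.
    """
    if percent_done <= 0:
        return None
    if percent_done >= 100:
        return 0.0
    return elapsed_seconds * (100.0 - percent_done) / percent_done
```

For example, a job that is 25% done after 60 seconds would be estimated at 180 more seconds.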
start_batch
Automated sequential batch transcription of all untranscribed files in a folder. Sorts by duration (shortest first), processes one at a time as background jobs, validates each output.
| Parameter | Description |
|---|---|
| `folder_path` | Path to folder (required) |
| `language` | Language code. Default: en |
| `threads` | CPU thread override |
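The queueing behaviour described for `start_batch` (skip files that are already transcribed, then process shortest first) can be sketched as follows. This illustrates the ordering only, not the server's actual code. The rule that a `.txt` file next to the source marks it as transcribed is an assumption, and both function names are hypothetical.

```python
from pathlib import Path


def has_transcript(media_path: str) -> bool:
    """Assumption: a file counts as transcribed when a .txt sits next to it."""
    return Path(media_path).with_suffix(".txt").is_file()


def build_queue(durations: dict[str, float], transcribed: set[str]) -> list[str]:
    """Order untranscribed files by duration, shortest first.

    durations maps file path -> length in seconds;
    transcribed holds the paths that already have a transcript.
    """
    pending = [path for path in durations if path not in transcribed]
    return sorted(pending, key=lambda path: durations[path])
```

Processing the shortest files first means early results arrive quickly, and a problem with the setup shows up before the long recordings start.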
check_batch_progress
Monitor a running batch. Automatically advances to the next file when the current one finishes. Returns overall progress, current file with timestamp, ETA, and any failed files.
| Parameter | Description |
|---|---|
| `batch_id` | Batch ID returned by start_batch |
transcribe_batch (interactive)
Process files one at a time with a preview and confirmation before each. Useful when you want to review as you go.
| Parameter | Description |
|---|---|
| `folder_path` | Path to folder (required) |
| `file_index` | Which file to process (1-based). Omit to list files first. |
| `language` | Language code. Default: en |
| `recursive` | Include subfolders |
generate_subtitles
Generate SRT subtitle files. Supports automatic language detection and English translation output.
| Parameter | Description |
|---|---|
| `file_path` | Path to file (required) |
| `language` | Language code or auto to detect. Default: en |
| `translate_to_english` | Also generate an English translation .en.srt. Only applies when source is not English. |
| `threads` | CPU thread override |
When both native and translation are requested, two files are saved next to the source:
- `filename.ja.srt`: original language
- `filename.en.srt`: English translation
Whisper's built-in translation only translates to English. For other target languages, translate the .srt file contents separately.
analyze_media
Analyze files before committing to transcription. Returns duration, size, codec, and estimated transcription time on CPU and GPU. For folders, shows all files in a sortable table with transcription status.
| Parameter | Description |
|---|---|
| `path` | Path to a single file or folder (required) |
| `sort_by` | For folders: duration (default), name, or size |
check_config
Verify whisper-cli.exe, the model file, and FFmpeg are all accessible. Run this first if anything is failing.
check_system
Detect GPU hardware and verify Vulkan acceleration is available. Reports GPU name, VRAM, whether ggml-vulkan.dll is present, and recommends the best model size for your hardware.
Supported formats
| Type | Formats |
|---|---|
| Native (no conversion) | mp3, wav |
| Video (auto-converted via FFmpeg) | mp4, mkv, avi, mov, webm, flv, wmv, m4v, ts, 3gp |
| Audio (auto-converted via FFmpeg) | m4a, ogg, flac |
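The table above amounts to a lookup on the file extension. The sketch below classifies a path that way and builds an FFmpeg command for the conversion case. The 16 kHz mono 16-bit PCM target follows the example in the whisper.cpp README. That this server converts with exactly these flags is an assumption, and both function names are hypothetical.

```python
from pathlib import PureWindowsPath

# Extension sets copied from the table above.
NATIVE = {"mp3", "wav"}
VIDEO = {"mp4", "mkv", "avi", "mov", "webm", "flv", "wmv", "m4v", "ts", "3gp"}
AUDIO = {"m4a", "ogg", "flac"}


def classify(path: str) -> str:
    """Return 'native', 'convert', or 'unsupported' based on the extension."""
    ext = PureWindowsPath(path).suffix.lower().lstrip(".")
    if ext in NATIVE:
        return "native"
    if ext in VIDEO or ext in AUDIO:
        return "convert"
    return "unsupported"


def ffmpeg_to_wav(src: str, dst: str, ffmpeg: str = "ffmpeg") -> list[str]:
    """Build argv for converting src to 16 kHz mono 16-bit PCM WAV."""
    return [ffmpeg, "-y", "-i", src,
            "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", dst]
```

The returned list can be passed to `subprocess.run` as is, which avoids quoting problems with paths that contain spaces.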
GPU acceleration
The pre-built Vulkan release enables GPU acceleration automatically. Tested on AMD Radeon RX Vega 56 (GCN 5th gen). Any GPU with Vulkan 1.0+ support should work, including NVIDIA and Intel Arc.
Performance comparison (medium.en model, ~5 minute audio file):
| Hardware | Time |
|---|---|
| CPU only (Ryzen 7 2700X, 8 threads) | 8–12 minutes |
| GPU (Vega 56 via Vulkan) | 20–40 seconds |
GPU utilization during transcription is typically 15–20%, dropping back to idle between files. CPU stays around 15%.
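To put the table in relative terms, the short calculation below turns the two time ranges into a speedup range and a real-time factor. It uses only the figures from the table and treats the "~5 minute" file as exactly 300 seconds, so the results are approximate.

```python
AUDIO_SECONDS = 5 * 60         # the ~5 minute test file
CPU_RANGE = (8 * 60, 12 * 60)  # 8-12 minutes, in seconds
GPU_RANGE = (20, 40)           # 20-40 seconds


def speedup_range(slow, fast):
    """Return (min, max) speedup of fast over slow for (low, high) time ranges."""
    return slow[0] / fast[1], slow[1] / fast[0]


def realtime_factor(audio_seconds, elapsed):
    """Seconds of audio processed per wall-clock second, as (worst, best)."""
    return audio_seconds / elapsed[1], audio_seconds / elapsed[0]


if __name__ == "__main__":
    print("GPU speedup over CPU:", speedup_range(CPU_RANGE, GPU_RANGE))
    print("GPU real-time factor:", realtime_factor(AUDIO_SECONDS, GPU_RANGE))
    print("CPU real-time factor:", realtime_factor(AUDIO_SECONDS, CPU_RANGE))
```

On these numbers the GPU run is roughly 12 to 36 times faster than CPU-only and processes audio at about 7.5 to 15 times real time, while the CPU run is slower than real time.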
Multilingual support
Whisper can auto-detect the spoken language and transcribe in that language. Its built-in translation feature outputs English only.
For best multilingual accuracy, use the large-v3 model. English-only models (*.en.bin) cannot detect or transcribe other languages.
Example — foreign language video with subtitles:
- Ask Claude to generate subtitles with `language=auto` and `translate_to_english=true`
- Whisper detects the language and generates a native-language SRT
- A second pass generates an English translation SRT
- Load either file in VLC via Subtitle → Add Subtitle File
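The two passes in the steps above could map onto `whisper-cli` invocations along these lines. The flags used (`-m`, `-f`, `-l`, `-tr`, `-osrt`, `-of`) are whisper.cpp command-line options. Whether this server calls them in exactly this way is an assumption, and `subtitle_commands` is a hypothetical helper. The detected language is passed in as a parameter because it is only known once detection has run.

```python
from pathlib import PureWindowsPath


def subtitle_commands(cli: str, model: str, audio: str,
                      detected_lang: str) -> list[list[str]]:
    """Return argv lists for a native-language pass and an English pass."""
    stem = str(PureWindowsPath(audio).with_suffix(""))  # drop the extension
    common = [cli, "-m", model, "-f", audio, "-osrt", "-l", detected_lang]
    # -of takes the output path without extension; -osrt appends ".srt".
    native = common + ["-of", f"{stem}.{detected_lang}"]
    # -tr asks Whisper to translate the speech into English.
    english = common + ["-tr", "-of", f"{stem}.en"]
    return [native, english]
```

For a Japanese clip at `C:\clips\video.wav`, the two commands would write `video.ja.srt` and `video.en.srt` next to the source, matching the file naming described under generate_subtitles.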
Designed for free-tier users
This tool is designed so that the entire transcription workflow (scan, analyze, queue, run, validate) needs as few Claude interactions as possible. The heavy lifting is done locally on your machine.
Environment variables
| Variable | Description |
|---|---|
| `WHISPER_CLI_PATH` | Path to whisper-cli.exe (required) |
| `WHISPER_MODEL` | Path to model .bin file (required) |
| `WHISPER_THREADS` | CPU thread count override |
| `FFMPEG_PATH` | Path to ffmpeg if not in system PATH |
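The table can be read as two required settings and two optional ones with fallbacks. The sketch below resolves them that way. It illustrates the intended semantics, not the server's actual startup code, and the fallback of looking up `ffmpeg` on PATH is an assumption drawn from the table's wording. `load_settings` is a hypothetical name.

```python
import os


def load_settings(env=None) -> dict:
    """Read the variables from the table; raise if a required one is missing."""
    env = os.environ if env is None else env
    required = ("WHISPER_CLI_PATH", "WHISPER_MODEL")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise ValueError("Missing required variables: " + ", ".join(missing))
    threads = env.get("WHISPER_THREADS")
    return {
        "cli_path": env["WHISPER_CLI_PATH"],
        "model": env["WHISPER_MODEL"],
        # None means no override was given.
        "threads": int(threads) if threads else None,
        # Assumed fallback: rely on ffmpeg being on PATH.
        "ffmpeg": env.get("FFMPEG_PATH", "ffmpeg"),
    }
```

Passing a plain dictionary instead of the real environment makes it easy to try different combinations without touching the Claude Desktop config.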
Troubleshooting
See TROUBLESHOOTING.md for detailed solutions.
Quick checklist:
- Paths in config use double backslashes (`C:\\whisper\\...`)
- `whisper-cli.exe` exists at the configured path
- Model `.bin` file exists at the configured path
- FFmpeg is installed and in PATH (`ffmpeg -version` works)
- Claude Desktop was fully restarted after editing config
- Whisper shows running in Settings → Developer
License
MIT
Contributing
Pull requests welcome. See ROADMAP.md for planned features.
If you've tested GPU acceleration on hardware not listed above, please open an issue with your results — GPU model, VRAM, model size, and observed throughput.