azure-speech-to-text-rest-py
par microsoft
API REST simple pour la transcription parole-texte de fichiers audio courts (jusqu'à 60 secondes). Aucun SDK requis - seulement des requêtes HTTP.
npx skills add https://github.com/microsoft/skills --skill azure-speech-to-text-rest-pyAzure Speech to Text REST API for Short Audio
Simple REST API for speech-to-text transcription of short audio files (up to 60 seconds). No SDK required - just HTTP requests.
Prerequisites
- Azure subscription - Create one free
- Speech resource - Create in Azure Portal
- Get credentials - After deployment, go to resource > Keys and Endpoint
Environment Variables
# Required
AZURE_SPEECH_KEY=<your-speech-resource-key>
AZURE_SPEECH_REGION=<region> # e.g., eastus, westus2, westeurope
# Alternative: Use endpoint directly
AZURE_SPEECH_ENDPOINT=https://<region>.stt.speech.microsoft.com
Installation
pip install requests
Quick Start
import os
import requests
def transcribe_audio(audio_file_path: str, language: str = "en-US") -> dict:
"""Transcribe short audio file (max 60 seconds) using REST API."""
region = os.environ["AZURE_SPEECH_REGION"]
api_key = os.environ["AZURE_SPEECH_KEY"]
url = f"https://{region}.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1"
headers = {
"Ocp-Apim-Subscription-Key": api_key,
"Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
"Accept": "application/json"
}
params = {
"language": language,
"format": "detailed" # or "simple"
}
with open(audio_file_path, "rb") as audio_file:
response = requests.post(url, headers=headers, params=params, data=audio_file)
response.raise_for_status()
return response.json()
# Usage
result = transcribe_audio("audio.wav", "en-US")
print(result["DisplayText"])
Audio Requirements
| Format | Codec | Sample Rate | Notes |
|---|---|---|---|
| WAV | PCM | 16 kHz, mono | Recommended |
| OGG | OPUS | 16 kHz, mono | Smaller file size |
Limitations:
- Maximum 60 seconds of audio
- For pronunciation assessment: maximum 30 seconds
- No partial/interim results (final only)
Content-Type Headers
# WAV PCM 16kHz
"Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000"
# OGG OPUS
"Content-Type": "audio/ogg; codecs=opus"
Response Formats
Simple Format (default)
params = {"language": "en-US", "format": "simple"}
{
"RecognitionStatus": "Success",
"DisplayText": "Remind me to buy 5 pencils.",
"Offset": "1236645672289",
"Duration": "1236645672289"
}
Detailed Format
params = {"language": "en-US", "format": "detailed"}
{
"RecognitionStatus": "Success",
"Offset": "1236645672289",
"Duration": "1236645672289",
"NBest": [
{
"Confidence": 0.9052885,
"Display": "What's the weather like?",
"ITN": "what's the weather like",
"Lexical": "what's the weather like",
"MaskedITN": "what's the weather like"
}
]
}
Chunked Transfer (Recommended)
For lower latency, stream audio in chunks:
import os
import requests
def transcribe_chunked(audio_file_path: str, language: str = "en-US") -> dict:
"""Stream audio in chunks for lower latency."""
region = os.environ["AZURE_SPEECH_REGION"]
api_key = os.environ["AZURE_SPEECH_KEY"]
url = f"https://{region}.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1"
headers = {
"Ocp-Apim-Subscription-Key": api_key,
"Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
"Accept": "application/json",
"Transfer-Encoding": "chunked",
"Expect": "100-continue"
}
params = {"language": language, "format": "detailed"}
def generate_chunks(file_path: str, chunk_size: int = 1024):
with open(file_path, "rb") as f:
while chunk := f.read(chunk_size):
yield chunk
response = requests.post(
url,
headers=headers,
params=params,
data=generate_chunks(audio_file_path)
)
response.raise_for_status()
return response.json()
Authentication Options
Option 1: Subscription Key (Simple)
headers = {
"Ocp-Apim-Subscription-Key": os.environ["AZURE_SPEECH_KEY"]
}
Option 2: Bearer Token
import requests
import os
def get_access_token() -> str:
"""Get access token from the token endpoint."""
region = os.environ["AZURE_SPEECH_REGION"]
api_key = os.environ["AZURE_SPEECH_KEY"]
token_url = f"https://{region}.api.cognitive.microsoft.com/sts/v1.0/issueToken"
response = requests.post(
token_url,
headers={
"Ocp-Apim-Subscription-Key": api_key,
"Content-Type": "application/x-www-form-urlencoded",
"Content-Length": "0"
}
)
response.raise_for_status()
return response.text
# Use token in requests (valid for 10 minutes)
token = get_access_token()
headers = {
"Authorization": f"Bearer {token}",
"Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
"Accept": "application/json"
}
Query Parameters
| Parameter | Required | Values | Description |
|---|---|---|---|
language | Yes | en-US, de-DE, etc. | Language of speech |
format | No | simple, detailed | Result format (default: simple) |
profanity | No | masked, removed, raw | Profanity handling (default: masked) |
Recognition Status Values
| Status | Description |
|---|---|
Success | Recognition succeeded |
NoMatch | Speech detected but no words matched |
InitialSilenceTimeout | Only silence detected |
BabbleTimeout | Only noise detected |
Error | Internal service error |
Profanity Handling
# Mask profanity with asterisks (default)
params = {"language": "en-US", "profanity": "masked"}
# Remove profanity entirely
params = {"language": "en-US", "profanity": "removed"}
# Include profanity as-is
params = {"language": "en-US", "profanity": "raw"}
Error Handling
import requests
def transcribe_with_error_handling(audio_path: str, language: str = "en-US") -> dict | None:
"""Transcribe with proper error handling."""
region = os.environ["AZURE_SPEECH_REGION"]
api_key = os.environ["AZURE_SPEECH_KEY"]
url = f"https://{region}.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1"
try:
with open(audio_path, "rb") as audio_file:
response = requests.post(
url,
headers={
"Ocp-Apim-Subscription-Key": api_key,
"Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
"Accept": "application/json"
},
params={"language": language, "format": "detailed"},
data=audio_file
)
if response.status_code == 200:
result = response.json()
if result.get("RecognitionStatus") == "Success":
return result
else:
print(f"Recognition failed: {result.get('RecognitionStatus')}")
return None
elif response.status_code == 400:
print(f"Bad request: Check language code or audio format")
elif response.status_code == 401:
print(f"Unauthorized: Check API key or token")
elif response.status_code == 403:
print(f"Forbidden: Missing authorization header")
else:
print(f"Error {response.status_code}: {response.text}")
return None
except requests.exceptions.RequestException as e:
print(f"Request failed: {e}")
return None
Async Version
import os
import aiohttp
import asyncio
async def transcribe_async(audio_file_path: str, language: str = "en-US") -> dict:
"""Async version using aiohttp."""
region = os.environ["AZURE_SPEECH_REGION"]
api_key = os.environ["AZURE_SPEECH_KEY"]
url = f"https://{region}.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1"
headers = {
"Ocp-Apim-Subscription-Key": api_key,
"Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
"Accept": "application/json"
}
params = {"language": language, "format": "detailed"}
async with aiohttp.ClientSession() as session:
with open(audio_file_path, "rb") as f:
audio_data = f.read()
async with session.post(url, headers=headers, params=params, data=audio_data) as response:
response.raise_for_status()
return await response.json()
# Usage
result = asyncio.run(transcribe_async("audio.wav", "en-US"))
print(result["DisplayText"])
Supported Languages
Common language codes (see full list):
| Code | Language |
|---|---|
en-US | English (US) |
en-GB | English (UK) |
de-DE | German |
fr-FR | French |
es-ES | Spanish (Spain) |
es-MX | Spanish (Mexico) |
zh-CN | Chinese (Mandarin) |
ja-JP | Japanese |
ko-KR | Korean |
pt-BR | Portuguese (Brazil) |
Best Practices
- Pick sync OR async and stay consistent. Do not mix
azure.xxxsync clients withazure.xxx.aioasync clients in the same call path. Choose one mode per module. - Always use context managers for clients. Use
with httpx.Client(...) as client:(sync) orasync with httpx.AsyncClient(...) as client:(async) so connections are pooled and closed deterministically. - Use WAV PCM 16kHz mono for best compatibility
- Enable chunked transfer for lower latency
- Cache access tokens for 9 minutes (valid for 10)
- Specify the correct language for accurate recognition
- Use detailed format when you need confidence scores
- Handle all RecognitionStatus values in production code
When NOT to Use This API
Use the Speech SDK or Batch Transcription API instead when you need:
- Audio longer than 60 seconds
- Real-time streaming transcription
- Partial/interim results
- Speech translation
- Custom speech models
- Batch transcription of many files
Reference Files
| File | Contents |
|---|---|
| references/pronunciation-assessment.md | Pronunciation assessment parameters and scoring |
Plus de skills de microsoft
oss-growth
microsoft
Persona de growth hacker OSS
official
microsoft-foundry
microsoft
Déployer, évaluer et gérer les agents Foundry de bout en bout : build Docker, push ACR, création d’agent hébergé/par prompt, démarrage de conteneur, évaluation par lots, évaluation continue, workflows d’optimisation de prompt, agent.yaml, curation de jeux de données à partir de traces. UTILISER POUR : déployer un agent vers Foundry, agent hébergé, créer un agent, invoquer un agent, évaluer un agent, exécuter une évaluation par lots, évaluation continue, surveillance continue, statut d’évaluation continue, optimiser un prompt, améliorer un prompt, optimiseur de prompt, optimiser les instructions d’un agent, améliorer un agent...
officialdevelopmentdevops
azure-ai
microsoft
Utiliser pour Azure AI : Recherche, Parole, OpenAI, Intelligence documentaire. Aide pour la recherche, la recherche vectorielle/hybride, la reconnaissance vocale, la synthèse vocale, la transcription, l'OCR. QUAND : Recherche AI, recherche par requête, recherche vectorielle, recherche hybride, recherche sémantique, reconnaissance vocale, synthèse vocale, transcrire, OCR, convertir du texte en parole.
officialdevelopmentapi
azure-deploy
microsoft
Exécutez les déploiements Azure pour les applications DÉJÀ PRÉPARÉES disposant de fichiers .azure/deployment-plan.md et d'infrastructure existants. N'utilisez PAS cette compétence lorsque l'utilisateur demande de CRÉER une nouvelle application — utilisez plutôt azure-prepare. Cette compétence exécute les commandes azd up, azd deploy, terraform apply et az deployment avec une récupération d'erreur intégrée. Nécessite .azure/deployment-plan.md de azure-prepare et un état validé de azure-validate. QUAND : "exécuter azd up", "exécuter azd deploy", "exécuter le déploiement",...
officialdevopsaws
azure-storage
microsoft
Services Azure Storage incluant Blob Storage, File Shares, Queue Storage, Table Storage et Data Lake. Répond aux questions sur les niveaux d'accès au stockage (chaud, froid, froid, archive), quand utiliser chaque niveau et comparaison des niveaux. Fournit du stockage d'objets, des partages de fichiers SMB, de la messagerie asynchrone, du NoSQL clé-valeur et de l'analyse de big data. Inclut la gestion du cycle de vie. À UTILISER POUR : stockage blob, partages de fichiers, stockage de files d'attente, stockage de tables, data lake, téléchargement de fichiers, téléchargement de blobs, comptes de stockage, niveaux d'accès,...
officialdevelopmentdatabase
azure-diagnostics
microsoft
Déboguer les problèmes de production Azure à l'aide d'AppLens, Azure Monitor, l'état des ressources et un triage sécurisé. QUAND : déboguer des problèmes de production, résoudre les problèmes d'App Service, CPU élevé d'App Service, échec de déploiement d'App Service, résoudre les problèmes de Container Apps, résoudre les problèmes de Functions, résoudre les problèmes d'AKS, kubectl ne peut pas se connecter, échecs kube-system/CoreDNS, pod en attente, crashloop, nœud non prêt, échecs de mise à niveau, analyser les logs, KQL, insights, échecs de pull d'image, problèmes de démarrage à froid, échecs de sonde de santé,...
officialdevopsdevelopment
azure-prepare
microsoft
Préparer les applications Azure pour le déploiement (infra Bicep/Terraform, azure.yaml, Dockerfiles). Utiliser pour créer/moderniser ou créer+déployer ; pas pour la migration cross-cloud (utiliser azure-cloud-migrate). NE PAS UTILISER POUR : les applications copilot-sdk (utiliser azure-hosted-copilot-sdk). QUAND : "créer une application", "construire une application web", "créer une API", "créer une API HTTP serverless", "créer un frontend", "créer un backend", "construire un service", "moderniser une application", "mettre à jour une application", "ajouter une authentification", "ajouter un cache", "héberger sur Azure", "créer et...
officialdevelopmentdevops
azure-validate
microsoft
Validation pré-déploiement pour la préparation Azure. Effectuez des vérifications approfondies sur la configuration, l'infrastructure (Bicep ou Terraform), les attributions de rôles RBAC, les autorisations d'identité managée et les prérequis avant le déploiement. QUAND : valider mon application, vérifier l'état de préparation au déploiement, exécuter des contrôles préalables, vérifier la configuration, vérifier si prêt à déployer, valider azure.yaml, valider Bicep, tester avant le déploiement, résoudre les erreurs de déploiement, valider Azure Functions, valider l'application de fonction, valider serverless...
officialdevopstesting